Evolution, Rationality and Cognition
Evolutionary thinking has expanded in recent decades, spreading from its traditional stronghold – the explanation of speciation and adaptation in biology – to new domains including the human sciences. The essays in this collection attest to the illuminating power of evolutionary thinking when applied to the understanding of the human mind. The contributors to Evolution, Rationality and Cognition use an evolutionary standpoint to approach the nature of the human mind, including both cognitive and behavioural functions. Cognitive science is by its nature an interdisciplinary subject, and the essays draw on a variety of disciplines, including the philosophy of science, the philosophy of mind, game theory, robotics and computational neuroanatomy, to investigate the workings of the mind. The topics covered by the essays range from general methodological issues to long-standing philosophical problems such as how rational human beings actually are. This book will be of interest across a number of fields, including philosophy, evolutionary theory and cognitive science.
António Zilhão is Associate Professor in Philosophy at the University of Lisbon.
Routledge studies in the philosophy of science
1 Cognition, Evolution and Rationality A cognitive science for the twenty-first century Edited by António Zilhão
Evolution, Rationality and Cognition A cognitive science for the twenty-first century Edited by António Zilhão
First published 2005 by Routledge 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN Simultaneously published in the USA and Canada by Routledge 270 Madison Ave, New York, NY 10016 Routledge is an imprint of the Taylor & Francis Group
This edition published in the Taylor & Francis e-Library, 2006. “To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.” © 2005 António Zilhão editorial matter and selection; the contributors their contributions All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging in Publication Data A catalog record for this book has been requested
ISBN 0-203-01291-7 Master e-book ISBN
ISBN 0-415-36260-1 (Print Edition)
Contents

List of illustrations
List of contributors
Preface
Editor’s introduction
ANTÓNIO ZILHÃO

PART I
Evolution
1 Intelligent design is untestable: what about natural selection?
ELLIOTT SOBER
2 Social learning and the Baldwin effect
DAVID PAPINEAU
3 Signals, evolution, and the explanatory power of transient information
BRIAN SKYRMS

PART II
Rationality
4 Untangling the evolution of mental representation
PETER GODFREY-SMITH
5 Innateness and brain-wiring optimization: non-genomic nativism
CHRISTOPHER CHERNIAK
6 Evolution and the origins of the rational
INMAN HARVEY

PART III
Cognition
7 How to get around by mind and body: spatial thought, spatial action
BARBARA TVERSKY
8 Simulation and the evolution of mindreading
CHANDRA SEKHAR SRIPADA AND ALVIN I. GOLDMAN
9 Enhancing and augmenting human reasoning
TIM VAN GELDER

Index
Illustrations

Figures
1.1 In the process of SPD, the population begins at t0 with a sharp value
1.2 In the process of PD, the population begins at t0 with a sharp value
1.3 Which hypothesis, SPD or PD, confers the higher probability on the observed present phenotype?
1.4 Given the observed fur length of present day polar bears and their close relatives, what is the best estimate of the trait values of the ancestors A1, A2, . . ., A5?
1.5 Two problems in which one has to estimate the character of an ancestor, based on the observed value of one or more descendants
3.1 Aumann’s stag hunt
3.2 Krep’s stag hunt
3.3 Evolution of information
3.4 Evolution of correlation I
3.5 Evolution of correlation II
3.6 Evolution of bargaining behaviors
5.1 Complex biological structure arising directly from basic physics
5.2 Runscreen for “Tensarama,” a force-directed placement algorithm for optimizing layout of ganglia of the nematode Caenorhabditis elegans
5.3 Turing machine program that has been the contender for title of five-state “busy-beaver” – maximally productive TM program – without challenge for over a decade

Tables
1.1 Which hypothesis, Design or Chance, confers the greater probability on the observation that the watch is made of metal and glass?
1.2 Which hypothesis, Design or Chance, confers the greater probability on the observation that vertebrates have a camera eye?
1.3 If polar bears now have fur that is 10 centimeters long, does the hypothesis of SPD or the hypothesis of PD render that outcome more probable?
1.4 When a population evolves from its initial state I to its present state P, how will that trajectory be related to the putative optimal phenotype O specified by the hypothesis of SPD?
Contributors
Christopher Cherniak is Professor of Philosophy at the Department of Philosophy – Committee on History and Philosophy of Science of the University of Maryland at College Park.
Peter Godfrey-Smith is Associate Professor of Philosophy at Harvard University.
Alvin I. Goldman is Professor of Philosophy and Research Scientist in Cognitive Science at Rutgers University.
Inman Harvey is Senior Lecturer at the School of Cognitive and Computing Sciences of the University of Sussex and Senior Researcher at the Centre for Computational Neuroscience and Robotics at the same University.
David Papineau is Professor of Philosophy of Science at King’s College London.
Brian Skyrms is Professor of Logic and Philosophy of Science at the School of Social Sciences of the University of California at Irvine.
Elliott Sober is Hans Reichenbach Professor of Philosophy and Henry Villas Research Professor at the University of Wisconsin, Visiting Professor at the London School of Economics and Political Science, Fellow of the American Academy of Arts and Sciences and President of the Philosophy of Science Association.
Chandra Sekhar Sripada completed an M.D. and an internship in psychiatry; he currently studies the philosophies of cognitive science and biology at Rutgers University.
Barbara Tversky is Professor of Psychology at Stanford University.
Tim van Gelder is Associate Professor (Principal Fellow) at the Department of Philosophy of the University of Melbourne and Director of the Australian Thinking Skills Institute.
António Zilhão is Associate Professor at the Department of Philosophy of the University of Lisbon.
Preface
The nine essays collected in this volume constitute the proceedings of the Second International Cognitive Science Conference, jointly organized in the city of Oporto, Portugal, by the Portuguese Philosophical Society and the Abel Salazar Association, between 27 September and 29 September 2002. All of them were invited as original contributions. The essay by Brian Skyrms was in the meantime published in the journal Philosophy of Science (69 (3): 407–28, 2002). I would like to thank both the author and the publisher, The University of Chicago Press, for their permission to reprint it in this volume.

Acknowledgements are also due to a number of other people and institutions. First, I would like to thank Luísa Garcia Fernandes and the crew she gathered around the Abel Salazar Association, in Oporto, for their wonderful job in making sure that all went well with the logistics of the event. André Abath also provided invaluable help at different stages of the organization of the conference. I would also like to express my gratitude to Professor Emeritus M.D. Nuno Grande, a distinguished member of the Oporto Medicine Faculty, for his own support, for the support of the University of Oporto and the City Council that he was able to mobilize, and of course for the support of the association he leads, an association that bears the name of Abel Salazar, the Portuguese medical scientist, polymath and enthusiastic supporter of scientific philosophy under whose aegis the conference was placed. The Calouste Gulbenkian Foundation (FCG), The Portuguese-American Foundation for Development (FLAD) and the Portuguese Foundation for the Support of Science and Technology (FCT) all contributed funding without which the conference could not have taken place.

Special acknowledgements are due to David Papineau for his early support of the idea and for his advice and suggestions, and to Tony Bruce and Terry Clague for having welcomed the proposal to publish this book as a volume in the series Routledge Studies in the Philosophy of Science. Finally, I am most pleased to thank all the speakers and contributors of essays for coming to Oporto in 2002, and for their scientific effort, personal cooperation and remarkable patience.

The Portuguese Philosophical Society’s First International Cognitive Science Conference took place in Lisbon in May 1998; its proceedings were published by the Oxford University Press in 2001 under the title The Foundations of Cognitive Science. Both the First and the Second International Cognitive Science Conferences were generally held by those who attended them to have been major scientific events. They brought to Portugal an impressive array of prestigious cognitive scientists. This succession created the beginnings of a tradition. I trust this tradition in the making will be honoured by the current Direction of the Portuguese Philosophical Society with the organization of the Third International Cognitive Science Conference in 2006.

António Zilhão
Lisbon, Portugal
Editor’s introduction António Zilhão
The essays collected in this volume constitute the proceedings of the Second International Cognitive Science Conference, jointly organized in the city of Oporto, Portugal, by the Portuguese Philosophical Society and the Abel Salazar Association. All the papers read at this conference, held in September 2002, were invited contributions. The contributors are among the top world researchers in evolutionary thinking and cognitive science. The theme of the conference was Evolution, Rationality and Cognition: A cognitive science for the twenty-first century – also the title of this collection. The collection contains nine original essays. They cover a wide range of issues belonging to different provinces of knowledge. The issues covered vary from the evolutionary mechanisms that underlie the emergence of complex adaptive behaviours to the systematic errors in spatial memory and judgement that have been found in recent psychological research; from the optimization of the wiring layout of nervous systems to the status of folk psychology. The provinces of knowledge touched upon include philosophy of science, philosophy of biology, philosophy of mind, game theory, cognitive psychology, computational neuroanatomy, computer science and robotics. These essays constitute no random collection. Although the domains of enquiry these researchers work on differ widely, their thinking is united by a theoretical standpoint that shapes their essays essentially, namely, the evolutionary standpoint. This is the standpoint according to which the idea of evolution, besides explaining speciation and adaptation in biology, as it has been traditionally acknowledged, also has a tremendously illuminating power in the human and behavioural sciences. 
This power is appropriately expressed in the motto Brian Skyrms included in the conclusion of his game-theoretical essay below: “Evolution matters!” This community of approach thus ensures a unity that is much deeper than the apparent diversity brought about by the use of vocabularies and conceptual apparatuses belonging to scientific and philosophical disciplines as disparate as those mentioned above.

The collection is divided into three major parts, each comprising three essays. Part I deals with general questions of evolutionary theory. Part II focuses on the issue of rationality. Part III tackles some particular cognitive problems.

The collection begins with a broad methodological essay, namely, Elliott Sober’s “Intelligent design is untestable: what about natural selection?” This essay combines general topics in epistemology and philosophy of science with more specific topics in the philosophy of biology. The issue Sober addresses is: What are the criteria by which adaptive hypotheses with real scientific value can be distinguished from mere adaptive storytelling? This is an issue adaptive thinking has to deal with right from the start. It is therefore a good way of starting an approach to the theme Evolution, Rationality and Cognition.

Elliott Sober starts his essay with a set of methodological claims. First, he claims that in order to evaluate any empirical hypothesis one has to determine its likelihood value. Second, he claims that the likelihood value of a given hypothesis is to be cashed out as the probability of the available evidence given the hypothesis. Third, he claims that testing a hypothesis essentially requires testing it against competitors. The corollary of these three methodological claims is the further claim that a tested hypothesis will prevail if its likelihood value is greater than the likelihood value of the rival hypotheses. Sober then tells us that these sound methodological principles were usually ignored by intelligent design theorists, who tended to use the “What else could it be?” type of rhetorical question in order to drive their point home. But not all did so. Sober points out that enlightened intelligent design theorists such as Arbuthnot (1667–1735) and Paley (1743–1805) did realize the methodological fault contained in the “What else could it be?” type of argument.
These British creationists took chance to be the only competitor hypothesis imaginable; they therefore claimed to have proven the soundness of the design hypothesis by claiming that it had a higher likelihood value than chance. Although it is undoubtedly true that the likelihood value of chance producing complex adaptive design is very low, this proof fails because, according to Sober, it is simply not possible to ascribe any value to the likelihood of the design hypothesis in the absence of independent evidence concerning the characteristics of the designer. And if it is not possible to ascribe any likelihood value to the design hypothesis, it is not possible to claim that such a value is higher than the likelihood value of chance either, no matter how small the latter might be. Thus, contrary to the claims of Arbuthnot and Paley, the design argument, as they formulated it, is simply untestable.

This fact notwithstanding, Sober claims that the methodological lesson of Arbuthnot and Paley should not be forgotten by evolution theorists. However, it is not uncommon for evolutionists to use the “What else could it be?” rhetorical question when arguing for selectionist explanations of adaptive complexity. Sober thinks this is unfortunate. He then points out that there is a modern equivalent to the hypothesis of chance within the evolutionary framework, namely, the hypothesis of random genetic drift. The central contention of Sober’s essay is then the following: evolution theorists should make sure that the likelihood value of their explanations of traits by natural selection is actually greater than the likelihood value of the alternative explanation according to which the trait to be explained happens to be the outcome of a process of pure random genetic drift.

In the remainder of his essay, Sober illustrates by means of particular examples how an analysis of the comparative likelihood of a selectionist and of a pure drift hypothesis purporting to explain the presence of a particular trait in a species could be done. In the course of this analysis, he stresses two crucial points. First, the range of the concept of complexity should not be implicitly assumed to be congruent with the range of the concept of optimality, as frequently happens. Sober argues that no matter how complex a trait is, there may be independent evidence that it is not an optimal adaptation; and, if this is the case, its presence in the organism may confer a greater likelihood on the pure drift hypothesis than on the hypothesis of natural selection. Thus, complexity by itself is no sure evidence for natural selection. Second, frequently the relevant auxiliary information needed to carry out a likelihood analysis of a selectionist explanation of a trait against its competitors will simply not be available. Sober then concludes his essay by advising evolution theorists to learn to live with this possibility and to strive for more modest goals when this information is indeed not available.

After the discussion of broad methodological issues in evolutionary theory, we turn to matters more specifically related to the study of complex adaptive behaviour.
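The comparative likelihood test described above for Sober's essay can be made concrete with a small numerical sketch. The Python below is not taken from that essay. It assumes, purely for illustration, that each rival hypothesis predicts a normal distribution for the present trait value; the helper names normal_pdf and likelihood_ratio, and every number used, are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density at x of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(observed, h1, h2):
    """Return P(observed | h1) / P(observed | h2).

    Each hypothesis is a (mean, sd) pair: an assumed normal prediction
    for the present trait value. A ratio above 1 favours h1.
    """
    return normal_pdf(observed, *h1) / normal_pdf(observed, *h2)


# Hypothetical numbers, chosen only to make the comparison concrete.
observed_fur_cm = 10.0
selection_plus_drift = (12.0, 1.5)  # clustered near an assumed optimum
pure_drift = (6.0, 4.0)             # spread around an assumed ancestral value

ratio = likelihood_ratio(observed_fur_cm, selection_plus_drift, pure_drift)
print(f"likelihood ratio (selection vs drift): {ratio:.2f}")
```

Both (mean, sd) pairs stand in for auxiliary assumptions about the optimum and the ancestral state. When no independent evidence fixes them, the comparison cannot be carried out, which is the second of the two points stressed above.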
In “Social learning and the Baldwin effect”, David Papineau deals with a particularly difficult problem evolution theorists have to face when they try to understand the display of some particular succession of complex behaviours by an animal species. This problem is: How could such a succession ever have come about if each of the behaviours by itself would do no good to the animals and if it is impossible to imagine that the whole succession of behaviours came into being simultaneously? Papineau tries to find an answer to this puzzle by appealing to the so-called “Baldwin effect”. The “Baldwin effect” was proposed more than one hundred years ago by the American psychologist James Mark Baldwin as a Darwinian mechanism that, under some conditions, might seem to corroborate the Lamarckian hypothesis that acquired characteristics could be inherited. How does the Baldwin effect work? The idea is the following. Imagine a population of animals well adapted to a particular environment. Suppose that, for some reason, the environment changes. Because of this change, some of the animals’ typical behavioural strategies cease to be adaptive. Suppose now that some members of the population are able to learn during their lifetime new behaviours that fit their new environment. These individuals will then have a much better chance to survive and reproduce than those that were not able to learn the new behavioural strategies. Moreover, if the offspring of these individuals is able to learn the new tricks from their
parents, then they will also have a much better chance to survive and reproduce than the offspring of those who have not learned the new tricks, and so on and so forth. Baldwin’s idea is then that, under such circumstances, the population will have the chance to undergo genetic mutations that will allow the animals to display the new behavioural strategies without learning. There are two problems involved with Baldwin’s hypothesis. The first is that it is not at all clear why the new successful behavioural strategies should become innate. If the population is able to learn them and to transmit this acquired knowledge to the next generation, what advantage could it gain from getting them genetically fixed? Losing flexibility is not supposed to be a good thing. The second problem: Even assuming that there is some advantage in getting the new behavioural strategies genetically fixed, why would the mutations allowing this genetic fixation to occur be more likely to happen in the individuals having learned the new strategies than in any others? Baldwin seems to have never provided a convincing answer to the first question. As to the second question, Baldwin’s answer is implicit in the above description of the effect bearing his name. According to him, the animals capable of learning would be more likely to undergo the right mutations than the others simply because the animals unable to learn would be driven to extinction before they had any chance to undergo any mutations. The learning of the new behaviours would thus create, according to an expression coined by Godfrey-Smith, a “breathing space” that would provide enough time for the right mutations to occur and disseminate across the population of learners. Such an answer, however, seems to rely on a view of natural selection as a process that works by killing off whole legions of maladapted organisms. 
However, the appropriate view of natural selection is as a process that affects the reproductive rates of populations. Although phenomena of mass extinction are indeed possible, they seem to be the exception rather than the rule. Be this as it may, Baldwin provides us with no intrinsic reason why we should expect that the acquisition of the new behaviours by learning would in any way contribute to the selection of the genes that would render them innate (besides, of course, by keeping the organisms alive and thus keeping all options open). In his essay, Papineau argues that there are indeed mechanisms – those of genetic assimilation and niche construction – in terms of which it is possible to find a convincing answer to this latter question. He claims further that there are cases of social learning in which these mechanisms of genetic assimilation and niche construction can be seen to operate. He then proceeds to analyse particular cases of social learning in some animal species and argues that these cases provide us also with an answer to the first question above: What advantage is there in genetically fixing a behavioural trait that can be learned? Thus, according to Papineau, the consideration of these cases allows us to
understand how “Baldwin effect” phenomena might account for at least some of the more mind-boggling evolutionary processes: those by means of which successions of innate complex adaptive behaviours can arise by natural selection.

The last of the three essays included in Part I of this volume is Brian Skyrms’s “Signals, evolution, and the explanatory power of transient information”. This is an essay in evolutionary game theory. It is a contribution to an account of how communication systems might evolve in populations of differential replicators. In his famous 1969 essay “Convention”, David Lewis was able to show how a simple communication system can be modelled as a game-theoretical equilibrium and how such an equilibrium can remain stable in a population if all of its members share a common and identical interest in communicating the right information and if common knowledge both of the structure of the game and of rationality is assumed. The original selection of the signalling equilibrium embodying the communication system was, in turn, accounted for in terms of saliency. Criticisms of Lewis’s model pointed out that, on the one hand, his assumptions of common knowledge of the structure of the game and of rationality were too strong to be empirically credible and that, on the other hand, some convincing story needed to be told about how any particular signalling system became salient in the first place. In his previous work, Skyrms showed that these criticisms can be met if the game-theoretical approach to signalling systems is conceived of in evolutionary terms rather than in terms of rational choice. Within the evolutionary framework, neither the strong assumptions of common knowledge of the structure of the game and of rationality nor salience are needed. An equilibrium may be simultaneously reached and selected among many other possible equilibria by the sheer dynamics of the process of differential reproduction.
One of Lewis’s assumptions remained undisputed though, namely, the assumption that all members of the relevant population share a common and identical interest in the occurrence of successful communication. But this assumption admits also being challenged as unrealistic. The Israeli evolutionary biologist Zahavi addressed this challenge. He concentrated his attention on the study of costly signals and pointed out that informative signalling is also bound to evolve under circumstances of unequal interests if we take the meaning of the signals to be the showing off that the sender is able to pay the cost of sending them. In his contribution to this volume, Skyrms goes one step further and challenges the idea that costliness is required for the emergence of meaningfulness under circumstances of unequal interests. He runs computer simulations of the evolutionary dynamics of different Stag Hunt and bargaining games to which costless pre-play signalling devoid of any pre-existent meaning was added. According to rational choice theory, such signals should never get any informative content at all and should thus remain completely ineffective.
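The kind of simulation just described can be sketched in a few lines. The Python below is a minimal illustration and not a reconstruction of Skyrms's own models: the Stag Hunt payoffs are assumed values, the update rule is a standard discrete-time replicator dynamic chosen for simplicity, and the names PAYOFF, STRATEGIES, payoff, replicator_step and simulate are hypothetical. Each strategy sends one of two meaningless, costless signals and then chooses an act depending on the signal it receives from its partner.

```python
from itertools import product
import random

STAG, HARE = 0, 1

# Assumed Stag Hunt payoffs (illustrative only): PAYOFF[(mine, other)].
PAYOFF = {
    (STAG, STAG): 4,
    (STAG, HARE): 0,
    (HARE, STAG): 3,
    (HARE, HARE): 3,
}

# A strategy is (signal_sent, act_if_hear_0, act_if_hear_1): 8 in total.
STRATEGIES = list(product((0, 1), (STAG, HARE), (STAG, HARE)))


def payoff(s, t):
    """Payoff to strategy s when paired with strategy t."""
    my_act = s[1 + t[0]]      # s reacts to the signal that t sends
    other_act = t[1 + s[0]]   # t reacts to the signal that s sends
    return PAYOFF[(my_act, other_act)]


def replicator_step(x):
    """One discrete-time replicator update of the frequency vector x."""
    fitness = [
        sum(payoff(s, t) * xt for t, xt in zip(STRATEGIES, x))
        for s in STRATEGIES
    ]
    mean_fitness = sum(f * xi for f, xi in zip(fitness, x))
    return [xi * f / mean_fitness for xi, f in zip(x, fitness)]


def simulate(steps=500, seed=0):
    """Run the dynamics from a random interior starting population."""
    rng = random.Random(seed)
    x = [rng.random() for _ in STRATEGIES]
    total = sum(x)
    x = [xi / total for xi in x]
    for _ in range(steps):
        x = replicator_step(x)
    return x


if __name__ == "__main__":
    for strategy, freq in zip(STRATEGIES, simulate()):
        print(strategy, round(freq, 4))
```

Running simulate with many different seeds and tallying where the population ends up gives a rough picture of the basins of attraction; the signals cost nothing and carry no built-in meaning, so any effect they have on the outcome arises from the dynamics alone.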
The results obtained in Skyrms’s simulations contradict the expectations brought about by rational choice theory. Equilibria that would otherwise emerge are destabilized by the introduction of costless signalling and surprising new equilibria are created. Moreover, the relative magnitude of the original basins of attraction is also considerably shifted. Unless some alternative explanation is presented that is able to account for these effects, the results Skyrms obtained in his simulations seem to vindicate the thesis that costless signalling may become informative under conditions of unequal interests. If an evolutionary understanding of the emergence of human languages is to be achieved, this is an extremely important result.

The second part of the volume begins with Peter Godfrey-Smith’s “Untangling the evolution of mental representation”. What is at stake in his essay is the ontogenetic onset of rationality. Godfrey-Smith begins tackling this issue by discussing the status of folk psychology and the nature of semantic properties. He tries to clarify this much debated problem by introducing an alternative understanding of the so-called “theory–theory” approach and by suggesting a new way of regarding the relation that obtains between folk psychology and our inner cognitive mechanisms. The debate on this topic traditionally revolves around two issues. First, the issue of knowing what is the right way to account for our folk-psychological practices of interpreting actions as intentional; second, the issue of knowing to what extent these practices accurately reflect the details of our inner cognitive mechanisms. Two views dominate this debate: the nativist view and the so-called “interpretation stance” view.
According to the former view, folk psychology reflects a competence for the understanding of our conspecifics as intentional creatures we are innately endowed with; moreover, this competence is supposed to tell us something substantive about the underlying mechanisms subserving intentional action. According to the latter view, folk-psychological practices of action-interpretation are just a behaviour-dependent way of rationalizing our actions and they tell us nothing substantive about the cognitive mechanisms in question. This view is held by, e.g., Daniel Dennett. The former view admits being divided in turn into two main sub-views: the theory–theory approach and the simulationist approach. The theory–theory approach, held by, e.g., Jerry Fodor, claims that folk psychology is a descriptive theory innately realized in a module of our minds which is basically true of the inner cognitive mechanisms subserving our actions; the simulationist approach, held by, e.g., Alvin Goldman, claims that the folk-psychological interpretive competences we display result from an innate simulation ability by means of the exercise of which we end up understanding the mental lives of others by assuming that they undergo the same mental processes we do when we place ourselves in the situations they find themselves in. Godfrey-Smith’s own alternative to the theory–theory approach consists in considering folk psychology to be a model, in the science-philosophical sense of the term, rather than a theory. As a model, folk psychology should
be understood as an abstract structure, definable in terms of a characteristic set of elements and interrelations between them. Thus, by the age they begin to reason in intentional terms, children would not be displaying the command of a sophisticated theory of rationality; they would rather be acquiring a competence to reason according to such a loosely defined structure. Seen as a model, folk psychology is also not supposed to determine its own interpretation. Godfrey-Smith therefore thinks that the folk-psychological model is in fact compatible with almost all interpretations of it which have been put forth in the philosophical literature.

The other most contentious issue in the debate regarding the status of folk psychology and the nature of semantic properties is the determination of the relation it has with the underlying cognitive mechanisms of the human mind. In this respect, Godfrey-Smith makes two distinct suggestions. The first is that if we assume, as all parties in the debate seem to do, that folk-psychological explanations have been around in our interpretive practices for a long time, then it will probably be the case that they have exerted some impact upon our cognitive mechanisms (and vice versa). The justification for this conclusion is simple: the cognitive mechanisms in question are meant to guide us in our social interactions; the environment in which these social interactions have been consistently taking place is an environment in which the expectations of others towards us and their explanations of our behaviour play a pre-eminent role; therefore, these cognitive mechanisms were exposed to natural selection in an environment shaped by folk-psychological practices. Thus, some sort of co-evolution of the two traits is to be expected, and it is highly unlikely that neither of them somehow reflects the other. The second suggestion is about the precise nature of this reflection and is bound to be highly controversial.
Contrary to standard theory-theorists, who claim that folk psychology results from an innate module of our mind that gets triggered in the course of the maturational process when children are around four years of age, Godfrey-Smith puts forth a sort of neo-Whorfian view according to which folk psychology exists primarily as a social and linguistic practice. However, as a consequence of the evolutionary interaction mentioned above, children rewire substantially the structure of their social thinking along folk-psychological lines by the age of four. It is such a rewiring that makes folk psychology true of them from then on. That is, the onset of rationality takes place at this stage as the consequence of a process of internalization. There is a sense in which Godfrey-Smith’s proposal might be seen as reminiscent of Dennett’s view of human consciousness. As a matter of fact, according to the latter, consciousness is the result of a massive reprogramming of the child’s brain. This reprogramming is, in turn, induced by the child’s submission to socially produced linguistic inputs. We might thus say that, according to Godfrey-Smith, the explanatory model and the inner reality end up matching each other, not because the explanatory model describes accurately a pre-existent reality it was meant to
cognize, but rather because the inner reality transforms itself in order to adapt to a social reality shaped in agreement with the explanatory model.
The scepticism towards the standard theory–theory approach to individual rationality, apparent in Godfrey-Smith’s essay, is further developed in Christopher Cherniak’s essay. Having previously produced an extensive body of work in computational neuroanatomy, Cherniak is concerned with the following question: What is the right level of structural complexity at which talk of optimization in a nervous system makes evolutionary sense? In his essay titled “Innateness and brain-wiring optimization: non-genomic nativism” he uses formal tools developed in the area of computer science called component placement optimization in order to conclude that such talk is best suited to the hardware domain of wiring layout rather than to the software domain of abstract cognitive structure. The optimization observed in the wiring layout of different organisms’ nervous systems cries out for an explanation. How might it have come about? At this stage, Cherniak presents us with a curious analogy between computationalist views in the philosophy of mind and a fundamental assumption of modern genetics. The idea that the mind is best viewed as an abstract software structure, typical of functionalism, correlates well with the idea that the genome is a program that codes for the construction of a whole organism. However, just like the former, the latter idea needs to face some hard questions. One of them is: How much information is it really possible to compress in a genetic code? This question becomes even more pointed if we restrict our attention to particular organs. For instance, how much specific information for brain building can actually be coded in a genetic code? Cherniak’s answer to this question is: probably not that much.
He estimates that “the amount of brain-specific DNA available might amount to as little information as is contained in a desk dictionary (about 50,000 entries) i.e., ≈100 Mb total”. Note that he is talking about the human brain here; arguably, the most complex physical structure known in the universe. How is this possible? Cherniak advances a bold thesis to answer this question. According to him, a significant part of an organism’s anatomical structure is accounted for by optimization processes that are generated directly from underlying physical processes with no genomic intermediation. He speaks of a “division of labour” existing between the genome and these more basic physical processes. Nativists typically insist that the mind is no tabula rasa; Cherniak takes the underlying intuition a few steps further. According to him, the information contained in a genome does not float in some sort of ethereal information space either; rather it is inscribed in a particular type of matter already containing significant structural information; otherwise, it would have no chance of being effective. The extension of Cherniak’s thesis to the rationality debate in cognitive science leads him to stress how profoundly hardware and software engineering differ. Moreover, this difference is, according to him, responsible for
the abyss that separates the performances of the two domains over the last fifty years. As he puts it, quite crudely: “If hardware had developed as has AI, we would still be using abacuses and sliderules – computers would merely be exotic laboratory confections.” He concludes his essay with a prediction that will not be particularly welcomed by the supporters of traditional cognitive science, namely, that what the future, dominated by hardware engineering, has to offer us is probably the production of intelligent behaviour from opaque dynamical processes in which no states of the mechanisms can be neatly identified as representations or as logical rules for processing them.
Harvey’s essay complements Cherniak’s well. As a matter of fact, in “Evolution and the origins of the rational”, Harvey contends that mainstream philosophy of mind and cognitive science got trapped in a conceptual dead end because of careless use of intentional language in empirical research. He quotes Wittgenstein’s famous metaphor of the fly trapped in the fly-bottle to describe the situation. And, in tune with Wittgenstein’s therapeutic recommendations, he equally contends that the way to get rid of the insoluble mind-philosophical mysteries that, according to him, plague these views is by eschewing mentalist language altogether in the course of applied research. His contribution to this volume is an essay in evolutionary robotics. In it he tries to show how we can examine our assumptions regarding our everyday and philosophical uses of the language of intentionality and rationality by creating artificial life forms through evolution and observing their behaviour and interaction with each other. On the one hand, we might say that Harvey views his own work, developed within the dynamical systems approach, as being carried out from the perspective Cherniak refers to as the perspective typical of hardware engineering as opposed to that typical of software engineering.
On the other hand, however, he explicitly describes it as being an approach to cognition that admits being seen as a sort of “philosophy of mind with a screwdriver”. Such an approach, he contends, is more challenging than its armchair counterpart because its assumptions are tested in the construction of real physical devices. These devices are also subject to processes of artificial selection that mimic the Darwinian mechanisms existent in nature. The upshot of such processes is the evolution of animated creatures that are capable of simple goal-directed behaviours such as avoiding obstacles or approaching and fleeing targets. And, following in the wake of Cherniak’s prediction, he explicitly contends that the inner architecture of the mechanisms subserving adaptive behaviour in evolved creatures is simply too opaque to be usefully described in our usual intentionality laden cognitive vocabulary. The production of these artificial animated creatures plays a double theoretical role then. On the one hand, it is used to challenge the reasonableness of certain theoretical assumptions and preconceptions of the mainstream view by showing that their physical world implementation is simply either
not feasible or just not appropriate. On the other hand, it can be used to prove that intelligent adaptive behaviour can be elicited from physically implemented cognitive architectures in which the intent to isolate some discrete states as being the representations, beliefs or desires of the evolved device is simply hopeless. Moreover, given that these architectures were evolved out of completely random genotypes of artificial DNA by the pressures of artificial selection alone, the implication is that real-world cognitive capacities evolved in the earth’s biosphere by a process of natural selection should be subserved by cognitive architectures of the same type. That is, this work can be used for mind philosophical purposes as an argument by analogy. Of course, such a program is limited, for the moment at least, to the analysis, understanding and reproduction of relatively simple adaptive behaviours, such as those one is bound to find in bacteria, insects and other “inferior” animals. But Harvey claims that this is the sensible approach to take: only beginning small and simple can one hope to achieve a proper understanding of the large and complex. The attempt to sidestep this stage of research by mainstream cognitive science and to try to model highly complicated patterns of human intelligent behaviour right from the start is, according to Harvey, another major factor in what he considers to be the quasi-paralysis afflicting research in traditional computationalist AI these days. The third and last part of the volume comprises three essays dealing with specific cognitive questions. These are space cognition, emotion recognition and the psychology of reasoning. Part III begins with Barbara Tversky’s essay “How to get around by mind and body: spatial thought, spatial action” – an essay in the psychology of space cognition. This essay deals with the systematic errors that have been found in spatial memory and judgement and sketches a way of accounting for them. 
According to Barbara Tversky, the space of navigation has been studied by two research communities, each of them approaching the subject from a rather different angle. She calls one of these communities the mind community and the other the body community. According to her, psychologists belonging to the mind community have been concerned mainly with gathering knowledge from the analysis of spatial judgements made explicitly by human subjects. Still according to her, psychologists belonging to the body community have been concerned mainly with gathering knowledge from the analysis of animal spatial behaviour. Startlingly, these two communities seem to have been arriving at contradictory conclusions. As a matter of fact, whereas the mind community produces study after study in which more and more systematic errors in the spatial judgements of agents are unveiled, the body community constantly emphasizes the extreme accuracy and fine-tuning of animal spatial behaviour. This is a striking contrast, crying out for analysis and explanation. Barbara Tversky’s essay is an attempt at providing us with one such explanation.
Barbara Tversky claims that people think about space in a non-geometrical and non-isotropic way. As a matter of fact, according to her, one of the main aspects of people’s spatial thinking is its hierarchical organization. This form of organization of space has a rationale: it helps to keep track of correlations in memory, it helps to retrieve them from there, and it also facilitates spatial inference. However, there is a trade-off here. Comprehensiveness and complete accuracy are sacrificed for manageability. A consequence of this trade-off is that once the structure of a particular organization of spatial thought is understood, it is possible to frame spatial questions in such a way that, in order to answer them rightly, the subject must violate the organizing hierarchy structuring his own spatial thinking. Not surprisingly, these questions are more often than not answered wrongly and subjects are led to fall into contradiction. Barbara Tversky claims that this is precisely the phenomenon the mind community has been unveiling. Hierarchical organization is just one of the several non-geometric and non-isotropic aspects of people’s spatial perception and thinking though. There are others, leading to different kinds of systematic error in people’s spatial judgements. What should be crucial here, however, is, according to Barbara Tversky, the realization that these errors are not random; rather, they stem from characteristic patterns.
The presence of spatial thinking in humans is not to be evolutionarily accounted for in terms of a need to answer correctly tricky questionnaires imagined by clever psychologists. Rather, it is there in order to allow us to get back home, to find the way to places where food is or to retrace escape routes from predators or enemies. As the body community has been consistently showing, animals, human or otherwise, are extremely good at doing this.
Barbara Tversky tells us that they tend to explain this performance in terms of both local sensory-motor couplings and an important reliance on local environmental cues that help correcting them. In view of this, she claims that spatial thought cannot be adequately understood independently of spatial action. She then goes on to assert that, once such a coupled understanding is achieved, the theoretician must realize that the accuracy that is sacrificed by the sort of structure that underlies the organization of general spatial thinking is promoted contextually by the interaction of the agent with the environment. However, the fine-tuning brought about by the interaction with the environment does not affect the general mechanisms that are mobilized in order to produce idle “armchair” spatial judgements. This, in turn, explains why systematic errors in these judgements persist despite the existence of selective pressures that promote accuracy of navigation. Sripada’s and Goldman’s joint essay titled “Simulation and the evolution of mindreading” deals with an already mentioned cognitive ability humans display: the ability to understand intentionally the behaviour of their conspecifics. They call this ability “mindreading”. More specifically, Sripada and Goldman are interested in two particular questions associated with
mindreading: How do people do it? and What might be the evolutionary background for the development of one such ability in us? As previously mentioned, there is an ongoing mind-philosophical debate between supporters of different approaches to “mindreading”. The proper way of answering the first question above is obviously the focus of the dispute. Sripada and Goldman’s essay is meant to be a contribution to this debate in that they provide an argument in favour of one of these approaches – the so-called “simulation theory”. However, they do not address here the topic that tends to be most hotly discussed in connection with this debate, namely, the topic of propositional attitude ascription. Neither is their argument a general one. Rather, it is restricted to the analysis of only one of different mindreading tasks, namely, face-based emotion recognition. This is the reason why their essay is included in Part III of this collection instead of being presented in association with Godfrey-Smith’s, Cherniak’s and Harvey’s essays in Part II. Face-based emotion recognition is the ability to ascribe the experiencing of particular emotions to other humans when one is confronted with their facial expressions. Sripada and Goldman’s simulationist claim is then the following. Face-based emotion recognition is best accounted for in terms of a simulation process by means of which the reproduction of an emotional state gets triggered in the mind of the human observer when he is confronted with the emotionally laden facial expression of another human. The observer then ascribes to the target of his observation the experiencing of the emotional state he actually enacted in his own mind. Sripada and Goldman’s claim is a purely empirical one. They therefore present experimental evidence in order to support it. The evidence in question is of two different kinds. 
First, the analysis of clinical stories of brain-damaged patients who became unable to feel a certain number of basic emotions (namely, fear, anger or disgust) as a consequence of their lesions. According to Sripada and Goldman, these clinical stories show more than impairment in experiencing the appropriate emotions under the appropriate circumstances. They also show that these patients became unable to detect these emotions in other people’s faces. Second, an fMRI study of the experiencing of a particular emotion (disgust) by normal subjects. According to Sripada and Goldman, the neuroimaging produced in this study shows that the same areas of the brain are activated both when the subjects are experiencing disgust and when they are observing facial expressions of other people undergoing the experience of disgust. To what extent can it be said that this evidence supports the authors’ claim? As Sober put it in the opening essay of this collection, testing an empirical hypothesis is testing it against its competitors in order to determine which of them has a greater likelihood value. The competitor hypothesis against which Sripada and Goldman are measuring their own hypothesis is the already mentioned “theory–theory” hypothesis. The question should then be rephrased thus: To what extent can it be said that the authors’ claim
confers a greater probability on the observations than the claim of the supporters of the “theory–theory”? Sripada and Goldman argue that the empirical findings they collected are precisely those that should be expected under the assumption that their hypothesis is true. That is, they claim that it is a consequence of their hypothesis that the same mechanisms should be mobilized in order both to undergo a mental state and to detect it in a conspecific. They thus argue that the probability of the combined impairment given their hypothesis is extremely high. Similar considerations apply to the appraisal of the neuroimaging evidence. On the other hand, Sripada and Goldman claim that assuming the truth of the rival hypothesis does not probabilify the evidence to any relevant degree, given the distinction drawn by the theory–theory supporters between the information-based nature of the cognitive procedures assumed to be at work in the process of mindreading and the non-information-based nature of the processes that are assumed to underlie the experiencing of basic emotions.
Besides resorting to clinical evidence, Sripada and Goldman also put forth an evolutionary argument in order both to back up their claim and to try to answer the second question above. This argument is twofold. On the one hand they claim that, contrary to theory-building, a simulation routine admits being understood as a fast and frugal heuristic, in the sense of the term coined by Gigerenzer and Todd. As such, it works by taking advantage of a stable property of the usual human environment (namely, the fact that other humans, endowed with similar cognitive apparatuses, are part of it), in order to deliver a reliable cognitive behaviour at low computational costs.
On the other hand, they claim that there is a plausible route for the evolution of a simulation routine in humans, namely, an exaptation for the purpose of mindreading of the processes underlying the well studied phenomenon of emotion contagion. Again, according to the authors, no such plausible evolutionary route can be foreseen for a theory-based approach to face-based emotion recognition. Finally, we get to the last essay in this collection. It is Tim van Gelder’s “Enhancing and augmenting human reasoning”. In this essay, van Gelder emphasizes not so much how the cognitive sciences help us reach an understanding of the human mind but rather how they may help us improve its capabilities. In particular, van Gelder is interested in the improvement of actual human reasoning. According to him, this is a desideratum that may best be achieved by a proper use of computer-supported argument mapping. Computer-supported argument mapping is a software package for producing and manipulating graphical presentations of reasoning structures in a computer screen. According to van Gelder, this tool helps improve people’s reasoning skills in two ways. First, it helps enhance whatever reasoning skills people may already have, namely, those skills they unconsciously display in their everyday arguments and inferences. Second, it allows people to augment their reasoning abilities by helping them to perform more accurately and more extensively in this domain.
Whether or not computer-supported argument mapping has the beneficial effects in human inferential performance van Gelder claims it to have, is an empirical question that can only be decided by amassing large bodies of experimental data. Van Gelder refers some supportive psychological studies performed at the University of Melbourne in which the impact of the introduction of this tool in the teaching of critical thinking was actually measured. However, as he himself acknowledges, a more extensive research is still needed. Given that his conviction is that the conclusions of such a future extensive research will only strengthen the partial results already observed at the University of Melbourne, van Gelder proceeds by trying to find an explanation for the beneficial effects the pedagogical use of this software package is supposed to have on actual reasoning skills. The explanation he comes up with is that these effects are a consequence of the more “embodied” character reasoning acquires when it is represented by means of computer-supported argument mapping techniques. This explanation is in turn to be understood against a theoretical background according to which our reasoning abilities were developed not ab ovo but by the requisitioning of the more basic sensory-motor abilities of our minds for this new job. The use of colours, lines, shapes, spatial distributions and other graphical devices in order to represent argument structure makes us feel more “at home” in reasoning tasks precisely because these are the aspects of the world our minds were primarily designed to attend to. Conversely, most people face difficulties when trying to reason properly in abstract terms because, in the absence of such an embodiment, traditional intellective procedures are felt as “foreign” by their minds. 
Van Gelder claims that radical changes in the equipment we use to help us reason may have not only the effect of advancing our capabilities but also the effect of transforming our minds. Thus he claims that, to this extent, these changes are bound to acquire a role in the evolution of human nature. Now, one such radical change has indeed happened in the past, namely, the introduction of writing. Van Gelder concludes his essay by suggesting that the regular use of computer-supported argument mapping will have as dramatic an effect on our future intellectual lives as the regular use of alphabetic writing had on the intellectual lives of our ancestors over three thousand years ago. I do not know how accurate this contention might be. But its boldness certainly closes this volume nicely.
Part I
Evolution
1
Intelligent design is untestable
What about natural selection?
Elliott Sober1
The argument from design is best understood as a likelihood inference. Its Achilles heel is our lack of knowledge concerning the aims and abilities that the putative designer would have; in consequence, it is impossible to determine whether the observations are more probable under the design hypothesis than they are under the hypothesis of chance. Hypotheses about the role played by natural selection in the history of life also can be evaluated within a likelihood framework, and here too there are auxiliary assumptions that need to be in place if the likelihoods of selection and chance are to be compared. I describe some problems that arise in connection with the project of obtaining independent evidence concerning those auxiliary assumptions.
1 What else could it be?
Defenders of the design argument sometimes ask “What else could it be?” when they observe a complex adaptive feature. The question is rhetorical; the point of asking it is to assert that intelligent design is the only mechanism that could possibly bring about the adaptations we observe. Contemporary evolutionists sometimes ask the same question, but with a different rhetorical point. Whereas intelligent design seems to some to be the only game in town, natural selection seems to others to be the only possible scientific explanation of adaptive complexity. I propose to argue that intelligent design theorists and evolutionists are both wrong when they argue in this way. Whenever a hypothesis confers a probability on the observations without deductively entailing them, evaluating how well supported the hypothesis is requires that one consider alternatives. Testing the hypothesis requires testing it against competitors. Developing this point leads to a recognition of the crucial mistake that undermines the design argument. The question then arises as to whether evolutionary hypotheses about the process of natural selection fall prey to the same error. Although I’ll begin by emphasizing the parallelism between intelligent design and natural selection, I emphatically do not think that they are on a par. The relevant point of difference is that intelligent design, as a claim about the adaptive features of organisms, is, at least as it has been developed
so far, an untestable hypothesis. Hypotheses describing the role of natural selection, on the other hand, can be tested. But how they are to be tested is an interesting question, as we shall see.
2 Likelihood and intelligent design
As mentioned, “What else could it be?” is a rhetorical question, whose point is to assert that some favored mechanism is the only one that could possibly produce what we observe. This line of reasoning has a familiar deductive pattern, namely modus tollens:

(MT)
If H were false, O could not be true.
O is true.
--------------------------------------
∴ H is true.

Despite the allure of this line of reasoning, many defenders of the design argument have recognized that it is misguided. One of my favorite versions of the argument is due to John Arbuthnot (1710), who was clear about this point. Arbuthnot tabulated birth records in London over 82 years and noticed that in each year, slightly more sons than daughters were born. Realizing that boys die in greater numbers than girls, he saw that the slight bias in the sex ratio at birth gradually subsides until there are equal numbers of males and females at the age of marriage. Arbuthnot took this to be evidence of intelligent design; God, in his benevolence, wanted each man to have a wife and each woman to have a husband. To draw this conclusion, Arbuthnot considered what he took to be the relevant competing hypothesis – that the sex ratio at birth is determined by a chance process. Arbuthnot had something very specific in mind when he spoke of chance; he meant that each birth has a probability of 1/2 of being a boy and a probability of 1/2 of being a girl.
Under the chance hypothesis, a preponderance of boys in a given year has the same probability as a preponderance of girls; there is, in addition, a third possibility that has a very small probability (e) – namely, that there should be exactly as many boys as girls in a given year:

Pr(more boys than girls are born in a given year | Chance)
= Pr(more girls than boys are born in a given year | Chance)
> Pr(equal numbers of boys and girls are born in a given year | Chance) = e

Thus, the probability that more boys than girls will be born in a given year, according to the Chance hypothesis, is a little less than 1/2. The Chance hypothesis therefore entails that the probability of there being more boys than girls in each of the 82 years is less than (1/2)^82 (Stigler 1986: 225–226). Arbuthnot did not use modus tollens to defend intelligent design; rather, he constructed a likelihood inference:

(L)
Pr(Data | Intelligent Design) is very high.
Pr(Data | Chance) < (1/2)^82.
--------------------------------------
The Data strongly favor Intelligent Design over Chance.

Arbuthnot used a principle that later came to be called “The Law of Likelihood” (Hacking 1965; Edwards 1972; Royall 1997): the data lend more support to the hypothesis that confers on them the greater probability. Here and in what follows, I use the terms “likelihood” and “likely” in the technical sense introduced by R.A. Fisher (1925). The likelihood of a hypothesis is not the probability it has in the light of the evidence; rather, it is the probability that the evidence has, given the hypothesis. Don’t confuse Pr(Data | H) with Pr(H | Data); the former is H’s likelihood, while the latter is H’s posterior probability. Understood in this way, Arbuthnot’s argument does not purport to show that the sex ratio data he assembled was probably due to intelligent design. To obtain that result, he’d need further assumptions concerning the prior probabilities of the two hypotheses.2 I omit these in my reconstruction of the design argument because I don’t see how they can be understood as objective quantities.
The likelihood version of the design argument is modest. As just noted, it declines to draw conclusions about the probabilities of hypotheses. But it is modest in a second respect – it does not claim to evaluate all possible hypotheses. Arbuthnot considered Design and Chance, but could not have addressed the question of how Darwinian theory might explain the sex ratio. This puzzled Darwin (1872) and was successfully analyzed by R.A. Fisher (1930) and then by W.D. Hamilton (1967). Thus, even if Arbuthnot is right that Design “beats” Chance, it remains open that some third hypothesis might trump Design. There is no way to survey all possible explanations; we can do no more than consider the hypotheses that are available.
The idea that there is a form of argument that sweeps all possible explanations from the field, save one, is an illusion.3 I conclude that the first premise in the modus tollens version of the design argument is false. It is false that Intelligent Design is the only process that could possibly produce the adaptations we observe. Long before Darwin, Chance was on the table as a possible candidate, and after 1859 the hypothesis of evolution by natural selection provided a third possibility. In saying this, I am not commenting on which of these three explanations is best. I am merely making a logical point. What we observe is possible according to all three hypotheses. We can’t use modus tollens in this instance. Rather, we need to employ a comparative principle; the Law of Likelihood seems eminently suited to that task.
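The arithmetic behind Arbuthnot's likelihood inference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the value used for Pr(Data | Intelligent Design) is a hypothetical stand-in for "very high", the prior odds are invented for the example, and the names `likelihood_design` and `posterior_odds` are not from the text.

```python
from fractions import Fraction

YEARS = 82  # consecutive years in Arbuthnot's London birth records

# Under Chance, Pr(more boys than girls in one year) is a little less than 1/2,
# so 1/2 is an upper bound on the per-year probability.
PER_YEAR_BOUND = Fraction(1, 2)

# Upper bound on Pr(Data | Chance), treating the years as independent.
likelihood_chance_bound = PER_YEAR_BOUND ** YEARS

# ASSUMPTION: the text says only that Pr(Data | Design) is "very high";
# 0.9 is a hypothetical stand-in, not a figure from Arbuthnot or the chapter.
likelihood_design = 0.9

# Lower bound on the likelihood ratio favoring Design over Chance.
ratio_lower_bound = likelihood_design / float(likelihood_chance_bound)


def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


if __name__ == "__main__":
    print(f"Pr(Data | Chance) < {float(likelihood_chance_bound):.3e}")
    print(f"Likelihood ratio (Design : Chance) > {ratio_lower_bound:.3e}")
    # Two hypothetical prior odds: the same likelihood ratio is compatible
    # with very different posterior odds.
    for prior in (1.0, 1e-30):
        odds = posterior_odds(prior, ratio_lower_bound)
        print(f"prior odds {prior:g} -> posterior odds > {odds:.3e}")
```

The bound on Pr(Data | Chance) comes out at roughly 2 × 10^-25, which is why the likelihood comparison is so lopsided. The last loop illustrates the modesty of the likelihood argument: the likelihood ratio says which hypothesis the data favor, but a claim about which hypothesis is probably true depends on prior odds that the likelihood inference does not supply.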
3 What’s wrong with the design argument?
To explain what is wrong with the design argument as an explanation of the complex adaptive features that we observe in organisms, it is useful to consider an application of this style of reasoning that works just fine. Here I have in mind William Paley’s (1802) famous example of the watch found on the heath. Construed as a likelihood inference, Paley’s argument aims to establish two claims – that the watch’s characteristics would be highly probable if the watch were built by an intelligent designer and that the characteristics would be very improbable if the watch were the product of chance. The latter claim I concede. But why are we so sure that the watch would probably have the features we observe if it were built by an intelligent designer? To clarify this question, let’s examine Table 1.1, which illustrates a set of possibilities concerning the abilities and desires that the putative designer of the watch might have had. The cell entries represent which hypothesis – intelligent design or chance – confers the higher probability on the watch’s being made of metal and glass.4 Which hypothesis wins this likelihood competition depends on which row and column is correct.5 The observation that the watch is made of metal and glass would be highly probable if the designer wanted to make a watch out of metal and glass and had the know-how to do so, but not otherwise. If we have no knowledge of what these goals and abilities would be, we will not be able to compare the likelihoods of the two hypotheses.
The question we are now considering did not stop Paley in his tracks, nor should it have done. It is not an unfathomable mystery what goals and abilities the putative designer would have if the designer is a human designer. When Paley imagined walking across the heath and finding a watch, he already knew that his fellow Englishmen are able to build artifacts out of metal and glass and are rather inclined to do so.
This is why he was entitled to assert that the probability of the observations, given the hypothesis of intelligent design, is reasonably high.

Table 1.1 Which hypothesis, Design or Chance, confers the greater probability on the observation that the watch is made of metal and glass? That depends on the abilities and desires that the putative designer would have if he existed

                                        Desires: what does the putative designer
                                        want the watch to be made of?
Abilities: what materials does the
putative designer know how to use?      Metal and glass     Not metal and glass
Metal and glass                         Design              Chance
Not metal and glass                     Chance              Chance
The situation with respect to the eye that vertebrates have is radically different. If an intelligent designer made this object, what is the probability that it would have the various features we observe? The probability would be extremely low if the designer in question were an eighteenth-century Englishman. But we all know that Paley had in mind a very different kind of designer. The problem is that this designer’s radical otherness put Paley in a corner from which he was unable to escape. He was in no position to say what this designer’s goals and abilities and raw materials would be, and so he was unable to assess the likelihood of the design hypothesis in this case.

The problem that Paley faced in his discussion of the eye is depicted in Table 1.2. If the putative designer were able to make the eye that vertebrates have (a “camera eye”) and wanted to do so, then Design would have a higher likelihood than Chance. But if the designer were unable to do this, or if he were able to do whatever he pleased but preferred giving vertebrates the compound eye now found in many insects, Chance would beat Design. Paley had no independent information about which row and which column is true (nor even about which are more probable and which are less).

Thus, Paley’s analogy between the watch and the eye is deeply misleading. In the case of the watch, we have independent knowledge of the characteristics the watch’s designer would have if the watch were, in fact, made by an intelligent designer. This is precisely what we lack in the case of the eye. It does no good simply to invent assumptions about raw materials and desires and abilities; what is needed is independent evidence about them. Paley emphasizes in Natural Theology that he intends the design argument to establish no more than the existence of an intelligent designer, and that it is a separate question what characteristics that designer actually has.
His argument runs into trouble because these two issues are not as separate as Paley would have liked.6

The criticism I have just described of the design argument does not require us to consider Darwinian theory as an alternative explanation. We do not need an alternative explanation of the adaptive contrivances of organisms to see that the intelligent design hypothesis – at least as it was developed by Arbuthnot and Paley, and as it is put forward by present-day intelligent design theorists – is untestable.7

Table 1.2 Which hypothesis, Design or Chance, confers the greater probability on the observation that vertebrates have a camera eye? That depends on the abilities and desires that the putative designer would have if he existed.

Columns: Desires (what kind of eye does the putative designer want vertebrates to have?)
Rows: Abilities (what kind of eye is the putative designer able to give to vertebrates?)

                        A camera eye    A compound eye
A camera eye            Design          Chance
Only a compound eye     Chance          Chance
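The structure of Tables 1.1 and 1.2 can be sketched in a few lines of Python. This is only an illustration under loudly stated assumptions: the probability values are invented, and only their ordering matters. The names `pr_obs_given_design` and `favored` are hypothetical helpers, not anything taken from Paley or from the likelihood literature.

```python
# Assumed, purely illustrative numbers: only their ordering matters.
PR_OBS_GIVEN_CHANCE = 1e-9         # tiny in every cell of the table
PR_OBS_IF_WILLING_AND_ABLE = 0.9   # high only when desire and ability line up


def pr_obs_given_design(wants: bool, able: bool) -> float:
    """Pr(observation | Design plus one choice of auxiliary assumptions)."""
    return PR_OBS_IF_WILLING_AND_ABLE if (wants and able) else 0.0


def favored(wants: bool, able: bool) -> str:
    """Name the hypothesis that confers the higher probability on the observation."""
    if pr_obs_given_design(wants, able) > PR_OBS_GIVEN_CHANCE:
        return "Design"
    return "Chance"


# Keys are (designer wants the observed outcome, designer is able to produce it).
table = {(w, a): favored(w, a) for w in (True, False) for a in (True, False)}

for (wants, able), winner in table.items():
    print(f"wants={wants!s:5} able={able!s:5} -> {winner}")
```

The sketch makes the dependence explicit: the verdict is a function of `wants` and `able`, so without independent evidence about those two inputs there is no single likelihood for the design hypothesis to report.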
4 The parallel challenge for selectionist explanations

Just as hypotheses that postulate an intelligent designer cannot be justified by saying that no other process could possibly give rise to the adaptive features we observe, the same is true of hypotheses that appeal to the process of natural selection. In this case as well, we need to compare the likelihood of the hypothesis of natural selection with the likelihood of one or more alternative explanations. One obvious alternative is the idea of chance, which in modern evolutionary theory takes the form of the hypothesis of random genetic drift. The drift hypothesis says (roughly) that the alternative traits present in a lineage have nearly identical fitnesses and that the frequencies of traits in the population change by random walk. Here we may repeat what Arbuthnot said about chance in connection with sex ratio – it is very improbable (though not impossible) that the vertebrate eye should have the features we observe, if it arose by random genetic drift.

We now need to consider what the probability of the eye’s features is, if the eye was produced by natural selection. That turns out to depend on further assumptions. Of course, these further assumptions do not concern the raw materials, goals, and abilities that a putative designer might have. To make it easier to explain what these further assumptions are, I’m going to change examples for a while – from the much beloved vertebrate eye to the fact that polar bears have fur that is, let us say, 10 centimeters long. I’ll return to the eye later on and describe how lessons drawn from thinking about bear fur apply to it.

First, I need to clarify the two hypotheses I want to compare. I will assume that evolution takes place in a finite population. This means that there is an element of drift in the evolutionary process, regardless of what else is going on. The question is whether selection also played a role.
So we have two hypotheses – pure drift (PD) and selection plus drift (SPD). Were the alternative traits identical in fitness or were there fitness differences (and hence natural selection) among them? I will understand the idea of drift in a way that is somewhat nonstandard. The usual formulation is in terms of random genetic drift; however, the problem I want to address concerns a phenotype – the evolution of fur that is 10 centimeters long. To decide how random genetic drift would influence the evolution of this phenotype, we’d have to know how genes influence phenotypes. How many loci influence this phenotype? Are the different loci additive in their effects on fur length? I am going to bypass these genetic details by using a purely phenotypic notion of drift: under the hypothesis of pure drift, a population’s probability of increasing its average fur length by a small amount is the same as its
probability of reducing fur length by that amount.8 I’ll similarly bypass the genetic details in formulating the hypothesis of selection-plus-drift; I’ll assume that the SPD hypothesis identifies some phenotype (O) as the optimal phenotype and says that an organism’s fitness decreases monotonically as it deviates from that optimal value. This means, for example, that if 12 centimeters is the optimal fur length, then 11 is fitter than 10, 13 is fitter than 14, etc.9 Given this singly-peaked fitness function whose optimum is O, the SPD hypothesis says that a population’s probability of moving a little closer to O exceeds its probability of moving a little farther away. The SPD hypothesis says that O is an attractor in the lineage’s evolution.10

For evolution to occur, either by pure drift or by selection plus drift, there must be variation. I’ll assume that mutation always provides a cloud of variation around the population’s average trait value; this assumption is amply justified by observations of many traits in many natural populations.

We now need to assess the likelihoods of the two hypotheses. Given that present day polar bears have fur that is 10 centimeters long, what is the probability of this observation under the two hypotheses? The answer depends on the fur length that the ancestors of present day polar bears possessed and also on the optimal fur length toward which natural selection, if it occurred, would be pushing the lineage. Some of the options are described in Table 1.3. The lineage leading to the present population might begin with fur that is 2 or 8 or 10 centimeters long. And the optimal fur length might be 2 or 8 or 10 or 12 or 18 centimeters.

Table 1.3 If polar bears now have fur that is 10 centimeters long, does the hypothesis of SPD or the hypothesis of PD render that outcome more probable? The answer depends on the lineage’s initial state and on the fur length that would be optimal if selection were in operation. Cells that have SPD or PD in them describe which hypothesis is more likely. The answers for cells with O or U in them will be described presently; in these cases, the population either overshoots or undershoots the optimum postulated by the SPD hypothesis.

                            Possible optimal fur lengths
Possible initial states     2      8      10     12     18
2                           PD     O      SPD    U      U
8                           PD     PD     SPD    U      U
10                          PD     PD     SPD    U      U

Suppose that the population’s present value of 10 centimeters also happens to be the optimal value; this is the situation represented by the third column of Table 1.3. In this case, the initial state of the lineage does not matter. Regardless of which row we consider, the hypothesis of selection-plus-drift has a higher likelihood than the pure drift hypothesis –
polar bears have a higher probability of exhibiting a trait value of 10 if selection is pushing them in that direction than they would have if fur length were the result of pure drift. In contrast, suppose that 2 is the optimal fur length. If the lineage starts evolving with a trait value of 8, then selection would work against its increasing to a value of 10. Reaching a value of 10 would then be less probable under the selection-plus-drift hypothesis than it would be if the trait were subject to pure drift. This is why pure drift beats selection-plus-drift in the first column.

The cells in Table 1.3 with O or U in them are harder to evaluate. Notice that these cells are of two types. If the population began with an initial state of 2 and the optimal value is 8, then the population has to overshoot this optimum if it is to exhibit a final state of 10. The second kind of case arises if the population begins with 2 and has 12 as its optimum; in this case, the population has to undershoot the optimum if it is to end up with a trait value of 10. These two harder cases, as well as the two easier cases already analyzed, exhaust the possibilities when selection is understood in terms of a monotonic fitness function.

We can analyze all four cases at once by further investigating the implications of the two hypotheses. The dynamics of SPD are illustrated in Figure 1.1, adapted from Lande (1976).

Figure 1.1 In the process of SPD, the population begins at t0 with a sharp value. As time passes, the mean of the distribution moves toward the optimum and the variance of the distribution increases. (Horizontal axis: average phenotype in the population, with the optimum marked; vertical axis: probability; curves are shown for t0, t1, t2, t3 and t∞, together with the w-bar curve.)

At the beginning of the process, at t0, the average phenotype in the
population has a sharp value. The state of the population at various later times is represented by different probability distributions. Notice that as the process unfolds, the mean value of the distribution moves in the direction of the optimum. The distribution also grows wider, reflecting the fact that the population’s average phenotype becomes more uncertain as more time elapses. The speed at which the population moves toward the distribution centered at the optimum depends on the trait’s heritability and on the strength of selection, which is represented in Figure 1.1 by the peakedness of the w-bar curve. The width of the different distributions depends on the effective population size and on the strength of selection; the larger the product of these two, the narrower the bell curve. In summary, SPD can be described as the shifting and squashing of a bell curve.11

In contrast, the process of PD involves just the squashing of a bell curve; evolution in this case leaves the mean value of the distribution unchanged, although uncertainty about the trait’s future state increases. In the limit of infinite time, the probability distribution is flat, indicating that all average phenotypes are equiprobable.12 I assume that the quantitative character cannot drop below 0 and that there is some upper bound on its value – e.g., that the lineage leading to present day polar bears cannot evolve fur that is more than, say, 100 centimeters.

Figure 1.2 In the process of PD, the population begins at t0 with a sharp value. As time passes, the mean of the distribution remains the same and the variance of the distribution increases. (Horizontal axis: average phenotype in the population, from 0 to 100; vertical axis: probability; curves are shown for t0, t1, t2, t3 and t∞.)
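The contrast between Figures 1.1 and 1.2 can be reproduced with a small simulation. What follows is a minimal sketch, assuming a discrete version of the purely phenotypic walk described above: at each step the population's average phenotype moves up or down by one unit, with equal probabilities under PD and with a bias toward the optimum under SPD. The step size, the bias of 0.6, the starting value of 50, the optimum of 70 and the function names are all assumptions introduced for illustration; only the bounds 0 and 100 come from the text.

```python
import random
from statistics import mean, pvariance


def simulate(initial, steps, optimum=None, bias=0.6, lo=0, hi=100, rng=None):
    """Return one lineage's average phenotype after `steps` unit moves.

    optimum=None -> pure drift: a step up and a step down are equally probable.
    otherwise    -> selection plus drift: a step toward `optimum` has
                    probability `bias` (assumed to exceed 0.5).
    """
    rng = rng or random.Random()
    x = initial
    for _ in range(steps):
        if optimum is None or x == optimum:
            p_up = 0.5
        elif x < optimum:
            p_up = bias
        else:
            p_up = 1 - bias
        x += 1 if rng.random() < p_up else -1
        x = min(hi, max(lo, x))  # the trait cannot leave the interval [lo, hi]
    return x


def final_states(n_lineages, seed=0, **kwargs):
    """Simulate many independent lineages with a reproducible random stream."""
    rng = random.Random(seed)
    return [simulate(rng=rng, **kwargs) for _ in range(n_lineages)]


pd = final_states(2000, initial=50, steps=400)
spd = final_states(2000, initial=50, steps=400, optimum=70)

print("PD : mean", mean(pd), "variance", pvariance(pd))
print("SPD: mean", mean(spd), "variance", pvariance(spd))
```

Under these assumptions the PD lineages stay centered near the starting value while spreading out, and the SPD lineages cluster around the optimum with a much smaller spread, which is the qualitative pattern the two figures depict.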
We now are in a position to analyze when SPD will be more likely than PD. Figure 1.3a depicts the relevant distributions when there has been finite time since the lineage started evolving from its initial state (I). Notice that the PD distribution stays centered at I, whereas the SPD curve has moved in the direction of the putative optimum. Notice further that the PD curve has become more flattened than the SPD curve has; selection impedes spreading out. Figure 1.3b depicts the two distributions when there has been infinite time. The SPD curve is centered at the optimum while the PD curve is entirely flat.

Whether finite or infinite time has elapsed, the likelihood analysis is the same: the SPD hypothesis is more likely than the PD hypothesis precisely when the population’s actual value is “close” to the optimum. Of course, what “close” means depends on how much time there has been between the lineage’s initial state and the present, on the intensity of selection (as measured by how peaked the w-bar function is in Figure 1.1), on the trait’s heritability, and on the effective population size. Time, intensity of selection, and heritability are relevant to predicting how much the mean of the SPD curve will be shifted in the direction of the optimum; effective population size is relevant to predicting how much variance there will be around that mean value. For example, if infinite time has elapsed (Figure 1.3b), the SPD curve will be more tightly centered on the optimum, the larger the population is. If 10 is the observed value of our polar bears, but 11 is the optimum, SPD will be more likely if the population is small, but the reverse will be true if the population is large.

Figure 1.3 Which hypothesis, SPD or PD, confers the higher probability on the observed present phenotype? Whether the time between the initial state (I) of the population and the present observation is (a) finite or (b) infinite, SPD has the higher likelihood precisely when the present trait value is “close” to the optimum (O). (Panels: (a) finite time, (b) infinite time. Horizontal axis in each panel: average phenotype in present population, from 0 to 100, with I and O marked; vertical axis: Pr(x|–); each panel shows an SPD curve and a PD curve.)
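The comparison in Figure 1.3a can be made concrete with a worked calculation. The sketch below is not the model of Lande (1976); it swaps in two standard Gaussian approximations as an explicit assumption: PD is treated as Brownian motion (mean fixed at I, variance growing linearly with time) and SPD as an Ornstein-Uhlenbeck process (mean relaxing toward O, variance leveling off). The bounds 0 and 100 are ignored, and the parameter values `sigma2`, `alpha` and `t` are invented for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def lik_pd(x, initial, t, sigma2=1.0):
    """Pr(x | PD): the mean stays at the initial state, the variance grows with t."""
    return normal_pdf(x, initial, sigma2 * t)


def lik_spd(x, initial, optimum, t, sigma2=1.0, alpha=0.1):
    """Pr(x | SPD): the mean shifts toward the optimum, the variance levels off."""
    decay = math.exp(-alpha * t)
    mean = optimum + (initial - optimum) * decay
    var = sigma2 / (2 * alpha) * (1 - decay ** 2)
    return normal_pdf(x, mean, var)


def more_likely(x, initial, optimum, t=20.0):
    """Return the hypothesis that confers the higher probability density on x."""
    return "SPD" if lik_spd(x, initial, optimum, t) > lik_pd(x, initial, t) else "PD"


observed = 10
for initial, optimum in [(2, 10), (8, 2), (2, 8), (2, 4), (2, 12), (2, 18)]:
    print(f"I={initial:2d} O={optimum:2d} -> {more_likely(observed, initial, optimum)}")
```

With these assumed parameters, an observed value equal to the optimum favors SPD and evolution away from the optimum favors PD, whereas the overshooting cases (O = 8 versus O = 4) and the undershooting cases (O = 12 versus O = 18) come out differently depending on how far the observed value lies from the optimum. Values far below or far above the optimum favor PD, so the verdict divides the trait axis into three ranges rather than two.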
In summary, if we want to test SPD against PD as possible explanations for why polar bears now have fur that is 10 centimeters long, we need to know what the optimal phenotype would be if the selection hypothesis were true. If the optimum (O) turns out to be 10, we’re done – SPD has the higher likelihood. However, if the optimum differs from 10, even a little, we need more information. If we can discover what the lineage’s initial state (I) was, and if this implies that the population evolved away from the optimum, we’re done – PD has the higher likelihood. But if our estimates of the values of I and O entail that there has been undershooting or overshooting, we need more information if we are to say which hypothesis is more likely. These four possibilities are summarized in Table 1.4. One surprise that emerges from this analysis is that undershooting is ambiguous – even if the population has evolved in the direction of the optimum, this, by itself, does not entail that SPD is more likely than PD.

Earlier in this essay I chided the intelligent design theorist for simply assuming that the observed traits of organisms must be what the putative intelligent designer intended. The same epistemological point applies to the evolutionist. It does no good simply to assume that the observed fur length must be optimal because natural selection must have been the cause of the trait’s evolution. This assertion is question-begging. What one needs is independent evidence about this and the other auxiliary assumptions that are needed for the two hypotheses to generate testable predictions. I argued before that the assumptions that the design hypothesis needs are not independently attested. Is the evolutionist in a better situation than the creationist in this respect?
Table 1.4 When a population evolves from its initial state I to its present state P, how will that trajectory be related to the putative optimal phenotype O specified by the hypothesis of SPD? There are four possibilities to consider. In two of them, the relationship of I, P, and O determines which hypothesis confers the greater probability on the observed present state P; in the other two – when the population overshoots or undershoots the putative optimum – more information is needed to say which hypothesis is more likely.

Case                                                      Trajectory       Which hypothesis is more likely?
(a) Present state coincides with the putative optimum     I ⇒ P = O        selection-plus-drift
(b) Population evolves away from the putative optimum     P ⇐ I     O      pure drift
(c) Population overshoots the putative optimum            I ⇒ O ⇒ P        ?
(d) Population undershoots the putative optimum           I ⇒ P     O      ?
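The four rows of Table 1.4 amount to a small decision procedure, sketched here in Python. The function name `classify` and the returned labels are hypothetical. The case in which the population shows no net change (P equals I but differs from O) is not one of the four rows, so the sketch reports it separately rather than guessing at a verdict.

```python
def classify(initial, present, optimum):
    """Map a trajectory (I, P, O) onto the rows of Table 1.4."""
    if present == optimum:
        return "selection-plus-drift"            # row (a)
    if present == initial:
        return "no net change (not covered by Table 1.4)"
    moved_toward = (present - initial) * (optimum - initial) > 0
    if not moved_toward:
        return "pure drift"                      # row (b): moved away from O
    if abs(present - initial) > abs(optimum - initial):
        return "? (overshoot)"                   # row (c): went past O
    return "? (undershoot)"                      # row (d): stopped short of O


present = 10
for initial, optimum in [(2, 10), (8, 2), (2, 8), (2, 12)]:
    print(f"I={initial}, P={present}, O={optimum}: {classify(initial, present, optimum)}")
```

The two question-mark outcomes are where the procedure stops: settling them requires the further quantities discussed above (elapsed time, intensity of selection, heritability and effective population size).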
5 Independent evidence about the population’s earlier trait value and about the shape of the fitness function

If present day polar bears all have fur that is 10 centimeters long, how are we to discover what the fitness consequences would be of having fur that is longer or shorter? The first and most obvious way to address this question is by doing an experiment. Let us dispatch a band of intrepid ecologists to the Arctic who will attach parkas to some polar bears, shave others, and leave others with their fur lengths unchanged. We then can monitor the survival and reproduction of these experimental subjects. This permits us to infer which fitness values attach to different fur lengths.

There is a second approach to the problem of identifying the fitness function, one that is less direct and more theoretical. Suppose there is an energetic cost associated with growing fur. We know that the heat loss an organism experiences depends on the ratio of its surface area and its body weight. We also know that there is seasonal variation in temperature. Although it is bad to be too cold in winter, it is also bad to be too warm in summer. We also know something about the abundance of food. These and other considerations might allow us to construct a model that describes what the optimal fur length is; this model would not assume that the bear’s actual trait value is optimal or close to optimal. This type of engineering analysis has been developed for other traits in other organisms;13 there is no reason in principle why it can’t be carried out for the case at hand.

Unfortunately, these two approaches face a problem. It is more obvious in connection with the experimental approach, but it attaches to both. The experiment, in the first instance, tells us about the fitness function that would be in place if there were variation in fur length among polar bears now. How is this relevant to our historical question concerning the processes that were at work as polar bears evolved?
The same question attaches to the engineering approach, in that it uses assumptions about the other traits that polar bears have. For example, we probably will need information about the range of temperatures that exist in the bear’s environment and about the bear’s body mass and surface area. If we use data from current bears and their current environment, we need to consider whether these values provide good estimates of the values that were in place ancestrally. This leads to our second problem – how independent evidence about the lineage’s ancestral fur length might be obtained. Of course, we can’t jump in a time machine and go back and observe the characteristics present in the ancestors of polar bears. Does this mean that the lineage’s initial state is beyond the reach of evidence? We know that polar bears and other bears share common ancestors and we know this independently of our question about why polar bears have fur that is 10 centimeters long. We can use other characteristics – for example, ones that have no adaptive significance for the organisms that have them – to infer the genealogical relationships that connect polar bears to other bears; this allows us to specify a phylogenetic
tree like the one depicted in Figure 1.4 in which polar bears and their relatives are tip species. We can then write down the fur lengths of polar bears and their near relatives on the tips of that tree. The character states we observe in these tip species provide evidence about the character states of the ancestors, represented by interior nodes. How might this inference from present to past be drawn?

Figure 1.4 Given the observed fur length of present day polar bears and their close relatives, what is the best estimate of the trait values of the ancestors A1, A2, . . ., A5? The most parsimonious hypothesis is that all of them had a trait value of 6. (Tree: the tip species are polar bears, with a fur length of 10, and five relatives, each with a fur length of 6; the interior nodes are the ancestors A1 to A5.)

Before addressing that question, I want to describe why Figure 1.4 shows that our question about SPD versus PD needs to be spelled out in more detail. It is obvious that present day polar bears have multiple ancestors, each with their own trait values. If these were all known, the problem of explaining why polar bears now have fur that is 10 centimeters long would decompose into a number of sub-problems – why the fur length present at A5 evolved to the length present at A4, why A4’s fur length evolved to the value found at A3, etc. SPD may be a better answer than PD for some of these transitions, but the reverse might be true for others. Furthermore, it is perfectly possible that SPD is better supported than PD as an answer to the question “Why do polar bears have fur that is 10 centimeters long, given that their ancestor Ai had a fur length of f1?” but that the reverse is true for the question “Why do polar bears have fur that is 10 centimeters long, given that their ancestor Aj had a fur length of f2?”

Now back to the problem of inferring the character states of ancestors. One standard method that biologists use is parsimony – we are to prefer the
assignment of states to ancestors that minimizes the total amount of evolution that must have occurred to produce the trait values we observe in tip species. This is why assigning the ancestors in Figure 1.4 a value of 6 is said to have greater credibility than assigning them a value of 10. But why should we use parsimony to draw this inference? Does the Law of Likelihood justify the Principle of Parsimony? If not, does the principle have some other justification? Or is it merely an unjustifiable prejudice that leads us to prefer hypotheses that are more parsimonious? These are large questions, which I won’t attempt to address in any detail here. However, a few points may be useful.

First, it turns out that if drift is the process at work in a phylogenetic tree, then the most parsimonious assignment of trait values to ancestors (where parsimony means minimizing the squared amount of change) is also the hypothesis of maximum likelihood (Maddison 1991). On the other hand, if there is a directional selection process at work, parsimony and likelihood can fail to coincide (Sober 2002c).

Figure 1.5 Two problems in which one has to estimate the character of an ancestor, based on the observed value of one or more descendants. In (a), the most parsimonious hypothesis is that A = 8; in (b), the most parsimonious hypothesis is that A = 10. (Panel (a): two descendants with trait values 10 and 6 and their most recent common ancestor, A = ?. Panel (b): a single descendant with trait value 10 and its ancestor, A = ?.)

This point can be grasped by considering the problem depicted in Figure 1.5a. Two descendants have trait values of 10 and 6; our task is to infer the character state of their most recent common ancestor A. Notice that if there is very strong directional selection for increased fur length toward an optimum of, say, 20 in both lineages, then the setting of the ancestor that maximizes the probability that the descendants will obtain values of 10 and 6 will be something less than 6. The problem can be simplified even further, by considering just the single descendant and single ancestor depicted in Figure 1.5b. If the descendant has a trait value of 10, the most parsimonious assignment of character state to the ancestor is 10. But if the lineage has been undergoing strong selection for increasing its trait value, then the most likely assignment will be something
less than 10. Imagine you have to swim across a river that has a very strong current. The way to maximize your probability of reaching a target on the other side is not to start directly across from it; rather, you should start a bit upstream. It follows that parsimony does not provide evidence about ancestral character states that is independent of the hypotheses of chance and selection that we wish to test.14

This problem concerning how ancestral fur length is to be inferred also is relevant to the question noted earlier about the fitness function – even if we can discover the fitness function that applies to polar bears now, why should we think that this function is the correct description of how selection would work in ancestral populations? Both problems have the same form – how are we to infer past from present without begging the question?15
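The river analogy can be checked numerically for the two problems in Figure 1.5. The sketch below makes strong simplifying assumptions that are not in the text: each descendant's trait value is normally distributed around the ancestral value plus a constant directional push (`trend` per unit time, standing in for strong selection far from its optimum), branch lengths are equal, and the best ancestral value is found by a coarse grid search. The parameter values and the function names are invented for illustration.

```python
def log_lik(ancestor, descendants, branch_time, trend=0.0, sigma2=1.0):
    """Log-likelihood of an ancestral value, up to a constant independent of it.

    Each descendant is assumed to be Normal(ancestor + trend * t, sigma2 * t).
    trend = 0 corresponds to pure drift.
    """
    var = sigma2 * branch_time
    expected = ancestor + trend * branch_time
    return sum(-(d - expected) ** 2 / (2 * var) for d in descendants)


def ml_ancestor(descendants, branch_time, trend=0.0):
    """Grid search over candidate ancestral values from -20.0 to 30.0."""
    candidates = [i / 10 for i in range(-200, 301)]
    return max(candidates, key=lambda a: log_lik(a, descendants, branch_time, trend))


# Figure 1.5a: descendants with values 10 and 6.
print("drift, two descendants:", ml_ancestor([10, 6], branch_time=10))
print("trend, two descendants:", ml_ancestor([10, 6], branch_time=10, trend=0.5))

# Figure 1.5b: a single descendant with value 10.
print("drift, one descendant: ", ml_ancestor([10], branch_time=10))
print("trend, one descendant: ", ml_ancestor([10], branch_time=10, trend=0.5))
```

Under pure drift the best-supported ancestral value coincides with the parsimonious one. Once a sufficiently strong upward push is assumed, the best-supported value falls below the smallest descendant value, just as the swimmer should start upstream of the target.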
6 A digression on dichotomous characters

Perhaps the epistemological difficulties just described would disappear if we redefined the problem. Instead of asking why polar bears now have fur that is 10 centimeters long, why not ask why they have long fur rather than short? Isn’t it clear that polar bears are better off with long fur than they would be with short if those are the two choices? If so, long is the optimal fur length in this dichotomous character. Doesn’t this allow us to conclude without further ado that SPD has higher likelihood than PD? Apparently, you don’t need to know the ancestral fur length or other biological information to make this argument.

It is interesting how often informal reasoning about natural selection focuses on dichotomous qualitative characters. Sociobiologists usually ask why human beings “avoid incest,” not why they avoid it to the degree they do. The adaptive hypothesis is that selection favors outbreeding over inbreeding. This hypothesis renders “incest avoidance” more probable than does the hypothesis that says that a pure drift process occurred. Of course, the problem gets harder if we estimate how much inbreeding there is in human populations and then ask whether that quantitative feature is more probable under the SPD hypothesis or the PD hypothesis. However, why can an adaptationist not admit that this quantitative problem is harder and still insist on the correctness of the simple likelihood argument advanced to solve the qualitative problem?

There is a fly in the ointment. What does it mean to say that fur is “long” rather than “short?” Notice that there are three ranges of fur length in Figures 1.3a and 1.3b, not two. PD beats SPD in the first and third, while SPD beats PD in the second. No matter where the cut-off is drawn to separate “short” fur from “long,” it will do violence to this trichotomy.
The fundamental result is that SPD is more likely than PD when the observed fur length is “close” to the optimum; fur lengths that are not close – that are too long or too short – render PD more likely than SPD. Long fur is not,
contrary to appearances, unambiguous evidence favoring the hypothesis of natural selection. The problem with imposing dichotomous descriptors (“long” versus “short”) on a quantitative character in the case of bear fur length seems to arise from the fact that fur length has an intermediate optimum. But why should this problem arise in connection with a feature like incest avoidance, where it’s true (let’s assume) that the less of it the better? In this case, there will be two regions of parameter space, not three. If human beings have a rate of incest that is “close” to zero, then SPD is likelier than PD; otherwise, the reverse is true. The problem is to say how close is close enough. How much incest is consistent with saying that human beings “avoid incest?” Of course, if human beings have a zero rate of incest, we’re done – SPD is more likely than PD. But if the rate is nonzero, it is unclear how to classify the observation, and so it is unclear whether SPD is more likely. We need further biological information to answer this question. Moving to a dichotomous description of the problem doesn’t change that fact.
7 From polar bear fur to the vertebrate eye

It may strike the reader that the example I have been considering – fur length in polar bears – is rather simple and therefore differs in important respects from the problem of testing adaptive hypotheses about a complex structure like the vertebrate eye. In fact, I’m not so sure that bear fur length really is so simple. But even if it is, I want to explain how the problems just adumbrated apply to the vertebrate eye.

There are nine or ten basic eye designs found in animals, with many variations on those themes. Let me describe this variation very crudely as follows: vertebrates, squid, and spiders have the camera eye, most insects have compound eyes (but so do many shallow water crustacea), the Nautilus has a pinhole eye, the clam Pecten and the crustacean Gigantocypris have mirror eyes, and flatworms, limpets, and bivalve molluscs have cup eyes. When these features are placed at the tips of an independently inferred phylogenetic tree, we find that these and other basic designs evolved somewhere between 40 and 65 times in different lineages (Salvini-Plawen and Mayr 1977; Nilsson 1989). How are we to decide whether these features favor the hypothesis of selection over the hypothesis of chance?

It may seem absurd in the extreme to suppose that drift could produce such complex adaptations. I agree. However, what needs to be determined is whether this gut feeling is borne out by evidence. As we saw in connection with fur length in polar bears, whether selection is more likely than drift depends on the shape of the fitness function and on the character states of ancestors. We can’t simply assume that the trait value a tip species exhibits must be the one that is optimal for it to have; that would be question-begging. Rather, we need independent evidence concerning which structure is best in which lineage. In the case of fur length in polar bears, we considered a simple experiment
that could provide information about the fitness function. Is there a similar experiment for the case of eye design? Present technology makes this unfeasible. Although it is easy enough to remove or diminish the efficiency of whatever light-sensitive apparatus an organism possesses, it is harder to augment those devices or to substitute one complex structure for another.

On the other hand, there is considerable information available concerning the optical properties of different eye designs, though a great deal remains to be learned. Nilsson (1989: 302) agrees with Land’s (1984) contention that “if the Nautilus had a camera-type eye of the same size, it would be 400 times more sensitive and have 100 times better resolution than its current pinhole eye.” He has similar praise for vertebrate eyes as compared to compound eyes: “if the human eye was scaled down 20 times to the size of a locust eye, image resolution would still be an order of magnitude better than that of the locust eye. Diffraction thus makes the compound eye with its many small lenses inherently inferior to a single-lens eye” (Nilsson 1989: 306).

If the camera eye is fitter than the compound eye, why isn’t the camera eye more widely distributed? Spiders and squid are as lucky as we are, but bees, the Nautilus, and clams are not. Why not? Nilsson’s suggestion is that compound eyes are trapped on a local adaptive peak. He agrees with Salvini-Plawen and Mayr (1977) that:

    . . . at an early stage of evolution, the simple eye would be just a single pigment cup with many receptors inside . . ., whereas the compound eye would start as multiple pigment cups with only a few receptors in each. . . . At this low degree of sophistication, neither of the two designs stands out as better than the other. It is only later, when optimized optics have been added, that the differences will become significant. But then there is no return, and the differences remain conserved.
    (Nilsson 1989: 306)

Nilsson is arguing that selection involves a nonmonotonic fitness function and that the two lineages began evolving in the zones of attraction of different adaptive peaks.

All of the questions discussed earlier about polar bear fur apply to this problem. We observe that vertebrates have the camera eye. To test SPD against PD as possible explanations of this trait, we’d need to know which eye design is optimal for vertebrates. Even if an engineering analysis assures us that the camera eye is fitter than the compound eye, we still need to ask whether the camera eye is better than all the alternatives that were available ancestrally. We can’t assume that the only variants are ones that are present now; some variants may have been lost in the evolutionary process. If we are able to show that the camera eye is the optimal eye for vertebrates, we’re done – we know that SPD is the more likely hypothesis. But if it is not, we next need to discover the eye design that was present ancestrally. We noted earlier the problems that attach to using parsimony to make this inference.
If these could be surmounted, we’d next need to ask whether the lineage evolved toward or away from the putative optimum. If away, we’re done – PD is likelier. However, if the lineage overshot or undershot, we need more biological details. In working through this protocol, it is important to realize that “the eye” is not a simple trait, but a complex assemblage of traits. We need to investigate each of them, recognizing that the best explanation for one trait may not be the same as the best explanation of another. Furthermore, we need information about which trait can evolve from which others. In the bear fur problem, we were dealing with a quantitative characteristic, so it seemed entirely natural to assume that a population can evolve from 3.15 centimeters to 3.14 and to 3.16, and that to evolve from 3 to 5 the population would pass through 4. However, it is much less obvious how different eye designs are related to each other – if a lineage is going to evolve from a cup eye to a camera eye, what are the intervening stages through which it must pass? Notice that the questions that arise in the likelihood analysis I have described are very different from the ones that creationists often think are telling. First and most obviously, creationists think that the competition is between natural selection and intelligent design. However, the questions I have described are articulated within evolutionary theory. Drift is just as much an evolutionary process as selection. I have left intelligent design out of the picture because it makes no testable predictions. A second difference is that I am not asking whether it was possible for natural selection to produce sophisticated equipment like the camera eye. I assume that the probability of this trait’s evolving is nonzero according to both the SPD and the PD hypotheses. The point is to determine which hypothesis has the higher likelihood. 
A third difference is that I am not focusing exclusively on “complex” traits; simple eyes (and the absence of eyes) need to be explained just as much as eyes that are more complex. If complex eyes are so wonderful, why aren’t they universal? Of course, it is possible that the camera eye is not universally optimal; perhaps it is the best design for some ways of making a living, but not for others. What needs to be explained is the full distribution of eye designs, not just the ones that happen to strike us as especially sophisticated. It is an interesting consequence of this analysis that “complex adaptations” are not automatically better explained by SPD than they are by PD. A complex adaptation that is not optimal may be sufficiently far from the optimum that the PD hypothesis has higher likelihood. Evolutionists who find it obvious that complex adaptations must have been produced by natural selection will not like this argument, but as far as I can see it is correct. As soon as a trait – even a “complex” trait – departs from the optimum, even a little, further biological information is needed if we wish to discriminate between the SPD and PD hypotheses.
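As a rough illustration, the testing protocol described in this section can be written down as a small decision procedure for a quantitative trait such as fur length. The sketch below is one reading of that protocol, not a procedure given in the text: the function name `compare_spd_pd`, the `Verdict` labels, and the reduction of "toward" and "away" to the sign of a one-dimensional change are all assumptions made here. It also takes the optimum and the ancestral state as known inputs, which is exactly what the section says is hard to establish independently.

```python
from enum import Enum


class Verdict(Enum):
    # Labels are hypothetical; they only name the three outcomes of the protocol.
    SPD_LIKELIER = "selection plus drift (SPD) is likelier"
    PD_LIKELIER = "pure drift (PD) is likelier"
    NEED_MORE_DETAILS = "more biological details are needed"


def compare_spd_pd(ancestor, present, optimum):
    """Apply the three-step protocol to a one-dimensional trait value.

    ancestor: inferred trait value of the lineage's ancestor
    present:  trait value observed in the present-day species
    optimum:  independently estimated optimal trait value
    """
    # Step 1: a trait that sits exactly at the optimum favors SPD.
    if present == optimum:
        return Verdict.SPD_LIKELIER
    # Step 2: did the lineage move away from the optimum?
    # A negative product means the change and the direction of the
    # optimum have opposite signs; starting at the optimum and then
    # leaving it also counts as moving away.
    toward = (optimum - ancestor) * (present - ancestor)
    if ancestor == optimum or toward < 0:
        return Verdict.PD_LIKELIER
    # Step 3: undershooting, overshooting, or no change at all leaves
    # the comparison open until more parameters are known.
    return Verdict.NEED_MORE_DETAILS


if __name__ == "__main__":
    for triple in [(6, 10, 10), (6, 4, 10), (6, 8, 10), (6, 12, 10)]:
        print(triple, compare_spd_pd(*triple).value)
```

For a complex trait such as the eye, the same questions would have to be asked of each component trait separately, and the ordering of character states that this arithmetic presupposes would itself need independent support.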
8 Two ways out of the impasse

The analysis I gave of the polar bear fur problem suggests that we need to know a lot to test the SPD against the PD hypothesis. If this needed information is inaccessible, the evolutionary problem seems to fall prey to the same difficulty that I claim undermines the design argument. I believe there are two ways out of this impasse. The first involves doing a sensitivity analysis. For example, even if we can’t infer the character state of ancestors in a way that is independent of the hypotheses we wish to test, we still may be able to consider value ranges for the ancestral trait value and see how these affect the likelihood analysis. If the drift hypothesis tells us that the best estimate of the ancestral character state is 6, while the selection hypothesis says that it is, say, 3, then we can try to compare the likelihoods of SPD and PD by assuming that the ancestor had fur that was between 3 and 6 centimeters long (or that the range of possibilities was even wider). Uncertainty about the heritability of fur length, the length of time the lineage has been evolving, etc. can be addressed in the same way. The task is then to determine in which regions of parameter space SPD has higher likelihood than PD and in which regions the reverse is true. It may turn out that the higher likelihood of SPD is robust over considerable variation in these auxiliary assumptions.

A second way to address the problem is widely used in biology; its rationale is discussed more fully in Sober and Orzack (2003). Suppose we know that the optimal fur length for bears living in colder climates is greater than the optimal fur length for bears living in warmer climates, even though we are unable to say what the optimal point value is for any organism in any environment.
If we then observe that bears in colder climates tend to have longer fur than bears in warmer climates, this correlation counts as evidence in favor of the SPD hypothesis.16 What is confirmed here is not the hypothesis that selection has given organisms optimal trait values, nor even that selection has provided them with fur lengths that are close to optimal; rather the favored hypothesis says that selection has caused trait values to evolve in the direction of their (unknown) optima. The observed trend in trait values is more probable on the hypothesis of natural selection than it is on the hypothesis of chance. This is because the latter hypothesis predicts that fur length and ambient temperature should be independent. Notice that this solution to the problem involves changing the kind of data we seek to explain. We started out thinking that the task is to explain why polar bears have fur that is 10 centimeters long. We now have shifted to the problem of explaining why bears in cold climates tend to have longer fur than bears in warmer climates. This permits us to finesse the problem of estimating the character state of ancestors, the optimal trait value, and other biological parameters.17
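The sensitivity analysis described in this section can be sketched numerically. The following is a minimal illustration under stated assumptions, not the model used in the chapter: PD is represented here by Brownian motion and SPD by an Ornstein–Uhlenbeck process (the "rubber band" model mentioned in the notes), and the values chosen for the selection strength `alpha`, the drift variance `sigma2`, the elapsed time `t`, and the candidate optima are hypothetical. The function `sensitivity_scan` reports, for each assumed ancestral value and optimum, the log-likelihood of the observed trait value under SPD minus that under PD; positive entries mark regions of parameter space where SPD is likelier.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def loglik_pd(observed, ancestor, sigma2, t):
    # Stand-in for pure drift: Brownian motion keeps the expected value
    # at the ancestral state while the variance grows linearly with time.
    return normal_logpdf(observed, ancestor, sigma2 * t)


def loglik_spd(observed, ancestor, optimum, alpha, sigma2, t):
    # Stand-in for selection plus drift: an Ornstein-Uhlenbeck process
    # pulls the expected value from the ancestral state toward the optimum.
    decay = math.exp(-alpha * t)
    mean = optimum + (ancestor - optimum) * decay
    var = sigma2 * (1 - decay ** 2) / (2 * alpha)
    return normal_logpdf(observed, mean, var)


def sensitivity_scan(observed, ancestors, optima, alpha, sigma2, t):
    """Map each (ancestor, optimum) pair to log L(SPD) - log L(PD)."""
    return {
        (a, o): loglik_spd(observed, a, o, alpha, sigma2, t)
        - loglik_pd(observed, a, sigma2, t)
        for a in ancestors
        for o in optima
    }


if __name__ == "__main__":
    # All numbers below are invented for illustration.
    scan = sensitivity_scan(
        observed=10.0,
        ancestors=[3.0, 4.0, 5.0, 6.0],
        optima=[8.0, 10.0, 12.0],
        alpha=0.5,
        sigma2=1.0,
        t=10.0,
    )
    for (a, o), diff in sorted(scan.items()):
        favored = "SPD" if diff > 0 else "PD"
        print(f"ancestor={a}, optimum={o}: {diff:+.3f} ({favored})")
```

With these invented numbers, SPD comes out likelier for every ancestral value from 3 to 6 centimeters when the assumed optimum equals the observed 10 centimeters, while PD is slightly likelier when the ancestor is put at 6 and the optimum at 8. The same scan could be widened to cover uncertainty about `alpha`, `sigma2`, and `t`.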
9 Conclusion

Modus tollens is a bad model for testing the design hypothesis. This is something that Arbuthnot and Paley and other defenders of the design hypothesis realized, even if their modern-day epigones do not. A better model is likelihood – hypotheses confer different probabilities on the observations, and weight of evidence can be assessed by comparing the degree to which different hypotheses probabilify the observations. This is where a Duhemian point becomes relevant.18 The hypotheses we wish to test do not, by themselves, confer probabilities on the observations; they do so only when auxiliary assumptions are supplied. However, we can’t merely invent auxiliary assumptions; rather, we need to find auxiliary assumptions that are independently attested. At this point the design argument runs into a wall. Paley reasoned well when he considered the watch found on the heath, but when he argued for intelligent design as an explanation of organic adaptation, he helped himself to assumptions that were not supported by evidence. The evolutionary hypothesis of natural selection encounters the same logical challenge. A characteristic observed in a present day species might be explained by the hypothesis of natural selection, or by the hypothesis of drift, or by many other hypotheses that evolutionary theory allows us to construct. To decide whether selection makes the observations more probable than the chance hypothesis does, we need further information about how selection would proceed if it were the process at work. To the degree that evolutionary theory can provide good estimates of relevant biological quantities, it avoids the fatal flaw that attaches to creationism. But if it cannot, evolutionary questions may become more tractable by shifting to a comparative framework. Instead of seeking to explain why a single species or group has some single trait value, we might set ourselves the task of explaining a pattern of variation.
Getting hypotheses to make contact with comparative data requires fewer assumptions.
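The comparative strategy can also be given a toy likelihood form. In the sketch below, which rests on assumptions that are not in the text, the data are `n` independent pairs of related populations, one living in a colder and one in a warmer climate, and `k` is the number of pairs in which the cold-climate population has the longer fur. The chance hypothesis is taken to assign probability 0.5 to that outcome in each pair, while the selection hypothesis is taken to assign some hypothetical probability `q` greater than 0.5. Treating the pairs as independent is itself a substantive assumption, since genealogically related species may fail to have independent trait values.

```python
def likelihood_ratio(k, n, q):
    """Likelihood of the selection hypothesis divided by that of chance.

    k: number of pairs in which the cold-climate population has longer fur
    n: total number of pairs compared
    q: hypothetical per-pair probability of that outcome under selection
    The binomial coefficient is the same under both hypotheses and cancels.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    if not 0.5 < q < 1:
        raise ValueError("q must lie strictly between 0.5 and 1")
    selection = q ** k * (1 - q) ** (n - k)
    chance = 0.5 ** n
    return selection / chance


if __name__ == "__main__":
    # Invented data: 8 of 10 pairs show the predicted direction.
    print(likelihood_ratio(8, 10, 0.8))
    # Invented data: only 5 of 10 pairs show the predicted direction.
    print(likelihood_ratio(5, 10, 0.8))
```

A ratio above 1 means the observed trend is more probable under the selection hypothesis than under chance; a ratio below 1 means the reverse. Nothing in the calculation requires knowing the optimal fur length for any population, only the direction in which the optima differ.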
Notes

1 My thanks to James Crow, Carter Denniston, John Gillespie, Peter Godfrey-Smith, Alvin Goldman, Russell Lande, Richard Lewontin, Steve Orzack, Dmitri Petrov, Larry Shapiro, and Stephen Stich for useful discussion. I am also grateful to the National Science Foundation (Grant SES-9906997) for financial support.
2 It follows from Bayes’ theorem that Pr(Design | Data) > Pr(Chance | Data) if and only if Pr(Data | Design)Pr(Design) > Pr(Data | Chance)Pr(Chance).
3 It is the goal of Dembski’s (1998) reconstruction of the design argument to show that all alternatives to design can be rejected, and the hypothesis of design left standing, without the design hypothesis’ having to make any predictions at all. For criticisms, see Fitelson et al. (1999).
4 Being made of metal and glass is just an example of the characteristics that the watch possesses. The same points would apply if we considered, instead, the watch’s ability to measure equal intervals of time, or the fact that it would probably not be able to do this if its internal assembly were changed at random.
5 I omit mention in this table of the raw materials the putative designer would have available; this constitutes a third dimension. Unfortunately, the piece of paper before you is flat.
6 For a more detailed treatment of the likelihood approach to the design argument, see Sober (2003).
7 I do not claim that this must be a permanent feature of the hypothesis that organisms have their adaptive contrivances because of intelligent design. Perhaps the epistemic situation will change.
8 Except, of course, when the population has its minimum or maximum value; there is no way to have fur that is less than 0 centimeters long.
9 It is not inevitable that a fitness function should be singly peaked. In addition, I’ll help myself to the simplifying assumption that fitnesses are frequency independent – e.g., whether it is better for a bear to have fur that is 9 centimeters long or 8 does not depend on how common or rare these traits are in the population.
10 We might add to this the idea that the intensity of selection is greater, the greater the population’s distance from the optimum; this idea is often modeled as an Ornstein–Uhlenbeck (“rubber band”) process.
11 I have conceptualized the SPD hypothesis as specifying an optimum that remains unchanged during the lineage’s evolution. If selection were understood in terms of an optimum that itself evolves, the problem would be more complicated.
12 The case of infinite time makes it easy to see why an explicitly genetic model can generate predictions that radically differ from the purely phenotypic model considered here. Under the process of random genetic drift, each locus is homozygotic at equilibrium. In a one-locus two-allele model in which the population begins with each allele at 50 percent, there is a 0.5 probability that the population will be AA and a 0.5 probability that it will be aa. In a two-locus model, again with each allele at equal frequency at the start, each of the four configurations has a 0.25 probability – AABB, AAbb, aaBB, and aabb. Imagine that genotype determines phenotype (or that each genotype has associated with it a different average phenotypic value) and it becomes obvious that a genetic model can predict nonuniform phenotypic distributions at equilibrium. The model of SPD is the same in this regard; there are genetic models that will alter the picture of how the average phenotype will evolve. See Turelli (1988) for further discussion.
13 See, for example, Parker (1978) on dung fly copulation time and Hamilton (1967) on sex ratio. For more general discussion, see Alexander (1996).
14 This is the central problem with the protocol for testing adaptive hypotheses proposed by Ridley (1983); see Sober (2002c) for further discussion.
15 The discovery of fossils is not a solution to this problem. Even if fur length could be inferred from a fossil find, it is important to remember that we can’t assume that the fossils are ancestors of present day polar bears. They may simply be relatives. If so, the question remains the same – how is one to use these data to infer the character states of the most recent common ancestor that present day polar bears and this fossil share? The fact that the fossil is closer to the most recent common ancestor than is an organism that is alive today means that the fossil will provide stronger evidence. But the question of how unobserved cause is to be inferred from observed effect still must be faced.
16 A proper test would have to control for the fact that the species in question are genealogically related, and so their trait values may fail to be independent; see Orzack and Sober (2001) for discussion.
17 If evolutionists get to “change the subject” (by seeking to explain cross-species correlations rather than the single trait value found in a single group), why can’t intelligent design theorists do the same thing? They can, but the change does not get them out of the hot water described earlier. What is the probability that an intelligent designer would give bears in colder climates longer fur than bears in warmer climates? That still depends on the goals and abilities of the putative designer.
18 I say that the point is Duhemian, rather than Duhem’s, because Duhem (1914) was thinking about deducing observational predictions, not probabilifying them; Quine (1953) uses a deductivist formulation as well. Still, the same logical point applies, though it does not have the holistic epistemological consequences that Duhem and Quine claimed (Sober 2000, 2004).
Bibliography

Alexander, R.M. (1996) Optima for Animals, Princeton, NJ: Princeton University Press.
Arbuthnot, J. (1710) “An Argument for Divine Providence, Taken from the Constant Regularity Observ’d in the Births of Both Sexes,” Philosophical Transactions of the Royal Society of London, 27: 186–190.
Darwin, C. (1872) The Descent of Man and Selection in Relation to Sex, London: Murray.
Dembski, W. (1998) The Design Inference, New York: Cambridge University Press.
Duhem, P. (1914) The Aim and Structure of Physical Theory, Princeton, NJ: Princeton University Press.
Edwards, A. (1972) Likelihood, Cambridge: Cambridge University Press.
Fisher, R. (1925) Statistical Methods for Research Workers, Edinburgh: Oliver and Boyd.
Fisher, R. (1930; 2nd edn 1957) The Genetical Theory of Natural Selection, New York: Dover.
Fitelson, B., Stephens, C., and Sober, E. (1999) “How Not to Detect Design – Critical Notice of W. Dembski’s The Design Inference,” Philosophy of Science, 66: 472–488.
Hacking, I. (1965) The Logic of Statistical Inference, Cambridge: Cambridge University Press.
Hamilton, W.D. (1967) “Extraordinary Sex Ratios,” Science, 156: 477–488.
Land, M. (1984) “Molluscs,” in M. Ali (ed.), Photoreception and Vision in Invertebrates, New York: Plenum, 699–725.
Lande, R. (1976) “Natural Selection and Random Genetic Drift in Phenotypic Evolution,” Evolution, 30: 314–334.
Maddison, W. (1991) “Squared-Change Parsimony Reconstructions of Ancestral States for Continuous-Valued Characters on a Phylogenetic Tree,” Systematic Zoology, 40: 304–314.
Nilsson, D. (1989) “Vision Optics and Evolution,” Bioscience, 39: 298–307.
Orzack, S. and Sober, E. (2001) “Adaptation, Phylogenetic Inertia, and the Method of Controlled Comparisons,” in S. Orzack and E. Sober (eds), Adaptationism and Optimality, Cambridge: Cambridge University Press, 45–63.
Paley, W. (1802) Natural Theology, or, Evidences of the Existence and Attributes of the Deity, Collected from the Appearances of Nature, London: Rivington.
Parker, G. (1978) “Search for Mates,” in J. Krebs and N. Davies (eds), Behavioral Ecology – An Evolutionary Approach, Oxford: Blackwell.
Quine, W. (1953) “Two Dogmas of Empiricism,” in From a Logical Point of View, Cambridge, MA: Harvard University Press, 20–46.
Ridley, M. (1983) The Explanation of Organic Diversity, Oxford: Oxford University Press.
Royall, R. (1997) Statistical Evidence – a Likelihood Paradigm, London: Chapman and Hall.
Salvini-Plawen, L. and Mayr, E. (1977) “On the Evolution of Photoreceptors and Eyes,” in M. Hecht, W. Steere, and B. Wallace (eds), Evolutionary Biology, vol. 10, New York: Plenum, 207–263.
Sober, E. (2000) “Quine’s Two Dogmas,” Proceedings of the Aristotelian Society, Supplementary Volume 74: 237–280.
Sober, E. (2002a) “Bayesianism – Its Scope and Limits,” in R. Swinburne (ed.), Bayesianism, Proceedings of the British Academy, vol. 113, Oxford: Oxford University Press, 21–38.
Sober, E. (2002b) “Intelligent Design and Probability Reasoning,” International Journal for the Philosophy of Religion, 52: 65–80.
Sober, E. (2002c) “Reconstructing Ancestral Character States – A Likelihood Perspective on Cladistic Parsimony,” The Monist, 85: 156–176.
Sober, E. (2003) “The Design Argument,” in W. Mann (ed.), The Blackwell Guide to Philosophy of Religion, Oxford: Blackwell.
Sober, E. (2004) “Likelihood, Model Selection, and the Duhem–Quine Problem,” The Journal of Philosophy, 101 (5): 1–22.
Sober, E. and Orzack, S. (2003) “Common Ancestry and Natural Selection,” British Journal for the Philosophy of Science, 54: 423–437.
Stigler, S. (1986) The History of Statistics, Cambridge, MA: Harvard University Press.
Turelli, M. (1988) “Population Genetic Models for Polygenic Variation and Evolution,” in B. Weir, E. Eisen, M. Goodman, and G. Namkoong (eds), Proceedings of the Second International Conference on Quantitative Genetics, Sunderland, MA: Sinauer, 601–618.
2 Social learning and the Baldwin effect

David Papineau
1 Introduction

The Baldwin effect occurs, if it ever does, when a biological trait becomes innate as a result of first being learned. Suppose that some trait is initially absent from a population of organisms. Then a number of organisms succeed in learning the trait. There will be a Baldwin effect if this period of learning leads to the trait becoming innate throughout the population. Put like that, it sounds like Lamarckism. But that is not the idea. When James Mark Baldwin and others first posited the Baldwin effect over a hundred years ago, their concern was precisely to uncover a respectable Darwinian mechanism for the Baldwin effect.1 The great German cytologist August Weismann had already persuaded them that there is no automatic genetic inheritance of acquired characteristics: the ontogenetic acquisition of a phenotypic trait cannot in itself alter the genetic material of the lineage that has acquired it. The thought behind the Baldwin effect is that an alternative Darwinian mechanism might nevertheless mimic Lamarckism, in allowing learning to influence genetic evolution, but without requiring Lamarck’s own discredited hypothesis that learning directly affects the genome. Why should we be interested in the possibility of Baldwin effects? One reason the topic attracts attention is no doubt that it seems to soften the blind randomness of natural selection, by allowing the creative powers of mind to make a difference. Still, there are other good reasons for being interested in the Baldwin effect, apart from wanting some higher power to direct the course of evolution. Consider the many innate behavioural traits whose complexity makes it difficult to see how they can be accounted for by normal natural selection. I have in mind here innate traits that depend on a number of components that are of no obvious advantage on their own. For example, woodpecker finches in the Galapagos Islands use twigs or cactus spines to probe for grubs in tree branches.
This behaviour is largely innate (Tebbich et al. 2001; Bateson 2001). It also involves a number of different behavioural dispositions – finding possible tools, fashioning them if necessary, grasping them in the
beak, using them to probe at appropriate sites – none of which would be any use by itself. For example, there is no advantage in grasping tools if you are not disposed to probe with them, and no advantage to being disposed to probe with tools if you never grasp them. Now, insofar as the overall behaviour is innate, these different behavioural components will presumably depend on various independently inheritable genes. However, this then makes it very hard to see how the overall behaviour can possibly be selected for. In order for the behaviour to be advantageous, all the components have to be in place. But this will require that all the relevant genes be present together. However, if these are initially rare, it would seem astronomically unlikely that they would ever co-occur in one individual. And, even if they did, they would quickly be split up by sexual reproduction. So the relevant genes, taken singly, would seem to have no selective advantage that would enable them to be favoured by natural selection. But now add in the Baldwin effect. This promises a way to overcome the selective barrier. We need only suppose that some individuals are occasionally able to acquire the behaviour using some kind of general learning mechanism. If they can succeed in this, then the Baldwin effect can kick in, and explain how the behaviour becomes innate. Thus behaviours whose selection seems mysterious from the point of view of orthodox natural selection can become explicable with the help of the Baldwin effect. But I am getting ahead of myself. This last suggestion assumes that the Baldwin effect is real, and that has yet to be shown. In the rest of this paper I shall explore possible mechanisms for the Baldwin effect, and consider whether they may be of any biological significance. My general verdict will be positive.
I shall aim to show how there are indeed mechanisms which can give rise to Baldwin effects, and moreover that there is some reason to think that such effects have mattered to the course of evolution. I became interested in the Baldwin effect because it has always seemed to me obvious that there is at least one kind of case where it operates – namely, with the social learning of complex behavioural traits. It will be helpful to consider this in broad outline before we get caught up in analytic details. Suppose some complex behavioural trait P is socially learnt – individuals learn P from others, where they have no real chance of figuring it out for themselves. This will then create selection pressures for genes that make individuals better at socially acquiring P. But these genes would not have any selective advantage without the prior culture of P, since that culture is in practice necessary for any individual to learn P. After all, there will not be any advantage to a gene that makes you better at learning P from others, if there are not any others to learn P from. So this then looks like a Baldwin effect: genes for P are selected precisely because P was previously acquired via social learning. By way of an example, consider the woodpecker finches again, and suppose that there was a time when their tool-using behaviour was not innate but socially learned.2 That is, young woodpecker finches would learn
how to use tools from their parents and other adepts. Now, this socially transmitted culture of tool use would give a selective advantage to genes that made young finches better at learning the trick. For example, it would have created pressure for a gene that disposed finches to grab suitable tools if they saw them, since this would give them a head start in learning the rest of the grub-catching behaviour from their elders. But this gene would not have been advantageous on its own, in the absence of the tool-using culture, since even finches with that gene would not have been able to learn the rest of the tool-using behaviour, without anyone to teach them. In what follows I shall be particularly interested in cases of this kind – that is, cases where social learning gives rise to Baldwin effects. From the beginning, theorists have often mentioned social learning in connection with the Baldwin effect, but without pausing to analyse its special significance. (For an early example, see Baldwin himself 1896; for a recent one, see Watkins 1999.) I shall offer a detailed explanation of the connection between social learning and Baldwin effects. As we shall see, there are two main biological mechanisms that can give rise to Baldwin effects – namely, ‘genetic assimilation’ and ‘niche construction’. Social learning has a special connection with the Baldwin effect because it is prone to trigger both of these mechanisms. When we have social learning, then we are likely to find cases where niche construction and genetic assimilation push in the same direction, and thus create powerful biological pressures. Much recent literature argues that, while there are indeed biological processes that fit the specifications of the Baldwin effect, it is a mistake to highlight the Baldwin effect itself as some theoretically significant biological mechanism (Downes 2003; Griffiths 2003).
Rather, Baldwin-type examples are simply special cases of more general biological processes. In particular, they are special cases of either genetic assimilation or niche construction. This is a perfectly reasonable point. As we shall see, genetic assimilation and niche construction are the two main sources of Baldwin effects, and both of these processes are of more general significance, in that they do not only operate in cases where a learned behaviour comes to be innate, but in a wider range of cases, many of which may involve neither learning nor behaviour. Still, if we focus on the social learning cases I am interested in, then the Baldwin effect re-emerges as a theoretically important category. These cases are important, as I said, precisely because they combine both niche construction and genetic assimilation. This combination gives rise to particularly powerful biological pressures, and for this reason is worth highlighting theoretically. Moreover, this combination of pressures arises specifically when a socially learned behaviour leads to its own innateness, and is not found more generally. So the Baldwin effect turns out to be theoretically significant after all.
2 Preliminaries: genetic control, innateness, and social learning

Before proceeding to analysis of the Baldwin effect itself, it will be helpful to clarify various preliminary issues. In this section I shall first discuss the selective advantages and disadvantages of having behavioural traits controlled by genes rather than learning, and then explain what I mean by ‘innate’ and ‘social learning’ respectively in the context of the Baldwin effect.

2.1 Genetic control versus learning

In the woodpecker finch example above, I took it for granted that it would be selectively advantageous for the relevant behaviour to depend on genes rather than learning. Since this assumption is generally required for the Baldwin effect, and since it is by no means always guaranteed to be true, it will be useful briefly to discuss the conditions under which it will be satisfied. It might seem unlikely that there will ever be any selective advantage to bringing some trait P under genetic control, given that it can be learned anyway. If some adaptive P is going to be acquired by learning in any case, what extra advantage derives from its genetic determination? Well, one response is that P will not always be acquired in any case, if it is not genetically fixed. Learning is hostage to the quirks of individual history, and a given individual may fail to experience the environments required to instil some learned trait. Moreover, even if the relevant environments are reliably available, the business of learning P may itself involve immediate biological costs, diverting resources from other activities, and delaying the time at which P becomes available. These obvious advantages to genetic fixity – reliability and cheapness of acquisition – can exert a greater or lesser selective pressure, depending on how far genetic fixity outscores learning in these respects. On the other side, however, must be placed the loss of flexibility that genetic fixity may entail.
Learning will normally be adaptive across a range of environments, in each case producing a phenotype that is advantageous in the current environment. Thus, if the environment were to vary in such a way as to make P maladaptive, an organism with genes that fix P may well be less fit than one which relies on learning, since the latter would not be stuck with P, and may instead be able to acquire some alternative phenotype adapted to the new environment. As a general rule, then, we can expect that genetic fixity will be favoured when there is long-term environmental stability, and that learning will be selected for when there are variable environments. Given environmental stability, genetic fixity will have the aforementioned advantages of reliable and cheap acquisition. But these advantages can easily be outweighed by loss of flexibility when there is significant environmental instability.
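The trade-off just described can be made concrete with a toy expected-payoff comparison. This is only a sketch under assumptions introduced here, not a model from the text: `stability` is the hypothetical probability that the environment remains one in which P is adaptive, `benefit` is the payoff of having an adaptive phenotype, `mismatch_cost` is the penalty for being stuck with P when it is maladaptive, `reliability` is the probability that a learner actually acquires the adaptive phenotype, and `learning_cost` is the resource cost of learning. The learner is assumed to track whatever the current environment favours.

```python
def payoff_fixed(stability, benefit, mismatch_cost):
    """Expected payoff when P is genetically fixed.

    The organism always has P, which pays off only while the
    environment stays one in which P is adaptive.
    """
    return stability * benefit - (1 - stability) * mismatch_cost


def payoff_learner(reliability, benefit, learning_cost):
    """Expected payoff when the phenotype is acquired by learning.

    Learning is assumed to track the current environment, but it
    sometimes fails and always costs something.
    """
    return reliability * benefit - learning_cost


def fixity_favoured(stability, benefit, mismatch_cost, reliability, learning_cost):
    """True when genetic fixity has the higher expected payoff."""
    fixed = payoff_fixed(stability, benefit, mismatch_cost)
    learned = payoff_learner(reliability, benefit, learning_cost)
    return fixed > learned


if __name__ == "__main__":
    # Invented parameter values, chosen only to show the two regimes.
    for stability in (0.99, 0.6):
        print(stability, fixity_favoured(stability, 1.0, 1.0, 0.9, 0.1))
```

With these invented values, fixity wins in the highly stable environment and learning wins in the unstable one, which is the qualitative pattern the passage describes. The threshold between the two regimes shifts with every parameter, so nothing here settles how any real case would come out.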
In thinking about these issues, it is helpful to think of the relevant behaviours as initially open to shaping by some repertoire of relatively general learning mechanisms (perhaps including classical and instrumental conditioning, plus various modes of social learning). The question is then whether the behavioural trait in question should be switched, so to speak, from the control of those general learning mechanisms to direct genetic control. However, perhaps it should not be taken for granted that the general learning repertoire will itself be unaffected by such switching. Maybe bringing one behavioural trait under genetic control will make an organism less efficient at learning other behavioural traits (Godfrey-Smith 2003). For example, the woodpecker finches may be less able to learn other ways of feeding, once their tool-using behaviour becomes genetically rigid. If so, this too will need to be factored in when assessing the selective gains and losses of bringing some behaviour under genetic control. Exactly how the pluses and minuses of genetic control versus learning work out will depend on the parameters of particular cases.3 Still, I hope it is clear enough that there will be some cases where genetic fixity will have the overall biological advantage, even if there are other cases where learning will be biologically preferable.4 So from now on I shall assume we are dealing with examples where the selective advantages of genetic control do outweigh the costs, since it is specifically these cases that create the possibility of Baldwin effects.

2.2 ‘Innate’

So far I have been proceeding as if there were a clear distinction between ‘innate’ and ‘acquired’ traits. However, I do not think that this distinction is at all clear-cut.
No definite meaning attaches to the notion of an ‘innate trait’, once we move away from the genome itself to any kind of phenotypic trait, since nothing outside the genome is determined by the genes alone (even the appearance of basic organs can be disrupted by non-standard environments). True, there are a number of other criteria which are widely taken to constitute ‘innateness’, such as presence at birth, universality through the species, being a product of natural selection, and high developmental insensitivity to environmental variation. However, these criteria all dissociate in both directions in real-life cases. Because of this, the notion of innateness can be a source of great confusion. If you ask me, far more harm than good results from unthinking deployment of this notion (Griffiths 2002). Even so, it will be convenient for the purposes of this essay to continue to talk about traits that are at one time ‘acquired’ later becoming ‘innate’. When I use this terminology, I should be understood in terms of the last criterion mentioned above, that is, high developmental insensitivity to environmental variation. I shall take a trait to be innate to the extent that it has a ‘flat norm of reaction’, that is, to the extent that it reliably occurs
across a wide range of developmental contexts. Note that it follows from this criterion that a trait will not be innate to the extent it is ‘learned’, given that learning can be understood as a mapping from different developmental environments to different phenotypes.5

Given this understanding of innateness, then, innateness comes out as a matter of degree: as observed above, no non-genomic traits have a completely flat norm of reaction, in the sense of developing in all environments; at most, we will find that some traits are less sensitive to environmental variation than others. This does not worry me. A comparative notion of innateness will be perfectly adequate for the purposes of this essay. It will be interesting enough if we find Baldwin effects where the prior learning of certain traits leads to the selection of new genes that make traits less sensitive to environmental variation, rather than absolutely insensitive. Talk about traits becoming innate should be understood in this comparative way from now on.

In this connection, it may be helpful to think of behavioural traits in terms of neural pathways in the brain. The trait will be present when appropriate sensory inputs trigger relevant motor outputs. Some genomes may leave a large ‘gap’ between sensory and motor pathways, in which case general learning mechanisms will have plenty of work to do in closing it. Other genomes may only leave a small such gap, one that can be closed with a minimum of environmental input. However, general evolutionary considerations suggest that it will be unusual to find no gap at all. (Why bother with genes that close the gap entirely, once it is so small that nearly all normal environments will bridge it? In this connection, note that even the highly innate tool use of the Galapagos woodpecker finches still requires a modicum of individual trial-and-error learning during a short critical period (Tebbich et al. 2001).)
2.3 ‘Social learning’

I shall use the term ‘social learning’ to cover all processes by which the display of some behaviour by one member of a species increases the probability that other members will perform that behaviour. This covers a number of different mechanisms, but I intend my analysis of social learning and Baldwin effects to apply to them all. Thus we can distinguish (Shettleworth 1998; Tomasello 2000):

i Stimulus enhancement. Here one animal’s doing P merely increases the likelihood that other animals’ behaviour will become conditioned to relevant stimuli via individual learning. For example, animals follow each other around – novices will thus be led by adepts to sites where certain behaviours are possible (pecking into milk bottles, say, or washing sand off potatoes) and so be more likely to acquire those behaviours by individual trial-and-error.
ii Goal emulation. Here animals will learn from others that certain resources are available, and then use their own devices to achieve them. Thus they might learn from others that there are ants under stones, or berries in certain trees.
iii Blind mimicry. Here animals copy the movements displayed by others, but without appreciating to what end these movements are a means. While it is possible that some non-human animals can do this, it seems to be a relatively high-level ability.
iv Learning about means to ends. Here animals grasp that some conspecific’s behaviour is a means to some end, and copy it when they want that end. There is little evidence that non-human animals can do this (but see Akins and Zentall 1998).

These processes differ in significant ways. For example, (i) and (ii), unlike (iii) and (iv), do not lend themselves to cumulative culture, since any technical sophistication developed by one individual will not be passed on to the others who duplicate their behaviour (Boyd and Richerson 1996; Tomasello 2000). Again, (iii) – blind mimicry – but not the other modes of social learning, is highly sensitive to which individuals are taken as models, since in this case there is no further mechanism to ensure that only adaptive behaviours are copied. Differences like these may well interact interestingly with the Baldwin effect. However, I shall not pursue these complexities in what follows (though see note 13 below). I shall simply assume the general definition of social learning given above, and stick to points that apply to all its species.
3 Why does learning matter?

In an extremely illuminating article on the Baldwin effect, Peter Godfrey-Smith schematizes the structure of the Effect roughly as follows:

Stage 0: The environment changes so as to make phenotype P adaptive.
Stage 1: Some organisms learn P and prosper accordingly.
Stage 2: There is selection of genes that make P innate.
(Godfrey-Smith 2003)

Given this schematization6 Godfrey-Smith then raises the obvious question about the Baldwin effect. Why should going through Stage 1 be crucial to reaching Stage 2? Why do we need any learning of P en route to the selection of genes for P? After all, Stage 0 already ensures that organisms with P will survive better, and thus on its own would seem to guarantee that genes for P will have a selective advantage, whether or not there is any intermediate learning. So will Stage 2 – selection of genes for P – not be triggered immediately by Stage 0 – phenotype P becomes adaptive – without any necessity of a detour through Stage 1?
This worry is widely taken to show the Baldwin effect is a chimera. John Watkins (1999), for example, has recently argued that the Baldwin effect is impossible on precisely these grounds. Watkins allows that any organisms that do acquire P by learning will on that account be more likely to survive and pass on their genes. However, he points out, there is no reason to suppose that those organisms are especially likely to have the genes that make P innate, and so their survival by learning P would seem to contribute nothing to the selection of those genes.

However, Watkins is too quick to dismiss the possibility of Baldwin effects. Despite his argument, there are various special cases where the selection of genes for P may indeed depend on P previously being learned. Following Godfrey-Smith, I shall consider three possible such cases, which I shall call ‘Breathing Spaces’, ‘Niche Construction’, and ‘Genetic Assimilation’ respectively.

The first suggestion – breathing spaces – is simply the idea that populations of organisms may not survive long enough to allow the selection of the genes for P, if they are not able to learn P in the interim. This seems to have been Baldwin’s own thought.7 Some environmental changes may be so drastic that the populations which undergo them will face extinction if they cannot adapt quickly. In the face of such drastic environmental changes, learning may allow a significant number of organisms to acquire the necessary adaptive trait P, at a time when genes determining P are still rare. This would then allow the population to stay around long enough for natural selection to drive the genes for P to fixity. It is doubtful whether breathing spaces are of any real biological significance. Few environmental changes seem likely to fit their requirements.
There are certainly plenty of environmental changes that destroy whole populations – the impact of an asteroid, the commercial destruction of a rain forest – but these are not the kind of catastrophes that can be averted by learning new adaptive tricks. Conversely, environmental changes that are gradual enough to allow organisms to learn new tricks – climatic shifts, say, or the immigration of a new predator – will rarely be so urgent that the whole population would be under threat of complete extinction without the tricks, in which case there will be time for genetic selection to operate even without a learning stage. Considerations such as these lead Godfrey-Smith to dismiss breathing spaces as of dubious importance, and I agree with him. I shall say no more about breathing spaces in this essay. That leaves niche construction and genetic assimilation. With niche construction, the idea is that Stage 1 alters selective pressures so as to render genes for P advantageous, when they were not in Stage 0. With genetic assimilation, by contrast, the learning of P does not alter selection pressures; rather it is itself the function that renders certain genes advantageous. Both these processes require extended discussion. It will be convenient to begin with genetic assimilation, which will occupy the next two sections. After that I shall return to niche construction.
4 The Baldwin effect as genetic assimilation

The notion of genetic assimilation is due to C.H. Waddington. In the 1940s and 1950s he investigated the way in which the selection of traits triggered by special environments could lead to those traits developing automatically across a wide range of environments. Waddington applied this idea to biological development in general, not just to behavioural traits that are initially acquired by learning. Still, our immediate concern here is with possible mechanisms for Baldwin effects, and so I shall focus on this kind of case, returning to Waddington’s wider concerns in the next section.

Let me introduce the logic of genetic assimilation by considering a simple model. Suppose n sub-traits, Pi, i = 1, . . ., n, are individually necessary and jointly sufficient for some adaptive behavioural phenotype P. (You need to be able to find tool materials, fashion them, grasp them. . . . As before, each individual sub-trait is no good without all the others.) Each sub-trait can either be genetically fixed or acquired through learning. (For this section’s purposes there is no need to assume that this will be social learning – any mode of learning will do.) Suppose further that each sub-trait is under the control of a particular genetic locus: one allele at this locus will genetically determine the sub-trait, while an alternative allele leaves the sub-trait plastic and so available for learning. So, for sub-trait Pi, we have allele IG which genetically fixes Pi, and allele IL which allows it to be learned. To start with, the IGs that genetically determine the various Pis are rare, so that it is highly unlikely that any individual will have all n Pis genetically fixed. Moreover, suppose that it is pretty difficult to get all n Pis from learning. Still, given these specifications, organisms that have some Pis genetically fixed will face less of a task in learning the rest.
(If you are already genetically disposed to grab suitable twigs if you see them, you will have less to do to learn the rest of the tool-using behaviour.) Organisms who already have some IGs will have a head start in the learning race, so to speak, and so will be more likely to acquire the overall phenotype. So the IGs that give them the head start will have a selective advantage over the ILs. Natural selection will thus favour the IGs over the ILs, and in due course will drive the IGs to fixity. The population will thus move through a stage where P is acquired by learning (Stage 1) to a stage where it is genetically fixed (Stage 2), thus yielding a prima facie Baldwin effect. This model is a simplification of one developed by Hinton and Nowlan (1987). They ran a computer simulation of essentially the above structure using a ‘sexually reproducing’ population of neural nets, and showed that the dynamics of their simulation would indeed progressively replace the alleles IL which left the Pis to learning with the IGs that fixed them genetically. To better see what is going on in this model, consider the standard worry about the natural selection of a complex of genes none of which is any good on its own. Thus: ‘What is the advantage of any IG on its own, given that it
only fixes one Pi, which is of no use without the other n − 1 Pis? Don’t we need all n IGs to occur together for any of them to yield a biological advantage? But that is overwhelmingly unlikely, if they are initially rare, and anyway they would be split up by sexual reproduction, if they did ever co-occur. So each IG on its own would seem to have no selective advantage.’

The above model allows a cogent answer to this argument: each IG does have a selective advantage on its own, even in the absence of the other IGs, precisely because it makes it easier to learn the rest of P. Even in the absence of other IGs at other loci, any given IG will still be favoured by natural selection, because it will reduce the learning load and so make it more likely that its possessor will end up with the advantageous phenotype P. This is what drives the progressive selection of the IGs in the model. Each IG is advantageous whether or not there are IGs at other loci, simply because having an IG rather than an IL at any given locus will reduce the amount of further learning needed to get the overall P.

Given this last point, it will be worth thinking a bit about the precise sense in which the modelled process would constitute a Baldwin effect. If we focus on any specific locus, it turns out that the prior learning allowed by that locus’s IL is unnecessary for the selection of the IG after all. This IG will have an advantage over its alternative IL quite independently of any such prior learning; for the possession of this IG by any given individual will reduce its overall learning load, by removing component Pi from the vagaries of learning and placing it under genetic control. This remains true whatever alleles are at other loci, and even if no organisms have ever previously used the alternative IL to learn Pi. So from this perspective the Baldwin effect seems to have disappeared.
Stage 1, in which the organisms learn Pi, seems to play no role in fostering the selection of IG, just as John Watkins suspected. In order to see why the genetic assimilation model does indeed deliver a Baldwin effect, we need to adopt a wider perspective, and consider the progressive accumulation of – not just one specific IG – but of all the IG alleles, at different loci, which contribute to the overall genetic fixity of P. Recall that in the early stages of this process the various IGs are rare. This means that, even if a lucky individual does have one or two IGs, any success in acquiring the overall P will depend on it learning the remaining Pis. Moreover, it is precisely this possibility that gives the various IGs their initial selective advantage. Any given IG is advantageous precisely because of the way it makes it more likely the organism will be able to learn the remaining non-innate components of P. So now we do have a story in which learning matters. The progressive selection of the whole complex of alleles hinges on the fact that organisms are able to learn elements of P. We wouldn’t arrive at the final stages, where all the Pis get genetically fixed, were it not that in the early and intermediate stages the organisms were able to learn non-innate components of P – otherwise, to repeat, none of the IGs would have any initial selective advantage. So, if we consider the overall accumulation of IGs, we will observe a
sequence, with each stage causally necessary for the next, in which learned components of P are progressively replaced by innate ones. The behaviour P becomes innate as a result of a process in which P’s earlier being learned plays an essential role. The process is thus a Baldwin effect.
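The selective logic of this section can be illustrated with a toy simulation. The sketch below is not Hinton and Nowlan's simulation; it is a much cruder stand-in, under assumptions introduced here purely for illustration: each individual is a list of n booleans (True for IG, False for IL), an organism with k plastic loci learns the rest of P with probability `learn_prob ** k`, fitness is `1 + benefit * learn_prob ** k`, and offspring take each locus at random from one of two parents. The function name and every parameter value are hypothetical.

```python
import random

def simulate(n_loci=5, pop_size=400, generations=60,
             init_freq=0.2, learn_prob=0.5, benefit=20.0, seed=0):
    """Return the population-wide frequency of IG alleles, generation by generation."""
    rng = random.Random(seed)
    # Each individual: one boolean per locus, True = IG (fixed), False = IL (plastic).
    pop = [[rng.random() < init_freq for _ in range(n_loci)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations + 1):
        history.append(sum(map(sum, pop)) / (pop_size * n_loci))
        # Assumed fitness rule: the fewer sub-traits left to learn,
        # the likelier the organism is to end up with the whole of P.
        weights = [1.0 + benefit * learn_prob ** (n_loci - sum(ind))
                   for ind in pop]
        new_pop = []
        for _ in range(pop_size):
            mum, dad = rng.choices(pop, weights=weights, k=2)
            # 'Sexual reproduction': each locus is inherited from a random parent.
            new_pop.append([rng.choice(pair) for pair in zip(mum, dad)])
        pop = new_pop
    return history

history = simulate()
print("initial IG frequency:", round(history[0], 2))
print("final IG frequency:", round(history[-1], 2))
```

Under these assumptions each IG is favoured on its own, because it shortens the learning task whatever alleles sit at the other loci. Setting `learn_prob` to 0 removes that head start: an IG then pays off only in an individual that happens to carry IGs at all n loci, which is the situation the standard worry describes.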
5 Waddington and genomic space

In this section I want to compare the model just outlined with Waddington’s original notion of genetic assimilation. In the 1940s and 1950s Waddington and others were interested in ‘canalization’, that is, in the buffering of adaptive traits against disruptive environments and abnormal genes. To the extent that some trait is highly important to fitness – having normal hands, say – it will be advantageous that it should not be highly sensitive to developmental idiosyncrasies. Because of this, Waddington and his associates wondered in particular whether there could be selection for canalization against environmental variation. Could natural selection operate in favour of genomes which ‘flatten norms of reactions’ of important traits – that is, which ensure that these traits will appear in a wider range of environments than hitherto?

In a famous series of experiments, Waddington demonstrated that this could indeed happen. For example, he subjected a population of fruit fly pupae to heat shocks (40°C for 2–4 hours). As a result, some failed to grow cross-veins on their wings (he called this trait ‘veinless’). Waddington then bred selectively from these individuals, and again subjected the pupae to heat shocks. After repeating this process for twelve generations, he was able to isolate a strain of flies that displayed the veinless trait even in the absence of early heat shocks (Waddington 1953). Waddington called his 1953 paper the ‘Genetic assimilation of an acquired character’. At first some of the fruit flies acquire the characteristic ‘veinless’ as a result of an environment of heat shocks. Then, under the artificial selective regime of the experiment, in which only flies with the veinless phenotype survive, we find flies that are innately veinless.

Let us compare Waddington’s experiment with the model of the previous section. One obvious difference is that veinless is a morphological trait, rather than a behavioural one.
Moreover, it is only ‘learned’ in the extremely attenuated sense in which any environmentally dependent trait is ‘learned’; certainly its acquisition is not the upshot of any general learning mechanism. However, let us put these differences to one side, and focus on the question of whether Waddington’s examples display the same underlying mechanism as modelled in the last section. On the surface, they certainly do not. Veinlessness is not composed of a number of sub-traits, like some complex sequence of behaviour. Nor, correspondingly, is there anything in Waddington’s analysis about a number of genetic loci, each of which can either innately determine some sub-trait, or leave it to environmental factors.
However, precisely because of these differences, Waddington’s examples are puzzling in a way that the kind of case modelled in the last section is not. As Patrick Bateson has observed,

Frequent references are made to genetic assimilation . . . without thought being given to how a usually implicit reference to Waddington might explain what was being proposed.
(Bateson 2004: 290)

Often commentators will refer to the role of a new environmental factor (the heat shocks in Waddington’s experiment) in ‘revealing’ hitherto unexpressed genetic variability (the presence or absence of the genetic factors that yield veinlessness after heat shocks) and thus subjecting it to selective pressure. But this by itself does not explain Waddington’s results, since there is no intrinsic reason why selecting flies that are veinless-if-heat-shocked should yield a population with an increased likelihood of innately veinless flies.

To see this more clearly, it will be helpful to think of Waddington’s flies as having three possible genomes: those that ensure they have cross-veins even if heat shocked; those that make them veinless-if-heat-shocked; and those that render them innately veinless. Most of the flies in the original population had the first genome. By subjecting them to heat shocks and selecting for veinlessness we get a population with the second genome. Now, why should the third genome be more probable in the second population than in the first? Why, so to speak, should the second and third genomes’ similarity in phenotypic space – they are both capable of displaying veinlessness – mean that they are similar in genomic space – a population with the second genome makes the appearance of the third more likely? (Mayley 1996). Why, to revert to our original terms, should Stage 1, in which the trait is ‘learned’, be crucial to reaching Stage 2, where it is innate? Well, here is one possible explanation.
Suppose veinlessness depends on two factors: (i) some developmentally important protein loses its required conformation, and (ii) the ‘heat shock protein’ needed to correct this is absent. Both of these factors can be genetically determined, but both genes are originally rare, and so an innately veinless fly is highly improbable. Now think of the extreme heat shocks imposed by Waddington as an alternative non-genetic way of causing these two protein deficiencies. Not all flies subject to the heat shocks will develop one or both of these deficiencies, but an appreciable proportion will. Now it is much easier for the two rare genes for protein deficiencies to be selected for producing veinlessness: no longer do they have to find themselves together with the other gene; it is enough to be in a fly where the other protein deficiency has been environmentally caused.8 This story now explains, in roughly the style of the last section’s model, why selecting for the phenotype veinless-if-heat-shocked should make the genome innately veinless more accessible. Two different and initially rare
genes are needed to make a fruit fly innately veinless: the gene for the deformed developmental protein, and the gene for a lack of the heat shock protein. Even so, the genes that determine these deficiencies become ‘advantageous’ on their own, in the artificial regime imposed by Waddington, precisely because they make it easier to end up veinless, since they ensure veinlessness in any fly that has the other deficiency because of the environmental heat shocks. This ‘advantage’ will lead to an increase in the frequency of both genes, up to the point where their co-occurrence, and hence innate veinlessness, acquires a non-negligible probability.9

True, the match between this explanation and the last section’s model remains inexact. As observed above, my suggested explanation does not regard veinlessness itself as composed of a number of sub-traits; rather the explanation works by factoring the determinants of veinlessness into independent components, not veinlessness itself. This shows that the model of the last section is rather more restrictive than is necessary to capture the underlying selective dynamics. It is not essential that the trait at issue itself factors into independent sub-traits, as long as it causally depends on various independent factors that are individually necessary and jointly sufficient.

Still, now I have discussed Waddington’s own cases, it will be convenient to return to the more restrictive model of the last section, as it applies to the examples which matter to this paper. So when I talk about ‘genetic assimilation’ in what follows, I shall be referring to cases in which some behavioural trait itself decomposes into sub-traits, each of which can either be environmentally or genetically determined, and where the initial selective advantage of each of these genes derives from the fact that it makes it easier to learn the rest of the behaviour.
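The two-factor explanation of Waddington's result can be made concrete with a small calculation. The following sketch rests on assumptions that go beyond the text: the two protein deficiencies are treated as independent, each arises with the same probability `shock_prob` in a heat-shocked fly lacking the relevant gene, and the value 0.3 is an arbitrary illustration. The function name is hypothetical.

```python
def p_veinless(has_gene_a, has_gene_b, heat_shocked, shock_prob=0.3):
    """Probability that a fly ends up veinless on the two-factor story.

    A deficiency is certain if its gene is present. Otherwise it arises
    with probability shock_prob under heat shock and never without it.
    Veinlessness requires both deficiencies (assumed independent).
    """
    env = shock_prob if heat_shocked else 0.0
    p_a = 1.0 if has_gene_a else env
    p_b = 1.0 if has_gene_b else env
    return p_a * p_b

# Compare genotypes with no gene, one gene, and both genes,
# first without and then with heat shocks.
for shocked in (False, True):
    for a, b in [(False, False), (True, False), (True, True)]:
        print(shocked, a, b, round(p_veinless(a, b, shocked), 2))
```

On these assumed numbers a single gene does nothing in the absence of heat shocks, but under heat shocks it raises the chance of veinlessness from `shock_prob ** 2` to `shock_prob`. That is the sense in which each rare gene becomes ‘advantageous’ on its own in the selective regime of the experiment.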
6 Niche construction

With genetic assimilation, prior learning of the trait P does not alter the selective pressures on the genes that might render P innate. Rather, enhanced learning is the function which renders those genes advantageous. At any stage in the genetic assimilation of P, these genes are preferable to alternative alleles because they make it more likely the rest of P will be learned – and this advantage does not depend on such learning previously having occurred, only on it being possible henceforth. With niche construction Baldwin effects, by contrast, the prior learning of P alters selection pressures so as to render genes for P advantageous, where they would not have been advantageous otherwise.

I shall consider two ways in which niche construction can yield Baldwin effects. First, I shall briefly look at an idea of Peter Godfrey-Smith’s, which I shall call ‘keeping up with the Joneses’. Then, in the next section, I shall show that social learning is itself a form of niche construction that yields powerful Baldwin effects. Niche construction itself extends far more generally than the Baldwin effect. It occurs whenever some new activity by some population creates new
selection pressures on their genes. For example, the evolution of innate adult lactose tolerance in some human populations is a response to new selection pressures generated by the availability of milk from domesticated cattle. Again, the innate disposition of cuckoo chicks to eject host eggs from the nest is clearly a genetic adaptation to the parental cuckoo practice of parasitizing other species’ nests (Laland et al. 2000). However, these examples are not Baldwin effects, as I am understanding the term, since they are not cases where the learning of some behaviour leads to the innateness of that selfsame behaviour. (Rather, dairy farming leads to innate lactose tolerance; parental nest parasitizing leads to innate egg ejection by offspring.)

Godfrey-Smith, in considering niche construction as a possible source of Baldwin effects, focuses on the possibility that it may become more important to do P when everybody else is doing it (thus ‘keeping up with the Joneses’). In such cases, the widespread learning of P through some population may itself increase the selective advantage of acquiring P quickly and reliably – that is, via genes rather than learning. In such cases, the selective coefficient of genes for P would display a kind of ‘positive frequency dependence’ – their selective advantage would increase as P becomes more widespread in the population. (Note, however, that this is not the kind of ‘frequency dependent selection’ normally discussed in the population genetics literature, in that here the selective coefficient will increase with the frequency of the phenotype P, which may be learned as well as innate, rather than with the frequency of some allele that makes P innate.)10

It is not obvious that the Joneses mechanism will work. Suppose that some trait P can either be genetically fixed by allele PG, or left to learning by allele PL.
If P is adaptive, and PG delivers it more quickly and reliably than PL, as we are assuming, then will PG not already have an advantage over PL at Stage 0, even before any organisms acquire P from learning at Stage 1? This is true enough. But it remains possible that PG may have an even greater advantage over PL at Stage 1, and that this increased advantage may be crucial in driving it to fixation. Imagine that the environment changes so that some crucial resource becomes too scarce for all to enjoy it – a climatic change means there are now only enough nuts for 90 per cent of the population, say. Animals who are able to climb nut trees (P) are able to get nuts ahead of those who have to wait for the nuts to fall. But tree climbing is very laborious to acquire from learning, as opposed to getting it innately from some gene PG. In such a case, PG will indeed have some slight advantage over PL even at Stage 0 when scarcely any animals can in fact climb trees, since it will eliminate any danger of ending up with no nuts, yet avoid the costs of learning P. However, this advantage will not be great at Stage 0, for the animals will still have a good chance of getting nuts without climbing trees at all, and so a lack of PG will not make it essential for them to incur the costs of learning P. At Stage 1, however, when most of the population has learned to climb trees, there will be no chance of getting nuts without climbing, and so any animal without PG will be forced to undergo
the costs of learning P in order to survive, thus greatly increasing the selective advantage of PG over PL.11 So I agree with Godfrey-Smith that niche construction Baldwin effects might sometimes arise from biological imperatives to ‘keep up with the Joneses’. However, by focussing on this kind of case, Godfrey-Smith seems to me to miss a far more obvious and important species of niche construction Baldwin effects, namely, those occasioned by social learning.
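The nut-tree example can be turned into a rough worked calculation showing how the advantage of PG over PL might grow as climbing spreads. Everything quantitative below is an assumption made for illustration, apart from the 90 per cent nut supply taken from the example: payoffs are measured as the probability of getting nuts, learning to climb costs a fixed `learn_cost` in the same units, climbers are served before ground-feeders, and an animal with PL either learns or stays on the ground, whichever pays better. The function name is hypothetical.

```python
def pg_advantage(climber_frac, supply=0.9, learn_cost=0.3):
    """Payoff of allele PG minus payoff of allele PL, given the fraction
    of the population that already climbs (whether innately or by learning)."""
    if climber_frac <= supply:
        # Every climber is fed; ground-feeders share what is left over.
        p_climber = 1.0
        p_ground = (supply - climber_frac) / (1.0 - climber_frac)
    else:
        # Climbers exhaust the supply; nothing falls to the ground.
        p_climber = supply / climber_frac
        p_ground = 0.0
    payoff_pg = p_climber                                # climbs without learning costs
    payoff_pl = max(p_ground, p_climber - learn_cost)    # stay down, or pay to learn
    return payoff_pg - payoff_pl

for frac in (0.0, 0.5, 0.8, 0.95):
    print(frac, round(pg_advantage(frac), 3))
```

With these assumed values the advantage of PG is about 0.1 when nobody climbs and rises to the full learning cost of 0.3 once most of the population climbs, which mirrors the contrast between Stage 0 and Stage 1 drawn in the example.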
7 Social learning as niche construction

Recall the kind of example I discussed in my Introduction. Some complex behaviour P is socially learned, where it is highly unlikely that any individual could learn P on its own. To vary our example, consider the common herring-gull practice of opening shellfish by grasping them in their beaks, flying up to a suitable height, dropping the shellfish on a hard surface, and retrieving the flesh from the broken shell. There is reason to suppose that this behaviour is socially transmitted. Now, once a given population of gulls possesses this culture for opening shellfish, then this itself will create selection pressures for genes that make them better at acquiring it. An individual with an allele that innately disposes it to grasp clams when it sees them, say, will learn how to get shellfish meat more quickly, since it will have less to learn than gulls who lack this allele. But note that this allele would have no selective advantage, were it not for the pre-existing culture of shellfish-dropping, given that there would be no real possibility of learning the rest of the complex behaviour without any exemplars to copy from. There’s no advantage in being disposed to grasp clams when you find them, if you do not then fly up, drop the clams, and retrieve the meat – and even gulls for whom the grasping disposition is innate would be highly unlikely to figure out the rest of this behaviour by individual trial-and-error if they could not learn it from other gulls.

So this then gives us another kind of niche construction Baldwin effect. The prior existence of some learned behaviour (Stage 1) creates selection pressures for genes that will render that selfsame behaviour innate. And the prior learning of that behaviour is indeed essential here, since the genes in question would have no selective advantage in an environment (Stage 0) where no animals were learning the behaviour and so providing exemplars for further learners.
The analysis of such social learning Baldwin effects is complicated by the fact that they will inevitably involve genetic assimilation as well as niche construction. To see this, note that, when a socially learned behaviour creates selection pressures for genes for components of that behaviour, it is precisely by making the behaviour easier to learn. The socially learned behaviour is significant as a niche constructor because at earlier stages it enables the remaining components of the overall behaviour to be learned.12 So, insofar as some socially learned behaviour functions as an environmental
Social learning and the Baldwin effect 55 niche that selects for its own innateness, it will be by lightening learning loads, which means that the requirements for genetic assimilation will also be satisfied. As in section 4, we will have a complex behaviour with a number of components, none of which is adaptive on its own. Given this, genes for those components might individually seem to lack any selective advantage, given the improbability of any one of them finding itself together with the others. However, once we take learning into account, then we can see that these genes are individually advantageous after all, since each on its own lightens the amount of learning needed to acquire the overall adaptive P. Still, it would be a mistake to think that the niche construction aspect of social learning adds nothing to the genetic assimilation mechanism discussed earlier. The niche construction aspect also tells us why it is possible for organisms to learn all the rest of P when only a few components of the behaviour are innate. Earlier, when discussing genetic assimilation itself, we simply took it for granted that such learning would be possible. However, in the cases now at issue, it is highly unlikely that any animals with only a few relevant genes will be able to learn all the rest of P by individual trial-anderror – in our example, it was highly unlikely that a herring gull for whom clam-grasping is innate would be able to learn all the rest of the clamopening behaviour on its own. The prior learned culture of P is thus essential for an environment in which the rest of P can be socially learned. It is precisely because the other gulls are already displaying the clam-opening behaviour that a tyro with only a few innate elements can acquire the rest of the behaviour. With socially learned behaviours, then, we get Baldwin effects twice over. The prior learning of P (Stage 1) is crucial to P’s becoming innate in two quite different ways. 
Not only does P need to be learned while each of the earlier IGs get selected, as in all genetic assimilation, but also the niche construction means there would not be any selective pressure on those IGs to start with unless the socially learned P were being displayed by conspecifics.
8 The significance of social learning I have focused on the structure of two kinds of Baldwin effect: genetic assimilation and niche construction. And I have argued that the genetic selection of socially learned behaviours can constitute both kinds of Baldwin effect simultaneously. But is this anything more than a conceptual oddity? Why should it matter that certain possible processes may fit the half-formed ideas of an unimportant nineteenth-century theorist in two different ways? Well, there’s nothing especially significant about possible Baldwin mechanisms, even doubly Baldwinian ones. To show that certain processes are in principle biologically possible is of merely theoretical interest, in the absence of any reason to think that they are empirically important. Maybe the social learning of certain behaviours can lead to their own innateness in a
way that fits Baldwin’s conjecture twice over. But unless such cases play an important empirical role in evolution, this would be nothing more than an odd quirk of intellectual history. However, I think that there is some reason to suppose that these doubly Baldwinian social learning processes are empirically important. This is because social learning vastly expands the class of learnable adaptive behaviours. Many behaviours are far too complex for animals to have any realistic chance of acquiring them by individual learning alone (even with one or two genes to help them on their way). So, if everything was left to individual learning, these behaviours could never be genetically assimilated – with no real chance of the rest of the behaviour being learned, no IG would have any initial selective advantage, and genetic assimilation would not get going. But now add in social learning. This means that as soon as one lucky or exceptional individual somehow acquires P, then it becomes possible for the others to pick up P socially, when it would not be possible for them to learn P otherwise.13 And this will give the relevant IGs an initial advantage after all, and allow genetic assimilation to get going. The point is that genetic assimilation requires that the relevant P be learnable, and social learning renders many interesting Ps learnable when they would not otherwise be. Because of this, I suspect that precisely my double Baldwin effect is responsible for the innateness of many complex behavioural traits. If we have genetic assimilation, then that is one kind of Baldwin effect: the components of some adaptive behaviour progressively come under genetic control, because each relevant gene facilitates learning the rest of that same behaviour. 
However, in many such cases the relevant genes would not facilitate learning the rest of that behaviour, were it not for the help of the second kind of Baldwin effect: some adaptive behaviour is learnable, and so open to genetic assimilation, only because that same behaviour is available as an exemplar for social learning. Of course, this process does require that ‘one lucky or exceptional individual somehow acquires P’ independently of social learning, in order to get the social promulgation of P off the ground. And this prerequisite may seem to be in some tension with the idea that double Baldwin effects will be important precisely with Ps that are ‘too complex for animals to have any realistic chance of acquiring them by individual learning alone’. But this tension is more apparent than real. For note that, in the absence of social learning, all individuals would need to be able to acquire P by individual learning, in order for genetic assimilation to occur. Given social learning, however, only one individual need acquire P non-socially, in order to get things moving. There is no reason why one such lucky strike should not be reasonably probable, even if the chance of any given individual acquiring P non-socially is very low. (If the probability of success in a single trial is p, the probability of at least one success in K independent trials is 1 − (1 − p)^K.)14 In my Introduction I said that I would defend the importance of Baldwin
effects against those who say that they are at best special cases of the more general phenomena of genetic assimilation and niche construction. I have now argued that, when the social learning of some behaviour leads to its own innateness, genetic assimilation and niche construction combine to produce a particularly powerful mechanism of natural selection. I take this to show that this kind of Baldwin effect at least is worth singling out for special attention. Let me conclude with an empirical prediction. If my doubly Baldwinian social learning mechanism has indeed been important for the evolution of complex behaviours, as I have hypothesized, then we should expect to find, somewhat paradoxically, that complex innate behaviours are more common in species that are good social learners than in other species. True, any such correlation will be diluted by the fact that the relative costs of learning and genetic control will not always favour bringing socially learned traits under genetic control (in the way I have been assuming since section 2.1). Even so, if I am right in thinking that social learning vastly expands the range of behaviours open to genetic assimilation, species that are good general social learners should still display significantly more complex innate behaviour than other species. Unfortunately I lack the expertise to assess this prediction myself. But I would be very interested indeed to know whether or not the comparative zoological data bear it out.15
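The ‘one lucky strike’ estimate given in section 8 can be made concrete with a few lines of Python. This is only a minimal sketch: the function name and the values chosen for p and K are illustrative assumptions, not figures drawn from the argument itself.

```python
def prob_at_least_one(p: float, k: int) -> float:
    """Probability of at least one success in k independent trials,
    each succeeding with probability p: 1 - (1 - p)**k."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


# Illustrative assumption: each individual has a 1-in-1000 chance of
# acquiring P non-socially; k is the number of individuals who try.
for k in (1, 100, 1000, 10000):
    print(k, round(prob_at_least_one(0.001, k), 4))
```

Even with a very small per-individual chance, the probability that somebody in a large population hits on P climbs quickly as the number of independent attempts grows, which is all the argument needs.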
Notes
1 In the 1890s Henry Osborne and Conwy Lloyd Morgan had also proposed that non-Lamarckian processes could lead to acquired characteristics becoming innate. Given this, ‘Baldwin has done well to have become the namesake for the effect’, as Peter Godfrey-Smith (2003) observes. The familiar term ‘the Baldwin effect’ is due to George Gaylord Simpson’s 1953 article of that title, which ironically was largely concerned to belittle the effect. For much more interesting history of the Baldwin effect, see Griffiths (2003).
2 There are well-evidenced examples of culturally transmitted tool use in apes, and indications of similar transmission in birds (Whiten et al. 1999; Hunt and Gray 2003).
3 For a detailed quantitative analysis of the relative costs of learning and genetic control, see Mayley (1996).
4 In contexts where learning has the biological advantage over genetic fixity, then we might well find ‘reverse Baldwin effects’, with some trait originally under genetic control coming to depend on learning instead.
5 Some favour the far more controversial thesis that not being learned is sufficient as well as necessary for innateness, at least in the context of psychological traits: that is, not only are psychological traits not innate if they are learned, but also that they are innate if they are not learned (Samuels 2002; Cowie 1999).
6 Godfrey-Smith also requires selection of genes for learning at Stage 1. This seems to me an unhelpful restriction, fostered by an excessive focus on genetic assimilation (sections 4 and 5 below). Pace Godfrey-Smith, there need be no Stage 1 selection of genes for learning in niche construction cases of Baldwin effects (sections 6 and 7).
7 See Baldwin (1896: section II). In addition, there are indications (section III.2) that Baldwin also thought that the learned predominance of a trait would lead to sexual preferences for displays of that trait; this would be a case of niche construction rather than breathing spaces (Griffiths 2003: section 3).
8 Will the relevant genes arise from mutations in the ‘veinless-if-heat-shocked’ population, or were they always present in the original experimental population? It seems that Waddington’s experiments did not depend on mutations, since they did not work with inbred fly populations that lacked any initial genetic variability (Jablonka and Lamb 1995: 32–35, 52). However, this might make it unclear why the Stage 1 selection for veinlessness-if-heat-shocked is needed en route to innate veinlessness: if the genes for innate veinlessness were already available at Stage 0, then innate veinlessness would always have been open to selection over veinlessness-if-heat-shocked, given the artificial selective regime imposed by Waddington. However, remember that the availability of genes does not automatically mean that they will occur together (or remain together if they do). So the Stage 1 selection of flies that are veinless-if-heat-shocked is still essential, since it provides a context where the two genes do not have to co-occur in order to yield the phenotypic veinlessness required for their selection.
9 For a more elaborate version of the model I have offered, see Jablonka and Lamb (1995: 32–36). Bateson (1982) offers an alternative suggestion: suppose veinlessness normally depends on the very rare homozygote of a rare recessive gene, and suppose further that the heat shock reverses dominance so that even heterozygotes with one allele will display veinlessness; this reversal will then create a significant selective pressure for the veinlessness allele, and thus increase its frequency to the point where homozygotes – who will display veinlessness even if not heat shocked – will become common. I am not sure whether this scenario is biologically realistic; in any case I would conjecture that the type of model offered in the text has more general applicability.
10 It is arguable that the ‘keeping up with the Joneses’ scenario is only niche construction in an extremely extended sense. Unlike most other cases of niche construction, the relevant environmental change does not create any new developmental opportunities. It is simply an instance of the far more general phenomenon of trait frequencies altering selective pressures. I owe this point to Matteo Mameli.
11 Note that this mechanism does not require that there is selection of genes for learning P at Stage 1, as originally required by Godfrey-Smith (2003). It is quite enough that P is produced by general and long-standing learning mechanisms operating in some new environment. (But see Godfrey-Smith, forthcoming, where he corrects this.)
12 Thus note how the niche construction story ceases to apply at the point where P becomes entirely innate. Imagine that genes for all but the ‘last’ component Pn in some complex behaviour have already become fixed in the population, and consider the remaining competition between alleles NG and NL for this last component. This last NG will still have an advantage over its competing NL, since it ensures P more cheaply and reliably. However, this last advantage will not depend on the prior culture of P, since once an animal has this last NG it will have nothing left to learn from its conspecifics. Given that the other genes are all in place, this last NG would be favoured over the alternative NL even if no other animals were displaying P.
13 Here is one point where differences between different kinds of social learning may matter (cf. section 2.3 above). Suppose that some unusual individual does acquire some adaptive behaviour P non-socially. What ensures that others will copy this individual, rather than others without P? Not all modes of social learning would seem to privilege models who display adaptive behaviours over others – in particular, blind mimicry (iii) will not do this. But perhaps that does not matter, if we suppose that individual reinforcement acts as a moderator of social learning, perpetuating only those behaviours that yield reinforcing rewards.
14 Following Godfrey-Smith, I have specified that a Baldwin effect begins with a Stage 0 where ‘the environment changes so as to make phenotype P adaptive’. But perhaps such environmental shifts are not always necessary: in some cases the Baldwinization may begin simply because some individual animal happenstantially acquires P and the practice then spreads socially.
15 I would like to thank Nell Boase, Paul Griffiths, Peter Godfrey-Smith, Paul Rozin, Tom Simpson, Stephen Stich, Elliott Sober, Kim Sterelny, and especially Matteo Mameli for comments on previous versions of this essay.
Bibliography
Akins, C. and Zentall, T. (1998) ‘Imitation in Japanese Quail: The Role of Reinforcement of Demonstrator Responding’, Psychonomic Bulletin and Review, 5: 694–697.
Baldwin, J.M. (1896) ‘A New Factor in Evolution’, The American Naturalist, 30: 441–451, 536–553.
Bateson, P. (1982) ‘Behavioural Development and Evolutionary Process’, in King’s Sociobiology Group (eds), Current Problems in Sociobiology, Cambridge: Cambridge University Press.
Bateson, P. (2004) ‘The Active Role of Behaviour in Evolution’, Biology and Philosophy, 19: 283–298.
Boyd, R. and Richerson, P. (1996) ‘Why Culture is Common but Cultural Evolution is Rare’, Proceedings of the British Academy, 88: 73–93.
Cowie, F. (1999) What’s Within? Nativism Reconsidered, New York: Oxford University Press.
Downes, S. (2003) ‘Baldwin Effects and the Expansion of the Explanatory Repertoire in Evolutionary Biology’, in Weber, B. and Depew, D. (eds), Evolution and Learning, Cambridge, MA: MIT Press.
Godfrey-Smith, P. (2003) ‘Between Baldwin Scepticism and Baldwin Boosterism’, in Weber, B. and Depew, D. (eds), Evolution and Learning, Cambridge, MA: MIT Press.
Godfrey-Smith, P. (forthcoming) ‘On the Evolution of Representative and Interpretational Capacities’, The Monist.
Griffiths, P. (2003) ‘Beyond the Baldwin Effect: James Mark Baldwin’s “Social Heredity”, Epigenetic Inheritance and Niche-construction’, in Weber, B. and Depew, D. (eds), Evolution and Learning, Cambridge, MA: MIT Press.
Griffiths, P.E. (2002) ‘What is Innateness?’, The Monist, 85: 70–85.
Hinton, G. and Nowlan, S. (1987) ‘How Learning can Guide Evolution’, Complex Systems, 1: 495–502.
Hunt, G. and Gray, R. (2003) ‘Diversification and Continual Evolution in New Caledonian Crow Tool Manufacture’, Proceedings of the Royal Society of London, B270 (1517): 867–874.
Jablonka, E. and Lamb, M. (1995) Epigenetic Inheritance and Evolution, Oxford: Oxford University Press.
Laland, K., Odling-Smee, J. and Feldman, M. (2000) ‘Niche Construction, Biological Evolution, and Cultural Change’, Behavioural and Brain Sciences, 23: 131–137.
Mayley, G. (1996) ‘Landscapes, Learning Costs, and Genetic Assimilation’, in Turney, P., Whitley, D. and Anderson, R. (eds), Evolutionary Computation, Evolution, Learning and Instinct: 100 Years of the Baldwin Effect, Cambridge, MA: MIT Press.
Samuels, R. (2002) ‘Nativism in Cognitive Science’, Mind and Language, 17: 233–265.
Shettleworth, S. (1998) Cognition, Evolution and Behaviour, Oxford: Oxford University Press.
Simpson, G.G. (1953) ‘The Baldwin Effect’, Evolution, 7: 110–117.
Sterelny, K. (2000) The Evolution of Agency and Other Essays, Cambridge: Cambridge University Press.
Tebbich, S., Taborsky, M., Fessl, B. and Blomqvist, D. (2001) ‘Do Woodpecker Finches Acquire Tool Use by Social Learning?’, Proceedings of the Royal Society, 268: 2189–2193.
Tomasello, M. (2000) The Cultural Origins of Human Cognition, Cambridge, MA: Harvard University Press.
Waddington, C.H. (1953) ‘Genetic Assimilation of an Acquired Character’, Evolution, 4: 118–126.
Watkins, J. (1999) ‘A Note on Baldwin Effect’, British Journal for the Philosophy of Science, 50: 417–423.
Whiten, A., Goodall, J., McGrew, W., Nishida, T., Reynolds, V., Sugiyama, Y., Tutin, C., Wrangham, R. and Boesch, C. (1999) ‘Culture in Chimpanzees’, Nature, 399: 682–685.
3
Signals, evolution, and the explanatory power of transient information Brian Skyrms
Pre-play signals that cost nothing are sometimes thought to be of no significance in interactions which are not games of pure common interest. We investigate the effect of pre-play signals in an evolutionary setting for assurance, or stag hunt games and for a bargaining game. The evolutionary game with signals is found to have dramatically different dynamics from the same game without signals. Signals change stability properties of equilibria in the base game, create new polymorphic equilibria, and change the basins of attraction of equilibria in the base game. Signals carry information at equilibrium in the case of the new polymorphic equilibria, but transient information is the basis for large changes in the magnitude of basins of attraction of equilibria in the base game. These phenomena exemplify new and important differences between evolutionary game theory and game theory based on rational choice.
1 Introduction Can signals that cost nothing to send have any impact on strategic interaction? Folk wisdom exhibits a certain skepticism: “Talk is cheap.” “Put your money where your mouth is.” Diplomats are ready to discount signals with no real impact on payoffs. Evolutionary biologists emphasize signals that are too costly to fake (Zahavi 1975; Grafen 1990; Zahavi and Zahavi 1997). Game theorists know that costless signals open up the possibility of “babbling equilibria,” where senders send signals uncorrelated with their types and receivers ignore the signals. Can costless signals have any explanatory power at all? They can in certain kinds of benign interaction where it is in the common interest for signaling to succeed. In fact, even with potential signals that initially have no meaning, it is possible for signals to spontaneously acquire meaning under standard evolutionary dynamics and for such meaningful signaling to constitute an evolutionarily stable strategy. Such spontaneous emergence of signaling systems is always to be expected in strategic interactions such as the sender–receiver games introduced by David Lewis (1969)1 to give a game-theoretic account of meaning. Signals acquire the ability to
transmit information as a result of the evolutionary process. That this is so must be counted as one of the great successes of evolutionary game theory. But what about interactions where the interests of the parties involved are not so perfectly aligned? In a bargaining game, one strategy may have no interest in whether signaling succeeds or not. In an assurance game one strategy may have an interest in communicating misinformation. How will the evolutionary dynamics of such games be affected if we add a round of costless pre-play signaling, allowing each player to condition her act on the signal received from the other player? It would not be implausible to expect that in such situations costless signals would have little or no effect. Talk is cheap. In fact, “cheap talk” signaling has a dramatic effect on the evolutionary dynamics of such interactions. That this is so should warn us against quick arguments against the effectiveness of costless signaling. Why it is so illustrates some subtleties of the role of information in the evolutionary process. Section 2 briefly reviews the evolutionary dynamics in sender–receiver games. Section 3 introduces the embedding of a two-person game in a larger “cheap-talk” game with a round of pre-play costless signaling by both players. Section 4 discusses the effect of cheap talk on the evolutionary dynamics of an assurance game, where rational choice theory predicts that it should have no effect. Section 5 discusses the effect of cheap talk on the evolutionary dynamics of a bargaining game. Section 6 concludes.
2 Evolution of meaning in sender–receiver games In Convention, David Lewis introduced sender–receiver games to illustrate a game-theoretic account of conventions of meaning. One player, the sender, has private information about the true state of the world. The other player, the receiver, must choose an act whose payoff depends on the state of the world. The sender has signals available that she can send to the receiver, but they have no exogenously specified meaning. In the model of Lewis there are exactly the same number of states, signals and acts. In each state of the world there is a unique act that gives both players a payoff of 1, all other acts giving a payoff of 0. A sender’s strategy maps states of the world onto signals. A receiver’s strategy maps signals onto acts. There are many Nash equilibria in this game, including ones in which the sender ignores the state of the world and always sends the same signal and the receiver ignores the signal and always chooses the same act. An equilibrium where players always get things right and achieve a payoff of 1 is called a Signaling System Equilibrium by Lewis. There are multiple signaling system equilibria. This makes Lewis’ point that the meaning of the signals, if they have meaning, is conventional – it depends which signaling system equilibrium the players exemplify. From the point of view of evolutionary game theory there are two striking facts about Lewis signaling games. The first has to do with the evolutionarily stable strategies. Recall that a strategy, s, is evolutionarily stable if,
for any alternative strategy, m, s does better played against itself than m does, or if they do equally well against s, then s does better against m than m does. (If the latter condition is weakened from “s does better against m than m does” to “s does at least as well against m as m does” the strategy is said to be neutrally stable.) In the evolutionary model for the Lewis game, a player may find herself either in the role of sender or receiver. Her strategy is a pair <sender’s strategy, receiver’s strategy> from the original Lewis game. Although Lewis sender–receiver games have many equilibria other than signaling system equilibria, the signaling system equilibria coincide with the evolutionarily stable strategies in this evolutionary model.2 The second fact is more powerful. In a standard model of evolutionary dynamics, the replicator dynamics, the signaling system equilibria are attractors whose joint basin of attraction covers almost all of the possible population proportions. This is shown in simulations, where some signaling system or other always goes to fixation. In simplified sender–receiver games, it can be shown analytically (Skyrms 1999). The analytical proof generalizes from the replicator dynamic to a broad class of adaptive dynamics, giving a demonstration of some robustness of the result.
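The first of these two facts can be checked by brute force in the smallest Lewis game. The sketch below assumes two equiprobable states, two signals and two acts, and assumes that each player takes the sender and receiver roles half the time; the function names are illustrative. It applies the stability conditions stated above, but only against pure mutant strategies, so it is a sanity check rather than a proof.

```python
from itertools import product

STATES = (0, 1)                          # two equiprobable states
MAPS = list(product((0, 1), repeat=2))   # a map is (image of 0, image of 1)


def success(sender, receiver):
    """Average payoff when `sender` maps states to signals and
    `receiver` maps signals to acts (payoff 1 if act matches state)."""
    return sum(receiver[sender[t]] == t for t in STATES) / len(STATES)


def payoff(x, y):
    """Payoff to x = (sender map, receiver map) against y,
    assuming each role is occupied half the time."""
    return 0.5 * (success(x[0], y[1]) + success(y[0], x[1]))


STRATEGIES = list(product(MAPS, MAPS))   # 16 pure strategies


def resists_pure_mutants(s):
    """Stability conditions from the text, tested against pure mutants only."""
    for m in STRATEGIES:
        if m == s:
            continue
        if payoff(s, s) > payoff(m, s):
            continue
        if payoff(s, s) == payoff(m, s) and payoff(s, m) > payoff(m, m):
            continue
        return False
    return True


stable = [s for s in STRATEGIES if resists_pure_mutants(s)]
print(stable)
```

Under these assumptions the only strategies that survive the test are the two in which the receiver's map undoes the sender's map, that is, the two signaling systems.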
3 Evolutionary games with cheap talk Given the effectiveness of adaptive dynamics in investing signals with meaning, it might be interesting to examine the co-evolution of strategy in a game and of meaning in pre-play signals preceding that game. In particular, consider a two-person symmetric game. We embed this in a cheap talk game by introducing a set of signals, letting each player send a signal to the other, and then letting them play the base game, where the act in the base game can depend on the signal received from the other player. If there are n possible signals then a strategy in the cheap talk game is an (n + 1)-tuple: <signal to send, act to take if receive signal 1, . . ., act to take if receive signal n>. Thus a 2 by 2 base game with 2 signals generates a cheap talk game with 8 strategies; a 3 by 3 base game with 3 signals generates a cheap talk game with 81 strategies. If two strategies are paired in the cheap talk game, they determine acts in the base game and they receive the payoffs of their respective acts when those are paired in the base game. Robson (1990) was the first to point out that cheap talk may have an important effect in evolutionary games. He considered a population of individuals defecting in the Prisoner’s Dilemma. If there is a signal not used by this population, a mutant could invade by using this signal as a “secret handshake.” Mutants would defect against the natives and cooperate with
each other. They would then do better than natives and would be able to invade. Without cheap talk, a population of defectors in Prisoner’s Dilemma would be evolutionarily stable. With cheap talk this is no longer true. This is not to say that cheap talk establishes cooperation in the Prisoner’s Dilemma. Mutants who fake the secret handshake and then defect can invade a population of the previous kind of mutants. And then if there is still an unused message, it can be used by a third round of mutants as a secret handshake. It seems that the whole story might be fairly complex. But even if all signals are used and all strategies defect, the state – although it is an equilibrium – is not evolutionarily stable. It is a mistake to assume that cheap talk has no effect.
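The construction of the cheap talk game, and the secret handshake argument, can be sketched in code. The strategy encoding follows the tuple described in this section, except that signals are numbered from 0; the Prisoner’s Dilemma payoff numbers are hypothetical values chosen only for illustration, since no particular numbers are given for that game.

```python
from itertools import product


def cheap_talk_strategies(n_signals, acts):
    """All tuples (signal sent, act if signal 0 received, ...,
    act if signal n-1 received)."""
    return [(sig,) + resp
            for sig in range(n_signals)
            for resp in product(acts, repeat=n_signals)]


def cheap_talk_payoff(x, y, base):
    """Payoff to x against y: each plays the base-game act that its
    strategy prescribes for the signal the other one sent."""
    act_x = x[1 + y[0]]
    act_y = y[1 + x[0]]
    return base[(act_x, act_y)]


print(len(cheap_talk_strategies(2, "CD")))    # 2 signals, 2 acts
print(len(cheap_talk_strategies(3, "ABC")))   # 3 signals, 3 acts

# Hypothetical Prisoner's Dilemma payoffs to the row player.
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

native = (0, "D", "D")   # sends signal 0, defects whatever it hears
mutant = (1, "D", "C")   # sends signal 1, cooperates only on hearing 1

print(cheap_talk_payoff(native, native, PD))  # native meets native
print(cheap_talk_payoff(mutant, native, PD))  # mutant meets native
print(cheap_talk_payoff(mutant, mutant, PD))  # mutant meets mutant
```

With these illustrative payoffs the mutant does exactly as well as the natives against natives and strictly better against its own kind, which is why the all-defect population with an unused signal fails to be evolutionarily stable.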
4 Cheap talk in a stag hunt In a note provocatively titled “Nash Equilibria are not Self-Enforcing,” Robert Aumann argues that cheap talk cannot be effective in the following game:

Aumann’s stag hunt
              c (Stag)    d (Hare)
  c (Stag)    9,9         0,8
  d (Hare)    8,0         7,7

In this base game, there are two pure strategy Nash equilibria, cc and dd. The first is Pareto Dominant and the second is safer (risk-dominant). Aumann points out that no matter which act a player intends to do, he has an interest in leading the other player to believe that he will do c. If the other so believes, she will do c which yields the first player a greater payoff. One can think of c as hunting stag and d as hunting hare, where diverting the other player to hunting stag increases a hare hunter’s chances of getting the hare. Then both stag hunting types and hare hunting types will wish the other player to believe that they are stag hunters. Aumann concludes that all types of players will send the message, “I am a stag hunter” and consequently that these messages convey no information. In this game, unlike the sender–receiver games, we have a principled argument for the ineffectiveness of cheap talk. The argument, however, is framed in a context different from the evolutionary one we have been considering. Aumann is working within the theory of rational choice, and furthermore is assuming that the signals have a pre-existing meaning, so that the players know which signal says “I am a stag hunter.” Does the force of the argument carry over to evolutionary dynamics? We will compare the base game and the resulting cheap-talk game with two signals.
In the base stag hunt game there are three Nash equilibria, both hunt stag, both hunt hare, and a mixed equilibrium. As an evolutionary game we have two evolutionarily stable strategies, hunt stag and hunt hare. The polymorphic state of the population that corresponds to the mixed equilibrium is not evolutionarily stable, and it is dynamically unstable in the replicator dynamics. The dynamical phase portrait is very simple, and is shown in Figure 3.1. The state of the system is specified by the proportion of the population hunting stag. If pr(Stag) > 7/8 the replicator dynamics carries stag hunting to fixation. If pr(Stag) < 7/8 replicator dynamics carries hare hunting to equilibrium. If pr(Stag) = 7/8 we are at an unstable equilibrium of the replicator dynamics. Now we embed this in a cheap talk game with 2 signals. A strategy now specifies which signal to send, what act to do if signal 1 is received, and what act to do if signal 2 is received. There are 8 strategies in this game. What is the equilibrium structure? First, we must notice that some states where everyone hunts hare are unstable equilibria. For instance, if the entire population has the strategy: “Send signal 1 and hunt hare no matter what signal you receive” then a mutant could invade using the unused signal as a secret handshake. That is, the mutant strategy: “Send signal 2 and hunt stag if you receive signal 2, but if you receive signal 1 hunt hare” would hunt hare with the natives and hunt stag with its own kind, and would thus do strictly better than the natives. The replicator dynamics would carry the mutants to fixation. Next, neither a population of hare hunters that sends both messages nor a population of stag hunters is at an evolutionarily stable state. The proportions of those who hunt hare and send message 1 to those who hunt hare and send message 2 could change with no payoff penalty. Likewise with the case where all hunt stag.
These states are stable in a weaker sense. They are said to be neutrally stable, rather than evolutionarily stable. (Under the replicator dynamics they are dynamically stable, but not asymptotically stable.)
Figure 3.1 Aumann’s stag hunt. [Plot: payoff to Stag and to Hare against the probability of hare.]
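The 7/8 threshold for the base game, and the behaviour of the replicator dynamics on either side of it, can be reproduced with a short numerical sketch. The function names, step size and number of steps are assumptions made for illustration; the update is a plain Euler discretization of the two-strategy replicator equation, not a method attributed to the text.

```python
def stag_threshold(a, b, c, d):
    """Share of stag hunters at which Stag and Hare earn the same.
    a: Stag vs Stag, b: Stag vs Hare, c: Hare vs Stag, d: Hare vs Hare.
    Solves a*p + b*(1 - p) = c*p + d*(1 - p) for p."""
    return (d - b) / ((a - c) + (d - b))


def replicate(p, a, b, c, d, dt=0.01, steps=20_000):
    """Euler steps of dp/dt = p * (1 - p) * (u_stag - u_hare),
    where p is the proportion of the population hunting stag."""
    for _ in range(steps):
        u_stag = a * p + b * (1 - p)
        u_hare = c * p + d * (1 - p)
        p += dt * p * (1 - p) * (u_stag - u_hare)
    return p


# Aumann's stag hunt: payoffs 9, 0, 8, 7.
print(stag_threshold(9, 0, 8, 7))
print(replicate(0.90, 9, 0, 8, 7))   # starts above the threshold
print(replicate(0.85, 9, 0, 8, 7))   # starts below the threshold
```

Starting just above the threshold the stag-hunting share rises towards 1, and starting just below it the share falls towards 0, in line with the phase portrait described for the base game.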
There is, however, an evolutionarily stable state in the cheap talk game. It is an entirely new equilibrium, which has been created by the signals. This is a state of the population in which half the population has each of the strategies:

<1, Hare, Stag>
<2, Stag, Hare>

The first strategy sends signal 1, hunts hare if it receives signal 1 and hunts stag if it receives signal 2. The second sends signal 2, hunts stag if it receives signal 1 and hare if it receives signal 2. These strategies cooperate with each other, but not with themselves! Notice that in a population that has only these two strategies, the replicator dynamics must drive them to the 50/50 equilibrium. If there are more who play the first strategy, the second gets a greater average payoff; if there are more of the second, the first gets a greater average payoff.3 One can check that this state is evolutionarily stable. Any mutant would do strictly worse than the natives and would be driven to extinction by the replicator dynamics. It is also true that this is the only evolutionarily stable state in this game (Schlag 1993; Banerjee and Weibull 2000). We are led to wonder whether this is just a curiosity, or whether this new equilibrium plays a significant role in the evolutionary dynamics.

Dynamical questions are also raised by another aspect of Aumann’s paper. David Kreps asked Aumann if he would give the same analysis of the ineffectiveness of cheap talk signaling in the following stag hunt game:

Kreps’ stag hunt

              c          d
(Stag) c      100,100    0,8
(Hare) d      8,0        7,7
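The claim that any mutant does strictly worse against the 50/50 polymorphism can be checked by brute force over the eight cheap talk strategies. The sketch below uses the Kreps payoffs from the table above; the helper names (`act`, `payoff`, `average_payoff`) are hypothetical. It checks only that every strategy absent from the polymorphism earns strictly less than the natives; the stability of the 50/50 mix between the two native strategies rests on the argument in the text.

```python
from itertools import product

# Kreps' stag hunt: (my act, other's act) -> my payoff.
PAYOFF = {("S", "S"): 100, ("S", "H"): 0, ("H", "S"): 8, ("H", "H"): 7}

# A strategy is (signal sent, act if signal 1 is received, act if signal 2 is received).
STRATEGIES = list(product((1, 2), "SH", "SH"))  # 8 strategies


def act(strategy, signal_received):
    """Act chosen by `strategy` on receiving signal 1 or 2."""
    return strategy[signal_received]


def payoff(a, b):
    """Payoff to strategy a when it is paired with strategy b."""
    return PAYOFF[(act(a, b[0]), act(b, a[0]))]


def average_payoff(a, population):
    """Expected payoff to a against a population given as {strategy: share}."""
    return sum(share * payoff(a, b) for b, share in population.items())


A = (1, "H", "S")   # <1, Hare, Stag>
B = (2, "S", "H")   # <2, Stag, Hare>
polymorphism = {A: 0.5, B: 0.5}

native = average_payoff(A, polymorphism)
mutants = {s: average_payoff(s, polymorphism)
           for s in STRATEGIES if s not in polymorphism}

if __name__ == "__main__":
    print("natives:", native, average_payoff(B, polymorphism))
    for s, u in sorted(mutants.items(), key=lambda kv: -kv[1]):
        print(s, u)
```

With these payoffs both native strategies earn 53.5 on average, while the best of the six possible mutants earns 50.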
Aumann reports: The question had us stumped for a while. But actually, the answer is straightforward: Indeed, (c,c) is not self-enforcing, even here. It does look better than in figure 1 [Aumann’s stag hunt: Figure 3.1 this volume]; not because an agreement to play it is self-enforcing, but because it will almost surely be played even without an agreement. An agreement to play it does not improve its chances further. As before, both players would sign the agreement gladly, whether or not they keep it; it therefore conveys no information. (Aumann 1990)
Aumann is certainly correct in that, in Kreps’ stag hunt, hunting stag is an attractive choice even without communication. In the evolutionary dynamics, hare hunting is carried to fixation only if fewer than 7/99 of the population are initially hunting stag. The dynamical picture of the base game is shown in Figure 3.2. But it is also true that in Aumann’s stag hunt the odds are rigged for hunting hare. So we might also want to consider a “neutral” stag hunt, where hunting stag and hunting hare are equally attractive:

Neutral stag hunt

              c          d
(Stag) c      15,15      0,8
(Hare) d      8,0        7,7

[Figure 3.2 Kreps’ stag hunt. Payoff to Stag and to Hare plotted against the probability of hare.]
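The three stag hunts differ only in where the unstable mixed equilibrium sits, and that point follows from equating the expected payoffs of the two acts. A minimal sketch with exact fractions is given below. The Aumann payoffs (9, 0, 8, 7) are an assumption chosen to match the 7/8 threshold quoted earlier; the Kreps and neutral payoffs are taken from the tables above, and `stag_threshold` is a hypothetical helper name.

```python
from fractions import Fraction


def stag_threshold(ss, sh, hs, hh):
    """Proportion p of stag hunters at which Stag and Hare earn the same payoff.

    Solves ss*p + sh*(1 - p) == hs*p + hh*(1 - p) for p, where ss is the payoff
    to stag against stag, sh stag against hare, hs hare against stag, hh hare
    against hare.
    """
    return Fraction(hh - sh, (ss - hs) + (hh - sh))


GAMES = {
    "Aumann (assumed payoffs)": (9, 0, 8, 7),
    "Kreps": (100, 0, 8, 7),
    "Neutral": (15, 0, 8, 7),
}

if __name__ == "__main__":
    for name, payoffs in GAMES.items():
        print(name, stag_threshold(*payoffs))
```

Above the threshold the replicator dynamics favors stag hunting and below it hare hunting, so the threshold is also the size of the basin of attraction of hare hunting in the base game: 7/8, 7/99, and 1/2 respectively.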
The evolutionary equilibrium analysis for Aumann’s stag hunt holds good for all these stag hunts. The states in which everyone hunts stag are neutrally stable but not evolutionarily stable. The states in which everyone hunts hare are either unstable (if there is an unused message) or neutrally stable, but they are not evolutionarily stable. The unique evolutionarily stable state is again the polymorphism:

<1, Hare, Stag> 50%
<2, Stag, Hare> 50%

It is evident that in this polymorphism the signals are certainly carrying information. This information allows individuals to always coordinate on an equilibrium of the base game. Ignoring the signals, the frequency of response types [<*, Hare, Stag> or <*, Stag, Hare>] in the population is 50 percent. The signal sent identifies the response type of the sender with relative frequency of 100 percent. We should be especially interested to see whether evolutionary dynamics leads us to expect this type of state to be rare or frequent.

This leaves us with a number of questions about the size of basins of attraction in the signaling games. Is the size of the basin of attraction of stag hunting behavior (hare hunting behavior) increased, decreased, or unaffected by cheap talk signaling? Does the new polymorphic equilibrium created by signaling have a negligible or a significant basin of attraction? In a Monte Carlo simulation of 100,000 trials of the neutral stag hunt game, the results were:

All hunt stag at equilibrium    75,386 trials
All hunt hare at equilibrium    13,179 trials
Polymorphism                    11,435 trials
It is evident that the basin of attraction of the polymorphic evolutionarily stable state created by the signaling is not negligible. It is almost as great as the basin of attraction of the set of hare-hunting equilibrium states! In the polymorphic equilibrium, the signals carry perfect information about the response type of the sender. That information is utilized by the receiver, in that he performs an act that is a best response to the sender’s response to the signal he himself has sent. In an equilibrium in which everyone hunts stag, the signals carry no information about response types. Either only one signal is sent, in which case probability of response type conditional on sending that signal is equal to probability of that response type in the population, or both signals are sent, in which case all members of the population have the response type “hunt stag no matter which signal you receive.” And in the equilibria where everyone hunts hare both messages are always sent and everyone has the response type “hunt hare no matter which signal you receive.” Nevertheless, the signals have something to do with the basins of attraction of these equilibria. Without signaling, stag hunting and hare hunting each have basins of attraction of 50 percent of the simplex of possible population proportions. With cheap talk signals the basin of attraction of stag hunting equilibria is above 75 percent while the basin of attraction of hare hunting equilibria is below 14 percent.4 There is something of a mystery as to how cheap talk has made this difference. We will pursue this mystery further in the next section.
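The basin-of-attraction figures above come from Monte Carlo simulation, and the general shape of such an experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of the original program: it uses the discrete-time replicator map, a uniform (Dirichlet) draw of initial population proportions, a fixed number of generations, and a tolerance-based classification of the final state. Signals are numbered 0 and 1 here for indexing convenience, and all function names are hypothetical.

```python
from itertools import product

import numpy as np

# Neutral stag hunt: payoff to the row act against the column act (0 = Stag, 1 = Hare).
BASE = np.array([[15.0, 0.0],
                 [8.0, 7.0]])

# Strategy = (signal sent, act on receiving signal 0, act on receiving signal 1).
STRATEGIES = list(product((0, 1), (0, 1), (0, 1)))
n = len(STRATEGIES)


def act(strategy, signal):
    return strategy[1 + signal]


# PAYOFF[i, j]: payoff to strategy i when paired with strategy j.
PAYOFF = np.array([[BASE[act(a, b[0]), act(b, a[0])] for b in STRATEGIES]
                   for a in STRATEGIES])
# STAG_PLAY[i, j] = 1 if strategy i hunts stag when paired with strategy j.
STAG_PLAY = np.array([[1.0 if act(a, b[0]) == 0 else 0.0 for b in STRATEGIES]
                      for a in STRATEGIES])


def replicator_step(x):
    """Discrete-time replicator map: shares grow in proportion to relative payoff."""
    fitness = PAYOFF @ x
    return x * fitness / (x @ fitness)


def classify(x, tol=0.01):
    """Label a population state by how often stag is hunted in random encounters."""
    stag_rate = x @ STAG_PLAY @ x
    if stag_rate > 1 - tol:
        return "stag"
    if stag_rate < tol:
        return "hare"
    return "mixed"   # includes the polymorphism and any run that has not settled


def estimate_basins(trials=200, generations=1000, seed=0):
    rng = np.random.default_rng(seed)
    counts = {"stag": 0, "hare": 0, "mixed": 0}
    for _ in range(trials):
        x = rng.dirichlet(np.ones(n))        # uniform draw from the simplex
        for _ in range(generations):
            x = replicator_step(x)
        counts[classify(x)] += 1
    return counts


if __name__ == "__main__":
    print(estimate_basins())
```

The "mixed" label is deliberately coarse: in the polymorphic state stag is hunted in half of all encounters, but a run that has not yet converged would receive the same label, so a longer run or a tighter test would be needed before reading the counts as basin sizes.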
5 Cheap talk in a bargaining game

We will consider a simple, discrete Nash bargaining game. Each player makes a demand for a fraction of the pie. If their demands total more than 1, no bargain is struck and they get nothing. Otherwise they get what they demand. In this simplified game there are only three possible demands: 1/3, 2/3, and 1/2, which we denote by act 1, 2, and 3 respectively. The resulting evolutionary game has a unique evolutionarily stable strategy, Demand 1/2, but it also has an evolutionarily stable polymorphic state in which half of the population demands 1/3 and half of the population demands 2/3. The polymorphic equilibrium squanders resources. Each strategy has an average payoff of 1/3. The state where all demand 1/2 and get it is efficient. Nevertheless, if we compute the basin of attraction in the replicator dynamics of Demand 1/2, it is only about 62 percent of the 3-simplex of possible population proportions. The wasteful polymorphism has a basin of attraction of 38 percent. (Another polymorphic equilibrium, in which Demand 1/3 has probability 1/2, Demand 1/2 has probability 1/6 and Demand 2/3 has probability 1/3, is dynamically unstable and is never seen in simulations.)

What happens if we embed this game in a cheap talk game with three signals? A strategy in this cheap talk game is a quadruple:

<Signal to send, Demand if I receive signal 1, Demand if signal 2, Demand if signal 3>

There are now 81 strategies. If we run a Monte Carlo simulation, sampling from the uniform distribution on the 81-simplex, and let the population evolve for a long enough time, the counterparts of the old 1/3–2/3 polymorphism in the base bargaining game are not seen. Short simulations (20,000 generations) may appear to be headed for such a state, as in example 1.

Example 1
pr <2112> = 0.165205
pr <2122> = 0.012758
pr <2212> = 0.276200
pr <2222> = 0.235058
pr <2312> = 0.053592
pr <2322> = 0.248208
All strategies send signal 2. Approximately half the population demands 1/3 and approximately half demands 2/3. But it is not exactly half, and the numbers do not quite sum to one because very small numbers are not printed out. This simulation (20,000 generations) has not run long enough. A population of just these strategies and with half demanding 1/3 and half demanding 2/3 would not be evolutionarily stable because it would be invadable by strategy <3113>. Message 3 functions as a “secret handshake.” This strategy would get an average payoff of 1/3 when playing against the natives and 1/2 when playing against itself. Indeed, when we run the simulation longer the secret handshake strategy has time to grow, and this sort of result is never seen. As in the previous section, the signals create new evolutionarily stable polymorphic equilibria, such as those in examples 2 and 3:

Example 2
pr <1132> = 0.2000000
pr <1232> = 0.2000000
pr <2331> = 0.4000000
pr <3121> = 0.1000000
pr <3122> = 0.1000000

Example 3
pr <1312> = 0.250000
pr <2213> = 0.250000
pr <2223> = 0.250000
pr <3133> = 0.250000

These appear in simulations where the population was allowed to evolve for 1,000,000 generations. We can gain some insight into these polymorphisms if we notice that the population can be partitioned into three subpopulations according to signal sent. Since a player conditions her strategy on the signal received, a player uses a strategy of the base game against each of these subpopulations, and the choices of strategies against different subpopulations are logically independent. Consider example 2. The first thing to look at is the interaction of subpopulations with themselves. Considered alone, the subpopulation that sends signal 1 is in the 1/3–2/3 polymorphic evolutionary equilibrium of the base game. So is the subpopulation that sends signal 3. The subpopulation that sends signal 2 is in the All 1/2 evolutionary equilibrium of the base game.
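The base-game equilibria referred to here can be checked numerically, and the basin of attraction of the equal split can be estimated in the same spirit as the figure of about 62 percent quoted at the start of this section. The sketch below is illustrative only: it assumes the discrete-time replicator map, uniform sampling of initial proportions, and a 0.99 cutoff for "all demand 1/2", none of which is specified in the text, and its function names are hypothetical.

```python
import numpy as np

# Demands for acts 1, 2, 3 in the text's numbering.
DEMANDS = np.array([1 / 3, 2 / 3, 1 / 2])

# PAYOFF[i, j]: what demand i receives against demand j
# (nothing if the two demands total more than 1).
PAYOFF = np.array([[d if d + e <= 1 + 1e-12 else 0.0 for e in DEMANDS]
                   for d in DEMANDS])


def replicator_step(x):
    """Discrete-time replicator map on the shares of the three demands."""
    fitness = PAYOFF @ x
    return x * fitness / (x @ fitness)


def limit_state(x, generations=500):
    for _ in range(generations):
        x = replicator_step(x)
    return x


def estimate_basin_of_equal_split(trials=500, generations=500, seed=0):
    """Fraction of random starts that end with (almost) everyone demanding 1/2."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = limit_state(rng.dirichlet(np.ones(3)), generations)
        hits += int(x[2] > 0.99)
    return hits / trials


if __name__ == "__main__":
    polymorphism = np.array([0.5, 0.5, 0.0])     # half demand 1/3, half demand 2/3
    print("payoffs at the 1/3-2/3 polymorphism:", PAYOFF @ polymorphism)
    unstable = np.array([1 / 2, 1 / 3, 1 / 6])   # shares of demands 1/3, 2/3, 1/2
    print("payoffs at the interior equilibrium:", PAYOFF @ unstable)
    print("estimated basin of Demand 1/2:", estimate_basin_of_equal_split())
```

At the 1/3–2/3 polymorphism both resident demands earn 1/3 while Demand 1/2 earns only 1/4, and at the interior equilibrium all three demands earn 1/3, which is what makes each a rest point.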
The subpopulations not only play themselves. They also play each other. Notice that when two subpopulations meet they play a pure strategy Nash equilibrium of the base game. When signal 1 senders meet signal 2 senders they both demand 1/2; when signal 1 senders meet signal 3 senders the former demand 2/3 and the latter demand 1/3; when signal 2 senders meet signal 3 senders the former demand 1/3 and the latter demand 2/3. These are all strict equilibria and stable in two-population replicator dynamics. The three subpopulations can be thought of as playing a higher level game with payoff matrix:

Example 2′

          Sig 1    Sig 2    Sig 3
Sig 1     1/3      1/2      2/3
Sig 2     1/2      1/2      1/3
Sig 3     1/3      2/3      1/3
Considered as a one-population evolutionary game, the game of Example 2′ has a unique interior attracting equilibrium at Pr(Sig 1) = 0.4, Pr(Sig 2) = 0.4, Pr(Sig 3) = 0.2, which are their values in example 2. Example 3 has a similar analysis. Here the 1/3–2/3 polymorphism occurs in the subpopulation that sends signal 2, while the other subpopulations always demand 1/2 when they meet themselves. The higher level game between subpopulations has the payoff matrix:

Example 3′

          Sig 1    Sig 2    Sig 3
Sig 1     1/2      1/3      2/3
Sig 2     2/3      1/3      1/2
Sig 3     1/3      1/2      1/2
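The higher level payoff matrices can be recomputed directly from the strategy lists of examples 2 and 3 by averaging base-game payoffs within each pair of signal subpopulations. The following sketch does this with exact fractions; the function names `payoff`, `signal_game`, and `signal_payoffs` are hypothetical, and the code assumes that every signal is sent by some positive share of the population.

```python
from fractions import Fraction as F

DEMAND = {1: F(1, 3), 2: F(2, 3), 3: F(1, 2)}   # act number -> demand

# Strategy = (signal sent, act vs signal 1, act vs signal 2, act vs signal 3).
EXAMPLE_2 = {(1, 1, 3, 2): F(1, 5), (1, 2, 3, 2): F(1, 5), (2, 3, 3, 1): F(2, 5),
             (3, 1, 2, 1): F(1, 10), (3, 1, 2, 2): F(1, 10)}
EXAMPLE_3 = {(1, 3, 1, 2): F(1, 4), (2, 2, 1, 3): F(1, 4),
             (2, 2, 2, 3): F(1, 4), (3, 1, 3, 3): F(1, 4)}


def payoff(a, b):
    """Payoff to strategy a when paired with strategy b."""
    mine = DEMAND[a[b[0]]]      # a's demand given the signal b sends
    theirs = DEMAND[b[a[0]]]    # b's demand given the signal a sends
    return mine if mine + theirs <= 1 else F(0)


def signal_game(population):
    """Shares of each signal and the 3x3 matrix of average payoffs between signals."""
    share = {s: sum(p for t, p in population.items() if t[0] == s) for s in (1, 2, 3)}
    matrix = [[sum(pa * pb * payoff(a, b)
                   for a, pa in population.items() if a[0] == r
                   for b, pb in population.items() if b[0] == c)
               / (share[r] * share[c])
               for c in (1, 2, 3)]
              for r in (1, 2, 3)]
    return share, matrix


def signal_payoffs(share, matrix):
    """Average payoff to the senders of each signal at the given signal shares."""
    return [sum(matrix[r][c] * share[c + 1] for c in range(3)) for r in range(3)]


if __name__ == "__main__":
    for name, pop in (("Example 2", EXAMPLE_2), ("Example 3", EXAMPLE_3)):
        share, matrix = signal_game(pop)
        print(name, [float(u) for u in signal_payoffs(share, matrix)])
```

Equal payoffs across the three signals confirm that the observed signal shares are a rest point of the higher level game; the common values are 7/15 for example 2 and 11/24 for example 3.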
Considered as a one-population evolutionary game, the game of Example 3′ has a unique interior attracting equilibrium at Pr(Sig 1) = 0.25, Pr(Sig 2) = 0.50, Pr(Sig 3) = 0.25, as we observe in example 3. The foregoing modular analysis depends for its validity on the fact that all the equilibria at various levels of the story are structurally stable dynamical attractors. These polymorphisms achieve a payoff much better than that of the 1/3–2/3 polymorphism in the base game. In the 1/3–2/3 polymorphism, each strategy has an average payoff of 1/3. In examples 2 and 3, the average payoffs to a strategy are 0.466666 . . . and 0.458333 . . . respectively. The remaining inefficiencies are entirely due to the polymorphisms involved in subpopulations of those who send a specified signal meeting themselves.

Although these new polymorphisms are fascinating, they play a minor role in the overall evolutionary dynamics of the bargaining game. In fact, more than 98 percent of the 81-simplex of the cheap talk game evolves to one of the equilibria where all demand 1/2. The result is robust to the deletion or addition of a signal. If we run the dynamics with 2 signals or 4 signals, the resulting simulation still leads to all demanding 1/2 more than 98 percent of the time.

The little mystery that we were left with at the end of the last section has become a bigger mystery. Existing theorems about cheap talk do not apply (Bhaskar 1998; Blume et al. 1993; Kim & Sobel 1995; Schlag 1993, 1994; Sobel 1993; Wärneryd 1991, 1993). The simulation is set up so that each run begins in the interior of the 81-simplex. That is to say that each strategy has some positive, possibly small, probability. In particular, there are no unused messages. The game is not a game of common interest. It is in the common interest of the players to make compatible demands, but within these bounds there is opposition of interests. The best result is to demand 2/3 while the other player demands 1/3. Signals have no pre-existing meaning. Does meaning evolve? Consider some of the final states of trials in which the population ends up in states where everyone demands 1/2:

Example 4 (1,000,000 generations)
pr <2131> = 0.189991
pr <2132> = 0.015224
pr <2133> = 0.037131
pr <2231> = 0.245191
pr <2232> = 0.048024
pr <2233> = 0.021732
pr <2331> = 0.128748
pr <2332> = 0.175341
pr <2333> = 0.138617

Each strategy sends signal 2; each strategy demands 1/2 upon receipt of signal 2. But concerning what would be done upon receipt of the unused signals 1 and 3, all possibilities are represented in the population.

Example 5 (1,000,000 generations)
pr <1333> = 0.770017
pr <2333> = 0.057976
pr <3333> = 0.172008
Here all messages are sent, and each strategy ignores the message sent and simply demands 1/2. In between these two extremes we find all sorts of intermediate cases. It is clear that in our setting we do not have anything like the spontaneous generation of meaning that we see in Lewis sender–receiver games. The action of signals here is more subtle.

Suppose we shift our attention from the strong notion of meaning that we get from a signaling system equilibrium in a sender–receiver game to the weaker notion of information. A signal carries information about a player if the probability that the player is in a certain category given that he sent the signal is different from the probability that he is in the category simply given that he is in the population. At equilibria where players all demand 1/2, signals cannot carry information about the player’s acts. Conceivably, signals could carry some information about a player’s response type at equilibrium, where response type consists of the last three coordinates of a player’s strategy. But in examples 4 and 5 this cannot be true, because in example 4 there is only one signal sent and in example 5 there is only one response type.

Perhaps the place to look for information in the signals is not at the end of the process, but at the beginning and middle of evolution. For the signals to carry no information about response types at a randomly chosen vector of population proportions would take a kind of a miracle. At almost all states in the simplex of population proportions there is some information about response types in the signals. There is information present, so to speak by accident, at the beginning of evolution in our simulations. Any information about a response type could be exploited by the right kind of strategy. And the right kind of strategy is present in the population because all types are present.
Strategies that exploit information present in signals will grow faster than other types. Thus, there will be an interplay between “accidental” information and the replicator dynamics.

To investigate this idea we need a quantification of the average amount of information in a signal. We use the Kullback–Leibler (K–L) discrimination information between the probability measure generated by the population proportions and that generated by conditioning on the signal. Let $P$ denote proportions in the whole population and $p_i$ denote proportions in the sub-population that sends message $i$. Recall that a response type consists of a triple:

<Act if message 1, Act if message 2, Act if message 3>

There are three signals, which we denote as $S_i$, and 27 response types, which we denote as $T_k$. Then the K–L information in message $i$ is:

$\sum_k p_i[T_k] \log\left( p_i[T_k] / P[T_k] \right)$

The average amount of information in the signals in the population is gotten by averaging over signals:5

$\sum_i P[S_i] \sum_k p_i[T_k] \log\left( p_i[T_k] / P[T_k] \right)$

This is identical to the information provided by an experiment, as defined by Lindley (1956), where looking at a signal is thought of as an experiment and the particular signal seen is the experimental outcome. For example, consider the polymorphic equilibrium in the stag hunt game:

<1, Hare, Stag> 50%
<2, Stag, Hare> 50%

There are two response types present in the population and two signals. In the probabilities conditional on signal 1, there is only one response type. The information in signal 1 is just:

$1 \cdot \log\left( 1 / (1/2) \right) = \log 2 \approx 0.693$ (natural logarithm)

The information in signal 2 is likewise $\log 2$, as is the average information in the signals in the population. In example 4 of this section, the average information in the population is the information in signal 2, which is always sent. This is zero, since $p_2[T_k] = P[T_k]$ for all response types $T_k$ in the population. In example 5 of this section there is only one response type in the population, so the information in each message is zero. It is, however, possible for there to be positive information in an equilibrium population state in which everyone demands 1/2, for instance:

<1,3,3,1> 50%
<2,3,3,2> 50%

Only messages 1 and 2 are sent, and the response types only differ on what they would do counterfactually, in response to receiving message 3. The messages distinguish these response types perfectly, so they contain information about response types.

It is possible to compute the average information in the messages in a population, and to average the results over many trials in the Monte Carlo simulation used to compute basins of attraction. The results of averaging over 1,000 trials are shown in Figure 3.3. (The computation uses the natural logarithm.)
There is, of course, some information present at the onset of the trials “by accident.” Then the replicator dynamics leads to an increase in average information in the signals which peaks at about 300 generations. After that the average information in a signal begins a slow, steady decline.

[Figure 3.3 Evolution of information. Average information in the signals plotted against generations (0 to 2,000).]
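The average information in the signals, as defined above, is straightforward to compute from a population state. The sketch below is one minimal way to do so, assuming the population is given as a mapping from (signal, response type) pairs to proportions; the function name `average_information` and this data layout are illustrative choices, not taken from the original simulations.

```python
import math
from collections import defaultdict


def average_information(population):
    """Average K-L information (natural log) that signals carry about response types.

    `population` maps (signal, response_type) -> proportion. Only pairs with a
    positive proportion contribute, matching the convention that sums run over
    signals present in the population.
    """
    signal_share = defaultdict(float)   # P[S_i]
    type_share = defaultdict(float)     # P[T_k]
    for (signal, rtype), p in population.items():
        signal_share[signal] += p
        type_share[rtype] += p

    total = 0.0
    for (signal, rtype), p in population.items():
        if p > 0:
            conditional = p / signal_share[signal]          # p_i[T_k]
            total += (signal_share[signal] * conditional
                      * math.log(conditional / type_share[rtype]))
    return total


if __name__ == "__main__":
    stag_polymorphism = {(1, ("Hare", "Stag")): 0.5, (2, ("Stag", "Hare")): 0.5}
    print(average_information(stag_polymorphism))    # log 2, about 0.693

    counterfactual = {(1, (3, 3, 1)): 0.5, (2, (3, 3, 2)): 0.5}
    print(average_information(counterfactual))       # also log 2
```

Applied to the stag hunt polymorphism this returns log 2, and applied to a state with a single signal or a single response type it returns zero, in line with the worked cases in the text.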
What is the effect of this information? If there is information present in the population there are strategies present that can use it to good effect. Thus, if we look only at the behaviors in the base bargaining game, as if cheap talk were invisible, we would expect to see some departure from random pairing. That is to say, the signals should induce some correlation in the behaviors in the bargaining game. We might look to see whether Demand 1/2 behaviors are exhibited in pairs more often than at random, which would favor those types that manage to effect the correlation. We also might see if there is positive correlation between Demand 2/3 and Demand 1/3 behaviors. And we should be interested in whether negative correlation develops between 2/3 and 1/2 players and between pairs of 2/3 players.

There would be something to be said for all the foregoing if the behaviors were fixed and self-interested. But they are not, and the evolution of behaviors is a complex product of the evolution of signal and response. How the dynamics of all this will work out is difficult to see a priori. So we again resort to simulation. Here we calculate the covariance of the indicator variables for behaviors. That is to say, two individual types from the 81 types in the signaling game are picked according to the current population proportions. A pair of types determines the behaviors in the bargaining game. There are indicator variables for “First member demands 1/3, 2/3, 1/2,” and for “Second member demands 1/3, 2/3, 1/2.” Then:

Cov(i, j) = Pr(i, j) − Pr(i)Pr(j)

where Pr(i, j) is the probability that the first demands i and the second demands j.6 Demand 1/2 behaviors would “like” to be paired with themselves;
Demand 2/3 behaviors would “like” to be paired with Demand 1/3 behaviors; Demand 2/3 behaviors and Demand 1/2 behaviors would “like” to avoid one another. Figure 3.4 shows that evolution complies to some extent. (It shows the results of simulations averaged over 1,000 trials.) The interaction between the information in the signals and the evolutionary dynamics generates a positive correlation between the compatible demands (1/2, 1/2) and (2/3, 1/3). It generates a negative correlation between the incompatible demands (2/3, 1/2). In each of these cases, the absolute magnitude of the covariance peaks at about 400 generations. At its peak, Cov(1/2, 1/2) is above 3 percent while Cov(2/3, 1/3) is less than half of that value.

[Figure 3.4 Evolution of correlation I. Covariance plotted against generations (0 to 2,000) for the pairs <1/2, 1/2>, <2/3, 1/3>, and <2/3, 1/2>.]

One should not be hasty, however, in generalizing the compliance of the correlation generated by evolution with what one might expect a strategist to desire. Demand 1/2 behaviors would not at all mind meeting compatible Demand 1/3 behaviors. And Demand 2/3 behaviors would “like” to avoid meeting themselves. But here evolution fails to deliver correlations in the desired directions (as shown in Figure 3.5). Cov(1/2, 1/3) is as negative at 400 generations as Cov(2/3, 1/3) is positive. And evolution does not give Demand 2/3 behavior the edge in avoiding itself that one might expect. Demand 2/3 behaviors do not effectively use signals to avoid each other. Cov(2/3, 2/3) is mostly positive but near zero. It is evident from Figure 3.5 that thinking about what behaviors in the embedded game would “like to happen” is not a reliable guide to the working of the evolutionary dynamics in the cheap talk game.

[Figure 3.5 Evolution of correlation II. Covariance plotted against generations (0 to 2,000) for the pairs <2/3, 2/3> and <1/2, 1/3>.]

The evolution of correlation can be put into perspective by comparing it to the evolution of bargaining behaviors, with the probabilities averaged over the same 1,000 trials. This is shown in Figure 3.6. Demand 1/2 behaviors rapidly take over the population. Demand 2/3 behaviors die out even more quickly than Demand 1/3 behaviors, notwithstanding the positive correlation between the two. At 400 generations Demand 1/2 has, on average, taken over 73 percent of the population; at 1,000 generations it has, on average, 97 percent of the population. At 2,000 generations, positive and negative covariance have all but vanished, Demand 1/2 behaviors have taken over 99 percent of the population, and what little information remains in the signals is of little moment. In the limit, the information left in the signals, if any, is counterfactual. It only discriminates between response types that differ in their responses to signals that are not sent.

[Figure 3.6 Evolution of bargaining behaviors. Probability of Demand 1/3, Demand 2/3, and Demand 1/2 plotted against generations (0 to 2,000).]

It is clear that the interesting action is taking place at about 400 generations. This suggests that we might look at individual orbits at 400 generations and then at (or close to) equilibrium. Here is an example to illustrate the complexity of the interaction of signaling strategies. It is a printout of all strategies with more than 1 percent of the population at 400 generations and at 10,000 generations of a single run, each followed by its population proportion and average payoff. At 400 generations, we have:

<1 3 1 1>  0.011134  U = 0.386543
<1 3 1 3>  0.044419  U = 0.436378
<1 3 2 3>  0.012546  U = 0.398289
<1 3 3 3>  0.203603  U = 0.473749
<2 1 3 3>  0.026870  U = 0.372952
<2 3 1 1>  0.052620  U = 0.385357
<2 3 1 2>  0.014853  U = 0.369061
<2 3 1 3>  0.025405  U = 0.425317
<2 3 2 3>  0.054065  U = 0.416539
<3 1 1 3>  0.022481  U = 0.354270
<3 3 1 1>  0.016618  U = 0.3954966
<3 3 1 2>  0.046882  U = 0.396710
<3 3 1 3>  0.050360  U = 0.416904
<3 3 2 2>  0.010570  U = 0.366388
<3 3 2 3>  0.011266  U = 0.386582
<3 3 3 1>  0.124853  U = 0.419688
<3 3 3 3>  0.010625  U = 0.440625

At 10,000 generations, we have:

<1 3 3 3>  0.973050  U = 0.500000

(with the remaining population proportions spread over the other 80 strategies). It is evident that there is more going on here than the secret handshake. Handshake strategies are present.
For example, the first strategy listed, <1 3 1 1>, sends signal one, demands 1/2 when paired with anyone who sends signal one, and plays it safe by demanding 1/3 in all other cases. But there are also what might be thought of as anti-handshake strategies, such as <3 3 3 1>, which occupies about 12.5 percent of the population at 400 generations. This strategy demands 1/2 when it meets those who send a different signal, but plays it safe by only demanding 1/3 when it meets those who send the same signal. The anti-handshake strategy contributes to the success of the strategy that eventually takes over almost all of the population, <1 3 3 3>, as it eventually dies out. The same is true of the handshake strategy <1 3 1 1>.7

What is the cause of the dramatic difference in the magnitude of the basins of attraction of the equal split produced by cheap talk (between 62 percent and 98+ percent)? We would like to have a more complete analysis, but we can say something. It appears to be due to transient information. By way of a complex interaction of signaling strategies, this transient information produces transient covariation between behaviors in the base bargaining game. Much more than the “secret handshake” is involved. The type of covariation produced could not have been predicted on grounds of rational choice. Nevertheless, the net effect is to increment the fitness of strategies that demand 1/2.
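The covariance between bargaining behaviors used in this section, Cov(i, j) = Pr(i, j) − Pr(i)Pr(j), can be computed from any population state of the cheap talk game by drawing two types independently and reading off what each demands of the other. The sketch below assumes the population is given as a mapping from strategy quadruples to proportions; `behaviour_covariance` is a hypothetical helper, and rows and columns are indexed by act number (act 1 = 1/3, act 2 = 2/3, act 3 = 1/2).

```python
import numpy as np


def behaviour_covariance(population):
    """3x3 covariance matrix of demand indicators for a random pair of types.

    `population` maps (signal, act vs signal 1, act vs signal 2, act vs signal 3)
    to a proportion. Entry [i - 1, j - 1] is
    Pr(first plays act i, second plays act j) - Pr(first plays i) * Pr(second plays j).
    """
    joint = np.zeros((3, 3))
    for a, pa in population.items():
        for b, pb in population.items():
            act_a = a[b[0]]   # what a demands on receiving b's signal
            act_b = b[a[0]]   # what b demands on receiving a's signal
            joint[act_a - 1, act_b - 1] += pa * pb
    first = joint.sum(axis=1)    # marginal for the first member of the pair
    second = joint.sum(axis=0)   # marginal for the second member
    return joint - np.outer(first, second)


if __name__ == "__main__":
    # Illustrative two-type population: each type demands 1/2 of its own signal
    # and 1/3 of the other signal.
    population = {(1, 3, 1, 1): 0.5, (2, 1, 3, 1): 0.5}
    print(behaviour_covariance(population))
```

In this illustrative population the signals sort encounters perfectly, so Cov(1/2, 1/2) and Cov(1/3, 1/3) are both 0.25 and Cov(1/2, 1/3) is −0.25, whereas in a state where everyone demands 1/2 regardless of signal every entry is zero.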
6 Conclusion

Evolution matters! Analysis of the stag hunt and bargaining games from the viewpoint of rational choice-based game theory would not predict the phenomena discussed here. The traditional rational choice theory either makes no prediction at all, or it makes the opposite prediction from evolutionary game theory. The disparity calls for empirical testing.

Cheap talk matters! Costless signals can have a large effect on evolutionary dynamics. The signals may create entirely new equilibria, and may change the stability properties of equilibria in the base game. Equilibrium analysis, however, misses important dynamic effects of signaling. Costless signals may cause large changes to the relative magnitude of basins of attraction of multiple equilibria in the base game. Information that is present “by accident” is used and amplified in the evolutionary process. Sometimes informative signaling goes to fixation. This is the case in polymorphic equilibria whose existence is created by the signals. But sometimes the information is transient, and has disappeared by the time the dynamics has reached a rest state.

Transient information matters! Information, although transient, may nevertheless be important in determining the eventual outcome of the evolutionary process. Why and how this is so is a question involving complex interaction of signaling strategies. Some insight into the nature of these interactions can be gained from simulations, but a deeper analysis would be desirable.

We can say this for certain. If costless pre-play signaling is present, then omitting it from the model on rational choice grounds may be a great error. If the signals were invisible, and an observer only saw the behaviors in the base games we have discussed, the course of evolution would not be intelligible. Mysterious correlations would come and go. The dynamics would not appear to be the replicator dynamics. Signals, even when they are cheap, have a crucial role to play in evolutionary theory.
Notes

1 A more general sender–receiver game was introduced and analyzed in Crawford and Sobel (1982).
2 The situation is somewhat complicated if the Lewis model is modified to allow the number of messages to exceed the number of states, but the analysis by Wärneryd shows how the privileged status of signaling system equilibria remains intact.
3 This explains why there is no stable equilibrium among the strategies <1, Stag, Hare> and <2, Hare, Stag>, which cooperate with themselves, but not with others. If one were more numerous, it would get a greater payoff, and replicator dynamics would drive it to fixation.
4 In Aumann’s stag hunt the basin of attraction of stag hunting without cheap talk is only 0.125. With cheap talk the basin of attraction of stag hunting is increased to 0.149. The polymorphic evolutionarily stable state is also seen here, with a basin of attraction of 0.015.
5 The sums are over those signals that are present in some positive proportion of the population.
6 Pr(i, j) = Pr(j, i) since the draws of types are independent, so Cov(i, j) = Cov(j, i).
7 The reader may wonder, as I did, what the effect would be of removing all potential handshake strategies. I ran simulations where, in the first generation, I killed every strategy that demanded 1/2 when it met itself. Nevertheless, about 38 percent of the time the population converged to an equilibrium where each strategy demanded 1/2 of every other strategy that sent a different signal. For each signal, there were two types equally present, which implement the 1/3–2/3 polymorphism within the sub-population that sends that signal. Here is an instance:

<1 1 3 3> pr = 0.16667
<1 2 3 3> pr = 0.16667
<2 3 1 3> pr = 0.16667
<2 3 2 3> pr = 0.16667
<3 3 3 1> pr = 0.16667
<3 3 3 2> pr = 0.16667

The average payoff for each strategy is 0.444444.
Bibliography

Alexander, J. and Skyrms, B. (1999) “Bargaining with Neighbors: Is Justice Contagious?” Journal of Philosophy, 96: 588–598.
Aumann, R.J. (1990) “Nash Equilibria are Not Self-Enforcing,” in J.J. Gabszewicz, J.-F. Richard, and L.A. Wolsey (eds), Economic Decision Making, Games, Econometrics and Optimization, Amsterdam: North Holland, 201–206.
Banerjee, A. and Weibull, J. (2000) “Neutrally Stable Outcomes in Cheap-Talk Coordination Games,” Games and Economic Behavior, 32: 1–24.
Bhaskar, V. (1998) “Noisy Communication and the Evolution of Cooperation,” Journal of Economic Theory, 82: 110–131.
Blume, A., Kim, Y.-G. and Sobel, J. (1993) “Evolutionary Stability in Games of Communication,” Games and Economic Behavior, 5: 547–575.
Crawford, V. and Sobel, J. (1982) “Strategic Information Transmission,” Econometrica, 50: 1431–1451.
Grafen, A. (1990) “Biological Signals as Handicaps,” Journal of Theoretical Biology, 144: 517–546.
Kim, Y.-G. and Sobel, J. (1995) “An Evolutionary Approach to Pre-play Communication,” Econometrica, 63: 1181–1193.
Kullback, S. (1959) Information Theory and Statistics, New York: Wiley.
Kullback, S. and Leibler, R.A. (1951) “On Information and Sufficiency,” Annals of Mathematical Statistics, 22: 79–86.
Lewis, D.K. (1969) Convention: A Philosophical Study, Oxford: Blackwell.
Lindley, D. (1956) “On a Measure of the Information Provided by an Experiment,” Annals of Mathematical Statistics, 27: 986–1005.
Nydegger, R.V. and Owen, G. (1974) “Two-Person Bargaining: An Experimental Test of the Nash Axioms,” International Journal of Game Theory, 3: 239–250.
Robson, A.J. (1990) “Efficiency in Evolutionary Games: Darwin, Nash and the Secret Handshake,” Journal of Theoretical Biology, 144: 379–396.
Roth, A. and Malouf, M. (1979) “Game Theoretic Models and the Role of Information in Bargaining,” Psychological Review, 86: 574–594.
Schlag, K. (1993) “Cheap Talk and Evolutionary Dynamics,” Discussion Paper, Bonn University.
Schlag, K. (1994) “When Does Evolution Lead to Efficiency in Communication Games?” Discussion Paper, Bonn University.
Skyrms, B. (1996) Evolution of the Social Contract, New York: Cambridge University Press.
Skyrms, B. (1999) “Stability and Explanatory Significance of Some Simple Evolutionary Models,” Philosophy of Science, 67: 94–113.
Sobel, J. (1993) “Evolutionary Stability and Efficiency,” Economics Letters, 42: 301–312.
Taylor, P. and Jonker, L. (1978) “Evolutionarily Stable Strategies and Game Dynamics,” Mathematical Biosciences, 40: 145–156.
Van Huyck, J., Battalio, R., Mathur, S., Van Huyck, P. and Ortmann, A. (1995) “On the Origin of Convention: Evidence from Symmetric Bargaining Games,” International Journal of Game Theory, 34: 187–212.
Wärneryd, K. (1991) “Evolutionary Stability in Unanimity Games with Cheap Talk,” Economics Letters, 39: 295–300.
Wärneryd, K. (1993) “Cheap Talk, Coordination and Evolutionary Stability,” Games and Economic Behavior, 5: 532–546.
Zahavi, A. (1975) “Mate selection – a Selection for a Handicap,” Journal of Theoretical Biology, 53: 205–214.
Zahavi, A. and Zahavi, A. (1997) The Handicap Principle, Oxford: Oxford University Press.
Part II
Rationality
4
Untangling the evolution of mental representation Peter Godfrey-Smith
1 Co-evolution and the eye of the interpreter
The “tangle” referred to in my title is a special set of problems that arise in understanding the evolution of mental representation. These are problems over and above those involved in reconstructing evolutionary histories in general, over and above those involved in dealing with human evolution, and even over and above those involved in tackling the evolution of other human psychological traits. I am talking about a peculiar and troublesome set of interactions and possibilities, linked to long-standing debates about the status of folk psychology and the nature of semantic properties. More specifically, there are two sets of problems I have in mind, which I will call eye-of-the-interpreter issues and co-evolutionary issues.
We usually find ourselves approaching questions about the evolution of the mind using a particular framework and vocabulary: roughly, a kind of folk psychology sharpened up by various scientific influences. But there is a persistent tradition of argument in philosophy of mind that holds that we should not think of folk-psychological categories as picking out natural distinctions with respect to the machineries lurking inside our heads. Beliefs, desires, and other folk-psychological states only exist in a kind of interpreter-dependent way; to have a particular belief is just for that interpretation of you to be compelling to a certain kind of interpreter. We might have reason to reject the simplest kinds of “interpretationist” views of the mind, but it would be a mistake to ignore this issue altogether. We need to ask: what kind of description of cognitive mechanisms picks them out in a way that is appropriate for evolutionary explanation? In a way, this is a special case of the problem of “trait-individuation,” which is familiar from discussions of adaptationism in evolutionary biology (Gould and Lewontin 1979). But that general problem has some special features in this context.
Those are the “eye-of-the-interpreter” issues. The second set of issues concerns co-evolutionary possibilities. I said above that we have to untangle the relations between real psychological mechanisms and socially-maintained practices of interpretation. But whatever story we tell about this, we also have to contend with the fact that
from an evolutionary point of view, each of these is liable to be a causal influence on the other. If folk-psychological interpretation is biologically old, then it has been part of the environment in which human cognitive traits were exposed to natural selection. Folk psychology is not just the tool that we use when first thinking about the mind, it is also a social fact that human agents have had to contend with, for some unknown period of time. It is part of the social context in which thought and action take place. So while it is obvious that folk-psychological practices of interpretation will have been affected by the facts about cognitive mechanisms, it is also true that the evolution of cognitive mechanisms might have been affected by the social environment generated by folk-psychological interpretive habits. To make this latter point vivid, it is helpful to compare folk psychology with folk physics. In each case, we can ask questions about the evolution, biological and cultural, of folk-theoretic understanding of the world. The difference is that the target of folk physics – motions and interactions involving physical objects – is indifferent, oblivious to the activities of the theorist or interpreter. In the case of folk psychology, however, the interpreter is often part of the social context in which action occurs and has its consequences. Suppose there was a folk physics “module,” for example, universal in the species and rather inflexible to individual learning. If there was such a thing, the physical world would not treat this as an evolutionary opportunity, and change its basic principles in response, to suit its own interests. But that is something that could, in principle, happen in the case of folk psychology. In the case of folk psychology (with or without modules) our interpretive practices have been, and continue to be, a causal player in the evolution of cognition and behavior. 
I call this a “co-evolutionary” possibility, even though this is extending the standard sense of that term in evolutionary theory.1 For here the two traits evolving together are not found in different species, but in the same species and even the same individuals. So this is a special, but I think reasonable, sense of the term “co-evolution.” Those, then, are the tangles. This essay will not try to untangle them very far. I will describe these problems in more detail, and make some preliminary moves and speculations. This essay is one of a series of initial forays into an area that I hope eventually to discuss more systematically (see also Godfrey-Smith 2002, 2003, 2004).
2 Two sets of facts
In this section I will try to organize the issues in more detail. We need first to describe the phenomena in a way that avoids some of the tangles. Let us distinguish two sets of facts:
1 Facts about the wiring and organization of behaviorally complex organisms, and the connections between this inner wiring and the organisms’ environments. (I will call these “wiring-and-connection facts.”)
2 Facts about ordinary human practices of interpretation and ascription of content. (I will call these “interpretation facts.”)2
Everyone agrees that these two sets of facts are real, though both can change over time. One thing people disagree about is the relations between these two sets of facts. One view is familiar from the work of people like Jerry Fodor (1987), Fred Dretske (1988), and William Lycan (1988). This is a view in which ordinary human practices of interpreting others make use of a “folk psychology” that is a fairly good theory of the wiring-and-connection facts. Here are the basic ideas. First, folk psychological interpretation is an attempt to label and describe the real inner causes of behavior – the reasons, feelings, memories, and plans that make people behave as they do. Second, these interpretive practices are also predictively successful. Such success suggests that our interpretive practices use or embody a largely true theory of how people’s minds work. Kim Sterelny (2003) and I call this view the simple coordination thesis. It asserts a simple descriptive relationship between the interpretations and wirings-and-connections. The simple coordination thesis can accept that folk psychology is inadequate and incomplete in many areas. Eventually, we will have a better theory. But we can expect this theory to retain many of the basic features of folk psychology. The simple coordination thesis also suggests a set of evolutionary claims. The capacity for mental representation of the world is seen as a special, adaptive type of inner wiring that has an evolutionary history of a fairly standard kind. Asking about the evolution of mental representation is no different from asking about the evolution of the immune system or warm-bloodedness (see also Millikan 1984; Papineau 1993). One well-known alternative to the simple coordination thesis, and an ideal point of contrast, is Daniel Dennett’s view (1978, 1987). For Dennett, it is a mistake to think of ordinary folk-psychological interpretation as an attempt to describe wiring-and-connection facts. 
Belief ascriptions are not attempts to pinpoint discrete, causally active, internal states with special connections to the world. We should give a different sort of theory of the structure and social role of our interpretative practices. For an agent to have a particular belief, for example, is merely for the attribution of this belief to be compelling to an interpreter, where an interpreter has a characteristic viewpoint and a special set of goals. We should think of interpretation as a social tool, used in a holistic and rationalizing way. This position regarding the status of folk psychology implies a different view of how the “evolution of mental representation” should be addressed. There is no special inner wiring that is “the mental representation kind” of wiring. Consequently, there is no evolutionary story to tell about that special wiring. There is, of course, an evolutionary story to tell about how animals like us became so good at getting around the world. But we should
not try to tell that story in a way that uses folk-psychological concepts to pick out the important different kinds of wirings and connections. On the other hand, we might tell a different kind of adaptive story about folk psychology; one where the trait in question is the interpretive practice itself. Via some process or other – possibly a mixture of biological and cultural evolution – we developed an interpretive tool with a specific structure. That is a real trait, for the purposes of evolutionary explanation.
The simple coordination thesis and Dennett’s view mark out two extreme options, when we think about the various ways in which the wiring-and-connection facts and the interpretation facts might be connected. There are other options, of course (see Sterelny 1990 and Rey 1997 for more detailed surveys). These options include “eliminativist” views. In this context we should distinguish two kinds of eliminativism. One eliminativist possibility is hostile toward folk psychology but friendly toward mental representation. Folk psychology might be seen as committed to false views about the architecture of the mind, and perhaps about particular kinds of semantic relationships between the mind and the world. But the basic idea of mental representation can be retained when folk psychology is abandoned. Another option is hostility toward not just folk psychology but toward the whole notion of the mind as a representational organ. The Churchlands, it seems to me, have perhaps not made a definite commitment here, but are both more inclined toward the former view – they oppose folk psychology more than the general idea of mental representation.3 In fact, many ways of denying the simple coordination thesis leave open, to varying degrees, the possibility of a representational view of the mind that is not folk-psychological. For the sake of simplicity, however, in this essay I will mostly abstract away from options of that kind.
The problems that I called “eye of the interpreter” problems should now be fairly clear. Does folk psychology supply us with concepts that we can use to formulate good evolutionary questions about the mind? Is folk psychology even “trying” to describe real features of cognitive mechanisms? If, as might be the case, both the extremes described here are false and some intermediate view is true, how do we sort through and isolate the concepts that will be useful in evolutionary discussions of wiring-and-connection facts?
3 Folk psychology as a model
This section outlines a proposal of my own about folk psychology. This view is supposed to be both generally defensible and helpful in this specific context. The most familiar way of thinking about folk psychology at the moment is to take it as a “theory” (Stich 1983; Fodor 1987; Davies and Stone 1995; Gopnik and Meltzoff 1997). The main alternatives to this view treat folk psychology as either a sort of instrumental-hermeneutic tool, or a simulation ability. Here is another possibility: think of folk psychology as a model.
I intend this to be a modification of the familiar “theory–theory” idea. Initially it might look like a very insignificant modification, but I hope to show that it does some useful work. The word “model” has a multiplicity of senses, in science and in philosophy. And sometimes, indeed, it means no more than theory, or deliberately simplified theory. But there is a sense of the term found in science that is useful here.4 In this sense, a model is a structure used to represent another system, via some resemblance relation between the two.
A key feature of the role of models in science is that models are open to a variety of construals. We have a model and a “target” system. The model is supposed to help us in some way to understand the target system. But the exact way in which the model is supposed to represent or help us deal with the target system is flexible. Models in science have what we might call an adjustable intended-fidelity relation. The adjustability applies to both degree and kind. Different people can use the same model to deal with the same target system, but construe the model differently. For example, a model can be treated as an input–output device and no more; the model is supposed to predict how the target system will behave. At the other extreme, a model can be construed as a detailed map of the hidden structure and workings of the target system. Between these extremes many intermediates are possible. Some resemblance relation between the structure of the model and the structure of the target system is generally the goal, but this resemblance admits many varieties and degrees. The intended relation between model and target can vary over time, across users, and across uses. The target itself can also change. A model does not bring with it its own rule of interpretation; the model is just the structure itself.
Suppose that we think of folk psychology as a model of thought. 
The model is an abstract structure, including a characteristic set of elements (beliefs, desires, sensations, memories, plans, wishes, fears, actions) and a characteristic set of interactions between these elements. The model can be applied to a variety of target systems, but most usually to our fellow humans. And even given a specified target system, the model is open to a variety of construals. These include various kinds of realism and instrumentalism. The model can also, I think, be used in a basic way with no construal at all; there is a kind of mere deployment of the model. So what is the ordinary, everyday folk-psychological skill? Perhaps we should describe it as facility with a model. This is what a child might acquire, rather than “grasp of a theory.” The child acquires, by about age four, facility with the basic model, the ability to bring it to bear on a situation and make moves within it. One can have this facility without much of a construal. Philosophers, however, find that we can construe the model in various ways. We can treat it in more and less realist ways, and in more descriptive and more normative ways. We can treat it as a picture of the inner machinery of the mind, as an abstract theory of rational thought, or as a
predictive tool for getting around the social world. And philosophers and scientists are not the only ones who exercise this ability for multiple construals. We see some of the same flexibility in everyday uses of folk psychology. Contrast folk psychology on the freeway and in the lawcourt. In the lawcourt special attention is paid to inner states – to motives, to what was in the person’s mind. Did the accused fear for his own life, when he fired the fatal shot? Here folk psychology is used to try to make deep claims about the inner wiring (whether this activity is justified or not). On the freeway, the main concern is just to get predictions about how others will behave. So some alternative construals of the model might themselves be part of the more sophisticated folk-psychological competence of an adult user.
I have been emphasizing the multiplicity of construals. But, of course, there are also different versions of the model itself. It might be better to think of folk psychology as a family of models, with a common core and a range of variants and elaborations. These variants are developed in different cultures, subcultures, and various specialized contexts including philosophical discussion. The models in the family all have some macro-features in common: a distinction between beliefs and desires or preferences, the notions of sensory input and behavioral output, the characteristic dependence of actions on perceptions, thoughts and feelings, and so on. But some versions of the model have more detail, and here we find differences between them. These versions may have features designed to help with special applications of the model. Consider, for example, degrees of belief. These are not found, I suppose, in the simplest versions of folk psychology, but they can be easily introduced in an informal way (“How sure are you?”). They can then be used to develop the very precise version of the model found in Bayesianism.
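The step from an informal degree of belief (“How sure are you?”) to the Bayesian version of the model can be illustrated with a minimal sketch. It is not drawn from the text: the function name, its parameters, and the example numbers are assumptions chosen only for illustration. A degree of belief is treated as a probability and revised by Bayes' rule when a piece of evidence comes in.

```python
def update_degree_of_belief(prior: float,
                            likelihood_if_true: float,
                            likelihood_if_false: float) -> float:
    """Revise a degree of belief in a hypothesis using Bayes' rule.

    prior               -- degree of belief before the evidence (0 to 1)
    likelihood_if_true  -- probability of the evidence if the hypothesis holds
    likelihood_if_false -- probability of the evidence if it does not hold
    (All three names are hypothetical, introduced for this sketch.)
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("a degree of belief must lie between 0 and 1")
    # Total probability of observing the evidence at all.
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    if evidence == 0.0:
        raise ValueError("the evidence has probability zero under both cases")
    # Bayes' rule: posterior = prior * likelihood / evidence.
    return prior * likelihood_if_true / evidence


# An undecided agent (0.5) sees evidence that is four times likelier
# if the hypothesis is true (0.8) than if it is false (0.2).
posterior = update_degree_of_belief(0.5, 0.8, 0.2)
print(round(posterior, 3))
```

In this example the agent's degree of belief rises from 0.5 to about 0.8. The point of the sketch is only that the informal notion admits a precise elaboration; nothing here settles how the elaborated model should be construed.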
So some debates about folk psychology can be resolved via a simple assertion of “false opposition.” People have argued for decades about what sort of claims folk psychology makes about the organization of the machinery inside our heads. These range from the view that it makes no claims (Dennett), rather weak claims (Jackson and Pettit 1990), strong-ish claims (Fodor 1987), to quite strong claims (Stich 1983; Ramsey et al. 1991). Roughly speaking, no one is right here; the folk psychological model does not dictate its own construal. But all those construals are available. I will digress for a moment to present a few reflections on other parts of philosophy. I suggest that philosophical literatures often engage in an odd kind of exploration-cum-elaboration of the folk psychological model. “Elaboration” here is meant to imply making additions to the model. Fields like moral psychology, action theory, and meta-ethics are full of this. They might present themselves as engaged in exploration of pre-existing structure in our concepts of thought, action, and so on, when in fact they are building new, refined versions of the model, guided by various theoretical goals such as consistency and economy, as well as the philosopher’s intuitions or introspections. These refined models may sometimes generate spurious philosophical
problems. They can get tangled in regresses, for example, especially in getting the machinery to produce a definite output in the form of an action. No doubt these exercises do sometimes reveal real problems, but I think we should often be wary of spurious problems that are artifacts of the way in which the basic model was elaborated into a more precise and complex version.
That ends my digression. We now move to an obvious question. How good is the basic folk psychological model? It will be apparent from what I said above that our first response to this question should be: good for which target systems under what sort of intended-fidelity relation? When considered as an input–output structure, a predictive device for dealing with normal humans, folk psychology is clearly very good. How good is it as a guide to the real internal structure of the human mind? Even under fairly weak demands for the fidelity of the model, this is surely still very uncertain, notwithstanding occasional bombast from people like Jerry Fodor and Paul Churchland.
I will finish this section with a discussion of the role of folk psychological concepts within cognitive science. Cognitive science encounters the folk psychological model as a starting point and as a source of structural ideas. As a scientific field changes, earlier models are “mined” for features to use in later ones. And this mining can proceed in a piecemeal way. For example, a basic contrast between beliefs and desires can be retained even when much else is not. Propositional content can be dropped for something more holistic; the idea of beliefs as states that are available to all mental processing and deliberation can be dropped in favor of a modular option. Still, a contrast between information-bearing, “how-it-is” structures and structures expressing goals or desired outcomes might be kept. 
In situations like this, the vocabulary of folk psychology itself will either be retained or rejected in scientific discussion according to a range of rules and goals. Does the speaker want to emphasize the continuity of some new view with folk psychology, or emphasize the discontinuities?5
What I have sketched here is an empirical hypothesis about folk psychology. It is intended to modify the familiar “theory–theory” option by introducing a better conception of what an inner “theory” of others’ minds might be like. This hypothesis might be entirely wrong, and even if it is on the right track in some ways, it is probably highly oversimplified. Alvin Goldman, in his own contribution to the Porto conference, rightly emphasized the possibility that humans might have several mind-reading devices, rather than just one.
As a modification of the theory–theory, this view inherits that position’s commitments on some empirical controversies. In particular, the view presented here is not supposed to dissolve (“collapse”) the contrast between the theory–theory and simulationism (see Davies and Stone forthcoming). This is the case even though my version of the “theory” idea uses the concept of a model, which is often appealed to by simulationists (see Goldman’s
contribution to this collection). The contrast between the two general views can be restored. In particular, there is the “one-device-or-two?” question that can be restated once the theory–theory option has been formulated in my way. As before, we can distinguish: (i) using one’s own decision-making machinery as a sort of physical model for other people, from (ii) having an abstract theoretical model for use on other people, in addition to one’s own decision-making machinery. There might be subtle reasons why this apparent opposition does “collapse,” but the modification I have made to the theory–theory here does not itself imply such a collapse.
4 Co-evolutionary and other connections
In the remainder of this essay I will generally assume that the picture of folk psychology outlined in the previous section is roughly correct. Only some of what I will say, however, depends on the more contentious parts of the ideas in the previous section. If the claims made in the first three sections of this essay are right, we need to think about the evolution of two structures or sets of traits. One is the wiring of the brain itself; the other is the psychological model that we use to deal with each other. In both cases there will be evolution at different scales, affecting traits at different levels of “grain,” ranging from basic architectural features all the way down to individual quirks. And in each case, there will presumably be different processes of change operating in some mixture or combination: biological evolution, cultural evolution, and various forms of within-generation adaptation such as trial-and-error learning.
We can then ask: What sorts of connections might there be between these processes of change? How might the evolution of the wiring and model be connected to each other? I count three main kinds of influence. First, the model is being shaped by how well it deals with the wiring-and-connection facts – primarily, by how well it enables interpreters to predict the behavior of those around them. This shaping of the model might occur by various different processes, from data-driven learning to mutation and selection, with others in between.6 Second, behavioral patterns, and hence the wirings-and-connections that give rise to behavior, encounter selection pressures deriving from a social environment that includes the current state of the folk psychological model. Third, there might be a kind of internalization of the model into the wiring of the mind. 
Aside from its role in interpretation, the folk psychological model might become a cognitive tool facilitating some kinds of reasoning, planning, and deliberation. This is a more tendentious possibility, which I will discuss in a very cautious way. This possibility is linked to interesting issues involving the influence of public symbols on thought. So there is one (fairly obvious) way in which the present state of the wiring exerts influence on the model, and two (less obvious) ways in which
the opposite kind of influence might occur. When we have influence in both directions, we have a kind of co-evolution. This should occur to some extent, but it is important not to go overboard. The situation is far from symmetrical.
Suppose the co-evolutionary relationships are significant. Then quite a lot may depend on a general issue about the character of human social evolution: the relative importance of cooperative as opposed to antagonistic relationships between individuals (Godfrey-Smith 2002). Here “cooperation” includes the case where there is strong competition within groups and antagonism to outsiders. That issue should have general importance for the following reason. In social interactions, when one party is interpreting the behavior of another, the interpreter wants to get the right answer. But does the interpretee want to be accurately interpreted and predicted? Game theory shows the occasional importance of being hard to predict. If many of these interactions tend to be non-cooperative, then a sort of arms race may result, although an unusual one. Qua interpreter, you want the folk-psychological model to work well; qua actor, you want the model to fail. Both traits evolve, with the model chasing the wiring. The outcome will depend on various cost-benefit factors, as well as the rates of change possible for each trait. People who believe that folk psychology is inflexibly modular, in particular, should be alert to the possibility that this arms race may produce a folk psychology module that is not very accurate, or is inaccurate in some particularly antagonistic contexts.
Some may see the possibility for an adaptationist argument for a simulationist view of folk psychological interpretation here. Suppose interpretees do not want to be easily predicted, and selection on behavioral mechanisms favors opacity from the point of view of interpreters. 
Then if you use a simulationist method of dealing with others, you cannot be left behind. The interpreter mechanism automatically tracks the evolution of the reasoning mechanisms, at least with respect to basic features, because the two share the same processing device. One token of the reasoning mechanism is used as a physical model of another token of more or less the same type. So we might expect the co-evolutionary relationships between inner wiring and folk psychological interpretation to be affected by: (i) the mixture of cooperation and antagonistic competition found in crucial periods of human evolution, and (ii) the extent to which simulationist options have been utilized. Lastly I will discuss some more controversial ways in which there might be an interaction between the cognitive wiring and a folk psychological model. In these cases the word “co-evolution” is misleading; we are not talking about a situation where each trait comprises part of the selective environment of the other, but a more direct influence. These are possibilities in which there is the internalization of features of a folk-psychological model that evolved as a social tool. This is an example of a more general type of possibility that arises in
different forms in various areas of psychological investigation. Most broadly, these are possible cases of the internalization into individual cognition of socially derived structures, such as public language. I will look at several cases of this “internalization” possibility here, beginning with simpler ones that do not involve folk psychology, and then turning to some that bear more directly on the topic of this essay. I should emphasize that my aim in this section is to explore some in-principle possibilities and relate them to philosophical issues – not to make empirical bets.
I begin with a discussion of a comparatively simple but very interesting case, investigated in some striking experiments by Sarah Boysen and her co-workers.7 Suppose you offer a chimp two trays of candies, with different numbers of candies on each. The chimp chooses a tray by pointing. But the chimp then gets the tray it did not point to, and the other is lost. Chimps, at least in Boysen’s lab setting, seem unable to learn to make the optimal choice here, despite hundreds of trials. They tend to choose the tray with more candies. And performance gets worse as you make the difference between the numbers of candies larger.
These chimps had also been taught to use Arabic numerals to count objects, in the course of other experiments. So the problem can be presented to the chimps with numerals instead of candies. You present two trays with, say, “2” and “4” on them. If the chimp chooses “2” it gets four candies; if it chooses “4” it gets two candies. Chimps did well on this task, and tended to do well immediately, as if they’d already learned the rule on the candy-tray trials but could not execute the right choice with the candies. As Boysen says, it seems that the intrinsic attractiveness of the candies interferes with rational choice. A basic or primitive perception–action mechanism is interfering with information processing. 
The use of symbols frees the chimp from the constraints of this primitive mechanism, and the chimp can then exercise the right choice. We can think of more adventurous and less adventurous versions of this general idea.8 According to less adventurous options, there is nothing wrong with the chimps’ cognitive processing in the candy-versus-candy (“c/c”) condition. But in the c/c condition, choice is being interfered with by an old and inflexible system in the chimp. Essentially, it’s a performance problem. The number-versus-number (n/n) condition frees the chimp from this interference. According to more adventurous options, a different kind of internal processing is made possible by the use of symbols. As Boysen sometimes expresses it, symbols make possible more abstraction, more selection from among the different features of a stimulus. This makes possible a choice that is more information-driven, more rational in a decision-theoretic sense. It is hard to tell these options apart, both conceptually and empirically. The fact that the chimps seem to learn the reinforcement rule in the c/c regime, but do not act on it, tends to suggest the less adventurous option, a performance breakdown. But some of the follow-up experiments by Boysen and her co-workers perhaps favor more adventurous options.
The first follow-up is to replace the candy/candy condition with a rock/rock condition (1996). Is it just the intrinsic desirability of the candies that is causing the trouble, or are there deeper differences between thinking in terms of concrete objects and thinking in terms of symbols? So, let the number of rocks on each tray represent the same number of candies. Can the chimps learn to choose the smaller tray of rocks in order to get the larger number of candies? Surprisingly, they were not able to do this. They did almost as badly with the rock/rock condition as they did with the c/c one.
The second follow-up is to use mixed pairs of trays, one tray with a numeral and one with candies (a c/n choice). Will they treat this more like c/c or more like n/n? The answer is: more like n/n. The presence of one numeral was enough to get them on track to make the right choice. This seems quite surprising; we might expect the chimps to be strongly biased toward choosing candies in any amount, over a numeral. Certainly, the less adventurous hypotheses, proposing an interference with performance due to the very sight of the candies, might lead us to expect this. But overall, the c/n results were similar to the n/n results. (Individual differences were larger for the c/n case though.)
The conclusions drawn by Boysen and her co-authors do not explicitly discuss the difference between more and less adventurous options. But the later papers seem to express the conclusions in ways more friendly to the “different processing” option (especially 1999). This seems reasonable to me, given the results from the two follow-ups. These results do suggest that the chimps are engaging in a different kind of processing when they are dealing with numerals. Presenting the problem symbolically makes possible more flexible, information-driven, and decision-theoretically rational behavior. 
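The payoff rule of the basic task described above can be written out explicitly. What follows is only an illustrative sketch, not anything taken from Boysen's papers: the function names and the "greedy" policy are assumptions introduced here to make concrete the contrast between the optimal choice and the tendency the chimps showed with candies.

```python
def reward(tray_a: int, tray_b: int, pointed_at: str) -> int:
    """Candies received under the reversed contingency.

    The chooser gets the tray it did NOT point to; the other tray is lost.
    """
    if pointed_at not in ("a", "b"):
        raise ValueError("pointed_at must be 'a' or 'b'")
    return tray_b if pointed_at == "a" else tray_a


def optimal_choice(tray_a: int, tray_b: int) -> str:
    """Point at the smaller tray, so that the larger one is received."""
    return "a" if tray_a <= tray_b else "b"


def greedy_choice(tray_a: int, tray_b: int) -> str:
    """Hypothetical policy: point at whichever tray holds more candies."""
    return "a" if tray_a >= tray_b else "b"


# Trays holding 2 and 4 candies, as in the numeral example in the text.
a, b = 2, 4
print(reward(a, b, optimal_choice(a, b)))  # pointing at the smaller tray
print(reward(a, b, greedy_choice(a, b)))   # pointing at the larger tray
```

With trays of 2 and 4, pointing at the smaller tray yields four candies and pointing at the larger yields two, matching the numeral example above. The rule itself is the same in the c/c, n/n, and c/n conditions; what differs across conditions is how the trays are presented, which is the variable the experiments manipulate.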
I am not seeking to resolve these alternatives here, either conceptually or empirically. But it will be clear why I think these experiments are so suggestive, with respect to issues about the influence of public symbol systems upon thought itself. I now move to a second stage in the presentation of these “internalization” possibilities. As Boysen emphasizes, her experiments are not looking at the psychological role of language skills. Most of the chimps she worked with in these experiments had no language training, though they had been taught to use numerals symbolically. But let us think, in a similar spirit, about possible roles for language itself. The idea that the structure of a person’s public language may strongly influence the structure of their cognition is famous and controversial. Benjamin Lee Whorf (1956) defended an extreme version of this view. The idea was often treated very skeptically by psychologists in the decades since then, and Whorf himself began to approach “crackpot” status. But there has been a recent reconsideration of the possibility of quasi-Whorfian effects of language on thought. This reconsideration appears within a very different overall theoretical framework from Whorf’s, in cognitive developmental psychology (Bowerman and Levinson 2001). How might the acquisition of
language affect development of the overall structure of cognitive processing, especially in non-linguistic areas? According to Alison Gopnik, for many years there was surprisingly little data bearing directly on this question, in part because there are so many causal possibilities that must be controlled for. What is wanted, but rare, is evidence of specific links between linguistic and non-linguistic development. Gopnik (2001) reports some evidence of this kind, however. Gopnik and her co-workers found some specific correlations between the appearance of words in a child’s vocabulary and the passing of non-linguistic tests that have a natural-looking conceptual connection to the meanings of the words. (These children were aged 15 to 21 months.) For example, there was a correlation between the appearance of “gone” in a child’s vocabulary and the child’s passing object-permanence tests. Then there was a correlation between the appearance of “uh-oh” in the child’s vocabulary, and passing means-end tests. The word and the associated skill generally seemed to come within weeks of each other in each individual case, but there was variation across children in exactly when particular word/skill pairs appeared, and in the order of different acquisitions. The acquisition of “gone” did not, for example, correlate with means-end skill, and “uh-oh” did not correlate with object permanence. It was also not the case that the word always appeared before, or after, the associated non-linguistic skill; there was no clear directionality in the situation. Perhaps, Gopnik says, a two-way interaction seems likely. Gopnik also discusses some cross-linguistic work that suggests similar effects (and see also Gopnik and Meltzoff 1997: chapter 7). A related line of argument is presented by Elizabeth Spelke (2003). Spelke focuses on the possible cognitive role of highly general features of human language, however, like combinatorial structure itself. 
So there may be the beginning of a revival, in a very different theoretical framework, of the idea that learned features of public language affect the structure of thought. Issues about the internalization of public language bear fairly directly on some of our questions about folk psychology. Here is a fairly tendentious possibility that illustrates the connection. Suppose it can be argued that in non-linguistic animals, there is no psychological reality to propositions, as representational units. This claim, of course, can be contested. But accept it for the sake of argument. Then we can make the following comparison. Compare a non-linguistic, perhaps holistic mental modeling of the world, with the organization of representations into proposition-like units. In language, the constraints of communicability and learnability impose a kind of organization on how the world is represented. Subject-predicate structure is an obvious example. These organizational features might be very useful in thought itself. An internalization of a linguistic format of representation might result in a range of advantages, making it possible to keep track of long chains of reasoning and deliberation, for example. Any internalization of language is relevant to the relations between the
folk psychological model and the inner wiring. If thought becomes propositionally organized by language, this fact alone creates a new element of resemblance between the model and the wiring, for the belief and desire ascriptions of folk psychology treat representation in a propositional way, via their embedded “that-clauses.” Without the internalization of language into thought, there might be little reason to think that this feature of folk psychology corresponds to anything in the inner wiring. But the internalization of language might make propositional content psychologically real, at least with respect to some parts of the mind. Lastly, I will discuss an even more tendentious and direct way in which internalization processes might affect the status of folk psychology.9 Here I refer to the internalization of folk psychology itself, the internalization of the folk psychological model as an organizing structure for complex deliberation and thought. One reason this possibility is interesting to consider is the fact that it exhibits the opposite explanatory direction from simulationism about folk psychology. According to simulationism, a pre-existing inner reasoning device is pressed into service as a predictive tool, a tool that works by exploiting the similarities between individuals. But in principle, the opposite direction of influence is also possible, at least to some extent: a predictive and interpretive tool, born out of social and linguistic interaction, might become a resource for internal processing. Is this any more than an in-principle possibility? Here I will sketch another body of interesting work in cognitive developmental psychology (very helpfully reviewed in Perner and Lang 1999).
There is a fairly well-established correlation in early childhood between the development of “theory of mind” (folk psychological skills) and what is called “executive function,” or “executive control.” Executive function is a kind of “higher-level action control,” as seen in planning, coordination, and the intelligent suppression of natural or previously reinforced responses in new situations (Perner and Lang 1999: 337). The problems seen in Boysen’s chimps (discussed above) are related to executive function problems; some of the standard experiments used to test executive function are somewhat similar to the candy-choosing task Boysen gave her chimps. For example, a child might be asked to say “day” when shown a picture of a night scene and “night” when shown a day scene. Theory of mind ability is usually assessed, and is assessed in the work discussed here, with a “false belief task.” These tasks test, in various ways, whether the child is willing to recognize the existence of beliefs about the world that are false. Suppose we accept that the appearance of “theory of mind” and the appearance of executive function are associated in child development. What sort of connection between the two is responsible for this association? Perner and Lang survey various options, of which two are most relevant here. First, it might be that theory of mind is a pre-requisite for executive function. Second, it might be that executive function is a pre-requisite for theory of
mind. Perner and Lang find these two “functional dependence” options to be better supported than some other possibilities they examine, but the directionality of the connection is not at all clear. Perner’s own preferred hypothesis is especially relevant: Perner suggests that both theory of mind and executive function require that the child understand that action is caused by inner states with representational properties. In the executive function tasks, the child must often suppress something in his or her own mind, in a top-down way, recognizing that this habit or tendency is inappropriate for the situation. An understanding that one’s mind is populated by distinct mental states with causal powers is part of what makes this selective suppression possible. So Perner’s hypothesis supposes that a general “theoretical” understanding of the nature of thought helps the child to better manage his or her own thoughts and actions.10 This is not quite the kind of “internalization” possibility I introduced earlier. In this area, we should distinguish internalization possibilities from recognition possibilities. For Perner, as I understand his view, the child is seen as acquiring accurate knowledge of a pre-existing fact about how minds work, the fact that they contain inner representations with causal powers. It is not that the child reorganizes his thinking in a way that makes the folk-psychological theory true of him. But we can see, I think, the possibility of variations on Perner’s hypothesis that would constitute genuine internalization hypotheses. And it is important that internalization of folk psychology need not be an all-or-nothing matter. There might be some internalization phenomena mixed in with other very different phenomena. Some readers might find hints of this possibility in introspection. Consider internal deliberative monologues, of the kind that one can engage in during really serious decision-making. “OK, what do I really want here? . . . 
That means my first priority should be. . . . But what do I really expect to happen then . . .?” Different philosophical and psychological theories about the mind make very different claims about how seriously we should take these episodes. They may well just be rationalizing epiphenomena, rather than an important kind of mental processing. But if we were to accept that these episodes are genuinely important in deliberation, there is a possibility that these episodes are affected by a folk-psychological model of thought that is partly an importation from the public domain of social and linguistic interaction. If that happened, then we would again have a situation in which the folk psychological model and the inner processing have come to resemble each other to a higher degree, but via a change to the inner processing rather than (or as well as) a change to the model. It would be especially interesting if explicit adherence to the folk-psychological model made possible a shift toward more rational, less heuristic-driven internal processing. Indeed, one role that the folk psychological model of thought sometimes has in public discussion is a normative role; it is taken to specify how people should reason and deliberate, as well as how they
actually do these things. So the self-conscious practical reasoner may sometimes try to conform to folk-psychological patterns of mental organization. When one forces oneself to list one’s relevant beliefs and preferences, and combine them explicitly, the result might indeed be a more rational pattern of thought, a pattern more attentive to logical relationships and the totality of relevant information. We might thereby overcome whatever dependence we may have on the “fast and frugal heuristics” emphasized by Gigerenzer, Todd, and their colleagues (Gigerenzer et al. 2000), just as Boysen’s chimps overcame the effects of the sight of candies, and Perner’s children improved their executive function. My aim in this section has been to discuss what I take to be a range of possible connections between the evolution of the inner wiring and the evolution of our interpretive model. So I have considered some obvious connections and some highly tendentious ones. There is a tradition in philosophy of advancing very ambitious views about the influence of public symbolism on thought. Those advancing such views have often paid insufficient attention to the great range of possible hypotheses in this area. (One of my own favorite philosophers, John Dewey (1929/1958), is certainly guilty of this.) As emphasized at the beginning of this section, when we look at the simultaneous evolution of the wiring of the mind and our interpretive tools, we are also confronted with a problem that is made complex by the many different “grains” at which traits on both sides can be identified; the evolution of basic architecture is not the same thing as the evolution of individual differences. And as emphasized at the beginning of this essay, we also confront a long-standing and deeply difficult set of “eye-of-the-interpreter” problems about our familiar tools for psychological description.
This domain of inquiry remains, I would say, in a state of considerable entanglement.
Acknowledgment
Thanks to António Zilhão for his work in organizing the Porto conference in 2002, and to all those present for a very helpful discussion of this essay. Thanks to Alison Gopnik and Daniel Stoljar for comments and correspondence. I have benefited from unpublished work on related topics by Michael Weisberg and Amol Sarva. I am especially grateful to Kim Sterelny for lengthy and close collaboration on all these issues.
Notes
1 The term is standardly used for interactions between the evolution of two species. See Futuyma (1998), especially his Glossary.
2 See also Godfrey-Smith (2002) and Sterelny (2003) on this distinction.
3 See P.M. Churchland (1981), P.S. Churchland (1986), and especially Churchland and Churchland (1983).
4 For a good critical discussion of the role of “models” in the philosophy of science, emphasizing their multiplicity and not getting misled by the logician’s sense of “model,” see Downes (1992). The treatment of models in Giere (1988) is perhaps the most useful positive account for my purposes here.
5 What of the idea of a “scientific” or “naturalistic” concept of mental content, which might replace the folk psychological concept? I suggest that we need to rethink this well-known contribution that philosophy has tried to make to cognitive science. The most likely future, I suggest, is one in which the idea of a single “naturalistic theory of content” is replaced by the idea of a family of naturalistically specifiable relations being used in cognitive science. Relations in this family will have a range of resonances and similarities with those used in “folk-semantic” interpretive practices. Indeed, in some ways this future is well and truly with us already. In cognitive science, a range of naturalistic relations between mental structures and environmental conditions are found to be useful, including informational relations in the sense of Shannon and Dretske, teleological or teleonomic relations involving natural selection in roughly the sense of Millikan, and various kinds of similarity and isomorphism.
6 The acquisition of folk psychology has been hypothesized to involve everything from the unfolding of an innate skill shaped by natural selection, to a process of data-driven science-like individual inference (Fodor 1987; Gopnik and Meltzoff 1997). Cultural transmission involving “scaffolded learning” is an important intermediate option (Sterelny 2003).
7 These experiments and their philosophical importance were brought to my attention by Susan Hurley (2003).
8 Hurley (2003) discusses a similar breakdown of explanatory options here.
9 Gopnik briefly mentions something like this possibility herself, at the end of her essay on the effect of language on thought (2001). She illustrates this possibility with a maxim due to the French essayist François La Rochefoucauld, that no one would fall in love if they hadn’t read about it first.
10 In a recent paper (Perner et al. 2002), Perner and his co-workers express some new doubts about Perner’s hypothesis, based on an unexpected experimental result.
Bibliography
Bowerman, M. and S. Levinson (eds) (2001) Language Acquisition and Conceptual Development, Cambridge: Cambridge University Press.
Boysen, S.T. and G.G. Berntson (1995) “Responses to Quantity: Perceptual Versus Cognitive Mechanisms in Chimpanzees (Pan troglodytes),” Journal of Experimental Psychology: Animal Behavior Processes, 21: 82–86.
Boysen, S.T., G.G. Berntson, M.B. Hannan and J.T. Cacioppo (1996) “Quantity-Based Interference and Symbolic Representations in Chimpanzees (Pan troglodytes),” Journal of Experimental Psychology: Animal Behavior Processes, 22: 76–86.
Boysen, S.T., K.L. Mukobi and G.G. Berntson (1999) “Overcoming Response Bias Using Symbolic Representations of Number by Chimpanzees (Pan troglodytes),” Animal Behavior and Learning, 27: 229–235.
Churchland, P.M. (1981) “Eliminative Materialism and the Propositional Attitudes,” Journal of Philosophy, 78: 67–90.
Churchland, P.M. and Churchland P.S. (1983) “Stalking the Wild Epistemic Engine,” Noûs, 17: 5–18.
Churchland, P.S. (1986) Neurophilosophy: Toward a Unified Science of the Mind/Brain, Cambridge, MA: MIT Press.
Davies, M. and T. Stone (eds) (1995) Folk Psychology: The Theory of Mind Debate, Oxford: Blackwell.
Davies, M. and T. Stone (forthcoming) “Mental Simulation, Tacit Theory, and the Threat of Collapse,” Philosophical Topics.
Dennett, D.C. (1978) Brainstorms. Philosophical Essays on Mind and Psychology, Cambridge, MA: MIT Press.
Dennett, D.C. (1981) “Three Kinds of Intentional Psychology,” reprinted in D.C. Dennett, The Intentional Stance, Cambridge, MA: MIT Press, 1987.
Dennett, D.C. (1987) The Intentional Stance, Cambridge, MA: MIT Press.
Dewey, J. (1929/1958) Experience and Nature, revised edition, New York: Dover.
Downes, S. (1992) “The Importance of Models in Theorizing: A Deflationary Semantic View,” in D. Hull, M. Forbes and K. Okruhlik (eds), PSA 1992, vol. 1, East Lansing: Philosophy of Science Association.
Dretske, F. (1981) Knowledge and the Flow of Information, Cambridge, MA: MIT Press.
Dretske, F. (1988) Explaining Behavior, Cambridge, MA: MIT Press.
Fodor, J.A. (1987) Psychosemantics, Cambridge, MA: MIT Press.
Futuyma, D. (1998) Evolutionary Biology, 3rd edn, Sunderland, MA: Sinauer Associates.
Giere, R. (1988) Explaining Science: A Cognitive Approach, Chicago, IL: Chicago University Press.
Gigerenzer, G., P. Todd and the ABC Research Group (2000) Simple Heuristics That Make Us Smart, Oxford: Oxford University Press.
Godfrey-Smith, P. (2002) “On the Evolution of Representational and Interpretive Capacities,” The Monist, 85: 50–69.
Godfrey-Smith, P. (2003) “Folk Psychology Under Stress: Comments on Hurley’s ‘Animal Action in the Space of Reasons,’” Mind and Language, 18: 266–272.
Godfrey-Smith, P. (2004) “On Folk Psychology and Mental Representation,” in H. Clapin, P. Staines, and P. Slezak (eds), Representation in Mind: New Approaches to Mental Representation, Amsterdam: Elsevier Publishers, 147–162.
Gopnik, A. (2001) “Theories, Language, and Culture: Whorf Without Wincing,” in M. Bowerman and S. Levinson (eds), Language Acquisition and Conceptual Development, Cambridge: Cambridge University Press, 45–69.
Gopnik, A. and A. Meltzoff (1997) Words, Thoughts, and Theories, Cambridge, MA: MIT Press.
Gould, S.J. and R.C. Lewontin (1979) “The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Program,” Proceedings of the Royal Society, London, 205: 581–598.
Hurley, S. (2003) “Animal Action in the Space of Reasons,” Mind and Language, 18: 231–256.
Jackson, F. and P. Pettit (1990) “In Defence of Folk Psychology,” Philosophical Studies, 59: 31–54.
Lycan, W.G. (1988) Judgment and Justification, Cambridge: Cambridge University Press.
Millikan, R.G. (1984) Language, Thought, and Other Biological Categories, Cambridge, MA: MIT Press.
Papineau, D. (1993) Philosophical Naturalism, Oxford: Blackwell.
Perner, J. and B. Lang (1999) “Development of theory of mind and executive control,” Trends in the Cognitive Sciences, 3: 337–344.
Perner, J., B. Lang, and D. Kloo (2002) “Theory of mind and self-control: More than a common problem of inhibition,” Child Development, 73: 752–767.
Ramsey, W., S. Stich, and J. Garron (1991) “Connectionism, Eliminativism and the Future of Folk Psychology,” in J. Greenwood (ed.), The Future of Folk Psychology: Intentionality and Cognitive Science, Cambridge: Cambridge University Press.
Rey, G. (1997) Contemporary Philosophy of Mind: A Contentiously Classical Approach, Oxford: Blackwell.
Spelke, E. (2003) “What Makes Us Smart? Core Knowledge and Natural Language,” in D. Gentner and S. Goldin-Meadow (eds), Language in Mind: Advances in the Study of Language and Thought, Cambridge, MA: MIT Press.
Sterelny, K. (1990) The Representational Theory of Mind: An Introduction, Oxford: Blackwell.
Sterelny, K. (2003) Thought in a Hostile World, Oxford: Blackwell.
Stich, S.P. (1983) From Folk Psychology to Cognitive Science: The Case Against Belief, Cambridge, MA: MIT Press.
Stich, S.P. (1992) “What is a Theory of Mental Representation?” Mind 101, reprinted in Stich and Warfield (1994).
Stich, S.P. and T.A. Warfield (eds) (1994) Mental Representation: A Reader, Oxford: Blackwell.
Stone, T. and M. Davies (eds) (1996) Mental Simulation: Evaluations and Applications, Oxford: Blackwell.
Whorf, B.L. (1956) Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf (ed. J.B. Carroll), Cambridge, MA: MIT Press.
5
Innateness and brain-wiring optimization
Non-genomic nativism
Christopher Cherniak
Our experimental work in computational neuroanatomy has uncovered distinctively efficient layout of wiring in nervous systems. When mechanisms are explored by which such “best of all possible brains” design is attained, significant instances turn out to emerge “for free, directly from physics.” In such cases, generation of optimal brain structure appears to arise simply by exploiting basic physical processes, without the need for intervention of genes. An idea that physics suffices here – of some complex biological structure as self-organizing, generated without genomic activity – turns attention to limiting the role of the genome in morphogenesis. The familiar “nature/nurture” alternatives for origins of basic internal mental structure are that it arises either from the genome or from invariants of the external environment. A third alternative is explored for the neural cases here, a non-genomic nativism. The study of minimization of neural connections reveals interrelations between the Innateness Hypothesis and theses associated with the Central Dogma of genetics. The discussion shifts from the usual focus upon abstract cognitive structure to underlying brain hardware structure, that is, to hardwired neuroanatomy.
1 Nativism
The familiar, prototypical Innateness Hypothesis is: abstract mental structure – e.g., relating to knowledge of language (Stich 1975) – is intrinsic to an organism, for instance genetically determined. The same holds, of course, for the underlying biological hardware, that is, brain structure, neuroanatomy. A brain is no more plausibly a blank slate with unlimited plasticity in response to its environment than is a mind. As originally (narrowly) construed, the Central Dogma of genetics was: Genetic information flows one way, from the genome outward to its cellular milieu. That is, DNA → RNA → proteins (Crick 1958; Watson 1965). A recent example that may help to define by contrast is the challenge to the Central Dogma perceived in “prion” diseases (e.g., Kuru, Creutzfeldt-Jakob disease, Bovine Spongiform Encephalopathy (BSE)); prions are non-DNA-containing particles that appear capable of replication – and infection – in mammalian central nervous systems (Keyes 1999). To the Central Dogma we can also add a tacit “Pre-Central Dogma”: the genome information encodes an instruction-set, blueprint, representation, for construction of the organism (e.g., Watson et al. 1987; DePomerai 1990). How complete can such a blueprint be? Can it approximate for biological structure-generation a kind of Maxwell’s demon micromanaging busybody? The genome instructions cannot be total. A classical regress argument from philosophy of mind can be adapted here, for example, along lines of Ryle (1949). To stop the regression, some basic level of hardware capacities must be assumed – a Lego/Meccano/Erector-set repertoire of fundamental operations, e.g., the basic physical laws of our universe.
2 Optimized neuroanatomy
However, a more extensive a-genomic domain emerges for some large-scale neuroanatomical structure. The framework is a “Best of all possible brains” hypothesis, in particular, with respect to generative principles to “Save wire.” If the brain had an unbounded supply of connections, there would be no pressure to optimize employment of wiring. But in fact connection resources are limited. And since the essential function of a brain is to connect, to make connections, connections seem to have an extremely high value, and so their use has been perfected to a very fine-grained degree. Optimized brain anatomy is detectible in minimized wiring at both micro and macro scale. In some cases, the refinement is discernible down to a best-in-a-billion level (Cherniak 2000; Cherniak et al. 2004). Such results begin to approach some of the most precise confirmed predictions in neuroanatomy; why such peculiarly fine-grained optimization should arise, itself, in turn, needs explaining. These wiring optimization problems, typically encountered in computer microchip design, are believed to be intrinsically computationally intractable (“NP-complete”) (Lewis and Papadimitriou 1978; Garey and Johnson 1979; Stockmeyer and Chandra 1979), unavoidably requiring a combinatorially exploding amount of computation to solve. For instance, neuron arbor morphogenesis behaves like flowing water (Cherniak 1992; Cherniak et al. 1999).
“Neural fluid mechanics”: a fluid-dynamical model for minimized wall-drag of pumped flow through a system of pipes will predict the geometry of a variety of dendrite and axon structures almost as well as it predicts configuration of river drainage networks. Water flow in branching networks in turn acts like a tree with segments composed of weights-cords-and-pulleys, that is, vector-mechanically; so also do the neuron arbors. The result is that such axons and dendrites globally minimize their total volume to within about 5 percent of optimum for interconnecting their terminal loci (see Figure 5.1).
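The weights-cords-and-pulleys picture can be illustrated with a small sketch. This is not the volume-minimizing model of the cited work: it assumes equal tension in every branch and minimizes total length only, and the names `junction_point` and `net_pull` are hypothetical. It locates a single branch junction for three terminals by Weiszfeld iteration and then checks that the unit pulls toward the terminals cancel, which is the vector-mechanical equilibrium condition.

```python
import math

def junction_point(terminals, iterations=500):
    """Weiszfeld iteration: the point minimizing summed distance to the terminals."""
    x = sum(p[0] for p in terminals) / len(terminals)
    y = sum(p[1] for p in terminals) / len(terminals)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in terminals:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # The estimate sits on a terminal; stop rather than divide by zero.
                return (px, py)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        x, y = num_x / denom, num_y / denom
    return (x, y)

def net_pull(point, terminals):
    """Sum of unit vectors from the junction toward each terminal (equal tensions)."""
    fx = fy = 0.0
    for px, py in terminals:
        d = math.hypot(px - point[0], py - point[1])
        fx += (px - point[0]) / d
        fy += (py - point[1]) / d
    return (fx, fy)

# Assumed toy terminals: an isosceles triangle with all angles below 120 degrees.
terminals = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
junction = junction_point(terminals)
print(junction)
print(net_pull(junction, terminals))
```

For these terminals the junction settles where the three branches meet at 120 degrees, and the net pull is numerically zero. Unequal branch weights, as a volume-based cost would require, would change the equilibrium angles.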
[Figure 5.1: three panels, Dendrite (scale bar 100 µ), Axon (scale bar 25 µ), and River (scale bar 1 m), each shown in an Actual and an Optimal version.]
Figure 5.1 Complex biological structure arising directly from basic physics: nerve cell anatomy behaves like flowing water, and waterflow in turn acts like a tree composed of springs. “Instant arbors, just add water.” The dendrite (input) and axon (output) branchings are portions of mammalian neurons, the river is an experimentally generated drainage network. In each of the three cases, the actual structure is within a few percent of the optimal, minimum-volume configuration shown; this is evidence of goodness of fit of the physical optimization model (Cherniak et al. 1999). Diagram: Mark Changizi.
As another instance, vector mechanics suffices for optimization of placement of the ganglia (neural sub-networks) of the roundworm Caenorhabditis elegans (Cherniak 1994a; Cherniak et al. 2002). “The web that weaves itself”: Some of the above self-organizing model of arbor optimization also can serve as a mechanism of ganglion placement in the roundworm nervous system. Our prior work (Cherniak 1995) had found that the actual observed positioning of the ganglia in the worm was optimal, in that it required, out of 40 million alternative layouts, the least total wire length for the animal’s interconnections. We have now constructed a force-directed placement simulator, Tensarama, where each of the worm’s connections behaves like a micro weights/cords/pulley system. This vector-mechanical net outputs the actual minimized ganglion layout by converging on energy-minimization equilibrium at the actual positioning of the ganglia – without much susceptibility to local-minima traps (see Figure 5.2).

[Figure 5.2 runscreen (file actual.mtx, program TENSARAMA): horizontal axis labeled in Tetrons, running from Head (0) to Tail (50). Ganglia listed with values PH (100.000000), AN (300.000000), RNG (440.000000), DO (506.000000), LA (564.000000), VN (744.000000), RV (948.000000), VCa (1856.000000), VCp (3856.000000), PA (4726.000000), DR (4810.000000), LU (4884.000000). Final layout popped out after: 100,000 iterations [@final 100]. Tension Constant: 0.010000. Total Wirecost: 87802.750000 um.]
Figure 5.2 Runscreen for “Tensarama,” a force-directed placement algorithm for optimizing layout of ganglia of the nematode Caenorhabditis elegans: that is, minimizing total length of interconnections. This vector mechanical simulation represents each of the roundworm’s ~1,000 interconnections as a sort of micro-spring acting upon the horizontally movable ganglia (nervous system clusters) “PH,” “AN,” etc. (Connections themselves do not appear on runscreen, nor fixed components such as sensors and muscles.) The above screendump shows the final configuration of the system after 100,000 iterations (re-update cycles for forces and locations): the system has found the global minimum-cost positioning of the ganglia (with about 8.7 cm total of wire) – which is also the actual layout. In this way, physics suffices to generate this neuroanatomical structure, out of ~40 million alternative possible configurations (Cherniak 1995; Cherniak et al. 2002).

So, the hypothesis is that optimization accounts for a significant extent of observed anatomical structure. An additional hypothesis is that simple physical processes are responsible for some of this optimization. Combined, then, the picture is:

Physics → Optimization → Neural structure

That is, self-assembling brain structure “for free, directly from physics,” i.e., generated via simply exploiting basic physical processes, without need for intervention of genes. Physics suffices; complex biological structure as self-organizing. So, besides conserving connections, another possible explanation for the extreme level of optimization observed might be just that it is a side-benefit of complex biostructure hitching a free ride from physics – in particular, for minimal-wiring neuroanatomy for dendrites, axons, and roundworm ganglia (Cherniak et al. 2002). Now also we have observed even finer wiring optimization of cerebral cortex layout of cat and macaque monkey (Cherniak 2000; Cherniak et al. 2004). Thus, a harmony of physics and neuroanatomy via fluid-mechanics and mesh of springs; a sort of plate tectonics of the brain. Some of the methodological significance here is that discrete-state processes (e.g., as in mutation of genomic sequences (Mitchell 1996)) are not required; continuous-process models can suffice for neural wiring optimization. This work falls in the tradition of seeking simple underlying mathematical form in complex aspects of Nature, ranging from Pythagoras through D’Arcy Wentworth Thompson (1917/1961).
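The force-directed idea described for the ganglion layout can be sketched in a few lines. This is a toy under stated assumptions, not a reconstruction of Tensarama: the connections are modeled as ideal Hooke springs (so the relaxation minimizes summed squared length rather than total length), the components move along one axis, and the names `relax_layout` and `wire_cost`, the two movable components, and the anchor positions are all hypothetical.

```python
def relax_layout(positions, springs, anchors, tension=0.1, iterations=2000):
    """Relax movable components on a line until spring forces balance.

    positions: dict of component name -> starting coordinate
    springs:   list of (name_a, name_b) connections between movable components
    anchors:   list of (name, fixed_coordinate) connections to fixed structures
    """
    x = dict(positions)
    for _ in range(iterations):
        force = {name: 0.0 for name in x}
        for a, b in springs:
            pull = x[b] - x[a]      # the spring pulls a toward b and b toward a
            force[a] += pull
            force[b] -= pull
        for name, fixed in anchors:
            force[name] += fixed - x[name]
        for name in x:
            x[name] += tension * force[name]   # small step along the net force
    return x

def wire_cost(x, springs, anchors):
    """Total connection length for a given layout."""
    between = sum(abs(x[a] - x[b]) for a, b in springs)
    to_fixed = sum(abs(x[name] - fixed) for name, fixed in anchors)
    return between + to_fixed

# Toy system: A is tied to a fixed point at 0, B to a fixed point at 9, and A to B.
start = {"A": 8.0, "B": 1.0}
springs = [("A", "B")]
anchors = [("A", 0.0), ("B", 9.0)]
layout = relax_layout(start, springs, anchors)
print(layout)
print(wire_cost(start, springs, anchors), wire_cost(layout, springs, anchors))
```

Starting from a deliberately tangled arrangement, the components slide to the positions where the forces cancel, and the total wire length drops accordingly. Because the spring energy in this toy is convex, there are no local minima to escape; the real placement problem need not be so benign.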
3 Non-genomic nativism
Hence, contrary to a “nature/nurture” dichotomy, a third possibility-zone – “nature” here includes not only biology, but also mathematics and physics. A via media between two extremes: (a) brains of course cannot grow like crystals, entirely non-genomically; (b) yet life must still play by the rules of the game, subject to mathematical and physical law. So, there is a division of labor between the genome and simple physical processes. The organism’s genome is written not upon a tabula rasa, but upon a specifically pre-formatted, pre-printed form or slate, already inscribed with a significant proportion of structural information. Perhaps such a picture of interpenetrating domains clashes with some metaphysics hardwired in human beings, with a category structure that draws a bright line between the animate and the inanimate. Also, the expression “non-genomic nativism” may sound like a bit of a solecism to sharper ears. (Piaget: “That which is inevitable does not have to be innate” (1970).) Of course, label choice is not of interest. The underlying point is that some complex biological structure – comparable in extent to genome-specified structure – is intrinsic, inborn, yet not genome-dependent. One rationale for organisms to exploit such free anatomy is the vast mismatch of scale between brain structure and genome structure. The human brain is commonly characterized as the most complex physical structure known, yet the total information representation capacity of the human genome is comparatively small. After allowing for noncoding introns (about 95 percent of the total), the amount of brain-specific DNA available might amount to as little information as is contained in a desk dictionary (about 50,000 entries, ≈100 Mb total). Hence, information on brain structure must pass through a “genomic bottleneck” constraint on DNA information-representation capacity (Cherniak 1988, 1992).
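The scale of this bottleneck can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative only: the genome length of roughly 3 billion base pairs and the 2 bits per base are standard approximations that do not come from the text, and `coding_capacity_bits` is a hypothetical helper name. Only the 5 percent coding share is taken from the passage above.

```python
def coding_capacity_bits(base_pairs, coding_percent, bits_per_base=2):
    """Upper bound on the information held by the coding share of a genome."""
    return base_pairs * bits_per_base * coding_percent // 100

HUMAN_BASE_PAIRS = 3_000_000_000  # assumption: rounded haploid genome length
CODING_PERCENT = 5                # from the text: about 95 percent is noncoding

bits = coding_capacity_bits(HUMAN_BASE_PAIRS, CODING_PERCENT)
print(bits)                # 300000000
print(bits // 8 // 10**6)  # 37 (whole megabytes)
```

Under these assumptions the entire coding share holds a few hundred megabits, so any brain-specific fraction of it is of the same order as the dictionary-sized figure quoted above.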
Brain structure for free lowers this genome information-carrying load. One caveat in interpreting genome information limitations concerns data compression. The key idea, from algorithmic information theory (Li and Vitanyi 1997; Chaitin 1987), is: for a given symbol sequence, what is the smallest program of a given format that will generate the sequence (and then halt)? A related concept (intimately connected with unsolvability of the Halting Problem) is that of, e.g., a five-state “Busy Beaver” Turing machine – a ten-line Turing machine program that generates the largest number of
1’s for its size on an initially all-blank tape and halts; the most productive such program presently known outputs 4,098 1’s (Dewdney 1993) (see Figure 5.3). Six-state Turing Machines are now known with much vaster output. Thus, one can ask, what is the minimum such program that emits the text of Anna Karenina, or the bitstream soundtrack of a Beethoven quartet performance, or that generates a given genome sequence? One could imagine such information compression occurring on the genome to help pack in extensive brain structure specification. However, it seems a mistake to go on to conclude that this data compression is in practice virtually unlimited; for compression/decompression itself entails computation costs. For instance, the above five-state busy beaver candidate requires over 11 million steps to generate its string of 1’s. So, no free lunch here: storage space in effect is traded off for encode/decode time.
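The trade-off between program size and decoding time can be explored with a small simulator. The sketch below transcribes the table of Figure 5.3, reading its column headers as current state, input symbol, output symbol, head move and next state, and treating next state 0 as "halt". Those readings are assumptions, as are all function and variable names. The two-state table is the well-known two-state busy beaver, included only as a quick check of the simulator.

```python
def run(table, max_steps):
    """Simulate a Turing machine on an initially blank (all-0) tape.

    table maps (state, symbol) -> (write, move, next_state); state 0 halts.
    Returns (ones_on_tape, steps_taken, halted).
    """
    tape = {}                      # sparse tape: blank cells read as 0
    head, state, steps = 0, 1, 0
    while state != 0 and steps < max_steps:
        write, move, state = table[(state, tape.get(head, 0))]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return sum(tape.values()), steps, state == 0


# Figure 5.3 as read above: (state, input) -> (output, move, next state).
FIG_5_3 = {
    (1, 0): (1, "L", 2), (1, 1): (1, "L", 1),
    (2, 0): (1, "R", 3), (2, 1): (1, "R", 2),
    (3, 0): (1, "L", 1), (3, 1): (1, "R", 4),
    (4, 0): (1, "L", 1), (4, 1): (1, "R", 5),
    (5, 0): (1, "R", 0), (5, 1): (0, "R", 3),
}

# Two-state busy beaver, used only to sanity-check the simulator.
BB2 = {
    (1, 0): (1, "R", 2), (1, 1): (1, "L", 2),
    (2, 0): (1, "L", 1), (2, 1): (1, "R", 0),
}

print(run(BB2, max_steps=100))           # (4, 6, True)
print(run(FIG_5_3, max_steps=100_000))   # truncated run of the figure's table
```

Raising `max_steps` above the step count quoted in the figure caption lets a reader compare the simulator's tally with the caption. The run is kept short here because a pure-Python loop of that length takes a noticeable time, which is itself a small instance of the decompression cost at issue.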
[CST INP]   [OUT MCV NST]
  1   0       1   L   2
  1   1       1   L   1
  2   0       1   R   3
  2   1       1   R   2
  3   0       1   L   1
  3   1       1   R   4
  4   0       1   L   1
  4   1       1   R   5
  5   0       1   R   0
  5   1       0   R   3
Figure 5.3 Turing machine program that has been the contender for title of five-state “busy-beaver” – maximally productive TM program – without challenge for over a decade (Dewdney 1993). It takes 11,798,826 steps to generate 4,098 “1”s on an initially blank tape before it halts. Thus, the program illustrates ~1:100 data-compression, which genome encoding might exploit. However, the program also illustrates the high computation cost of message “decompression” (a human being with pencil and paper would take about a year to complete the computation, a TM simulator (Cherniak 1990) takes a few hours).
4 Misgivings
An uneasy “biology from physics” coda may help in articulating nongenomic nativism. Proceeding from static structure to behavior, physiology: that is, let us explore instances of complex biological functions similarly originating directly from simple physical processes. For example, as explained above, Tensarama yields the neuroanatomical layout of C. elegans by simulating each of the worm’s thousand connections as approximately a microspring; as the system proceeds to vector-mechanical equilibrium, it weaves the ganglia into the actually observed layout – a one-in-ten-million achievement. Of course, such complex anatomy-generation can in itself be pictured instead as a slow-motion intelligent behavior. A corresponding behavioral instance would be to derive realistic locomotion behavior similarly from a mesh of springs. In this connection, the “Sodaconstructor” package yields strikingly lifelike crawling motion for a simulated “worm” device, walking for a generic quadruped, etc. (Sodaplay.com). Yet all this animation is generated from what amounts to again just a mesh of springs (with masses), driven by a single simple sine wave. Another case is a random-path searcher device: a simple smooth sphere containing a motor-driven counterweight will robustly outperform robots with conventional obstacle detection and evasion chip software. Simulated annealing is an example of a system using only basic physical processes (thermodynamic temperature-schedule models) to solve search problems with interesting structure that have local minima traps (Kirkpatrick et al. 1983). Also suggestive is the idea of mechanisms that exploit chaos-theoretic (nonlinear dynamical) phenomena – e.g., intricate behavior from a simple device, such as a jointed pendulum (Thompson and Stewart 1986). One question is how far such a for-free-from-physics approach can proceed; what is the most complex behavior so derivable? (For examples of intricate behavior from simple rotational dynamics, see Walker 1985.) So, the obvious leap to a conclusion would be a sunny picture, of complex neuroanatomical structure “for free, directly from physics.” And similarly also of complex behavior derived directly from basic physical processes, the latter again partaking of the Pythagorean tradition of D’Arcy Thompson (1917/1961), of simple mathematical form in Nature.
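The simulated annealing idea cited above (Kirkpatrick et al. 1983) can be sketched in a few lines. This is a generic textbook-style version, not code from any of the cited work. The one-dimensional cost landscape, the geometric cooling schedule and all parameter values are illustrative assumptions.

```python
import math
import random


def anneal(cost, start, neighbour, t_start=2.0, t_end=1e-3, cooling=0.999, seed=0):
    """Minimize cost by simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, best = start, start
    t = t_start
    while t > t_end:
        candidate = neighbour(current, rng)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept a worse move with
        # probability exp(-delta / t), which shrinks as t falls.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if cost(current) < cost(best):
                best = current
        t *= cooling
    return best


def bumpy(x):
    """Hypothetical landscape: a parabola with ripples, so many local minima."""
    return 0.1 * x * x + math.sin(3.0 * x) + 1.0


def step(x, rng):
    """Propose a nearby point."""
    return x + rng.uniform(-0.5, 0.5)


start = 8.0
best = anneal(bumpy, start, step)
print(best, bumpy(best))
```

At high temperature the search readily accepts uphill moves and so can leave a local minimum. As the temperature falls it behaves more and more like plain descent. The only "knowledge" in the procedure is the temperature schedule.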
However, a darker picture also crystallizes, of a retrogression back to a familiar type of neoreductionism: rather than progress, a recantation of a generation of philosophy of mind threatens. Let us contemplate the development of mind/brain sciences in the postbehaviorist era, since c.1970. Functionalist/computationalist theories of mind emerged, opening the conceptual possibility of mental states not having to be identified just with physical hardware states, but instead with abstract software states (Putnam 1964). The corresponding idea in genetics was of course where we began the essay, the picture of “genome as program.” However, by now some of us may wonder whether AI has perhaps not fulfilled its early promise. For instance, compare the 1966 “Eliza” conversationalist program (an anti-AI project, really) (Weizenbaum 1976) with the remarkably unimpressive “Alice” – recent two-time winner of the Loebner Turing Test prize (ALICE 2004). AI and von Neumann-architecture machines both emerged a half century ago: if hardware had developed as has AI, we would still be using abacuses and sliderules – computers would merely be exotic laboratory confections. Hardware and software engineering differ profoundly (Cherniak 1988). Perhaps faith here in the future of progress may begin to waver.
The above five-state Busy Beaver candidate, although of course itself a program, illustrates another potential Pyrrhic victory for computationalism. This TM table was discovered by a huge constrained brute-force search of many possible TM tables by a program written by Heiner Marxen, of the Technical University of Berlin. When one contemplates the TM’s operation, it indeed displays a kind of inhuman, unintelligible elegance – it seems repeatedly to re-embroider segments of its tape, to re-use parts of its own code palindromically, etc. Rather than incomprehensibly huge, it is incomprehensibly compact (a “nano-kluge”). And so an idea emerges of unintelligible intelligence, of programs that might generate intelligent behavior, but without identifiable representations of their world, rules for using such models, etc. This corresponds to the above picture of intelligence for free from physics. What turns out to work efficiently may not be – indeed, may be antagonistic to – what is humanly comprehensible (Cherniak 1988). (The satirist Emo Philips proposed, in another context, a universal nondenominational prayer: (approximately) “Lord, please arrange the Universe for my convenience.”) Yet I suspect anyone who actually experienced the aridity of the empty-organism era could not regard its revival in any form with equanimity. The unintelligibly compressed programs recall the Ramsey sentences and Craig interpolations of 1950s instrumentalist philosophy of science (Ramsey 1931; Craig 1956). I conclude, or fail to conclude, with this anomie, this uneasy equivocation. A variation upon Matthew Arnold’s lines that Henry Adams (1918/1973) took as motto comes to mind: To find oneself caught between worldviews, one declining, one struggling to bring itself into existence.
Acknowledgments
Some of the experimental work described here was supported by NIMH grant MH49867.
Bibliography
Adams, H. (1918/1973) The Education of Henry Adams, Boston: Houghton.
ALICE (2004) [2000, 2001 and 2004 Loebner “Turing Test” Prize winner] www.alicebot.org.
Chaitin, G. (1987) Algorithmic Information Theory, New York: Cambridge University Press.
Cherniak, C. (1988) “Undebuggability and cognitive science,” Communications of the Association for Computing Machinery, 31: 402–412.
Cherniak, C. (1990) MindSet CogSci Courseware. PCOMP Logic Engine Package: TM* Turing Machine Modeller. www.glue.umd.edu/~cherniak/philcomp/.
Cherniak, C. (1992) “Local optimization of neuron arbors,” University of Maryland Institute for Advanced Computer Studies Technical Report, No. 90–90 (1990); Biological Cybernetics, 66: 503–510 (1992).
Cherniak, C. (1994a) “Component placement optimization in the brain,” University of Maryland Institute for Advanced Computer Studies Technical Report, No. 91–98 (1991); Journal of Neuroscience, 14: 2418–2427 (1994).
Cherniak, C. (1994b) “Philosophy and computational neuroanatomy,” Philosophical Studies, 73: 89–107.
Cherniak, C. (1995) “Neural component placement,” Trends in Neurosciences, 18: 522–527.
Cherniak, C. (2000) “Network optimization in the brain: NIMH Grant Application,” University of Maryland Institute for Advanced Computer Studies Technical Report, UMIACS-TR-2001-28.
Cherniak, C., Changizi, M., and Kang, D. (1999) “Large-scale optimization of neuron arbors,” University of Maryland Institute for Advanced Computer Studies Technical Report, No. 96-78 (1996); Physical Review E, 59: 6001–6009 (1999). http://pre.aps.org/.
Cherniak, C., Mokhtarzada, Z., and Nodelman, U. (2002) “Optimal-wiring models of neuroanatomy,” in G. Ascoli (ed.), Computational Neuroanatomy: Principles and Methods, Totowa, NJ: Humana.
Cherniak, C., Mokhtarzada, Z., Rodriguez-Esteban, R., and Changizi, B. (2004) “Global optimization of cerebral cortex layout,” Proceedings National Academy of Sciences, 101: 1081–1086.
Craig, W. (1956) “Replacement of auxiliary expressions,” Philosophical Review, 65: 38–55.
Crick, F. (1958) “On protein synthesis,” Symposia of the Society for Experimental Biology, 12: 138–167.
DePomerai, D. (1990) From Gene to Animal, New York: Cambridge University Press.
Dewdney, A. (1993) The New Turing Omnibus, New York: W.H. Freeman, ch. 39.
Garey, M. and Johnson, D. (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco, CA: W.H. Freeman.
Keyes, M. (1999) “The prion challenge to the ‘Central Dogma’ of molecular biology, 1965–1991,” Stud. Hist. Biol. and Biomed. Sci., 30: 1–19.
Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983) “Optimization by simulated annealing,” Science, 220: 671–680.
Lewis, H. and Papadimitriou, C. (1978) “The efficiency of algorithms,” Scientific American, 238: 96–109.
Li, M. and Vitanyi, P. (1997) Introduction to Kolmogorov Complexity and its Applications, New York: Springer.
Mitchell, M. (1996) An Introduction to Genetic Algorithms, Cambridge, MA: MIT Press.
Piaget, J. (1970) Structuralism, New York: Basic.
Putnam, H. (1964) “Robots: Machines or artificially created life?”, Journal of Philosophy, 61: 668–691.
Ramsey, F. (1931) “Theories” in his The Foundations of Mathematics and other Logical Essays, ed. by R.B. Braithwaite, London: Routledge & Kegan Paul, 212–236.
Ryle, G. (1949) The Concept of Mind, New York: Barnes & Noble.
Sodaconstructor locomotor simulator package: www.sodaplay.com.
Stich, S. (ed.) (1975) Innate Ideas, Berkeley, CA: University of California Press.
Stockmeyer, L. and Chandra, A. (1979) “Intrinsically difficult problems,” Scientific American, 240: 140–159.
Thompson, D. (1917/1961) On Growth and Form, New York: Cambridge University Press.
Thompson, J. and Stewart, H. (1986) Nonlinear Dynamics and Chaos, New York: Wiley.
Walker, J. (1985) Roundabout: The Physics of Rotation in the Everyday World, New York: W.H. Freeman. (Cf. “tippetop” inverting top, “rattleback” spin-reverser, boomerang.)
Watson, J. (1965) Molecular Biology of the Gene, 1st edn, New York: W.A. Benjamin.
Watson, J. et al. (1987) Molecular Biology of the Gene, 4th edn, Menlo Park, CA: Benjamin/Cummings.
Weizenbaum, J. (1976) Computer Power and Human Reason, San Francisco, CA: W.H. Freeman.
6
Evolution and the origins of the rational Inman Harvey
Unshared assumptions
A civilized disagreement should ideally start with a set of recognized, shared assumptions. This at least forms the basis for comprehension of the context for the argument, of the terms used and their baggage of implicit premises. Discussion between academics rarely meets this ideal, and even more rarely when they come from different disciplines. I and my colleagues are in the business of investigating fundamental ideas about cognition by creating working artifacts that demonstrate basic cognitive abilities. So to that extent we count as cognitive scientists, and indeed our work can be seen as being at one end of the artificial intelligence (AI) spectrum; though, for reasons that may become apparent, we are wary of this label and the baggage it carries. We work with neuroscientists, ethologists, evolutionary theorists and roboticists, so the struggle to communicate across disciplinary boundaries is all too familiar to us. Our projects require us to take a philosophical stance on what cognition is all about, but our desired goal is not so much a theory as a working product, a working robot (or a principled computer simulation of a robot) that demonstrates some cognitive ability: intentionality, foresight, ability to learn, to communicate and cooperate, or rational behaviour. So the goal is to create physical mechanisms, if not of flesh and blood then made of plastic and transistors, that demonstrate behaviour describable in the language of the mind. Of course, if some specific mechanism can generate behaviour X, this does not necessarily imply that humans or animals use the same or even a similar mechanism. Nevertheless, such a demonstration is an existence proof: ‘See, we can demonstrate phenomenon X with no more than these physical components, with nothing up our sleeves.
So any claim that some other ingredient is essential for X has been disproved.’ This ‘philosophy of mind with a screwdriver’ is, we claim, more challenging than the armchair version. A working robot becomes a puppet displaying the philosophical stance of its creators, and is an invaluable tool for making explicit the assumptions used, and for exposing the assumptions of those who doubt its abilities. In fact we try and minimize the assumptions
and preconceptions used when designing our robots or simulated agents, by using an evolutionary robotics (ER) approach: we use an artificial analogue of natural Darwinian evolution to design the robot/agent mechanisms. One assumption that we most emphatically do not share – with many philosophers of mind, cognitive scientists and AI practitioners – is that physical mechanisms and mental mechanisms are pretty much the same, that the architecture of the brain is isomorphic or even similar to the architecture of the mind. The mechanism of a clock can be specified in terms of a pendulum, weights, cogs, levers and springs, and carries with it no mention of timekeeping, even though that is the role of the clock. Likewise, we are interested in specifying at the physical level what constituents are sufficient for assembling into an artificial agent that displays cognition, and it is pointless to try to label any of these physical parts with mental or cognitive terms. This pernicious habit is regrettably rife in AI, but one of the core insights of a dynamical systems (DS) approach to cognition, as discussed below, is that a state of mind should not be equated with a physical state of the brain; no more than a ‘state of running fast’ in a clock can be equated with a state of one or many, or even all, the parts of its clockwork. This confusion largely arises because the term ‘state’ has different usages within different disciplines. For someone using a DS approach, the instantaneous state of a physical mechanism is specified by a vector: the current physical values of all the relevant physical variables of the system, including the instantaneous angle and angular velocity of the pendulum and of all the cogs in a clock, the electrical potentials and chemical concentrations at all relevant parts of a real brain, the sensor and motor configurations and all the internal instantaneous activations of a robot brain. 
In this sense, the state of a physical mechanism is a vector of real numbers that is continuously varying (unless the clock, the brain or the robot is effectively dead). By contrast, a state of mind such as belief, knowledge, fear or pain is something that is extended in time. When challenged, an identity theorist who equates mental states with brain states turns out not to be using this vector terminology to describe a state of the brain. The claim usually takes the form of some subpart, perhaps some module of the brain, being capable of maintaining over the relevant extended period of time a stable value, or stable pattern of values, which is causally correlated to the mental state – independently of whatever values the other variables in the brain might take. In my preferred terminology this is not a ‘brain state’, but rather a ‘subset of possible states of a subpart of the brain’. Even with this revised terminology, we can demonstrate robots that display mental states that have no such identifiable physical correlate within the (artificial) brain; only when we take into account the physical state of the environment, and the agent’s current physical interaction with the environment, can the full explanation be given of how the cognitive phenomena arise from the physical mechanisms. So this preamble is by way of a warning that the assumptions used here
may be rather different from your own; and that we may reject some of the standard arguments against the philosophical positions we take, not because we disagree with the form of the argument but because we reject some of the assumptions on which they are based. First, we should discuss what we mean by cognition and the rational.
Chauvinism
The English are notoriously bad at learning foreign languages. In fact, we don’t see the point of foreign languages at all – why should the French call a cow by some arbitrary word such as ‘vache’ when clearly it is a cow? God, after all, is an Englishman, and clearly the natural language of the universe is English. Philosophers tend to be similarly chauvinistic – but species-chauvinist rather than nation-chauvinist. By ‘rationality’ we usually mean human rationality, indeed usually twentieth or twenty-first century western educated human rationality; and too often we assume that this rationality is some God-given eternal yardstick. Just as the English question the wisdom of, indeed the need for, any other language, so the philosopher takes our human-specific rationality as the only perspective that makes sense. There is some virtue in this; after all if we speak we must speak in the language we have available, and if we are trying to make sense of the world then we are trying to make sense from our own human perspective. But I suggest that many of the problems and confusions in cognitive science come from this unthinking chauvinism, and above all when we consider the origins of the rational. There are at least two meanings for the term rational. On the one hand, we apply the word literally to human acts of reasoning, of logical or systematic reasoning; we can understand the steps in the proof of a theorem, and we can understand how the smooth curves of an aeroplane have arisen from thoughtful consideration by aeronautical engineers. On the other hand, we apply the word metaphorically to articles where we can, a posteriori, reason that form follows function as if it had been rationally designed; the smooth curves of a dolphin can be rationalized as a near-optimal design for swimming – they are a rational shape.
A major revelation of Darwinian evolution was that ‘rational’ design can be the product of the mechanical unthinking process of heredity, variation and natural selection. With this Darwinian revelation in mind, we should always be careful, when talking about any living creatures including humans, to distinguish between ‘rational’ in the sense of explicit reasoning, and ‘rational’ in the metaphorical or ‘as-if’ sense that the theory of evolution now gives us. Let us summarize this briefly.
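The loop of heredity, variation and selection named above can be sketched as a toy program. It is a minimal illustration under stated assumptions, not the evolutionary robotics method used by the author's group: genotypes are hypothetical bit strings, "fitness" is simply the number of 1 bits, and all names and parameter values are invented for the example.

```python
import random


def evolve(pop_size=30, length=20, generations=60, mut_rate=0.02, seed=1):
    """Toy heredity-variation-selection loop on bit-string genotypes.

    Returns the best fitness found in each generation.
    """
    rng = random.Random(seed)

    def fitness(genome):
        # Stand-in for reproductive success: count of 1 bits.
        return sum(genome)

    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: the fitter half become parents.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        # Heredity with variation: each child copies a parent,
        # flipping each bit with a small probability.
        children = [[bit ^ (rng.random() < mut_rate) for bit in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return history


history = evolve()
print(history[0], history[-1])
```

Nothing in the loop inspects or reasons about the genotypes. Any improvement in `history` comes only from copying, random variation and differential survival, which is the sense in which the resulting design is "as-if" rational.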
A brief history of life
Here is a simplified version of the typical working assumptions of many biologists today. Many details are glossed over, there is room for disagreement on many of the specifics, but the overall picture will be generally shared. Around four billion years ago on this planet, some relatively simple living creatures arrived or arose somehow. There are several current theories as to how this might have happened, but the details do not matter for our present purposes. What does matter is that these creatures reproduced to make offspring that were similar but not necessarily identical to their parents; and the variations in the offspring affected their likelihood of surviving and producing fresh offspring in their turn. The likelihood of surviving depended on their ability to extract nutrients from their surroundings and avoid harm. Those that happened to do better than others, in their particular circumstances, would tend to be the parents of the next generation; so that the children tended to inherit their characteristics from relatively successful parents. Fitness depended on the particular requirements for nutrients, and the particular type of environmental niche. Over many generations, this Darwinian evolution working on the natural blind variation resulted in organisms that looked as though they were crafted, were rationally designed for their life style. The blind watchmaker produced ‘as-if’ rational design, without explicit reasoning. It also produced organisms with wants and needs, and with purposive behaviours that tended to satisfy those needs. We should be suspicious of arguments that produce an ‘ought’ from an ‘is’; but we can argue that Darwinian evolution shows us just how a ‘want’ can be derived from an ‘is’. Initially, all such organisms were single cells; the distinction between self and other was clearly defined by the cell membrane.
Cells competed against other cells of their own kind (close relatives) and of other species (distant relatives) for limited resources. At some stage it occasionally became viable for limited coalitions to form. Multicellular organisms ‘put all their eggs in one basket’ as far as their joint and several fitnesses were concerned. New issues arose of policing the new boundaries between self and other, and ensuring that the commonwealth of interests within the boundary was not disrupted by the cancer of cheating. Life got more complex. Organisms’ needs were served by not merely reacting to the immediate, but by anticipating future occurrences. Even plants have nervous systems and can learn; for instance sun-seeking flowers such as Malvastrum rotundifolium, in order to maximize the time they spend soaking up the warmth face-on to the sun, return overnight to face the expected direction of the dawn. This direction changes with the seasons, and experiments show that it takes just a couple of days for some plants to learn where to expect a changed sunrise. Plant cognition for a plant’s lifestyle.
Some organisms are more mobile than others in seeking out sustenance; animals prey on plants and other animals. More sophisticated nervous systems are required to deal with the faster and more varied world that this change of lifestyle implies. For some animals, recognition of places, and of other individuals within one’s species, becomes important. Social behaviour requires knowing who to trust and who to distrust. The sophisticated manipulation of other creatures displayed by bacteria and plants becomes even more sophisticated and faster. After four billion years of evolution, for a brief period of a few million years one particular slightly different species flourished before being ousted by other species with different interests. Humans shared most of their cognitive faculties with their bacterial, plant and animal relatives, but adjusted to their particular lifestyle. They had hunger and fear, they searched for food and avoided danger. They could recognize what was good and bad for them, they could learn from experience and alter their behaviour accordingly. They had a rich social life and good memory for faces. I am emphasizing here the seamless continuity of human cognition with the cognition of all biological organisms. The differences should not blind us to the fact that most of our cognitive faculties existed in other creatures prior to the arrival of humans on the scene. But then, within this context, we should acknowledge the peculiar differences that distinguish humans from other species.
What makes humans different
It just so happened that the strategic direction evolution chanced on for this species was towards increased social interaction and manipulation. Division of labour had been seen in many insect societies, but was carried much further among humans. Though division of labour has potentially great benefits for the participating individuals, it requires trust and mechanisms for retaliating against or shunning those who betray that trust. Consensual manipulation of others became important, and the relatively simple methods of communication used by plants and animals became extended into a complex system of stereotypical sounds. As well as simple commands and warnings, intentions and coordination of future actions could be conveyed. Different individuals, with different roles in some coordinated action, could convey information and negotiate as to possible strategies. Discussion, and then the ability within discussion to anticipate events and other people’s situations, became a defining human characteristic. First came the birth of language, and then the birth of thinking; for the language of thought is language. One subset of language is the language of rational thought. With this, we can reason about possible situations that are not present; by developing mathematics and logic we can reason about whole classes of events.
Completing the circle of rationality
So we should recognize that we are one biological species among many. Evolution has crafted the design, the behaviour, the cognitive faculties of all species to be generally appropriate to the interests of those organisms given their particular lifestyles. At some stage the peculiar human faculty of rational thought, made possible by language, arrived on the scene. We can rationally, consciously and with explicit intent design artifacts to do a job. It was Darwin’s brilliant insight that the irrational, unconscious and unintended forces of natural evolution can design adapted organisms that look as if they are rationally designed; alternatively, one can say that their designs are rational – in what is now perhaps a metaphorical sense of the word. So I want to distinguish between two different senses of the word ‘rational’: a primary sense, shall we say rational_a, referring to the explicit reasoning that we humans with our languages are exclusively capable of; and a secondary, derivative sense, rational_b, where we appreciate Darwin’s insight that naturally evolved designs can be near-optimal for some function without a rational_a designer. Confusingly, in terms of evolution rational_b designs occurred historically prior to the emergence in humans of rationality_a; and indeed I have spelt out above one possible rationalization_a as to just how rationality_a arose because it was rational_b for the lifestyle that humans came to take up!
Studies of minimal cognition
Many of my colleagues working in cognitive science and AI are species-chauvinistic. They are solely concerned with that part of human cognition that is exclusively human, and fail to see the relevance of the related cognitive abilities of our animal, plant and bacterial cousins and ancestors. Within our evolutionary and adaptive systems group, however, we do explicitly follow up these relationships. On the basis that one should crawl before one runs, we are interested in exploring minimal levels of cognition in adaptive organisms, where ‘adaptive’ refers to behaviour that is rational_b in the face of changing circumstances. We use the tools of artificial life to synthesize artificial creatures – simple actual autonomous robots or simulated versions of them – to test ideas on how rational_b behaviour is generated. By autonomous we mean that, as far as is feasible, the artificial nervous system should be self-contained and these creatures should ‘do their own thing’ without the experimenter making internal adjustments.
Philosophy of mind
We use synthesized model creatures sometimes to investigate how some very specific feats of animal cognition are done: the visual navigation of ants and bees, for instance. But also some of our experiments are for exploration of more general issues of cognition, with wider applicability across many or perhaps even all organisms, both actual and potential. When we work with synthesized creatures we know exactly what is and is not there; there can be no vitalistic forces hidden up our sleeves. If we can demonstrate intentional behaviour, foresight, learning, memory, desire, fear, re-identification of temporarily missing objects, social coordination, language use etc. in a simulated creature whose inner workings are visible to us, then we have an existence proof for one possible way that this might have been achieved in the natural world. It may not be the same way, but at a minimum this can perhaps help dispel some of the mysteries that have puzzled people for ages. Some of these mysteries fall into the remit of philosophy of mind, and the experiments that we do have some resemblance to thought experiments. But there is this important difference: our experiments do not survive unjustified or careless underlying assumptions, they can fail. This is the crucial test for philosophy of mind with a screwdriver. Wittgenstein famously saw the aim of philosophy as ‘shewing the fly the way out of the fly-bottle’. Confused buzzing within the philosophy of mind includes such questions as: Just how can the physical substance of brain and body generate mental phenomena such as intentional behaviour? Is some vital extra ingredient needed? When a human or other creature takes account of some object in its present environment, or uses foresight to take account of an object that is currently out of sight, how do the physical mechanisms of the nervous system relate to such an object? How can social coordination arise, how can we understand the origins of language in terms of a physical system made of meat and nerve fibres, or metal and silicon?
The fly-bottle that has trapped so many people in these areas is the genuine puzzlement many have about the relationship between the physical and the mental. The perspective I take is that there are (at least) two different levels of description appropriate for humans, animals and other agents, including robots. We can describe the physical substrate of the agent: this is the language of the doctor, the anatomical neuroscientist, the robot engineer. Or we can describe an agent in behavioural and intentional terms, in ‘mentalese’ such as goals and desires, terms that are only appropriate for living creatures that have primary goals of survival and secondary derived goals; these mentalese terms may (cautiously) be extended where appropriate to artificial creatures. The fly-bottle focused on here is: how can we reconcile these two languages of description? A common cause of confusion arises from careless use of language, from smuggling in mentalese terms such as ‘representation’ into what should be exclusively physical descriptions; the physical symbol systems approach of classical AI being a common source of this confusion. In Beatrix Potter’s children’s story, it was thought to be sufficient explanation for why lettuce made rabbits sleepy to announce that they contained
Inman Harvey
something soporific. Hopefully philosophers are less easily satisfied; it should be obvious that one cannot attempt to explain the relationship between the mental and physical by smuggling mental terms into physical descriptions. So a first recommendation for the way out of this fly-bottle is to rigorously police the use of language: mental descriptions can and should be in mentalese, but physical-level descriptions should use no such terms. There is a reason why we should be tempted into the trap of using mentalese in our physical descriptions. If we believe some version of the tale above on the origins of language, this was partially, perhaps primarily, spurred on by the need to understand, plan and manipulate human relationships. An explanation typically works by restating the complex and unfamiliar in terms of the familiar and unquestioned. So we will explain central heating systems in terms of homunculi: ‘the thermostat is like a little person assessing the temperature, and passing on a message down the wires to the box in the control system that acts like a little human manager: in turn, this sends the equivalent of a human command to the switch that turns the boiler on’. This is perfectly acceptable, human-friendly language of explanation that we use everyday; it is normally sensible and the metaphors have much to recommend them. Just in this one special case, of elucidating the relationship between mentalese language of behaviour and the physical language of mechanism, it is a crucial trap for the careless fly (see how naturally we use a fly-homuncular metaphor to get the point across).
Cartesian or classical approaches to robotics
In cognitive science the term ‘Cartesian’, perhaps rather unfairly to Descartes, has come to exclusively characterize a set of views that treat the division between the mental and the physical as fundamental – the Cartesian cut (Lemmen 1998). One form of the Cartesian cut is the dualist idea that these are two completely separate substances, the mental and the physical, which can exist independently of each other. Descartes proposed that these two worlds interacted in just one place in humans, the pineal gland in the brain.
Nowadays this dualism is not very respectable, yet the common scientific assumption rests on a variant of this Cartesian cut: that the physical world can be considered completely objectively, independent of all observers. The Cartesian objectivity assumes that there just is a way the world is, independent of any observer at all. The scientist’s job, then, is to be a spectator from outside the world, with a God’s-eye view from above.
When building robots, this leads to the classical approach where the robot is also a little scientist–spectator, seeking information (from outside) about how the world is, what objects are in which place. The robot takes in information, through its sensors; turns this into some internal representation or model, with which it can reason and plan; and on the basis of this formulates some action that is delivered through the motors. Brooks calls this the SMPA, or sense-model-plan-act architecture (Brooks 1999).
The ‘brain’ or ‘nervous system’ of the robot can be considered as a Black Box connected to sensors and actuators, such that the behaviour of the machine plus brain within its environment can be seen to be intelligent. The question then is, ‘What to put in the Black Box?’ The classical computationalist view is that it should be computing appropriate outputs from its inputs. Or possibly they may say that whatever it is doing should be interpretable as doing such a computation.
The astronomer, and her computer, perform computational algorithms in order to predict the next eclipse of the moon; the sun, moon and earth do not carry out such procedures as they drift through space. The cook follows the algorithm (recipe) for mixing a cake, but the ingredients do not do so as they rise in the oven. Likewise if I was capable of writing a computer program which predicted the actions of a small creature, this does not mean that the creature itself, or its neurons or its brain, was consulting some equivalent program in ‘deciding what to do’.
Formal computations are to do with solving problems such as ‘When is the eclipse?’ But this is an astronomer’s problem, not a problem that the solar system faces and has to solve. Likewise, predicting the next movement of a creature is an animal behaviourist’s problem, not one that the creature faces.
However, the rise of computer power in solving problems naturally, though regrettably, led AI to the view that cognition equalled the solving of problems, the calculation of appropriate outputs for a given set of inputs. The brain, on this view, was surely some kind of computer. What was the problem that the neural program had to solve? – the inputs must be sensory, but what were the outputs? Whereas a roboticist would talk in terms of motor outputs, the more cerebral academics of the infant AI community tended to think of plans, or representations, as the proper outputs to study.
They treated the brain as the manager who does not get his own hands dirty, but rather issues commands based on high-level analysis and calculated strategy. The manager sits in his command post receiving a multitude of possibly garbled messages from a myriad of sensors and tries to work out what is going on. Proponents of this view tend not to admit explicitly, indeed they often deny vehemently that they think in terms of a homunculus in some inner chamber of the brain, but they have inherited a Cartesian split between mind and brain and in the final analysis they rely on such a metaphor.
What is the computer metaphor?
The concepts of computers and computations, and programs, have a variety of meanings that shade into each other. On the one hand a computer is a formal system with the same powers as a Turing machine (assuming the memory is of adequate size). On the other hand a computer is this object sitting in front of me now, with screen and keyboard and indefinite quantities of software.
A program for the formal computer is equivalent to the pre-specified marks on the Turing machine’s tape. For a given starting state of this machine, the course of the computation is wholly determined by the program and the Turing machine’s transition table; it will continue until it halts with the correct answer, unless perhaps it continues forever – usually considered a bad thing! On the machine on my desk I can write a program to calculate a succession of co-ordinates for the parabola of a cricket-ball thrown into the air, and display these both as a list of figures and as a curve drawn on the screen. Here I am using the machine as a convenient fairly user-friendly Turing machine. However most programs for the machine on my desk are very different. At the moment it is (among many other things) running an editor or wordprocessing program. It sits there and waits, sometimes for very long periods indeed, until I hit a key on the keyboard, when it virtually immediately pops a symbol into an appropriate place on the screen; unless particular control keys are pressed, causing the file to be written, or edits to be made. Virtually all of the time the program is waiting for input, which it then processes near-instantaneously. In general it is a good thing for such a program to continue for ever, or at least until the exit command is keyed in. The cognitivist approach asserts that something with the power of a Turing machine is both necessary and sufficient to produce intelligence; both human intelligence and equivalent machine intelligence. Although not usually made clear, it would seem that something close to the model of a word-processing program is usually intended; i.e., a program that constantly awaits inputs, and then near-instantaneously calculates an appropriate output before settling down to await the next input. Life, so I understand the computationalists to hold, is a sequence of such individual events, perhaps processed in parallel.
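The contrast between the two kinds of program described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `parabola_points` and `editor`, the launch parameters and the special key names `<exit>` and `<backspace>` are hypothetical and do not come from the text.

```python
import math


def parabola_points(speed, angle_deg, steps, g=9.81):
    """Batch-style computation: runs to completion and halts with an answer.

    Returns (x, y) co-ordinates of a ball thrown from the origin,
    sampled at evenly spaced times until it returns to launch height.
    """
    angle = math.radians(angle_deg)
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    flight_time = 2.0 * vy / g
    points = []
    for k in range(steps + 1):
        t = flight_time * k / steps
        points.append((vx * t, vy * t - 0.5 * g * t * t))
    return points


def editor(keystrokes):
    """Reactive-style program: handles one input event at a time.

    Iterating over `keystrokes` stands in for waiting on the keyboard;
    each yielded string is what would be shown on the screen.
    """
    buffer = []
    for key in keystrokes:
        if key == "<exit>":          # the only way this program stops
            break
        if key == "<backspace>":
            if buffer:
                buffer.pop()
        else:
            buffer.append(key)
        yield "".join(buffer)
```

The first function is judged by the answer it halts with; the second is judged by how it responds to each event while it keeps running, which is closer to the word-processor model that the text attributes to the computationalist picture.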
What is a representation?
The concept of symbolic reference, or representation, lies at the heart of analytic philosophy and of computer science. The underlying assumption of many is that a real world exists independently of any given observer; and that symbols are entities that can ‘stand for’ objects in this real world – in some abstract and absolute sense. In practice, the role of the observer in the act of representing something is ignored.
Of course this works perfectly well in worlds where there is common agreement among all observers – explicit or implicit agreement – on the usages and definitions of the symbols, and the properties of the world that they represent. In the worlds of mathematics, or formal systems, this is the case, and this is reflected in the anonymity of tone, and use of the passive voice, in mathematics. Yet the dependency on such agreement is so easily forgotten – or perhaps ignored in the assumption that mathematics is the language of God.
A symbol P is used by a person Q to represent, or refer to, an object R to a person S. Nothing can be referred to without somebody to do the referring. Normally Q and S are members of a community that have come to agree on their symbolic usages, and training as a mathematician involves learning the practices of such a community. The vocabulary of symbols can be extended by defining them in terms of already-recognized symbols.
The English language, and the French language, are systems of symbols used by people of different language communities for communicating about their worlds, with their similarities and their different nuances and clichés. The languages themselves have developed over thousands of years, and the induction of each child into the use of its native language occupies a major slice of its early years. The fact that, nearly all the time we are talking English, we are doing so to an English-speaker (including when we talk to ourselves), makes it usually an unnecessary platitude to explicitly draw attention to the community that speaker and hearer belong to.
Since symbols and representation stand firmly in the linguistic domain, another attribute they possess is that of arbitrariness (from the perspective of an observer external to the communicators). When I raise my forefinger with its back to you, and repeatedly bend the tip towards me, the chances are that you will interpret this as ‘come here’. This particular European and American sign is just as arbitrary as the Turkish equivalent of placing the hand horizontally facing down, and flapping it downwards. Different actions or entities can represent the same meaning to different communities; and the same action or entity can represent different things to different communities.
In the more general case, and particularly in the field of connectionism and cognitive science, when talking of representation it is imperative to make clear who the users of the representation are; and it should be possible, at a minimum, to suggest how the convention underlying the representation arose. In particular it should be noted that where one and the same entity can represent different things to different observers, conceptual confusion can easily arise. When in doubt, one should always make explicit the Q and S when P is used by Q to represent R to S. In a computer program a variable pop_size may be used by the programmer to represent (to herself and to any other users of the program) the size of a population. Inside the program a variable i may be used to represent a counter or internal variable in many contexts. In each of these contexts a metaphor used by the programmer is that of the program describing the actions of various homunculi, some of them keeping count of iterations, some of them keeping track of variables, and it is within the context of particular groups of such homunculi that the symbols are representing. But how is this notion extended to computation in connectionist networks?
Representation in connectionism
When a connectionist network is being used to do a computation, in most cases there will be input, hidden and output nodes. The activations on the input and output nodes are decreed by the connectionist to represent particular entities that have meaning for her, in the same way as pop_size is in a conventional program. But then the question is raised – ‘What about internal representations?’ If a connectionist network is providing the nervous system for a robot, a different interpretation might be put on the inputs and outputs. But for the purpose of this section, the issues of internal representation are the same.
All too often the hidden agenda is based on a Platonic notion of representation – what do activations or patterns of activations represent in some absolute sense to God? The behaviour of the innards of a trained network is analysed with the same eagerness that a sacrificed chicken’s innards are interpreted as representing one’s future fate.
There is however a more principled way of talking in terms of internal representations in a network, but a way that is critically dependent on the observer’s decomposition of that network. Namely, the network must be decomposed by the observer into two or more modules that are considered to be communicating with each other by means of these representations. Where a network is explicitly designed as a composition of various modules to do various subtasks (for instance a module could be a layer, or a group of laterally connected nodes within a layer), then an individual activation, or a distributed group of activations, can be deemed to represent an internal variable in the same way that i did within a computer program. However, unlike a program which wears its origins on its sleeve (in the form of a program listing), a connectionist network is usually deemed to be internally ‘nothing more than’ a collection of nodes, directed arcs, activations, weights and update rules.
Hence there will usually be a large number of possible ways to decompose such a network, with little to choose between them; and it depends on just where the boundaries are drawn just who is representing what to whom. It might be argued that some ways of decomposing are more ‘natural’ than others; a possible criterion being that two sections of a network should have a lot of internal connections, but a limited number of connecting arcs between the sections. Yet as a matter of interest this does not usually hold for what is perhaps the most common form of decomposition, into layers. The notion of a distributed representation usually refers to a representation being carried in parallel in the communication from one layer to the next, where the layers as a whole can be considered as the Q and S in the formula ‘P is used by Q to represent R to S’. An internal representation, according to this view, only makes sense relative to a particular decomposition of a network chosen by an observer. To assert of a network that it contains internal representations can then only be
justified as a rather too terse shorthand for asserting that the speaker proposes some such decomposition. Regrettably this does not seem to be the normal usage of the word in cognitive science, yet I am not aware of any well-defined alternative definition.
The dynamical systems approach to cognition
In the following section we shall be outlining the strategy of evolutionary robotics as a design methodology for putting together robot nervous systems from a shelf-full of available components. But first we must establish just what sorts of components are necessary and sufficient – and the components will not be intended to carry any representations.
The nervous system of an animal is an organized system of physical components such as neurons, their connecting axons and dendrites, and their substrate, with electrical and chemical activity swirling around. The picture that neuroscientists give us is still changing; it is only in the last decade or so that the importance of chemical transmission, as well as electrical transmission, between neurons has been recognized. But the universal working assumption is that in principle there is a finite (though extremely large) number of physical variables that could be picked out as relevant to the workings of the machinery of the brain; and these variables continuously interact with each other according to the laws of physics and chemistry.
When we have a finite number of variables and can in principle write down an equation for each one, stating how its rate of change at any instant can be given by a formula related to the instantaneous values of itself and some or all of the other variables, then formally speaking we have a dynamical system (DS). In the case of a nervous system, the variables are not only internal ones, but also include sensory inputs and motor outputs, the interactions with the world around it.
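As a notational sketch only, the verbal definition above can be written out as follows. The symbols x_i, f_i and n are introduced here for illustration and do not appear in the original text.

```latex
\frac{dx_i}{dt} = f_i(x_1, x_2, \ldots, x_n), \qquad i = 1, \ldots, n
```

On the reading given in the text, some of the x_i would be internal variables of the nervous system, while others would be the sensory and motor variables through which it is coupled to its world.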
So for our robots we pick appropriate sensors and motors for the task, bearing in mind that it is unwise to treat, for instance, an infra-red transceiver as a distance-measurer; rather, it is a transducer that outputs a voltage that depends on a variety of factors including the distance from the nearest object, the ambient light levels, and the material that a reflective surface is made from. An infra-red detector may well indeed be useful for a robot that needs to avoid obstacles, but it would be a mistake to label the output as ‘distance-to-wall’. This would be a term in ‘mentalese’ rather than a neutral physical term.
As for the internal components, for some specific evolutionary robotics (ER) experiments these have been components of standard electronic circuits as packaged in a field programmable gate array. But for reasons of practicality, in many experiments we use artificial neural networks (ANNs) as simulated in real time with the aid of a computer. In particular one favourite class of ANNs is the CTRNN, continuous time recurrent neural network (Beer 1995). This is a potentially fully-connected network of real time leaky integrators with
specified temporal constants, and is an archetypal dynamical system. The class of CTRNNs has the useful property of universal approximation to any smooth dynamical system (Funahashi and Nakamura 1993); in other words, given any DS where the variables change smoothly, we can in principle find a CTRNN, with enough nodes, that will approximate its dynamical behaviour to any desired degree of accuracy.
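To make the idea of a network of leaky integrators concrete, here is a minimal sketch. It assumes the commonly used form of the CTRNN state equation (written out in the docstring) and a simple forward-Euler integration step. The function names, the step size and the example parameters are illustrative assumptions and are not taken from the experiments described in this chapter.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(y, tau, bias, weights, inputs, dt):
    """Advance the node states `y` by one forward-Euler step of size `dt`.

    Assumed state equation for node i:
        tau[i] * dy[i]/dt = -y[i]
                            + sum_j weights[j][i] * sigmoid(y[j] + bias[j])
                            + inputs[i]
    where weights[j][i] is the connection from node j to node i.
    """
    n = len(y)
    outputs = [sigmoid(y[j] + bias[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        total = sum(weights[j][i] * outputs[j] for j in range(n))
        dy = (-y[i] + total + inputs[i]) / tau[i]
        new_y.append(y[i] + dt * dy)
    return new_y


# Example: two mutually connected nodes, with a constant input to node 0.
state = [0.0, 0.0]
for _ in range(1000):
    state = ctrnn_step(state, tau=[1.0, 0.5], bias=[0.0, 0.0],
                       weights=[[0.0, 2.0], [-2.0, 0.0]],
                       inputs=[0.5, 0.0], dt=0.01)
```

Nothing in this description refers to what any node stands for. The parameters are time constants, biases and connection strengths, which is the sense in which the specification stays in the neutral language of dynamical systems.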
Evolutionary robotics
The problem of AI is just how to build systems that generate adaptive or intelligent behaviour. With an evolutionary perspective, it makes sense to start small and simple first, so we look at minimally cognitive artificial agents; in particular with rational_b adaptive behaviour as a more fundamental starting place than the rational_a behaviour of human intelligence. How can we design the control mechanisms which can produce adaptive behaviour in synthetic creatures? We must bear in mind that the mechanism must be described in non-intentional language if this is going to give us any insight into the relationship between the mental and the physical.
Braitenberg’s vehicles are a very simple example of what we are seeking. We can describe a very simple circuit where left and right photocells, mounted on the front of a simple toy vehicle, are connected respectively to the right and left motors driving the wheels on each side. In the simplest version, the circuit drives each wheel slowly forward in the absence of any light. Any increase in the light-level reaching either photoreceptor results in increased speed on the attached wheel. I have described this ‘nervous system’ in non-mentalese, yet we can see quite easily that this results in light-seeking behaviour in the vehicle. Further, experiment shows that this behaviour is adaptive, in the sense that the photovore behaviour will adapt the motion to continually chase a moving light-target. The observer watching this behaviour just naturally describes it in intentional language. This simple example helps to resolve worries about how such relatively simple organisms as bacteria can display similarly goal-directed behaviour; there is no need to call on explicit conscious intentions. How do we scale up from this to more sophisticated examples?
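The crossed wiring of the vehicle described above can be sketched as a small simulation. Everything numerical here is an assumption made for illustration: the inverse-square light model, the sensor placement, the base speed, the gain and the simple two-wheel kinematics are hypothetical choices, not details given in the text.

```python
import math


def light_level(px, py, light):
    """Illustrative intensity model: falls off with squared distance."""
    d2 = (px - light[0]) ** 2 + (py - light[1]) ** 2
    return 1.0 / (1.0 + d2)


def step(state, light, dt=0.05, base=0.2, gain=5.0,
         sensor_offset=0.2, sensor_angle=math.pi / 4, axle=0.2):
    """Advance the vehicle (x, y, heading) by one time step."""
    x, y, theta = state
    # Photocells mounted at the front-left and front-right of the body.
    lx = x + sensor_offset * math.cos(theta + sensor_angle)
    ly = y + sensor_offset * math.sin(theta + sensor_angle)
    rx = x + sensor_offset * math.cos(theta - sensor_angle)
    ry = y + sensor_offset * math.sin(theta - sensor_angle)
    left_light = light_level(lx, ly, light)
    right_light = light_level(rx, ry, light)
    # Crossed wiring: left photocell drives the right motor and vice versa.
    # `base` keeps both wheels turning slowly when there is no light.
    right_wheel = base + gain * left_light
    left_wheel = base + gain * right_light
    forward = 0.5 * (left_wheel + right_wheel)
    turn = (right_wheel - left_wheel) / axle   # positive = turn left
    return (x + dt * forward * math.cos(theta),
            y + dt * forward * math.sin(theta),
            theta + dt * turn)
```

The code mentions only light levels, wheel speeds and geometry. A light on the vehicle's left makes the right wheel run faster, so the heading swings left towards the light, and it is the observer who then calls the result light-seeking.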
One approach is to follow the course of evolution fairly literally; this gives us the relatively new field of evolutionary robotics, as a methodology for designing artificial creatures using artificial Darwinian evolution. Suppose that we wish to produce a creature that will repeatedly seek a light target, but also learn what landmarks will aid it in the search. Then as evolutionary roboticists we set up a test environment, where robots can be evaluated and scored on their fitness at this task. We then work with a population of robots that have various designs of nervous system architecture; or more practically, we usually work with one robot and a population of possible architectures. The components of the nervous system are real or idealized physical components equivalent to the pendulums, cogs and
levers of a clockmaker; we shall return in more detail to this below. In our role as the creators of this artificial world, we specify some appropriate mapping between genotypes, strings of artificial DNA that may well be simply composed of 0s and 1s, and phenotypes by which we mean the actual way in which the nervous system is assembled from the available components. At the simplest level, the genotype may be (when translated) in effect a blueprint for assembling the nervous system. With more sophisticated mappings, it may act more like a recipe ‘for baking a cake’, directing and influencing (perhaps in league with environmental influences) the final architecture without actually explicitly specifying its form.
Whichever form of mapping we choose to use, the result should be that the search through the space of possible architectures is paralleled by an equivalent search through the space of possible genotypes, of artificial DNA. Indeed, since these genotypes allow inheritance of genetic material from robot parents selected for their fitness at a task, and mutations to the genotype – a few random changes to the 0s and 1s – allow for variation, we have all the necessary ingredients for Darwinian evolution: heredity, variation and selection.
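A blueprint-style mapping of the simplest kind can be sketched as follows. This is an assumption-laden illustration: the fixed eight bits per parameter, the parameter range and the function names `decode` and `mutate` are hypothetical, and real experiments may use quite different encodings.

```python
import random


def decode(genotype, n_params, bits_per_param=8, lo=-5.0, hi=5.0):
    """Blueprint-style mapping: each block of bits becomes one real parameter.

    The parameters could then be read as, say, the weights and time
    constants of a small network (that reading is left to the caller).
    """
    if len(genotype) != n_params * bits_per_param:
        raise ValueError("genotype length does not match the blueprint")
    max_int = 2 ** bits_per_param - 1
    params = []
    for k in range(n_params):
        block = genotype[k * bits_per_param:(k + 1) * bits_per_param]
        value = int("".join(str(bit) for bit in block), 2)
        params.append(lo + (hi - lo) * value / max_int)
    return params


def mutate(genotype, rate, rng):
    """Variation: flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genotype]


# Example: a random 16-bit genotype decoded into two parameters.
rng = random.Random(0)
parent = [rng.randint(0, 1) for _ in range(16)]
child = mutate(parent, rate=0.05, rng=rng)
child_params = decode(child, n_params=2)
```

Because the child is a slightly altered copy of the parent, a small step in genotype space corresponds here to a small step in the space of architectures, which is the parallel between the two searches that the text describes.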
The evolutionary procedure
Artificial evolution typically consists of repeated rounds, or generations, of testing a population of candidate designs, and selecting preferentially the fitter ones to be parents of the next generation. The next generation consists of offspring that inherit the genetic material from their selected parents; much the same as farmers have been doing for thousands of years in improving their crops and their livestock. The initial population is based on completely random genotypes of artificial DNA, so the only direction given to the evolutionary design process is the indirect pressure of selection. The genotypes in following generations are usually mixed through sexual recombination, and further varied through mutations, so as to introduce further variety for selection to choose from.
As far as possible this is an automated, hands-off process. The human designer’s input is limited to the mapping from genotype to phenotype, and the selection process that allocates fitness scores to each robot. This depends on the cognitive ability that is desired, and usually requires careful thought. To give a simple example, if one wishes to craft a fitness function intended to promote the movement of a robot across a crowded floor without bumping into anything, then one might be tempted to judge the fitness by how far the robot travels in a fixed time. Although high scores will be achieved through evolution, the result may well be disappointing as typically such scores will be gained by rapid rotation around a tight circle; it turns out to be necessary to craft the fitness function so as to give more credit for (fairly) straight movement and less for tight turns. Although the human designer has set up the scenario intended to select,
over many generations, for the desired behaviour, it is important to note two things. First, no individual robot is given any feedback as to how well its behaviour is accumulating fitness – so later generations only succeed if it is in their inherited ‘genetic nature’ to behave appropriately. Second, more often than not the finally evolved robot nervous system is complex and opaque; it is difficult and maybe impossible to analyse just how the job is done.
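The generational loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the tournament selection, one-point recombination, mutation rate and the function name `evolve` are choices made here, and the fitness function is a placeholder supplied by the caller, whereas in evolutionary robotics the score would come from evaluating a robot's behaviour in its test environment.

```python
import random


def evolve(fitness, genome_length, pop_size=20, generations=30,
           mutation_rate=0.02, seed=0):
    """Run a simple generational loop; genome_length must be at least 2.

    Returns the final population and the best score seen in each
    generation that was evaluated.
    """
    rng = random.Random(seed)
    # Initial population: completely random genotypes of 0s and 1s.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Selection acts only through these scores; no individual is
        # told how well it is doing during its own evaluation.
        scores = [fitness(genotype) for genotype in population]
        best_per_generation.append(max(scores))

        def pick_parent():
            # Tournament of two: keep the fitter of two random individuals.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        offspring = []
        while len(offspring) < pop_size:
            mother, father = pick_parent(), pick_parent()
            cut = rng.randrange(1, genome_length)   # one-point recombination
            child = mother[:cut] + father[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]              # mutation
            offspring.append(child)
        population = offspring
    return population, best_per_generation


# Placeholder task: count the 1s in the genotype (stands in for a robot trial).
final_population, history = evolve(sum, genome_length=32)
```

The designer's hand shows up in only two places, the `fitness` argument and whatever mapping turns a genotype into a nervous system, which matches the division of labour described in the text.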
Some experiments
The techniques of ER have been developed since the beginning of the 1990s, and by now there are probably thousands of evolved robot designs, both in simulation and with real robots. Three major centers are Case Western Reserve University, where Beer and colleagues perform simulation experiments in ‘minimal cognition’ (Beer 2000); EPFL in Lausanne, Switzerland, where Floreano and colleagues evolve control systems for real robots (Floreano and Urzelai 2000); and our own group at Sussex, where we work both with simulations and real robots (Harvey et al. 1997).
The types of cognitive behaviour that we can demonstrate in robots through these means start at the very simple and then work slowly up. We can generate visual awareness and recognition of particular distant objects or goals, and the ability to navigate towards them while avoiding obstacles. There have been studies on the origins of learning, where an agent needs to learn and relearn about correlations (between a visual landmark and its goal) that change at unexpected intervals (Tuci et al. 2003). There has been a successful attempt to recreate in a robot the equivalent to our human ability to adjust to wearing inverting glasses – after an extended period of time, weeks in the case of humans, the world that was seen as upside-down becomes familiar enough to allow us to get around and navigate successfully (Di Paolo 2000). Teams of identical robots have been evolved so as to effectively negotiate between each other their different roles in a joint operation (Quinn et al. 2003).
All these behaviours display the hallmarks of intentionality at different levels of complexity, despite the fact that the robot nervous system or control system is nothing more than a dynamical system cobbled together through evolution.
The selection pressures were designed through consideration of the intentional behaviour required, in ‘mentalese’ terms; but the genetic specification of the brain is in pure physical terms, the neutral language of DS. Analysis of the evolved networks typically shows no ‘perception module’ or ‘planning module’ in the brain, no obvious representational place-holders for internal representations of external objects or events. These experiments give us an existence proof for one possible way of generating these behaviours, and hence act as a challenge to those who claim that brains must be organized in some radically different way in order to produce these effects.
All these experiments demonstrate rational_b adaptive behaviour, but not yet the rational_a behaviour of human intelligence. What we have done is intended to echo, in simplified form, the evolution of cognition in the relatively early days of life on this planet. Although there is plenty of work to be done before aiming explicitly at human rationality, the experiments in communication and coordination of behaviour between robots make one very small stepping stone aimed in that direction.
To summarize
I have tried to give a feel for the working philosophy of someone working in new, non-classical AI, artificial life and evolutionary robotics. This means rejecting a lot of the philosophical baggage that has been associated with much of AI and cognitive science in previous decades. It means continually needing to explain oneself to others who start from a very different set of assumptions. There is a massive rift in AI and cognitive science at the end of the twentieth and beginning of the twenty-first century. I would suggest that this may be a sign of a Copernican revolution finally reaching cognitive science, up to a century or more after equivalent revolutions in biology and physics.
The Copernican revolution
The history of science shows a number of advances, now generally accepted, that stem from a relativist perspective which (surprisingly) is associated with an objective stance towards our role as observers. The Copernican revolution abandoned our privileged position at the centre of the universe, and took the imaginative leap of wondering how the solar system would look viewed from the Sun or another planet. Scientific objectivity requires theories to be general, to hold true independently of our particular idiosyncratic perspective, and the relativism of Copernicus extended the realm of the objective. Darwin placed humans among the other living creatures of the universe, to be treated on the same footing. With special relativity, Einstein carried the Copernican revolution further, by considering the viewpoints of observers travelling near to the speed of light, and insisting that scientific objectivity required that their perspectives were equally privileged to ours. Quantum physics again brings the observer explicitly into view.
Cognitive scientists must be careful above all not to confuse objects that are clear to them, that have an objective existence for them, with objects that have a meaningful existence for other agents. A roboticist learns very early on how difficult it is to make a robot recognize something that is crystal clear to us, such as an obstacle or a door. It makes sense for us to describe such an object as ‘existing for that robot’ if the physical, sensorimotor coupling of the robot with that object results in robot behaviour that can be correlated with the presence of the object. By starting the previous
sentence with ‘It makes sense for us to describe . . .’ I am acknowledging our own position here acting as scientists observing a world of cognitive agents such as robots or people; this objective stance means we place ourselves outside this world looking in as god-like creatures from outside. Our theories can be scientifically objective, which means that predictions should not be dependent on incidental factors such as the nationality or location or starsign of the theorist. When I see a red sign, this red sign is an object that can be discussed scientifically. This is another way of saying that it exists for me, for you, and for other human observers of any nationality; though it does not exist for a bacterium or a mole. We construct these objects from our experience and through our acculturation as humans through education. It makes no sense to discuss (for us humans to discuss) the existence of objects in the absence of humans. And (in an attempt to forestall the predictable objections) this view does not imply that we can just posit the existence of any arbitrary thing as our whim takes us. Just as our capacity for language is phylogenetically built upon our sensorimotor capacities, so our objects, our scientific concepts, are built out of our experience. But our phenomenal experience itself cannot be an objective thing that can be discussed or compared with other things. It is primary, in the sense that it is only through having phenomenal experience that we can create things, objective things that are secondary. Like it or not, any approach to the design of autonomous robots is underpinned by some philosophical position in the designer. There is no philosophy-free approach to robot design – though sometimes the philosophy arises through accepting unthinkingly and without reflection the approach within which one has been brought up. 
Computationalist AI has been predicated on some version of the Cartesian cut, and the computational approach has had enormous success in building superb tools for humans to use – but it is simply inappropriate for building autonomous robots. There is a different philosophical tradition which seeks to understand cognition in terms of the priority of lived phenomenal experience, the priority of everyday practical know-how over reflective rational knowing-that. This leads to very different engineering decisions in the design of robots, to building situated and embodied creatures whose dynamics are such that their coupling with their world leads to sensible behaviours. The design principles needed are very different; Brooks’ subsumption architecture is one approach, evolutionary robotics is another. Philosophy does make a practical difference.
Bibliography
Beer, R.D. (1995) ‘On the dynamics of small continuous-time recurrent neural networks’, Adaptive Behavior, 3: 469–509.
Beer, R.D. (2000) ‘Dynamical approaches to cognitive science’, Trends in Cognitive Sciences, 4 (3): 91–99.
Brooks, R. (1999) Cambrian Intelligence: The Early History of the New AI, Cambridge, MA: MIT Press.
Di Paolo, E.A. (2000) ‘Homeostatic adaptation to inversion of the visual field and other sensorimotor disruptions’, Proc. of SAB ’2000, Cambridge, MA: MIT Press.
Floreano, D. and Urzelai, J. (2000) ‘Evolutionary Robots with on-line self-organization and behavioral fitness’, Neural Networks, 13: 431–443.
Funahashi, K. and Nakamura, Y. (1993) ‘Approximation of dynamical systems by continuous time recurrent neural networks’, Neural Networks, 6: 801–806.
Harvey, I., Husbands, P., Cliff, D., Thompson, A., and Jakobi, N. (1997) ‘Evolutionary robotics: the Sussex approach’, Robotics and Autonomous Systems, 20: 205–224.
Lemmen, R. (1998) ‘Towards a non-Cartesian cognitive science in the light of the philosophy of Merleau-Ponty’, DPhil thesis, University of Sussex.
Quinn, M., Smith, L., Mayley, G. and Husbands, P. (2003) ‘Evolving controllers for a homogeneous system of physical robots: structured cooperation with minimal sensors’, Philosophical Transactions of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, 361: 2321–2344.
Tuci, E., Quinn, M. and Harvey, I. (2003) ‘An evolutionary ecological approach to the study of learning behaviour using a robot based model’, Adaptive Behavior, 10 (3/4): 201–222.
Part III
Cognition
7
How to get around by mind and body Spatial thought, spatial action Barbara Tversky
People draw on spatial knowledge when they find their ways back to their hotel while traveling as well as when they estimate the direction between their home town and their current one. Despite the clear relations among the tasks, they are studied by two disparate communities. The community that takes navigation in real space as its task is concerned with the determinants of accuracy, and in turn, the cues in the environment and the sensory-motor systems that make use of them. The community that takes spatial judgments as its task is occupied with systematic error, and in turn, the normal perceptual and cognitive processes that produce it. How can these communities be integrated? Both accept the improvement and refinement that selection by learning or by evolution can achieve. The mystery, then, is to account for the existence of systematic errors. This requires two steps: first, showing that the errors are a consequence of normal and useful perceptual and cognitive processes; second, showing why the errors are resistant to change for the better by learning or evolution.
1 Space in mind, space in action Spatial thinking is special. It is multi-modal, involving not just sight, but sound, smell, and touch; all these modalities and more reveal where we are and what is around us. Knowledge of space is essential to survival; our lives depend on knowing how to get home, where is safe ground, what to handle. Spatial thinking serves as a basis for other thought, it takes us from the concrete to the abstract, applying facility in reasoning about spatial size, distance, direction, and transformations to drawing inferences and constructing theories in abstract domains. Language reveals the spatial nature of abstract thought: we say a field is wide open, she’s reaching for the stars, he has fallen into a depression, drawing away from friends. Despite the ubiquity of space and the necessity for knowledge of it, people think about space differently from the way space is measured, from the way space is conceived in physics or geometry or engineering. In those cases, space itself is primary, and entities are located in it. For people, space begins with the entities; they are primary. They are located and oriented
with respect to each other and with respect to reference frames. These relations are not metric, but approximate, categorical, schematic. People interact in many spaces, the space of the body, the space around the body, and the space of navigation are prominent among them. Which objects and which reference frames are selected depends on the particular space, and the perceptions and actions it subserves. For the space of the body, body parts that are perceptually salient and functionally significant are prominent (Morrison and Tversky in press; Tversky et al. 2002). For the space around the body, the three axes of the body and those of the world are critical (Bryant et al. 1992; Franklin and Tversky 1990). For the space of navigation, the primary focus here, paths and landmarks form the conceptual skeleton. The space of navigation is the space that is too large to be seen at a glance. It has to be constructed, from different views, different encounters, even different modes, from experience, from maps, from language. Within psychology, two communities have studied the space of navigation. The community I call the mind community grew out of traditions in perception and cognition. The data of interest to this community come from studies of spatial judgments: what is the direction between Los Angeles and Algiers? The distance from Manchester to Glasgow? From Jerry’s apartment to Times Square? Such questions were not chosen randomly; they were selected to induce error, and they succeeded. In fact, the major findings have been systematic errors in the judgments, evidence used to analyze normal cognitive processes (e.g., Tversky 1981, 1993, 2000a, 2000b). The community I call the body community grew out of traditions in learning and animal behavior. The data of interest to this community come from studies of spatial behavior, demonstrating that birds, bees, rats, and even people, can find their ways back to nest or home. 
The major findings have been accuracy of the behavior, evidence used to analyze the cues, and perceptual-motor systems that yield accuracy (see, for examples, Gallistel 1990 and papers in the volume edited by Golledge 1999). Even greater than the differences in situations and explanations are the differences in perspective between the communities. The perspective of the mind community on the mind is its limitations, its fallibility. The perspective of the body community on the body is its precision, its fine-tuning. The mind community strives to elucidate the normal cognitive processes that result in error; the body community strives to elucidate the behavioral systems that yield accuracy. I plead guilty to exaggerating the positions of the two communities, heightening their differences. Naturally, there are similarities and points of contact. The mind folk and the body folk alike take learning and evolution for granted, and acknowledge that those processes can improve judgments and performance. If so, then the burden is on the mind community: why do systematic errors persist despite the selective pressures of years of learning and centuries of evolution? To answer this first requires a review of systematic errors and then an analysis of them.
2 Systematic errors in spatial memory and judgment 2.1 Distortions due to hierarchical organization Although the space of navigation is for most intents and purposes flat, people group and organize space hierarchically. An example so famous it appears as a question in Trivial Pursuit® came from a study by Stevens and Coupe (1978). They asked students (in San Diego) to indicate the direction from San Diego to Reno. Of course all realized correctly that Reno is north of San Diego. But, most of the informants thought erroneously that Reno is east of San Diego, when, in fact, it is west. Stevens and Coupe generated that example from their theory. According to their theory, people do not remember all possible directions between pairs of cities. Instead, they remember the approximate locations of the states and what cities are in what states. They then use the remembered relative locations of the states to infer the directions between cities contained in them. Since Nevada is on the whole east of California, people infer that cities in Nevada are east of cities in California. Stevens and Coupe found similar effects for maps they constructed according to these principles. Organizing space hierarchically distorts distance judgments as well as direction judgments. The general finding has been that distances within an entity, whether geographic entities, like a state or country, or conceptual entities, such as buildings differing in function or settlements differing by ethnicity, are underestimated relative to distances between entities (e.g., Hirtle and Jonides 1985; Portugali 1993). Hierarchical structure is reflected in reaction times to make judgments as well as errors of judgment; distance judgments for pairs of cities between states or countries are faster than distance judgments for pairs within the same geographic entity (Maki 1981; Wilton 1979). Notably, grouping affects abstract judgments as well as judgments of proximity. 
People judge pairs of members of their own social or political groups to be more similar on features unrelated to the basis for grouping than pairs where one member is from one’s own group and the other from another. This can be taken as evidence for the spatial basis of abstract judgments. Hierarchical organization has useful consequences as well. It has clear benefits in memory, and it facilitates inference. Knowing that a loquat is a fruit allows people who have never encountered one to make good guesses about it; that it is sweet, that it grows on trees, that it has seeds, that it is within a certain size range, that it can spoil. Knowing that Reno is in Nevada allows us to infer that gambling is legal and that the climate is dry. Knowing that someone is an engineer or belongs to a feminist organization encourages yet other inferences.
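The hierarchical inference that Stevens and Coupe propose can be sketched in a few lines. This is only an illustrative toy, not their model: the function names are hypothetical, and the longitudes are rough values assumed for the example (state figures are approximate centroids). The sketch infers an east–west relation between two cities from the remembered relation between their states and compares it with the relation given by the cities' own longitudes.

```python
# Toy illustration of direction inference from hierarchical (state-level) knowledge.
# All names are hypothetical; longitudes are rough, assumed values in degrees.

CITY_STATE = {"San Diego": "California", "Reno": "Nevada"}

# Approximate state centroids (assumed): the coarse knowledge people retain.
STATE_LON = {"California": -119.4, "Nevada": -116.6}

# Approximate city longitudes (assumed): the fine detail people do not retain.
CITY_LON = {"San Diego": -117.16, "Reno": -119.81}


def east_west(lon_a, lon_b):
    """Return whether a point at lon_a lies east or west of a point at lon_b."""
    return "east" if lon_a > lon_b else "west"


def inferred_direction(city, reference):
    """Infer the direction from the locations of the containing states."""
    return east_west(STATE_LON[CITY_STATE[city]], STATE_LON[CITY_STATE[reference]])


def actual_direction(city, reference):
    """Read the direction off the cities' own longitudes."""
    return east_west(CITY_LON[city], CITY_LON[reference])


print("inferred:", inferred_direction("Reno", "San Diego"))
print("actual:  ", actual_direction("Reno", "San Diego"))
```

The sketch only covers cities in different states; it stores two numbers per state rather than one relation per pair of cities, which is the economy that hierarchical organization buys at the cost of the occasional systematic error.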
2.2 Distortions due to perspective From a viewpoint that allows us to see far away, the things we see in the distance appear telescoped, that is, they seem crowded relative to the things that are nearby. An analogous phenomenon occurs in judgment. For evidence, we move from San Diego to Ann Arbor, where students were asked to imagine themselves either in San Francisco or in New York City. Then they were asked to judge the distances between pairs of cities more or less equidistant on an east–west path across the United States: New York City, Pittsburgh, Indianapolis, Kansas City, Salt Lake City, and San Francisco. Those with the east coast perspective gave larger estimates for the east coast distances, especially New York–Pittsburgh, than those with the west coast perspective. Similarly, those with the west coast perspective gave larger estimates for the west coast distances, notably, San Francisco–Salt Lake City, than those with the east coast perspective (Holyoak and Mah 1982). Of course, this is one of the things Steinberg was telling us in his delightful New Yorker covers all those years. But the psychologists showed something more (whew!). Remember that all the informants were actually in Ann Arbor; the viewpoint was not their actual geographic position but rather an imagined one, and they were able to adopt either viewpoint with differing consequences. As for hierarchical grouping, so for perspective, these effects occur for abstract judgments as well as spatial ones. We readily perceive the uniqueness and variability of those close to us, but glom all those others, from another social or political group, together, as all alike. And surely a case can be made for being more sensitive to the distances and differences that surround us than to the distances and differences that are far away.
2.3 Distortions due to landmarks When someone local asks us where we live, we often respond with the closest landmark we think that person will know. Near DuPont Circle. 
Or the Bastille. Or Pombal. Landmarks seem to extend themselves to encompass whole neighborhoods. But ordinary buildings are just that. Consistent with this thinking, landmarks seem to draw ordinary buildings close to them, but not vice versa. A favorite cognitive sport at college campuses is to show the distorting effects of landmarks. People report that an ordinary building is closer to a landmark than the landmark to the ordinary building (Sadalla et al. 1980; McNamara and Diwadkar 1997). This robust phenomenon violates any metric account of spatial cognition; by a metric account, the distance from B to A must be the same as the distance from A to B. Like the previous distortions, the landmark effect occurs for abstract judgments as well as spatial ones; in fact, it was first demonstrated for abstract judgments. Rosch found that people judge magenta to be more similar to red than red to magenta, and an ellipse to be more similar to a circle than a
circle to an ellipse (Rosch 1975). Even more abstract, A. Tversky and Gati found that people think North Korea is more similar to Communist China than China to North Korea (Tversky and Gati 1978). Like landmarks, prototypes, such as red or circle or China, seem to define neighborhoods or categories, in this case, conceptual ones, including variations in them. Ordinary or variant cases do not.
2.4 Other errors These are but some of the systematic errors that have been documented; there are others. Route distances are judged longer when they have more turns (e.g., Sadalla and Magel 1980), or landmarks (e.g., Thorndyke 1981), or intersections (e.g., Sadalla and Staplin 1980) along the route. The presence of barriers that require detours along the route also lengthens estimates of route distance (e.g., Newcombe and Liben 1982). Curved features get straighter in the mind, for example, the Seine by Parisians (Milgram and Jodelet 1976) and the streets of Pittsburgh by well-seasoned taxi drivers (Chase and Chi 1981). Small angles and distances are overestimated while large ones are underestimated; as before, an error that appears in judgments of the abstract as well as of space (e.g., Kahneman and Tversky 1979; Poulton 1989).
3 An account of some errors The errors highlighted are not consequences of randomness or ignorance. Rather, they are systematic, predictable consequences of ordinary perceptual and cognitive processes. Forming mental representations of environments has much in common with forming mental representations of scenes (Tversky 1981). An early process in scene representation is distinguishing figures from grounds, not always an easy task as volumes of reversible figures have illustrated. Once figures have been distinguished, they are located and oriented with respect to other figures and with respect to a frame of reference. Relating figures to other figures and to reference frames organizes a scene, but may also create error. When figures are related to each other, they are mentally brought into greater alignment, a phenomenon similar to the Gestalt principle of grouping by proximity. To demonstrate, students, this time at Stanford, were asked which of a pair of maps of the world was the correct one. One map was correct; in the other, the relative positions of the continents were altered so that the United States was more aligned with Europe than it actually is and South America was more aligned with Africa than it actually is. A significant majority of students picked the incorrect map. Alignment works for north–south as well as east–west. A significant majority of students selected an incorrect map in which South America was moved westwards to be more aligned with North America than it actually is. Alignment appears for direction estimates – Los Angeles is south, not
north, of Algiers – although the majority of students answered this and similar comparisons incorrectly. Alignment also appears in memory for artificial maps, and for meaningless blobs. Figures induce their own set of axes, usually around an axis of elongation or of symmetry and that perpendicular to it. The axes induced by a figure may not correspond to the axes induced by a predominant external reference frame. According to rotation, the axes induced by the figure and those of the reference frame are mentally brought into greater correspondence. Thus, when asked to place a cut-out of South America in a north–south east–west reference frame, most students uprighted South America. Bay Area dwellers mentally upright the Peninsula, which runs north–west south–east. Consequently, they incorrectly report that Berkeley is east of Stanford, and Santa Cruz west of Stanford. As for alignment, rotation appears for artificial maps and meaningless blobs. Both these processes generating errors are, as noted, rooted in normal perceptual organization (Tversky 1981).
4 Accounting for the existence of error
4.1 Schematization underlies comprehension and representation The perceptual and cognitive processes just described that underlie comprehension of scenes schematize the information that the world provides. That is, they omit some information, and simplify and approximate other information. An inevitable consequence of schematization is error. That schematization also has benefits shall soon be apparent.
4.2 Schematization underlies integration Because the space of navigation is too large to be taken in at once, it comes in parts that must be integrated. How can the mind integrate different views, encounters, modalities into a sensible whole? One way would be to establish correspondences between the critical figures in each part, and then to organize them with respect to a common reference frame. Using reference objects and reference frames is, of course, exactly the process that organizes scene comprehension, and that has been shown to yield errors. Integrating across views, encounters, and modalities draws on the same processes as creating a representation. Perhaps the better way of putting it is the converse: the errors provide evidence for the processes.
4.3 Schematization alleviates cognitive load Now we have, through schematization, formed representations of parts and integrated them into larger representations that have some degree of coherence. Given the nature of the errors that persist, complete coherence cannot be claimed. Although schematized, the representations preserve the important
information, that of the relative locations and directions of the key figures. The next consideration is how the information is used to make spatial judgments, of, say, distance, direction, and size. In making judgments like the direction between Los Angeles and Algiers or the distance from the subway stop to Faneuil Market, people cannot rely on a mental atlas the way cartographers, or the AAA, or web surfers can rely on a physical one. People do not have a mental compendium of pre-stored integrated mental maps from which the relevant one can be extracted and inspected. What people seem to do is to draw on whatever knowledge they have that seems relevant, knowledge obtained from interaction in the environment, from maps, from language. That information has then to be integrated on the fly in ways just described. Then a judgment can be made on the integrated package. All this occurs in working memory, a network of brain activity notoriously limited in capacity (e.g., Baddeley 1990; Kahneman 1974). The more “things” held in working memory, the less capacity for computation; thus the richer the representation of the environment, the poorer the judgment. Schematizing representations leaves more capacity for judgment. Here’s a modern analogy that goes part way, instructively so: schematizing is like reducing bandwidth by compression. But in contrast to most current compression schemes, schematization compresses intelligently, by selecting the figures and relations that allow reconstruction of the world.
4.4 Spatial judgments are armchair judgments Whereas wayfinding happens in an actual environment, rich with cues, spatial judgments are done in the mind, a piece of minimalist art. 
For tennis, the weight of the racquet in the hand, the ping of the ball on the strings, the tug and release of the muscles, the give of the asphalt on the feet, all support the swing, as does the sight of the opponent’s moves and the thump of ball on the court. For wayfinding, the texture of the pavement, the noise of traffic or birds, the wind between the buildings or trees, the smells emanating from the stores or the fields, support keeping on track, as well as the sights and views of the changing scene. The varied and multimodal cues available in context provide information beyond what can be articulated or activated by the mind. Context promotes accuracy in more than one way. For one thing, it constrains behavior. The pedals of a bicycle constrain where the feet can go and how they can move. The roads and buildings in an environment constrain where we can turn and enter and exit. The mind can imagine, and, indeed, can believe, many things the world does not allow. For another thing, context provides cues to memory and performance. The decreasing traffic on the freeway reminds you that your exit is near; the sight of the bank on the corner prompts you to look for the subway entrance. The modern world even comes annotated, with street signs and directions to destinations. You don’t have to actively recollect these cues and signs as the world provides them for you. Another hi-tech analog: the
world is a menu already pulled down; it turns a memory retrieval task into a simpler memory recognition task. Context means that schematic information, though erroneous, may be sufficient to avoid error in spatial behavior. In fact, the sorts of sketch maps and route directions that people give each other can and do leave out much information, and these have undergone generations of informal user testing. Not accidentally, sketch maps and route directions omit or distort the same information that mental representations omit or distort, for example, metric information about distance and direction (Tversky and Lee 1998, 1999). What sketch maps, route directions, and mental representations do preserve is paths and nodes for action; the environment supplements what is missing and disambiguates what is schematic.
5 Accounting for the persistence of error The case for the schematization processes that produce systematic errors in spatial judgments seems, if anything, to be overdetermined, much like the case for perceptual illusions. The very processes that produce error also facilitate the construction and integration of mental representations as well as inferences and judgments from them. Yet one might still wonder why such errors persist; shouldn't people learn to avoid making errors? Accounting for the persistence of error is the next item on the agenda.
5.1 Spatial judgments are rarely repeated The conditions for effective learning, such as learning one’s way around a city or how to play a violin, typically require many trials in the same context. Context and repetition confer many benefits to increasing accuracy of behavior. Context constrains behavior, allowing some responses and not others. Context also provides rich cues to behaviors, where to turn, how to hold the fingers. Repetition with feedback provides the opportunity to correct errors and to learn the appropriate responses. Whereas those spatial behaviors that become accurate and finely-tuned do so under practice, spatial judgments are rarely repeated, especially with feedback. When they are, they are likely to be learned. For example, I now know that Berkeley is west of Stanford and Los Angeles south of Algiers.
5.2 Learning affects specifics, not processes or representations Knowing that Los Angeles is south of Algiers does not help me with the direction between Berkeley and Stanford or even the direction between Philadelphia and Rome. That knowledge is in the form of encapsulated footnotes, not generalized correctives to geographic regions. Learning corrects errors for specific facts; it does not affect the processes that generated the
errors. In fact, much of the data on alignment and rotation were collected immediately after a class lecture on systematic errors in spatial judgments. The very mechanisms and processes that produce errors are general purpose mechanisms and processes. They function across a broad domain of content and are useful in a wide range of contexts. The mechanisms and processes applied to making spatial inferences are among those that function in perception and comprehension of the world around us.
5.3 Correctives in context Thinking, indeed, believing that Rome is south of Philadelphia or that Santa Cruz is west of Stanford may never have consequences for me in the world. Even if I am asked and err, my questioner may have no reason to doubt me, indeed, may share my judgment. If I am driving from Stanford to Santa Cruz, I follow the highways, which do not err. Likewise, if I believe erroneously that a particular turn is a right angle, I will turn the direction the roads allow; if I believe a road or river is straighter than it is, I will again follow road or river rather than my erroneous beliefs. Many of the mechanisms that produce error are independent; consequently, so are the errors. The errors may conflict and cancel (e.g., Baird 1979; Baird et al. 1979). For the purposes of science, we ask only one question, provide only one cue, focus on one kind of error. But environments provide multiple cues, some of which may yield error, some not, and the errors are likely to be uncorrelated. The many affordances and cues available in context combined with a schematic overlay of the larger surrounding are likely to be sufficient for successful navigation.
6 A second look at spatial behavior in the wild Now that the case has been made for the existence and persistence of error in spatial judgment, we need to take a second look at spatial behavior in the wild. Despite, or perhaps because of selection by evolution and refinement by learning, navigation in the wild is by no means perfect. As for spatial judgments, clever experimentation yields errors and, at the same time, reveals the mechanisms used in normal, error-free navigation. Path integration is the task most commonly studied. In path integration, a traveling organism is continuously updating its position and orientation relative to a point, usually a start or end point. Path integration can be accomplished in several different ways, notably by computing over changes in heading and distances traveled (Golledge 1999: 122). To study path integration, a navigator is blindfolded, then traverses a path, then turns and continues some more, and finally points to or returns to the starting point. Bees, ants, hamsters, even people perform fairly well at this task, but do make systematic errors. Hamsters and bees typically overshoot (Etienne et al. 1999). People overshoot small distances and small turns and undershoot large ones (Loomis
et al. 1999), a widespread error of judgment (Poulton 1989). But note what blindfolding has done. It has removed the cues in the environment that complement and typically correct the incorrect internal model. Notably missing are landmarks, and in fact, moving landmarks causes desert ants, and presumably others, to err (Muller and Wehner 1994). Path integration, then, is not perfect; it provides global information that is corrected by landmarks in the world.
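Path integration by "computing over changes in heading and distances traveled" can be made concrete with a minimal dead-reckoning sketch. The function names, the flat two-dimensional coordinate frame, and the convention that positive turns are counterclockwise are all assumptions made for illustration; nothing here is meant as a model of how any organism actually performs the computation. Each step is a turn followed by a straight segment, and the navigator keeps a running estimate of position and heading from which a homing vector can be read off.

```python
import math


def integrate_path(steps, start=(0.0, 0.0), heading_deg=0.0):
    """Accumulate position from a sequence of (turn_deg, distance) steps.

    Each step first changes the heading by turn_deg (counterclockwise
    positive, an assumed convention) and then moves `distance` along it.
    Returns the final (x, y, heading_deg).
    """
    x, y = start
    heading = heading_deg
    for turn_deg, distance in steps:
        heading = (heading + turn_deg) % 360.0
        rad = math.radians(heading)
        x += distance * math.cos(rad)
        y += distance * math.sin(rad)
    return x, y, heading


def homing_vector(x, y, start=(0.0, 0.0)):
    """Return (distance, bearing_deg) from the current position back to start."""
    dx, dy = start[0] - x, start[1] - y
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0


# Walk 3 units straight ahead, turn 90 degrees, walk 4 units.
x, y, heading = integrate_path([(0.0, 3.0), (90.0, 4.0)])
distance_home, bearing_home = homing_vector(x, y)
print(f"position=({x:.2f}, {y:.2f}) home is {distance_home:.2f} away "
      f"at bearing {bearing_home:.1f} degrees")
```

Because every step adds to the running estimate, any small bias in the sensed turns or distances is carried forward and compounds; the sketch has no landmark input with which to reset the estimate, which is the correction the environment normally supplies.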
7 Implications for rationality How do people successfully navigate in the world? There are several possibilities, and like most biological systems, presumably all are realized. One is local, by routines, a well-learned sequence of local actions. This can take us on well-trodden routes, such as from home to office, sometimes without awareness. This will not work for getting between well-trodden routes without going back to “Go,” for old routes with detours, or for computing new routes. Then a global plan seems needed. The contrast between global and local is at least at three levels. Plans encompass a large environment, often conceived of from above an environment, whereas action representations are confined to a particular scene, usually conceived of from within an environment (for discussion with applications to robotics, see Chown et al. 1995; Kuipers 1978, 1982; Kuipers and Levitt 1988). Plans are general and schematic, incompletely specified; actions, by contrast, are both specific and specified. Finally, plans are amodal, whereas actions are precise movements of particular parts of the body in response to precise and specific cues. A route map can be likened to a musical score or sewing instructions; much is left to the artistry of the traveler, the conductor, the tailor, or seamstress. The gap between the two research communities, those that study the mind’s judgments and those that study the navigator’s actions, appears to have narrowed. Both communities aim for an understanding of the complex mechanisms that underlie different spatial activities. Despite the emphasis of the mind community on systematic error and the body community on precise behavior, both find both erroneous and correct responding. For both, correct responding demonstrates the action of the proposed mechanisms, and errors, the boundaries of the theories and of the accurate behaviors. 
Systematic errors of judgment and navigation have survived millennia of evolution and years of learning because they derive from general systems that serve cognition well in apprehending the world as well as behaving in it. The factors operating on these mechanisms are multiple, and the result is trade-offs that may work in general, but not in specific cases. These mechanisms serve to integrate information, to remember information, to retrieve it, and to manipulate it. These processes entail schematization of the information, eliminating some, exaggerating some, distorting some, processes that inevitably produce bias and error. Whatever correctives the world supplies affect local and specific judgments and actions; they do not affect the general
mechanisms that generate them. Humans, by their own account, make errors of judgment and action in domains other than space (see Tversky and Kahneman 1983 for errors in abstract domains, and a different account of them). At a global level, then, the cognitive mechanisms appear rational; at the local level of behavior, biased. The challenge, then, is to build an account of rationality, evolution, and learning that encompasses behavior that at one level of analysis appears reasonable and at another, replete with error.
Acknowledgment Preparation of the manuscript was supported by Office of Naval Research, Grants Number N00014-PP-1-0649 and N000140110717 to Stanford University.
Bibliography

Baddeley, A.D. (1990) Human Memory: Theory and Practice, Boston: Allyn and Bacon.
Baird, J. (1979) “Studies of the cognitive representation of spatial relations: I. Overview,” Journal of Experimental Psychology: General, 108: 90–91.
Baird, J., Merril, A., and Tannenbaum, J. (1979) “Studies of the cognitive representations of spatial relations: II. A familiar environment,” Journal of Experimental Psychology: General, 108: 92–98.
Bryant, D.J. and Tversky, B. (1999) “Mental representations of spatial relations from diagrams and models,” Journal of Experimental Psychology: Learning, Memory and Cognition, 25: 137–156.
Bryant, D.J., Tversky, B., and Franklin, N. (1992) “Internal and external spatial frameworks for representing described scenes,” Journal of Memory and Language, 31: 74–98.
Bryant, D.J., Tversky, B., and Lanca, M. (2001) “Retrieving spatial relations from observation and memory,” in E. van der Zee and U. Nikanne (eds), Conceptual Structure and its Interfaces with Other Modules of Representation, Oxford: Oxford University Press.
Chase, W.G. and Chi, M.T.H. (1981) “Cognitive skill: implications for spatial skill in large-scale environments,” in J.H. Harvey (ed.), Cognition, Social Behavior, and the Environment, Hillsdale, NJ: Erlbaum, 111–136.
Chown, E., Kaplan, S., and Kortenkamp, D. (1995) “Prototypes, location, and associative networks (PLAN): toward a unified theory of cognitive maps,” Cognitive Science, 19 (1): 1–51.
Etienne, A.S., Maurer, R., Georgakopoulos, J., and Griffin, A. (1999) “Dead reckoning (path integration), landmarks, and representation of space in a comparative perspective,” in R.G. Golledge (ed.), Wayfinding Behavior: Cognitive Mapping and Other Spatial Processes, Baltimore, MD: Johns Hopkins Press, 197–228.
Franklin, N. and Tversky, B. (1990) “Searching imagined environments,” Journal of Experimental Psychology: General, 119: 63–76.
Gallistel, C.R. (1989) “Animal cognition: the representation of space, time and number,” Annual Review of Psychology, 40: 155–189.
Gallistel, C.R. (1990) The Organization of Learning, Cambridge, MA: MIT Press.
Golledge, R.G. (ed.) (1999) Wayfinding Behavior: Cognitive Mapping and Other Spatial Processes, Baltimore, MD: Johns Hopkins Press.
Hirtle, S.C. and Jonides, J. (1985) “Evidence of hierarchies in cognitive maps,” Memory and Cognition, 13: 208–217.
Holyoak, K.J. and Mah, W.A. (1982) “Cognitive reference points in judgments of symbolic magnitude,” Cognitive Psychology, 14: 328–352.
Kahneman, D. (1974) Attention and Effort, New York: Prentice Hall.
Kahneman, D. and Tversky, A. (1979) “Prospect Theory: an analysis of decisions under risk,” Econometrica, 47: 263–291.
Kuipers, B. (1978) “Modeling spatial knowledge,” Cognitive Science, 2: 129–153.
Kuipers, B. (1982) “The ‘Map in the head’ metaphor,” Environment and Behavior, 14: 202–220.
Kuipers, B. and Levitt, T. (1988) “Navigation and mapping in large-scale space,” AI Magazine, 9(2).
Loomis, J.M., Klatzky, R.L., Golledge, R.G., and Philbeck, J.W. (1999) “Human navigation by path integration,” in R.G. Golledge (ed.), Wayfinding Behavior: Cognitive Mapping and Other Spatial Processes, Baltimore, MD: Johns Hopkins Press, 125–151.
Maki, R.H. (1981) “Categorization and distance effects with spatial linear orders,” Journal of Experimental Psychology: Human Learning and Memory, 7: 15–32.
McNamara, T.P. and Diwadkar, V.A. (1997) “Symmetry and asymmetry of human spatial memory,” Cognitive Psychology, 34: 160–190.
Milgram, S. and Jodelet, D. (1976) “Psychological maps of Paris,” in H. Proshansky, W. Ittelson, and L. Rivlin (eds), Environmental Psychology, 2nd edn, New York: Holt, Rinehart & Winston, 104–124.
Morrison, J.B. and Tversky, B. (in press) “Bodies and their parts,” Memory and Cognition.
Muller, M. and Wehner, R. (1994) “The hidden spiral: systematic search and path integration in desert ants, Cataglyphis fortis,” Journal of Comparative Physiology, A175: 525–530.
Newcombe, N. and Liben, L. (1982) “Barrier effects in the cognitive maps of children and adults,” Journal of Experimental Child Psychology, 34: 46–58.
Portugali, Y. (1993) Implicate Relations: Society and Space in the Israeli–Palestinian Conflict, the Netherlands: Kluwer.
Poulton, E.C. (1989) Bias in Quantifying Judgments, Hillsdale, NJ: Erlbaum Associates.
Rosch, E. (1975) “Cognitive reference points,” Cognitive Psychology, 7: 532–547.
Sadalla, E.K. and Magel, S.G. (1980) “The perception of traversed distance,” Environment and Behavior, 12: 65–79.
Sadalla, E.K. and Staplin, L.J. (1980) “The perception of traversed distance: intersections,” Environment and Behavior, 12: 167–182.
Sadalla, E.K., Burroughs, W.J., and Staplin, L.J. (1980) “Reference points in spatial cognition,” Journal of Experimental Psychology: Human Learning and Memory, 6: 516–528.
Stevens, A. and Coupe, P. (1978) “Distortions in judged spatial relations,” Cognitive Psychology, 10: 422–437.
Thorndyke, P. (1981) “Distance estimation from cognitive maps,” Cognitive Psychology, 13: 526–550.
Tversky, A. and Gati, I. (1978) “Studies of similarity,” in E. Rosch and B.B. Lloyd (eds), Cognition and Categorization, Hillsdale, NJ: Erlbaum, 79–98.
Tversky, A. and Kahneman, D. (1983) “Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment,” Psychological Review, 90: 293–315.
Tversky, B. (1981) “Distortions in memory for maps,” Cognitive Psychology, 13: 407–433.
Tversky, B. (1992) “Distortions in cognitive maps,” Geoforum, 23: 131–138.
Tversky, B. (1993) “Cognitive maps, cognitive collages, and spatial mental models,” in A.U. Frank and I. Campari (eds), Spatial Information Theory: A Theoretical Basis for GIS, Berlin: Springer-Verlag, 14–24.
Tversky, B. (2000a) “Levels and structure of cognitive mapping,” in R. Kitchin and S.M. Freundschuh (eds), Cognitive Mapping: Past, Present and Future, London: Routledge, 24–43.
Tversky, B. (2000b) “Remembering spaces,” in E. Tulving and F.I.M. Craik (eds), Handbook of Memory, New York: Oxford University Press, 363–378.
Tversky, B. (2001) “Spatial schemas in depictions,” in M. Gattis (ed.), Spatial Schemas and Abstract Thought, Cambridge, MA: MIT Press, 79–111.
Tversky, B. and Lee, P.U. (1998) “How space structures language,” in C. Freksa, C. Habel, and K.F. Wender (eds), Spatial Cognition: An Interdisciplinary Approach to Representation and Processing of Spatial Knowledge, Berlin: Springer-Verlag, 157–175.
Tversky, B. and Lee, P.U. (1999) “Pictorial and verbal tools for conveying routes,” in C. Freksa and D.M. Mark (eds), Spatial Information Theory: Cognitive and Computational Foundations of Geographic Information Science, Berlin: Springer, 51–64.
Tversky, B., Kim, J., and Cohen, A. (1999) “Mental models of spatial relations and transformations from language,” in C. Habel and G. Rickheit (eds), Mental Models in Discourse Processing and Reasoning, Amsterdam: North-Holland, 239–258.
Tversky, B., Morrison, J.B., and Zacks, J. (2002) “On bodies and events,” in A. Meltzoff and W. Prinz (eds), The Imitative Mind: Development, Evolution and Brain Bases, Cambridge: Cambridge University Press, 221–232.
Wilton, R.N. (1979) “Knowledge of spatial relations: the specification of information used in making inferences,” Quarterly Journal of Experimental Psychology, 31: 133–146.
8
Simulation and the evolution of mindreading
Chandra Sekhar Sripada and Alvin I. Goldman
The subject of mindreading, also known as “theory of mind,” or “folk psychology,” has several dimensions or questions. The principal questions include:
1 How do naïve individuals go about the task of attributing mental states (especially to others)?
2 How is the capacity, or skill, of mental attribution acquired in individuals?
3 How do naïve individuals understand, or represent, mental state concepts, such as desire or belief?
In addition to these central questions about mindreading, there are some closely associated questions, for example:
4 What can be learned about mindreading from its psychopathology (e.g., autism or schizophrenia)?
5 What is the evolutionary history of (human) mindreading?
This essay is addressed to questions (1) and (5). For any theory of how human beings mindread, i.e., what processes or routines they execute to arrive at their mental attributions, the question arises of how that theory fits with the evolution of the human brain. Though mindreading may be unique to the human brain, or at most confined to our very nearest relatives, there should nevertheless be a story of how this distinctive capacity evolved. Given two theories of mindreading that otherwise have comparable levels of confirmation, the theory that admits of a more defensible evolutionary account would be preferable. By contrast, a theory of mindreading is evidentially impaired if it is incompatible with any plausible evolutionary story. Thus, theories of mindreading can be “tested” by their mesh, or fit, with plausible accounts of human brain evolution.

In this essay we are concerned with a particular theory of mindreading: the simulation theory (ST). More precisely, we are concerned with this theory as applied to a specific, narrowly circumscribed subdomain of mental attribution. Although most discussions of mindreading focus on propositional attitudes (desires, beliefs, hopes, intentions, etc.), we shall concentrate on a different class of mental states, the (so-called) basic emotions. These are happiness, sadness, fear, disgust, anger, and surprise (Ekman 1992). The detection of such states through facial expressions seems to be a primitive form of mindreading (as discussed, e.g., by Darwin 1872). Thus, there is more likely to be a definite and distinctive story of brain evolution connected with the facial mindreading of basic emotions than, perhaps, the mindreading of other types of mental states. At any rate, we have proposals to make concerning this subdomain of mindreading, so we restrict ourselves to this arena. In particular, we make no claim that the form or style of mindreading utilized in the facial mindreading of basic emotions generalizes to other mindreading domains.1
1 Paired deficits in face-based emotion recognition

The mindreading task on which we concentrate may be called face-based emotion recognition (FaBER), where “recognition” is used in the sense of “attribution.” In another paper (Goldman and Sripada 2005), we argue that a simulation account of FaBER is well supported by recent work in neuropsychology; at least it is well supported for the mindreading of three of the basic emotions: fear, disgust, and anger. Let us review the neuropsychological findings and the reasons we adduce for their support of a simulationist account.

The principal evidence consists in findings, some clinical and some experimental, that reveal a pattern of paired deficits between emotion production (or experience) and face-based recognition (attribution) of the same emotion. In other words, patients who are neurologically impaired in experiencing a given emotion are also impaired, or abnormal, in their ability to detect that emotion in tests of emotion discrimination from observed facial expressions. An initial example of this pattern was described by Ralph Adolphs and colleagues (Adolphs et al. 1994): a patient, SM, who suffers from a rare metabolic disorder that resulted in bilateral destruction of her amygdalae. The amygdala is generally recognized as playing a prominent role in mediating fear, and SM was indeed abnormal in her experience of fear. Antonio Damásio, a co-author in the Adolphs et al. studies, writes:

S[M] does not experience fear in the same way you or I would in a situation that would normally induce it. At a purely intellectual level she knows what fear is supposed to be, what should cause it, and even what one may do in situations of fear, but little or none of that intellectual baggage, so to speak, is of any use to her in the real world.
(Damásio 1999: 66)

When tested on face-based recognition tasks, SM was found to be substantially abnormal. In FaBER tests, subjects are presented with photographs or
video slides showing facial expressions, and are asked to identify the emotion states to which the expressions correspond. SM’s ratings of fearful faces correlated less with normal ratings than did those of 12 other brain-damaged control subjects. Another subject, NM, who also had bilateral amygdala damage, was studied by Sprengelmeyer et al. (1999). Like SM, NM was abnormal in his experience of fear. He was prone to dangerous activities, such as hunting jaguar in the Amazon River basin and hunting deer in Siberia while dangling from a helicopter. He was also found to exhibit a severe and selective impairment in face-based recognition for fear (that is, an impairment for fear but not other emotions). Similar findings were made with a larger sample of nine patients with bilateral amygdala damage (including SM).

A precisely analogous finding – i.e., a finding of a paired deficit – has been made in connection with disgust. Calder and colleagues (2000) studied a patient NK who suffered insula and basal ganglia damage. Previous studies of disgust, in both animals and humans, had identified the anterior insula as a region associated with disgust (Rolls and Scott 1994; Small et al. 1999). On a questionnaire measure of the experience of disgust, patient NK’s overall score for disgust was significantly lower than that of control subjects, whereas his scores for anger and fear did not significantly differ from those of controls. Moreover, tests of NK’s ability to recognize emotions in faces showed him to have a significant and selective impairment in disgust recognition.

Finally, there is a parallel finding for the emotion of anger, though this time using an entirely different, experimental technique.
A number of theorists have proposed that the dopamine system is specialized for the processing of aggression in the context of agonistic encounters in a wide variety of species, and that this system plays an important role in mediating the experience of anger (Lawrence and Calder 2004). In rats and a number of other species, dopamine levels are elevated in agonistic encounters. It has also been found that dopamine antagonists, which reduce the level of dopamine, selectively impair responses to agonistic encounters. Andrew Lawrence and colleagues (2002) hypothesized that the administration of a dopamine antagonist (specifically, sulpiride) in normal humans would disrupt face-based recognition of anger, while sparing the recognition of other emotions. This is exactly what they found. Following sulpiride administration, subjects were found to be significantly worse at recognizing angry faces, though they displayed no such impairments in recognizing expressions of any other emotions. So we have, for three distinct emotions, the same pattern: deficits in the experience of an emotion and deficits in the face-based recognition of that emotion reliably co-occur.

A different kind of study, dealing exclusively with unimpaired subjects, provides additional evidence of a correlation between experiencing an emotion and perceiving a facial expression of the same emotion. Wicker et al. (2003) performed an fMRI study of disgust. Participants, who were entirely normal, did two types of tasks. First, they passively viewed movies of individuals smelling the contents of a glass, where the contents were either disgusting, pleasant, or neutral. Second, the same participants inhaled disgusting or pleasant odorants through a mask on their nose and mouth. Neuroimaging revealed that the same core areas of the brain were preferentially activated both during the experience of disgust and during the observation of facial expressions of disgust. These areas were the anterior insula and the right anterior cingulate cortex. This again indicates a firm correlation between experiencing and observing (facial expressions of) the same emotion.
2 Simulation theory versus theory–theory

What light does all this evidence shed on how people mindread (basic) emotions? Restricting attention to the two main types of mindreading approaches, ST and theory–theory (TT), we argue that the evidence supports ST for FaBER. To see why, consider the core ideas behind TT and ST. According to TT, a mindreader selects a mental state for attribution to a target based purely on inference from other information about the target. So TT is a purely information-based approach. It says that attributors engage in mindreading by deploying folk-psychological beliefs about people’s minds, about the ways that their mental states interact with the environment, their other mental states, and their behavior. The core idea in ST, by contrast, is that attributors select a mental state for attribution to a target by reproducing or “enacting” in their own minds the very state in question. In other words, an attributor replicates, or tries to replicate, a target’s mental state by undergoing the same or a similar mental process to one that the target undergoes. If she wants to attribute a future decision to a target, for example, she might try to replicate the target’s decision-making process in her own mind and use the output of this process as the decision to assign to the target.

Now the evidence of reliable co-occurrence of deficits in emotion experience and deficits in the face-based recognition of the same emotion bears directly on the simulationist hypothesis. If a normal person successfully mindreads via simulation, then she undergoes the same, or a relevantly similar, process to the one the target undergoes in using or arriving at the target state. Someone impaired in experiencing a given emotion will be unable to simulate a process that includes that emotion. Attempts at simulation will fail. Thus, ST predicts that a person damaged in experiencing fear would have trouble mindreading fear.
So the phenomenon observed in patient SM – a paired deficit in fear experience and fear recognition – is straightforwardly predictable under ST. Similarly, the correlation observed in patient NK – a paired deficit in disgust experience and disgust recognition – is predictable under ST. By contrast, there is no reason to expect such deficits under TT. There is no antecedent reason to expect impairment in having fear to produce impairment in having representations about fear. So if mindreading of emotions via facial expressions proceeds in a purely information-based fashion, there is no reason to predict impairment in fear recognition, given impairment in undergoing fear. And similarly for disgust and anger. So in each case, ST predicts the paired deficits, whereas TT would not predict them. Perhaps TT could be supplemented with some ad hoc assumptions that would render these correlations predictable. But ST predicts them in a perfectly natural way, without ad hoc assumptions.

Additional data from the paired deficit studies also run against the TT account. To explain poor performance on a FaBER task, TT could appeal, in principle, to either of two types of deficits: (1) loss of the capacity to discriminate the configural features of faces, or (2) loss of theoretical knowledge of generalizations linking configural facial features to the name of a target emotion. However, the paired deficit studies indicate that neither of these theorizing deficits obtains. In most of these studies, subjects performed perfectly normally on measures designed to identify difficulty with the perceptual processing of faces. For example, SM’s ability to recognize facial identity was fully preserved; she correctly identified 19 of 19 photographs of familiar faces, some of whom she had not seen for many years. Moreover, in the studies in question, deficits in FaBER routinely occurred alongside preservation of subjects’ general declarative knowledge regarding emotions. For example, patients with impairments in disgust recognition were able to provide plausible situations in which a person might feel disgusted, and did not show impaired knowledge about the concept of disgust. So the kinds of theorizing deficits to which TT might appeal were not in fact found in the subjects with paired deficits.
For these and related reasons, we conclude that FaBER, a distinctive type of mindreading task, is executed via simulation, that is, by some sort of process that involves the enactment, or realization, at a subthreshold level, of the same emotion that the attributor mindreads in the target. This comports well with the Wicker et al. (2003) finding, for the case of disgust, that the same neural region distinctive to experiencing disgust is also activated when one observes a disgust expression in a target’s face. To say that a simulation process is involved in FaBER is not yet to say what specific simulation process is involved. In Goldman and Sripada (2005), we propose several candidate processes that might be the simulation process in question. We do not feel, however, that there is enough evidence at present to select one of these candidate processes. Nonetheless, we feel that current evidence is strong enough to underwrite the conclusion that some sort of simulation is involved.
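The logic of the paired-deficit argument can be made concrete with a small toy model. The sketch below is purely illustrative and is not drawn from the studies cited: the `Agent` class, the two recognition routines, and the set-based representation of an emotion system are all hypothetical simplifications. It assumes only what the argument states, namely that a simulation routine reuses the observer's own capacity to experience an emotion, whereas a theory-based routine consults stored knowledge alone.

```python
from dataclasses import dataclass, field

# Toy emotion inventory (illustrative; not the full list of basic emotions).
EMOTIONS = ("fear", "disgust", "anger", "happiness")


@dataclass
class Agent:
    # Emotions the agent can still experience (hypothetical representation).
    can_experience: set = field(default_factory=lambda: set(EMOTIONS))
    # Declarative face-to-emotion knowledge of the kind TT would posit.
    face_knowledge: set = field(default_factory=lambda: set(EMOTIONS))


def recognize_by_simulation(agent, shown_emotion):
    """ST-style routine: succeeds only if the observer can re-enact
    the displayed emotion in her own emotion system."""
    return shown_emotion if shown_emotion in agent.can_experience else None


def recognize_by_theory(agent, shown_emotion):
    """TT-style routine: depends only on stored knowledge about faces."""
    return shown_emotion if shown_emotion in agent.face_knowledge else None


def impaired(agent, routine):
    """List the emotions the agent fails to recognize under a routine."""
    return [e for e in EMOTIONS if routine(agent, e) is None]


# A toy patient resembling SM: fear experience lost, knowledge intact.
patient = Agent(can_experience=set(EMOTIONS) - {"fear"})
st_deficits = impaired(patient, recognize_by_simulation)
tt_deficits = impaired(patient, recognize_by_theory)
print(st_deficits)  # ['fear']
print(tt_deficits)  # []
```

In this toy setting the simulation routine yields a selective recognition deficit for exactly the emotion whose experience is lost, while the theory-based routine yields none. That is the asymmetry in predictions on which the argument rests; the model says nothing about which specific simulation process is actually involved.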
3 The evolution of face-based emotion recognition

Thus far, we have adduced clinical and experimental evidence suggesting that FaBER is subserved by simulation routines rather than theory-based routines. Now we turn to another type of evidence, evolutionary evidence, that bears on the question of what kinds of mechanisms or processes are utilized in FaBER. Little has been written about how evolutionary considerations might play a role in providing evidential support for either the ST or TT approach to mindreading. In what follows, we will introduce two different kinds of evolutionary argument that we feel support simulation routines in the domain of FaBER. Overall, evolutionary arguments of the type we’ll provide are not themselves decisive in favor of simulation. Rather, these evolutionary arguments complement existing lines of evidence, extending the evidential basis with which one might resolve competing hypotheses, as well as suggesting further avenues of inquiry.

Before proceeding, it’s worth reemphasizing that FaBER differs in important ways from other forms of mindreading, such as attributions of propositional attitudes. Because of the differences between FaBER and other types of mindreading, it cannot be assumed that the processes characteristic of FaBER can be extrapolated to other types of mindreading. Nor can it be assumed that the evolutionary considerations relevant to FaBER will also be relevant to other kinds of mindreading. With these caveats in mind, we commence our discussion of two different kinds of evolutionary argument relevant to the question of what kinds of mechanisms or processes are utilized in FaBER.
4 The argument from ecological rationality

The first evolutionary argument we present emphasizes the manner in which simulation routines might have been favored by natural selection because they efficiently exploit the information structure of the environment. Evolutionary psychologists have argued that when inferring the evolved structure of human psychology it is important to look at the adaptive problems that faced organisms in the ancestral past (Cosmides and Tooby 1992, 1994). Proponents of the so-called “ecological rationality” approach in evolutionary psychology have emphasized that theorists should pay heed to the informational structure of adaptive problems (Gigerenzer et al. 1999). They have argued, quite persuasively in our view, that natural selection often takes advantage of the recurrent informational structure of an adaptive problem in building inferential short cuts and heuristics. Here is a passage representative of this idea:

Standard statistical models, and standard theories of rationality, aim to be as general as possible, so they make as broad and as few assumptions as possible about the data to which they will be applied. But the way information is structured in real-world environments often does not follow convenient simplifying assumptions. For instance, whereas most statistical models are designed to operate on data sets where means and variances are independent, Karl Pearson . . . noted that in natural situations these two measures tend to be correlated, and thus each can be used as a cue to infer the other. . . . While general statistical methods strive to ignore such factors that could limit their applicability, evolution would seize upon informative environmental dependencies such as this one and exploit them with specific heuristics if they would give a decision-making organism an adaptive edge.
(Gigerenzer and Todd 1999: 19)
The social learning of food preferences in the rat nicely illustrates the underlying idea behind the evolution of ecologically rational heuristics. Omnivorous species such as the Norway rat (Rattus norvegicus) face the problem that they must identify which of a large number of available foods are safe to eat and which are poisonous, where the costs of making an error can be quite severe. Through an elegant series of experiments, Bennett Galef and his colleagues have found that the Norway rat uses an efficient and highly effective heuristic for food selection: a food is safe to eat if its odor is detectable on the breath of one’s living conspecifics (Galef 1987). For obvious reasons, the property of being on the breath of one’s living conspecifics is reliably correlated with the property of being safe to eat. Were a food poisonous, then its odor would not be on the breath of one’s fellow living rats! In effect, the property of being on the breath of one’s conspecifics serves as a reliable and readily detectable proxy for a much more difficult to detect property – being safe to eat. It is not surprising, then, that natural selection exploited this reliable correlation present in the environment in building this simple and efficient social learning heuristic for food selection in the Norway rat.

In this section, we offer an argument in favor of simulation routines for FaBER that is based on considerations of “ecological rationality” along these lines. We begin by noting that mindreading is an activity that occurs, almost exclusively, in cases in which the target and the attributor are both members of the same species. For example, human beings routinely seek to detect and classify the mental states of their friends, family, neighbors, and others in their social milieu. Thus a reasonable generalization is that mindreading, except in rare and atypical cases, is an intra-species attribution task. This fact appears to be so obvious it is hardly worth noting.
But the fact that attributor and target are members of the same species is quite significant because it changes quite dramatically the informational structure of mindreading conceived of as an adaptive problem. Because the task of mindreading occurs among members of the same species, it is a reliable feature of the mindreading task that the target of one’s mental attributions can be presumed to share the mental processes of the attributor. Why should this relevant similarity reliably obtain? Members of one’s own species are related to one another by close ties of common ancestry. Barring cases of severe disorders or defects, individuals within a species can be expected to share, to a substantial degree, the same cognitive mechanisms, structures and processes. Moreover, if we restrict ourselves to the mental processes relevant to FaBER, the degree of sharing will be higher still. FaBER concerns the attribution of emotions and, as we noted earlier, basic emotions are thought to have a strong innate and species-typical basis. For this reason, members of one’s own species can be reliably expected to share virtually the same cognitive equipment of the kind relevant to FaBER.

Simulation is a heuristic that works precisely by taking advantage of this type of sharing. Whether or not a simulator employs a tacit premise that the target is relevantly like herself, the routine can yield good results only if there is, in fact, a relevant similarity between them (Heal 1986; Goldman 1989). When this relevant similarity obtains, the organism can exploit its own cognitive equipment to make accurate attributions of mental states to the target. The organism does not need to possess databases of information or specialized mechanisms of inference required for theory-based mindreading. Rather, it simply exploits the fact that its own mental processes are an excellent facsimile of the target’s. To sum up, then, simulation is an efficient heuristic if the attributor and the target share relevantly similar cognitive mechanisms and processes. It seems plausible that natural selection will have exploited the fact that this relevant similarity will have reliably obtained during the ancestral past in building simulation routines for FaBER.
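The dependence of simulation on shared equipment can be illustrated with a minimal sketch. Everything in it is a hypothetical simplification introduced for illustration: an agent's emotion system is reduced to a mapping from emotions to the expressions they produce, the expression labels are invented, and the "dissimilar" target stands in for any target whose mapping differs from the attributor's. Under those assumptions, an attributor who reads a face by asking which of her own emotions would produce it is accurate exactly to the extent that target and attributor share the mapping.

```python
def simulate_attribution(own_mapping, observed_expression):
    """Attribute the emotion that would produce this expression in oneself."""
    for emotion, expression in own_mapping.items():
        if expression == observed_expression:
            return emotion
    return None  # the expression matches nothing in the attributor's repertoire


def accuracy(attributor_mapping, target_mapping):
    """Fraction of the target's emotions that simulation attributes correctly."""
    hits = 0
    for emotion, expression in target_mapping.items():
        if simulate_attribution(attributor_mapping, expression) == emotion:
            hits += 1
    return hits / len(target_mapping)


# Hypothetical emotion-to-expression mappings (labels are illustrative only).
conspecific = {
    "fear": "wide eyes",
    "disgust": "wrinkled nose",
    "anger": "lowered brow",
}
attributor = dict(conspecific)  # same species: same mapping
dissimilar = {
    "fear": "wrinkled nose",  # fear and disgust expressions swapped
    "disgust": "wide eyes",
    "anger": "lowered brow",
}

same_species_accuracy = accuracy(attributor, conspecific)
cross_accuracy = accuracy(attributor, dissimilar)
print(same_species_accuracy)  # 1.0
print(cross_accuracy)
```

The attributor here stores no information about the target at all; she only consults her own mapping. When the mappings coincide the routine is fully accurate, and when they diverge it fails on exactly the emotions where they diverge (two of three in this toy case). This is the sense in which the heuristic is cheap but reliable only given relevant similarity.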
5 The argument from exaptation

The second kind of evolutionary argument we consider emphasizes the evolvability of simulation routines. Natural selection often builds new capacities from existing ones, rather than starting from scratch. Traits that originally evolved for other uses, and are subsequently co-opted for their current purpose, are called exaptations (Gould and Vrba 1982), and the exapting of preexisting traits is a familiar pathway by which natural selection builds many complex traits. Mother Nature is a tinkerer rather than a planner – it is easier for her to redeploy existing capacities than to design an entirely novel structure. So if a complex trait can be evolved from a preexisting one, other things being equal, that is a more likely evolutionary path.

The evolution of flight in bird species illustrates the idea of exaptation nicely. Many theorists have noted that traits such as feathered wings for flight in birds would have had a difficult time evolving, since it would hardly be any use at all to have just one part of a wing, while evolving entire feathered wings all at once would be highly improbable. The process of exaptation helps address this difficulty. According to one influential hypothesis, feathers originally evolved for the purpose of thermoregulation, while wing-like appendages originally evolved for the purpose of catching insects. According to this hypothesis, these two traits were later co-opted in building feathered wings for the purpose of flight (Ostrom 1974, 1979, cited in Gould and Vrba 1982). More generally, exaptation provides a crucial evolutionary pathway for building complex new capacities. When an existing capacity is exapted for another purpose, this reuse of the existing capacity serves to “smooth” the evolutionary gradient over which natural selection operates, making the new capacity much more readily evolvable.

By the criterion of evolvability, simulation routines for FaBER appear to be more evolvable than theory-based routines. This is because simulation routines naturally arise by co-opting preexisting capacities that the organism already possesses. The fundamental idea of simulational mindreading is that the attributor exploits his or her own cognitive equipment to select attributions of mental states to others. In the case of FaBER the individual would somehow use her own emotion system for the purpose of emotion attribution. Theory-based approaches to FaBER, by contrast, would require the organism to be outfitted with brand new functionally complex inferential mechanisms and/or informational databases, in order to acquire a capacity for mindreading emotions. Thus simulation routines have an evolutionary advantage over theory-based routines in that they can exploit preexisting mechanisms and processes for the purposes of mindreading. We shall identify several types of preexisting processes or traits that might be eligible for exaptation for mindreading purposes, and in particular for the purposes of FaBER.

In the evolutionary scenario we envision, important precursors for the routines that underlie FaBER are the processes utilized in the closely related phenomenon of emotion contagion. A number of theorists have noted the phenomenon of emotion contagion in the animal world, including human beings. In emotion contagion, the occurrence of an emotion in one individual plays a causal role in generating a similar emotion in another individual.
This is a kind of simulation, though not yet a specimen of mindreading simulation because the “receiver” need not impute the mental state experienced to the “sender.” However, once a process of emotion contagion is established, it could easily be exapted into simulational mindreading. In what follows, the structure of our argument is as shown here. First we offer several evolutionary rationales for why the phenomenon of emotion contagion would have been adaptive and would evolve. Then we sketch several models of the processes that underlie emotion contagion. Finally we show that there is a readily available evolutionary pathway by which the processes that underlie emotion contagion can be transmuted into simulation routines for mindreading. Our account highlights the manner in which various preexisting mechanisms and processes might have been redeployed for new purposes during the course of the evolution of simulation routines for FaBER, making such routines more readily evolvable. The case of disgust illustrates the adaptive rationale behind emotion contagion quite vividly. As many writers point out (e.g., Wicker et al. 2003), it would generally be adaptive to experience disgust when others display this emotion. Disgust is frequently experienced in response to a food item that should not be eaten. If an individual observes a conspecific having such a response to a food item, it would be adaptive for that individual to have the
same disgust response vis-à-vis that food item, in order to induce avoidance. A similar logic applies to other sources of disgust, such as disease or putrefaction, the avoidance of which would also be adaptive. The hypothesis that disgust is contagious in this way predicts that in normal subjects, the neural mechanisms implicated in the production of an emotion should be automatically activated when emotion-expressive faces are visually presented. In the Wicker et al. (2003) study described earlier, precisely this pattern was found in the specific case of disgust.

The case of fear provides another example of the adaptive rationale for emotion contagion. Fear is often produced by an environmental stimulus that is a danger or threat to the subject, and often to others in the subject’s vicinity. For this reason, it will frequently be adaptive to experience fear when another displays this emotion – even if one does not see the rationale for the fear oneself. The strategy of emotion contagion with respect to fear undoubtedly yields many false positives. Nevertheless, it may be evolutionarily favored because of the long-term fitness benefits that accrue from following the adage “better safe than sorry.” Thus there may be strong selection pressures favoring emotion contagion with respect to fear.

Furthermore, the logic we’ve applied to disgust and fear may also generalize to other emotions. If others in one’s group detect a rationale in their immediate environment for experiencing some emotion, it may well be an adaptive strategy to experience this emotion oneself, even if one does not detect the rationale oneself. Other lines of evidence do indeed support the existence of emotion contagion for these other emotions as well (see Hatfield et al. 1994 for a review).

What is the structure of the processes that underwrite emotion contagion?
One model of how emotion contagion might be implemented is that it is mediated by feedback from facial expressions. A number of authors have noted that manipulation of the facial musculature, either voluntarily or involuntarily, has a causal effect in generating, at least in attenuated form, the corresponding emotion state and its cognitive and physiological correlates (Tomkins 1962; Laird and Bressler 1992). According to this so-called facial feedback model, when an emotion-expressive face is displayed, subjects rapidly and covertly mimic this facial expression. The generation of this mimicked facial expression in turn serves to produce experience of the corresponding emotion state. Support for the facial feedback model comes from a number of studies which find that when presented with emotion-expressive faces, subjects rapidly and covertly display electromyographically detectable activation of facial musculature in a pattern that mimics the displayed faces (Lundquist and Dimberg 1995; Dimberg and Thunberg 1998).

Another hypothesis about how emotion contagion might be implemented proposes that there is a much more direct relationship between the observation of an emotion-expressive face and experience of the corresponding emotion. According to what we may call the direct resonance model, observation of emotion-expressive faces directly triggers the corresponding emotion
state in the observer, without mediation by the facial musculature. This is the idea behind Gallese’s (2001, 2003) “shared manifold hypothesis,” and is suggested by Wicker et al. (2003: 661) when they speak of an automatic sharing, by the observer, of the displayed emotion. The existence of mechanisms that underwrite this “direct resonance” for emotions would parallel findings of mirror-neuron matching systems in monkeys and humans, in which internal action representations, normally associated with producing actions, are triggered during the observation of, or listening to, someone else’s corresponding actions (Gallese et al. 1996; Kohler et al. 2002; Rizzolatti et al. 2001). Of course there may be other hypotheses about the processes that underwrite emotion contagion.

But now let us turn to the question of how the phenomenon of emotion contagion might provide an evolutionary argument in favor of simulation routines for FaBER. Suppose there is a process that underwrites emotion contagion in the case of disgust, where this process operates by means of facial feedback, direct resonance, or perhaps some other means. This process generates the experience of disgust in an individual when s/he observes someone else’s disgust-expressive face. Once a process of this sort is established, it can readily be exapted into simulational mindreading. The point is simple. If emotion state E in sender S reliably triggers a corresponding emotion state E in receiver R, at least when R sees S’s facial expression, then R is in a position to capitalize on her own experience of E to impute it to S. She simply has to acquire a process or propensity to classify her own state appropriately and assign the so-classified state to S. This process is a simulation process because it involves duplication or replication of the state imputed to the target within the attributor.
So there is a readily available evolutionary pathway to a simulation routine for face-based disgust attribution. In contrast, no similar evolutionary pathway is readily available to a theory-based routine for face-based disgust attribution. Similar reasoning would apply to other emotions in which there are established processes of contagion.2 The evolutionary argument we are offering here should not be seen simply as a kind of “just so” story for which there is no independent evidential support. Rather, as we’ve emphasized, there are good independent reasons to believe that the phenomenon of emotion contagion exists. There is a large literature in psychology and related disciplines that attests to the pervasiveness of emotion contagion in animal and human interactions. Furthermore, there are good independent reasons to believe that simulation routines are in fact utilized in FaBER. The paired deficit data we’ve adduced, and other kinds of data, suggest that emotion recognition is causally dependent on emotion production, as the simulation hypothesis predicts. What our current evolutionary argument suggests is that there is a plausible evolutionary scenario that links these two independently corroborated claims. In particular, this scenario envisions that simulation routines for FaBER may have emerged as simple transmutations of processes underlying emotion
contagion, where these emotion contagion processes originally emerged for other non-mindreading-related purposes. Thus we believe that our evolutionary scenario extends the evidential support for the simulation account of FaBER, and is not merely a “just so” story.

Finally, the evolutionary scenario we’ve proposed in which simulation routines for FaBER are exapted from preexisting emotion contagion routines is open to further empirical testing. An especially important kind of data in assessing evolutionary hypotheses is comparative data from various animal species. Such data is often invaluable in piecing together the evolutionary trajectory by which some trait emerged. Emotion contagion has been investigated extensively in a number of animal species, including pigeons, rats, apes, and monkeys (for a review, see Preston and De Waal 2002), and its existence has been confirmed in many of these species. Studies of emotion recognition in non-human animals, in contrast, are virtually non-existent. A notable exception is the recent study by Linda Parr that finds that chimpanzees may have at least rudimentary abilities to recognize the emotion states of conspecifics (Parr 2001). Further study aimed at identifying the processes that subserve emotion contagion and emotion recognition in various animal species, as well as other kinds of comparative data, will undoubtedly further enhance the evidential basis for assessing the evolutionary hypothesis we have sketched.
Notes

1 Elsewhere, however, Goldman develops the case for a simulation (or simulation-theory hybrid) account of other domains of mindreading (Goldman, in preparation).

2 The processes of emotion contagion might themselves have evolved via the redeployment of preexisting mechanisms. The facial feedback model, in particular, naturally lends itself to this interpretation. According to a widely accepted account, the production of basic emotions such as fear, anger, and disgust is subserved by so-called affect programs (Ekman 1992), which are innate and phylogenetically old neural mechanisms that underwrite specific emotions. Upon triggering by the appropriate stimuli, individual affect programs generate a suite of coordinated responses including an emotion-specific facial expression. A natural interpretation of the facial feedback model for emotion contagion is that it evolves by co-opting preexisting links within affect programs between the experience of an emotion and emotion-specific facial expressions. Facial feedback essentially amounts to running these preexisting links in the reverse direction.
Bibliography

Adolphs, R., Tranel, D., Damásio, H., and Damásio, A. (1994) “Impaired recognition of emotion in facial expressions following bilateral damage to the amygdala,” Nature, 372: 669–672.
Calder, A.J., Keane, J., Manes, F., Antoun, N., and Young, A.W. (2000) “Impaired recognition and experience of disgust following brain injury,” Nature Neuroscience, 3: 1077–1078.
Cosmides, L. and Tooby, J. (1992) “Cognitive adaptations for social exchange,” in J. Barkow, L. Cosmides, and J. Tooby (eds), The Adapted Mind: Evolutionary Psychology and the Generation of Culture, New York: Oxford University Press, 163–228.
Cosmides, L. and Tooby, J. (1994) “Origins of domain specificity: the evolution of functional organization,” in L. Hirschfeld and S. Gelman (eds), Mapping the Mind: Domain Specificity in Cognition and Culture, New York: Cambridge University Press, 85–116.
Damásio, A. (1999) The Feeling of What Happens, New York: Harcourt Brace and Company.
Darwin, C. (1872) The Expression of the Emotions in Man and Animals, London: Murray.
Dimberg, U. and Thunberg, M. (1998) “Rapid facial reactions to emotional facial expressions,” Scandinavian Journal of Psychology, 39: 39–45.
Ekman, P. (1992) “Are there basic emotions?,” Psychological Review, 99(3): 550–553.
Galef, B.G. (1987) “Social influences on the identification of toxic foods by Norway rats,” Animal Learning and Behavior, 15: 327–332.
Gallese, V. (2001) “The ‘shared manifold’ hypothesis: from mirror neurons to empathy,” Journal of Consciousness Studies, 8(5–7): 33–50.
Gallese, V. (2003) “The manifold nature of interpersonal relations: the quest for a common mechanism,” Philosophical Transactions of the Royal Society (Series B: Biological Sciences), 358(1431).
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996) “Action recognition in the premotor cortex,” Brain, 119: 593–609.
Gigerenzer, G. and Todd, P. (1999) “Fast and frugal heuristics: the adaptive toolbox,” in G. Gigerenzer, P. Todd, and the ABC Research Group, Simple Heuristics that Make Us Smart, New York: Oxford University Press, 3–36.
Gigerenzer, G., Todd, P., and the ABC Research Group (1999) Simple Heuristics that Make Us Smart, New York: Oxford University Press.
Goldman, A.I. (1989) “Interpretation psychologized,” Mind and Language, 4: 161–185.
Goldman, A.I. (in preparation) Simulating Minds: The Philosophy, Psychology and Neuroscience of Mindreading, New York: Oxford University Press.
Goldman, A.I. and Sripada, C.S. (2005) “Simulationist models of face-based emotion recognition,” Cognition, 94: 193–213.
Gould, S. and Vrba, E. (1982) “Exaptation: a missing term in the science of form,” Paleobiology, 8: 4–15.
Hatfield, E., Cacioppo, J., and Rapson, R. (1994) Emotional Contagion, New York: Cambridge University Press.
Heal, J. (1986) “Replication and functionalism,” in J. Butterfield (ed.), Language, Mind and Logic, Cambridge: Cambridge University Press, 135–150.
Kohler, E., Keysers, C., Umilta, M.A., Fogassi, L., Gallese, V., and Rizzolatti, G. (2002) “Hearing sounds, understanding actions: action representation in mirror neurons,” Science, 297: 846–848.
Laird, J.D. and Bressler, C. (1992) “The process of emotion experience: a self-perception theory,” in M. Clark (ed.), Review of Personality and Social Psychology, vol. 13: Emotion, New York: Sage Publishers, 213–234.
Lawrence, A.D. and Calder, A.J. (2004) “Homologizing human emotions,” in D. Evans and P. Cruse (eds), Emotions, Evolution and Rationality, New York: Oxford University Press, 15–47.
Lawrence, A.D., Calder, A.J., McGowan, S.M., and Grasby, P.M. (2002) “Selective disruption of the recognition of facial expressions of anger,” NeuroReport, 13(6): 881–884.
Lundquist, L. and Dimberg, U. (1995) “Facial expressions are contagious,” Journal of Psychophysiology, 9: 203–211.
Ostrom, J.H. (1974) “Archaeopteryx and the origin of flight,” Quarterly Review of Biology, 49: 27–47.
Ostrom, J.H. (1979) “Bird flight: how did it begin?,” American Scientist, 67: 46–56.
Parr, L. (2001) “Cognitive and physiological markers of emotion awareness in chimpanzees (Pan troglodytes),” Animal Cognition, 4: 223–229.
Preston, S. and De Waal, F. (2002) “Empathy: its ultimate and proximate bases,” Behavioral and Brain Sciences, 25(1): 1–20.
Rizzolatti, G., Fogassi, L., and Gallese, V. (2001) “Neurophysiological mechanisms underlying the understanding and imitation of action,” Nature Reviews Neuroscience, 2: 661–670.
Rolls, E.T. and Scott, T.R. (1994) “Central taste anatomy and neurophysiology,” in R.L. Doty (ed.), Handbook of Olfaction and Gustation, New York: Dekker.
Small, D.M., Zald, D.H., Jones-Gotman, M., Zatorre, R.J., Pardo, J.V., Frey, S., and Petrides, M. (1999) “Brain imaging: human cortical gustatory areas: a review of functional neuroimaging data,” NeuroReport, 10: 7–14.
Sprengelmeyer, R., Young, A.W., Schroeder, U., Grossenbacher, P.G., Federlein, J., Buttner, T., and Przuntek, H. (1999) “Knowing no fear,” Proceedings of the Royal Society (Series B: Biology), 266: 2451–2456.
Tomkins, S. (1962) Affect, Imagery, Consciousness: The Positive Affects, vol. 1, New York: Springer.
Wicker, B., Keysers, C., Plailly, J., Royet, J.-P., Gallese, V., and Rizzolatti, G. (2003) “Both of us disgusted in my insula: the common neural basis of seeing and feeling disgust,” Neuron, 40: 655–664.
9
Enhancing and augmenting human reasoning Tim van Gelder
Two pioneers of computing and intelligence If you ask philosophers or cognitive scientists to name the most important pioneer in the general field of computing and intelligence, they would probably pick Alan Turing, the English logician and computer scientist who arguably did more than any other single person to put disciplines such as computer science, cognitive science, and artificial intelligence on a solid theoretical footing. If you asked the same question in Silicon Valley, you might well be told of someone much less well-known in academic circles, even though his impact on the academy has been at least as pervasive. Douglas Engelbart was not a theoretician but an engineer, and his effect has been not on the ideas and debates but on the day-to-day activity of scientists and philosophers, as well as virtually every other “knowledge worker” in the ever-expanding information-based economy. Engelbart was concerned with the power and productivity of human intellectual activity. Where Turing focused on making computers smart, Engelbart focused on using computers to make us smarter: artificial intelligence (AI) versus intelligence augmentation (IA). His concern, moreover, was with intelligence not in the abstract or in the laboratory, but in real, everyday work situations, dealing with the kind of urgent practical problems upon which the quality of human life directly turns. In a landmark technical report “Augmenting Human Intellect: A Conceptual Framework,” he wrote: By “augmenting human intellect” we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems. 
Increased capability in this respect is taken to mean a mixture of the following: more-rapid comprehension, better comprehension, the possibility of gaining a useful degree of comprehension in a situation that previously was too complex, speedier solutions, better solutions, and the possibility of finding solutions to problems that before seemed insoluble.
And by “complex situations” we include the professional problems of diplomats, executives, social scientists, life scientists, physical scientists, attorneys, designers – whether the problem situation exists for twenty minutes or twenty years. We do not speak of isolated clever tricks that help in particular situations. We refer to a way of life in an integrated domain where hunches, cut-and-try, intangibles, and the human “feel for a situation” usefully co-exist with powerful concepts, streamlined terminology and notation, sophisticated methods, and high-powered electronic aids. (Engelbart 1962)

To meet this challenge, Engelbart and his team invented an integrated cluster of tools and techniques, including the mouse, multiple windows, hypertext linking, integrated text and graphics, and many others. Collectively, these innovations form the foundations of personal computing. They transformed the computer from a symbol manipulator used only by highly trained specialists into a piece of everyday office equipment, used by even the most minimally competent to expedite their ordinary intellectual activities (Bardini 2000).

About 50 years ago Turing predicted that by around the current time computers would come to be widely accepted as intelligent, as measured by his own famous test (Turing 1950). This prediction turned out to be well wide of the mark. A decade later, Engelbart predicted that in a similar time frame computers would be widely used to augment human intelligence. This prediction turned out to be right on the money, thanks in no small part to his own contributions. In “Augmenting Human Intellect,” he described a system in which statements were entered into the computer using voice recognition, and were then displayed on a screen.
With the aid of a keyboard and a “light pen,” these statements could be edited, deleted, and spatially grouped; and then links could be drawn to indicate the argumentative structure, i.e., the evidential or logical relationships among the statements. Once this argumentative structure had been displayed, the user had a range of techniques for changing how it was viewed: for example, parts of the structure could be selectively hidden; the user could zoom in or out; some parts could be magnified more than the others; and so forth. In the previous paragraph I described this system in prose, and left it to you (the reader) to imagine the system in action. This was a deliberate if feeble attempt to illustrate the feat involved in conceiving such a system back in 1962. For when Engelbart first described this reasoning-manipulating system, nothing remotely like it existed; indeed, all the essential components (on-screen editing, etc.) which are so familiar to us today, and out of which we might build our own mental conception of the system, had yet to be developed.
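The node-and-link structure described above can be illustrated with a minimal sketch. It is not a reconstruction of Engelbart's system: the class names, the relation labels ("supports", "opposes"), and the example statements are all hypothetical, chosen only to show statements, links between them, and the selective hiding of parts of the structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement (node) in the map; hypothetical structure."""
    text: str
    hidden: bool = False


@dataclass
class ArgumentMap:
    """A toy node-and-link argument map (illustrative assumption only)."""
    statements: dict[str, Statement] = field(default_factory=dict)
    # Each link is (source_key, target_key, relation).
    links: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, key: str, text: str) -> None:
        self.statements[key] = Statement(text)

    def link(self, source: str, target: str, relation: str) -> None:
        # A link may only join statements that are already in the map.
        if source not in self.statements or target not in self.statements:
            raise KeyError("both ends of a link must be existing statements")
        self.links.append((source, target, relation))

    def hide(self, key: str) -> None:
        self.statements[key].hidden = True

    def visible_links(self) -> list[tuple[str, str, str]]:
        # A link is shown only if neither of its endpoints is hidden.
        return [
            (s, t, r)
            for (s, t, r) in self.links
            if not self.statements[s].hidden and not self.statements[t].hidden
        ]


# Hypothetical example content.
amap = ArgumentMap()
amap.add("claim", "Computers can augment human reasoning")
amap.add("reason", "Diagrams make complex structure easier to grasp")
amap.add("objection", "Drawing maps takes time")
amap.link("reason", "claim", "supports")
amap.link("objection", "claim", "opposes")
amap.hide("objection")
print(amap.visible_links())  # [('reason', 'claim', 'supports')]
```

Hiding a statement leaves the underlying links intact and only changes what is displayed, which is one simple way to model the "selectively hidden" view.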
Argument mapping

A core aspect of Engelbart’s proposed argumentation support system was the use of node-and-link-type diagrams for exhibiting and then manipulating argumentation structure. How did he get this idea? Apparently he was at that time unaware of work done by other people using diagrams to represent the structure of evidence or reasoning.1 Innovations often result from the insight that an idea which works well in one domain might be carried over and applied in a similar way in a quite different domain. For example, a brilliant engineer of an earlier era, James Watt, developed his centrifugal governor for regulating the speed of a steam engine as an adaptation of similar devices already in use for automatic control in windmills (van Gelder 1995). Presumably Engelbart, as an engineer, was aware of ways in which box and arrow diagrams could be used to assist thinking about complex structures; for example, the use of flowcharts. When he asked how reasoning might be represented, these models were already at hand.

Independent work in the diagrammatic representation of complex argumentation goes back decades before Engelbart’s proposal. In the early twentieth century, John Henry Wigmore had developed a sophisticated graphical system for analysing and representing the structure of evidence in legal proceedings (Wigmore 1913). In this system of nodes and links, a node might represent a piece of evidence such as “J. was the author of the letter, though it was in a fictitious name,” and a line between that node and others represented its role in a complex structure of argument aimed ultimately at establishing the guilt of the defendant. To my knowledge Wigmore’s legal maps are the earliest occurrence, anywhere in the world, of argument maps, i.e., diagrammatic representations of the structure of natural language (“informal”) argumentation. If that is right,2 it is truly remarkable, for the basic idea in argument mapping is very simple.
Arguments have structure, often quite complex, and everyone knows that complex structure is generally more easily understood and conveyed in visual or diagrammatic form. That is why, for example, we have street maps rather than verbal descriptions of the layout of cities. This simple principle has been applied in any number of domains far beyond the mapping of spatial layout, yet it was apparently not until the twentieth century that the inferential structure of real-world argumentation was graphically depicted. There had of course been many different kinds of diagrams used in various ways in logic and reasoning (Gardner 1983), with Venn diagrams for syllogistic reasoning perhaps the best-known example. None of these, however, depicted the inferential relationships between whole propositions of ordinary argumentation, which is the essence of argument mapping; they were mostly concerned with elucidating logical structure of a more fine-grained nature. Wigmore started something of a tradition within the study of legal reasoning (Anderson and Twining 1998), though it was at most a minor
tributary of legal theory, and almost completely irrelevant to the torrent of actual legal practice. Argument mapping also emerged in various other quarters of the academy, most notably in philosophy with Stephen Toulmin’s influential book The Uses of Argument (Toulmin 1958). Interested in argument but dissatisfied with the tools offered by the logical tradition, Toulmin developed a simple diagrammatic template intended to help clarify the nature of everyday reasoning. This template is now widely used for displaying the structure of informal reasoning, and is especially common in fields such as rhetoric, communication, and debating. Subsequently, argument structure diagrams became commonplace in textbooks and classrooms in the area of instruction in critical thinking or informal logic (e.g. Govier 1988). These argument structure diagrams are usually very simple, and used mainly for pedagogical reasons; they are like training wheels, intended to be thrown away after certain basic principles have been understood and skills (supposedly) acquired. They are not used to help sophisticated reasoners think more effectively about complex real-world issues.

The idea that argument mapping might be a tool supporting real thinking, rather than an instrument of academic analysis or an educational stepping-stone, first arose with the work of Robert Horn and his co-workers in the 1990s. His best-known achievement is Can Computers Think? (Horn 1999), a series of seven poster-sized charts of the main lines of argument in some 50 years of academic debate in the philosophy of artificial intelligence – a debate whose starting point was (to close the loop) none other than Alan Turing’s claim that computers are capable of thinking and that there will eventually be thinking computers.
This massive map is like a large-scale map of a country’s road system, in which positions correspond to cities or towns, highways to main lines of argument, and secondary roads to detailed debates. Studying the map, the reader can rapidly and easily identify the main contours of the debate, follow out some particular thread, and rapidly switch between studying the fine detail and seeing how that detail fits into the larger structure or argument (Holmes 1999). The maps can help readers overcome one of the greatest challenges involved in research, the process of assimilating the existing literature to the point where one understands how the main arguments relate and where a useful contribution can be made. This process can take many years, and often is never really completed; many researchers have only a foggy sense of the issues and arguments in the field outside their own topic of immediate interest.
Computer supported argument mapping The Can Computers Think? series of maps was a kind of argument mapping megaproject. It involved a team of people working for a number of years researching and then diagramming a very complex and often quite abstruse debate. The maps needed careful design and were eventually sent to a
specialist printing service and were then distributed much like any other book. The resulting series of maps have many of the virtues but also many of the problems and limitations of a large printed map of the world.

• They are expensive to produce, requiring time and specialist expertise for background research, map design, printing, and distribution.
• By their very nature, they cover only one topic. The Can Computers Think? maps are a great help if your interest is in whether computers can think, but are not much use at all if it is in whether animals can think or whether President John F. Kennedy was killed by a conspiracy involving the CIA.
• The maps are essentially static objects; you can study them, but not interact with them. Yet interaction is important for understanding and learning.
• They cannot be modified by users. They go out of date, as new research extends the terrain to be mapped and suggests better ways to construe the existing terrain.

For argument mapping to become an everyday form of thinking support, these kinds of challenges would have to be overcome; Horn-style megamaps are valuable but far from the whole story. This is where the diagrammatic methods of argument mapping connect with Engelbart’s vision of the computer as a tool for intelligence augmentation. Instead of providing static, prepackaged argument maps, why not provide tools for people to cheaply and easily create their own maps on whatever topic they choose? By the 1990s such tools had become a serious possibility, because the technological infrastructure was already in place. Most “knowledge workers” had or could easily afford to obtain personal computers with monitors, graphics capability, email, and colour printers. Such systems could be used to create argument maps; indeed they were used by Horn and his team. The only missing element was software which would transform the personal computer into a special-purpose argument mapping tool (rather than a generic tool which, with a lot of time and effort, could be used to create argument maps). There are now a number of such software packages available. Probably the best examples are Reason!Able, Araucaria, and Athena. All were developed in educational or academic contexts, and all are essentially similar in being based around a workspace or canvas upon which “box and arrow” graphs of reasoning structures can be easily drawn and re-drawn.
Enhancing and augmenting human reasoning The core contention of this chapter is that with the emergence of these new software packages, we are finally starting to see Engelbart’s 1962 vision of intelligence augmentation by means of computer-based support for human
reasoning being realized. Computer-supported argument mapping is actually starting to help people think more effectively. This works in two ways, which I call enhancement and augmentation. First, computer-supported argument mapping (CSAM) can enhance human reasoning by helping strengthen people’s intrinsic reasoning skills, i.e., the skills they deploy unconsciously when engaging in reasoning in everyday or professional contexts. Second, CSAM can augment human reasoning by being used “on the job” to help people perform more effectively. In augmentation, CSAM tools are used to extend our intrinsic or unaided capacities, in much the way pole-vaulters can reach much greater heights than high-jumpers through skillful use of a pole. In this role the tools are not learning aids, but thinking equipment; indeed, in a certain sense, they become part of the mind itself.
Enhancing human reasoning: CSAM in education Almost everyone, it seems, accepts that one major aim of education is to cultivate thinking skills generally, and in particular the skills of general informal reasoning and argumentation. Unfortunately, aims and outcomes are not always the same. There is incontrovertible evidence that many people emerge from secondary or even tertiary education with general reasoning and argument skills that are under-developed, sometimes woefully so. Perhaps the most substantial body of evidence on this topic is that collected by psychologist Deanna Kuhn and reported in her book The Skills of Argument. Her starting point was the bleak observation that: Seldom has there been such widespread agreement about a significant social issue as there is reflected in the view that education is failing in its most central mission – to teach students to think. (Kuhn 1991: 5) To get a better fix on the problem, she conducted intensive structured interviews with 160 people drawn from a wide range of age groups, occupations, and education levels. She found that while most people are quite able and quite ready to form and express opinions on complex and controversial matters, more than half cannot reliably exhibit basic skills of reasoning and argument in relation to those opinions (and so are not rationally entitled to them). For example, most people will readily hold an opinion as to why some youths stay away from school, but over half cannot provide any genuine evidence at all for their position (let alone good evidence). When asked something like “What evidence can you provide that your account of why some youths stay away from school is the correct one?” everyone will say something, but in a majority of cases what they say does not constitute evidence; it might be a restatement of the position, a digression, or an illustration, but not information which would properly induce a rational person to have more confidence in the account.
Tim van Gelder
Anyone with experience teaching undergraduates will recognize the problem Kuhn was diagnosing. A great many students enter university with only feeble understanding, and no mastery, of general reasoning and argument skills. Far too many of those students exit the other end in a similar state of cognitive impairment. Small wonder: overwhelmingly, students get virtually no explicit instruction in these principles and procedures. Their institutions and instructors seem to assume that students either already have the skills, or that they will pick them up by osmosis, imitation and practice, with feedback given haphazardly and in tiny slivers. This works about as well as expecting dog-paddlers to become water polo players without ever showing them how to swim properly. The most widespread and deliberate approach to confronting this problem is to provide direct instruction in the form of a one-semester undergraduate subject. These subjects, known by names such as Critical Thinking, Informal Logic, and Introduction to Reasoning, are usually provided by philosophy departments as first-year electives. They aspire to help students to reason better, and are often advertised as having this effect. However it is far from clear that they succeed. On the one hand, there is little serious positive evidence of substantial gains in reasoning skills among students taking such subjects. Students’ performance on the final test, on its own, cannot be taken as such evidence, since it incorporates whatever competence they brought with them to the subject at the start. To know how much students improved, you’d have to measure that initial level and subtract it from their final performance – something that is rarely done. On the other hand, there is a worrying amount of negative evidence. The preponderance of studies which have attempted in some reasonably rigorous way to quantify the gains attributable to taking such subjects have found that they provide little or no benefit. 
For example, at the University of Melbourne we used the Watson–Glaser Critical Thinking Appraisal to pre- and post-test students in a conventionally-taught one-semester first-year critical thinking subject. Students performed at essentially the same level in both tests, which was a very disappointing result, considering that they should have improved a modest amount due simply to maturation and being at university. More generally, reviews of studies of attempts to improve critical thinking have tended to pessimism; for example McMillan (1987) reviewed 27 studies and concluded that “the results failed to support the use of specific instructional or course conditions to enhance critical thinking.” Pascarella (1989) found no statistically significant effect of taking logic courses on growth in critical thinking in first-year university students. In fairness, it should be noted that some studies of individual subjects have found gains, and some reviewers have been broadly optimistic (e.g., Halpern 2002). The truth is that empirical studies of growth in reasoning and argument skills at university are the proverbial dog’s breakfast, and while it seems clear to me that the overall trend is against any benefit, any reviewer can pick and
choose evidence to support their pre-given theoretical perspective (a very uncritical exercise!).3 Of course, most instructors believe (one hopes!) that their offerings are making some worthwhile difference to skills, and their beliefs are grounded in their direct, informal observations: they can see their classes improving under their careful guidance! Further, every instructor can provide plenty of anecdotal evidence; it seems you must be doing something right when you meet a student years later who sincerely attests that taking your class has helped them greatly ever since. But informal observation and anecdotal evidence can also point the other way. My own interest in the topic grew out of my experience teaching critical thinking for four years, and giving up in despair. It seemed my students could not have improved much, since they were so bad at the end it was scarcely credible that they were much worse at the start. Similar gloomy assessments have been voiced by veterans in the field. After 30 years trying to teach introductory critical thinking, Doug Walton said:
I wish I could say that I had a method or technique that has proved successful. But I do not, and from what I can see, especially by looking at the abundance of textbooks on critical thinking, I don’t think anyone else has solved this problem either. (Walton 2000)
Further, informal observation and anecdotal evidence are notoriously untrustworthy. Bloodletting survived as a medical practice for thousands of years based in large part on practitioners’ direct observation of success with the technique – that is, patients who improved after bloodletting. Of course many patients sickened or died, but this could easily be explained away; after all, the patients were only being treated with bloodletting because they were seriously ill, and even a good technique cannot be expected to perform miracles!
More generally, informal observation and anecdotal evidence are the foundations of every form of quackery and pseudoscience, from homeopathy and phrenology to Freudian psychotherapy and the selection and remuneration of executives. A form of evidence that has suckered legions of investigators in other fields should not be endorsed by instructors of critical thinking, no matter how sincere and well-meaning they are. If it is true that conventional instructional techniques are largely or typically ineffective, what might be done to improve the situation? The discussion above, of argument mapping, suggests an obvious possibility. The whole point of argument mapping is to display the structure of reasoning and argument more clearly and explicitly, so that the reader can follow the reasoning more easily and thus think more effectively about the issues. Perhaps, if argument mapping were to form the basis for instruction in reasoning, students would find logical structure easier to grasp and logical procedures easier to master.
This idea is hardly original. For decades, many teachers of informal logic have believed that studying and producing diagrams of arguments can help students understand the structure of reasoning, and thereby help improve their reasoning and argument skills. Indeed, at least as far back as Michael Scriven’s classic Reasoning (1976), it is quite standard for introductory textbooks to have a section on argument diagramming. However, these diagramming activities are generally limited to simple argument structures with at most a handful of nodes, and play only a minor role in the overall pedagogical approach.4 Why is this? Here are some conjectures:
• Instructors and textbook authors assume that very simple diagrams are sufficient to give students the general idea, and that there is no additional benefit in continuing to use diagrams for more complex structures.
• Instructors and students have not had good tools for producing, manipulating, and distributing argument diagrams, and so diagramming has been slow and inconvenient.
• Diagrams of the sort normally used have at least one severe usability problem. In these diagrams, a claim is usually represented by a label such as a number, often placed in a circle (“box”). Somewhere outside the diagram, the correspondence between labels and claims is set up. The person reading the diagram must hold these correspondences in their mind; they must, for example, remember that claim 3 is “Smith asserted that Jones was not in the house at the time of the robbery,” while claim 4 is “Jones answered the phone at the house soon after the robbery.” But our minds are severely limited in their capacity to maintain such internal databases. The cognitive burden becomes increasingly unmanageable as the number of nodes increases beyond a small handful. Beyond this point, the diagram either becomes opaque, or the reader must continually refer elsewhere to find out what claim a particular number refers to. Given this mental labor, argument structure diagrams of this sort rapidly become useless as their size increases.
The latter two considerations are practical problems, and can now be largely overcome using the new argument mapping software packages; and the first consideration is only an assumption, maintained largely because in practice it could not be tested. So, can instruction based on argument mapping be applied even to quite complex structures and substantially boost reasoning skills? The Reason! Project at the University of Melbourne has been investigating this question. We have produced a novel design for a one-semester subject, in which students undergo a structured training regime consisting of many exercises which almost always involve constructing and manipulating argument maps. These mapping activities are supported by a specially-built software package, Reason!Able. The software makes it possible for students to:
• Rapidly assemble diagrams by (a) using simple point and click operations to build an argument tree, and (b) typing the claims into the relevant boxes. The software “scaffolds” these construction activities in the sense that students can create nothing but argument structures; these structures are “syntactically” well-formed even if their content is confused or incoherent.
• Modify argument diagrams by deleting nodes, ripping nodes off the tree, drag-and-dropping nodes or even whole branches into different positions, etc.
• View argument diagrams in various ways. The software allows students to pan (scroll around), enlarge and reduce, zoom in on particular parts, simultaneously view a part and the whole, and to rotate the entire structure. These options allow students to rapidly reposition their viewpoint so as to maximize comprehension of the complex structure.
• Evaluate arguments, and represent their evaluations by superimposing on the argument structure a layer of colors standing for relevant logical qualities. For example, the likelihood that a particular premise is true is represented by shades of blue or gray; the strength of a whole reason5 is represented by a shade of green. In this way the argument diagram displays both structural and evaluative information in the one place; it presents relevant information with a density and immediacy just not possible in prose.
• Distribute argument diagrams in standard ways – e.g., copying diagram images into documents or presentations, emailing files, placing files on websites, and printing out as many copies as desired, perhaps in full color.
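To make the idea of a diagram that carries structural and evaluative information together more concrete, here is a minimal sketch of how such an argument tree might be represented. It is not the Reason!Able implementation, about which the text gives no internal details; the names `Claim`, `Reason`, `render` and `count_claims`, the 0-to-1 scales, and the root claim in the example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reason:
    """A group of premises bearing on a parent claim (hypothetical structure)."""
    premises: List["Claim"]
    supports: bool = True    # True = reason for the claim, False = objection to it
    strength: float = 0.5    # assumed 0..1 scale, standing in for a shade of green


@dataclass
class Claim:
    """A box holding the full text of a claim, not a numeric label."""
    text: str
    likelihood: float = 0.5  # assumed 0..1 scale, standing in for a shade of blue or gray
    reasons: List[Reason] = field(default_factory=list)

    def add_reason(self, reason: Reason) -> Reason:
        self.reasons.append(reason)
        return reason


def render(claim: Claim, depth: int = 0) -> List[str]:
    """Return an indented text outline of the tree, one line per node."""
    lines = [f"{'  ' * depth}{claim.text} [likelihood={claim.likelihood:.1f}]"]
    for reason in claim.reasons:
        kind = "reason" if reason.supports else "objection"
        lines.append(f"{'  ' * (depth + 1)}{kind} [strength={reason.strength:.1f}]")
        for premise in reason.premises:
            lines.extend(render(premise, depth + 2))
    return lines


def count_claims(claim: Claim) -> int:
    """Count every claim in the tree, including the root."""
    return 1 + sum(count_claims(p) for r in claim.reasons for p in r.premises)


# Illustrative map; the two premises reuse the example claims quoted earlier.
root = Claim("Jones was in the house at the time of the robbery", likelihood=0.8)
root.add_reason(Reason(
    [Claim("Jones answered the phone at the house soon after the robbery", 0.9)],
    supports=True, strength=0.7))
root.add_reason(Reason(
    [Claim("Smith asserted that Jones was not in the house at the time of the robbery", 0.6)],
    supports=False, strength=0.3))

for line in render(root):
    print(line)
```

Because each node stores the claim text itself, a reader of the rendered map never has to look up what a numbered label stands for, which is the usability point made above.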
There are two main kinds of exercise in the Reason! approach. In a critical evaluation exercise, the student takes an argument as presented by somebody else, identifies its structure and then evaluates it. The student then compares the resulting diagram with a “model answer,” i.e., a diagram produced by the instructor; this provides a kind of feedback on the success of her attempt. In a production exercise, the student develops an argument of her own; this involves not only working out the structure of the argument, but evaluating that argument in order to determine that it is in fact solid.
Does the Reason! method work? To answer this we have conducted a series of studies in which students are pre- and post-tested using an objective test, the California Critical Thinking Skills Test. In study after study, the average gain is around 4 points or about 20 percent of the pre-test score. In other words, students consistently perform about 20 percent better on this particular test at the end of one semester of training based on argument mapping.
A 20 percent gain may not sound like much, but in generic cognitive skill acquisition, this is substantial. To show this, we have to first convert the results to the lingua franca of empirical studies of the impact or effectiveness of some intervention or technique, known as the effect size. The effect size is just the difference expressed as a proportion of the natural variation in the population, or what is technically referred to as the standard deviation. In the case of studies of critical thinking instruction, the effect size is the average gain divided by the standard deviation of students’ performance on the test.6 When we make this conversion, we find that students learning with the Reason! method consistently show gains with an effect size of around 0.8. How big is this? Here are some comparisons:
• Cohen, the statistician who developed the concept of effect size, suggested that we might informally think of an effect size of 0.2 as small, 0.5 as medium, and 0.8 or above as large (Cohen 1988).
• Take a group of students enrolled in a typical first-year non-critical thinking subject (e.g., a political science subject). Most of these students would be getting no direct critical thinking instruction. Averaging over all available studies, we find that they gain around 0.3 SD over one semester. (This increase plateaus after the first year; in other words, after their first year, students improve their skills only very slowly.) Alarmingly, this is about the same as the average gain over many studies of students enrolled in a subject where there is some critical thinking instruction, which is itself about the same as the gain for students enrolled in a full, dedicated, one-semester critical thinking subject. In other words, standard critical thinking instruction adds no value to the overall experience of being an undergraduate. The Reason! approach is therefore much more beneficial than such instruction.
• Based on other studies, students normally gain around 0.8 SD over an entire undergraduate education. In other words, the Reason! method can compress the expected benefit of university education, in terms of critical thinking skill development, into a 12-week period.
• In IQ measurement, a standard deviation is 15 points. Thus, if students were to increase their IQ by the same magnitude, they would be gaining one point per week for 12 weeks.
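The conversion from raw gain to effect size, and the IQ comparison, can be checked with a few lines of arithmetic. The sketch below assumes a pre-test mean of 20 points and a test standard deviation of 5 points. Neither figure is reported in the text; they are simply the values implied by a 4-point gain being about 20 percent of the pre-test score and corresponding to an effect size of about 0.8. The function names are illustrative.

```python
def effect_size(mean_gain: float, std_dev: float) -> float:
    """Average pre-to-post gain expressed in standard-deviation units."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_gain / std_dev


def iq_equivalent(effect: float, iq_sd: float = 15.0) -> float:
    """Translate an effect size into IQ points, taking one SD as 15 points."""
    return effect * iq_sd


# Assumed values, implied by the quoted figures rather than reported directly.
PRE_TEST_MEAN = 20.0  # a 4-point gain is about 20 percent of this
TEST_SD = 5.0         # 4 points divided by an effect size of 0.8

gain = 4.0
d = effect_size(gain, TEST_SD)
percent_gain = 100 * gain / PRE_TEST_MEAN
iq_points = iq_equivalent(d)
points_per_week = iq_points / 12  # spread over a 12-week semester

print(f"effect size: {d:.1f} SD")
print(f"gain relative to pre-test: {percent_gain:.0f} percent")
print(f"IQ-scale equivalent: {iq_points:.0f} points, {points_per_week:.1f} per week")
```

Dividing by the standard deviation rather than the pre-test mean is what makes results comparable across tests with different scales, which is why the 0.8 figure, and not the 20 percent figure, is the one set against the other studies.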
In short, students using the Reason! method do make strong gains. Importantly, these gains seem to hold up over time. When we re-test the students a year down the road, their skills are at the post-test level; they have not “regressed” back to their starting levels. Instruction based on argument mapping seems to have produced a stable, permanent elevation in their critical thinking ability. Many reports of a more anecdotal nature support the general view that computer-supported argument mapping really does lead to enhanced general critical thinking skills (Twardy 2004).
Augmenting human reasoning: CSAM in professional practice
When designing and building the Reason!Able software, we assumed that it was just an educational tool. Like training wheels on a bicycle, it would be useful in an early phase of learning, but would eventually become useless or even a hindrance, and so would be left behind. However, something quite unexpected happened when students used the software to practice their critical thinking skills on increasingly elaborate arguments. It became apparent that the maps produced in the software, and the interactions it supported, made the arguments much easier to understand. In other words, the software-supported argument mapping seemed to be extending the capacity of students to handle the sort of complexity found in real-world deliberation. This suggested that computer-supported argument mapping might be useful outside the educational context, whenever somebody tries to think their way through some complex set of arguments. In particular, it might be useful for people working in professions, government, or business who must regularly engage in complex reasoning activities. Used this way, CSAM would not be enhancing an individual’s basic reasoning skills. Rather, it would be augmenting their basic capacities, leveraging the power of their biological thinking apparatus. CSAM would not be an educational stepping stone, but rather an everyday tool or piece of equipment, extending thinking capacities just as construction equipment extends our building capacities. To understand how this might work, consider just one way in which CSAM can be used to augment intelligence – in this case, the collective intelligence of a group of people attempting to achieve rational resolution on some contentious issue through argumentation.
It is quite common in organizations for a team or committee to get together to argue things out, i.e., to engage in a kind of collective deliberative process in which opinions and arguments are expressed, objections raised, and so forth. This often takes place in a meeting room, with group members arranged around a large table, perhaps with the most senior person at the head; somebody opens the debate and then it is an argumentative free-for-all; at the end of the meeting, if all goes well, there is some consensus over the core issues. An alternative to this standard practice is to use real-time CSAM as the framework within which to represent and conduct the disputation. In this approach, the participants still gather in a room, but attention is focused on a screen or wall, onto which is projected an image of an argument map representing the current state of the debate. The map evolves as participants make their argumentative “moves.” Collective deliberation mediated by real-time CSAM has a number of advantages over its traditional counterpart:
• Expanded grasp. Individual participants are able to comprehend more of the relevant arguments. Instead of having to construct and maintain elaborate representations “in the mind” – a very laborious and error-prone activity – participants can rely on the argument map as an up-to-date representation of the state of the debate, and scan the map as necessary in order to maintain understanding of both the overall structure of the argument and any particular part of it. Indeed, the map could be said to have become their mental representation of the debate, a representation which merely happens to be located outside the head.
• Common mental representations. Normally, each participant can remember or hold in mind only a part of the overall debate, and each participant retains a somewhat different part. This causes tremendous inefficiencies in collective deliberation. When the deliberation is supported by argument mapping, all participants are attending to the one argument map, and so they have a common mental representation of the arguments. At that point, their minds could be said to overlap (they are “of the same mind”).
• Targeted contributions. In ordinary face-to-face argumentation, contributions to the debate fly in all directions; often it is hard to see exactly where somebody’s brilliant argument actually fits, and what difference it makes. Using argument mapping, however, contributions can be required to be targeted at a specific place on the map, and to make some specific contribution at that place (e.g., providing evidence against a particular proposition).
• Depersonalized debate. An endemic problem in collective deliberation is that positions and arguments tend to be associated with particular people, introducing a range of emotional and social considerations which interfere with cool-headed rational evaluation and constructive engagement in the debate. CSAM defuses this problem by having participants attend to the map and the arguments represented upon it rather than the people making the arguments. It thus “de-personalizes” the debate, making the whole exercise more productive.
• Organizational memory. Real-time CSAM generates a large map representing all the arguments, a map which then forms part of the organization’s collective “memory” of the debate. If some rational consensus is reached, the grounds for that consensus can be accessed more easily by referring to the archived map than by using unaided recollection or other traces such as minutes, summaries or reports.
Given these apparent advantages, does real-time CSAM in fact improve the overall quality of collective deliberation? It is certainly plausible that it does, but to my knowledge, there is no hard evidence on this topic. It simply has not been investigated properly (though see various chapters in Kirschner et al. 2002). There is plenty of informal or anecdotal evidence that CSAM-supported deliberation is in fact superior, but such evidence should be taken as at most suggestive. Real-time CSAM supporting group deliberation is only one way in which
CSAM can be used in the workplace to augment human capacities. Others include:
• Case development. Individuals, teams, or organizations often produce positions (policy statements, recommendations, conclusions) which must be defended by elaborate, convincing argumentation, i.e., there must be a solid case for the position. CSAM can be used to develop the case with greater clarity and efficiency than is usually possible, leading to stronger arguments and better-structured documents presenting those arguments.
• Communicating complex arguments. Argument maps are dramatically more efficient than ordinary prose for communicating the structure of complex arguments. Thus, argument maps can be used to increase the efficiency of communication. For example, currently in law firms around the world, partners frequently commission junior lawyers to draft letters of advice for clients. These letters present arguments, often involving quite complex reasoning, in traditional prose format. The partner must identify and critique the reasoning before the advice is released to the client. This process is slow, difficult, and unreliable. An argument map presents the same reasoning but in an easily assimilable, unambiguous form, allowing much more rapid comprehension, evaluation, and feedback to the junior lawyer. The time and resources involved in producing the final written advice for the client can thus be substantially reduced.
• Critical review. When an organization releases a particularly controversial report, it likes to be sure that it has rock-solid arguments to back up its position. Conversely, when critiquing a report whose message is unwelcome, it is useful to be able to identify the arguments so as to be able to target criticisms with greatest effect. In both cases, there is a process of critical review, in which the argument presented in prose is assessed for quality. Argument mapping is a way of exposing the arguments for critical scrutiny, so that the review process can have maximum impact.
• Design rationale. When a team designs a complex artifact, such as a new car or piece of software, many design decisions must be made along the way. These decisions are usually made by a process of collective deliberation. In other words, the team members get together and consider the arguments for and against particular choices. Design rationale is the process of supporting design decisions with clear, strong and persuasive arguments, and recording those decisions for later review. CSAM techniques can support design rationale by providing simple, transparent displays of the reasoning behind a given decision (Buckingham Shum 1996).
Augmenting reasoning in professional contexts through the use of CSAM is quite new, and has been used in only a few organizations. This can be
explained, in part, by the limitations of the available tools. The current CSAM technologies are just crude first steps in the direction of the sort of thinking support tools which will, eventually, transform organizational practices.
Embodied reason
Assuming that CSAM does effectively enhance and augment human reasoning, why does it work? How does it help? What is it about CSAM which makes the difference? The answer, I think, is that CSAM makes reasoning more “embodied” than do traditional practices and technologies. To see this point, note first that evolution did not endow Homo sapiens with special, dedicated apparatus for high-level thinking. Rather, we evolved cognitive machinery for other purposes, machinery which just happened to be recruitable for increasingly sophisticated thinking activities, including reasoning and argument. CSAM works because it taps into these available resources more directly than traditional, prose-based ways of engaging in reasoning. In other words, it exploits our embodiment more effectively than traditional modes of argumentative expression. What, after all, is our brain for? Its primary task, after maintaining basic bodily functions, is coordinating movements in relation to the opportunities and threats in our physical environments. The cognitive machinery evolution has provided is largely devoted to such coordination. We must be able to tell what is around us and rapidly respond to what we find. Hence we have exceedingly fast and powerful hard-wired capacities to distinguish visual features such as line, color, shape, and position in space, and to recognize the complex, shifting patterns these features can generate. We also have extensive circuits whose primary role is to control and direct the muscular activity whose aggregate expression is the movement of our limbs, the relocation of our whole bodies, and the manipulation of objects around us. These actions play out in sequences whose shape and timing unfold in ways whose order and purpose amount to intricate dances with our changing environments. Higher cognition is possible not because God, evolution or any other benign creator granted us immaterial minds or dedicated neurobiological apparatus.
Rather, it is possible just insofar as we are able to redeploy the basic equipment whose primary role is the more mundane business of maintaining and moving our bodies in relation to our surrounds. We can reason not because we have a special ratiocination module, but because our sensorimotor control loops can be requisitioned and ordered to operate in fields of grammatically structured and evidentially interrelated objects. This is certainly a remarkable development, but it is evidence not of custom-built mental tools for logic but rather of small transitions which happened to nudge our brain’s existing equipment over a threshold and into a new range of capacities.
Computer-supported argument mapping is effective because it allows us to redeploy this equipment to the reasoning task more effectively. When identifying or comprehending argument structures, we can take advantage of our ability to recognize line, color, shape and position in space simply because argument maps, unlike prose, utilize line, color, shape and position in space to convey information about argument structure. When constructing or modifying argument structures, we can take advantage of our ability to move our limbs and objects around us simply because computer-supported argument mapping supports virtual versions of these basic physical activities. In short, reasoning supported by CSAM is far more like playing with blocks in the kindergarten, or playing with rocks in the vegetable garden, or chasing prey on the savannah, than is reasoning supported by spoken or written prose. The abstraction and complexity of evidential structure is translated, as much as possible, into concrete forms mirroring the particular set of primitive capacities which constitute Homo sapiens’ evolutionary endowment. Embodied interaction with argument structures gives us new powers to generate, comprehend and communicate complex argumentation. We become more “at home” with complex reasoning because we are already deeply “at home” with our bodies and our surrounds.
The future of CSAM
CSAM is less than two decades old. Software which supports argument mapping in practice and not just in theory – software that is genuinely usable by people other than the developers – is less than a half-dozen years old. The field is still in its infancy. Inevitably, and rapidly, current technologies will be superseded in every dimension. It is worth speculating briefly about where this technological evolution will take us. An interesting glimpse of the future was provided in the recent movie Minority Report. In that movie, the “pre-cognitions” (visions of future events) of a special group of seers were displayed on a very high resolution, semi-circular, room-sized screen. Large quantities of image and text were manipulated by a character with no physical connection to the screen or computer other than “light gloves” on his hands. Using all the degrees of freedom afforded by two arms and hands, the character could direct the information flow by elaborate bodily performances whose closest analogy would be the motions of a conductor directing a full orchestra in the climax of a late romantic symphony. The information processing in Minority Report, referred to as “scrubbing the image,” was not argument mapping, though it did involve reaching conclusions on the basis of slivers and patterns of evidence. The only problem with the Minority Report scenario is that it purports to describe the state of the art in information processing technology in the year 2054. This is a very timid vision. In fact, many of the components are already available in “off the shelf” versions. Setups of the kind imagined in the movie will be with us, even common, within a decade.
By 2054 – perhaps well before – high-resolution “displays” will have been miniaturized and implanted inside our skulls, feeding their signals not via light into the retina but via direct electrical contact into our neural circuitry. Our mental control of these displays will not be mediated by motions of the arms and cumbersome technologies such as the mouse; rather, it will be direct “thought” control as the display reacts to the cloud of electromagnetic signals generated by our thinking processes. Artificial intelligence (assuming this is available in only twice the period Turing originally envisaged) will automatically and invisibly aid us in generating and manipulating the displayed information. And of all this, we will be completely unconscious, just as when thinking through a philosophical problem, we are completely unconscious of the details of the operation of our brains. The CSAM technology will have become a permanent prosthetic extension to our biological capacities, a seamless “mindware-upgrade” (Clark 2003) no more remarkable than, today, a pair of spectacles is a vision upgrade. CSAM will be just one among many respects in which computer technologies will invisibly augment human cognition by providing resources and capacities which complement the strengths and weaknesses of our biological equipment.7
Argument mapping, rationality, and human nature
In his weighty tome Making It Explicit, Robert Brandom selects rationality as the most profound principle of demarcation between “us” and other things:
We . . . [are] the ones who say “we” . . . Saying “we” in this sense is placing ourselves and each other in the space of reasons, by giving and asking for reasons for our attitudes and performances . . . [it is] to identify ourselves as rational – as the ones who live and move and have our being in the space of reasons . . . (Brandom 1994: 4–5)
We are the ones who, as he puts it elsewhere, play the game of giving and asking for reasons. It is the playing of this game which makes us what we are; and since we are the ones who developed and maintain the game – designers, players, and referees – we are self-constituting beings:
this expressive account of language, mind, and logic . . . is an account of the sort of thing that constitutes itself as an expressive being – as a creature who makes explicit, and who makes itself explicit. We are sapients: rational, expressive – that is, discursive – beings. But we are more than rational expressive beings. We are also logical, self-expressive beings. We not only make it explicit, we make ourselves explicit at making it explicit. (Brandom 1994: 650)
It almost goes without saying that this “having our being in the space of reasons” is physically mediated. Playing the game requires us to do things, and these doings are physical processes. For example, to be a rational, discursive agent, you must be able to say things – or more generally, to make assertions. We humans most commonly, and primordially, make assertions through bodily motions such as exhalations and flexing of vocal cords. Without some such physical activity, it would be impossible to participate in any way. Since playing the game is physically mediated, the nature of the mediation can be varied and possibly improved. We can introduce new tools or technologies which alter and perhaps extend our capacity for rational engagement. Perhaps the most obvious and momentous example is the development of writing. Writing allowed us to produce stable representations of moves in the game of giving and asking for reasons. At any given time writing is supported by a particular set of technologies (e.g., quills and parchment), and over time these technologies have improved (ball-point pens, now word-processors and the world wide web). The deployment of ever-more sophisticated supporting technologies allows us to play the game more effectively, to make moves that are more informed, more responsive, more nuanced, and more widely communicated. We can, quite simply, perform at a higher level than was possible when our physical means were more limited. Of course, this outcome applies to us collectively; it does not imply that any one of us is a more powerful player than was, say, Socrates or Aquinas. As a group, however, our rational self-expression is vastly richer than that of more primitive societies, just as contemporary Australian Rules Football is a vastly more elaborate and skillful exercise than the anarchic scrimmages from which it emerged.
CSAM is a late development in the history of humanity’s attempts to improve the game of giving and asking for reasons by engineering new ways to play it more effectively. In some ways it is an incremental advance, bundling recent technologies and established methods into a new package. Yet incremental advances – such as the adaptation of the wine press into a printing press – can sometimes induce dramatic transformations in the manner and level at which intellectual performances are conducted. It is too early to be sure, but this may turn out to be the case with computer-supported argument mapping. For the first time, it seems, we can effectively escape the tyranny of prose as the obligatory medium of argumentative expression. Graphical displays of argument structure are patently better suited to our biologically-given cognitive architectures than prose presentations; properly designed software and hardware are giving us the ability to produce, manipulate and distribute these displays as easily as, if not more easily than, we can craft argumentative prose. In other words, CSAM is emerging as by far the most effective means of making explicit our reasonings and deliberations. And if, as Brandom argues in his unique and almost impenetrable style, in making explicit we make
ourselves explicit, CSAM will be a medium through which humans can self-realize more profoundly than ever before. Since playing the game of giving and asking for reasons is what makes us what we are, playing the game better – or playing a more sophisticated game – is a transformation, or at least evolution, of our nature as sapient beings. Thus Engelbart’s vision of computers augmenting human intelligence is, properly understood, a vision of human self-transformation through a bootstrapping process in which our current, technologically augmented intellectual capacities enable us to refashion the spaces and practices within which we ontologically self-constitute. Moreover, his crucial insight was that computer technology will be more profoundly and intimately connected with the process of self-constitution through enhanced rational self-expression than any previous technological forms.
Notes
1 D. Engelbart personal communication, 13 Jan 2003.
2 I would be very surprised if there were not prior examples somewhere, but my own searches and queries have not yet turned up any. If anyone knows of argument maps prior to Wigmore, I would appreciate hearing about it.
3 The problem here is the informal literature review – a process whose loose constraints allow reviewers, when presented with the sort of heterogeneous body of literature typically found in the social sciences, to select, massage and present evidence so as to support their favored positions. One way to overcome this problem is to engage in meta-analysis. A proper meta-analysis of studies of critical thinking growth is sorely needed.
4 In this tradition, Alec Fisher’s textbook The Logic of Real Arguments (Fisher 1988) lies at one extreme; it takes on the challenge of handling “sustained theoretical arguments,” and deploys structure diagrams which sometimes have a dozen or more nodes.
5 In the Reason! theory of argument structure, a reason is a complex object which always has multiple premises; these are what informal logicians usually refer to as linked premises.
6 When comparing different studies, it is more correct to use the “population” standard deviation for the test (the one found in the manual produced by the developers of the test) rather than the standard deviation of the particular group of students in the study.
7 Anyone who doubts the general plausibility of the speculations should stay tuned to The Harrow Technology Report, http://www.TheHarrowGroup.com. My conjectures will surely be wrong in detail, but we can be equally sure that something just as dramatic will be true.
Bibliography
Anderson, T. and Twining, W. (1998) Analysis of Evidence: How to Do Things with Facts Based on Wigmore’s Science of Judicial Proof (revised edn), Evanston/Chicago, IL: Northwestern University Press.
Bardini, T. (2000) Bootstrapping: Douglas Engelbart, Coevolution, and the Origins of Personal Computing, Stanford, CA: Stanford University Press.
Brandom, R. (1994) Making It Explicit: Reasoning, Representing, and Discursive Commitment, Cambridge, MA: Harvard University Press.
Buckingham Shum, S. (1996) “Design argumentation as design rationale,” in A. Kent and J.G. Williams (eds), The Encyclopedia of Computer Science and Technology, vol. 35, New York: Marcel Dekker, 95–128.
Clark, A. (2003) Natural Born Cyborgs: Why Minds and Technologies are Made to Merge, Oxford: Oxford University Press.
Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences, Hillsdale, NJ: Lawrence Erlbaum Associates.
Engelbart, D. (1962) Augmenting Human Intellect: A Conceptual Framework, Menlo Park, CA: Stanford Research Institute.
Fisher, A. (1988) The Logic of Real Arguments, Cambridge: Cambridge University Press.
Gardner, M. (1983) Logic Machines and Diagrams (2nd edn), Chicago, IL: University of Chicago Press.
Govier, T. (1988) A Practical Study of Argument (2nd edn), Belmont, CA: Wadsworth.
Halpern, D.F. (2002) Thought and Knowledge: An Introduction to Critical Thinking (4th edn), Hillsdale, NJ: Lawrence Erlbaum Associates.
Holmes, R. (1999) “Beyond Words,” New Scientist, July 10, 32–37.
Horn, R.E. (1999) Can Computers Think? MacroVU, Inc.
Kirschner, P.J., Buckingham Shum, S.J., and Carr, C.S. (eds) (2002) Visualizing Argumentation: Software Tools for Collaborative and Educational Sense-Making, London: Springer-Verlag.
Kuhn, D. (1991) The Skills of Argument, Cambridge: Cambridge University Press.
McMillan, J. (1987) “Enhancing college student’s critical thinking: a review of studies,” Research in Higher Education, 26: 3–29.
Pascarella, E. (1989) “The development of critical thinking: does college make a difference?,” Journal of College Student Development, 30: 19–26.
Scriven, M. (1976) Reasoning, New York: McGraw-Hill.
Toulmin, S. (1958) The Uses of Argument, Cambridge: Cambridge University Press.
Turing, A. (1950) “Computing machinery and intelligence,” Mind, 59: 433–460.
Twardy, C. (2004) “Argument maps improve critical thinking,” Teaching Philosophy, 27(2): 95–116.
van Gelder, T.J. (1995) “What might cognition be, if not computation?,” Journal of Philosophy, 91: 345–381.
Walton, D. (2000) “Problems and useful techniques: my experiences in teaching courses in argumentation, informal logic and critical thinking,” Informal Logic, 20, (Teaching Supplement #2): 35–38.
Wigmore, J.H. (1913) The Principles of Judicial Proof: As Given by Logic, Psychology, and General Experience, and Illustrated in Judicial Trials, Boston: Little, Brown.
Index
Adams, Henry 110 adaptive thinking 2 Adolphs, Ralph 149 “Alice” program 109 amygdala damage 149–50 ancestors, character states of 29–32, 35 anger, feelings of 150, 152 anti-handshake strategies 79 Arbuthnot, John 2, 18–19, 22, 36 argument from design see design argument argument mapping 164–5, 169; see also computer-supported argument mapping Arnold, Matthew 110 artificial creatures 9–10, 118–19, 126–7 artificial intelligence (AI) 9–10, 109, 113, 119–21, 129–30, 162, 178 artificial selection 9–10 Aumann, Robert 64, 66–8 “babbling equilibria” 61 Baldwin, James Mark 3–4, 40 Baldwin effect 3–5, 40–57; double form of 55–6; as genetic assimilation 48–50, 55 bargaining games 5, 62; cheap talk in 69–79 basins of attraction 68–9, 79 Bateson, Patrick 51 Bayesianism 90 Beer, R.D. 128 birth records, use of 18 “body community” 10–11, 136, 144 Boysen, Sarah 94–9 brain activity, limits to 141
brain damage 12 brain structure 103, 107–8 Brandom, Robert 178–80 “breathing spaces” 4, 47 Brooks, R. 120, 130 “Busy Beaver” program 110 Calder, A.J. 150 California Critical Thinking Skills Test 171 canalization 50 Cartesian cut 120–1, 130 Central Dogma of genetics 103–4 chance: compared with intelligent design 20–1; compared with natural selection 35; likelihood value of producing complex adaptive design 2 chimpanzees 94–9, 159 Churchland, P.M. 88, 91 Churchland, P.S. 88 co-evolutionary relationships 85–6, 93 Cohen, J. 172 common knowledge shared by population members 5 comparative likelihood 3, 17, 36 complex adaptations 17, 32–4 complex arguments, communication of 175, 177 complex behavioural traits 41 complex situations 162–3 complexity, concept of 3 component place optimization 8 compression of data 107–8 computer programs 122, 124 computer-supported argument mapping
(CSAM) 13–14, 165–7, 170–80; future of 177–8 computers, general use of 163 connectionism 123–4 consciousness, human 7 construal of models 89–90 context for information and behaviour 141–3 Copernican revolution 129 Coupe, P. 137 creationism 34, 36 critical review process 175 critical thinking 14, 168–73 CTRNNs (continuous time recurrent neural networks) 125–6 cuckoos 53 Damásio, Antonio 149 Darwin, Charles (and Darwinian evolution) 19, 115–18, 126–9 degrees of belief 90 Dennett, Daniel 6–7, 87–8 depersonalization of debate 174 Descartes, René 120 design argument 17–19; problems with 20–2, 35–6 design rationale process 175 Dewey, John 99 dichotomous characters 31–2 direct resonance model 157–8 disgust, feelings of 150–2, 156–8 division of labour 117 DNA 8, 10, 103–4, 107, 127 dopamine 150 Dretske, Fred 87 drift see pure drift; selection plus drift Duhem, P. 36 dynamical systems approach to cognition 114, 125–6 effect size concept 172 Einstein, Albert 129 eliminativism 88 “Eliza” program 109 emotion contagion 13, 156–9 emotions, experience of 12–13; see also face-based emotion recognition Engelbart, Douglas 162–6, 180 environmental variation 43–7, 50
errors in spatial memory and judgment 10–11, 137–45; persistence of 142–3; reasons for 140–2 evolutionarily stable states 62–70 evolutionary biology 85 evolutionary games 61–6 evolutionary psychology 153 exaptation 155 “executive function” 97–9 extinction of animal species 4 eye design 21–2, 32–4 eye-of-the-interpreter issues 85, 88, 99 face-based emotion recognition (FaBER) 12–13, 149–59; evolution of 152–3 facial feedback model 157–8 fear, experience of 149–51 Fisher, R.A. 19 fitness functions 28, 31–3, 127 “flat norms of reaction” 44, 50 flight in birds, evolution of 155 Floreano, D. 128 Fodor, Jerry 6, 87, 91 folk physics 86 folk psychology 85–93, 96–9, 151; as a model of thought 6–7, 88–92; as social and linguistic practice 7; see also “mindreading” folk wisdom 61 frequency-dependent selection 53 fruit flies 50–2 functionalism 8, 109 Galef, Bennett 154 Gallese, V. 158 game theory 5, 61–2, 79 Gati, I. 139 genetic assimilation 4, 42, 47–50, 54–7 genetic code 8 genetic control 43–4, 56–7 genetic fixity 4, 43–4, 47–9, 53 genetic mutation 4 genotypes 127 Gigerenzer, G. 13, 99, 154 goal-directed behaviour 126 goal emulation 46 Gopnik, Alison 96 Halpern, D.F. 168
Hamilton, W.D. 19 herring-gulls 54 hierarchical organization 137–8 Hinton, G. 48 Horn, Robert 165–6 human nature 14 humans, distinctiveness of 117 inbreeding and incest avoidance 31–2 inheritance of acquired characteristics 3 innate traits 40–5 Innateness Hypothesis 103 intelligence augmentation (IA) 162, 166–7, 173, 180 intelligent design 17–21, 27, 34, 36; compared with chance 20–1 internalization of phenomena 98–9 “interpretation facts” 87–8 “interpretation stance” view of psychology 6 know-how and knowing-that 130 Kreps, David 66–7 Kuhn, Deanna 167–8 Kullback-Leibler information 73 lactose tolerance 53 Lamarck, J.B. de 3, 40 Land, M. 33 Lande, R. 24 Lang, B. 97–8 language: and social relations 98; and thought 95–7, 117; use of 9, 119–20 Law of Likelihood 19, 30 Lawrence, Andrew 150 learning 40–57, 142 Lewis, David 5, 61–3, 73 likelihood analysis 2–3, 12, 17–19, 25, 34–6 Lindley, D. 74 Lycan, William 87 McMillan, J. 168 Marxen, Heiner 110 maximum likelihood, hypothesis of 30 Mayr, E. 33 means to ends 46 memory see organizational memory; working memory
mental representation, evolution of 85–8 mentalese (language) 119–20, 125, 128 metaphor, use of 120 mimicry 46 mind see philosophy of mind; theory of mind “mind community” 10–11, 136, 144 “mindreading” 11–13, 91, 148–56, 159 Minority Report (film) 177 modelling 10, 89 modus tollens 18–19, 36 Monte Carlo simulation 68–74 morphogenesis 103–4 Nash equilibria 62–5, 71 nativism 6, 8, 103–4, 107–8 natural selection 3–5, 10, 17–19, 22, 27, 31–6, 40–50, 57, 86, 153, 155; and cognitive mechanisms 7; compared with chance 35 neural networks 48, 125–6 neural pathways 45 neuroanatomy 106–7 neuroimaging 12–13 neuron arbor morphogenesis 104 neuropsychology 149 niche construction 4, 42, 47, 52–4, 57; social learning as 54–5 Nilsson, D. 33 Norway rats 154 Nowlan, S. 48 ontogenetics 6, 40 optimality, concept of 3 optimization in nervous systems 8 organizational memory 174 Orzack, S. 35 paired deficits 149–52 Paley, William 2, 20–2, 36 Parr, Linda 159 parsimony, principle of 29–33 Pascarella, E. 168 path integration 143–4 Pearson, Karl 153–4 Perner, J. 97–9 phenomenal experience, primacy of 130
phenotypes 22–7, 40–53, 127 Phillips, Emo 110 philosophy, aim of 119 philosophy of mind 8–10, 85, 104–7, 113, 118–20 phylogenetic trees 28–30 Piaget, Jean 107 polar bears’ fur 22–9, 32–5 polymorphism 68–74, 79 posterior probabilities 19 Potter, Beatrix 119–20 pre-play signalling 61–3, 79–80 “prion” diseases 103 prior learning 49, 52–5 prior probabilities 19 Prisoner’s Dilemma 63–4 propositional attitude ascription 12 pure drift (PD) hypothesis 22–35 Pythagoras 107 quantum physics 129 random genetic drift 2–3, 22 rational choice theory 5–6, 61–2, 79–80, 94 rationality 7–8, 115, 118, 144–5, 178; different senses of the word 118; ecological 153–4 Reason!Able software 166, 170–3 reason, embodiment of 176 reasoning skills, improvement of 13–14, 167–70, 177 representation, concept of 122–5 robotics, evolutionary 9, 114, 120–1, 125–30 Robson, A.J. 63 Rosch, E. 138–9 route maps 144 saliency 5 Salvini-Plawen, L. 33 schematization of information 140–2 scientific advances 129–30 Scriven, Michael 170 selection plus drift (SPD) hypothesis 22–35 semantic properties 6–7, 85 sender-receiver games 62–3 sensitivity analysis 35
sensory-motor abilities 14 sex ratio 18–19, 22 shared manifold hypothesis 158 signalling: costless 5–6, 61–2, 79; unequal interests in 5–6 simple coordination thesis 87–8 simulations and simulation theory (ST) 6, 12–13, 97, 148–59 social context 86 social learning 4, 41–8, 52, 54, 154; as niche construction 54–5; significance of 55–7 social reality 7–8 sociobiology 31 space cognition 10–11, 135–6 species-chauvinism 115, 118 Spelke, Elizabeth 96 Sprengelmeyer, R. 150 Stag Hunt games 5, 64–9, 74, 79 statistical models 153–4 Sterelny, Kim 87 Stevens, A. 137 stimulus enhancement 45–6 subsumption architecture 130 Tensarama 105, 108–9 theory of mind 97–8, 109; see also “mindreading” “theory-theory” (TT) concept 6–8, 12–13, 89–92, 151–3 thinking skills 167, 171, 173; see also reasoning skills Thompson, D’Arcy 107, 109 tip species 29, 32 Todd, P. 13, 99, 154 tools, use of 40–5, 48 Toulmin, Stephen 165 trait-individuation problem 85 trait values 23–4, 32; independent evidence on 28–31; trends in 35–6 transient information on evolutionary processes 79 Turing, Alan 162–3, 165, 178 Turing machines 107–8, 121–2 von Neumann architecture 109 Waddington, C.H. 48, 50–2 Walton, Doug 169
watchmaking analogy 20–1, 36, 116 Watkins, John 42, 47, 49 Watt, James 164 Weismann, Augustus 40 Whorf, Benjamin Lee 95 Wicker, B. 150–2, 156–8 Wigmore, John Henry 164
wiring-and-connection facts 86–8, 92 Wittgenstein, Ludwig 9, 119 woodpecker finches 40–5 working memory in the brain 141 writing, development of 14, 179 Zahavi, A. 5