Cognition and Technology
Cognition and Technology Co-existence, convergence and co-evolution
Edited by
Barbara Gorayska University of Cambridge
Jacob L. Mey University of Southern Denmark
John Benjamins Publishing Company Amsterdam/Philadelphia
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Cognition and Technology : Co-existence, convergence and co-evolution / edited by Barbara Gorayska and Jacob L. Mey. p. cm. Includes bibliographical references and indexes. 1. Human-machine systems. 2. Human-computer interaction. 3. Cognition. I. Gorayska, Barbara. II. Mey, Jacob.
TA167.C62 2004 004’.01’9--dc22 2004050172
ISBN 90 272 3224 5 (Eur.) / 1 58811 544 5 (US) (Hb; alk. paper)
© 2004 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
Table of contents
Introduction: Pragmatics of Technology
  Barbara Gorayska and Jacob L. Mey
Part I Theoretical issues
Towards a science of the bio-technological mind
  Andy Clark
Language as a cognitive technology
  Marcelo Dascal
Relevance, goal management and cognitive technology
  Roger Lindsay and Barbara Gorayska
Robots as cognitive tools
  Rolf Pfeifer
The origins of narrative: In search of the transactional format of narratives in humans and other social animals
  Kerstin Dautenhahn
The semantic web: Knowledge representation and affordance
  Sanjay Chandrasekharan
Part II Applications
Cognition and body image
  Hanan Abdulwahab El Ashegh and Roger Lindsay
Looking under the rug: Context and context-aware artifacts
  Christopher Lueg
Body Moves and tacit knowing
  Satinder P. Gill
Gaze aversion and the primacy of emotional dysfunction in autism
  Sarah Bowman, Lisa Hinkley, Jim Barnes and Roger Lindsay
Communicating sequential activities: An investigation into the modelling of collaborative action for system design
  Marina Jirotka and Paul Luff
Part III Coda
“The end of the Dreyfus affair”: (Post)Heideggerian meditations on man, machine and meaning
  Syed Mustafa Ali
Martin Luther King and the “ghost in the machine”
  Will Fitzgerald
Name index
Subject index
Introduction
Pragmatics of Technology
Barbara Gorayska and Jacob L. Mey
University of Cambridge / University of Southern Denmark
In presenting earlier Cognitive Technology (CT) volumes to the public (Gorayska and Mey, 1996; Marsh, Gorayska and Mey, 1999), we as editors adopted a mode of collaboration between ourselves and our contributors that we had found useful in collections of articles: papers are submitted, accepted, and revised, but always with one purpose in mind, viz., to bring about a unified perspective incorporating the intentions of the contributors and the editors, while respecting the individual contributions’ diversity and independence. In the present case, we venture to present the ideas of the contributors (and some of our own ideas) in a new garb, designed to bring out the ‘convergence, co-existence and co-evolution’ that we think are characteristic of the field of Cognitive Technology.
Studies in Cognitive Technology examine the human condition in the technological world. They are, as Clark (2002, p. 21 and this volume) puts it, “… in a very real sense, the stud[ies] of ourselves”. Such studies are an absolute necessity for the world and its inhabitants, if we are to survive the onslaught and temptations of revolutionary new technologies and their sometimes aggressive practitioners. To properly paint this picture of a threatening, but also promising future, where technology and the cognitive makeup of its users converge, co-evolve and co-exist, we must retrace our steps into the murky past when the discipline was originally conceived.
It is no secret that the field of CT, despite all convergence, has evolved in ways that are not always easy to combine. Indeed, even the term ‘Cognitive Technology’ has been the subject of acrid disputes and ground-claiming in the past. But in this field, as in all other areas of scientific research, the important thing to do is to let things develop, ‘co-evolve’, on their own, so as to obtain the maximum ‘convergence’ in the midst of ‘coexisting’, sometimes clashing views.
The contributors to this volume come from widely diverging orientations, scientific as well as geographical. Despite this, the common concern of all is: How to make the most of technology without ‘losing our soul’, as one of the editors once put it (Gorayska 1994, p. 441). If this can be achieved by offering the present collection to the interested general and specialist public, we will have reached our aim, and provided a forum for even more co-evolution and convergence, while safeguarding the peaceful coexistence of the various strands making up the pattern of cognitive-technological studies today.
A flashback
It all started one sunny day in 1993, in old Hong Kong, where the two editors (BG and JM) finally met, after having corresponded about various matters of common interest for a number of years. Over lunch (in which Inger Mey, Jonathon Marsh, N. V. Balasubramanian, Ho Mun Chan, Orville Lee Clubb, Laurence Goldstein, Kevin Cox and Brian Anderson also participated), the idea of CT as a new discipline, combining findings from computer science, philosophy, psychology and pragmatics, was discussed.
In Hong Kong, interest in CT as an emerging academic discipline arose when a small group of academics at the City University of Hong Kong began exploring the ways in which developments in information technology had implications for human cognition. They were originally inspired by two factors. The first was the Fabricated World Hypothesis proposed by Gorayska and Lindsay in support of their investigations of how lay people understand and cognitively utilize the concept of relevance (Gorayska and Lindsay, 1993, first presented as Gorayska and Lindsay, 1989a):
“It is usual in psychology to treat the external world as a brute given, and to see perception, memory and action as processes driven by its immutable characteristics. In fact this is not so. In learning his route, the postman does not struggle to find an algorithm to fit an arbitrary collocation of house numbers, the number system is designed to make a convenient algorithm possible. We wish to propose the Fabricated World Hypothesis. There is a deliberate allusion here to the Carpentered World Hypothesis of Segall, Campbell and Herskovits (1966). Segall et al. argued that some aspects of our conscious experience, particularly the geometrical illusions of Müller-Lyer, are a consequence of expectations induced by the regularities present in human constructions. Windows are almost always rectangles, never trapezoidal and so on. We
believe that this feedback effect of human artifacts upon the human mind is extremely widespread: and furthermore that it is the supremely important principle by which urban man controls his behaviour and releases processing capacity. It is almost banal to say that the human environment is organised around goal satisfaction. Areas of town may be most suitable for shopping, business or leisure, buildings are almost always organised around particular goals: bars, hairdressers, libraries, and so on. Within buildings, rooms are organised around goal-oriented activities: into kitchens, bathrooms, bedrooms, etc. Even within a room there may be a dining table, and similar goal-oriented functional specialization. What is not so banal is that the principles underlying the fabrication of the external environment, make an immense difference to the processes involved in perception, memory and goal-management. To put this in a challenging way: much of human memory is in the environment. The fabricated environment is such that simple algorithms suffice to generate effective behavior within it. In particular for present purposes, in the fabricated environment situations are such as to satisfy the execution conditions of very few plans at one time.” (Gorayska and Lindsay, 1989b, pp. 16–17)
Another source of influence came from the work of Gorayska and Cox (1992) on user-friendly interfaces to expert systems. In the spirit of the Fabricated World Hypothesis, the authors saw expert systems as extensions of the human mind. Consequently, they argued that the locus of control with respect to the language of expression at the interface ought to be placed in the user’s mind and not (as was the case then, and predominantly still is) in the machine.
Those days also witnessed the advent of multi-media, anticipated to have a tremendous influence on the delivery of education and training, mass communications and advertising, and manufacturing product design. Multi-media developments benefited from studies in the overlap area between cognitive processing and the organization and design of Information Technology (IT) equipment for the working environment. They aimed at improving overall system performance, by optimizing the effectiveness and efficiency of the human agents. The view taken by the investigators (of what came to be known as human factors) was that this efficiency was heightened by minimizing stress and fatigue and maximizing comfort, safety, and job satisfaction. Within that broad paradigm, several different areas of investigation emerged. Cognitive Ergonomics (Card, Moran, and Newell, 1983 and 1980), for example, focused mainly on things such as the direction of eye movements, spatial proximity of input sources, and degrees of complexity in information displays so as to maximize the efficiency with which both the equipment and the information
provided were used. Cognitive Engineering (Norman, 1986; Norman and Draper, 1986, Rasmussen, 1988) dealt with the notion of an ‘internal model’ (the user had of a system and the system had of the user) in order to achieve optimal user-control and system-flexibility. Engineering Psychology (Wickens, 1992) provided technology designers with psychological profiles of users that helped eliminate design flaws and ensure that the capabilities and limitations of the human were accounted for in optimal design. None of these approaches, however, involved examination of the semantics, syntax, and pragmatics of information itself, and how its form of delivery might impact on the cognitive make-up of users. Such studies would have to involve factors influencing the nature and design of the interface between human cognition and IT processes, as well as products externalizing human thought processes. The Hong Kong CT interest group felt that the study of the relationship between all such IT developments and the human mental processes of forming cognitive schemata was not only timely but necessary. The term Cognitive Technology was coined to express the necessity of exploring the developmental co-dependence between the human mind and tools it interacted with: “[CT] explores the nature of the information made available due to such technological advances, how as a result of this information the human/Information Technology interaction influences cognitive developments in humans and how outcomes of such interactions and influences provide feedback effects on future advances in IT.” (Balasubramanian, Gorayska and Marsh, 1993, p. 4)
An elaborated version of this manifesto was first presented at the conference on New Visions of the post-industrial society: the paradox of technological and human paradigms, in Brighton, UK (Gorayska and Mey, 1994) and appeared in print as (Gorayska and Mey, 1996c). Inger and Jacob Mey brought to the round table a complementary but distinct tradition. In Europe (especially Scandinavia), the United States, and Japan, the consideration of the processes at the interface of human cognition and technology focused on the ways that the ‘toolness’ of computer artifacts interacts (or interferes) with the intentions and needs of the user, and how the user adapts to the ‘toolness’ in various ways, while keeping his or her autonomy vis-à-vis the tool. Inspired and supported by the Scandinavian labor movement, Pelle Ehn and a group of workers at Aarhus University, Denmark, looked into the question “how well the user interface relates to the practice and language the user is familiar with” (Ehn 1988, p. 434). This perspective was not far from that
advocated by the group working with Donald Norman in California, who as early as 1986 had published a collection of articles under the telling acronym of ‘UCSD’ (to be read as either ‘User-Centered System Design’ or, more whimsically, ‘University of California at San Diego’, the place where their research was being conducted). Ehn acknowledges his indebtedness to Norman’s and his co-workers’ ideas, when he directly refers to Hutchins et al.’s article in the 1986 Norman book on the matter of “the level of tools in the user interface”. While “the primitive command of a Turing machine gives the user the tools to perform a task that can be done with a computer artifact, … there are not many users that would be helped by this in their ordinary work practice.” (1988, p. 434). Ehn then refers to Alan Perlis’ famous quip (also quoted by Hutchins et al., 1986, p. 101) about the “Turing tar-pit in which everything is possible but nothing of interest is easy”, and its converse, the “over-specialized system where operations are easy, but little of interest is possible” (Hutchins et al., 1986, p. 103).
It is in the interface between the tool’s affordances and the human’s intentions that the proper dialectic interaction can take place. Ideally, the ‘toolness’ of the tool should gradually diminish, as the tool itself recedes into the background, as the useful factotum whose presence we rely on without actually ever seeing it. This is also the notion of the ‘invisible tool’, as propounded in an early article by Mey (1988) and also taken up by Norman in his 1999 book The Invisible Computer.
In all these matters, what was really at stake was the question: given that adaptation is necessary to deal with an increasingly complicated technological environment, in which direction should the adaptation take place? Should the humans adapt to the computer, or should the computer be adapted to human needs?
On the face of it, an ecological, human-oriented solution would be to let the human be the ‘adaptee’, the one that is being adapted to, while the computer is the one that adapts. But the situation becomes more complex when we consider the fact that, in making the computer adapt to our needs, we have already taken a big bite out of the adaptive process: we are ‘natural-born cyborgs’ (as Clark has expressed it; 2001a), for whom using, and identifying with, tools (including now the computer) is as natural as is breathing. In a way, we are already ‘pre-approved’ for computer use: we are as much destined to be computer-consumers as we are potential acceptors and executors of credit card offers and practices. The real question was, therefore, not if we should adapt, but how we should do it, and on whose premises. This is where CT entered the picture. The ‘adaptivity’ problem (Mey, 1998) that the Europeans, Japanese, and some
Americans had focused on, tied in with the developments at the Hong Kong end of the CT world. Hence it was not enough to re-invent and reverse the old slogan from the 1933 Chicago World’s Fair (quoted by Norman, 1996, p. 253): ‘Science Finds, Industry Applies, Man Conforms’, to make technology the one “wh[ich] should do the conforming” (ibid.); a balanced view of adaptation should take into account the fact that ‘man’ is in principle a conformist, that is, a flexible tool-maker and user; and that to get the most out of our technology, we have to include the cognitive human self. Which is precisely what CT came to be all about. As Andy Clark puts it, the “tools [i.e., the technology] and culture [i.e., the cognitive environment] are indeed as much determiners of our nature as products of it” (this volume, p. 31). The essence of human dealings with computer technology is that we are able to bridge both ‘gulfs’ (in Hutchins, Hollan and Norman’s terminology; 1986, p. 94): that of evaluation and that of execution, in order to be able to deal with both ‘worlds’: that of cognition and that of technology.
Establishing the ground
Following the fruitful discussions at the momentous ‘round table’ lunch in 1993, the two “worlds” finally found a common expression in the collaborative groundwork for the First International Cognitive Technology Conference, CT’95, which took place in Hong Kong in 1995. The call for papers distributed in a variety of available media read:
“Cognitive Technology (CT) is the study of the interaction between people and the objects they manipulate. It is concerned with how technologically constructed tools/aids (A) bear on dynamic changes in human perception, (B) affect natural human communication, and (C) act to control human adaptation. Cognitive systems must be understood not only in terms of their goals and computational constraints but also in terms of the external physical and social environments that shape cognition. This can yield (A) technological solutions to real world problems, and (B) tools designed to be sensitive to the cognitive capabilities and affective characteristics of their users. CT takes a broader view of human capability than the current Human Computer Interface research and talks of putting more of the ‘human’ into the interface without attempting to simulate ‘humanness’ on machines. It is primarily concerned with how cognitive awareness in people may be amplified by technological means and considers the implications of such amplification on what it means to be human. It should appeal to researchers across disciplines
who are interested in the socio-cultural and individual implications of developments in the interface between technology and human cognition. Any technology which provides a tool has implications for CT; computer technology has special importance because of its particular capacity to provide multi-sensory stimuli and emulate human cognitive processes.” (CT’95 call for papers, first published on the Internet in December 1993)
At the time of CT’95, several new important developments happening at the interface of cognition and technology were in the offing. One was the trend to consider the process of externalizing-internalizing-externalizing of the mind as a constant loop: “When we externalize our minds, we create an object. This object, in its turn, is not just an object in space: it is something that we consider, relate to, love or hate, in short, work with in our minds, hence internalize. In very simple cases, the object is ‘just’ an object for vision (as Descartes seemed to think); more sophisticated ‘considerations’ include the mirroring that takes place when the child discovers its own image as separate from itself […], or when we evaluate a mind product as to its ‘adequacy’, and compare it to the original representation that we had ‘in mind’. Consequently, removing this check can have some strange and unexpected effects, as in the cases where an artist loses the use of one of his senses: the near-blind Monet, the deaf Beethoven, who continued to externalize their minds, but with unmistakably different (but not necessarily artistically inferior) outcomes. The re-internalized object is different from the one that started the externalizing process: it retains a tie to its origin, but has also become strangely independent. It now has a life of its own, and at a certain point in time, it is its turn to become externalized. This process continues until we think the result is adequate, and in the meantime, every new version interacts dialectically with the previous one. It supersedes it, but cannot replace it quite.” (Gorayska and Mey, 1996a, p. 6)
The externalization-internalization-externalization loop gave rise to the second emergent factor — the dichotomy between Cognitive Technology (CT) and Technological Cognition (TC): “The term Cognitive Technology may be too narrow for our intentions. It serves well to describe those issues which deal with determining approaches to tool design meant to ensure integration and harmony between machine functions and human cognitive processes. Unfortunately, it does not adequately describe a number of other issues, in particular, those concerns which relate to the identification and mapping of the relationship between technological products and the processes by which human cognitive structures adapt. We see these
two types of issues as constituting related but distinct areas of investigation which are best kept separate but must be given closely aligned treatment. We therefore reserve the term Cognitive Technology to refer to methodological matters of tool design, and propose the term Technological Cognition to refer to the theoretical explorations of the ways in which tool use affects the information and adaptation of the internal cognitive environments of humans. Human cognitive environments are constituted by the set of cognitive processes which are generated by the relationships between various mental characteristics. These environments serve to inform and constrain conscious thought. Under the new schema, theoretical developments in Technological Cognition would find concrete expression in the constructed artifacts produced by Cognitive Technology. It is this dichotomy which forms the basis for our argument and the grounds from which to develop a framework for analysis.” (Gorayska and Marsh, 1996, p. 28)
Third, and most importantly, the separation of methodologies for tool-design (externalization processes) and the processes by which existent tools in turn technologize the mind (internalization processes) allowed us to put the explorations of both on a firm pragmatic footing: “Cognitive technology thus turns necessarily (although not automatically) into technological cognition (Gorayska & Marsh, 1996). I said: ‘not automatically’, because the necessity is one that we need to realize and make our own. The conditions for using technology are not in the technology alone, but in the minds of the users. To vary Kant’s famous dictum, cognition without technology is empty, but technology without cognition is dangerous and blind. Our minds need the computer as a tool, but we need to consciously integrate the computer into our minds before we start using it on a grand scale. In this way (and in this way only), rather than being a mindless technological contraption, the computer may become a true tool of the mind. Technology creates gadgets that can be put to use without regard for their essential functions. What, in the end, such gadgets do to ourselves and to our environment, however, is not necessarily anybody’s concern in their actual conditions of use, in which the mind-captivating (not to say mind-boggling) fascination of advanced technology allows us to focus on the intermediate stage between intent and effect, the purely technological one. To take a simple example, pressing a button is, in itself, a neutral activity; yet it can in the end cause a door bell to ring just as well as it may detonate a nuclear explosion. ‘I just pressed a button’, the pilot who launched Fat Boy on Hiroshima could have said. And if Timothy McVeigh (one of the persons indicted, and subsequently convicted, in the April 19, 1995 bombing of the Alfred P. Murrah Federal Building in Oklahoma City that took 165 lives)
would have had to kill all those 165 people by hand, he never would have gotten beyond the first two or three, especially if he had started out with the babies. Now he could just connect some wires, and leave it at that. Technology does the work; our minds are at rest. Garrison Keillor, in his News from Lake Wobegone, aired on radio station WBEZ, Chicago, on 12 August 1995, offered a philosophical reflection on Halloween pranks. His point was that you are responsible even for the unknown, and unintended, effects of the fun you have (the example was of some boys at Halloween disconnecting a box car and sending it out on the tracks). What determines your responsibility is the outcome; he called this a strictly ‘outcome-based morality’. Applying this to our subject from a slightly different point of view, one could talk of a pragmatic view of technology and cognition. The pragmatics of cognitive technology (CT) deals with technology’s effects on the users and their environment; a pragmatic view of technological cognition (TC) implies that we inquire into the conditions that determine that cognition, that is the conditions under which users can cognize their technology, in order to realize the effects of their technological efforts. We need pragmatics in our CT so that it can be environmentally correct; we need pragmatics for our TC to be morally sound.” (Mey, 1995; italics in original)
Recent developments in CT and in the philosophy of mind echo this statement. To take the latter first, Andy Clark in his recent book, Mindware (2001b), has proposed the concept of ‘wideware’ to cover environmental extensions of the human mind. Under this concept, he subsumes both the Fabricated World Hypothesis (the principles underlying the fabrication of the external environment make an immense difference to the processes involved in perception, memory and goal-management) and our earlier observation that technology externalizes processes (algorithms) which originate in the mind: “… what is distinctive about human thought and reason may depend on a much broader focus than that to which cognitive science has become most accustomed, one that includes not just body, brain, and the natural world, but the props and aids (pens, papers, PCs, institutions) in which our biological brains learn, mature and operate. A short anecdote helps set the stage. Consider the expert bartender. Faced with multiple drink orders in a noisy and crowded environment, the expert mixes and dispenses drinks with an amazing skill and accuracy. But what is the basis of this expert performance? Does it all stem from finely tuned memory and motor skills? By no means. In controlled psychological experiments (Beach 1988, cited in Kirlik, 1988, p. 707), it becomes clear that expert skill involves a delicate interplay between internal and environmental factors. The experts
select and array distinctly shaped glasses at the time of ordering. They then use these persistent cues so as to help recall and sequence the specific orders. Expert performance thus plummets in tests involving uniform glassware, whereas novice performances are unaffected by any such manipulations. The expert has learned to sculpt and exploit the working environment in ways that transform and simplify the task that confronts the biological brain. Portions of the external world thus often function as a kind of extraneural memory store.” (Clark, 2001b, p. 141)
The need to consider moral issues in tool-design, accentuated by the pragmatic processes of tool use, has been reiterated by Chan and Gorayska (2001) in their ‘Critique of Pure Technology’. Echoing Wittgenstein (1953), they remind us that the meaning of a piece of technology is a consequence of how it is actually being used. Following Kant (1781), they point out that the way technologies will be used by people is limited by human inherent and enjoined perceptive capabilities. Therein lies a danger of tool misuse. What we need to work towards is an increased awareness 1) of the unthinking use of pure technology, i.e. technology decontextualized from the cognitive processes activated through its use, and 2) of the processes through which we become habituated to the cognitive scaffold (Clark, 1997) that technology provides. How can this be achieved? “This can be achieved through an open dialogue — an ongoing and collective deliberation of how our environment and our lives should be shaped by technical means and what kinds of risks we as the public can collectively bear. Such collective deliberations — which ultimately lead to collective responsibility in the face of adversity — require access of a wider public to details of the design and the already known evidence of potential short-term and long-term risks in adopting a given technology. […] The parallel [to Kant] in the case of pure technology is that we also need to acknowledge the limits of technology and admit the fact that human-related problems in a technology-oriented society have to be dealt with not only at the level of cognitive, rational, technology, but also at the cognitive, emotive, psychological and spiritual levels. Most importantly, they will have to be dealt with at the ethical, social and political levels as well.” (Chan and Gorayska, 2001, p. 474)
By virtue of bringing the pertinent questions of CT to the attention of a wider audience, this book is an invitation to such public, collective deliberations.
The CT agenda
It has been said that you don’t really understand a scientific problem area until you have been able to formulate some pertinent questions belonging to its remit. In this perspective, it seems to us that the first question above is subordinated to the second one, and that this latter question encapsulates the answers to the first one. For, if we can argue successfully for the need for some urgent problems to be solved in the area of CT, then of course the need for a forum discussing those questions has been established, and the audience will manifest itself: ‘If you build it, they will come’.
If we endorse CT as the study of the pragmatic cognitive processes ongoing at the dialectic boundary area between human tool users and tools used by humans, we will immediately be able to recognize a whole array of problems:
– What is the tool to a human, and what is the human to a tool? (see Gorayska, Marsh and Mey, 2001)
– What does it mean to say that ‘the human comes first in technology’?
– Where are the limits for computerizing human activities, if there are any?
– How does the tool, especially the computer tool, change the mind (as opposed to, or as a complement to, the mind changing the (computer) tool)?
– Can we have too much technology (‘the computer taking over’)?
– What are the secondary (‘hidden’) effects of this technology (cf. Salomon 1993; Gorayska and Mey, 1996a)?
More importantly, however, one issue that looms large in such considerations is the pertinent methodological question that has permeated CT (conceived of as a scientific discipline) to date, namely: What methods do we as investigators need to employ when we study technological cognition for the purpose of designing or evaluating cognitive tools? In other words, How do we study and apply the pragmatics of technology? And more concretely, What (kind of) question(s) do we want to address in CT, and how are we going to address them? If we want to find out what we do when we do CT in design, we must first ask ourselves what it means to do CT in general. Since CT is essentially to do with the relationship between humans and their tools, the following general aspects of ‘doing CT’ are worthy of attention:
– The question of what constitutes a cognitive tool. All artifacts are in some measure cognitive artifacts and can be situated on a continuum of purposeful use between the extremes of raw material and the mind (Gorayska, Mey and Marsh, 2001). Inasmuch as we employ them as mental prostheses, AI
robots (Pfeifer, 2002, reprinted in this volume), event modeling languages (Jirotka and Luff, this volume), or natural language (Dascal, 2002, reprinted in this volume) that help designers analyze and better understand phenomena in the real world, are in fact instances of cognitive tools. So are narratives (Dautenhahn, 2002, reprinted in this volume). Could the same be said about care-givers who help newborn babies acquire cognitive skills by simply interacting with them (Lindsay and Gorayska, 2002, reprinted in this volume)? Is it more fruitful, from the standpoint of CT, to define cognitive tools in purely functional terms? Could there be natural cognitive technologies, i.e., spontaneously evolved, functionally determined, often modular (depending on relevance discontinuities; Lindsay and Gorayska, op. cit.), mental processes that constitute the techne of the mind, as first proposed by Meenan and Lindsay (2002) and further evidenced by El Ashegh and Lindsay and by Bowman et al. (both this volume)?
– The question of co-evolution of the mind and tools it creates. Can minds and tools indeed co-evolve and, if so, how can this evolutionary co-dependence be shown? For interesting discussions relevant to this issue see, for example, Mithen (1996), Lindsay (1999), Clark (2002, reprinted here) and Dautenhahn (2002, reprinted here).
– The question of adaptivity vs. adaptability, as raised by Mey (1998), and discussed above: can we distinguish between the two in practice? Some examples of how technology can coerce behavioral change in its users can be found in Gill’s chapter (this volume).
– The question of the ‘transparent tool’, the tool that you use without noticing that you are using it (following Mey 1988, Norman 1999): what cognitive effects, if any, is it likely to have on its users? Dautenhahn (2002, reprinted in this volume) provides an excellent example.
Arguments presented in Ali (2002) and Lueg (2002), both reprinted in this volume, are also relevant to this question.
– The dangers of the ‘can-must-can’ spiral, or the dialectics of ‘tool overbidding’: once you have created a tool that can do a thing, you are obliged to use it, and create a tool that can do even more, which then puts you under the obligation to use that tool, and so on ad infinitum. (We have seen sad examples of this spiral in the arms race, or the race to fill our lives with useless gadgetry that requires us to buy even more useless gadgetry, and so on.)
– The unthinking use of technology (discussed by Chan and Gorayska, 2001): can we predict it and how can we avoid it? The meaning of a tool lies in its use. If misapplied with little thought, the tool use can have disastrous
consequences. The extreme case of such misapplication of use with respect to the computer that brackets emotion from cognition has come to be known in CT as the Schizophrenia Problem (Janney, 1997, further discussed in Ali, 2002, reprinted here).
– The related question of human wants and needs: is what we want, really what we need, or are our technological needs the result of someone else’s wants? In other words, we have to realize that cognition is not neutral, but always ‘embodied’ in some human or technological context (Clark, 2000 & 2002, reprinted in this volume). What are the social, psychological and cognitive mechanisms in enjoining need? For some answers to this question, see Lindsay and Gorayska (2002, reprinted in this volume) and Lindsay (1996).
– The dialectics between a routine operation facilitating our daily needs, and the mind-numbing sterilization of our daily activities through routine operation (possibly resulting in the computer user’s ‘screen rage’, in analogy to the frustrated driver’s ‘road rage’): what technologies do we need to overcome that sterilization? In this connection, a provocative piece by Will Fitzgerald (this volume) may serve as a poignant reminder.
– In humanizing technology, there are two directions our work can take: one is ‘from the inside out’, the other ‘from the outside in’. By this we mean that we either can take our point of departure in the human (head and body) and try to get our technological environment to fit the perceived dimensions that we have of it; or we can work from the environment as a given, and try to fit the human (body and head) into it: which direction maximizes human benefits? We will dwell some more on this distinction in the following. (See also the arguments in Jirotka and Luff, this volume.)
– The question of motives and benefits: we need to carefully consider what motivates technology inventors and developers, what specifies the broader scientific frameworks within which they operate, and furthermore, what motivates design perspectives. Some such perspectives, proposed for CT, include Empirical Modeling (Beynon, 1995), Cognitive Dimensions (Blackwell et al., 2001; Kutar et al., 2001), the Reflective Questioning Perspective (Gorayska and Marsh, 1996 & 1999), Goal-Based Relevance Analysis (Lindsay and Gorayska, 2002, reprinted in this volume; Lindsay, 1996), and, with relevance to the latter, the Affordance View (Chandrasekharan, this volume). The paramount question is whether considering these and other proposed perspectives on practice can put us in a better position to understand, at a deeper level, a design process. Can this benefit the formulation of a broader, synthesizing CT methodology?
– The question of contributions from the related disciplines: To what extent can empirical research in those disciplines further the design of humane technologies? Arguments, experiments and case studies relevant to this question are presented in this volume in the chapters by Roger Lindsay and Barbara Gorayska (2002, reprinted here), Marina Jirotka and Paul Luff, Hanan A. El Ashegh and Roger Lindsay, Sarah Bowman et al., Satinder Gill, and Kerstin Dautenhahn (2002, reprinted here).
Changing nature, changing us
To grasp the distinction alluded to above (‘inside out’ vs. ‘outside in’), let’s walk through an imagined example of a prosthesis (cf. Mey, 2000): an artificial limb, such as a hand. If we want to define the artificial hand from the inside out, we go about defining it by its specific functions, such as we see it from the point of view of our body and our understanding of its functions in relation to our needs. If we define the hand from the outside in, the emphasis is on the existing conditions, and the question to ask is: What kind of hand (or prosthesis, in general) would fit these conditions, and how are we going about ‘technologizing’ them? Among the questions that may turn up here are:
– How much does visual and tactile resemblance play a role in defining the hand?
– Is a hook acceptable as a prosthesis (cf. the movie ‘Edward Scissorhands’)?
– Does the prosthesis have to be attached to a body, or can it operate independently, by remote control?
– Do we have to call the prosthesis a ‘hand’ at all (given that we use the word ‘hand’ metaphorically in a number of contexts where the resemblance is nil, e.g. when we talk about the hands on a dial or a clock, or a ‘side’ or ‘direction’, as in the Japanese Yamanote ‘the hilly section of Tokyo’, lit. ‘the mountain’s hand’)?
Given our conception of CT, how would we deal with these problems? Here, typically, the question of ‘design’ in tool manufacturing crops up (a much maligned term, since it has been taken over by certain professional ‘designers’ who operate more for their own good than for that of others). As to the conditions for designing a tool from the outside in, these are similarly well-known: one starts out with a specification of the problem, then goes on to figure out how the tool will fit in. Consider the activity of fruit-or-berry-picking. We consider the human hand as the primary tool for this activity: from time immemorial, people have been picking berries with their fingers, apples and pears with their hands. But consider the boys who want to have some walnuts out of a tree. The nuts hang too high, and they cannot reach them by hand. What do they do? They throw branches and other objects into the tree, trying to make the nuts fall down. This may work for nuts, which don’t get damaged by the fall. But if you were to do this to avocados, the result would be disastrous. Here, the environmental conditions dictate another solution, which one of us (JM) in fact saw practiced in Brasília (the capital of Brazil) one day, when walking to school: a man was maneuvering a tall pole with a cup-like attachment at the end. Placing the pole with the cup under the avocado, he then carefully manipulated a little knife that was attached to the end of the pole, with the help of strings running down the length of the pole. After several tries, the avocado landed successfully in the cup and was lowered safely down to the owner of the tree (or the user, whichever the case might have been; we believe nobody in Brasília owns those trees outside the apartment blocks). In this case, the ‘hand’ was completely externalized and detached from the body; it fulfilled its functions in part because of its non-likeness to a ‘real’ hand (one wouldn’t even call such a prosthesis a ‘hand’, we believe). From the primitive stick thrown into the tree to this sophisticated contraption, we notice an increasing ‘instrumentalization’ (along the lines sketched in Gorayska et al., 2001): the primitive tool becomes a sophisticated instrument. But at the same time, its role as a prosthesis shifts character: from being in touch with the body, it becomes part of the environment rather than of the human.
Externalizing our activities, thus, does not mean that we have to replicate them; they can be ‘reborn’ in different shapes, sometimes even unrecognizably or unidentifiably so. Imagine, for example, that some genetic engineering process were able to produce an avocado with a hard shell, a bit like a coconut. Then the whole operation of avocado picking could be effected by a device like the one we use for cherry harvesting: a tractor with a big mechanical arm clamps the tree, spreads out a sheet underneath it, and proceeds to shake the cherries out of the tree and on to the sheet, which is then folded mechanically and emptied into the container attached to the tractor. The activity of fruit picking, which was originally thought of as specifically hand- and finger-oriented, has now been transformed into something which happens completely outside the body. Using these examples as our guideline, we now are able to formulate a few more pertinent questions to test our CT practice:
– How do we have to think about the material conditions and parameters for using a prosthesis, or for that matter, any tool? Taking, again, the case of the hand, we would have to distinguish between US and European parameters for the opening of doors: the up-down movement that is typical for European latch-type door openers is strategically different from the rotating movement that is appropriate to the turning knobs that are standard in the US.
– We should be thinking also of other ‘affordances’ for hand interaction: a technology that would be proper for berry picking would not necessarily be fit for hand-shaking at conventions, for instance, or for the apostolatus manuum, the ‘apostolate of the hands’, as Catholic priests were wont to call their greeting obligations at parish and other religious events.
– To properly assign functions to the prosthesis (the ‘hand’), we need to consider the various functional movements that are involved in hand-to-world interaction, and differentiate our prosthesis accordingly:
  – grasping vs. picking in relation to the nature of the objects (solid vs. scattered, flat vs. protruding, substantial vs. leafy, and so on);
  – pushing vs. pulling (e.g., buttons vs. knobs);
  – vertical vs. horizontal direction (levers vs. handles);
  – lifting vs. stroking (e.g., a baby or a cat).
An interesting example of how the design of the prosthesis feeds back into the design of the natural hand and vice versa is furnished by the development and progress of the common computer instrument known as the ‘mouse’. In the beginning, the mouse was just a simple button on a wire, not much different from a lamp switch, that could be moved around. When the mouse became ‘on line’, that is, when its movements could be said to emulate and control the movements of the cursor on the screen, new possibilities turned up. Some researchers started to think about the fact that the human user not only had hands, but also feet; and that the mouse-like instrument, by having only one button, did not exploit the full capacity of the human hand steering it, which had five fingers (plus the fact that there were two hands; but since people had to use at least one of them for typing, this was never a serious option). Hence, some of these people set out to design ‘mice’ that could remedy this defect. The five-finger mouse and the foot mouse were actually developed as early as the mid-eighties, in Japan, at the Department of Computer Sciences of Osaka University, where JM was witness to how the only person (a graduate student) who was able to manipulate both the foot- and the five-fingered mouse (and both at the same time!) obtained quite striking achievements as a result of this
successful coordination. However, as one of the researchers who had developed the five-finger gadget (and who was not even able to operate it himself) remarked: ‘It’s a nice toy, but the only person who will ever be able to use it is Katoo-san, and it took him almost two months’. The moral of the story is, perhaps, not that the invention of the five-fingered mouse (or the foot mouse) in itself was contrary to the ideas of CT, as we see them, but that this particular invention was developed out of a wrong idea about making prostheses, namely that they should be as close as possible to the human body part that they replaced. The relevance of using fingers was overlooked. In actual fact, there isn’t much that we specifically use all of our five fingers for, singly and individually (except if we are prestidigitators or consummate piano players). Most of the time, we use the hand as such, or the fist, e.g., for throwing a ball, or punching someone on the nose. Here, the five fingers cooperate in the movement and do not work on their own, as in the case of the five-fingered mouse. In other words, the practice of CT that went into this invention was misguided in that it took the wrong point of departure; or better, the point of departure itself was wrong, given the circumstances. The world of the mouse and the keyboard does not require more than one button to regulate the movement of the cursor. Even if there are two or three buttons on a mouse, our hunch is that most users will be content to use just the one, and resort to the other, or other two, only in cases of necessity, such as when executing special functions (which, of course, could also be realized in other ways, just as is the case with most of the mouse’s functions, which can be replaced by a control stroke plus a character, if we desire to avoid tendonitis and computer-caused carpal tunnel syndrome and other side benefits of ‘mousing’!).
Conclusion
As the above examples and discussions have shown, the need to do CT does not stop at the edge of the human-computer interface. CT penetrates far down into the reaches where the cognitive mind reigns supreme, and from where it prepares its forays into the world of technology. In the other direction, CT reaches out to the farthest shores of technological development, bringing back the lost tribes of tooling humans into the cognitive fold. In this perspective, we can think of prostheses and other (computerized) tools as instruments helping to effect this return to humanity, the cognitive tools being extensions of the
human body (such as in the case of the artificial limb) and of the human mind (as in Gorayska and Lindsay’s ‘fabricated world’ or Clark’s ‘wideware’), but at the same time devices that change the body and inspire the mind. An analogy from music may help us to understand this double problematic. It has become commonplace these days among music buffs to demand that baroque and other older music be played on ‘original’ instruments. That usually means that we construct instruments that are as close as possible to their originals (which, in itself, is a dubious proposition: no modern Stradivarius will ever come close to the beauty of sound and shape that is inherent in the few remaining originals). But one could attack the problem of reproducing technology in a quite different way, viz., by asking: If Bach had lived today, would he have been happy to only play on his old foot-and-hand driven, bellows-powered monster of a church organ, if a modern, electrically powered and electronically steered instrument were available to him? Would Beethoven have preferred to hammer out his sonatas on the ‘Hammerklavier’ rather than on a contemporary Yamaha or Steinway grand? (Of course, he was deaf anyway, so perhaps it wouldn’t matter in his case…) We think the answer is obvious. Being great minds, not merely great composers and/or performers, these cognitive geniuses would have recognized the tool for what it was: an extension of the mind, not an impediment to its development. They would have jumped at the innovative potential of the new tools, and have dispatched, perhaps with a sad smile, the old instruments to the junkyard of musical history, to be retrieved only for nostalgic purposes. Using the new tools, they would have created a totally new music, in dialectical interaction with their modern instruments.
On the other hand, using the retro technology of our days, the most we can do in playing Bach on a replicated 17th century clavichord is to mechanically re-create the old work, but without the inspiration and interaction that went into the original performance, which itself was a unique product of the dialectics between then-cognition and then-technology. Similarly, when dealing with modern technological and cognitive artifacts, it behooves us to take a bold stance and demand of ourselves that we face the challenges hands-on and with open eyes. Technological cognition inspires cognitive technology; it should never clamp the brakes on our development, whether cognitive or technological. Cognitive Technology, as a scholarly discipline, sees it as its purpose in life to offer a modest contribution towards realizing that aim, a push in what we believe is the right direction.
References
Ali, S. M. (2002). The end of the “Dreyfus Affair”: (Post)Heideggerian meditations on man, machine and meaning. International Journal of Cognition and Technology, 1(1), 85–96. Reprinted in this volume.
Balasubramanian, N. V., B. Gorayska & J. Marsh (1993). Establishment of a Cognitive Technology research group. Technical Report TR-93-04. Department of Computer Science. Hong Kong: City University of Hong Kong.
Beach, K. D. (1988). The role of external mnemonic symbols in acquiring an occupation. In M. N. Gruneberg & R. N. Sykes (Eds.), Practical Aspects of Memory: Current Research and Issues 1, pp. 342–346. New York: Wiley.
Beynon, M. (1995). Empirical modelling for educational technology. In J. Marsh, C. L. Nehaniv & B. Gorayska (Eds.), Proceedings of the Second International Cognitive Technology Conference: 54–68. August 25–28, Aizu, Japan.
Blackwell, A. F., C. Britton, A. Cox, T. R. G. Green, C. Gurr, G. Kadodo, M. S. Kutar, M. Loomes, C. L. Nehaniv, M. Petre, C. Roast, C. Roe, A. Wong & R. M. Young (2001). Cognitive dimensions of notations: Design tools for Cognitive Technology. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 325–341. Berlin: Springer.
Bowman, S., L. Hinkley, J. Barnes & R. Lindsay (this volume). Gaze aversion and the primacy of emotional dysfunction in autism. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 267–301. Amsterdam: John Benjamins.
Card, S. K., T. Moran & A. Newell (1980). The keystroke-level model for user performance time with interactive systems. Communications of the ACM, 23: 396–410.
Card, S. K., T. Moran & A. Newell (1983). The psychology of human-computer interaction. Hillsdale, N. J.: Erlbaum.
Chan, H-M. & B. Gorayska (2001). Critique of pure technology. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 463–476. Berlin: Springer.
Chandrasekharan, S. (this volume). The Semantic Web: Knowledge representation and affordance. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 153–172. Amsterdam: John Benjamins.
Clark, A. (1997). Being there: Putting the brain, body, and world together again. Cambridge, Mass.: MIT Press.
Clark, A. (2001a). Natural-born cyborgs? In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 17–24. Berlin: Springer.
Clark, A. (2001b). Mindware. Oxford: Oxford University Press.
Clark, A. (2002). Towards a science of the bio-technological mind. International Journal of Cognition and Technology 1(1), 21–33. Reprinted in this volume.
CT’95 (1993). Call for Papers for the First International Cognitive Technology Conference. Internet publication. Hong Kong: City University of Hong Kong.
Dascal, M. (2002). Language as a Cognitive Technology. International Journal of Cognition and Technology 1(1), 35–61. Reprinted in this volume.
Dautenhahn, K. (2002). The origins of narrative: In search of the transactional format of narratives in humans and other social animals. International Journal of Cognition and Technology 1(1), 97–123. Reprinted in this volume.
Ehn, P. (1988). Work-oriented design of computer artifacts. Stockholm: Arbetslivscentrum (distributed by Almquist & Wiksell International).
El Ashegh, H. A. & R. Lindsay (this volume). Cognition and body image. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 175–223. Amsterdam: John Benjamins.
Fitzgerald, W. (this volume). Martin Luther King and the “Ghost in the Machine”. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 345–353. Amsterdam: John Benjamins.
Gill, K. S. (1996). Information Society: New media, ethics and Postmodernism. Berlin: Springer.
Gill, S. (this volume). Body moves and tacit knowing. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 241–265. Amsterdam: John Benjamins.
Gorayska, B. (1994). How to lose the soul of language. Journal of Pragmatics 22: 536–547.
Gorayska, B. & K. Cox (1992). Expert systems as extensions of the human mind. AI & Society 6: 245–262.
Gorayska, B. & R. Lindsay (1989a). Metasemantics of relevance. The First International Congress on Cognitive Linguistics. Print A265. L. A. U. D. (Linguistic Agency at the University of Duisburg), Catalogue: Pragmatics, 1989. Available from http://www.linse.uni-essen.de:16080/linse/laud/shop_laud.
Gorayska, B. & R. Lindsay (1989b). On relevance: Goal dependent expressions and the control of planning processes. Technical Report 16. School of Computing and Mathematical Sciences. Oxford: Oxford-Brookes University. (First published as Gorayska and Lindsay 1989a).
Gorayska, B. & R. Lindsay (1993). The roots of relevance. Journal of Pragmatics 19(4): 301–323.
Gorayska, B. & J. Marsh (1996). Epistemic technology and relevance analysis: Rethinking Cognitive Technology. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In search of a humane interface, pp. 27–39. Amsterdam: North Holland.
Gorayska, B. & J. Marsh (1999). Investigations in Cognitive Technology: Questioning perspective. In B. Gorayska, J. Marsh & J. L. Mey (Eds.), Humane interfaces: Questions of methods and practice in Cognitive Technology, pp. 17–43. Amsterdam: North Holland.
Gorayska, B., J. Marsh & J. L. Mey (2001). Cognitive Technology: Tool or instrument. In Meurig Beynon, Chrystopher L. Nehaniv & Kerstin Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 1–16. Berlin: Springer.
Gorayska, B. & J. L. Mey (1994). Cognitive Technology. In Karamjit S. Gill (Ed.), Proceedings of the conference on New Visions of the Post-industrial Society: The paradox of technological and human paradigms, SEAKE Centre, Brighton 1994. Reprinted in Karamjit Gill (Ed.) 1996, pp. 287–294.
Gorayska, B. & J. L. Mey (1996a). Of minds and men. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In search of a humane interface, pp. 1–24. Amsterdam: North Holland.
Gorayska, B. & J. L. Mey (Eds.) (1996b). Cognitive Technology: In search of a humane interface. Amsterdam: North Holland.
Gorayska, B. & J. L. Mey (Eds.) (1996c). AI & Society 10, Special Issue on Cognitive Technology.
Hutchins, E. L., J. D. Hollan & D. A. Norman (1986). Direct manipulation interfaces. In D. A. Norman & S. W. Draper (Eds.), User-centered computer design, pp. 87–124. Hillsdale, N. J.: Erlbaum.
Janney, R. W. (1999). Computers and psychosis. In J. P. Marsh, B. Gorayska & J. L. Mey (Eds.), Humane Interfaces: Questions of methods and practice in Cognitive Technology, pp. 71–79. Amsterdam: Elsevier Science. (An earlier version of this paper appeared as Janney, R. W. (1997). The prosthesis as partner: Pragmatics and the Human-Computer Interface. In J. P. Marsh, C. L. Nehaniv & B. Gorayska (Eds.), Proceedings of the Second International Cognitive Technology Conference CT’97: Humanizing the Information Age, pp. 1–6. IEEE Computer Society Press.)
Jirotka, M. & P. Luff (this volume). Communicating sequential activities: An investigation into the modelling of collaborative action for system design. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 303–330. Amsterdam: John Benjamins.
Kant, I. (1781). Critique of pure reason. (Translated by Norman Kemp Smith.) London: MacMillan, 1933.
Kirlik, A. (1988). Everyday life environments. In W. Bechtel and G. Graham (Eds.), A Companion to Cognitive Science, pp. 702–712. Oxford: Blackwell.
Kutar, M. S., C. L. Nehaniv, C. Britton & S. Jones (2001). The Cognitive dimensions of an artifact vis-à-vis individual human users: Studies with notations for the temporal specification of interactive systems. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 344–355. Berlin: Springer.
Lindsay, R. (1996). Cognitive Technology and the pragmatics of impossible plans: A study in Cognitive Prosthetics. AI & Society 10, 273–288. Special issue on Cognitive Technology.
Lindsay, R. (1999). Can we change our minds? The impact of computer technology on human cognition. In B. Gorayska, J. Marsh & J. L. Mey (Eds.), Humane interfaces: Questions of methods and practice in Cognitive Technology, pp. 45–69. Amsterdam: North Holland.
Lindsay, R. & B. Gorayska (2002). Relevance, goal management and Cognitive Technology. International Journal of Cognition and Technology 1(2), 187–232. Reprinted in this volume.
Lueg, C. (2002). Looking under the rug: Context and context aware artifacts. International Journal of Cognition and Technology 1(2), 287–302.
Marsh, J., B. Gorayska & J. L. Mey (Eds.) (1999). Humane interfaces: Questions of methods and practice in Cognitive Technology. Amsterdam: North Holland.
Meenan, S. & R. Lindsay (2002). Planning and the neurotechnology of social behaviour. International Journal of Cognition and Technology 1(2), 233–274.
Mey, J. L. (1988). CAIN and the invisible tool: Cognitive Science and the Human-Computer Interface. Journal of the Society of Instrument and Control Engineers (Tokyo) 27(1), 247–252.
Mey, J. L. (1995). Cognitive Technology: Technological Cognition. In Proceedings of the First International Cognitive Technology Conference, August 1995, Hong Kong. Reprinted in AI & Society (1996) 10, 226–232.
Mey, J. L. (1998). Adaptability. In Concise encyclopedia of pragmatics, pp. 5–7. Oxford: Elsevier Science.
Mey, J. L. (2000). The computer as prosthesis. Hermes, Journal of Linguistics 24, 14–29.
Mithen, S. J. (1996). The prehistory of the mind: A search for the origins of art, religion and science. London: Orion Books Ltd.
Norman, D. A. (1986). Cognitive Engineering. In D. A. Norman & S. A. Draper (Eds.), User Centred Systems Design, pp. 31–63. Hillsdale, N. J.: Erlbaum.
Norman, D. A. (1993). Things that make us smart. Reading, Mass.: Addison-Wesley.
Norman, D. A. (1999). The invisible computer. Cambridge, Mass.: MIT Press.
Norman, D. A. & S. W. Draper (Eds.) (1986). User-centered computer design. Hillsdale, N. J.: Erlbaum.
Pfeifer, R. (2002). Robots as cognitive tools. International Journal of Cognition and Technology 1(1), 125–143. Reprinted in this volume.
Rassmussen, J. (1988). Information Processing and Human-Machine Interaction: An approach to Cognitive Engineering. New York: North Holland.
Salomon, G. (Ed.). (1993). Distributed cognition: Psychological and educational considerations. Cambridge: Cambridge University Press.
Segall, M. H., D. T. Campbell & M. J. Herskovits (1966). The influence of culture on visual perception. Indianapolis: Bobbs-Merrill.
Wickens, C. (1992). Engineering Psychology and Human Performance. 2nd edition. New York: Harper Collins.
Wittgenstein, L. (1995). Philosophical Investigations. Oxford: Blackwell.
Part I
Theoretical issues
Towards a science of the bio-technological mind*
Andy Clark
Indiana University
“Soon, perhaps, it will be impossible to tell where human ends and machines begin”. Maureen McHugh, China Mountain Zhang, p. 214.
1. A sketch
The study of Cognitive Technology is, in a very real sense, the study of ourselves. Who we are, what we are, and even where we are, are all jointly determined by our biological natures and the web of supporting (and constraining) technologies in which we live, work and dream. We humans, I would argue, are naturally pre-disposed (in ways unique to our species) to create cascading torrents of non-biological structure within which to think and act. We do not need neural implants and prosthetic limbs to count as Nature’s very own Cyborgs. For we are, and long have been, bio-technological symbionts: reasoning and thinking systems spread across biological matter and the delicately codetermined gossamers of our socio-technological nest. This tendency towards bio-technological hybridisation is not an especially modern development. On the contrary, it is an aspect of our humanity which is as basic and ancient as the use of speech, and which has been extending its territory ever since. We see some of the ‘cognitive fossil trail’ of the Cyborg trait in the historical procession of potent Cognitive Technologies that begins with speech and counting, morphs first into written text and numerals, then into early printing (without moveable typefaces), on to the revolutions of moveable typefaces and the printing press, and most recently to the digital encodings that bring text, sound and image into a uniform and widely transmissible format. Such technologies, once up-and-running in the various appliances and institutions
that surround us, do far more than merely allow for the external storage and transmission of ideas. They actively re-structure the forms and contents of human thought itself. And there is no turning back. What’s more, the use, reach and transformative powers of these technologies are escalating. New waves of user-sensitive technology will bring this age-old process to a climax, as our minds and identities become ever more deeply enmeshed in a non-biological matrix of machines and tools, including software agents, vast searchable databases, and daily objects with embedded intelligence of their own. We humans have always been adept at shaping our minds and skills to fit our current tools and technologies. But when those tools and technologies start trying to fit us, in turn (when our technologies actively, automatically, and continually tailor themselves to us, just as we do to them), then the line between tool and user becomes flimsy indeed. Such technologies will be less like tools and more like part of the mental apparatus of the person. They will remain tools in only the thin and ultimately paradoxical sense in which my own unconsciously operating neural structures (my hippocampus, my posterior parietal cortex) are tools. I do not really ‘use’ my brain. There is no user quite so ephemeral. Rather, the operation of the brain makes me who and what I am. So, too, with these new waves of sensitive, interactive technologies. As our worlds become smarter, and get to know us better and better, it becomes harder and harder to say where the world stops and the person begins. What are these technologies? They are many, and various. They include potent, portable machinery linking the user to an increasingly responsive worldwide web. But they include also, and perhaps ultimately more importantly, the gradual smartening-up and interconnection of the many everyday objects which populate our homes and offices. This brief note, however, is not going to be about new technology.
Rather, it is about us, about our sense of self, and about the nature of the human mind. The goal is not to guess at what we might soon become, but to better appreciate what we already are: creatures whose minds are special precisely because they are naturally geared for multiple mergers and coalitions. Cognitive technologies, ancient and modern, are best understood (I suggest) as deep and integral parts of the problem-solving systems we identify as human intelligence. They are best seen as proper parts of the computational apparatus that constitutes our minds. If we do not always see this, or if the idea seems outlandish or absurd, that is because we are in the grip of a simple prejudice: the prejudice that whatever matters about MY mind must depend solely on what goes on inside my own biological skin-bag, inside the ancient fortress of skin and skull. But this fortress has been built to be breached. It is a
structure whose virtue lies in part in its capacity to delicately gear its activities to collaborate with external, non-biological sources of order so as (originally) to better solve the problems of survival and reproduction. Thus consider two brief examples: one old (see the Epilogue to Clark, 1997) and one new. The old one first. Take the familiar process of writing an academic paper. Confronted, at last, with the shiny finished product the good materialist may find herself congratulating her brain on its good work. But this is misleading. It is misleading not simply because (as usual) most of the ideas were not our own anyway, but because the structure, form and flow of the final product often depends heavily on the complex ways the brain co-operates with, and depends on, various special features of the media and technologies with which it continually interacts. We tend to think of our biological brains as the point source of the whole final content. But if we look a little more closely, what we may often find is that the biological brain participated in some potent and iterated loops through the cognitive technological environment. We began, perhaps, by looking over some old notes, then turned to some original sources. As we read, our brain generated a few fragmentary, on-the-spot responses which were duly stored as marks on the page, or in the margins. This cycle repeats, pausing to loop back to the original plans and sketches, amending them in the same fragmentary, on-the-spot fashion. This whole process of critiquing, re-arranging, streamlining and linking is deeply informed by quite specific properties of the external media, which allow the sequence of simple reactions to become organised and grow (hopefully) into something like an argument. The brain’s role is crucial and special. But it is not the whole story.
In fact, the true power and beauty of the brain’s role is that it acts as a mediating factor in a variety of complex and iterated processes which continually loop between brain, body and technological environment. And it is this larger system which solves the problem. We thus confront the cognitive equivalent of Dawkins’ (1982) vision of the extended phenotype. The intelligent process just is the spatially and temporally extended one which zigzags between brain, body and world. Or consider, to take a superficially very different kind of case, the role of sketching in certain processes of artistic creation. Van Leeuwen, Verstijnen and Hekkert (1999) offer a careful account of the creation of certain forms of abstract art, depicting such creation as heavily dependent upon “an interactive process of imagining, sketching and evaluating [then re-sketching, re-evaluating, etc.]” (Op. cit., p. 180). The question the authors pursue is: why the need to sketch? Why not simply imagine the final artwork “in the mind’s eye” and then execute it directly on the canvas? The answer they develop, in great detail and
using multiple real case-studies, is that human thought is constrained, in mental imagery, in some very specific ways in which it is not constrained during on-line perception. In particular, our mental images seem to be more interpretatively fixed: less able to reveal novel forms and components. Suggestive evidence for such constraints includes the intriguing demonstration (Chambers and Reisberg, 1989) that it is much harder to discover (for the first time) the second interpretation of an ambiguous figure (such as the duck/rabbit) in recall and imagination than when confronted with a real drawing. Good imagers, who proved unable to discover a second interpretation in the mind’s eye, were able nonetheless to draw what they had seen from memory and, by then perceptually inspecting their own unaided drawing, to find the second interpretation. Certain forms of abstract art, Van Leeuwen et al. go on to argue, likewise, depend heavily on the deliberate creation of “multi-layered meanings” — cases where a visual form, on continued inspection, supports multiple different structural interpretations. Given the postulated constraints on mental imagery, it is likely that the discovery of such multiple interpretable forms will depend heavily on the kind of trial and error process in which we first sketch and then perceptually (not merely imaginatively) re-encounter visual forms, which we can then tweak and re-sketch so as to create a product that supports an increasingly multi-layered set of structural interpretations. This description of artistic creativity is strikingly similar, it seems to me, to our story about academic creativity. The sketch-pad is not just a convenience for the artist, nor simply a kind of external memory or durable medium for the storage of particular ideas. Instead, the iterated process of externalising and re-perceiving is integral to the process of artistic cognition itself. 
One useful way to understand the cognitive role of many of our self-created cognitive technologies is thus as affording complementary operations to those that come most naturally to biological brains. Consider here the connectionist image (Rumelhart, McClelland and the PDP Research Group, 1986; Clark, 1989) of biological brains as pattern-completing engines. Such devices are adept at linking patterns of current sensory input with associated information: you hear the first bars of the song and recall the rest, you see the rat’s tail and conjure the image of the rat. Computational engines of that broad class prove extremely good at tasks such as sensori-motor co-ordination, face recognition, voice recognition, etc. But they are not well-suited to deductive logic, planning, and the typical tasks of sequential reasoning. They are, roughly speaking, “Good at Frisbee, Bad at Logic” — a cognitive profile that is at once familiar and alien. Familiar, because human intelligence clearly has something of that flavour. Yet
alien, because we repeatedly transcend these limits, planning family vacations, running economies, solving complex sequential problems, etc., etc. A powerful hypothesis, which I first encountered in Rumelhart, Smolensky, McClelland and Hinton (1986), is that we transcend these limits, in large part, by combining the internal operation of a connectionist, pattern-completing device with a variety of external operations and tools which serve to reduce various complex, sequential problems to an ordered set of simpler pattern-completing operations of the kind our brains are most comfortable with. Thus, to borrow the classic illustration, we may tackle the problem of long multiplication by using pen, paper and numerical symbols. We then engage in a process of external symbol manipulations and storage so as to reduce the complex problem to a sequence of simple pattern-completing steps that we already command, first multiplying 9 by 7 and storing the result on paper, then 9 by 6, and so on. The value of the use of pen, paper, and number symbols is thus that, in the words of Ed Hutchins:
“[Such tools] permit the [users] to do the tasks that need to be done while doing the kinds of things people are good at: recognising patterns, modelling simple dynamics of the world, and manipulating objects in the environment.” (Hutchins, 1995, p. 155)
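The long multiplication illustration can be made concrete with a small sketch. What follows is only an illustrative reconstruction, not anything given in the sources cited: the function name `long_multiply`, the list called `paper`, and the choice of 67 × 9 as operands are all assumptions introduced here. Each inner step is a single-digit product plus a carry (the kind of memorised fact a pattern-completer handles easily), while the list plays the part of the page on which partial results are stored.

```python
def long_multiply(a: int, b: int):
    """Multiply two non-negative integers the pen-and-paper way.

    Hypothetical helper for illustration only. Returns the product and
    the 'paper': the list of partial products written down on the way.
    """
    a_digits = [int(d) for d in str(a)][::-1]  # least significant digit first
    b_digits = [int(d) for d in str(b)][::-1]
    paper = []  # external storage: one partial product per digit of b

    for shift, bd in enumerate(b_digits):
        carry, row = 0, []
        for ad in a_digits:
            step = ad * bd + carry   # one simple, memorised step (e.g. 9 by 7)
            row.append(step % 10)    # write one digit on the page
            carry = step // 10       # carry the rest to the next column
        if carry:
            row.append(carry)
        # Read the row back off the page and shift it into position.
        partial = int("".join(str(d) for d in reversed(row))) * 10 ** shift
        paper.append(partial)

    # Adding up the stored rows is left to built-in addition for brevity.
    return sum(paper), paper


product, paper = long_multiply(67, 9)  # first 9 by 7, then 9 by 6
print(product, paper)
```

The point of the sketch is the division of labour: no single step requires more than a one-digit product, and everything else is held in the external record rather than in working memory.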
This description nicely captures what is best about good examples of cognitive technology: recent word-processing packages, web browsers, mouse and icon systems, etc. It also suggests, of course, what was wrong with many of our first attempts at creating such tools — the skills needed to use those environments (early VCR’s, word-processors, etc.) were precisely those that biological brains find hardest to support, such as the recall and execution of long, essentially arbitrary, sequences of operations. See Norman (1999) for further discussion. The conjecture, then, is that one large jump or discontinuity in human cognitive evolution involves the distinctive way human brains repeatedly create and exploit various species of cognitive technology so as to expand and re-shape the space of human reason. We — more than any other creature on the planet — deploy non-biological elements (instruments, media, notations) to complement our basic biological modes of processing, creating extended cognitive systems whose computational and problem-solving profiles are quite different from those of the naked brain. In this way human brains maintain an intricate cognitive dance with an ecologically novel, and immensely empowering, environment: the world of symbols, media, formalisms, texts, speech, instruments and culture. The computational circuitry of human cognition thus flows both within and beyond the head, through this extended network in ways which
radically transform the space of human thought and reason. Such a point is not new, and has been well-made by a variety of theorists working in many different traditions. This brief and impressionistic sketch is not the place to delve deeply into the provenance of the idea, but some names to conjure with include Vygotsky, Bruner, Latour, Dennett, Hutchins, Norman and (to a greater or lesser extent) all those currently working on so-called ‘situated cognition’. My own work on the idea (see Clark 1997, 1998, 1999, 2001a) also owes much to a brief collaboration with David Chalmers (see our paper, ‘The Extended Mind’ in ANALYSIS 58(1), 1998, p. 7–19). I believe, however, that the idea of human cognition as subsisting in a hybrid, extended architecture (one which includes aspects of the brain and of the cognitive technological envelope in which our brains develop and operate) remains vastly underappreciated. We cannot understand what is special and distinctively powerful about human thought and reason by simply paying lip-service to the importance of the web of surrounding Cognitive Technologies. Instead, we need to understand in detail how our brains dovetail their problem-solving activities to these additional resources, and how the larger systems thus created operate, change and evolve. In addition, and perhaps more philosophically, we need to understand that the very ideas of minds and persons are not limited to the biological skin-bag, and that our sense of self, place and potential are all malleable constructs ready to expand, change or contract at surprisingly short notice.
2. Some questions
The challenge, then, is to take these tempting but impressionistic ideas and to turn them into a genuine science (or sciences) of the technologically-scaffolded mind. This is new territory, and even the shape of the major problems and issues remains largely up for grabs. But some major ones look to me to include:
2.1 Origins
Since no other species on the planet builds such varied, complex and constantly evolving designer environments as us, what is it that allowed this process to get off the ground in our species in such a spectacular way? Otherwise put, even if it is the designer environments that, in a familiar, boot-strapping kind of way make us what we now are, what biological difference lets us build/discover/use them in the first place?
This is a serious, important and largely unresolved question. Clearly, there must be some (perhaps quite small) biological difference that lets us get our collective foot in the designer environment door: what can it be? Contenders might include biological innovations for greater neural plasticity, and/or the extended period of protected learning called “childhood” (see Quartz, 1999; Quartz and Sejnowski, 1997; Griffiths and Stotz, 2000). Thus, Griffiths and Stotz argue that the long human childhood provides a unique window of opportunity in which “cultural scaffolding [can] change the dynamics of the cognitive system in a way that opens up new cognitive possibilities” (op. cit.). These authors argue against what they nicely describe as the “dualist account of human biology and human culture” according to which biological evolution must first create the “anatomically modern human” and is then followed by the long and ongoing process of cultural evolution. Such a picture, they suggest, invites us to believe in something like a basic biological human nature, gradually co-opted and obscured by the trappings and effects of culture and society. But this vision (which is perhaps not so far removed from that found in some of the more excessive versions of evolutionary psychology) is akin, they argue, to looking for the true nature of the ant by “removing the distorting influence of the nest”. Instead we humans are, by nature, products of a complex and heterogeneous developmental matrix in which culture, technology and biology are pretty well inextricably intermingled. In short it is a mistake to posit a biologically fixed “human nature” with a simple “wrap-around” of tools and culture. For the tools and culture are indeed as much determiners of our nature as products of it.
2.2 Our self-image as a species
How should the recognition of ourselves as naturally bio-technological hybrids affect our views of human nature?
How do these ideas fit with, or otherwise impact, accounts which emphasize ancestral environments (see, e.g., Pinker, 1997)? At the very least we must now take into account a plastic evolutionary overlay which yields a constantly moving target, an extended cognitive architecture whose constancy lies mainly in its continual openness to change. Even granting that the biological innovations which got this ball rolling may have consisted only in some small tweaks to an ancestral repertoire, the upshot of this subtle alteration was a massive leap in cognitive potential. For our cognitive machinery is now intrinsically geared to self-transformation, artifact-based expansion, and a snowballing/bootstrapping process of computational and
representational growth. The machinery of human reason (the environmentally extended apparatus of our distinctively human intelligence) may thus be rooted in a biologically incremental progression while simultaneously existing on the far side of a precipitous cliff in cognitive-architectural space.
2.3 Social policy
Educational policy, and social policy in general, need to be geared to our best current scientific image of the human mind. What educational and social policies best serve a race of constantly changing bio-technological hybrids? How can contemporary art help us to better understand these aspects of our nature? (Performance artists like Stelarc (www.stelarc.va.com.au) are tackling this latter issue head-on, with work in which the biological and technological merge and change places).
2.4 The mechanisms of co-adaptation
The complex fit between biological brains and technological scaffolding depends on a two-way give-and-take. Brains need to be plastic enough to factor the technologies deep into their problem-solving routines. And the technologies need (over cultural-evolutionary time at first, and most recently, during their own life-spans) to adapt to better fit the users. We urgently need to understand the multiple factors and forces that shape this complex dynamic. (See, e.g., Norman’s (1999) work on ‘human-centered technologies’ and Daniel Dennett’s (1995) work on the ‘Cranes of Culture’.)
2.5 Types of scaffolding
The single most important task, it seems to me, is to better understand the range and variety of types of cognitive scaffolding, and the different ways in which non-biological scaffoldings can augment (or impair) performance on a task. For example, there is interesting work comparing reasoning using self-constructed external props (e.g., a diagram you draw to help you think about a problem) and reasoning using ‘found’ or given props (the diagrams in a textbook, say).
There is also work on the role of individual differences (of cognitive style, etc.) and their impact on the usefulness of specific types of external structure. (For both the above, see Cox, 1999.) And there are detailed studies of the specific properties of various kinds of prop and aid (e.g., Scaife and Rogers’ (1996) work on graphical representations). These bodies of work cry out for extension and unification. The Holy Grail here is a taxonomy of different types of external prop, and a systematic understanding of how they help (and hinder) human performance. Such an understanding would also have immediate implications for the design process itself (see Norman, 1999; Dourish, 2001).
2.6 Collective effects
A major part of our cognitive environment is other people, and their distinctive knowledge-bases. How can new technologies help us make the most of this resource? The use of collaborative filtering techniques, in which individuals’ activities leave electronic traces that can be used to automatically generate new knowledge (e.g., the familiar Amazon prompt: people who bought such-and-such also liked….) is one simple tool whose power is not yet fully appreciated or exploited. But the potential is vast. For some discussion, see Bonabeau and Theraulaz (2000).
2.7 Frameworks and organizing principles
What general principles and concepts will allow us to make systematic sense (indeed, to make a science) of the bio-technological mind? Should we (following Hutchins, 1995) think in terms of the flow and transformation, through a series of external tools and media, of representations? This is, in effect, to extend traditional cognitive scientific approaches to mind outwards. Or should we be creating new analytic tools and approaches, perhaps borrowing ideas from dynamical systems theory and the study of complex, coupled systems (see, e.g., Thelen and Smith, 1994; Kelso, 1995; and discussion in Clark, 1997)? What key concepts will help make unified sense of the complex and varied roles of external scaffolding? Contenders include:
Off-loading: Several writers (e.g., Dennett, 1996, pp. 134–135) stress the way cognitive technologies can be used to ‘off-load’ work from the biological brain to external arenas.
Complementarity: While straightforward off-loading certainly occurs, especially with regard to shifting stuff from our limited short-term memory out into the world, it is surely only part of the story. In my own recent work (e.g., Clark, 2001b, ch. 8), the focus is more on complementarity, and on the way external stuff can be configured so as to do the kinds of thing that biological brains either don’t do at all, or do fairly badly (think of the role of the artist’s sketchpad as described above).
Transformation: Yet another approach (Rumelhart et al., 1986; Clark, 1998) takes the key notion to be that of problem transformation. Here, the external aids turn the problems that need to be solved (to perform a given task) into the kinds of problems brains like ours like best (e.g., using pen and paper to transform complex arithmetical tasks into sequences of simple ones).
Stabilization of Environments: Hammond et al. (1995) deploy the notion of ‘environmental stabilization’ to analyze the problem-solving activity of embodied, environmentally active artificial agents. Such agents act so as to keep the environment steady in ways that allow the easy reuse of once-successful plans and stratagems (e.g., putting pots and pans away in the same places, so as to reuse a cooking routine…).
Do we need just one or all of these concepts (and are there more?)? Do they all emerge as special cases of some deeper organizing principle? And within what kinds of explanatory framework are they best deployed?
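The collaborative filtering mentioned under 2.6 can be sketched in a few lines. This is a deliberately minimal, co-occurrence-based version written for illustration; it makes no claim about how any actual retailer’s system works, and the function names `also_bought` and `recommend`, together with the basket data, are hypothetical. The ‘electronic traces’ are simply the baskets, and the new knowledge is the table of what tends to be bought together.

```python
from collections import Counter
from itertools import permutations


def also_bought(baskets):
    """Map each item to a Counter of the items that shared a basket with it."""
    co_counts = {}
    for basket in baskets:
        # Every ordered pair of distinct items in a basket leaves one trace.
        for item, other in permutations(set(basket), 2):
            co_counts.setdefault(item, Counter())[other] += 1
    return co_counts


def recommend(co_counts, item, k=2):
    """Return up to k items most often bought alongside `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(k)]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["book_a", "book_b"],
    ["book_a", "book_b", "book_c"],
    ["book_a", "book_d"],
    ["book_a", "book_b"],
]
co_counts = also_bought(baskets)
print(recommend(co_counts, "book_a", k=1))
```

No individual buyer sets out to produce the recommendation; it falls out of the accumulated traces, which is what makes the technique a candidate example of other people’s knowledge being harnessed as cognitive scaffolding.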
3. Conclusions
The project of understanding human thought and reason is easily misconstrued. It is misconstrued as the project of understanding what is special about the human brain. No doubt there is something special about our brains. But understanding our peculiar profiles as reasoners, thinkers and knowers of our worlds requires an even broader perspective: one that targets multiple brains and bodies operating in specially constructed environments replete with artifacts, external symbols, and all the variegated scaffoldings of science, art and culture. Understanding what is distinctive about human reason thus involves understanding the complementary contributions of both biology and (broadly speaking) technology, as well as the dense, reciprocal patterns of causal and co-evolutionary influence that run between them. Turning this kind of vision into a genuine science of the bio-technological mind is a massive task, calling for interdisciplinary co-operation on a whole new scale. I hope and believe that this volume will contribute to a crucial forum for that important endeavor.
Note
* Section 1 is an expanded version of a text which appeared as “Natural-Born Cyborgs” in John Brockman’s Reality Club (2000). It is electronically published at http://www.edge.org, and is reproduced with permission.
References
Bonabeau, E. & G. Theraulaz (2000). Swarm Smarts. Scientific American 282(3), 72–79. March 2000.
Chambers, D. & D. Reisberg (1989). Can Mental Images Be Ambiguous? Journal of Experimental Psychology: Human Perception and Performance 11(3), 317–328.
Clark, A. (1989). Microcognition: Philosophy, Cognitive Science and Parallel Distributed Processing. Cambridge, MA: MIT Press.
Clark, A. (1997). Being There: Putting Brain, Body and World Together Again. Cambridge, MA: MIT Press.
Clark, A. (1998). Magic Words: How Language Augments Human Computation. In J. Boucher & P. Carruthers (Eds.), Language and Thought, pp. 162–183. Cambridge: Cambridge University Press.
Clark, A. (1999). An Embodied Cognitive Science? Trends in Cognitive Sciences 3(9), 345–351.
Clark, A. (2001a). Reasons, Robots and The Extended Mind. Mind and Language 16(2), 121–145.
Clark, A. (2001b). Mindware: An Introduction to the Philosophy of Cognitive Science. New York & Oxford: Oxford University Press.
Clark, A. & D. Chalmers (1998). The Extended Mind. Analysis 58, 7–19.
Cox, R. (1999). Representation, Construction, Externalised Cognition and Individual Differences. Learning and Instruction 9, 343–363.
Dawkins, R. (1982). The Extended Phenotype. New York & Oxford: Oxford University Press.
Dennett, D. (1995). Darwin’s Dangerous Idea. New York: Simon and Schuster.
Dennett, D. (1996). Kinds of Minds. New York: Basic Books.
Dourish, P. (2001). Where The Action Is. Cambridge, MA: MIT Press.
Griffiths, P. E. & K. Stotz (2000). How the mind grows: A developmental perspective on the biology of cognition. Synthese 122(1–2), 29–51.
Hammond, K., T. Converse & J. Grass (1995). The Stabilization of Environments. In P. Agre & S. Rosenschein (Eds.), Computational Theories of Interaction and Agency, pp. 304–327. Cambridge, MA: MIT Press.
Hutchins, E. (1995). Cognition In The Wild. Cambridge, MA: MIT Press.
Kelso, S. (1995). Dynamic patterns. Cambridge, MA: MIT Press.
Latour, B. (1993). We Have Never Been Modern. Cambridge, MA: Harvard University Press.
Norman, D. (1999). The Invisible Computer. Cambridge, MA: MIT Press.
Pinker, S. (1997). How the Mind Works. New York: Norton.
Quartz, S. (1999). The Constructivist Brain. Trends in Cognitive Sciences 3(2), 48–57.
Quartz, S. & T. Sejnowski (1997). The Neural Basis of Cognitive Development: A Constructivist Manifesto. Behavioral and Brain Sciences 20, 537–596.
Rumelhart, D., P. Smolensky, J. McClelland & G. Hinton (1986). Schemata and Sequential Thought Processes in PDP Models. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition 2, pp. 7–57. Cambridge, MA: MIT Press.
Scaife, M. & Y. Rogers (1996). External Cognition: How Do Graphical Representations Work? International Journal of Human-Computer Studies 45, 185–213.
Thelen, E. & L. Smith (1994). A Dynamic Systems Approach to the Development of Cognition and Action. Cambridge, MA: MIT Press.
Van Leeuwen, C., I. Verstijnen & P. Hekkert (1999). Common Unconscious Dynamics Underlie Common Conscious Effects: A Case Study in the Iterative Nature of Perception and Creation. In J. Scott Jordan (Ed.), Modeling Consciousness Across the Disciplines, pp. 179–219. Lanham, MD: University Press of America.
Language as a cognitive technology* Marcelo Dascal Tel Aviv University
1. Introduction
In our time we live surrounded by objects and devices we have created that are able to perform complex tasks, whose performance used to demand much concentration and mental effort by us. One can say that we have managed to transfer to these objects and devices some of the capacities that were considered — until not long ago — typical and exclusive to the human intellect. In this sense, we have created “cognitive artifacts” (Hutchins, 1999) or, more generally, “cognitive technologies” (Gorayska and Mey, 1996). These inventions save a considerable portion of our intellectual power, setting it free for the performance of higher cognitive tasks — those for which we have not yet been — and, as believed by some (e.g., Dreyfus, 1971, 1992), will never be — able to invent mechanical, computational or other kinds of ersatz. Behind every technology created by humankind — be it the wheel, agriculture, or the cellular telephone — there is, of course, a lot of cognitive effort. But this does not make it, as such, a cognitive technology in the sense in which I propose to use this expression. What I have in mind are the main uses of a technology, rather than the process of its creation or its effects. Primarily, the wheel serves for transportation; agriculture, for food production; the cellular phone, for communication. Secondarily, these technologies can also be useful for cognition: transportation allows us to participate in scientific conferences where we are supposed to learn something; nourishing ourselves gives us also mental energy; the cellular phone can occasionally serve to communicate cognitive content. Cognitive technologies, however, are those that either have been designed for cognitive uses or else have been appropriated for such uses. They can of course retain their original uses and have non-cognitive effects such as the production of jobs, the waging of war, or space travel. 
By ‘cognitive technology’ (CT) I mean, thus, every systematic means — material or mental — created by humans that is significantly and routinely used
for the performance of cognitive aims.1 By ‘cognitive aims’ I mean either mental states of a cognitive nature (e.g., knowledge, opinion, belief, intention, expectation, decision, plan of action) or cognitive processes (e.g., perception, memorization, conceptualization, classification, learning, anticipation, the formulation of hypotheses, demonstration, deliberation, evaluation, criticism, persuasion, discovery) that lead to cognitive states or help to reach them.2 Natural languages (NL), unlike formal languages created for specific purposes, can hardly be considered — as such — as prototypical ‘artifacts’, for they have not been purposefully ‘designed’. Yet they evolved — genetically and culturally — in view of certain human needs, and some of their features may have been appropriated (deliberately or not) in order to satisfy needs other than those whose pressure caused them to emerge in the first place. In-so-far as such needs are ‘cognitive’, it seems to me appropriate to view the corresponding features of natural languages and their use as ‘cognitive technologies’. Researchers and developers in many fields of technology have been increasingly interested in natural languages. In an ad published in Time magazine a few years ago, the Daimler-Benz corporation asks, “How will man and machine work together in the future?”, and replies: “With the spoken word”. The ad reveals that one of the corporation’s divisions “is currently researching and developing sophisticated new word-based systems”, which will bring to life the Biblically sounding prophecy: “Man will talk to Machine and Machine will respond — for the benefit of Man”. Language-based technology also occupies a central position in MIT’s multimillion “Oxygen” project, which is heralded as the next revolution in information technology — comparable to those achieved by the introduction of the computer and the internet. 
Oxygen will “address people’s inherent need to communicate naturally: we are not born with keyboard and mouse sockets but rather with mouths, ears and eyes. In Oxygen, speech understanding is built-in, and all parts of the system and all the applications will use speech” (Dertouzos, 1999, p. 39). The intense current interest in NL-based technologies, however, is almost entirely focused on one of its functions: human-machine communication. Thus, in spite of its central position in the Oxygen system, the designers seem to confine NL to its communicative role in the human-computer interface.3 For example, the system is intended to be able to satisfy “people’s need to find useful information” by being able to understand and respond adequately to a user who simply says “Get me the big red document that came a month ago” (Dertouzos, 1999, p. 39). This would be no doubt a great communicative achievement, for it would require endowing machines with advanced syntactic,
semantic, as well as pragmatic processing abilities. But it is no less important to realize how fundamental it is, for satisfying the cognitive “need to find useful information”, to make full use of the syntactic, semantic and pragmatic potential characteristic of NL, which is in fact what humans do in order to find useful information. The Daimler-Benz ad seems to come close to realizing the cognitive potential of NL, for its heading declares: “Language is the picture and counterpart of thought”. Nevertheless, the research it announces is primarily concerned with the interface issue: “And the machines will understand the words and respond. They will weld, or drive screws, or paint, or write; they will even understand different languages”. Such a focus on the use of spoken language as the upcoming predominant mode of human-machine interface orients research toward important issues such as auditory pattern recognition and automated voice production, identification of possible sources of misunderstanding, elimination of artificial constraints in communication (naturalness), effortlessness (no need of special learning), comfort (user-friendliness), ubiquity (widespread use of miniaturized devices, with minimal energy requirements, literacy requirements, etc.), security (through voice recognition), and, more recently, human-computer conversation in specific environments, the incorporation of non-verbal channels in human-computer communication, and social agent technologies.4 Most of these applications rely upon sophisticated cognitive means, namely those that underlie the communicative use of language. But they are not, as such, “cognitive technologies” in the sense defined above. Cognitive technologies in that sense are concerned with the role of NL in the cognitive processes themselves, regardless of whether and how these processes may be communicated to other humans or machines.
It is to this cognitive use of NL, so far overlooked by researchers and developers of new technologies, that the approach here proposed wants to call attention. In my opinion, until this potential of NL is tapped, the truly revolutionary effect of the technological appropriation of NL will not be achieved. In terms of the above definition of CT, the question of whether certain aspects of NL are properly viewed as CT’s is independent of the current or prospective state of technological development. In other words, this question is independent of the question whether a presumed CT aspect of language can be implemented by computational or other devices, enhanced by such devices, or used in interfaces with them. These questions depend upon the design and development of artifacts capable of simulating, enhancing or making use of the cognitive-technological features of NL, rather than upon the existence and use of such features themselves. Of course, the better we understand the nature of
NL’s cognitive-technological functions, the better we will be in a position to develop the corresponding artifacts. We may eventually reach the conclusion that not all of these functions can be satisfactorily emulated by such artifacts. In this sense, the approach proposed here might provide some relevant input to the issue of “what computers can’t do”. I believe this approach will also be valuable for a better understanding of why a proper handling of the cognitive uses of language is crucial for the development of other, not necessarily linguistic, more efficient cognitive technologies, for the design of ‘humane’ interfaces, and — more generally — for the epistemology and philosophical anthropology of the “digital culture”. However, since my primary concern here is to show how several aspects of language and language use can be fruitfully conceptualized as cognitive technologies, the exploration of these further implications of such a conceptual shift will have to be left for another occasion. In the next section, I present a number of parameters in terms of which a typology of cognitive technologies in general can be outlined. Section 3 summarizes the main antagonistic positions in the traditional debate about the primacy of thought over language or language over thought and proposes to re-frame this debate with the help of the notion of cognitive technology. Section 4 analyzes some examples of linguistic structures and language use as possible candidates of language-based cognitive technologies. In the Conclusion (Section 5), I point out some of the gains to be derived from viewing language as a cognitive-technological resource.
2. Towards a typology of cognitive technologies

In addition to their being directed at either cognitive states or processes, as pointed out in the Introduction, it is convenient to distinguish cognitive technologies according to other significant parameters.

2.1 ‘Strong’ and ‘weak’ cognitive technologies

Mental states can be distinguished from each other according to their ‘modal’ qualifications. For instance, an epistemic state can be certain or probable, intuitive or explicit, definitive or hypothetical, justified or accepted without justification, etc. Cognitive processes, in turn, can be oriented towards reaching mental states endowed with any of these modalities. A logical or mathematical demonstration, for instance, leads to an epistemic state of definitive certainty, whereas argumentation or deliberation may lead at most to a doxastic state of
holding a belief which, although well grounded, is only provisional.5 Cognitive technologies vary according to the modal aims of their designers. When they choose those modalities one could call ‘strong’ (certainty, irrevocability of the conclusions, etc.), they usually seek to endow the proposed technology with an algorithmic decision procedure which is entirely error-free and therefore irrevocable in its results. When they content themselves with ‘weak’ modalities, they can employ less rigid algorithms (e.g., non-monotonic or probabilistic logics), which do not ensure that the results cannot be called into question and modified. 2.2 ‘Integral’ and ‘partial’ cognitive technologies ‘Integral’ technologies are those that provide for the full execution of a given cognitive aim, without requiring any human intervention. ‘Partial’ technologies are those that provide only ‘helps’ for the performance of a given cognitive aim. These helps make it easier for a human agent to perform the task, but cannot dispense with his or her intervention. Often the designers’ ambitions lead them to propose maximalist projects of the first kind; but quite often, when they realize the enormous difficulties involved, they are likely to be less ambitious and satisfy themselves with partial technologies. The failure of the projects of full ‘mechanical translation’ in the 50’s and 60’s thus led — not without first wasting hundreds of millions of dollars — to the more modest current projects, in which the technologies developed merely suggest to the human translator alternative translations. The translator not only has to choose the most appropriate one; s/he must also edit it quite heavily in order to finally produce an acceptable text.6 This ‘evolutionary’ pattern from more to less ambitious technologies for a certain cognitive aim is quite frequent. 
However, maximalist ambitions tend to reappear whenever new technological and scientific developments make the conditions seem ripe for achieving ‘integral’ aims. 2.3 ‘Complete’ and ‘incomplete’ cognitive technologies One should further distinguish between the pragmatic notion of an ‘integral’ technology in the above sense and the notion of a syntactically and/or semantically ‘complete’ technology. The latter has to do with the ability of a technology to ‘cover’ completely a given domain or ensemble of ‘objects’ with respect to some desired property. For instance, if one creates an ‘alphabet of traffic signs’ in order to express through the combinations of its signs all the instructions to
be given to drivers, and if the alphabetical system in question has no means to express one of these instructions, it is incomplete. It may be incomplete either due to the insufficiency of its formation rules or due to that of its transformation rules.7 In so-called non-standard logics, degrees of completeness are distinguished, so that one can talk of ‘weakly complete’ systems, ‘very weakly complete’ systems, and so on.8 2.4 ‘Constitutive’ and ‘non-constitutive’ cognitive technologies Some technologies are ‘constitutive’ of certain cognitive states or processes, whereas others are not. The former are such that without them certain cognitive operations cannot be performed. The latter, although extremely useful for the facilitation of the achievement of certain cognitive aims, are not a sine qua non for that. An example of the first kind could be the alleged necessity of supercomputers in order to generate very large numbers so as to be able to decide whether numbers endowed with certain arithmetical properties exist; or, more generally, the alleged necessary reliance on computational technologies in order to prove certain theorems. An example of the second kind is the dramatic increment in the efficacy of many of our cognitive performances thanks to computers, in spite of the fact that the latter have not (so far) become indispensable for the former. It is not easy to discern whether a given technology is constitutive or not. The endless debate about whether language is a necessary condition for thought, discussed in Section 3, illustrates such a difficulty. 2.5 ‘External’ and ‘internal’ cognitive technologies Cognitive technologies can be ‘external’ or ‘internal’. The former consist in physical devices or processes that are instrumental in achieving cognitive aims. The most ubiquitous example today of this kind of technology is the computer. But it is not the only one. 
Its predecessor, the abacus, as well as paper and pencil, graphs and diagrams, and even the book can be included in this category. Discussions of cognitive technologies usually focus on such ‘external’ or, as they are often called, ‘prosthetic’ technologies. The significance of ‘internal’ technologies should not be overlooked, however. ‘Internal’ cognitive technologies are mental procedures thanks to which we can improve our cognitive activity. In this category one should include, for instance, mnemonic techniques that improve our capacity to store and access information, formal methods of reasoning that permit us to draw correct
conclusions from given premises, definitions that clarify and fix the meaning of concepts, and so on. Underlying these mental technologies there are, according to current belief, cerebral physical processes. But the mental has not yet been successfully reduced to its underlying neural layer. What characterizes the ‘internal’ technologies, even in cases where they employ external devices, is the fact that they are part and parcel of the cognitive processes themselves at the mental level, rather than attempts to reduce such processes to, or replace them by, devices or processes operating at another level.
3. Language and thought: Re-framing the classical debate

Language has been, and still is, conceived as having communication as its primary function. In this respect, it serves to convey thought or other forms of cognitive content, but need not play any role in the formation of the thoughts it conveys. Descartes, who considered the ability to use language appropriately in any context to be a distinctive trait of humans (as opposed to animals and machines) and insisted that this ability shows that humans have minds, categorically ruled out the possibility that language may be constitutive of thought processes such as reasoning, as suggested by Hobbes. In the same vein, Turing (1950) considered that success in linguistic communication is a test for determining the presence of intelligence in a machine, but did not claim that this would also show that intelligence consists in the ability to manipulate linguistic symbols.9 Both Descartes and Turing assumed that the capacity to use language appropriately for communication requires high cognitive abilities, and therefore can testify to the existence of “mind” or “intellect”. To argue that language itself has a crucial function in cognition would not only violate Descartes’s mind-body dualism (which perhaps wouldn’t bother Turing), but would also seem to involve a chicken-and-egg circularity that would certainly bother Turing, as it bothered many others earlier.10 As opposed to such a view of the relationship between language and mind as purely ‘external’, the former having only an indicative role vis-à-vis the latter, other thinkers have argued that language is much more intimately connected with mental life. For such thinkers, language plays an essential role in cognition.
They argue that it is constitutive of fundamental cognitive processes (Hobbes), necessary for their performance (Leibniz), responsible for their historical emergence (Condillac), determinant of their structure and content (Whorf), required for their explanation (Sellars), the behavioral substrate of thinking and
other mental processes (Watson), an essential component of the social, cultural or ontological context where thought and other aspects of mental life take place (Vygotsky, Mead, Geertz, Heidegger), and so on. The centuries-old debate on the nature of the relationship between language and thought was mesmerized by these polar positions regarding which one of them is, in some sense, “dependent” upon the other.11 Under close scrutiny, however, both sides in the debate acknowledge the existence of language-thought interactions that do not fit the sweeping versions of their claims. For example, avowed “externalists” like Bacon and Locke, undertake to criticize language as a dangerous source of cognitive mistakes and suggest methods (which gave rise to the attempt to elaborate “scientific” languages) to avoid such a danger. Yet, in so doing, they in fact admit that thought is not impervious to the influence of language. On the other side of the fence, Leibniz, who argued forcefully for the view that language and other semiotic systems are indispensable for even moderately complex forms of cognition, acknowledged the non-linguistic character of certain kinds of knowledge, such as “intuitive” and “clear but not distinct” knowledge. As in many other debates in the history of ideas, the tendency to focus on mutually exclusive dichotomous positions renders them insoluble and to some extent sterile. I have suggested elsewhere that, instead of focusing exclusively on the “primacy” or “dependency” issue when discussing the relationship between language and thought, it might be more useful to envisage the details of how language is actually used in mental processes.12 The application to language of the notion of cognitive technology as defined above provides, I submit, a fruitful way of further exploring this suggestion.
4. Language as environment, resource and tool of cognition

Language’s presence in human life is overwhelming. Poets have excelled in evoking the subtle ways in which words penetrate every corner of our mind,13 and, as we have seen in the preceding section, some thinkers have seen in language an essential and inevitable component of mental processes. This fact is not necessarily “good” or “useful”, if evaluated from the point of view of specific cognitive and practical aims. Hence the recurrent attempts to identify those aspects of language that are deemed to be “pernicious” and to propose a variety of “therapies” to filter them out. However justified it may be, such a critique testifies to the importance of language’s influence on cognition.
Without going as far as Heidegger, who claimed that language is “the house of being”, I would say it is certainly a major component of the context of thinking.14 Without going as far as Geertz (1973, p. 83), who claimed that language, being one of our key “cultural resources”, is “ingredient, not accessory, to human thought”, I would rather emphasize that it is a ready-at-hand resource that thinking can easily make use of. Without suggesting, as does Watson, that thinking is nothing but sub-vocal speech,15 I would claim that certain linguistic resources do become sharp cognitive tools that afford the emergence and performance of certain types of cognition. The label “cognitive technology” is, of course, more straightforwardly applicable to those aspects of language that were shaped into cognitive tools, both because of their specific cognitive function and because they comport an element of “design”. But one should not overlook the fact that such tools emerge from a background where language’s potential and actual role as a cognitive environment and resource is unquestionable. In fact, the relationship between these three levels is dynamic and multi-directional. Just as “environmental” properties of language (e.g., sequential ordering) can give rise to resources (e.g., narrative structure) and thence to tools (e.g., explanatory strategies), so too a tool (e.g., a successful metaphor created in order to understand a new concept) can become a resource (a frozen metaphor) and then recede into the “environmental” background (e.g., by becoming incorporated into the semantic system as a lexical polysemy). 4.1 Language as environment As an environment of thought, language, through its sheer overwhelming presence in the mind, influences cognition independently of our awareness or will. Perhaps the most important of the environmental contributions of language to cognition derives from its general structural properties. 
Languages are articulated systems; linguists describe them as consisting in a “double articulation” comprising two different sets of basic units and principles of organization: units of meaning (lexemes or morphemes) and, say, units of sound (phonemes).16 These units, in turn, can be combined and recursively recombined in rule-governed ways at different levels — a mechanism that accounts for language’s impressive “generative” potential. This elaborate analytic-combinatorial system provides a natural model for other cognitive activities where the segmentation of complex wholes into parts, the “abstraction” of recurrent units and patterns from their actual context of use, and their use in any number of
other contexts are performed. The application of such a model to cognitive needs other than strictly linguistic ones need not be deliberately undertaken, but the fact that we are familiar with it and master it perfectly in our daily language use certainly grants it a privileged position in our practice and conceptualization of how cognition in general does and should proceed. No wonder that Descartes, Leibniz, Locke and many other thinkers used this analytic-combinatorial model as the core of their epistemology, and that Condillac considered the availability of language as a sine qua non for moving from “sensation” to the higher level of cognitive ability he calls “analysis”, without which humans would not be able to generate distinguishable and recurrently usable “ideas”. Another fundamental influence of the linguistic environment on cognition derives from the fact that language is a rule-based system. The power of the notion of “rule” is apparent in the attraction it exerts on a child’s mind, as soon as the child gives up its “egotistic” privilege of creating its own communicative symbols and submits to the socially imposed linguistic rules. Not only does the child attempt to impose absolute exception-free regularity on the linguistic rules themselves through the well-attested phenomenon of “over-generalization” (e.g., by “regularizing” irregular verbs: “eated” instead of “ate”, “shaked” instead of “shook”). It also projects this strict rule model onto other activities such as games where no violations of the rules are tolerated. The strong appeal of the “computation model of the mind”, as well as of its earlier counterpart, the mind-machine analogy, may ultimately derive from our familiarity with the machine-like rules of grammar.17 The sequential organization of speech — another structural characteristic of language — imposes upon oral communication a linear and unidirectional pattern. 
This pattern is imitated in cognitive processes, even when they are not communication-oriented. As a result, an ante-post, step by step ordering of thoughts acquires a privileged canonical status, where what comes “first” is assumed to be, in some sense, cognitively “prior” to what comes “after”. Such a priority may be interpreted as logical, epistemological, causal, psychological, or chronological, but the pattern is the same, and tends to be viewed as an indication that a cognitive process that follows it is “rational”. Obviously, this pattern does not fit all cognitive processes, some of which (e.g., associative thought) display rather a net-like structure.18 Speech permits deviations from linear and unidirectional thematic order (e.g., digressions, flashbacks), and writing and electronic text-processing provide further means for so doing (e.g., footnotes, hypertext). But the fact that such deviations are perceived as “exceptions” to the
standard linear pattern implies that both linguistically and cognitively they must be sparingly used and their use needs to be especially justified. In this sense, the environmental influence of the linguistic sequential model obstructs, rather than helps, the performance of certain cognitive processes.19 The analytic-combinatorial, rule-based, and sequential models are not, however, the only ones the linguistic environment provides for cognitive imitation. Adam Smith observed that modern “analytic” languages stand to ancient “synthetic” languages, as far as their simplicity and systematicity is concerned, as early machines, which are “extremely complex in their principles”, stand to more advanced ones, which produce their effects “with fewer wheels and fewer principles of motion”. Similarly, “in language every case of every noun, and every tense of every verb, was originally expressed by a particular distinct word, which served for this purpose and for no other. But succeeding observations discovered, that one set of words was capable of supplying the place of all that infinite number, and that four or five prepositions, and half a dozen auxiliary verbs, were capable of answering the end of all the declensions, and of all the conjugations in the ancient languages” (Smith 1761, pp. 248–249). But he immediately made clear that the language-machine analogy breaks down as soon as one goes beyond grammar: “The simplification of machines renders them more and more perfect, but this simplification of the rudiments of languages renders them more and more imperfect, and less proper for many of the purposes of language” (ibid.). Smith had in mind the expressive needs that language must provide for, such as eloquence and beauty or, more generally, its ability to express not only the “thought but also the spirit and the mind of the author” (Smith 1983, p. 19). 
It is for such purposes that the simplified machinery of “analytic” languages is inadequate due to their inherent “prolixness, constraint, and monotony” (Smith 1761, p. 251). Since even a paradigmatic analytic language such as English obviously overcomes these inadequacies and provides for the expressive needs mentioned (didn’t Shakespeare write in English?), it must do so — if we follow Smith’s argument — by evolving some “non-mechanical” means that compensate for its “mechanical” limitations. Smith’s point can be generalized. First, the expressive needs for which more than the rules of grammar are needed comprise not only lofty literary-rhetorical ideals, but also down-to-earth everyday communicative needs. Second, there is no difference between analytic and synthetic languages in this respect; in fact, no known natural language can dispense with additional “wheels” and “principles of motion”, other than the syntactic and semantic ones, in order to fulfill its expressive and communicative duties. Such additions to the basic linguistic
system range from syntactic rules that make it possible to “transform” or adjust the output of the basic syntactic rules without substantial meaning change to devices that allow one to say one thing and mean another. The former can be compared to the addition of epicycles to the Ptolemaic system in order to cope with the observed phenomena, without modifying its methodological assumption about the kinds of “wheels” and “principles” that are supposed to account for the machine’s “competence”.20 The latter, studied mainly by pragmatics and rhetoric, are generally believed to obey different kinds of “rules”, of a heuristic rather than an algorithmic nature.21 A particularly significant feature of the pragma-rhetorical component of a linguistic system is that it sometimes achieves its aims by resorting to explicit violations of the system’s rules, be they the algorithmic ones (as in metaphor, puns, and nonsense poetry) or the heuristic ones (as in conversational “implicatures”). A rule-based system that employs different kinds of rules and does not rule out, but rather permits and even exploits the violation of its own rules, is extremely valuable from a cognitive point of view. For it provides a living and effective model for many important cognitive processes that are open-ended, flexible, creative, and yet not aleatory. It also shows that there is an alternative to treating rule violations as mistakes to be corrected (sometimes automatically, thereby irritating users), as virtually all NL interfaces and applications to date do. Apart from its generic influence, the linguistic environment can have quite specific effects upon cognition, which should not be overlooked. An interesting case is the presumed role of language in causing deviations from logically valid forms of reasoning. For example, there is evidence that the evaluation of invalid syllogisms as valid has to do with an “atmosphere” effect, produced by the particular linguistic formulation of the premises.
Thus, syllogisms whose premises were both affirmative and universal tended to be viewed as having also an affirmative and universal conclusion, irrespective of whether the disposition of the subject and predicate terms in the premises logically warranted such a conclusion (Woodworth and Sells, 1935; Evans, 1982, pp. 89–90). Similarly, a robust finding in studies of conditional reasoning using Wason’s well-known “selection task” is the linguistically-driven “matching bias”. The subjects in this task are given a conditional sentence referring to a set of four cards laid down on a table. Each card has a letter on one side and a number on the other. The subjects are asked to determine which cards they would have to turn in order to tell whether the sentence is true or false. The matching bias consists in the fact that subjects tend to pick up those cards that match those named in the
sentence, regardless of whether they verify or falsify it.22 Further, allegedly pernicious, specific examples of linguistic influence on cognition will be mentioned in the next section.

4.2 Language as resource

Under this rubric I include those aspects of language that are regularly and, for the most part, consciously put to use for cognitive purposes, with minimal elaboration. They deserve to be considered “technologies” insofar as the choice of a particular linguistic feature stands in a means-end relationship with the cognitive purpose in view. An example of a linguistic resource widely employed for an extremely important cognitive purpose is the use of words for gathering, organizing, storing, and retrieving information. This has been done for so long that it is taken for granted and we are unaware of its linguistic-cognitive underpinnings as well as of the fact that in its current uses, whether in printed indices or in computerized search engines, its potential is far from being fully exploited. For, whereas the value of words for tracing relevant information lies in their semantic content, most applications make use only of their graphic form in order to locate matching graphic strings that are assumed to lead to semantically relevant material. The cognitive burden of sorting out truly relevant information is for the most part left to the user. Few systems make use of the truly relevant linguistic resource for information storage and retrieval, the resource humans naturally and effortlessly use, namely the rich semantic structure of natural languages.23 The semantic network of language is based on a set of semantic relations that connect expressions in a variety of ways: as synonyms, near-synonyms, paraphrases, analytic, super-ordinate, subordinate, belonging or not to a semantic field, antonyms, contrary, contradictory, etc.
By structuring the “mental lexicon”, such a network is an inescapable resource the mind constantly resorts to in most of its cognitive operations, which rely upon conceptual similarities and differences. The semantic network also comprises information — such as connotations, prototypes, and factual information — that belongs rather to the “mental encyclopedia” but, being standard, widely known and normally activated in the understanding of linguistic expressions, counts as “semantically relevant”.24 This extension makes the network an even more useful and constantly used cognitive resource, only minimally exploited to date by technologies that make use of thesauri.
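The contrast drawn above between matching graphic strings and exploiting semantic relations can be made concrete with a small sketch. The Python fragment below is purely illustrative: the four documents, the `RELATED` table and both function names are invented here and do not describe any existing retrieval system or thesaurus. It only shows that a literal search for a word finds the texts containing that very string, whereas a search that follows even one layer of semantic relations (here a synonym and a subordinate term) also retrieves texts a human reader would judge relevant.

```python
# Toy illustration only: the documents and the semantic relations below are
# invented for this sketch and do not come from any real thesaurus.

DOCUMENTS = {
    "d1": "the physician examined the patient",
    "d2": "a doctor wrote the prescription",
    "d3": "the surgeon operated at dawn",
    "d4": "the lawyer drafted a contract",
}

# Hypothetical fragment of a semantic network: each word maps to words
# related to it as synonyms or as subordinate terms.
RELATED = {
    "doctor": {"physician", "surgeon"},
}


def graphic_search(query, documents):
    """Return ids of documents that contain the query as a literal word."""
    return sorted(
        doc_id for doc_id, text in documents.items()
        if query in text.split()
    )


def semantic_search(query, documents, related):
    """Return ids of documents that contain the query or a related word."""
    terms = {query} | related.get(query, set())
    return sorted(
        doc_id for doc_id, text in documents.items()
        if terms & set(text.split())
    )


print(graphic_search("doctor", DOCUMENTS))
print(semantic_search("doctor", DOCUMENTS, RELATED))
```

In this toy setting the literal search leaves it to the user to think of “physician” and “surgeon” and to issue further queries, which is the cognitive burden mentioned above; the second function shifts a small part of that burden onto the stored semantic relations.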
The possibility of precision afforded by natural languages should not make us overlook the wide variety of syntactic, semantic and pragmatic means they have for expressing indeterminacy, an umbrella term used here to refer to phenomena such as indefiniteness, ambiguity, polysemy, unspecificity, imprecision and vagueness. Although considered a hindrance from the point of view of certain cognitive needs, such linguistic means are, from the point of view of other cognitive needs, an asset. For instance, they are an invaluable, perhaps indispensable, resource for the cognitive processes that begin with a foggy initial intuition which they undertake to clarify in a stepwise way, or vice-versa, for those processes that seek to sum up the gist of a theory, an argument, or a story. They are also essential for conceptualizing those situations in which the mind hesitates between alternatives, none of which seem to fall clearly into well-defined categories. While we often wish everything could be clearly classified as either black or white, good or bad, true or false, we often stumble upon borderline cases, which force our mind to abandon dichotomous thought and rather think in terms of gradual, continuous, and vague concepts (Gullvåg and Naess, 1995). Language also provides its users with a repertoire of ready-at-hand, more or less conventionalized patterns that can be put to use not only communicatively but also cognitively. This repertoire ranges from phrases and sentences to full-fledged discursive structures. It includes, among other things, formulaic expressions, conventional metaphors, proverbs, topoi,25 argumentative formulae, dialogue patterns, and literary forms.26 These resources are ready-at-hand for organizing thought.
The existence of argumentative canonical formulae structured by prepositions and adverbs such as if … then, but, either…or, therefore provides “directives” to the reasoner, which allow her to complete what is missing, to determine if something in her reasoning is irrelevant or redundant, etc. So too the current canonical form of the “scientific article”, say, in psychology, provides guidelines not only for the presentation of the author’s results, but also for the way in which his mental and practical steps leading to such results should be executed.27 Before concluding this sample of language-based cognitive resources, I want to mention a number of related linguistic devices that, I believe, are extremely important for cognition. Consider the parenthetical ‘I believe’ I have employed in the preceding sentence. Its position could have been occupied, instead, by ‘I know’, ‘I am sure’, ‘I have no doubt’, ‘I hypothesize’, ‘I submit’, ‘I argue’, ‘I contend’ or, with slight syntactical modifications, by ‘I wonder’, ‘I doubt’, ‘allegedly’, etc. Some of these expressions express propositional attitudes; others, the illocutionary force of speech acts. Both act as operators on propositional
contents, which reflect the variety of different degrees of commitment, epistemic status, intentions, etc. with which the mind may relate to such contents. They thus belong to a family of expressions which perform a distinction between two layers of “content”, the one referring to or modulating the other. The most familiar linguistic devices of this kind are metalinguistic operators such as quotation, thanks to which natural languages can act as their own metalanguage. As a whole, these linguistic resources correspond to and reveal the inherent reflexivity of human mental processes, i.e., the fact that cognition is conscious of itself, and therefore involves “metacognition”.28 It seems to me that this is not a one-way road, leading from metacognition to its linguistic expression, but at least a two-way road, in which the existence of metalinguistic resources should also be credited with the enhancement of metacognitive awareness and its development. The mechanism of joint attention (towards perceptual objects, towards each other in an interaction), for example, which is a necessary ingredient of intentional communication (Brinck, 2001), involves the recognition of the other’s attentional state, as well as awareness of one’s own. Similarly, the mother’s attribution of intentions to the infant has been suggested to play a decisive role in the infant’s development of her self-perception as an intentional agent (De Gelder, 1981; discussed in Dascal, 1983, pp. 99ff.).
Let us now turn to the alleged negative effects of language as a resource. The careless cognitive use of linguistic resources has been blamed, throughout the centuries, for inducing all sorts of cognitive mistakes. The indiscriminate use of linguistically productive (and legitimate) patterns of word generation (e.g., white → whiteness) has been held responsible for yielding in fact vacuous terms (e.g., nothingness) which are the source of conceptual confusion and pointless dispute (Locke).
The existence in language of general terms was blamed for inducing the false belief that there are general ideas and general objects (Berkeley). Natural language categorization, based on “vulgar” knowledge, was considered to be the most dangerous of the “idols” that threaten scientific thinking (Bacon). Vagueness was considered incompatible with logic and therefore utterly unreliable (Russell). Reliance on grammatical analogy was blamed for causing “category mistakes” and “systematically misleading” the understanding (Ryle). A list of “pseudo-problems” in which generations of metaphysicians were entangled was added to language’s long list of cognitive deficits (Carnap). Uncritical linguistic practice was singled out as the most dangerous cause — whether deliberate or not — of cognitive distortion, manipulation, and ultimately “un-sanity” (Count Korzybski and the General
Semantics movement).29 And so on. This small sample of criticism certainly shows that language’s influence on cognition can be indeed pernicious. But it also highlights the extent and variety of this influence. The lesson to be drawn, as in many other cases, is simply that we must be aware of this variety, extent, and sometimes insidious nature, so as to be able to rely upon the linguistic environment of thought only judiciously.

4.3 Language as tool

A language-based cognitive technology can be viewed as a tool when it is the result of the engineering of linguistic resources for a specific cognitive task. Let us consider some examples. The linguistic resource of explaining the meaning of one term by correlating it with a string of other terms that “define” the former has been sharpened into the powerful cognitive tool of formal definition. This tool permits the creation of special terminology (new terms for new concepts, or redefinition of existing terms) or of new notational systems.30 Usually the model of definition adopted in these cases is the “classical” one, i.e., the specification of necessary and sufficient conditions. But natural language semantics also provides other models of capturing concepts, e.g., in terms of similarity to a prototypical member of the denoted class or in terms of clusters of properties which are hierarchically organized in terms of their centrality or weight, although none of them is per se necessary or sufficient. Such “non-classical” models are characteristic of so-called “natural kind” terms (Achinstein, 1968). The elaboration of each of these kinds of “definition” yields different types of linguistic tools or technologies that are fit for different cognitive purposes. The various forms of indeterminacy available in natural languages can be shaped into cognitive tools.
For example, the linguistic possibility of generating scales of quantifiers, making them as subtle as desired (e.g., everyone, virtually everyone, almost everyone, most of the people, the majority of the people, some people, nearly nobody, virtually nobody, nobody, etc.), can give rise to rigorous systems of quantification other than the standard one. The same is true of linguistic tense systems that can be elaborated into a variety of temporal logics. And vagueness has been elaborated semantically into “fuzzy logic”, which permits rigorous reasoning with vague concepts (Zadeh, 1975; see also Black, 1963), as well as pragmatically, into a dynamic interpretive tool for gradually increasing precision until what appears to be an agreement or disagreement is shown to be in fact a pseudo-agreement or a pseudo-disagreement (Naess, 1966).
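To make the idea of reasoning rigorously with a vague concept more concrete, the following is a minimal sketch, not drawn from the chapter itself. It assumes a hypothetical vague predicate “tall”, with purely illustrative thresholds of 160 cm and 190 cm, and combines degrees of membership with the min, max and complement operators commonly used in fuzzy logic. All function names are assumptions introduced for illustration.

```python
def tall(height_cm: float) -> float:
    """Degree (0.0 to 1.0) to which a height counts as 'tall'.

    The thresholds 160 and 190 are illustrative assumptions, not
    values taken from the text.
    """
    if height_cm <= 160.0:
        return 0.0
    if height_cm >= 190.0:
        return 1.0
    # Linear ramp between the two thresholds: the borderline zone.
    return (height_cm - 160.0) / 30.0


def fuzzy_and(a: float, b: float) -> float:
    """Conjunction of two membership degrees (minimum operator)."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Disjunction of two membership degrees (maximum operator)."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Negation of a membership degree (complement)."""
    return 1.0 - a


# A borderline case: someone 175 cm tall is 'tall' to degree 0.5,
# and 'tall and not tall' is not simply false, as it would be in
# two-valued logic, but holds to degree 0.5.
borderline = tall(175.0)
both = fuzzy_and(borderline, fuzzy_not(borderline))
```

The point of the sketch is only that the borderline case receives a definite, calculable treatment instead of being forced into a true/false dichotomy; other membership functions and operators are equally possible.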
Formulaic expressions can become powerful cognitive tools. A remarkable example, analyzed by Reviel Netz (1999), is the role of linguistic formulae in ancient Greek mathematics. Netz shows that, in contrast to deduction in modern mathematics, where one resorts to typographic symbols, thus opting for exploiting a visual resource, Greek mathematics made use of formulaic expressions, a linguistic resource presumably of oral origin. He analyzes in detail Book II of Euclid’s Elements, identifying and sorting out the 71 such formulaic expressions, i.e., highly repetitive and standardized phrases and sentences, which make up most of the text. He argues that deduction as a cognitive tool may have been made possible due to the systematic use of such formulae: “The constant re-shuffling of objects in substitutions may be securely followed, since it is no more than the re-fitting of well-known verbal elements into well-known verbal structures. It is a game of decomposition and recomposition of phrases, indeed similar to the jigsaw puzzle of putting one heroic epithet after another, which is to a certain extent what epic [Homeric] singers did” (Netz 1999, p. 161).
If we jump from mathematics to religion, we may find in the Hindu mantra a similar phenomenon. The nature and role of mantras is quite controversial, as is patent from the papers in Alper’s (Ed., 1989) collection. Some scholars even doubt their linguistic nature, and most view them as belonging to the religious ritual, where, according to some, they are akin to prayer. Nevertheless, there is no doubt that, at least in some of their variants, they are self-addressed linguistic or quasi-linguistic tools whose main purpose is to play a definite role in their user’s mental life.
This is the case, for instance, in Yogi meditation; and a classical text presumably of the 3rd or 4th century, the Arthaśāstra, goes as far as attributing to it impressive intellectual effects: “a mantra accomplishes the apprehension of what is not or cannot be seen, imparts the strength of a definite conclusion to what is apprehended, removes doubt when two courses are possible [and] leads to inference of an entire matter when only a part is seen” (quoted by Alper 1989, p. 2).
Literary resources can also develop into cognitive tools par excellence. Tsur and Benari (2001) have shown how a specific poetic device, ‘composition of place’, employed in meditative poetry, is designed to overcome the linear and conceptual character of language so as to convey “such non-conceptual experiences as meditation, ecstasy or mystic insights” and thus to “express the ‘ineffable’” (Tsur and Benari, 2001, p. 231). In the same vein, Zamir (2002b) shows, through a close reading of Hamlet, how literature is able to create awareness of “ineffable” content.31 In both cases, I would suggest, literary tools not only express or induce certain mental states, but also in a sense create the
conditions for the very existence of these states in the first place. As a last example of a linguistic resource that gives rise to a cognitive tool, I would like to mention the dialectical use of dialogue structures. Ever since Plato, philosophers developed what can be seen as a genre — the “philosophical dialogue” — in order to expound their ideas. Its formal structure, however, carries with it cognitive requirements quite different from other forms of exposition. To expound one’s ideas for a specific interlocutor and to defend them against her specific objections — even if both the character and objections are fictional creations of the writer — requires techniques of persuasion, argumentation and justification other than those used in a linear text that addresses a generalized, non-present, and unknown reader. Other dialectical techniques developed independently, in oral rather than in written form. In the Middle Ages, codified forms of debate such as the disputatio and the obligatio evolved and success in them became part of the academic requirements to obtain a university degree. But the cognitive implications of these practices transcended both pedagogical needs and the Middle Ages. For the basic idea that a rational debate should obey a set of principles that define the duties of the “defendant” and the “opponent”, the types of moves they are allowed to perform, and what will count as a “victory”, remains in force in fact up to this day — even though the contents of such principles have changed. There is no space here to trace the development of dialectical techniques, which involves an interesting interplay between logic and rhetoric, culminating with “dialogical logic” on the one hand and “the new rhetoric” on the other. 
What is important to realize, for the purposes of this paper, is that the ensemble of techniques thus developed transcends pedagogical, expository, or communicative ends, for it becomes a powerful tool for actually implementing the idea that at the core of rationality lies the practice of critical thought.32 In this sense, a system of “electronic argumentation” should be designed not only to improve one’s ability to express oneself (Sillince, 1996), but also as a tool to improve one’s ability to think rationally.
5. Concluding remarks

In this chapter I have proposed to look at language not only as a communicative interface between cognitive agents, but as a technology involved in cognition itself. I surveyed instances of how language functions as an environment, a resource, and a tool of cognition. Some of these examples are more easily
acknowledged as “cognitive technologies” than others, but all of them share the main characteristics I have attributed to this notion. They contribute systematically and directly to cognitive processes and to the achievement of cognitive aims. And all of them are clearly language-based. In terms of the parameters presented in Section 2, most of the examples of language-based cognitive technologies discussed are “internal”, and await the eventual development of “external” counterparts; in spite of optimistically exaggerated claims of some designers, virtually all of the extant such developments are “partial” rather than “integral”; some of the language-based cognitive technologies are useful for “strong” cognition, others for “weak” cognition, and still others for both; very few purport to be “complete”; and only a few of them have been suggested to be “constitutive”. By emphasizing the direct contribution of language-based technologies to cognition, I want to stress that they are not mediated by the communicative use of language — the kind of use that monopolizes the attention of designers of human-computer interfaces. I obviously do not deny the importance of the latter, but I think the justified desire to develop humane interfaces and, in general, humane technologies, requires a better understanding of how the human mind makes use of and is affected by naturally evolved or designed technologies. In this respect, this paper should be seen as a contribution to the incipient field of an “epistemology of cognitive technology” (Gorayska and Marsh, 1996). By focusing on language, it connects this field with one of the main philosophical achievements of 20th century thought, the “linguistic turn”, which transformed language into the fulcrum of research in philosophy, psychology, the social sciences, and the cognitive sciences. 
In his intriguing book Meaning in Technology, Arnold Pacey defends a worldview “in which human relationships and human purposes may have a closer connection with technological progress than sometimes seems possible” (Pacey, 2001, p. 38). He distinguishes between the prevalent detached approach to science and technology and a participatory approach, in which we “feel ourselves to be involved in the system on which we are working” (p. 12). According to him, it is the latter that endows technology with meaning. Pacey might have found support for his insights in the present paper. Not only because we have an intimate participatory relationship with language in general and language-based cognitive technologies in particular, but also because such technologies are, ultimately, the technologies of meaning par excellence.
Notes

* I have presented some of the ideas put together in this paper, in one way or another, in the following forums: “Dialogue Analysis 2000” (International Association for Dialogue Analysis, Bologna, June 2000); “Limited Cognitive Resources: Inference and Reference” (Center on Resource Adaptive Cognitive Processes, Saarbrücken, October 2000); “IV Encontro Brasileiro Internacional de Ciência Cognitiva” (Marília, Brazil, December 2000); and “Ciencia, Tecnología y Bien Común: La Actualidad de Leibniz” (Universidad Politécnica de Valencia, Spain, March 2001). I thank the organizers as well as the participants who enlightened me with their comments and criticism.
1. Notice that my definition is substantially narrower than those attributed to this term by other researchers (e.g., Dautenhahn 2000).
2. It should be noticed that some of the expressions in these two lists of illustrations (e.g., ‘demonstration’, ‘persuasion’, ‘decision’, etc.) display the well-known process/product ambiguity. This is why they can belong both to the list of states and to that of processes.
3. See Zue (1999).
4. On the three last items, see for example the papers presented in Proceedings (2000), as well as those collected in Cassell et al. (Eds., 2000) and Dautenhahn (Ed., 2000).
5. I have proposed a distinction between ‘demonstration’ and ‘argumentation’ as preferred moves in different types of polemics in Dascal (1998a).
6. For a critique of the initial projects of mechanical translation, which pointed out the insufficiency of linguistic theory to support them, see Bar-Hillel (1964, Chapters 10–14).
7. Usually, in the first case it is said that it is syntactically incomplete, while in the second it is said to be semantically incomplete. In both cases, however, semantics, in the broad sense of correspondence between a symbolic system and the properties it purports to represent, is involved. The formation rules in fact select a set of well-formed formulae or combinations of symbols according to some criterion of well-formedness that is supposed to correspond to some property (e.g., ‘grammaticality’ in a linguistic system or ‘propositionality’ in the propositional calculus), whereas the transformation rules select a set of derivation relations between formulae that is supposed to correspond to another property (e.g., ‘meaning invariance’ in the standard model of generative grammar or ‘validity’ in the propositional calculus).
8. See Anderson & Belnap (1975, pp. 403ff.).
9. Some passages in Turing’s paper may suggest that he took success in playing the “imitation game” (i.e., what I called the “test”) as an operational definition of intelligence, and thus, from the point of view of behaviorism, as equivalent to it, rather than a sign of it. See, for example, Block (1981) and Richardson (1982). Eli Dresner tried to persuade me that this is the case, but he concedes that “Turing definitely does not describe himself as a behaviorist” (personal communication).
10. Among them Rousseau and Adam Smith (cf. Dascal 1978 and Forthcoming).
11. For an analysis of this debate, see Dascal (1995), where several of the authors mentioned in this and the preceding paragraphs are discussed. Those interested particularly in Leibniz and Hobbes should consult Dascal (1998b and 1998c, respectively). On the implications of this debate for AI and current work in the philosophy of mind and of language, see Dascal (1992b, 1997a) and Dresner & Dascal (2001).
12. I coined the term ‘psychopragmatics’ for the branch of pragmatics that deals not with the social uses of language such as communication (a task reserved for ‘sociopragmatics’) but with the mental uses of language. See Dascal (1983) and references therein.
13. For example: “We thought a day and night of steady rain / was plenty, but it’s falling again, downright tireless … / …Much like words / But words don’t fall exactly; they hang in there / In the heaven of language, immune to gravity / If not to time, entering your mind / From no direction, travelling no distance at all, / And with rainy persistence tease from the spread earth / So many wonderful scents …” (Robert Mezey, “Words”; quoted in Aitchison, 1994, p. v). The images employed in this poem capture several of the “environmental” properties of language described in Section 4.1.
14. In this respect, I am much more moderate than Winograd & Flores, who interpret Heidegger’s dictum as claiming that “nothing exists except through language” (Winograd and Flores, 1986, p. 68).
15. Watson later rejected this reductionist claim (see Watson & McDougall, 1928).
16. In fact, linguistic articulation goes well beyond this, since one can identify sub-phonemic features out of which phonemes are formed, as well as supra-lexical meaningful compounds such as idioms, whose meaning cannot be accounted for in terms of lexical-syntactic composition.
17. “Grammar itself is a machine / Which, from innumerable sequences / selects the strings of words for intercourse …/ When the words have vanished, grammar’s left, / And it’s a machine / Meaning what? / A totally foreign language” (Lars Gustafsson, “The machines”, quoted by Haberland, 1996, p. 92).
18. The philosopher Gilles Deleuze, who describes this kind of structure using the botanical model of the rhizome, rather than the now popular neural net model, has highlighted its centrality for understanding the multi-layered complexity of human thought and its expression. See Deleuze & Guattari (1976, 1980).
19. A striking example of the sheer linguistic difficulty in overcoming this obstacle is provided by Alejo Carpentier’s story “Viaje a la semilla” (in Carpentier, 1979, pp. 63–94). The story moves backwards from a current event to the “seed” whence it derives. In spite of the author’s ingenious efforts, however, it becomes apparent that it is virtually impossible to neutralize the temporal order embedded in various levels of linguistic structure.
20. For example, Smith pointed out one of these expressive devices used to circumvent the basic syntactic order in English (subject-verb-object), namely the anteposition of “whatever is most interesting in the sentence” (Smith 1983, p. 18), which is accounted for in modern syntactic theory in terms of an “epicyclic” rule.
21. For example, the “maxims” that govern conversation according to Grice. For further exploration, application, and theoretical grounding of these and other pragmatic rules and principles, see Dascal (2003). For a critique of the view that, since conversation is not ruled by constitutive rules of a grammatical kind, it is not, properly speaking, a rule-governed phenomenon, see Dascal (1992a).
22. For discussion and interpretation of the “matching bias” phenomenon, see Evans (1982, pp. 140–144) and Dascal (1987).
23. Leibniz devoted much thought, in his projects for an encyclopedia and its role in the “art of discovery”, to the cognitive role of a variety of types of indexing. See Dascal (2002).
24. On the notion of “semantic relevance”, see Achinstein (1968). On the difficulty of establishing a clear distinction between “dictionary” and “encyclopedia”, see Peeters (2000) and Cabrera (2001).
25. Topoi, loci communes, or “commonplaces” occupied a central place in humanist education in the Renaissance and the early modern period. Dozens of “Commonplace-Books” were printed at the time, and students were required to write and use their own commonplace lists. Such a practice not only established shared forms of expression, but also shared conceptual tools, which thus constituted a background of “mental structures” guiding the thought and understanding of educated persons throughout Europe for at least two centuries. For a study of this linguistic-based cognitive resource, see Moss (1996).
26. Some of these resources have been put to use in computer applications. Chinese word-processors, taking advantage of the Chinese habit of systematically using proverbs (mainly four-character ones), “propose” to the writer possible proverbial continuations once the first two characters of the proverb are typed. Attempts to simulate and exploit the dialogical resources of natural language for human-computer interfaces are now proliferating. The pioneer classic ELIZA employed a number of phrasal structures routinely occurring in nondirective psychotherapy in order to create the impression of a real dialogue between therapist and patient (Weizenbaum, 1966). The MUD robot-agent JULIA, like ELIZA, employs lists of common queries and a matching procedure in order to generate natural-looking “conversation” with users (cf. Foner, 2000). More recent rule-based systems of dialogue and conversation (e.g., Kreutel and Matheson, 2000; Webb, 2000) are no doubt much more sophisticated and useful tools than ELIZA, but they still remain excessively subordinated, in my opinion, to the rule-following model.
27. For a rhetorical analysis of the scientific paper and its evolution from the 17th century onwards, see Gross (1990) and Gross et al. (2002).
28. For a sample of research on metacognition, see Metcalfe & Shimamura (1994). For the relationship between metacognition and consciousness, see Newton (1995), and for its relationship with conversation, see Hayashi (1999). For a critique of the exaggerated emphasis on metacognitive abilities in education, see Roth (2004).
29. A striking example of the use of language for alleged “scientific” purposes is Scientology. This religious movement, based on the “science” of “Dianetics”, claims to provide its followers with a “cognitive technology” that allows them to achieve the status of “Clears”, essentially through linguistic manipulation. For an analysis of this phenomenon, see Mishori & Dascal (2000).
30. Lavoisier, who was in this respect a follower of Condillac, viewed his new chemical nomenclature as having cognitive implications far beyond those of a mere terminological reform (cf. Bensaude-Vincent, 1993).
31. Zamir (2002a) also proposes an epistemological account of how literature can express and eventually generate cognitive content that the resources of philosophical discourse are unable to capture.
32. See Astroh (1995), Barth (1992), Dascal (1997b, 1998a, 2000) and references therein.
References Achinstein, P. (1968). Concepts of science: A philosophical analysis. Baltimore: The Johns Hopkins Press. Aitchison, J. (1994). Words in the mind (2nd ed.). Oxford: Blackwell. Alper, H. P. (1989). Introduction. In H. P. Alper (Ed.), pp. 1–14. Alper, H. P. (Ed.) (1989). Mantra. Albany: State University of New York Press. Anderson, A. R. & N. D. Belnap, Jr. (1975). Entailment: The logic of relevance and necessity, vol. 1. Princeton: Princeton University Press. Astroh, M. (1995). Sprachphilosophie und Rhetorik. In Dascal et al. (Eds.) (1992–5), pp. 1622–1643. Bar-Hillel, Y. (1964). Language and information. Reading, MA & Jerusalem: Addison-Wesley & Magnes Press. Barth, E. M. (1992). Dialogical approaches. In Dascal et al. (Eds.), pp. 663–676. Bensaude-Vincent, B. (1993). Lavoisier: Mémoires d’une révolution. Paris: Flammarion. Black, M. (1963). Reasoning with loose concepts. Dialogue, 2, 1–12. Block, N. (1981). Psychologism and behaviorism. The Philosophical Review, 80, 5–43. Brinck, I. (2001). Attention and evolution of intentional communication. Pragmatics & Cognition 9(2), 255–272. Cabrera, J. (2001). Words, worlds, words. Pragmatics & Cognition 9(2), 313–327. Carpentier, A. (1979). Cuentos completos. Barcelona: Brughera. Cassell, J., J. Sullivan, S. Prevost & E. Churchill (Eds.) (2000). Embodied conversational agents. Cambridge, MA: The MIT Press. Dascal, M. (1978). Aporia and theoria: Rousseau on language and thought. Revue Internationale de Philosophie 124/125, 214–237. Dascal, M. (1983). Pragmatics and the philosophy of mind, vol. 1: Thought in Language. Amsterdam: Benjamins. Dascal, M. (1987). Language and reasoning: Sorting out sociopragmatic and psychopragmatic factors. In B. W. Hamill, R. C. Jernigan & J. C. Bourdreaux (Eds.), The role of language in problem solving II, pp. 183–197. Amsterdam: North Holland. Dascal, M. (1992a). On the pragmatic structure of conversation. In H. Parret and J. Verschueren (Eds.), (On) Searle on Conversation, pp. 
35–56. Amsterdam: Benjamins. Dascal, M. (1992b). Why does language matter to artificial intelligence?. Minds and Machines 2, 145–174.
59
60
Marcelo Dascal
Dascal, M. (1995). The dispute on the primacy of thinking or speaking. In Dascal et al. (Eds.), pp. 1024–1041. Dascal, M. (1997a). The language of thought and the games of language. In M. Astroh, D. Gerhardus, and G. Heinzman (Eds.), Dialogisches Handeln: Ein Festschrift für Kuno Lorenz, pp. 183–191. Heidelberg: Spektrum Akademischer Verlag. Dascal, M. (1997b). Critique without critics? Science in Context 10(1), 39–62. Dascal, M. (1998a). Types of polemics and types of polemical moves. In S. Cˇmejrková, J. Hoffmannová, O. Müllerová & J. Sveˇtlá, Dialogue analysis VI, vol. 1, pp. 15–33. Tübingen: Max Niemeyer. Dascal, M. (1998b). Language in the mind’s house. Leibniz Society Review 8, 1–24. Dascal, M. (1998c). O Desafio de Hobbes. In L. Ribeiro dos Santos, P. M. S. Alves & A. Cardoso (Eds.), Descartes, Leibniz e a Modernidade, pp. 369–398. Lisboa: Colibri. Dascal, M. (2000). Controversies and epistemology. In Tian Yu Cao (Ed.), Philosophy of science (= Vol. 10 of Proceedings of the Twentieth World Congress of Philosophy), pp. 159–192. Philadelphia: Philosophers Index Inc. Dascal, M. (2002). Leibniz y las tecnologías cognitivas. In A. Andreu, J. Echeverría & C. Roldán (Eds.), Ciencia, tecnología y el bien común: La actualidad de Leibniz, pp. 159–188. Valencia: Universidad Politécnica de Valencia. Dascal, M. (2003). Interpretation and Understanding. Amsterdam: Benjamins. Dascal, M. (Forthcoming). Adams Smith’s theory of language. In K. Haakonssen (Ed.), The Cambridge Companion to Adam Smith. Cambridge: Cambridge University Press. Dascal, M., D. Gerhardus, K. Lorenz & G. Meggle (Eds.) (1992–5). Philosophy of Language — A handbook of contemporary research, vols. 1 & 2. Berlin & New York: Walter de Gruyter. Dautenhahn, K. (2000). Living with intelligent agents: A cognitive technology view. In K. Dautenhahn (Ed.), Human cognition and social agent technology, pp. 415–426. Amsterdam: Benjamins. De Gelder, B. (1981). 
Attributing mental states: A second look at mother-child interaction”. In H. Parret, M. Sbisà & J. Verschueren (Eds.), Possibilities and limitations of pragmatics, pp. 237–250. Amsterdam: Benjamins. Deleuze, G. & F. Guattari (1976). Rhizome. Paris: Minuit. Deleuze, G. & F. Guattari (1980). Mille plateaux. Paris: Minuit. Dertouzos, M. L. (1999). The future of computing. Scientific American 281(2), 36–39. Dresner, E. & M. Dascal (2001). Semantics, pragmatics, and the digital information age. Studies in Communication Sciences 1(2), 1–22. Dreyfus, H. (1971). What computers can’t do. New York: Harper & Row. Dreyfus, H. (1992). What computers still can’t do. Cambridge, MA: The MIT Press. Dror, I. E. & M. Dascal (1997). Can Wittgenstein help free the mind from rules? The philosophical foundations of connectionism. In D. M. Johnson & C. E. Erneling (Eds.), The future of the cognitive revolution, pp. 217–226. New York: Oxford University Press. Evans, J. T. St. B.. (1982). The psychology of deductive reasoning. London: Routledge & Kegan Paul. Foner, L. (2000). Are we having fun yet? Using social agents in social domains. In K. Dautenhahn (Ed.), Human cognition and social agent technology, pp. 323–348. Amsterdam: Benjamins. Geertz, C. (1973). The interpretation of cultures. New York: Basic Books.
Language as a cognitive technology
Gorayska, B. & J. Mey (Eds.) (1996). Cognitive technology: In search for a humane interface. Amsterdam: Elsevier. Gorayska, B. & J. Marsh (1996). Epistemic technology and relevance analysis: Rethinking cognitive technology. In Gorayska & Mey (Eds.), pp. 27–39. Gross, A. G. (1990). The rhetoric of science. Cambridge, MA: Harvard University Press. Gross, A. G., J. E. Harmon & M. Reidy (2002). Communicating science: The scientific article from the 17th century to the present. Oxford: Oxford University Press. Gullvåg, I. & A. Naess (1995). Vagueness and ambiguity. In Dascal et al. (Eds.), pp. 1407–1417. Haberland, H. (1996). “And we shall be as machines” — or should machines be as us? On the modeling of matter and mind. In Gorayska & Mey (Eds.), pp. 89–98. Hayashi, T. (1999). A metacognitive model of conversational planning. Pragmatics & Cognition 7(1), 93–145. Hutchins, E. (1999). Cognitive artifacts. In R. A. Wilson & F. C. Keil (Eds.), The MIT encyclopedia of the cognitive sciences, pp. 126–128. Cambridge, MA: The MIT Press. Kreutel, J. & C. Matheson (2000). Information states, obligations and intentional structure in dialogue modelling. In Proceedings, pp. 80–86. Metcalfe, J. & A. P. Shimamura (Eds.) (1994). Metacognition: Knowing about knowing. Cambridge, MA: The MIT Press. Mishori, D. & M. Dascal (2000). Language change as a rhetorical strategy. In Harish Narang (Ed.), Semiotics of language, literature and cinema, pp. 51–67. New Delhi: Books Plus. Moss, A. (1996). Printed commonplace-books and the structuring of renaissance thought. Oxford: Oxford University Press. Naes, A. (1966). Communication and argument: Elements of applied semantics. Oslo: Universitetsforlaget. Netz, R. (1999). Linguistic formulae as cognitive tools. Pragmatics & Cognition 7, 147–176. Newton, N. (1995). Metacognition and consciousness. Pragmatics & Cognition 3(2), 285–297. Peeters, B. (Ed.) (2000). The lexicon-encyclopedia interface. Amsterdam: Elsevier. Pacey, A. (2001). 
Meaning in technology. Cambridge, MA: The MIT Press. Proceedings (2000). Proceedings of the 3rd International Workshop on Human-Computer Conversation (Bellagio, July 2000). Richardson, R. (1982). Turing tests for intelligence: Ned Block’s defense of psychologism. Philosophical Studies 41, 421–426. Roth, M. (2004). Theory and praxis of metacognition. Pragmatics and Cognition 12(1), 153–168. Sillince, J. A. A. (1996). Would electronic argumentation improve your ability to express yourself?. In B. Gorayska & J. L. Mey (Eds.), pp. 375–387. Smith, A. (1761). Considerations concerning the first formation of languages and the different genius of original and compounded languages. In J. R. Lindgren (Ed.), The Early Works of Adam Smith, pp. 225–251. New York: Augustust M. Kelley Publisher, 1967. Smith, A. (1983). Lectures on rhetoric and belles lettres, ed. J. C. Bryce & A. S. Skinner. Oxford: Clarendon Press. Tsur, R. & M. Benari (2001). ‘Composition of place’, experiential set, and the meditative poem. Pragmatics & Cognition 9(2), 201–234.
Turing, A. M. (1950). Computing machinery and intelligence. Mind 59, 433–460. Watson, J. B. & W. McDougall (1928). Battle of behaviorism: An exposition and an exposure. London: K. Paul, Trench, Trubner & Co. Webb, N. (2000). Rule-based dialogue management systems. In Proceedings, pp. 164–169. Weizenbaum, J. (1966). ELIZA — A computer program for the study of natural language communication between man and machine. CACM 9, 36–45. Winograd, T. & F. Flores (1986). Understanding computers and cognition: A new foundation for design. Reading, MA: Addison-Wesley. Woodworth, R. S. & S. B. Sells (1935). An atmosphere effect in syllogistic reasoning. Journal of Experimental Psychology 18, 451–460. Zadeh, L. A. (1975). Fuzzy logic and approximate reasoning. Synthese 30, 407–428. Zamir, T. (2002a), An epistemological basis for linking philosophy and literature. Metaphilosophy, 33(3), 321–336. Zamir, T. (2002b). Doing nothing. Mosaic 35(3), 167–182. Zue, V. (1999). Talking to your computer. Scientific American 281(2), 40–41.
Relevance, goal management and cognitive technology*
Roger Lindsay and Barbara Gorayska
Psychology Department, Oxford Brookes University / SPS, University of Cambridge
1. Introduction
Understanding what is relevant is absolutely fundamental to every cognitive operation carried out by humans, from low-level feature recognition to high-level problem solving. In Artificial Intelligence (AI) work the importance of relevance is easily passed over. AI programs are written by human programmers in such a way as to ensure that all the cognitive resources needed by the program are available at exactly the time the program needs them. Indeed, most of the challenge involved in programming computers comes from having to anticipate and make available what is relevant at different stages of a processing cycle, and having to exclude information and operations that are irrelevant.
The central contention underlying this chapter can be expressed as a positive and a negative thesis. The negative thesis is that the central role of relevance in cognition passes largely unacknowledged in cognitive neuroscience, despite the fact that neuroscientists are forced to employ or grapple with the concept at every turn. The positive thesis is that by according to relevance the central role that it should properly have in explaining cognition, it is possible to clear up a considerable number of issues and problems that presently seem mysterious in connection with problem-solving, ethics, symbol-connection hybridism and the motivation-action nexus.
The first researchers with an interest in cognitive science to realise that relevance is important were Sperber and Wilson (1986/1995). Their updated Relevance Theory (Sperber and Wilson, 2004) is briefly summarized below (preserving, as far as possible, the authors’ own terminology and style of expression).
1.1 Sperber and Wilson’s Theory of Relevance
According to Sperber and Wilson (1986/1995), their Theory of Relevance (RT) is a cognitive psychological theory which aims to explain, in cognitively realistic terms, the mental processes people employ when they overtly communicate. It is set within, and further elaborates on, the Gricean framework of Inferential Pragmatics where intended communicative acts, which include, but are not limited to, natural language utterances, are comprehended in situational contexts rather than merely decoded from, or coded in, what is strictly being said or ostensively done. It attempts to provide an empirically plausible account of human cognition at the interface between intention, action, and the language-mediated, perceivable world. Any cognitive psychological theory relies on how the mind is understood at the time of its formulation. RT is no exception. When it was first developed (Sperber and Wilson, 1986), it adopted the mental architecture proposed by Fodor (1983), which postulated that the mind comprised a largely undifferentiated central thought processor (a reflective reasoning mechanism) and a set of peripheral input modules, or faculties, of which the language module was one. Comprehending verbal and non-verbal behaviour intended to communicate was thus inferential (employing intuitive and spontaneous inference) but dissociated from the cognitive processes that related mental states to other forms of action. This dissociation, albeit weakened, is still in place in the current version of RT, updated (Sperber and Wilson, 2004, summarized in this section) to better fit the modern, highly modular view of the mind in the cognitive sciences. What is now proposed is a dedicated inferential comprehension module which, according to the authors, is comparable to an Intention Detector, or an Eye Direction Detector (Leslie 1991; Premack & Premack, 1995; Baron-Cohen, 1995). 
It has its own proprietary concepts and mechanisms, which do not have to be learnt but come as a substantial innate endowment. This module, they say, is a part of a more general module for processing motivated action, but comprises special-purpose inferential comprehension procedures (or heuristics) attuned to, and taking advantage of, the regularities in the communicative domain. (Note that, in principle, a “substantial innate endowment” does not rule out procedures and heuristics that have to be learnt. If so, the proposed module is a prime candidate for inclusion into the category of natural technologies discussed in Meenan & Lindsay (2002), El Ashegh & Lindsay and Bowman et al. (both this volume). Further empirical research is necessary to validate this point.) Exploring this possibility, of a separate
specialized comprehension sub-module, is worthwhile, Sperber and Wilson argue, because of the disparate nature of the phenomena involved: Firstly, the range of actions an agent can possibly intend in situational contexts is limited while the range of meanings a speaker can intend in any situation is virtually unlimited. Secondly, a single level of metarepresentation is generally sufficient for attributing intentions to agents (regular mind-reading) while several layers of metarepresentations are typically necessary for inferential comprehension. It is therefore unclear, they say, how the communicator’s meaning (a communicative intention) could be inferred by the same standard procedures that attribute intention to actions or, by the same token, if this were so, how a child of two who failed on regular first-order false belief tasks could recognize and understand the multi-levelled representations involved in verbal comprehension.
The inferential comprehension module
RT postulates that the search for relevance is basic to human cognition and people exploit it when communicating. Modifications to world models happen within cognitive environments when assumptions, the basic building blocks of these models, “become manifest” (i.e., are available to conscious awareness). Overt communicative signals from speakers (or communicators) provide evidence of their intentions and create precise and predictable expectations of relevance sufficient to guide the recipients (at whom communication is directed) towards the speakers’ meaning. Note a departure from Grice (1961 and compendium: 1989) in abandoning the Cooperative Principle, the maxims of Quality, Quantity, and Manner and the role of maxim violations. For details see Sperber and Wilson (1986/1995 and 2004). Upon receiving the communicative signals (a sight, a sound, an utterance or a memory), recipients access from memory available assumptions (background information or contexts) that, when connected with the input signals (and not otherwise), yield conclusions (contextual implications) that make a worthwhile difference to their world models (by answering queries, improving what is known on a given topic, eliminating a doubt, confirming a suspicion, or correcting a mistake). To Sperber and Wilson, relevance of a communicative input is a matter of degree: it is a function of cognitive costs and rewards. The greater the positive cognitive effects and the lower the costs of mental processing expended to achieve them (mainly due to the relative salience of input stimuli), the more relevant the (preferred interpretation of the) input signal. For this reason, selected input stimuli are not just relevant but are more
relevant than any alternative available, and are, hence, the outcome of making the most efficient use of available resources. Note that the corollary of this view is that in RT the purpose of communicating is narrowed down to, and the relevance of what is said, seen or remembered is sought in association with, a mere desire to improve one’s understanding of the world, i.e., the world model, by either adding new assumptions to it or by strengthening or weakening the already entertained assumptions within it. (For further discussion, see Gorayska and Lindsay, 1993.) Framing the notion of degrees of relevance in comparative rather than quantitative terms, RT bypasses the problem of how the ratio of effect and effort is to be “measured” in real time in psychologically plausible ways. Note that computation itself is effort-expending and not all cognitive factors, e.g., levels of attention, are measurable. (For detailed criticisms, see, e.g., Sperber and Wilson, 1987.) Effort and effect are treated as non-representational dimensions of mental processes and comparative judgments of relevance are presumed intuitive rather than absolute, numerical ones. Consequently, the First, or Cognitive, Principle of Relevance (a regularity specific to the communicative domain) claims that humans automatically maximize relevance (because of the way their cognitive mechanisms have evolved due to constant pressure for increased efficiency). We automatically perceive relevant stimuli, activate in memory relevant assumptions, or spontaneously infer conclusions in the most productive way. Further, the degree of relevance that the audience settles for in comprehending communicative stimuli is optimal. In inferential communication ostensive stimuli are designed to attract attention. Communicators intend both to inform others of something and, at the same time, also to inform them of their desire to inform. 
Consequently, the Second, or Communicative, Principle of Relevance claims that all ostensive stimuli convey a presumption of their own optimal relevance. They are the most relevant that communicators can and want to produce. What this means to the audience in terms of judging efforts vis-à-vis effects is that the designed stimuli are at least relevant enough to be worth processing, and as easy as possible to understand. This leads straightforwardly to the Relevance-theoretic comprehension procedure (a “fast and frugal heuristic”) whereby the path of least effort is followed in automatic computing of cognitive effects. Linguistically-encoded word meanings provide clues to the communicator’s meaning. Inferential processes are employed both in deriving explicatures (completing decoded logical forms of utterances, i.e., conceptual representations of
what is said, that are fragmentary or incomplete due to linguistic indeterminacy: disambiguating, resolving reference, or lexical-pragmatic processes such as narrowing or loosening in figurative uses of language, etc.) as well as implicatures (conclusions drawn from the explicatures and the background information). Comprehension of what is being communicated is an on-line cognitive task of the recipient whose goal is to formulate and confirm plausible hypotheses about the communicator’s meaning. In a highly parallel manner three subtasks are executed: (1) constructing appropriate hypotheses about the explicit content of utterances (explicatures), (2) constructing appropriate hypotheses about the intended contextual assumptions (implicated premises), and (3) constructing appropriate hypotheses about the intended contextual implications (implicated conclusions). Upon the evidence provided, interpretive hypotheses about the communicator’s meaning are tested in order of accessibility via mutual adjustment of context, content and cognitive effects. The process terminates when the first plausible hypothesis is entertained, which is then considered most plausible and therefore most relevant in the context. Sperber and Wilson show that outcomes of the inference comprehension procedures (or heuristics), hence the operations of the mechanism in comprehending communication, are empirically testable: For example, predictable variations can be witnessed in deriving the speaker’s intended meaning or her deception due to degrees of sophistication in (meta)representation capacity of the interpreter (noticeable in child development). 
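The comprehension procedure described above (hypotheses tested in order of accessibility, with processing stopping at the first one that satisfies the expectation of relevance) can be illustrated with a minimal sketch. This is not Sperber and Wilson's own formalisation: RT explicitly treats effort and effect as comparative rather than numerical, so the numeric `effort` and `effects` fields, the `expected_relevance` threshold, and all names below are simplifying assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hypothesis:
    """A candidate interpretation (all field names are hypothetical)."""
    meaning: str
    effort: float   # assumed processing cost; lower means more accessible
    effects: float  # assumed positive cognitive effects if accepted


def comprehend(candidates: Iterable[Hypothesis],
               expected_relevance: float) -> Optional[Hypothesis]:
    """Follow the path of least effort: test hypotheses in order of
    accessibility and stop at the first one whose effects meet the
    hearer's expectation of relevance."""
    for hyp in sorted(candidates, key=lambda h: h.effort):
        if hyp.effects >= expected_relevance:
            return hyp  # costlier hypotheses are never examined
    return None  # no interpretation satisfied the expectation


# Invented example data for an ambiguous word.
candidates = [
    Hypothesis("bank = river bank", effort=3.0, effects=0.9),
    Hypothesis("bank = financial institution", effort=1.0, effects=0.6),
    Hypothesis("bank = blood bank", effort=5.0, effects=0.95),
]
chosen = comprehend(candidates, expected_relevance=0.5)
```

In this toy run the most accessible reading is accepted even though less accessible readings would yield larger effects, which is the "fast and frugal" character of the heuristic: the search stops as soon as the expectation is met.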
The operations of the mechanism can also serve to explain why, in selection tasks such as the Wason task, responses of the subjects can be seen as the output (of deriving optimal relevance) according to the linguistic evidence (clues) provided: people choose options that are supported by the different situational contexts (that manipulate the effort and effect factors) made explicitly available to them (Sperber, Cara and Girotto, 1995). (See also Evans (1982) and Dascal (1987) who show that people also choose options explicitly named in utterances.)
1.2 Limitations of Sperber and Wilson’s Theory of Relevance
Though Sperber and Wilson deserve a great deal of credit for their vision in coming to appreciate the importance of the relevance construct, their vision, at least as far as it has so far been realised in print, remains seriously limited. For example, according to Sperber and Wilson, relevance relationships only exist between propositions, and hence relevance is fundamentally a relationship between symbols or symbol strings. By contrast, we will argue below that relevance is the
key concept underlying all forms of cognitive processing, non-verbal as well as propositional, connectionist as well as symbolic, not just one of a number of important concepts in one or another sub-domain of cognition. Further, as indicated above, Sperber and Wilson (2004) suggest that the special-purpose inferential comprehension procedures (or heuristics) underlying the linguistic relevance relations with which they almost exclusively deal in their published work are a part of a more general module for processing motivated action. However, there is almost nothing in their work that offers insight into how actions are planned, or how motivation impinges upon this process. The theory is, even on a charitable reading, only loosely coupled to mechanisms such as Working Memory and the Episodic Buffer (Baddeley, 2001) or Norman and Shallice’s SAS (Norman and Shallice, 1986). Last but not least, Sperber and Wilson clearly intend their theory to be interpreted as a theory of cognitive processing that is in some sense supposed to be implemented in the brains of human agents. Again, however, there are few hints in their work that identify the actual neuropsychological mechanisms that are responsible for processing relevance information. Our intention in the sections below is to describe a theory of relevance that rectifies these omissions: our theory seeks to explain how relevance connects with motivation. Because relevance is a key variable in goal management, the theory tries to link relevance processing with the constructs of psychology by suggesting that relevance plays a fundamental part in problem-solving and action planning. Finally, the theory claims that relevance is neuropsychologically grounded because it is the mechanism by which associative processing in neural networks is converted into hypothesis testing in symbolically represented problem spaces. 
The first step in delineating our theory is to explain how the concept of relevance is intrinsically bound up with the process of goal management. This will be followed by a discussion of the cognitive function of ethics in managing goals. The paper will end with some consideration of how the proposed theory of relevance processing and goal management can be put to technological use.
2. The ontogenesis of relevance
The conceptual intimacy of the link between relevance and goal management derives from the fundamental fact that relevance is a “goal-dependent predicate”; that is to say, whether something can be accurately or meaningfully described as relevant depends upon the prior specification of a goal (Gorayska
and Lindsay, 1989a,b; Gorayska and Lindsay, 1993; Lindsay, 1996a). Lindsay, Gorayska, and Cox (1994) have reported evidence which suggests that subjects can reliably match plan elements to goals, and while they can readily formulate effective plans to achieve specified goals using relevant plan components generated by other subjects, they are quite unable to do so when the plan components provided for them are not relevant. Gorayska and Lindsay (1989a,b, 1993, 1995) and Lindsay and Gorayska (1995) have offered a formal definition of relevance that attempts to capture this goal-dependent character:
P is relevant to G iff G is a goal, and P is an essential element of some plan that is sufficient to achieve G.
Several computer-based problem-solving systems have been developed which employ relevance (defined as above) as a central theoretical construct (Gorayska et al., 1992; Gorayska and Tse, 1993; Tse, 1994; Ip, Gorayska and Lok, 1994; Gorayska, Tse and Kwok, 1997; Zhang, 1993; Zhang, Nealon, and Lindsay, 1993; Johnson, 1997). It would seem that the idea has some practical utility for supporting AI systems which have at least a limited capability for reasoning and dialogue. As important as the evidence that planning systems based upon the processing of relevance relations are sufficient to generate goal-oriented action plans, however, is the fact that a relevance-based theory can also supply a solution to a problem that at present fatally afflicts symbol-based AI models of reasoning and problem-solving. AI systems for planning and reasoning almost all operate within a set of assumptions developed by Newell and Simon (Newell and Simon, 1972; Newell, 1990; Vera and Simon, 1993). This framework is often called the Symbolic Search Space Paradigm (SSSP) approach (Partridge, 1991). According to SSSP, a problem consists of a set of givens (objects or events), a set of operations, and a set of goals. 
Application of every allowable operation in every possible order to the givens, generates the state space of a problem. Any sequence of allowable operations is a plan. Solution of a problem requires the identification of a sequence of operations that can be applied to the given state of a problem so as to transform it into the goal state. Problem-solving may be difficult because such solution paths are sparsely distributed within the state space, and because a solver has no direct access to the state space but must construct its own symbolic representation of it. A symbolic representation of the state space of a problem which is constructed with the aim of locating an effective plan for solving the problem is called a problem space. A problem space may differ from the state space of a problem by over- or under-inclusion of objects or operations.
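The SSSP vocabulary above (givens, operations, goal, plan) and the goal-dependent definition of relevance can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than any of the cited systems: the toy arithmetic problem, the operation names, the bound `max_len` on plan length, and in particular the reading of "essential element" as "the plan no longer achieves the goal once that element is deleted from it".

```python
from collections import deque
from itertools import product

# Hypothetical toy problem: the given is the number 1, the goal is 8.
# "noop" is deliberately over-included in the problem space.
OPERATIONS = {
    "double": lambda x: x * 2,
    "add3": lambda x: x + 3,
    "noop": lambda x: x,
}
GIVEN, GOAL = 1, 8


def run(plan, state=GIVEN):
    """Apply a sequence of operation names to the given state."""
    for name in plan:
        state = OPERATIONS[name](state)
    return state


def achieves(plan):
    """A plan is sufficient if it transforms the given into the goal."""
    return run(plan) == GOAL


def find_plan(max_len=4):
    """Breadth-first search of the state space for a shortest plan."""
    queue = deque([(GIVEN, ())])
    seen = {GIVEN}
    while queue:
        state, plan = queue.popleft()
        if state == GOAL:
            return plan
        if len(plan) == max_len:
            continue
        for name, op in OPERATIONS.items():
            nxt = op(state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + (name,)))
    return None


def is_relevant(element, max_len=4):
    """Assumed reading of the definition: P is relevant to G iff some
    sufficient plan contains P and fails once P is deleted from it."""
    for n in range(1, max_len + 1):
        for plan in product(OPERATIONS, repeat=n):
            if element in plan and achieves(plan):
                without = tuple(s for s in plan if s != element)
                if not achieves(without):
                    return True
    return False
```

Note that `is_relevant` can only be evaluated by enumerating plans, so in this sketch relevance is computed from plans rather than being available before planning begins. The sketch also shows an over-included operation (`noop`) that belongs to the problem space but is never essential to any sufficient plan.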
A central difficulty for AI research, and for theories of human problem-solving, is the question of how problem spaces are constructed. Once the givens and operations are known, generating an effective plan is more-or-less mechanical. For some relatively formal problems such as chess playing, the issue is trivial: the problem space must include a representation of the chessboard, the pieces, and the allowable moves. In most cases, however, construction of the problem space is by far the most challenging aspect of problem-solving in AI. In practice, this difficulty has been handled by “hand-crafting” the problem space, that is to say, by using human intuitions to decide what objects and operations are to be used. This tactic bootstraps over the problem, but leaves in its wake the worry that the models that employ it are little more than well-intentioned fakes, appearing successful only because they are being fed a preprocessed version of difficult problems that disguises their limitations. The SSSP framework is immensely powerful and has been successfully applied in AI models of perception, robotics, reasoning in formal and natural language and many types of learning. However, it is clear that for genuinely creative problem-solving to occur, particularly with problems which are incompletely or informally defined, it is essential that a better understanding is achieved of problem-space construction. If human beings can make reliable judgements concerning the relevance of plan elements to goals, it seems possible that problem spaces might be constructed on the basis of such judgements: first objects and operations relevant to some goal are retrieved, then “standard” problem-solving in the sense of trying to locate an effective plan to transform givens to goal may proceed. There is, however, a serious obstacle to this proposal: the analysis of relevance offered above defines the relevance of an element in terms of whether or not it is a component of an effective plan. 
This information cannot be available prior to the existence of the plan, so how can relevance information form part of the input to the planning process? The analysis of problem-solving and relevance processing offered so far is assumed to occur entirely within SSSP systems that manipulate symbolic representations. One possible way in which the circularity identified above may be broken, is for relevance information to originate outside the symbol processing system. How plausible is this suggestion in the case of human information processing? We wish to argue that both ontogenetically and psychogenetically, the case is a strong one. Though it is inappropriate to argue the case in detail here, it seems plausible that human infants are capable of learning, and have considerable problem-solving competence before they have symbolic planning capabilities
(Cohen and Strauss, 1979; Gottlieb and Krasnegor, 1985). Not only is there a strong a-priori case that such presymbolic learning is essential, as symbols must be interpreted before they can be used, and the process of symbol interpretation must require learning, but there is an accumulating body of evidence that the cognitive competences of human infants in early life can be more successfully explained by non-symbolic connectionist models than by models sharing SSSP assumptions (Elman, Bates, Johnson, Karmiloff-Smith, Parisi and Plunkett, 1996; Plunkett, McLeod and Rolls, 1998). Similarly, there is mounting evidence that adults can show evidence of learning through improved performance without being able to symbolically represent or articulate their knowledge in some domains (Mack and Rock, 1998) or prior to the development of explicit access to relevant knowledge in others (Reber, 1993). It is obvious that learning requires the acquisition of relevance information. If it is true that learning can precede symbolic planning, then it must be true that the acquisition of relevance information can precede symbolic planning. The claim that relevance information may be a precursor of symbol processing raises an immediate further question: what pre-symbolic mechanisms could make relevance information available? A plausible answer is that two distinct processing systems might exist: a higher-level system running on interpreted (or ‘grounded’, Harnad, 1990; but see also Kay, 2001) symbols, the other a subsymbolic system that does not itself use symbols, but that generates outputs capable of satisfying the requirements of its symbol-based partner. The view that human beings have access to two different processing systems has arisen in a variety of forms. Perhaps the best known is the “controlled” versus “automatic” processing distinction of Shiffrin and Schneider (Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977). 
This distinction is now treated as a core element in two-process theories of action control such as Norman and Shallice’s Supervisory Attentional System (Norman and Shallice, 1986). In visual perception it is now generally accepted that there are two quite separate cortical pathways involved in processing sensory signals: the ventral route, terminating in the temporal lobe, is believed to process the kind of ‘what’ information originating from the fovea of the eye and carries out detailed featural analysis of static objects. The dorsal route terminates in the parietal lobe and is thought to process ‘where’ information originating from peripheral vision, and concerned with controlling movement and directing eye movements (Mishkin, Ungerleider and Macko, 1983; Baizer, Ungerleider and Desimone, 1991; Boussaoud, di Pellegrino and Wise, 1996). In memory, it is now common wisdom that a conscious declarative memory system underlies performance in recognition and
recall whilst a quite separate non-declarative or ‘procedural’ system is responsible for implicit memory phenomena in priming and associative tasks (Squire, 1992). In learning, there is widespread acceptance that “distinct learning systems encode very different sorts of information, one system inducing rules… while a second system memorises instances” (Shanks and St. John, 1993). Shanks and St. John suggest that the system which memorises instances is based on connectionist principles, and the learning exhibited is responsible for the phenomenon known as “implicit learning” (Cleeremans, 1993), which is less likely to be available for verbal report. “In contrast [they claim] connectionist models do not fit well with our understanding of the explicit hypothesis testing also found in the grammar learning literature”. The separate operation of the two systems subserving language is perhaps most readily evident to human agents, who have full awareness of the propositional content and semantic aspects of speech but are entirely unconscious of grammatical operations and the processes underlying speech perception and production. Broca had begun to assemble evidence for the neurological independence of these two subsystems shortly after the middle of the nineteenth century (Broca, 1861). It is unlikely to be a coincidence that within each of the major sub-domains of cognition: action, visual perception, memory, learning and language, there seem to be two distinct systems operating in parallel. In each case, one system requires attention at input, content is explicit and directly available to conscious report processes, and real-time responses are generally fairly slow. The other system operates with unattended or incidentally processed material, is implicit, unavailable to consciousness and generally supports fast nonverbal responses. 
It does not seem to involve straying much beyond the available evidence to suggest that this duplex architecture is a general structural feature of human cognition, and that in every case the characteristics of the unconscious, non-symbolic member of the pair seem to resemble those associated with connectionist systems, whilst the conscious and explicit processes correspond more closely with what might be expected in systems conforming to the assumptions of the SSSP. Smolensky (1988), among others, has also proposed that human cognition is likely to be subserved by connectionist systems whose operations are integrated with symbol processing mechanisms. The evidence seems to strongly indicate that a type of cognitive processing exists that is non-symbolic, largely or entirely implicit, and possibly mediated by connectionist mechanisms. This sub-symbolic system seems to support cognitive operations in many tasks when they are carried out by very young children. In adults sub-symbolic processing seems to occur when attention is
not paid, or is not available (Mack and Rock, 1998). Could sub-symbolic processing of this type yield relevance information? There is every reason to believe that it could. A connectionist system seeks to maximise the probability of certain classes of feedback by varying the probability of outputs as a function of input. In effect, a connectionist system seeks to converge upon a set of input/output contingencies that do not require it to make any further adjustments to its internal parameters. Without knowing anything about the internal processes of such a device, and merely by treating it as a “black box”, a second system monitoring the behaviour of a connectionist learning system could infer relevance in the following way:
An input I is relevant to a goal G when I causes some variation in output which is associated with a change in the value of feedback.
Roughly, this principle says that if a second system is monitoring a connectionist system during learning, the monitoring system can infer that an input is relevant to the goal state sought by the connectionist system when the output of the latter changes as a result of receiving that input. The validity of the inference arises because the change in output must be an effect of feedback being used to modify internal parameters, and hence of progress towards the goal. Johnson (1997) has demonstrated by building a working AI model, that using this associative relevance principle, a higher-order symbolic planning system can successfully capture and use relevance information. It is important, and not without interest, to note that the “goal” of a connectionist system is usually not represented within the system itself, as it would have to be within an SSSP problem-solver. For example, consider a connectionist system designed to analyse satellite data on CO2 emissions from earth-surface industrial installations. 
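As a brief aside, the associative relevance principle stated above can be sketched in code. This is not Johnson's (1997) model, whose details are not given here. It is a minimal illustration under stated assumptions: the learner is a trivial delta-rule unit with one weight per input, the environment and its feedback rule are invented, and the monitor operationalises the principle as "across successive presentations of the same input, both the output and the feedback value changed". All class and function names are hypothetical.

```python
class DeltaRuleLearner:
    """Stand-in connectionist system: one weight per input line.
    The goal it converges on is not stored anywhere inside it."""

    def __init__(self, inputs, rate=0.5):
        self.weights = {name: 0.0 for name in inputs}
        self.rate = rate

    def respond(self, name):
        return self.weights[name]

    def adjust(self, name, feedback):
        # Feedback is used to modify internal parameters.
        self.weights[name] += self.rate * feedback


def environment_feedback(name, output):
    """Hypothetical environment: it rewards particular outputs for
    inputs A and B and is indifferent to what follows input C."""
    wanted = {"A": 1.0, "B": -1.0}
    return wanted[name] - output if name in wanted else 0.0


class RelevanceMonitor:
    """Second system: sees only (input, output, feedback) triples and
    treats the learner as a black box."""

    def __init__(self):
        self.last = {}
        self.relevant = set()

    def observe(self, name, output, feedback):
        if name in self.last:
            prev_out, prev_fb = self.last[name]
            # Output varied and the feedback value changed with it.
            if output != prev_out and feedback != prev_fb:
                self.relevant.add(name)
        self.last[name] = (output, feedback)


learner = DeltaRuleLearner(["A", "B", "C"])
monitor = RelevanceMonitor()
for _ in range(10):
    for name in ["A", "B", "C"]:
        out = learner.respond(name)
        fb = environment_feedback(name, out)
        monitor.observe(name, out, fb)
        learner.adjust(name, fb)
```

In this run the monitor flags inputs A and B as relevant without ever inspecting the learner's weights or the environment's target values, while input C, which never produces a feedback-driven change in output, is not flagged.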
The goal of the system might be to generate one output when emissions exceed an acceptability threshold, and another when they do not. The problem is complicated by the fact that atmospheric variation interacts with source variation, and the system only has access to data collected from beyond the atmosphere. In training the system, the standard against which output accuracy will be judged is independently measured emission levels at the earth’s surface. The design objective (one kind of goal) is to identify as unacceptable all and only those installations that exceed threshold values at ground level. This abstract specification of the system goal will be found nowhere within the system itself. Even the operationalised proxy for the designer goal used in training, a list of acceptable and unacceptable installations in terms of surface measurements, cannot be an essential part of the system, as the system is intended to correctly
classify installations for which the true value is unknown. For connectionist problem-solvers then, the system goal can, in general, only be inferred from the behaviour of the system. When such problem-solvers are artefacts designed by humans, the system developer can tell us what the design objective was, what the training criterion was and so forth, but even then the criterion might have been ill-chosen (and so fail to match the design objective), or the system may not have been trained on all contingencies falling under the criterion (so that there are behaviour/criterion mismatches in some situations). In either eventuality, the result will be that in some circumstances the observed goal (implicit in system behaviour) will differ from the intended goal (as described by the system originator). In the case of organic connectionist systems that have developed via evolution rather than an intentional design process, there are no design objectives to consult. Extraction of the system goal will require an analysis of those interactions between the structure and behaviour of the organism within which the system is embedded and the environmental constraints under which it behaves that affect the probability of survival and reproduction. Fortunately, the practical unavailability of system goals need not impede inferences about relevance. If, following a particular input-output pairing, a feedback-driven modification of system parameters occurs, then the input triggering the change must be relevant to the system goal, whatever that goal may be. This may seem a long way from the controlled versus automatic processing distinction which has now become a familiar landmark in the skills literature. The claim is that controlled processing is slow, serial, conscious, error-prone and characterises the early stages of skill acquisition. Automatic processing is fast, parallel, accurate, unconscious, and occurs when a skill is thoroughly mastered. 
However, if there are two systems, there seems no good reason why the parallel version should be inoperative during the early stages of learning, and if both systems operate throughout the course of learning, the high error rate during the early stages may as well result from poorer quality decisions by the parallel system as from the serial system. Let us propose an alternative scenario. Two learning systems operate in tandem throughout the course of learning. The symbolic planning system is serial, learns by hypothesis testing, which is usually all-or-none, executes relatively slowly, and depends upon relevance information from its connectionist fellow. The connectionist system is parallel, learns slowly and usually incrementally, but executes quickly. The time course of learning could be considerably compressed if relevance information from the connectionist system was used by the symbolic planning system to construct an effective plan,
which in turn was fed back to the connectionist system, allowing more rapid convergence on a set of network weights which approximated to optimum performance. This architecture may also seem more satisfactory from a functional perspective than a dual component system in which first one, and then the other component does nothing, and the relationship between the two is largely unknown. The proposed architecture has a further advantage: a puzzling feature of relevance is that it seems to have both a subjective and an objective characterisation; this can readily be accounted for by assuming that, while a connectionist learning system seeks to exploit objective relevance information signalled by changing relationships between input, output, and feedback, symbolic plans are always hypothetically related to the external world and are thus subjective in the sense that they may incorrectly represent relationships underlying the regularities they seek to capture. What we have attempted to establish to date is that it is possible that relevance information could become available to a symbolic planning system from a connectionist learning system which operates in parallel with it, and that this suggestion is compatible with views of human information processing which are widespread in the current literature. Our argument goes beyond this however: we want to suggest that relevance is an essential theoretical construct which underpins all symbolic planning processes. An analogy might be made with familiarity in the domain of memory. The case for the utility of familiarity as a construct depends in part upon the demonstration that it has explanatory value, but the case for arguing that familiarity is a real dimension of information processing in memory is bolstered by evidence that the familiarity assignment mechanism can malfunction, for example in cases of déjà vu, or in the moments immediately preceding an epileptic seizure (Penfield and Roberts, 1959). 
Is there any evidence that the relevance assignment process can function inappropriately in a similar way? Part of our answer is that inappropriate relevance assignment is a common human experience which is often responsible for problem-solving failure. (See further Lindsay, Gorayska and Cox, 1994.) A second part is that inappropriate relevance assignment is a common feature of cognitive disorder in conditions such as schizophrenia: for example Meehl (1962) has discussed cases of what he calls “cognitive slippage” which exhibit precisely the features to be expected as a result of dysfunctional assignment of relevance. Similarly Cutting (1985) cites a wealth of cases of delusional thinking such as the following: “A young man felt that the whole of London was uncertain and strange. He could only be sure of the date if he bought a newspaper that had been printed
outside London, and only sure of the year if he visited a well known beauty spot to the south of London where there was a fallen tree whose age he knew by the number of rings on the trunk.” (Cutting 1985, p. 319)
It seems hard to deny that cases such as this involve a failure to make appropriate relevance judgements. It is not that a pathology of relevance does not exist; rather, disorders of relevance processing have been attributed to other causes. Together with the demonstrations described elsewhere (Gorayska et al., 1992; Tse, 1994; Ip, Gorayska and Lok, 1994; Gorayska, Tse and Kwok, 1997; Zhang, 1993; Zhang et al., 1993; Johnson, 1997) that relevance can be used to support learning and problem-solving in AI systems, it would seem that the considerations reviewed here establish at least a prima facie case for regarding relevance as a neglected, but important, component of symbolic planning processes. The effect of irrelevant information on problem solving received considerable attention from researchers in the 1930s. Woodworth and Schlosberg (1954) conclude their classic text on experimental Psychology with a discussion of the “atmosphere effect” which includes a review of some of this research. It is reported that in studies of syllogistic reasoning Woodworth and Sells (1935) “hypothesised that the global impression or “atmosphere” of the premises was an important factor in erroneous reasoning” (Woodworth and Schlosberg 1954, p. 846). Woodworth and Schlosberg also note that “the atmosphere effect is not confined to syllogisms. In speaking or writing you are likely to make the verb agree with the single or plural atmosphere of the subject phrase instead of with the grammatical subject, as in the examples: “The laboratory equipment in these situations were in many instances essentially the same as those used before. Is trial and error blind or not?” (Woodworth and Schlosberg 1954, p. 847)
Whilst the phrase “atmosphere effect” is a useful and evocative label, it does nothing to explain the phenomena that Woodworth and Schlosberg discuss. An explanation may however lie in inappropriate relevance attribution which in turn induces subjects to set up problem spaces incorrectly. This phenomenon not only arises spontaneously in syllogistic reasoning and syntactic concordance, but is also sometimes engineered by conjurers and experts in legerdemain, as well as by school children in constructing riddles. A well-known example of the latter category is the riddle which proceeds by reminding a dupe that there is an English word meaning a humorous saying or story, which is spelt J-O-K-E and is pronounced joke, and that there is also an English word meaning of, or to do with, the common people, which is spelt F-O-L-K and pronounced folk. The
victim is then asked how to pronounce the word for the white of an egg. Few give the correct answer: A-L-B-U-M-E-N. The initial information concerning spelling is in fact irrelevant, but serves to induce the respondent to believe that the required answer rhymes with “joke” and “folk”. Victims of this riddle presumably establish a problem space which contains some plan such as: ‘find a word which refers to part of an egg and which rhymes with “joke”. Pronounce the word which is found’. For the riddle to work, respondents must also fail to fully process and check all of the information in the riddle question. This failure to exhaustively check information has also been reported in the case of “semantic illusion” sentences such as: ‘How many animals of each type did Moses take into the ark?’. Respondents commonly reply “two” to this question, even after repeating the question aloud, and when fully aware that it was Noah, not Moses, who was the protagonist in the biblical flood story (Erickson and Mattson, 1981; Reder and Kusbit, 1991).
The “egg-white” example uses redundant preliminary information to establish a fallacious presumption of relevance. There is another type of riddle sentence which, unlike the Moses illusion but like the egg-white trick, uses irrelevant supplementary information to induce a false answer to a perfectly straightforward question. The “paradigm” form of the riddle sentence is illustrated by the following example: ‘I am going to name a colour and ask you a question. Answer the question as quickly as you can: White. What do cows drink?’ Most subjects who have not encountered the problem before erroneously respond “milk” to this query, even though they are fully aware that cows drink water. It is easy to demonstrate that the incorrect “milk” response can equally well be induced by presenting a sheet of white paper immediately before the query, instead of using the word “white”.
The “colour-query” problem provides a useful way of demonstrating the difficulty that people can have in correctly assigning relevance to materials associated with a problem. In the cases cited by Woodworth and Schlosberg (1954), it is easy to dismiss the atmosphere effect as a relatively trivial failure to process local syntactic cues concerned with number or negative/affirmative information. The fact that non-linguistic visual input can also misdirect the verbal question-answering process seems to support the much more general theoretical claim that the locus of the atmosphere effect is a modality neutral problem space in which symbolic planning
processes operate over elements rightly or wrongly classified as relevant to some goal. Deceptive colour-query sentences are not intrinsically deceptive, but when they are preceded by a colour cue, an incorrect answer to the query is primed by the cue and is in many cases articulated instead of the correct answer. We suggest that this occurs because the colour cue is wrongly identified as relevant, and, as a result, is included within the problem space. At least the colour query illusion demonstrates that errors in cognitive processing occur because of failures in relevance processing: if such errors can be triggered by playground riddles, it seems likely that the same phenomenon occurs in less contrived situations. The fact that cognitive processing is also affected when the query sentence is linguistic but the misleading cue is not, may be interpreted as at least weakly supporting the suggestion that the locus of the effect is a modality independent problem space established to support the symbolic planning of responses.
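The earlier claim that relevance can be inferred without knowing the system goal can be illustrated with a toy learner. What follows is only a sketch under stated assumptions: the perceptron-style update rule, the function name `train_and_log_relevance`, and the four training samples are hypothetical and are not drawn from any system discussed in this chapter. The sketch logs every input whose feedback forced a change of system parameters; on the argument given above, those inputs count as relevant to whatever goal the feedback encodes.

```python
def train_and_log_relevance(samples, epochs=5, lr=1.0):
    """Train a single threshold unit and log the inputs that forced a weight change.

    samples: list of (inputs, target) pairs with 0/1 targets.
    Returns (weights, bias, relevant_inputs).
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    relevant_inputs = []  # inputs that triggered a feedback-driven modification

    for _ in range(epochs):
        for inputs, target in samples:
            net = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if net > 0 else 0
            error = target - output  # the feedback signal
            if error != 0:
                # Parameters change, so this input is logged as relevant,
                # although the goal itself is never represented explicitly.
                relevant_inputs.append(tuple(inputs))
                bias += lr * error
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights, bias, relevant_inputs


# Hypothetical task (an assumption for illustration): the target copies the
# first input, and the second input is noise.
SAMPLES = [([1, 0], 1), ([0, 1], 0), ([1, 1], 1), ([0, 0], 0)]
weights, bias, relevant = train_and_log_relevance(SAMPLES)
```

In this small run each training input triggers a parameter change during the first pass and none does so afterwards, and the weight attached to the noise input ends at zero. The relevance log is produced as a by-product of learning, which is the sense in which a symbolic planner could consult it without access to the goal.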
3. The origin and function of goals
Goals are symbolic representations of states of the world, or of a planning system itself, which are the “target” of planning processes. Planning processes are attempts to sequence symbolic representations of actions and objects in a manner that allows a goal to be achieved. Planning processes are applied to models of the world and goals, to produce goal-plan pairings which are believed to be sufficient to shift the world from its current state to the target state when the plan is implemented. Goals are always and necessarily abstract and symbolic, though they usually stand for, or represent, states that are not. Goals arise from two different sources:
a. Cognitive Goals
Most goals are part of complex goal chains, and can perhaps more properly be classified as sub-goals. A goal is cognitive if its achievement contributes to the construction or execution of a higher order plan. Any fully-specified goal must be associated with goal satisfaction conditions (GSCs). GSCs are conditions which an agent believes that the world will satisfy when it is in the goal state. For example, in a chess game the goal is to win; the GSCs are the conditions for believing that “checkmate” has been achieved. The GSCs for a cognitive goal are derived from the requirements of the higher order plan to which its achievement would contribute. Its justification (the answer to a question such as: ‘Why are you doing x?’) is entirely in terms of the grounds for believing that the contribution it makes to a higher order plan is essential or efficient, and that the
higher order plan will be effective in achieving the higher order goal with which it is associated.
b. Terminal Goals
The top goal of a complex goal chain does not contribute to a higher order plan. It is therefore non-cognitive. We call a non-cognitive top-goal of this sort a terminal goal. The justification of a terminal goal is exclusively in terms of the desirability of the state brought into existence by the achievement of that goal. All cognitive goals derive their justification ultimately from the terminal goal at the head of the chain of which they form a part.
How are GSCs for terminal goals specified? The specification cannot derive from the requirements of a higher order plan, because no higher order plan exists. The question ‘where do terminal goals come from?’ is equivalent in the human case to the question: ‘how does cognition interface with motivation?’ It has been claimed already that goals are symbolic representations. The relationship between any symbolic expression and the world is hypothetical; that is to say, the symbolic expression is one of a set of possible models of the world. A terminal goal must therefore be the consequence of a hypothesis concerning the relation between some possible state of the world that does not currently exist and the motivational system of the cogniser. How could a system that generates such hypotheses develop, and what would its function be?
It seems likely that motivation can control behaviour in the absence of intermediary symbolic processes. Human neonates vary their behaviour as a function of motivational states, as do non-human organisms. In many cases this occurs when there is no evidence of a capacity to manipulate symbols. The lack of a capacity for symbolic representation is not however demonstrable, as it requires the proof of a negative existential proposition.
For present purposes, the absence of a developed capacity for symbolic representation in most nonhumans and in human newborns will be assumed on grounds of parsimony. Motivation can be conceptualised as the result of a set of subsystems which produce an output in proportion to an increase or a reduction in the value of some system variable, such as the concentration of substances in the bloodstream, or tension in a sphincter muscle. Let us assume that motivational changes of this kind are without effect until they reach some threshold value. A system which is controlled by motivation without cognitive mediation will behave in one of two ways: either the effect of the output of a motivational subsystem reaching threshold will be to nonspecifically increase the intensity and variability of behaviour, or when some “fixed action-pattern” (Tinbergen, 1953) is available, this will be executed as soon as the appropriate “releasers” (ibid.)
are detected. The problems and limitations associated with such systems are obvious and severe: survival depends upon evolution having anticipated all important needs of the organism and provided appropriate releasers when and where they are required; energy will be wasted on inappropriate behaviour in contexts where consummation is not possible; motivational effects will interfere with one another; prioritisation and deferred gratification are impossible. No doubt all manner of inhibitory links between motivational subsystems could be developed, but the effect of this will be to make increasingly complex bets about the ecological context of behaviour. For example, an organism which suppresses motivational impulses to mate in favour of vigorous action until food is ingested, might fail to eat and to reproduce in difficult times. In the absence of symbolic control, the somewhat bleak picture presented above as well describes a human being as any other organism. If human neonates lack the capacity for symbolic control of behaviour then the features described might be expected to characterise the behaviour of human neonates. Indeed it does seem to be true that some motivational subsystems in newborns operate to switch on fixed action patterns such as those involved in excretion, whereas others increase the variability and intensity of behaviour until a carer intervenes with a diagnosis of the motivational source of the problem, and the offer of whatever is necessary to alleviate it. The point of importance here is that the organism need not necessarily know (indeed in the absence of a symbolic capacity, cannot know, as knowledge is propositional, and therefore symbolic) either what is the motivational cause of its extreme behaviour, or what is required to prevent the cause from operating. 
The human carer for a newborn acts as an external cognitive support system, proposing hypotheses about underlying motivational states and testing them by changing the infant’s state in various ways. When an organism does not have a carer, it must simply allow itself to be carried along on the current of motivationally driven behaviour until the appropriate context for consummation is encountered. Evolution must ensure a close match between motivation, behaviour, and what exists in the environment to be encountered, if the organism is to survive. If an organism is capable of symbolic representation, then it can learn to internalise the operations carried out by a carer in the case of a human infant. An orphan creature surviving alone could similarly relate a symbolic representation of its motivational condition to a representation of the state of the world that has previously changed that condition in a desirable way. In both cases the same sequence of events must occur:
a. Symbolic representation of the motivational condition
b. Symbolic representation of a world state in which condition (a) does not exist
c. Derivation from (a) and (b) of a criterion by which the achievement of the appropriate world state can be identified
d. Symbolic representation of an action sequence which can cause the world state in (b) to exist.
The predisposing motivational context provides the activation conditions for the goal: some aspect of these activation conditions is taken as criterial for triggering the whole goal structure, just as some satisfaction condition is taken as criterial for achievement of the goal. The criterial activation condition might be a feeling, a set of sensations, or a type of motor activity. The symbolic representation of actions capable of causing the world to enter the state which changes the motivational condition is a plan, and (a)–(d) above, taken together, define a goal-plan pair.
In the AI and problem-solving literature, all of the objects, or “givens”, which are required to solve a planning problem are usually available. For instance, appropriate pieces and a chessboard are provided when a planner is required to solve a chess problem. In real-world planning a problem-solver does not usually have all required objects to hand. In these circumstances planning processes must specify what objects or material are required to achieve a particular goal, as well as the operations to be performed. It is undesirable for plan implementation to proceed until all of the objects which the plan requires are available. A plan will therefore be associated with implementation conditions; a plan will not normally be executed until these implementation conditions are satisfied.
Goals are the cognitive representations of self-diagnosed needs and wants, together with prescriptions for the states of the world which it is believed will satisfy them.
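The components of a goal-plan pair listed in (a)–(d), together with the activation and implementation conditions just described, can be gathered into a single record. The following is a minimal sketch, not a specification: the class name `GoalPlanPair`, its field names, and the thirst example are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

WorldState = Dict[str, bool]  # assumed toy representation of a world state


@dataclass
class GoalPlanPair:
    motivational_condition: str                 # (a) the represented motivational condition
    target_state: WorldState                    # (b) a world state in which (a) no longer holds
    is_satisfied: Callable[[WorldState], bool]  # (c) criterion for recognising that (b) is reached
    plan: List[str]                             # (d) action sequence intended to bring about (b)
    activation_cue: str                         # criterial activation condition
    required_objects: Set[str]                  # implementation conditions

    def is_activated(self, sensations: Set[str]) -> bool:
        """The goal structure is triggered when the criterial cue is present."""
        return self.activation_cue in sensations

    def can_implement(self, available: Set[str]) -> bool:
        """The plan should not run until every required object is to hand."""
        return self.required_objects <= available


# Hypothetical example: a goal-plan pair for thirst.
thirst = GoalPlanPair(
    motivational_condition="thirst",
    target_state={"hydrated": True},
    is_satisfied=lambda world: world.get("hydrated", False),
    plan=["fetch cup", "fill cup", "drink"],
    activation_cue="dry mouth",
    required_objects={"cup", "water"},
)
```

Keeping the activation cue and the implementation conditions apart from the plan itself reflects the distinction drawn above between when a goal becomes active and when its plan may safely be executed.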
The diagnosis of their own motivational state which individuals arrive at may well be wrong, and it is always possible that a third person who is more experienced or insightful, can provide a more accurate diagnosis. It seems likely that in the case of human beings, diagnosis of motivation is never perfect or complete, though no doubt it continues to improve throughout life. Failure to correctly detect and represent a source of motivation will result in behaviour which does not correspond to any cognitive goal, and which is not under cognitive control. Behaviour of this kind is often said to be the result of “unconscious motivation”. There is an important distinction to be made between positive and negative goals. The distinction is important because the two types of goal can have quite
different relationships to planning processes. It is hypothesised that wants and needs are unavailable to consciousness unless and until they are symbolically represented as goals. Planning processes are intrinsically and necessarily cognitive and symbolic: plans represent sequences of operations which have not yet occurred, and which, if implemented, will result in states of the world which do not yet exist. The point of symbolically representing needs, wants, threats, and aversions as goals is to enable planning processes to be brought to bear upon them. Positive goals represent states of the world that a plan is intended to achieve. Negative goals represent constraints on plans intended to achieve positive goals. The children’s game “Snakes and Ladders” provides a helpful analogy: planning seeks to achieve the positive goal ladders, while avoiding the negative goal snakes.
The constraints imposed upon planning processes by the necessity to avoid negative goals do not necessarily make planning processes more complex or difficult. One of several major problems for planning systems is the “combinatorial explosion problem”. This problem exists because the number of plans to be generated and evaluated rises exponentially with the number of plan elements. With only a small number of elements, the number of possible plans can easily exceed the processing capacity of any realistic system. Negative goals prune planning trees, that is, they reduce the number of plans which need to be considered. In this way, planning may actually be made easier by the existence of negative goals. It is unlikely that negative goals alone suffice to eliminate the combinatorial explosion problem. A similar device can, however, be used to make any planning problem tractable: this is the introduction of further quite arbitrary constraints on planning, which have no other function but to limit the class of possible plans.
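The pruning effect of negative goals can be shown on a very small planning problem. The sketch below rests on simplifying assumptions that are ours alone: a plan is taken to be an ordering of a fixed set of elements (so the number of plans grows factorially, which is at least exponential), and the function names, the four elements, and the negative goal `serves_uncooked` are hypothetical.

```python
def enumerate_plans(elements, violates=None):
    """Depth-first generation of complete orderings of the plan elements.

    violates: optional predicate on a partial plan; when it returns True the
    whole branch below that partial plan is abandoned (pruned).
    """
    plans = []

    def extend(prefix, remaining):
        if violates is not None and violates(prefix):
            return  # a negative goal has been hit: prune this branch
        if not remaining:
            plans.append(tuple(prefix))
            return
        for i, element in enumerate(remaining):
            extend(prefix + [element], remaining[:i] + remaining[i + 1:])

    extend([], list(elements))
    return plans


def serves_uncooked(prefix):
    """Hypothetical negative goal: never serve before cooking."""
    return "serve" in prefix and "cook" not in prefix


ELEMENTS = ["shop", "cook", "serve", "wash up"]
all_plans = enumerate_plans(ELEMENTS)
safe_plans = enumerate_plans(ELEMENTS, violates=serves_uncooked)
```

With four elements there are 24 unconstrained orderings, and the single negative goal leaves 12 of them to be considered. Each additional constraint, whether motivated or entirely arbitrary, shrinks the set of candidate plans in the same way.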
It seems at least possible that this need for arbitrary constraints is the functional explanation for the existence of subjective preferences in human beings. By definition, subjective preferences have no other functional justification, and it is hard to understand how a propensity for subjective preferences could evolve without some functional value. If some preferences need to exist to enable planning to occur, but their value can be assigned quite arbitrarily, then it becomes comprehensible that biology should dictate that people have preferences, but individuals should be free to decide what they are. If this suggestion is correct, it would seem to follow that the existence of subjective preferences in a species is an indication that the species is capable of symbolic planning. This analysis of terminal goals and their relation to motivation implies that there are two distinct, and possibly even competing, control systems for human behaviour. The initial control system (ICS) is non-symbolic, unconscious, and
inefficient in terms of minimising the energy expended to satisfy a given want or need. The goal management system (GMS) is symbolic, conscious, and capable of achieving high levels of efficiency. The problem for evolution is how to pass control from ICS to GMS when the symbolic representations required by GMS cannot be specified in advance of experience. We suggest that, in effect, a neonate is equipped with an on-board computer, which seeks to symbolically represent motivational conditions, states of the world that will modify them, and plans that will produce those states of the world. The symbolic planning system is a second-order system creating symbolic representations which capture regularities in the behaviour of the first-order system, which operates under the autonomous control of the ICS. As appropriate symbolic representations are established, so the occurrence of motivational activation conditions for goals already represented will result in control passing to the on-board computer, which can utilise symbolic information in memory to modify those motivational conditions more efficiently.
It is hypothesised that consciousness is intimately tied to planning processes, but that, once formulated, plans may be executed automatically: the activation conditions for a goal can initiate an action sequence directly without any need for further conscious planning processes to occur. There is a clear resemblance here to distinctions such as those between automatic and controlled processing (Shiffrin and Schneider, 1977) and supervised versus schema-controlled action (Norman and Shallice, 1986). Conscious symbolic planning thus has a “managerial” role in the regulation of behaviour: identifying problems, generating solutions, and delegating the implementation of effective plans to lower level agencies whenever possible.
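The transfer of control from ICS to GMS can be pictured as a simple dispatch rule. The sketch below is a deliberately crude illustration under our own assumptions: the class name `Agent`, the numeric threshold, and the string labels are hypothetical, and nothing here is meant as a model of the actual mechanism.

```python
class Agent:
    """Toy dispatcher between non-symbolic (ICS) and symbolic (GMS) control."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold  # assumed level below which motivation has no effect
        self.stored_plans = {}      # motivational condition -> stored plan (GMS memory)

    def represent_goal(self, motivation, plan):
        """Capture a motivational condition symbolically, with a plan for it."""
        self.stored_plans[motivation] = list(plan)

    def respond(self, motivation, level):
        if level < self.threshold:
            return ("none", [])
        if motivation in self.stored_plans:
            # Activation conditions match a represented goal: control passes to GMS,
            # and the stored plan can run without further planning.
            return ("GMS", self.stored_plans[motivation])
        # No symbolic capture yet: behaviour stays under ICS control.
        return ("ICS", ["increase intensity and variability of behaviour"])


agent = Agent()
before = agent.respond("hunger", 1.5)
agent.represent_goal("hunger", ["find food", "eat"])
after = agent.respond("hunger", 1.5)
below = agent.respond("hunger", 0.2)
```

The same motivational input yields nonspecific behaviour before the goal is represented and a stored plan afterwards; motivations that are never captured remain with the ICS branch indefinitely.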
The set of goals identified and available at a particular point in the developmental history of the system constitutes the system’s model of its own motivations. There is no reason why this model should ever be complete or accurate. Consequently individuals might be expected to misidentify some sources of motivation, and to fail to identify others; the result of these failures will be the frustration, conflict, and control of behaviour by forces of which they are unaware, which we observe in other people all the time. Oddly, there seems to be some resemblance between the processes assumed in this theoretical analysis and those of Freudian Psychoanalysis. For example, transfer of control from ICS to GMS has some features in common with the Freudian idea that “where Id was, there Ego shall be”; and the possibility of continued control by ICS where motivated behaviour has not been symbolically captured by GMS might be expected to result in behaviour the origins of which are unavailable to consciousness.
Some of the advantages resulting from the shift from ICS to GMS and establishing cognitive goals are summarised in Figure 1 below. Evidence for the employment of goal-driven categorisation and an extended discussion of its utility is provided by Barsalou (1991). The importance of cognitive goals in controlling dynamic aspects of behaviour (Figure 1, point 8) is discussed, for example, by Carver and Scheier (1982), and Hyland (1987, 1988). Oatley and Jenkins (1992) have also explicitly linked cognitive goals to emotion: “Emotions have a biological basis. The most plausible current hypothesis is that they function (a) within individuals in the control of goal priorities, and (b) between people to communicate intentions and set outline structures for interaction” (Oatley and Jenkins, 1992, p. 78).
In the account we offered earlier of the origin of relevance information, we proposed that a symbolic planning system runs in tandem with a connectionist learning system and establishes relevance relationships by seeking systematic relationships between input to the connectionist system, and variations in output and feedback. In our account of goals, we have proposed that a symbolic planning system establishes symbolically represented goals by seeking systematic relationships between motivational conditions and those states of the world that change them. We have noted already (see pp. 68–78 above) that this theoretical claim is consistent with evidence that in many cognitive domains two distinct processing systems exist. A clear example comes from work on the “blindsight” phenomenon, whereby visual information can apparently continue to guide forced-choice motor responses, even though, as a result of cortical damage, such visual information is not available to verbal report or to guide planning processes (Weiskrantz, 1988).
Benefits of Cognitive Goals
1. Generation and testing symbolically represented hypotheses
2. Goal management, including use of goals as plan elements, and goal prioritisation
3. Plan optimisation
4. Voluntary control of behaviour
5. Relation of current goals to symbolically represented material in memory
6. External control of behaviour via symbolic input
7. Enabling goal-driven categorisation
8. Provision of reference criteria for dynamic aspects of behaviour
Figure 1. A summary of some of the benefits conferred by establishing cognitive goals.
4. A goal management system
On grounds both of parsimony and theoretical coherence, it would seem reasonable to suggest that there is a single goal management system which utilises (and may lose contact with) perceptual information, motivational information, and relevance information as part of an integrated process of establishing goals, diagnosing when it is appropriate to seek them, and formulating plans for their achievement. There is a good deal of plausibility in the general notion that the symbolic representational system which mediates understanding of the world is unified and second-order. It is not essential for motivation, perception, or action (how could it be, without symbols having innate meaning?), for newborns can be motivated, can perceive, and can initiate action. The symbolic planning system can however offer the possibility of rational planning and systematic management. It can hugely enhance the efficiency of learning by making available hypothesis testing as an additional problem-solving procedure. But most importantly, it can create a whole new world of possibilities through communication and cooperation.
The existence of a goal management system implies the necessity for planning processes that use goal representations as their components. This in turn implies the existence of metagoals and heuristics for establishing and manipulating goals. Whilst we do not intend to discuss these processes in detail in the present paper, some examples of the kind of heuristics we have in mind are presented in Figure 2 below. Many of these heuristics involve plans that incorporate other agents. Goal management inevitably forms a bridge between individual cognition and social processes. We next turn to ethical reasoning, an aspect of cognition with respect to which social behaviour is brought centre stage.
Though there have been some instructive explorations of the development of ethical values (Kohlberg, 1981), Cognitive Psychology and Cognitive Science have almost entirely neglected ethical problems and ethical decision-making. This neglect is unfortunate, as it is difficult to see how the cognitive regulation of human action, understanding of the actions of others, and comprehension of action related discourse can dispense with ethical notions. Ethical language and concepts constitute such a large proportion of human discourse that it is almost inconceivable that such terms and concepts are devoid of cognitive significance.
Metagoals and their Implications
1. Bring as much behaviour as possible under cognitive control: identify and symbolically represent as many needs and wants as possible.
2. Plan so as to maximise the value of positive goals achieved. Three factors will contribute to decisions here: the value of each goal in the set about which a decision is to be made; the number of goals which can be attempted simultaneously or during the interval over which the decision process ranges; the expectancy that goals will be achieved if an attempt is made to achieve them.
3. Plan so as to minimise the number of negative goals experienced. Negative goals will include the experience of failure. This metagoal will therefore entail decisions concerning the competence of the system itself, the probable effectiveness of plan proposals, and the level of difficulty of tasks to be attempted. A system’s estimation of its own competence is hypothesised to be equivalent to the construct labelled “self-esteem” in studies of human judgement and performance.
4. Monitor achievement and switch goals if progress is unsatisfactory. This metagoal can be related to emotion via the role of control theory in goal-setting (Hyland, 1988).
5. Plan to optimise parameters such as time and cost (effort, and derivatives such as money, materials, etc.).
6. Store frequently used plans.
7. Optimise planning and representation processes. This metagoal will require the relationship between plans and states of the world to be represented in memory as simply as possible.
8. Enlist the cooperation of other agents whenever this is advantageous. Other agents may assist not only in developing and executing plans, but also in, for example, developing symbolic representations for the efficient use of memory. Metagoals 7 and 8 together suggest that science may be a socially externalised institution for the discovery of efficient data compression algorithms. This would explain why parsimony features as such an important constraint on scientific theorising; something which has hitherto proved resistant to explanation.
9. Increase the capability of the system for planning and goal representation. A GMS must have learning goals, as well as performance goals.
10. Learn from other agents under appropriate conditions. The potential gains from utilising the experience of others are clear. There are however many important questions to be asked about the circumstances under which a cognitive system can reasonably decide to modify its permanent memory on the authority of another. Clearly such attributes of the “authority” as credibility, consistency, honesty, integrity, character, etc., are relevant. In our own society, free rein is usually given to professional teachers to modify and shape the cognitive systems of children. Quality control of the modification process is handed over to external agencies.
11. Try to maximise the probability that other agents can be relied upon. This will entail, minimally, choosing friends and teachers carefully, but less directly, advocacy of honesty, discouraging deception, insistence that promises be kept, etc. Metagoals 10 and 11 together make a case for the pragmatic utility of ethics in the management of cognitive processes. This case is further explored below.
12. Teach other agents if this might enhance their power to cooperate or to assist others to do so.
Figure 2. Examples of metagoals and heuristics used in the cognitive management of goals.
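Metagoal 2 in Figure 2 names three factors: the value of each goal, the number of goals that can be attempted together, and the expectancy of success. One way these might combine is sketched below. This is only an illustration under stated assumptions: goals are treated as independent, value and expectancy are multiplied into an expected value, and the names Goal, expected_value and choose_goals are hypothetical rather than part of the GMS model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    name: str
    value: float       # subjective value if achieved (hypothetical units)
    expectancy: float  # estimated probability of success, between 0 and 1


def expected_value(goal: Goal) -> float:
    # Assumption: value and expectancy combine multiplicatively.
    return goal.value * goal.expectancy


def choose_goals(goals: list[Goal], capacity: int) -> list[Goal]:
    """Pick at most `capacity` goals with the highest expected value.

    `capacity` stands for the number of goals that can be attempted
    over the interval the decision covers.
    """
    ranked = sorted(goals, key=expected_value, reverse=True)
    return [g for g in ranked[:capacity] if expected_value(g) > 0]


# Hypothetical example data.
GOALS = [
    Goal("write report", value=10, expectancy=0.5),
    Goal("learn piano", value=8, expectancy=0.25),
    Goal("tidy desk", value=3, expectancy=1.0),
]

chosen = choose_goals(GOALS, capacity=2)
print([g.name for g in chosen])
```

With these made-up numbers the low-value but certain goal is preferred over the higher-value but unlikely one, which is the kind of trade-off the three factors imply.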
5. Goal management and the cognitive function of ethics
The function of ethical language has been largely ignored, both by Linguistics and Cognitive Psychology. In a recent study by one of the authors (RL), in which undergraduates were tape-recorded while discussing the design for a laboratory practical, no less than 45% of sentences contained some element which could loosely be classified as ethical, such as “ought”, “should”, “good”, etc. We do not wish to make a great deal of this, and accept that different samples of people on different days or with different topics might yield different data. But the finding does serve to underscore what is probably obvious: ethical language is a ubiquitous feature of the everyday cognitive environment.
Philosophy is probably much to blame for the lack of attention to the cognitive function of ethics. There is an apocryphal story of an Oxford graduate who was asked whether he had learned anything which had endured from his study of the philosophy of ethics. He replied that he had learned never to use the word “good” as it was clear to him that no-one knew what it meant. Ethics in general has become almost inextricably mired in metaphysics, and this has served to distract attention from its cognitive function. This is in spite of the fact that a number of relatively recent ethical philosophers (Urmson, 1950; Hare, 1952; Hampshire, 1960; Gauthier, 1963) have offered theories of ethical language which emphasise its role in making and influencing decisions, and in planning actions. It seems likely that the major reasons for the failure of Linguistics and Cognitive Psychology to take up and apply these theoretical analyses are: a bias towards individualism, and the absence of any satisfactory theoretical framework which can offer an account of how ethical principles can be related to cognitive mechanisms, and how ethical language can be used by one person to influence the planning processes of another.
The theory outlined in this paper attempts to overcome individualist bias, and to explain how ethical language and cognitive mechanisms are interrelated. To begin quite generally, we believe, along with many other recent commentators, that syntactic and semantic approaches to language have proved incapable of providing any real insight into language mechanisms. A more promising approach is to seek to analyse language processes at the level of pragmatics. On this assumption an articulated sentence is an implemented plan intended to influence the behaviour of its audience. The locus of its intended effect is the symbolic planning processes of the recipient of the sentence. On the basis of arguments presented earlier, this means that a
sentence may have its impact upon the recipient’s goals, or plans to achieve a goal (holding objects and operations constant), or upon the objects and operations employed in planning. In the GMS framework, sentence comprehension is the process by which the hearers of a sentence recover the intended effect of that sentence upon their own planning processes; utterance comprehension is the process by which hearers seek to reconstruct the plan which led a speaker to utter a particular sentence. These may differ: a hearer may interpret the sentence “Fire!” to mean that the goal of vacating a building should be adopted, but the plan leading to its utterance may be to steal the hearer’s briefcase. This approach has been growing in popularity over recent years, both in Linguistics (Levy, 1979) and in Artificial Intelligence (Bruce, 1975; Allen and Perrault, 1978; Cohen and Perrault, 1979; Appelt, 1985; Carberry, 1990). Its main attractions are that it moves beyond analyses of language which are little more than impoverished paraphrases in some logical formalism; it embeds language processing in more general cognitive operations, and in action planning in particular; and it offers some solution to the problem of indirect speech, which has proved quite intractable within syntactic and semantic approaches to language. To illustrate this last problem: it has not proved possible to find convincing syntactic or semantic grounds for treating ‘Buy me a drink’, ‘My glass is empty again’, ‘I wonder if the barman will cash a cheque?’, and ‘Would you like another?’ as linguistically equivalent. It is much less difficult to see how each could be uttered with a view to producing an equivalent effect on the planning processes of an unobservant companion. We have proposed elsewhere (Gorayska and Lindsay, 1989a,b; Lindsay, 1996b) that the processing of goals, plans, and plan elements is constrained by social factors as well as by the characteristics of the physical world. 
Other social agents may impose or enjoin the adoption of particular goals, or the use of prescribed plans or plan elements to achieve a goal which has been freely selected. In our culture imperatives are used to signal an attempt to impose constraints upon symbolic planning processes, while ethical terms are used to enjoin an agent to adopt a goal, a plan, or a plan element. To give one or two examples: ‘you should try to get a job’ is to enjoin a goal; ‘drive on the left’ is an imposed plan; ‘you ought to take a bottle of wine to the party’ is an enjoined plan element. A probable objection to this claim is that at best it can only apply to “prudential” and not to “genuinely moral” reasoning — the distinction which Kant (1953) sought to capture by contrasting hypothetical with categorical imperatives. Our answer has three components: firstly metaphysics has hampered
understanding of ethical language, not helped it; there is no respectable independent justification for the notion of objective moral ends. Secondly, there is no convincing case for the claim that words like “ought” are ambiguous: ‘It is surely more plausible to argue that “ought” is not multiply ambiguous, that … different equivalents correspond not to different uses of “ought”, but only to the differing grounds underlying the judgement’ (Gauthier, 1963, p. 21). Thirdly, the use of “ought” and related words for the purpose of enjoinment is best construed as a signal that the narrow interests of the agent are not the only interests at stake: the prudential concerns of social groups of which the agent is a member are also relevant, and these concerns may be presented “as if” they are objective constraints on an agent’s choice of action.
There now exists a well-established AI technology concerned with problem-solving, decision-making and action planning, which is known as “expert systems” research. The basic premise upon which this research is founded is that intelligent decisions and actions require the application of logical operations to knowledge. Expert systems thus consist essentially of an “inference engine” which controls inference, and a “knowledge base” which contains appropriately represented knowledge. “Knowledge” consists of facts, rules, and heuristics, which are elicited from human experts. Though humans are indubitably more complex than current expert systems, these distinctions are often helpful in considering human cognition. In the present context, the question suggested by analogy with expert systems is whether ethical knowledge which constrains human action planning can be captured as facts, rules and heuristics. Our view is that not only is it true that ethical knowledge can be accommodated within this framework, but that some traditional ethical problems cease to be troublesome when so conceptualised. 
Propositions such as ‘x is good’ are clear candidates for ethical facticity, closely resembling as they do such naturalistic facts as ‘grass is green’. Similarly ‘do not steal’ has both the form and the substance of a rule. Perhaps the most interesting of the three categories, however, is that of ethical heuristics. Heuristics are decision-making principles which can profitably be employed when it is too costly to compute and evaluate every possible cognitive option. An important feature of a heuristic is that though in general it increases the probability of a positive outcome, there is no guarantee that this is so. For example, in chess playing, where the number of alternative moves rapidly exceeds exhaustive computability, such heuristics as ‘dominate the centre squares’ and ‘protect your queen’ are often used. No matter how diligently such heuristics are employed, their use will not ensure victory. Plausible candidates for heuristic status in the area of ethical knowledge
are principles such as ‘killing is wrong’ and ‘promises should be kept’. Much philosophical energy has been expended on demonstrating that universal application of these latter principles results in ethical paradoxes when, for example, killing is morally justifiable, or promise-keeping results in manifest harm. These paradoxes cease to puzzle when the principles which give rise to them are interpreted in two ways: as heuristics when invoked by an agent in action planning, which are bound to yield negative outcomes in some cases; and as enjoinment devices when invoked by a third party, which function to constrain the action-planning of others, not to capture ethical absolutes in propositional form. It is widely acknowledged that limitations on processing time and capacity make it inevitable that human beings use heuristics in planning and decision-making (Kahneman and Tversky, 1982; Kahneman, Slovic, and Tversky, 1982). Most studies to date which have presented empirical evidence demonstrating the effects of heuristic use have focused on the processes of memory search and retrieval. There is a strong case for arguing that effects of heuristic use might be expected to manifest themselves most strongly in the domain of action planning, particularly where agents must allow for interactions between their own behaviour and that of others. Certainly, limitations on computability will drastically restrict decision-making in this area. The problem of planning actions which affect, and are affected by, other agents is precisely the domain to which ethical principles apply, and it is therefore tempting to suggest that much of the sophisticated wrangling over the universality of ethical principles in which philosophers have engaged is best interpreted as dramatic evidence of the operation of cognitive heuristics. A compelling recent example is Baron’s (1994) discussion of “nonconsequentialism”. 
Baron argues that while rational agents should seek to maximise the extent to which they achieve their goals, there are classes of case in which people demonstrably fail to do so. For example, in laboratory studies, subjects consistently award “third-party compensation on the basis of the cause of an injury rather than the benefit of the compensation, they ignore deterrent effects in decisions about punishment, and they resist coercive reforms that they judge to be beneficial” (Baron, 1994, p. 1). He infers that in such cases defective decision rules are being employed which should be identified and corrected. No-one would similarly be tempted to suggest that an expert system which produced suboptimal decisions in some cases was malfunctioning: this is exactly what would be expected of a heuristic-based decision-making system. Similarly, if consequentialist principles function as heuristics, then the reported cases of tolerated suboptimality are entirely to be expected.
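The contrast between exhaustive evaluation and heuristic use can be made concrete with a small sketch. It assumes, purely for illustration, that each candidate plan carries a known payoff and a flag recording whether it breaks a promise; the Plan class, the payoff figures and both function names are hypothetical. The heuristic ‘promises should be kept’ is applied as a cheap filter before any payoffs are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    payoff: float                # hypothetical outcome value
    breaks_promise: bool = False


def exhaustive_choice(plans: list[Plan]) -> Plan:
    """Evaluate every candidate plan and return the best one."""
    return max(plans, key=lambda p: p.payoff)


def heuristic_choice(plans: list[Plan]) -> Plan:
    """Discard promise-breaking plans first, then evaluate the survivors.

    If the filter would remove every plan, fall back to the full set.
    """
    survivors = [p for p in plans if not p.breaks_promise]
    return max(survivors or plans, key=lambda p: p.payoff)


# Hypothetical example: keeping the promise is not the best outcome here.
PLANS = [
    Plan("keep appointment", payoff=4.0),
    Plan("skip appointment to help at an accident", payoff=9.0, breaks_promise=True),
    Plan("stay at home", payoff=1.0),
]

print(exhaustive_choice(PLANS).name)
print(heuristic_choice(PLANS).name)
```

In this constructed case the heuristic returns a plan with a lower payoff than exhaustive search would find. That is the expected behaviour of a heuristic rather than a malfunction: it reduces the number of plans to evaluate at the price of occasional suboptimal choices.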
If actions affect other agents, and actions are partially determined by knowledge, then the mechanisms of enjoinment and imposition might be expected to apply even to knowledge that is not intimately connected with action planning, as ethical knowledge obviously is. We believe that these processes do operate on knowledge which is relatively naturalistic. To give an example: in Western society, university science is one of the main institutionalised vehicles of knowledge manufacture. As the process of knowledge refinement is endless, those involved in the refinement process are aware that that knowledge is always provisional and revisable. In teaching and disseminating the results however, belief in currently accepted theory is enjoined. It is that which students and the public must believe and act upon, if their actions are to have the best chance of success. The consequence is the frequently noted paradoxical tendency of apologists for science to claim both that current theory is demonstrated truth, and that current theory will inevitably be modified and revised in the future. There are two implications of the claim that the mechanisms of imposition and enjoinment are the central devices by which a group seeks to generate consistency and coherence in the beliefs and actions of its members. One is that imposition and enjoinment are important determinants of between-group cognitive differences. The other is that what is enjoined and imposed within a group is crucial to an understanding of the cognitive processes of group members. Explanation of cognitive differences between cultures and sub-cultural groups has not in general been an area in which Psychology has been conspicuously successful. Early approaches assigned a constitutional origin to any observed differences; later, linguistic differences were invoked, for example by Whorf (1956) for cross-cultural differences and by Bernstein (1964, 1971), for differences between subcultures. 
More recently attention has turned to ecological variables. Irvine and Berry (1988), for example, propose an ecological interpretation of the “Law of Cultural differentiation” which asserts that “all populations have the same perceptual and cognitive processes and the same potential for cognitive and perceptual development, but ecological and cultural factors ‘prescribe what shall be learned and at what age; consequently different cultural environments lead to the development of different patterns of ability (Ferguson, 1956, p. 121)’” (Kagitcibasi and Berry, 1989, p. 498). The mechanisms by which differences in ability arise are, to say the least, underspecified. The theoretical approach developed here and elsewhere attempts to be much more specific about what these mechanisms are.
One important factor is what we have called the “fabricated world” effect (see Gorayska and Lindsay, 1989a,b and further discussion in Section 6 below) that can massively amplify intrinsic ecological differences. For example, instead of investing time in remembering a complex and arbitrary environment, people may instead choose to construct cities laid out according to simple algorithms: the world is adapted to the limitations of memory rather than vice versa. Similarly buildings are frequently organised around goal satisfaction, with specific rooms to eat, sleep and cook, or specific buildings to drink, dance, or borrow a book. In this way, commonly sought goals in a culture become realised as physical structures, and goal-oriented planning is conditioned by the physical possibilities that exist. A second way in which cultural differences in cognition can be induced, is by the use of imposition and enjoinment to constrain knowledge and planning processes. Initially the locus of imposed and enjoined constraints is the symbolic planning system, but as we have suggested earlier, once successful plans are formulated they are likely to be downloaded to an automatic, non-symbolic, action generation system, which means that the constraints initially imposed via ethical and imperative forms of language eventually become unconscious and habitual. If this account is correct, then human planning activities are, at least in part, shaped and controlled by enjoinment, through the use of ethical language. As particular ethical exhortations are usually, if not necessarily, derived from more general ethical values and principles, it should follow that the goals, plans, and actions that arise as a result of the enjoinment process are systematically related to the ethical values and principles that underlie them. 
This in turn suggests that ethical values and principles can be used by outside observers to explain regularities in the behaviour of members of a group within which those values are current, and that they can be employed by group members themselves, to assist in understanding the behaviour of their fellows. Some of the ways in which ethical principles and maxims can contribute to the cognitive processes underlying action planning are summarised in Figure 3 below. Ethical values and principles operate to reduce the cognitive demands of goal selection and planning, to enable cooperative goal-seeking, and to facilitate the construction of mental models of other agents, derived from their actions, which permit cognitive recovery of their goals. The cognitive benefits of ethical constraints on reducing the number of possible plans to be evaluated in action-planning contexts are closely analogous to the effects of subjective preference, discussed earlier (see p. 82). It is now widely accepted that AI models that seek to understand human text about the social and physical world require access to
Cognitive Functions of Ethical Values
1. Facilitating choice of goals
2. Reducing the range of acceptable plans and plan elements to be considered
3. Conveying plans not validatable within a single lifetime experience, e.g. ‘rules for a good life’
4. Providing a framework for cooperative goal-seeking
5. Enabling socialisation via “enjoinment”
6. Offering a source of principles to explain the behaviour of other agents
7. Allowing the credibility assessment of agents attempting to enjoin knowledge
Figure 3. Some cognitive functions of ethical values.
a rich set of physical principles and social conventions. An example is given by Charniak (1972):
Jane was invited to Jack’s birthday party. She wondered if he would like a kite. She went to her room and shook her piggy bank. It made no sound.
Charniak argues that a text comprehension system would need to know, at least, that:
– Guests are expected to take gifts to birthday parties.
– A kite is a suitable gift for a child.
– Purchasing a gift requires money.
– Money is often kept in piggy banks, particularly by children.
– Piggy banks containing coinage emit sound when shaken.
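To show how background knowledge of this kind could drive comprehension, the sketch below encodes the story and the listed conventions as facts and if-then rules and applies simple forward chaining, in the style of the expert systems mentioned earlier. The encoding is an assumption made for illustration: the fact strings, the rule set and the function name forward_chain are hypothetical and are not Charniak’s own representation.

```python
def forward_chain(facts, rules):
    """Apply (premises, conclusion) rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Facts stated explicitly in the story (hypothetical encoding).
STORY = {
    "jane invited to birthday party",
    "jane considers a kite",
    "piggy bank silent when shaken",
}

# Background conventions, loosely following the list above.
RULES = [
    (frozenset({"jane invited to birthday party"}), "jane needs a gift"),
    (frozenset({"jane needs a gift", "jane considers a kite"}), "jane wants to buy a kite"),
    (frozenset({"jane wants to buy a kite"}), "jane needs money"),
    (frozenset({"piggy bank silent when shaken"}), "piggy bank is empty"),
    (frozenset({"jane needs money", "piggy bank is empty"}), "jane cannot yet buy the kite"),
]

derived = forward_chain(STORY, RULES) - STORY
print(sorted(derived))
```

Without the rules nothing beyond the three stated facts is available, so the final sentence of the story carries no significance. With them, the system can derive that Jane has a problem to solve.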
In the absence of ethical principles, the actions, including dialogue, of artificial cognitive systems cannot be comprehensible as human action is; nor without access to ethical values as a source of hypotheses can artificial systems understand the behaviour of humans, learn from them, or cooperate with them. If machines are to be treated as agents, or even to understand the behaviour of agents, they must first be given ethical values. It might be admitted that ethical values and principles have some connection with action and action planning, and yet be denied that this confers any particular plausibility on the model we have proposed which assigns a fundamental role to the processing of relevance information. Is there any good reason to assert that ethical reasoning and relevance processing are closely connected? We believe that there is. If ethical knowledge is represented in memory as abstract rules and general principles and heuristics, much of ethical reasoning must be concerned with determining which principles and heuristics are
relevant to particular contexts in which action is required. If we are correct, then the notion of relevance must play a central, though unacknowledged, part in ethical debate. In support of our contention, we quote from two philosophical texts concerned with ethics. These texts were chosen simply because they were readily available in the circumstances under which we are writing: any of a wide range of others would no doubt serve as well. Singer’s (1963) book is concerned with the problems of generalisation associated with the Kantian “categorical imperative”: crudely, the claim that an agent ought to carry out only those acts which they are prepared to “will as universal law”, that is, to advocate that any agent ought to carry out under similar circumstances. We take Singer’s book as an example because it is centrally concerned with those judgements generally regarded as uniquely ethical. Singer notes that there are serious problems involved in deciding who is similar to an agent making an ethical judgement, and when their circumstances are similar, and when not. He argues that “while some similarities, and differences, can always be specified, not all of them will be relevant ones. The generalisation principle must be understood in the sense that what is right for one person must be right for every relevantly similar person in relevantly similar circumstances” (Singer, 1963, p. 19, author’s italics). In discussing specific moral principles, Singer declares: “though they hold in all circumstances in the sense that violations of them are always wrong, they are not relevant in all circumstances. On the contrary they are relevant only where the corresponding rule is relevant. The principle that it is always wrong to kill for the sake of killing is not relevant where killing is not involved” (ibid. p. 109). By contrast with Singer, Gauthier (1963) denies that there are uniquely “categorical” (or ethical) judgements. 
He believes that all ethical reasoning is “practical”, that is, rooted in the specific context in which action is necessary. His dependence on the notion of relevance is, however, no less: “a person need not consider all wants in determining what to do. Only a minute fraction of all wants have any possible relevance to the situation in which he is to act, or to what he may do. Indeed, most of the wants in each person’s practical basis never do nor can enter into his reasoning, although it is impossible to provide criteria to determine, in advance, just which wants may possess practical relevance” (Gauthier, 1963, p. 86). The quotations provided above only illustrate the point we have argued; they cannot prove its truth. Nonetheless, the considerations to which we have drawn attention add up to a powerful case for reconsidering the role of ethics in cognition. We have argued that ethical knowledge plays an important part in
action planning; that ethical language is used to shape the cognitive processes of members of social and cultural groups; that ethical values and principles are the main dimensions along which social actions are interpreted and understood; that artificially intelligent systems must act upon and have access to ethical values and principles if they are to participate in dialogue with humans and act in a manner which is considered meaningful; and finally, that because of the part played by ethical knowledge in action planning, relevance information is central to the use of ethical knowledge and though never acknowledged, has consistently been invoked as a prominent element of ethical discussion.
6. The fabricated world hypothesis
This hypothesis was first proposed by Gorayska and Lindsay (1989a,b). The name assigned to the hypothesis was intended to reveal its debt to the earlier “carpentered world hypothesis” of Segall, Campbell and Herskovits (1966). In the 1950s and 60s a large body of evidence was accumulated indicating that preliterate, rural tribespeople were less susceptible than urban dwellers to a range of visual illusions. Segall, Campbell and Herskovits suggested that this might be because constant exposure to regular rectilinear structures in the “carpentered world” of modern cities predisposed the perceptual systems of city folk to interpret two-dimensional arrays of straight lines as geometrically regular figures in three-dimensional space. People from a rural environment in which rectilinearity is almost never encountered were less likely to use the same visual processing heuristic, and hence experienced the illusion to a lesser extent. Lindsay and Gorayska suggested that cognitive effects created or amplified by the environment were not confined to visual perception. Built environments, or “fabricated worlds”, have very different information processing properties from natural environments. The human cognitive system is only able to control motor vehicles travelling at scores or hundreds of miles per hour because highways afford an artificially predictable context in which relevant change is relatively rare and usually signalled well in advance, for example by road signs. Millions of people a year find their way through unfamiliar airports, often without assistance, because of the informational structure of airport organisation. Strangers can navigate abroad, because engineers have constructed an environment that embodies informational structures already in the head of the traveller. If we know what people know, we can design an environment that fits their preexisting knowledge structures like a glove. Novel environments can be mastered
without new learning because they have been engineered to fit old learning. The idea that the human cognitive system is not a stand-alone device, but part of a larger unit which incorporates elements of the external environment (the parts we have learned about) has arisen in several contexts. Andy Clark (2001) uses the term wideware to capture the fact that memory is outside in the world as well as in the head. Taking the same idea, that human cognition must be viewed as part of a larger whole that incorporates the learning environment, O’Regan and Noë (2002) have sought to explain unexpected perceptual phenomena such as change blindness. This term is used to refer to the discovery that, for example, when a visual image is modified even as it is being inspected, if the introduction of the change is masked by a flicker, the change to the image, even though perceptually gross, is often not noticed. Similarly, people frequently fail to notice when a receptionist interacting with them bends below a desk, and is replaced by a different person who continues seamlessly with the interaction. O’Regan and Noë argue that change blindness occurs because people do not have access to a “direct” representation of the perceptual environment in memory that can be compared with current input to detect change. But it seems that we do: phenomenology gives us a false impression of the nature of our own consciousness. Instead of a cognitive representation of external reality inside our heads, O’Regan and Noë suggest that the world is its own memory: perceptual memory doesn’t need to remember everything, only the “motor coordinates” of where information not currently relevant can be found if needed. An AI system built on similar assumptions, which directly uses relevance information to control what parts of a visual array are seen, was described by Ip, Gorayska and Lok (1994). 
O’Regan and Noë, however, draw much more far-reaching implications from the idea that the environment is embedded in human cognitions: “we suggest that the basic thing people do when they see is that they exercise mastery of the sensorimotor contingencies governing visual exploration. Thus, visual sensation and visual perception are different aspects of a person’s skillful exploratory activity”. The existence of sensory modalities can readily be explained by this approach: “the difference between seeing and hearing is to be explained in terms of the different things that we do when we see and hear”, and there are implications for understanding consciousness itself: a “way of thinking about the neural bases of perception and action is needed that does not rest on the false assumption that the brain is the seat of consciousness”. These recent suggestions that mind and the environment are an integrated whole seem to considerably increase the salience of a relevance-based theory of
cognition. Relevance-based processing provides a mechanism that is sufficient to explain how the mind-environment duplex operates: action planning confers relevance, because only the set of perceptual features that can form part of action plans currently being developed or executed is available within the symbolic problem space. Associative relationships developed within the connectionist component of the cognitive system provide at least the initial set of features to which the problem space has access. This arrangement will naturally cause the development of cognitive modules and sensory modalities as a collateral outcome. When two problem space generators use disjoint operator sets they are functionally independent processing systems: cognitive goals constrain the set of plans that will work; the set of plans that will work constrains the set of plan elements that can be used in implementation. When operator sets are disjoint, relevance discontinuity exists between the two systems: operators used by one problem-space generator are never relevant to the other. Whilst many theorists (e.g., Fodor, 1983, and most present-day neuropsychologists) have proposed that the mind is modular in organisation, it has proved impossible to locate a neurophysiological basis for this organisation. Damage to the brain can sometimes cause specific deficits in language, perception or memory, but damage to identical parts of the brain in other patients does not invariably lead to similar neuropsychological impairments. Modules seem to exist in the mind, but not in the brain. A GMS processing relevance information appears capable of showing how this can occur. The Fabricated World Hypothesis goes beyond the bare assertion of the environmental embodiment of mind, to draw attention to the technological possibilities of manipulating mind by engineering the environment.
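The notion of relevance discontinuity between problem-space generators with disjoint operator sets lends itself to a small illustration. The sketch below assumes that each generator can be described simply by the set of operators it uses; the generator and operator names are invented for the example, and the function names relevance_discontinuity and modules are hypothetical. Generators that share operators, directly or through a chain of other generators, are grouped into one module.

```python
def relevance_discontinuity(ops_a, ops_b):
    """True when two operator sets share nothing, so neither is relevant to the other."""
    return set(ops_a).isdisjoint(ops_b)


def modules(generators):
    """Group generators whose operator sets overlap, directly or transitively."""
    groups = []  # list of (generator names, pooled operators); pools stay pairwise disjoint
    for name, ops in generators.items():
        names, pooled = {name}, set(ops)
        untouched = []
        for group_names, group_ops in groups:
            if pooled.isdisjoint(group_ops):
                untouched.append((group_names, group_ops))
            else:
                # Overlap found: merge this group into the new one.
                names |= group_names
                pooled |= group_ops
        groups = untouched + [(names, pooled)]
    return [sorted(names) for names, _ in groups]


# Hypothetical generators and operators, for illustration only.
GENERATORS = {
    "vision": {"fixate", "scan"},
    "navigation": {"scan", "walk"},
    "speech": {"articulate", "parse phrase"},
}

print(relevance_discontinuity(GENERATORS["vision"], GENERATORS["speech"]))
print(modules(GENERATORS))
```

On this toy input, vision and navigation fall into one module because they share an operator, while speech stands alone. Nothing in the grouping refers to where the generators are physically located, which is in keeping with the suggestion that modules may exist in the mind without a fixed counterpart in the brain.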
7. The cognitive technology of goal management
Let us suppose that the theory of relevance processing and goal management that we have described is true. What are the pragmatic implications? How can this knowledge be put to technological use?
a. Controlling problem spaces via connectionist learning and modifying connectionist models via problem spaces
The theory of cognitive goal management via relevance relationships proposes that the unconscious and automatic control of actions is handled by a connectionist learning system that provides the base inputs to a conscious symbolic hypothesis testing module that operates by developing explicit propositional
models of the behaviour resulting from lower-level connectionist control processes. A symbol-based cognitive system that is sufficient to learn by imitating the behaviour of others must also be sufficient to learn by symbolically modelling the system’s own behaviour. However, the set of operators available to the symbolic processor is restricted to the set of operators designated as relevant by connectionist learning processes. The more powerful hypothesis testing paradigm available within the symbol-based learning system may require the use of operators not designated as relevant by connectionist processes, and hence cognitively unavailable, or available only in propositional form as intellectual beliefs that seem “contrary to intuitions” or unconnected with how a person acts. One practical implication of the goal-management theory is that action planning at the symbolic level can be facilitated, and actions can be brought into line with beliefs, by providing learning experiences that add new operators into the problem space. For example, the beliefs developed by theoretical physicists about the relationship between matter and gravity are symbolically coherent, but are usually found to be counter-intuitive and cognitively complex, presumably because they can only be handled by a serially operating working memory with restricted processing capacity. However, the mass-gravity relationship can be modelled by using a rubber membrane distorted by objects of varying mass that are placed upon its surface. Experimenting with a model of this kind directly trains the connectionist action-control system to operate with an extended set of action possibilities. But the utility of this particular training procedure could only be established by working through the higher-level symbolic system; it could not have been arrived at directly by observing the effect of action contingencies.
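The claim that symbolic planning is limited to operators designated as relevant, and that a learning experience can extend that set, can be sketched as follows. This is a toy illustration under our own assumptions, not an implementation of the theory: states are plain strings, each operator is a small transition table, and the operator and state names (loosely echoing the rubber-membrane example) are hypothetical.

```python
from collections import deque

# Hypothetical operators: each maps some states to a successor state.
# A state not listed in an operator's table is left unchanged by it.
OPERATORS = {
    "push": {"rest": "moving"},
    "curve_path": {"moving": "orbiting"},
}


def plan(start, goal, operators, relevant):
    """Breadth-first search that may only use operators marked as relevant."""
    usable = {name: table for name, table in operators.items() if name in relevant}
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for name, table in usable.items():
            successor = table.get(state, state)
            if successor not in seen:
                seen.add(successor)
                queue.append((successor, steps + [name]))
    return None  # no plan exists within the available operator set


# Before training, only "push" is relevant, so the goal cannot be planned for.
before = plan("rest", "orbiting", OPERATORS, {"push"})

# A learning experience (assumed here) marks "curve_path" as relevant too.
after = plan("rest", "orbiting", OPERATORS, {"push", "curve_path"})
```

In this sketch the operator exists throughout; what changes is only whether it is available to the planner, which is the distinction the paragraph above draws between holding a belief and being able to act on it.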
In a way, there is nothing new here: every teacher knows that if intellectually abstract and complex relationships can be experienced as physically instantiated by some model, learning is facilitated. What we are providing is an explanation of why this occurs, and an insight into the mechanisms underlying the process so that these processes become easier to manipulate in a principled way.
b. Using shared learning to develop cognitively efficient, fabricated environments (eliminating relevance discontinuities)
It is old news that operating environments can be made user-friendly by ensuring that they meet user requirements. It probably follows from Norman and Shallice’s (1986) model of action control that user requirements have more to do with habitual patterns of action controlled by the unconscious implementation of schemas in the contention scheduling process, than with consciously controlled behaviour. These familiar principles are surely given a new spin
when we suspect that the reason schema-driven action is unconscious is that it results from connectionist processes operating in a biologically primitive action control system. Further, modification of this system cannot occur directly through top-down influences from the symbolic processor, because connectionist systems cannot be symbolically programmed. Changing someone’s beliefs about effective actions in a given domain will only change their behaviour when slow serial voluntary control is exercised, or when, as a result of changed beliefs, an agent systematically practises new stimulus-response contingencies to bring actions and beliefs into line. Think about a concert pianist learning a new finger movement: knowing how the action is to be executed at the level of belief may provide the incentive for practice and the feedback by which practice is guided, but the belief is no substitute for the practice. Now it becomes quite clear that if what we mean by meeting user needs is enabling the direct transfer of schema-controlled actions, we cannot establish user needs by interviewing users, or by asking them to complete questionnaires. Cognitive tools built on the basis of such data, generated by what are probably the most commonly used methodologies in interface engineering, should match performer beliefs, but this provides no assurance that they will support skilled action. The new model suggests that we should first seek discrepancies between the two control systems, then, when discrepancies exist, decide which system it is most important that a new tool interfaces with. Ramachandran (1996) has demonstrated how cognitive control of sensory feedback from (phantom) limbs can be established by modulating visual input (using mirrors to fool the connectionist system into acting as if an amputated limb is still present), when it is completely impossible to affect it via a person’s beliefs.
Marcel (1992) asked GY, a much-studied individual with a right-sided hemianopia and blindsight, to respond in three ways to a 200 ms bright light in his blind field. The three response modes were: blink the right eye, press a button with the forefinger, and say ‘yes’. Marcel’s results showed the fastest responses for blinking, then for button pressing, with vocalisation being slowest. GY commonly “dissociated”, e.g., said ‘yes’ with an eyeblink and ‘no’ with a finger response on the same trial. Vocal responses were slowest and least accurate. Marcel found similar results when normal participants were presented with the same task. Findings such as those of Ramachandran and Marcel clearly demonstrate that our cognitive systems are not as they appear to phenomenal analysis: unified and under the control of conscious will. When we speak to another person, it is already clear that the message we transmit is available only to one department in the bureaucracy of mind: a department that is ignorant of much
of what goes on elsewhere, and unable to control many of the events of which it has knowledge. Cognitive science is beginning to reveal the true nature of the Wizard-of-Oz-style mechanisms that really operate the levers and pulleys of our minds. Cognitive technology must follow in the train of cognitive science, developing techniques by which the various subsystems of mind can be influenced or modified, as their properties become manifest. (For an early example of this practice, see Meenan and Lindsay (2002); for further discussion of desirable methods and practice in Cognitive Technology see Gorayska and Mey (1996), Gorayska and Marsh (1996 & 1999), Marsh et al. (1999), and Gorayska, Marsh and Mey (2001).)
c. Modifying social action planning via ethical engineering
Most people already believe that, in a dimly understood way, the ethical principles and precepts shared by a society have some effect upon the quality of social life that results. The theoretical analysis of cognition as a goal management system, within which ethical constraints play an essential role in ensuring the tractability of action planning computations, provides a much clearer picture of how ethical beliefs can be causally effective. This analysis offers a naturalistic account of ethics whilst still retaining some basis for claiming that ethical precepts are not just empirical generalisations, that they cannot (as heuristics rather than assertions) be falsified, that they play a cognitively essential part in the production of socially acceptable behaviour, and that they promote the well-being of groups of which an agent is part, as well as serving the interests of individuals themselves. Questions about relative efficiency and effectiveness naturally follow any functionalist analysis, and the naturalistic account of ethics we have offered above is no exception. Until now, no explanations have been available of how ethical reasoning operates at a cognitive level.
Instead, most people seem to operate with a vague notion that “virtue is its own reward”, viz. acting according to moral precepts is itself virtuous, and this will either enhance an agent’s social desirability in the present or result in some form of posthumous remuneration. Our analysis suggests that the connection between ethics and theistic epistemologies is solely of historical interest. Social agents need to operate under the constraint of principles of some kind, to moderate the computational burden of social action planning. Societies will benefit if the principles employed tend to promote the achievement of ends that are in the general rather than the individual interest, or that include precepts related to outcomes that are too remote from individual behaviour to otherwise figure in their action planning. Examples of the former kind might relate to smoking in public, vaccinating
children against childhood diseases, driving safely, or limiting noise pollution resulting from personal sound sources. These are all examples of cases in which the well-being of the community is likely to benefit from the acceptance of constraints upon the behaviour of individual agents. Examples of moral precepts associated with remote ends might relate to environmental conservation issues: for example, constraints upon waste disposal practices or energy use. The point that we are making is that historically, societies shaped the behaviour of their members by accepting ethical principles that reduced the frequency of acts such as murder, or promoted the occurrence of actions thought to be desirable, such as promise-keeping, marital fidelity, or truthfulness. Usually, gains or losses recorded and administered by a deity were a crucial part of an apparatus that operated to enjoin conformity. It seems that consensually accepted ethical belief systems have tended to become abandoned along with religion, even though religion provides only a fanciful scheme for policing whatever ethical systems are adopted, rather than a rationale for any particular set of ethical principles. In the end, the fact that some deity likes promise-keeping is no more an explanation of why promise-keeping is good or right than the fact that promise-keeping is approved by Great-Aunt Emily, or the family dog. The profound unfashionability of ethical systems may also be connected with a sense of “otherness” associated with them, perhaps because of their association with theism. Ethical systems are “revealed” to humankind, handed down on tablets of stone, or presented in some equally mysterious manner. There is little sense that they are human products like other cognitive tools that can be sharpened or revised as needs and circumstances change. Because of this, ethical systems have become antiquated and irrelevant-seeming: curious vestiges of the past.
They incorporate injunctions not to covet a neighbour’s wife (absurdly when the neighbour’s wife in question is skilfully disporting herself on a silver screen with the intention of inducing covetousness), but do not condemn a multitude of behaviours we commonly recognise as evils, such as corporate greed, paedophilia and environmental pollution. If our analysis is correct, the cognitive need for principles that simplify the task of action planning will not go away, nor will there be any diminution of the social benefits of acting according to heuristics that promote community well-being or serve goals too remote for individual agents to learn how their behaviour is related to them. Contemporary societies need to understand how ethical beliefs operate, to agree upon ethical precepts that are relevant to contemporary goals, and via educational programmes to promulgate such precepts so that
everyone can benefit from constraints upon the action planning processes of individual agents that are in tune with the times.
Note
* This paper is an extended and updated version of Lindsay and Gorayska (1994).
References
Allen, J. F. & C. P. Perrault (1978). Participating in Dialogue: Understanding via Plan Deduction. Proceedings, Canadian Society for Computational Studies of Intelligence.
Appelt, D. E. (1985). Planning English Sentences. Cambridge: Cambridge UP.
Baddeley, A. D. (2001). The episodic buffer: a new component of working memory? Trends in Cognitive Sciences 4(11), 417–423.
Baizer, J. S., L. G. Ungerleider & R. Desimone (1991). Organisation of visual inputs to the inferior temporal and posterior parietal cortex in Macaques. Journal of Neuroscience 11, 187–194.
Baron, J. (1994). Nonconsequentialist decisions. Behavioral and Brain Sciences 17(1), 1–10.
Baron-Cohen, S. (1995). Mindblindness: An Essay on Autism and Theory of Mind. Cambridge, MA: MIT Press.
Barsalou, L. W. (1991). Deriving categories to achieve goals. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory, 27, pp. 1–64. San Diego, CA: Academic Press. [Reprinted in A. Ram & D. Leake (Eds.), Goal-driven learning (1995), pp. 121–176. Cambridge, MA: MIT Press/Bradford Books.]
Bernstein, B. (1964). Elaborated and restricted codes: their social origins and some consequences. American Anthropologist 66, 55–69.
Bernstein, B. (1971). Class, codes and control vol. 1: Theoretical studies towards a sociology of language. London: Routledge & Kegan Paul.
Boussaoud, D., G. di Pellegrino & S. P. Wise (1996). Frontal lobe mechanisms subserving vision-for-action versus vision-for-perception. Behavioural Brain Research 72, 1–15.
Bowman, S., L. Hinkley, J. Barnes & R. Lindsay (this volume). Gaze aversion and the primacy of emotional dysfunction in autism. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 267–301. Amsterdam: John Benjamins Publishing Company.
Broca, P. P. (1861). Remarques sur le siège de la faculté du langage articulé, suivies d’une observation d’aphémie. Bulletin de la Société Anatomique 36, 330–57.
Bruce, B. C. (1975). Belief Systems and Language Understanding. BBN Technical Report No. 2973.
Carberry, S. (1990). Plan recognition and its use in understanding dialogue. In A. Kobsa & W. Wahlster (Eds.), User Models in Dialogue System, pp. 133–62. Berlin: Springer Verlag.
Carver, C. S. & M. F. Scheier (1982). Control Theory: a useful conceptual framework for Personality-Social, Clinical and Health Psychology. Psychological Bulletin 92(1), 111–35.
Charniak, E. (1972). Towards a model of children’s story comprehension. Unpublished doctoral dissertation, MIT.
Clark, A. (2001). Mindware. Oxford: Oxford University Press.
Cleeremans, A. (1993). Mechanisms of Implicit Learning. Cambridge, Mass.: MIT Press.
Cohen, M. B. & M. S. Strauss (1979). Concept acquisition in the human infant. Child Development 50, 419–424.
Cohen, P. & C. R. Perrault (1979). Elements of a Plan Based Theory of Speech Acts. Cognitive Science 3, 177–212.
Cutting, J. (1985). The Psychology of Schizophrenia. Edinburgh: Churchill Livingstone.
Dascal, M. (1987). Language and reasoning: Sorting out sociopragmatic and psychopragmatic factors. In B. W. Hamill, R. C. Jernigan & J. C. Bourdreaux (Eds.), The role of language in problem solving II, pp. 183–197. Amsterdam: North Holland.
El Ashegh, H. A. & R. Lindsay (this volume). Cognition and Body Image. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 175–223. Amsterdam: John Benjamins Publishing Company.
Elman, J. L., E. A. Bates, M. H. Johnson, A. Karmiloff-Smith, D. Parisi & K. Plunkett (1996). Rethinking Innateness: a Connectionist Perspective on Development. Cambridge, MA: MIT Press.
Erickson, T. & M. Mattson (1981). From words to meaning: a semantic illusion. Journal of Verbal Learning and Verbal Behaviour 20, 540–551.
Evans, J. T. St. B. (1982). The psychology of deductive reasoning. London: Routledge & Kegan Paul.
Ferguson, G. A. (1956). On transfer and the abilities of man. Canadian Journal of Psychology 10, 121–31.
Fodor, J. A. (1983). The Modularity of Mind. Cambridge, Mass.: MIT Press.
Gauthier, D. P. (1963). Practical Reasoning. Oxford: Clarendon Press.
Gorayska, B. & R. O. Lindsay (1989a). Metasemantics of relevance. The First International Congress on Cognitive Linguistics. Print A265. L.A.U.D. (Linguistic Agency at the University of Duisburg), Catalogue: Pragmatics, 1989. Available from http://www.linse.uni-essen.de:16080/linse/laud/shop_laud
Gorayska, B. & R. O. Lindsay (1989b). On relevance: Goal dependent expressions and the control of planning processes. Technical Report 16. School of Computing and Mathematical Sciences. Oxford: Oxford-Brookes University. (First published as Gorayska and Lindsay 1989a.) Available at http://cogtech.org/publicat.htm
Gorayska, B. & R. O. Lindsay (1993). The Roots of Relevance. Journal of Pragmatics 19, 301–323.
Gorayska, B. & R. O. Lindsay (1995). Not Really a Reply – More Like an Echo. Journal of Pragmatics 23, 683–686.
Gorayska, B., R. Lindsay, K. Cox, J. Marsh & N. Tse (1992). Relevance-Derived Metafunction: How to Interface Intelligent Systems’ Subcomponents. Proceedings of the AI Simulation and Planning in High Autonomy Systems Conference, Perth Australia, 8–11 July 1992, pp. 64–72. Los Alamitos: IEEE Computer Society Press.
Gorayska, B. & J. Marsh (1996). Epistemic Technology and relevance analysis: Rethinking Cognitive Technology. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In search of a humane interface, pp. 27–39. Amsterdam: North Holland.
Gorayska, B. & J. Marsh (1999). Investigations in Cognitive Technology: Questioning perspective. In B. Gorayska, J. Marsh & J. L. Mey (Eds.), Humane interfaces: Questions of methods and practice in Cognitive Technology, pp. 17–43. Amsterdam: North Holland.
Gorayska, B., J. Marsh & J. L. Mey (2001). Cognitive Technology: Tool or Instrument. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 1–16. Berlin: Springer.
Gorayska, B. & J. L. Mey (1996). Of minds and men. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In search of a humane interface, pp. 1–24. Amsterdam: North Holland.
Gorayska, B. & N. Tse (1993). The Goal Satisfaction Heuristic in a Relevance Based Search. Technical Report TR-03–93. Department of Computer Science, City Polytechnic of Hong Kong.
Gorayska, B., N. Tse & W. H. Kwok (1997). A Goal Satisfaction Condition as a Function Between Problem Spaces and Solution Spaces. Technical Report TR-97–06. Department of Computer Science, City University of Hong Kong. Available at http://cogtech.org/publicat.htm
Gottlieb, G. & N. A. Krasnegor (1985). Measurement of Audition and Vision in the First Year of Postnatal Life: A Methodological Overview. Norwood, NJ: Ablex.
Grice, H. P. (1961). The causal theory of perception. Aristotelian Society Proceedings, Supplementary Volume 35, 121–152. Reprinted in Grice 1989: 224–247.
Grice, H. P. (1989). Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Hampshire, S. (1960). Thought and Action. London: Chatto and Windus.
Hare, R. M. (1952). The Language of Morals. Oxford: Clarendon Press.
Harnad, S. (1990). The Symbol Grounding Problem. Physica D 42, 335–346.
Hyland, M. E. (1987). Control theory interpretations of psychological mechanisms of depression: comparison and integration of several theories. Psychological Bulletin 102, 109–21.
Hyland, M. E. (1988). Motivational control theory: an integrative perspective. Journal of Personality and Social Psychology 55, 642–51.
Ip, H., B. Gorayska & W. Y. Lok (1994). Relevance-Directed Vision using Goal/Plan Architecture. Proceedings of the Third Pacific Rim International Conference on Artificial Intelligence, Beijing, 16–18 August 1994, pp. 945–951.
Irvine, S. H. & J. W. Berry (1988). Human Abilities in Cultural Context. New York: Cambridge UP.
Johnson, G. (1997). Neuron to Symbol: relevance information in hybrid systems. PhD Thesis. Oxford Brookes University, UK.
Kagitscibasi, C. & J. W. Berry (1989). Cross-Cultural Psychology: Current Research and Trends. Annual Review of Psychology 40, 493–531. Palo Alto: Annual Reviews Inc.
Kahneman, D. & A. Tversky (1982). On the study of statistical intuitions. Cognition 11, 123–141.
Kahneman, D., P. Slovic & A. Tversky (1982). Judgements under uncertainty: Heuristics and biases. Cambridge: Cambridge UP.
Kant, I. (1953). Groundwork of the Metaphysics of Morals. Translated by H. J. Paton as: The Moral Law. London: Hutchinson’s University Library.
Kay, K. (2001). Machines and the Mind: Do artificial intelligence systems incorporate intrinsic meaning? Harvard Brain Review 8, Spring. http://www.hcs.harvard.edu/~husn/BRAIN/vol8-spring2001/ai.htm. Accessed September 2002.
Kohlberg, L. (1981). The Philosophy of Moral Development: Moral Stages and the idea of justice. New York: Harper and Row.
Leslie, A. (1991). The theory of mind impairment in autism: Evidence for a modular mechanism of development? In A. Whiten (Ed.), Natural Theories of Mind: Evolution, Development and Simulation of Everyday Mindreading, pp. 63–78. Oxford: Blackwell.
Levy, D. M. (1979). Communicative Goals and Strategies: Between Discourse and Syntax. In T. Givon (Ed.), Syntax and Semantics, pp. 183–210. New York: Academic Press.
Lindsay, R. O. (1996a). Cognitive Technology and the pragmatics of impossible plans – a study in Cognitive Prosthetics. AI & Society 10, 273–288. Special issue on Cognitive Technology.
Lindsay, R. O. (1996b). Heuristic Ergonomics and the Socio-Cognitive Interface. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In Search of a Humane Interface, pp. 147–58. Advances in Psychology 113. Amsterdam: North-Holland, Elsevier Science.
Lindsay, R. O. & B. Gorayska (1994). Towards a Unified Theory of Cognition. Unpublished manuscript.
Lindsay, R. O. & B. Gorayska (1995). On Putting Necessity in its Place, with R. Lindsay. Journal of Pragmatics 23, 343–346.
Lindsay, R. O., B. Gorayska & K. Cox (1994). The Psychology of Relevance. Unpublished manuscript; available at http://cogtech.org/publicat.htm
Mack, A. & I. Rock (1998). Inattentional blindness. Cambridge, MA: MIT Press.
Marcel, A. J. (1993). Slippage in the unity of consciousness. In G. R. Bock & J. Marsh (Eds.), Experimental and Theoretical Studies of Consciousness, pp. 168–80. CIBA Foundation Symposium 174. Chichester, UK: John Wiley & Sons.
Marsh, J., B. Gorayska & J. L. Mey (Eds.) (1999). Humane interfaces: Questions of methods and practice in Cognitive Technology. Amsterdam: North Holland.
Meehl, P. E. (1962). Schizotaxia, schizotypy, schizophrenia. American Psychologist 17, 827–838.
Meenan, S. & R. Lindsay (2002). Planning and the Neurotechnology of Social Behaviour. International Journal of Cognition and Technology 1(2), 233–274.
Mishkin, M., L. G. Ungerleider & K. A. Macko (1983). Object vision and spatial vision: two cortical pathways. Trends in Neurosciences 6, 414–417.
Newell, A. (1990). Unified Theories of Cognition. Cambridge, Mass.: Harvard University Press.
Newell, A. & H. A. Simon (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.
Norman, D. A. & T. Shallice (1986). Attention to action: Willed and automatic control of behavior. In R. Davidson, G. Schwartz & D. Shapiro (Eds.), Consciousness and Self-Regulation. New York: Plenum Press.
Oatley, K. & J. Jenkins (1992). Emotion. Annual Review of Psychology 43, 55–85.
O’Regan, J. K. & A. Noë (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences 24(5). Available at www.bbsonline.org.
Partridge, D. (1991). A new guide to artificial intelligence. Norwood, NJ: Ablex.
Penfield, W. & L. Roberts (1959). Speech and brain mechanisms. Princeton, NJ: Princeton University Press.
Plunkett, K., P. McLeod & E. T. Rolls (1998). An Introduction to Connectionist Modelling Cognitive Processes. Oxford: Oxford University Press.
Premack, D. & A. J. Premack (1995). Origins of human social competence. In M. S. Gazzaniga (Ed.), The Cognitive Neurosciences, pp. 205–218. Cambridge, Mass.: MIT Press, A Bradford Book.
Ramachandran, V. S. (1996). Synaesthesia in phantom limbs induced with mirrors. Proceedings of the Royal Society London 263, 377–386.
Reber, A. (1993). Implicit learning and tacit knowledge: an essay on the cognitive unconscious. New York: Oxford University Press.
Reder, L. & G. Kusbit (1991). Locus of the Moses Illusion: Imperfect Encoding, Retrieval, or Match? Journal of Memory and Language 30, 385–406.
Schneider, W. & R. M. Shiffrin (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review 84, 1–66.
Segall, M. H., D. T. Campbell & M. J. Herskovits (1966). The Influence Of Culture On Visual Perception. Indianapolis/NY: Bobbs-Merrill.
Shanks, D. R. & M. F. St. John (1993). Characteristics of Dissociable Learning Systems. Brain and Behavioural Sciences 17(3), 367–447.
Shiffrin, R. M. & W. Schneider (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review 84, 127–90.
Singer, M. G. (1963). Generalisation in Ethics. London: Eyre and Spottiswoode.
Smolensky, P. (1988). On the Proper Treatment of Connectionism. Behavioral and Brain Sciences 11(1), 1–59.
Sperber, D., F. Cara & V. Girotto (1995). Relevance theory explains the selection task. Cognition 57, 31–95.
Sperber, D. & D. Wilson (1986/1995). Relevance: Communication and Cognition. 2nd edition. Oxford: Blackwell.
Sperber, D. & D. Wilson (1987). Précis of Relevance: Communication and Cognition. Behavioral and Brain Sciences 10, 697–754.
Sperber, D. & D. Wilson (2004). Relevance theory. In G. Ward & L. Horn (Eds.), Handbook of Pragmatics, pp. 607–632. Oxford: Blackwell. Also available at http://www.dan.sperber.com/relevance_theory.htm. Accessed: September 2002 and January 2004.
Squire, L. R. (1992). Declarative and Non-declarative memory: multiple brain systems supporting learning and memory. Journal of Cognitive Neuroscience 4, 232–243.
Tinbergen, N. (1953). The Herring Gull’s World. London: Collins.
Tse, N. (1994). The Learning Mechanism in GEPAM. MPhil Dissertation. Computer Science Department, City University of Hong Kong.
Urmson, J. O. (1950). On Grading. Mind 59(234), 145–169.
Vera, H. A. & H. A. Simon (1993). Situated Action: a Symbolic Interpretation. Cognitive Science 17(1), 7–48. January-March.
Weiskrantz, L. (1988). Some contributions of neuropsychology of vision and memory to the problem of consciousness. In A. J. Marcel & E. Bisiach (Eds.), Consciousness in Contemporary Science, pp. 183–199. Oxford: Clarendon Press.
Whorf, B. L. (1956). Language Thought and Reality. (Ed. J. Carroll). Cambridge, Mass.: MIT Press.
Woodworth, R. S. & H. Schlosberg (1954). Experimental Psychology. London: Holt, Rinehart, Winston.
Woodworth, R. S. & S. B. Sells (1935). An Atmosphere effect in formal syllogistic reasoning. Journal of Experimental Psychology 18, 451–60.
Zhang, X. H. (1993). A Goal-Based Relevance Model and its Application to Intelligent Systems. Ph.D. Thesis, Oxford Brookes University, Department of Mathematics and Computer Science, October, 1993.
Zhang, X. H., J. L. Nealon & R. O. Lindsay (1993). An Intelligent User Interface for Multiple Application Systems. Research and development in Expert Systems IX: Proceedings of Expert systems 92, The Twelfth Annual Technical Conference of the British Computer Society Specialist Group on Expert Systems, Cambridge, 1992. Cambridge: CUP.
Robots as cognitive tools*
Rolf Pfeifer
University of Zurich
1. Introduction
Cognitive Technology is concerned with the relationship between humans and machines, in particular how the human mind can be explored via the very technologies it produces. In this chapter, I explore the use of robots as cognitive tools. While in science fiction scenarios the focus is often on how robots change society and our daily lives, I restrict myself here to the exploration of how robots can be productively used as tools for cognitive science, including the cognitive ‘scaffolding’ that robots provide for our understanding of embodied cognitive processes. The idea of designing artifacts to explore human and other forms of intelligence goes back to the early days of artificial intelligence. From the 1950s until the mid-80s, the period in which the so-called classical paradigm was predominant, the goal was mostly to develop algorithms for cognitive processes, cognitive being a very general term for mental processes. Examples include playing games such as chess and checkers, solving cryptarithmetic puzzles, performing medical diagnosis, proving mathematical theorems, or processing written natural language text. This classical paradigm, which deliberately abstracts from the physical level and focuses on symbol processing, can be naturally represented as algorithms. The models developed for these kinds of tasks typically had a very centralized, hierarchical, top-down organization. We will see later that these characteristics are not appropriate to describe naturally intelligent systems. It turned out that this approach had severe limitations, as can be seen, for example, from the failure of expert systems (for detailed arguments, see, e.g., Clancey, 1997; Pfeifer and Scheier, 1999; Vinkhuyzen, 1998; Winograd and Flores, 1986). In the mid-80s Rodney Brooks of the MIT Artificial Intelligence Laboratory suggested that we forget about logic and problem solving, that we do away with high-level symbolic processing and focus
on the interaction of agents with the real physical world (Brooks, 1991a, b). This interaction is, of course, always mediated by a body, i.e., his proposal was that artificial intelligence be “embodied”. As a consequence, many researchers in the field started using robots as their workhorse which, by definition, are embodied systems. What originally seemed nothing more than yet another buzzword turned out to have profound consequences and radically changed our thinking about intelligence, behavior, and society in general; a change in which robots as cognitive tools have been instrumental. Since the methodology employed by artificial intelligence — traditional or embodied — is a synthetic one, it is briefly introduced at the beginning. Then I outline the concept of embodiment and provide a set of case studies to illustrate different kinds of implications. The implications are integrated into an approach that has been called “developmental robotics”, which I introduce next. Finally, I attempt to characterize what we have learned by using robots as cognitive tools.
2. Synthetic methodology
Research in artificial intelligence employs a synthetic methodology, i.e., an approach that can be succinctly characterized as “understanding by building”: by developing artifacts that mimic certain aspects of the behavior of natural systems, a deeper understanding of that behavior can be acquired. There are three steps in the synthetic methodology: (1) building a model of some aspect of a natural system, (2) abstracting general principles of intelligence, and (3) applying these abstract principles to the design of intelligent systems. Examples of behaviors one might be interested in are how humans recognize a face in a crowd, how physicians arrive at a diagnosis, how rats learn about a maze, how dogs can run so quickly and at the same time catch a Frisbee, or how insects manage to get back to their nests after having found food. The models of interest are artifacts, either computer programs as in classical artificial intelligence, or robots as in embodied artificial intelligence. In the embodied approach simulations are used as well, but they are of a particular type, the so-called embodied agent simulations. They include physically realistic models of an environment and of the agent’s sensory and motor interactions with that environment. While biologists might be satisfied with an accurate model of a biological system’s behavior (step 1), from the perspective of artificial intelligence, and in
Robots as cognitive tools
particular from the perspective of the present paper, it is important to extract general principles (step 2). Those principles not only include principles of designing intelligent agents but also principles concerning scaffolding. By scaffolding we mean the structuring of the environment with the goal to enable the achievement of complex tasks through interaction with this structured environment (e.g., Clark, 1997). Robots employed as cognitive tools may provide precisely this perspective. Abstracting from purely biological principles is also a prerequisite for engineering applications (step 3) because in engineering the best solution is, in general, not as close a copy of the biological system as possible but some abstraction and modification thereof. In this way, the engineer can also exploit means not available to biological systems such as new types of sensors, media of communication, actuators, etc. Steps 1, 2 and 3 are not sequential: they are pursued partly in parallel and in an iterative way. The synthetic methodology contrasts with the analytic one where a given system is analyzed in a top-down manner, as is the standard way of proceeding in science. The synthetic approach, building aspects of the system one is trying to understand, has proved enormously successful: If one attaches a camera to a computer in order to develop a perceptual system, one’s attention becomes immediately attracted to the relevant problems. For example, it becomes obvious that trying to map a pixel array from a camera image onto an internal symbolic representation is not going to work. As an aside, it is interesting to note that science in general is becoming increasingly synthetic as illustrated, for example, by the rapid growth of the computational sciences.
3. Case studies
This section provides a number of demonstrations of the sorts of insights that can be gleaned using robots as cognitive tools. Two important points must be mentioned upfront. First, for the most part, research performed in the field of robotics is of a very traditional nature. It is based on a so-called sense-think-act, or sense-model-plan-act, cycle: there is an input (sense), this input is then mapped onto some internal representation (model), which is used for planning (plan), and finally the plan is executed in the real world (act). Modeling and planning together form the “think” part of the cycle. This cycle is closely related to the idea of hierarchical, centralized, symbol-processing systems as they have been employed in the classical approach. Thus, if one is after truly interesting results and if one wants to explore the true implications of embodiment, merely
Rolf Pfeifer
building robots is not enough since robots can be used in very traditional ways. Instead, one has to apply a novel research strategy such as the framework outlined in embodied cognitive science (e.g., Pfeifer and Scheier, 1999). In the case studies below I will show how robots can be employed to explore embodiment. As we will see, this requires an entirely new way of thinking and necessitates reflecting on the interaction with the real world which is messy and not as neat as the world of computer programs. Second, embodiment has two main types of implications: physical and information theoretic. The former are concerned with physical forces, inertia, friction, vibrations, and energy dissipation, i.e., anything that relates to the (physical) dynamics of the system. The latter are concerned with the relation between sensory signals, motor control, and neural processing. While in the traditional approach the focus is on the internal control mechanisms, or the neural substrate, in the new approach the focus is now on the complete organism which includes morphology (shape, distribution and physical characteristics of sensors and actuators, limbs, etc.) and materials. One surprising consequence is that, often, problems that seem very hard if viewed from a purely computational perspective, turn out to be easy if the embodiment and the interaction with the environment are appropriately taken into account. For example, given a particular task environment, if the morphology is right, the amount of neural processing required may be significantly reduced (e.g., case study 1). Because of this perspective on embodiment, entirely new issues are raised and need to be taken into account. 
As I will illustrate in the following case studies, one important issue concerns the so-called “ecological balance”, i.e., the interplay between the sensory system, the motor system, the neural substrate, and the materials used (Hara and Pfeifer, 2000; Pfeifer, 1996b; Pfeifer, 1999, 2000; Pfeifer and Scheier, 1999). I will begin with a simple robotics experiment, the “Swiss Robots” (Case study 1) and an example from artificial evolution (Case study 2), which illustrate mostly the relation between behavior, sensor morphology, and internal mechanism. Then I will discuss motor systems, in particular biped walking, where the exploitation of (physical) dynamics as well as the interrelationship between morphology and control are demonstrated. This will be followed by an introduction to “developmental robotics” which incorporates the major implications of the embodied approach to artificial intelligence and cognitive science. Because of its importance, I will devote an entire section to it (Section 4).
Figure 1. The “Swiss Robots”. (a) Robot with IR sensors and neural network implementing a simple avoidance reflex. (b) Clustering process. (c) Explanation of cluster formation. (d) Changed morphology: modified sensor positioning (details: see text).
Case study 1: The “Swiss Robots”
The “Swiss Robots” (Figure 1a) can clean an arena cluttered with Styrofoam cubes (Figure 1b) (which is why they are called “Swiss Robots”). They can do this even though they are only equipped with a simple obstacle avoidance reflex based on infrared (IR) sensors. The reflex can be described as “stimulation of right IR sensor, turn left” or “stimulation of left IR sensor, turn right”. If a robot happens to encounter a cube head-on, there will be no sensory stimulation because of the physical arrangement of the sensors. The robot will move forward and at the same time push the cube until it encounters another one on the side (Figure 1c), at which point it will turn away. If the position of the sensors is changed (Figure 1d), the robots no longer clean the arena, although the environment and the control program are exactly the same (for more detail, the reader is referred to Maris and te Boekhorst, 1996; Pfeifer and Scheier, 1998; or Pfeifer, 1996a, 1999). A powerful idea illustrated by this example is that, if the morphology is right, control can become much simpler (in this case a simple obstacle avoidance reflex leads to clustering behavior). This point will be further illustrated below when I discuss the trade-off between morphology and
control in the case study on the evolution of the morphology of an “insect eye” on a robot.
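The reflex described in case study 1 can be written down in a few lines, which makes the point about simple control concrete. The following is a minimal sketch, not the controller actually used on the robots (the figure caption mentions a neural network). The function name, the threshold, the wheel-speed convention for a differential drive, and the tie-breaking rule when both sensors fire are all assumptions made for illustration.

```python
from typing import Tuple


def avoidance_reflex(left_ir: float, right_ir: float,
                     threshold: float = 0.5,
                     cruise: float = 1.0) -> Tuple[float, float]:
    """Return (left_wheel, right_wheel) speeds for one control step.

    Hypothetical differential-drive convention: a faster right wheel
    turns the robot to the left, and vice versa.
    """
    if right_ir > threshold:
        # "Stimulation of right IR sensor, turn left."
        return (-cruise, cruise)
    if left_ir > threshold:
        # "Stimulation of left IR sensor, turn right."
        return (cruise, -cruise)
    # No stimulation. This includes a cube met head-on, which falls between
    # the two sensors, so the robot keeps driving and pushes the cube.
    return (cruise, cruise)


print(avoidance_reflex(0.0, 0.9))  # obstacle on the right
print(avoidance_reflex(0.0, 0.0))  # nothing sensed, or cube head-on
```

Nothing in this sketch mentions cubes or clusters. Under these assumptions the clustering would come entirely from where the sensors sit on the body, which is why moving them (Figure 1d) changes the behavior while the program stays the same.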
Case study 2: Evolving the morphology of an “insect eye” on a robot
When sitting in a train, looking out the window in the direction of travel, a light point, say a tree, will travel slowly across the visual field as long as the tree is well in front and far away. The closer we are to the tree, the more the tree will move to the side, and the faster it will move across the visual field. This phenomenon is called motion parallax; it is solely a result of the geometry of the system-environment interaction and does not depend on the characteristics of the visual system. If the agent is moving at a fixed lateral distance to an object with a constant speed, we may want its motion detectors to deliver a constant value to reflect the constant speed. Assume now that we have an insect eye consisting of many facets or ommatidia. If they are evenly spaced, i.e., if the angles between them are constant (Figure 2a), different motion detector circuits have to be used for each pair of facets. If they are more densely spaced toward the front (Figure 2b), the same circuits can be used for motion detection in the entire eye. Indeed, this has been found to be the case in certain species of flies (Franceschini et al., 1992), where the same kind of motion detectors, the so-called EMDs (Elementary Motion Detectors), are used throughout the eye because the motion parallax is compensated away, so to speak. This is an illustration of how morphology can be traded for computation: a kind of preprocessing is performed by the morphology. How this trade-off is chosen depends on the particular task environment or, in natural systems, on the ecological niche: natural evolution has come up with a particular solution because morphology and neural substrate have co-evolved. In order to explore these ideas, Lichtensteiger and Eggenberger (1999) evolved the morphology of an “insect eye” on a real robot (Figure 3). They fixed the neural substrate.
That is, the elementary motion detectors, which were taken to be the same for all pairs of facets, were not changed during the experiment. Instead, they used a flexible morphology in which the angles at which the facets were positioned could be adjusted (Figure 3a). They used an evolution strategy (Rechenberg, 1973) to evolve the angles for the task of maintaining a minimal lateral distance to an object. The results confirm the theoretical predictions: the facets end up with an inhomogeneous distribution with a higher density towards the front (Figure 3c). The idea of space-variant sensing (e.g., Ferrari et al., 1995; Toepfer et al., 1998) capitalizes on this trade-off and is gaining rapid acceptance in the field of robot vision.
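The geometric argument behind the denser frontal spacing can be made explicit with a small calculation. The sketch below assumes an idealized setting that the text does not spell out: the observer moves in a straight line at constant speed and passes a static point at a fixed lateral distance. The bearing theta is measured from the direction of travel. In that setting the point's position along the track is proportional to cot(theta), so facets placed at equal steps in cot(theta) are crossed at equal time intervals. The function names and the example values (8 facets between 15 and 90 degrees) are illustrative assumptions.

```python
import math


def angular_velocity(theta, speed, lateral_distance):
    """Angular speed (rad/s) of a static point seen at bearing theta.

    theta = 0 is straight ahead, pi/2 is directly to the side.
    Follows from x = d * cot(theta) with dx/dt = -speed.
    """
    return speed / lateral_distance * math.sin(theta) ** 2


def constant_transit_angles(n_facets, theta_front, theta_side):
    """Facet bearings with equal steps in cot(theta).

    A passing point then needs the same time to travel between any two
    neighbouring facets, so one motion detector design fits all pairs.
    """
    cot_front = 1.0 / math.tan(theta_front)
    cot_side = 1.0 / math.tan(theta_side)
    step = (cot_side - cot_front) / (n_facets - 1)
    return [math.atan2(1.0, cot_front + i * step) for i in range(n_facets)]


angles = constant_transit_angles(8, math.radians(15), math.radians(90))
gaps = [b - a for a, b in zip(angles, angles[1:])]
for a, g in zip(angles, gaps):
    print(f"facet at {math.degrees(a):5.1f} deg, gap to next {math.degrees(g):5.1f} deg")
```

The gaps grow from front to side, i.e., the facets are packed more densely towards the front, which is the qualitative arrangement the text describes for Figure 2b.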
Figure 2. Trading morphology for computation. (a) Evenly spaced facets imply different motion detection circuits for different pairs of facets. (b) Inhomogeneous distribution of facets implying that the same neural circuits can be used for motion detection throughout the entire eye. (Panel labels: fast transition, constant transition, slow transition.)
Although these examples are very simple and obvious, they demonstrate the interdependence of morphology and control in sensory systems, a point that should always be explicitly taken into account but has to date not been systematically studied. Similar considerations apply to the motor system.
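To illustrate the kind of search used in case study 2, here is a minimal sketch of a (1+1) evolution strategy acting on facet angles. Several things are assumptions and not taken from the source: the (1+1) variant, the fixed mutation step size, and above all the cost function. In the experiment the evaluation came from the behavior of a real robot, whereas this toy cost simply measures how unequal the transit times between neighbouring facets are under the idealized straight-line geometry.

```python
import math
import random


def transit_time_spread(angles):
    """Variance of the cot(theta) steps between neighbouring facets.

    Zero means a passing point needs the same time between every pair
    of neighbouring facets (hypothetical stand-in for robot fitness).
    """
    cots = [1.0 / math.tan(a) for a in angles]
    steps = [c0 - c1 for c0, c1 in zip(cots, cots[1:])]
    mean = sum(steps) / len(steps)
    return sum((s - mean) ** 2 for s in steps) / len(steps)


def evolve_facets(n_facets=8, front=math.radians(15), side=math.radians(90),
                  generations=2000, sigma=0.02, seed=0):
    """(1+1) evolution strategy: one parent, one mutated child per generation."""
    rng = random.Random(seed)
    # Start from evenly spaced facets (the situation of Figure 2a).
    parent = [front + i * (side - front) / (n_facets - 1) for i in range(n_facets)]
    parent_cost = transit_time_spread(parent)
    start_cost = parent_cost
    for _ in range(generations):
        child = parent[:]
        for i in range(1, n_facets - 1):      # the two end facets stay fixed
            child[i] += rng.gauss(0.0, sigma)
        if any(b <= a for a, b in zip(child, child[1:])):
            continue                          # reject layouts with crossed facets
        child_cost = transit_time_spread(child)
        if child_cost <= parent_cost:         # keep the child only if not worse
            parent, parent_cost = child, child_cost
    return parent, start_cost, parent_cost


facets, start_cost, final_cost = evolve_facets()
print("cost before:", start_cost, "after:", final_cost)
print("facet angles (deg):", [round(math.degrees(a), 1) for a in facets])
```

Because the child replaces the parent only when it is not worse, the cost can never increase over the run. The control side (the motion detectors) is never touched; only the morphology is searched.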
Case study 3: The passive dynamic walker
I start with an example illustrating the relation between morphology, materials, and control. The passive dynamic walker by Steve Collins (originally suggested by McGeer, 1990a, b), illustrated in Figure 4a, is a robot (or, if you like, a mechanical device) capable of walking down an incline without any actuation whatsoever. In other words, there are no motors and there is no control on the robot; it is brainless, so to speak. In order to achieve this task the passive dynamics of the robot, its body and its limbs, must be exploited. This kind of walking is very energy efficient but its “ecological niche” (i.e., the environment
in which the robot is capable of operating) is extremely narrow: it only consists of inclines of certain angles. The strategy is to build a passive dynamic walker, and then to extend its ecological niche and have the robot walk on a flat surface (and later in more complex environments) by adding only a little actuation and control. Energy efficiency is achieved because in this approach the robot is operated near one of its eigenfrequencies. A different approach has been taken by the Honda design team (see Figure 4b). There the goal was to have a robot that could perform a large number of movements. The methodology was to record human movements and then to reproduce them on the robot, which leads to a relatively natural behavior of the robot. On the other hand, control is extremely complex and there is no exploitation of the intrinsic dynamics as in the case of the passive dynamic walker. The implication is that the movement is not energy efficient. It should be noted that even if the agent is as complex as the Honda robot, there is nothing in principle that prevents the exploitation of its passive dynamics. There are two main conclusions that can be drawn from these examples. First, it is important to exploit the dynamics in order to achieve energy-efficient and natural kinds of movements. The term “natural” does not only apply to biological systems. Artificial systems also have their intrinsic natural dynamics. Second, there is a kind of trade-off or balance between exploitation of the dynamics, simplicity of control and amount of neural processing: the better the exploitation of the dynamics and the simpler the control, the less neural processing will be required and vice versa. At this point one might be tempted to say “Well, this is all very interesting, but how does it relate to the goal of artificial intelligence, i.e., understanding and building intelligent systems? 
How can robots help us make progress towards this goal?” I do not have an answer to these questions. However, I would like to propose an approach which might bring us closer to this goal, developmental robotics. Using this approach, I will show how the ideas developed so far in this paper can be taken one step further. This requires a bit of a digression into the foundations of cognition and its development.
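The remark in case study 3 about operating a robot near one of its eigenfrequencies can be illustrated with a textbook model. The sketch below treats a swinging limb as a linear, lightly damped rotational oscillator, which is a strong simplification and not a model of any of the robots discussed. All parameter values are made up. It computes the peak actuator torque needed to sustain a swing of a given amplitude at different driving frequencies; peak torque is used here only as a rough proxy for actuation effort.

```python
import math


def natural_frequency(stiffness, inertia):
    """Undamped eigenfrequency (rad/s) of the linear oscillator."""
    return math.sqrt(stiffness / inertia)


def drive_torque_needed(amplitude, omega, inertia, damping, stiffness):
    """Peak torque to sustain a sinusoidal swing of the given amplitude.

    Steady-state solution of
        inertia * a'' + damping * a' + stiffness * a = torque(t)
    driven at angular frequency omega.
    """
    return amplitude * math.hypot(stiffness - inertia * omega ** 2,
                                  damping * omega)


# Hypothetical limb parameters (SI units), chosen only for illustration.
inertia, damping, stiffness = 0.5, 0.1, 8.0
amplitude = 0.3
omega0 = natural_frequency(stiffness, inertia)

for factor in (0.5, 1.0, 2.0):
    torque = drive_torque_needed(amplitude, factor * omega0,
                                 inertia, damping, stiffness)
    print(f"drive at {factor:.1f} x eigenfrequency: peak torque {torque:.3f} N m")
```

Near the eigenfrequency the actuator only has to make up for damping losses. Away from it, the actuator must also work against the limb's own inertia or stiffness, so much more torque is needed for the same motion.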
Figure 3. Evolving the morphology of an “insect eye”. (a) The Eyebot used for experiments on motion parallax. (b) The experiment seen from the top. The robot has to maintain a minimal lateral distance to an obstacle (indicated by the vertical light tube) by modifying its morphology, i.e., the positioning of the facet tubes. This is under the control of an evolution strategy. The same EMDs are used for all pairs of facets. (c) Final distribution of facets from three different runs. The front of the robot is towards the right. In all of the runs, the distribution is more dense towards the front than on the side. In all of them, there are no facets directly in the front of the robot. This is because of the low resolution (the aperture) of the tubes.
4. Developmental robotics: a synthesis
Developmental robotics designates an approach whose goal is to design robots in which cognition develops as the robots interact with their physical and social environment over extended periods of time. For the purposes of the present
paper I restrict myself to the physical environment. As I will show below, in this process morphology and materials play an essential role and robots that utilize both can help us acquire a deeper understanding of the developmental processes. Before going into the details, let me briefly argue why development is essential to our understanding of cognition.
Figure 4. Two approaches to robot building. (a) The passive dynamic walker. (b) The Honda robot.
Rationale
Before I can introduce the approach, however, I need to make a short comment on categorization. One of the most fundamental abilities of agents (animals, humans, and robots) in the real world is their capacity to make distinctions: food has to be distinguished from non-food, predators from con-specifics, the nest from the rest of the environment, and so forth. This ability is also called categorization and forms the basis of concept formation and, ultimately, of high-level cognition. It turns out that making distinctions in the real world is very hard, since the proximal stimulation (the stimulation on the retina) from one and the same object varies greatly depending on distance, orientation, and lighting conditions. Moreover, the agent is confronted with a continuously changing stream of sensory stimulation which, in addition, strongly depends on the current behavior of the agent. Categorization in the real world is not well understood. As the vast literature on computer and robot vision documents, categorization and object recognition cannot be achieved by simply mapping the pixel array from a camera onto some form of internal representation. In categorization behavior, processes of sensory-motor coordination play an essential role. Often, when a process is poorly understood, a developmental perspective may shed new light on the process: once we understand how a particular ability came about, we may have a deeper understanding of it. The basic idea of a developmental approach, i.e., the attempt to understand cognition by investigating its ontogenetic development, is shared by an increasing number of researchers (e.g., Clark, 1997; Edelman, 1987; Elman et al., 1996; Metta et al., 1998; Thelen and Smith, 1994, to mention but a few). Thelen and
Smith, for example, argue that while in human infants behavior is initially highly sensory-motor and is directly coupled to the system-environment interaction, during development some processes become “decoupled” from the direct sensory-motor interaction, but the underlying neural mechanisms remain essentially the same. The discovery of mirror neurons (for an overview see, e.g., Rizzolatti et al., 2000), i.e., neurons that are equally activated when an action is performed or merely observed, adds validity to this view. The question of what the mechanisms are through which, over time, this “decoupling” from the environment takes place is, to my knowledge, an unresolved research issue. And here is where robots might come into play, in spite of the fact that they are extremely simple compared to a human: abstractions, simplifications, and leaving out detail are always necessary in order to achieve explanatory power. In robots we can record the sensory-motor and internal states (e.g., of a neural network) and trace the entire developmental history. In particular, we can measure, and thus get a precise image of, the patterns of sensory stimulation that originate from an interaction with the real world such as grasping a cup. These patterns form, so to speak, the raw material for the neural substrate to process. It has been shown (Lungarella and Pfeifer, 2001; Pfeifer and Scheier, 1997, 1999; Scheier, Pfeifer, and Kuniyoshi, 1998) that the continuously varying, highly complex sensory stimulation is significantly simplified if the interaction is in the form of a sensory-motor coordination, i.e., behavior in which there is a tight coupling between the action and the sensory stimulation as, for example, in foveating, in moving towards an object, or in grasping. In other words, sensory patterns are induced by sensory-motor coordination. Note that sensory-motor coordination does not only include information processes but also physical processes.
This sensory-motor coordination has strong information theoretic implications in that it reduces the complexity of the sensory-motor patterns, and this reduced complexity is a prerequisite for learning and development.
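One simple way to put a number on "reduced complexity" is the Shannon entropy of the discretized sensor readings. The sketch below is only an illustration of that idea; it is not the analysis carried out in the cited studies, and the two sequences are invented. They stand for the index of the retina cell on which an object falls, once while the agent behaves without coordination and once while it keeps tracking the object.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Entropy in bits of a sequence of discrete sensor readings."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


# Hypothetical readings: index of the retina cell hit by the object.
uncoordinated = [0, 5, 2, 7, 1, 6, 3, 4]   # object wanders over all 8 cells
coordinated = [3, 4, 3, 4, 3, 4, 3, 4]     # tracking keeps it near the centre

print(shannon_entropy(uncoordinated))  # 3.0 bits
print(shannon_entropy(coordinated))    # 1.0 bit
```

In this toy case the coordinated behavior confines the stimulation to two cells, so the entropy drops from 3 bits to 1 bit. A learning mechanism downstream then has a much smaller space of inputs to deal with.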
Exploiting morphology and materials
If our objective is to model human development, it seems natural to specifically employ a humanoid robot for this purpose, i.e., a robot whose morphology, at least superficially, resembles that of a human. Such robots typically have very many degrees of freedom and are, as such, hard to control. However, a look at the natural system may provide some inspiration towards a solution for the robots which then, in turn, may provide insights into the functioning of the natural (i.e., the human) system.
To this end, let us pursue the idea of exploiting the dynamics a little further and include material properties which can also be conveniently exploited when designing actual robots. Most robot arms available today work with rigid materials and electrical motors. Natural arms, by contrast, are built of muscles, tendons, ligaments, and bones, materials that are non-rigid to varying degrees. All these materials have their own intrinsic properties like mass, stiffness, elasticity, viscosity, temporal characteristics, damping, and contraction ratio to mention but a few. These properties are all exploited in interesting ways in natural systems. For example, there is a natural position for a human arm which is determined by its anatomy and its material properties. Grasping an object like a cup with the right hand is normally done with the palm facing left, but could also be done — with considerable additional effort — the other way around, i.e., the palm facing right. Assume now that the palm of your right hand is facing right and you let go. Your arm will immediately turn back into its natural position. This is not achieved by neural control but by the properties of the muscle-tendon system: On the one hand, the system acts like a spring — the more you stretch it, the more force you have to apply and, if you let go, the spring moves back into its resting position. On the other hand, there is intrinsic damping. Normally, reaching an equilibrium position and damping are conceived of in terms of electronic (or neural) control, whereas in this case, this is achieved (mostly) through the material properties. If these ideas are applied to robots, control becomes much simpler. Many researchers have started building artificial muscles (for reviews of the various technologies see, e.g., Kornbluh et al., 1998, and Shahinpoor et al., 2000) and use them for robots (Figure 5). 
ISAC, a service robot, and the artificial hand by Lee and Shimoyama use pneumatic actuators, Cog uses series-elastic actuators, and the Face Robot uses shape memory alloys. Facial expressions also provide an interesting illustration for the point made here. If the facial tissue has the right sorts of material properties in terms of elasticity, deformability, stiffness, etc., the neural control for the facial expressions becomes much simpler. Take, for example, smiling. It involves the entire face, but its actuation is very simple: the “complexity” is added by the tissue properties.
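The arm example above, in which the released arm returns to its natural position through spring-like and damping properties alone, can be reproduced with a few lines of simulation. This is a minimal sketch under strong assumptions: a single joint, linear stiffness and damping, and invented parameter values. Real muscle-tendon systems are nonlinear. The point is only that no controller appears anywhere in the loop.

```python
def release_arm(angle0, rest_angle=0.0, inertia=1.0, stiffness=20.0,
                damping=6.0, dt=0.001, steps=5000):
    """Simulate a joint released from angle0 with zero control torque.

    The only torques are a linear spring towards rest_angle and linear
    damping (stand-ins for muscle-tendon properties). Returns the list
    of joint angles, integrated with the semi-implicit Euler method.
    """
    angle, velocity = angle0, 0.0
    trajectory = [angle]
    for _ in range(steps):
        torque = -stiffness * (angle - rest_angle) - damping * velocity
        velocity += torque / inertia * dt   # update velocity first
        angle += velocity * dt              # then position
        trajectory.append(angle)
    return trajectory


trajectory = release_arm(angle0=1.0)
print("start:", trajectory[0], "after 5 s:", trajectory[-1])
```

With these values the joint settles at its resting angle within a few seconds. Reaching the equilibrium and suppressing oscillation, both of which would normally be jobs for a control loop, are done here by the material parameters.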
Implications of morphology and materials for neural processing
Although artificial muscles have been and are increasingly used in robotics, their intrinsic dynamic properties (elasticity, damping, providing constraints through the muscle-tendon system) have to date not been really exploited. However, for developmental robotics their exploitation will be essential for a number of reasons.
Figure 5. Robots with artificial muscles. (a) ISAC (pneumatic actuators). (b) Cog (series-elastic actuators). (c) Lee-Shimoyama hand (pneumatic actuators). (d) Face Robot (shape-memory alloys).
Because of the material properties of the muscle-tendon system, control is not only simpler, but highly decentralized, thus freeing the control systems from a lot of neural processing. In addition, and more specifically, the human hand-arm-shoulder system has a particular morphology (or anatomy): the arms with the hands and fingers facing mostly inwards, i.e., towards the body. Assume now that there is random neural stimulation of the muscles in this system. Rather than performing a random movement, the arm will swing in a highly constrained fashion, the constraints being given by the anatomy and the material properties of the muscle-tendon system. For example, the palm with (the inside of) the finger tips will roughly face in the direction of the arm movement. Thus, if the hand hits an object (or the body for that matter), it will most likely hit it with the palm or the finger tips. Because the latter have an
extremely high density of sensors (in particular for touch), there will be rich haptic sensory stimulation. Assume also that a grasp reflex is triggered as the hand hits an object. On the one hand, this will generate rich sensory stimulation originating from the densely spaced sensors on the finger tips and, on the other hand, there is a high chance that the object will be brought into the visual field by this movement; perhaps the agent will even stick it into its mouth, thus creating an additional rich pattern of sensory stimulation. This way, temporarily stable patterns of sensory stimulation are induced in several sensory channels (the visual, the haptic, and the proprioceptive one). In other words, correlations are induced which can be exploited by the neural system for forming cross-modal associations, a process deemed essential in concept formation (e.g., Thelen and Smith, 1994). Over time, the information acquired through one sensory channel (e.g., the visual one) can become a partial predictor for the information obtained from a different one (e.g., the haptic or olfactory one). Because the sensory stimulation and the state of the neural networks controlling a robot can be fully recorded, the sensory patterns induced can be quantitatively analyzed, as demonstrated by Lungarella and Pfeifer (2001). This type of analysis provides the basis for exploring models of neural mechanisms of categorization from which an (artificial) developmental process can be bootstrapped that might eventually lead to behavior that we would call high-level cognition. Moreover, this type of analysis can then be used to formulate hypotheses about the neural processing and control in natural systems, because currently, relatively little is known about the true sensory stimulation and the internal processes (though great strides have been made by employing brain imaging techniques). This “closes the loop” from natural systems to robots and back to natural systems.
The story is much longer, but these comments should illustrate the basic ideas.
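The idea that one sensory channel can become a partial predictor for another can be quantified with mutual information between two recorded time series. The sketch below is an illustration under assumptions, not the specific analysis of Lungarella and Pfeifer (2001): the estimator is the plain plug-in estimate for discrete data, and the three binary sequences are invented stand-ins for discretized visual and haptic recordings.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (bits) between two
    equally long sequences of discrete readings."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


# Hypothetical discretized recordings (1 = channel active in that time step).
visual = [0, 1, 0, 1, 0, 1, 0, 1]
haptic_coupled = [0, 1, 0, 1, 0, 1, 0, 1]      # touch co-occurs with sight
haptic_unrelated = [0, 0, 1, 1, 0, 0, 1, 1]    # no relation to the visual channel

print(mutual_information(visual, haptic_coupled))    # 1.0 bit
print(mutual_information(visual, haptic_unrelated))  # 0.0 bits
```

When grasping brings the object into view, the two channels move together and the measure is high; when they vary independently it falls to zero. On a robot, where both channels can be logged in full, such a number can be tracked over the whole developmental history.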
5. Summary and conclusions
Let us briefly summarize the main points from the various case studies. Traditionally, the focus of research in artificial intelligence has been on the control architecture (in the form of computer programs or neural network models). By contrast, in the present approach we have looked at complete agents and their interaction with the real world. It has been shown that there is an intricate relation between morphology, materials and control and that all these aspects, not only control, are essential in understanding intelligent behavior. The
Didabot experiment demonstrated that by exploiting the constraints of the ecological niche (Styrofoam cubes of a particular size, closed arena), control can be enormously simplified. Experiments with the Eyebot showed how appropriate morphology can perform some of the preprocessing which, because it is performed by the physics, is fast and “for free.” The passive dynamic walker illustrated the exploitation of morphology to achieve natural locomotion with no (or, in general, little) control and high energy efficiency. Finally, the approach of developmental robotics integrates all of these ideas by incorporating processes of sensory-motor coordination where sensor morphology (physical nature and distribution of the sensors on the agent), morphology of the motor system (anatomy), and materials (properties of the muscle-tendon system) all work together to provide the basis for cognitive development. As mentioned earlier, one cannot say with certainty that these ideas would not have evolved had it not been for the robots, but certainly, using robots as cognitive tools has helped a great deal for the following reasons:
– Robots are physical systems interacting with the real world. Designing and building robots for particular tasks forces the designer’s attention on the fundamental issues and there is no “glossing over” hard problems. Although robots are different from natural systems, because of their nature as real-world devices, they are subject to the same physical conditions as natural systems, which makes them excellent candidates for the synthetic methodology.
– Sensory-motor and internal state can be measured, recorded into time series files and analyzed, thus providing an objective basis from which to bootstrap a process of ontogenetic development. Recording sensory stimulation and internal state is only possible to a very limited extent in natural systems.
– Different control schemes can be explored (network or otherwise), experiments can be performed with different sensory and motor systems, and with different materials, which may or may not exist in nature. Because the systems can be engineered, there is much more flexibility in experimentation than if one works with natural systems, as in the analytic framework. This is essential in generating testable hypotheses for biologists.
– Because experiments are performed on artifacts in the real world, these artifacts may be exploited for practical applications.
– Robots provide an excellent vehicle for communication in transdisciplinary projects, which is notoriously hard as researchers from different fields not only have different backgrounds, but they also use different languages.
As mentioned in the introduction, it is interesting to view this approach of using robots as cognitive tools in terms of scaffolding. By designing and building robots we structure our research environment in ways that enable new, more sophisticated types of interactions. This is one of the central goals of Cognitive Technology. I expect the potential of this approach to increase with progress in embodied cognitive science in general, and with robotics technology and material science in particular. Going back to our initial characterization of Cognitive Technology as a discipline concerned with how the human mind can be explored via the very technologies it produces, I have pointed to one particular way in which this might be achieved, namely by employing the synthetic methodology. Historically speaking, technology has always shaped the way human nature has been perceived. Recently, computer technology has suggested the very powerful and popular metaphor of the brain as a computer. In this paper, I have suggested that, by instantiating embodied systems, robotics technology provides an entirely novel and more appropriate perspective on the functioning of the human mind. More importantly, from the viewpoint of Cognitive Technology, it also opens the window for further investigations into the workings of the tool-empowered, embodied mind.
Note
* This paper contains significant portions of two papers entitled “Embodied Artificial Intelligence” (Pfeifer, 2001) and “Teaching powerful ideas with autonomous mobile robots” (Pfeifer, 1996a). I would like to thank Barbara Gorayska for suggesting that I write this chapter to further Cognitive Technology research. I would also like to thank the members of the Artificial Intelligence Laboratory for many discussions and for their patience in discussing the same issues with me over and over again. Last but not least, I would like to thank the Swiss National Science Foundation, which has generously supported this research with grants #11–65310.01 and #20–61372.00.
References
Brooks, R. A. (1991a). Intelligence without representation. Artificial Intelligence, 47, 139–160.
Brooks, R. A. (1991b). Intelligence without reason. Proceedings International Joint Conference on Artificial Intelligence-91, 569–595.
Clancey, W. J. (1997). Situated cognition. On human knowledge and computer representations. Cambridge, UK: Cambridge University Press.
Clark, A. (1997). Being there: Putting brain, body, and world together again. Cambridge, Mass.: MIT Press.
Edelman, G. E. (1987). Neural Darwinism. The theory of neuronal group selection. New York: Basic Books.
Elman, J. L., E. A. Bates, M. H. Johnson, A. Karmiloff-Smith, D. Parisi & K. Plunkett (1996). Rethinking innateness. A connectionist perspective on development. Cambridge, Mass.: MIT Press.
Ferrari, F., P. Q. J. Nielsen & G. Sandini (1995). Space variant imaging. Sensor Review 15, 17–20.
Franceschini, N., J. M. Pichon & C. Blanes (1992). From insect vision to robot vision. Philosophical Transactions of the Royal Society, London B, 337, 283–294.
Hara, F. & R. Pfeifer (2000). On the relation among morphology, material and control in morpho-functional machines. In J. A. Meyer, A. Berthoz, D. Floreano, H. Roitblat & S. W. Wilson (Eds.), From Animals to Animats 6. Proceedings of the sixth International Conference on Simulation of Adaptive Behavior 2000, pp. 33–40.
Kornbluh, R. D., R. Pelrine, J. Eckerle & J. Joseph (1998). Electrostrictive polymer artificial muscle actuators. In Proceedings of 1998 IEEE International Conference on Robotics and Automation, pp. 2147–2154. New York, N. Y.: IEEE.
Lichtensteiger, L. & P. Eggenberger (1999). Evolving the morphology of a compound eye on a robot. In Proceedings of the third European Workshop on Advanced Mobile Robots (Eurobot’99) (Cat. No.99EX355), pp. 127–134. Piscataway, N. J.: IEEE.
Lungarella, M. & R. Pfeifer (2001). Robots as cognitive tools: information theoretic analysis of sensory-motor data. Proceedings of the IEEE-RAS International Conference on Humanoid Robots, 2001, pp. 245–252.
Maris, M. & R. te Boekhorst (1996). Exploiting physical constraints: heap formation through behavioral error in a group of robots. In Proceedings of IROS ’96, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1655–1660.
McGeer, T. (1990a). Passive dynamic walking. International Journal of Robotics Research 9, 62–82.
McGeer, T. (1990b). Passive walking with knees. In Proceedings of the IEEE Conference on Robotics and Automation 2, pp. 1640–1645.
Metta, G., G. Sandini & J. Konczak (1998). A developmental approach to sensori-motor coordination in artificial systems. In Proceedings of IEEE Conference on Systems, Man and Cybernetics, pp. 11–14. San Diego, USA. Piscataway, N. J.: IEEE Service Center.
Pfeifer, R. (1996a). Teaching powerful ideas with autonomous mobile robots. Computer Science Education 7, 161–186.
Pfeifer, R. (1996b). Building “Fungus Eaters”: Design principles of autonomous agents. In P. Maes, M. Mataric, J. A. Meyer, J. Pollack & S. W. Wilson (Eds.), From Animals to Animats 4, Proceedings of the 4th International Conference on Simulation of Adaptive Behavior, pp. 3–12. Cambridge, Mass.: A Bradford Book, MIT Press.
Pfeifer, R. (1999). Dynamics, morphology, and materials in the emergence of cognition. In W. Burgard, T. Christaller & A. B. Cremers (Eds.), KI-99 Advances in Artificial Intelligence. Proceedings of the 23rd Annual German Conference on Artificial Intelligence, Bonn, Germany, 1999, Lecture Notes in Computer Science 1701, pp. 27–44. Berlin: Springer.
Pfeifer, R. (2000). On the role of morphology and materials in adaptive behavior. In J. A. Meyer, A. Berthoz, D. Floreano, H. Roitblat & S. W. Wilson (Eds.), From Animals to Animats 6. Proceedings of the sixth International Conference on Simulation of Adaptive Behavior 2000, pp. 23–32.
Pfeifer, R. (2001). Embodied Artificial Intelligence: 10 years back, 10 years forward. In R. Wilhelm (Ed.), Informatics: 10 years back, 10 years ahead. Lecture Notes in Computer Science, pp. 294–310. Berlin: Springer.
Pfeifer, R. & C. Scheier (1997). Sensory-motor coordination: the metaphor and beyond. Robotics and Autonomous Systems 20, 157–178.
Pfeifer, R. & C. Scheier (1998). Representation in natural and artificial agents: an embodied cognitive science perspective. Zeitschrift für Naturforschung 53c, 480–503.
Pfeifer, R. & C. Scheier (1999). Understanding intelligence. Cambridge, Mass.: MIT Press (2nd printing 2000, paperback edition).
Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart: Frommann-Holzboog.
Rizzolatti, G., L. Fogassi & V. Gallese (2000). Cortical mechanisms subserving object grasping and action recognition: A new view of the cortical motor functions. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences, pp. 539–552. Cambridge, Mass.: MIT Press.
Scheier, C., R. Pfeifer & Y. Kuniyoshi (1998). Embedded neural networks: exploiting constraints. Neural Networks 11, 1551–1569.
Shahinpoor, M., Y. Bar-Cohen, J. O. Simpson & J. Smith (2000). Ionic Polymer-Metal Composites (IPMC) as biomimetic sensors, actuators & artificial muscles: A review. http://www.unm.edu/~amri/paper.html
Thelen, E. & L. Smith (1994). A dynamic systems approach to the development of cognition and action. Cambridge, Mass.: MIT Press, Bradford Books.
Toepfer, C., M. Wende, G. Baratoff & H. Neumann (1998). Robot navigation by combining central and peripheral optical flow detection on a space-variant map.
In Proceedings of the Fourteenth International Conference on Pattern Recognition, pp. 1804–1807. Los Alamitos, CA: IEEE Computer Society. Vinkhuyzen, E. (1998). Expert systems in practice. Unpublished Ph.D. Dissertation, University of Zurich. Winograd, T. & F. Flores (1986). Understanding computers and cognition. Reading, Mass.: Addison-Wesley.
The origins of narrative
In search of the transactional format of narratives in humans and other social animals*
Kerstin Dautenhahn
University of Hertfordshire
1. Introduction: The social animals
Humans share fundamental cognitive and behavioral characteristics with other primates, in particular apes (orangutan, gorilla, chimpanzee, bonobo). Although it is widely accepted that humans and other apes have a common ancestor and that human behavior and cognition are grounded in evolutionarily ‘older’ characteristics, many people still insist that human intelligence and human culture are ‘unique’ and qualitatively different from those of most (if not all) non-human animals. Traditionally, human language has often served as an example of a ‘unique’ characteristic. However, thanks to Donald Griffin, the founder of the field of cognitive ethology, studying the evolutionary continuity of mental experiences is now recognized as a valid endeavor (Griffin, 1976). Humans are not discontinuous from the rest of nature.
The particular topic of this chapter is narrative. With a few exceptions (Read and Miller, 1995), most discussions on the ‘narrative mind’ have neglected the evolutionary origins of narrative. Research on narrative focuses almost exclusively on language in humans (see, e.g., Turner, 1996). Similarly, narrative is often conceived of as a (sophisticated) art form, rather than as serving a primarily communicative function. The work presented here¹ argues that human narrative capacities are not unique and that an evolutionary continuity exists that links human narratives to transactional narrative formats in social interactions among non-human animals. Also, from a developmental point of view, we argue that narrative capacities develop from the pre-verbal, narrative, transactional formats that children engage in with their parents and peers. Instead of focusing on differences between humans and other animals, we point out
similarities and shared evolutionary histories of primates, with specific regard to the origins and the transactional format of narratives.
The chapter sets off by reviewing the main arguments of a debate that is currently being conducted intensively in primatology and anthropology, namely, that the primary function of human language might have been its capacity to afford coping with increasingly complex social dynamics. Based on this framework of the social origin of human intelligence, we discuss the Narrative Intelligence Hypothesis (NIH), first suggested in (Dautenhahn, 1999), that points out the intertwined relationship between the evolution of narrative and the evolution of social complexity in primate societies. The underlying assumptions and arguments are discussed in greater detail. The NIH as referred to in this paper consists of the following line of arguments:
a. Individualized societies are a necessary (but possibly not sufficient) ‘substrate’ for the evolution of narratives. In such societies members know each other individually.
b. The specific narrative format of such transactions serves an important communicative function among primates and has possibly evolved independently in other groups of species that live in individualized societies. Narrative capacities co-evolved in order to cope with increasingly complex dynamics.
c. The evolution of communication in terms of narrative language (storytelling) was an important factor in human evolution that has shaped the evolution of human cognition, societies and human culture. The use of language in a narrative format provided an efficient means of ‘social grooming’ that maintains group coherence.
d. Pre-verbal transactions in narrative format bootstrap a child’s development of social competence and social understanding.
e. Human cultures which are fundamentally ‘narrative’ in nature provide an environment that young human primates are immersed in and facilitate not only the development of a child into a skilled story-teller and communicator, but also the development of an autobiographical self.
The NIH is speculative and part of ongoing research. The particular contribution of this chapter is that it discusses in more detail the transactional and canonical format of narrative that can be found in different verbal and nonverbal social interactions among primates, and in preverbal communication of human infants.² While this chapter discusses work in progress, it is hoped that future research in this area can lead to a theory of the (social) origins of
narrative. Essential for the development of such a theory is empirical evidence. The current paper only provides supporting material that helps in (a) synthesizing ideas from various research fields, and (b) formulating the NIH. In Section 5 we discuss experiments that are needed in order to test/falsify a theory on the social origins of narrative. The NIH implies a better understanding of the origins of narrative intelligence in humans and other animals. Such an understanding can point out issues relevant to the design of narrative technology. Therefore, Section 6 concludes this paper by discussing implications of the NIH for technology that meets the social and cognitive needs of human story-tellers.
2. The Social Brain Hypothesis
Primate societies belong to individualized societies with complex types of social interactions, social relationships and networks. In individualized societies group members individually recognize each other and interact with each other based on a history of interactions as part of a social network. Many mammal species (such as primates, elephants, and cetaceans) live in highly individualized societies, as do bird species such as corvids and parrots. Preserving social coherence and managing cooperation and competition with group members are important aspects of living in individualized societies. Dealing with such a complex social field often requires sophisticated means of interaction and communication which are important for the Narrative Intelligence Hypothesis discussed in this article.
2.1 Primate group sizes and the neocortex
Why do humans have, relatively speaking, large brains? No other organ of the human body consumes as much of the body’s energy (20%), even at rest, while making up only 2% of an adult’s body weight. How can human primates afford such an expensive organ? What were the particular selective pressures in human evolution that led to such costly brains, or to put it differently, what are brains good for? In the context of human (or generally primate) intelligence the Social Intelligence Hypothesis (SIH), sometimes also called Machiavellian Intelligence Hypothesis or Social Brain Hypothesis, suggests that the primate brain and primate intelligence evolved in adaptation to the need to operate in large groups
where the structure and cohesion of the group required a detailed understanding of group members (cf. Byrne and Whiten, 1988; Whiten and Byrne, 1997; Byrne, 1997). Given that maintaining a large brain is very costly, it is assumed that the necessity to evolve social skills (which allow interpreting, predicting and manipulating conspecifics) has been a prominent selective factor accelerating primate brain evolution. Identifying friends and allies, predicting behavior of others, knowing how to form alliances, manipulating group members, making war, love and peace, are important ingredients of primate politics (de Waal, 1982). Thus, there are two interesting aspects of primate sociality: it served as an evolutionary constraint which led to an increase of brain size in primates, which, in turn, led to an increased capacity to further develop social complexity. Research in primatology that studies and compares cognitive and behavioral complexity in and among primate species can shed light on the origins of primate cultures and societies. Particularly relevant for the theme of this chapter are the potential relationships between social complexity and brain evolution. A detailed analysis by Dunbar and his collaborators (Dunbar, 1992, 1993, 1998) suggests that the mean group size $N$ is a function of relative neocortical volume $C_R$ (the volume of the neocortex divided by the volume of the rest of the brain; see formula (1) and Figure 1):
$$\log_{10}(N) = 0.093 + 3.389 \, \log_{10}(C_R) \tag{1}$$
This correlation does not provide ‘hard’ evidence, which is fundamentally difficult to obtain for many aspects of the evolution of animal (and human) minds, but it supports the argument that social complexity might have played a crucial role in primate brain evolution. In order to manage larger groups, bigger brains might provide the required ‘processing capacity’. No such correlations have been found when comparing the increase of neocortex size with the complexity of the environment, such as the size of the home range of a species.³ The causality and complexity of the argument ‘complex social dynamics led to larger neocortices’ are still not completely understood, but in primatology it is currently widely acknowledged that social complexity provided an important, and possibly causal, factor in the evolution of primate (social) intelligence. How can primate societies cope with an increase in the number of group members and relationships among them? How are social networks and relations established and maintained? How are cohesion and stability preserved? What are the mechanisms that serve as ‘social glue’?
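As a brief aside, formula (1) can also be read as a power law by exponentiating both sides. The constants below are simply those of formula (1); the rounded values are approximate and are offered only as an illustrative reading, not as part of Dunbar's analysis.

```latex
N = 10^{0.093} \cdot C_R^{\,3.389} \approx 1.24 \, C_R^{\,3.389},
\qquad
\frac{N(2\,C_R)}{N(C_R)} = 2^{3.389} \approx 10.5
```

On this reading, doubling the neocortex ratio corresponds to a roughly tenfold increase in the predicted mean group size, which indicates how steeply the regression rises on a linear scale.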
[Figure 1: plot of mean group size (social complexity) against neocortex ratio]
Figure 1. Group size plotted against neocortex ratio (logarithmic scales). Correlations were found, e.g., in 36 primate genera (Dunbar, 1993). Similar relationships (not necessarily on the same grade as the primate regression) have been found in carnivores, some insectivores (Dunbar and Bever, 1998), cetaceans (Marino, 1996), and some bats (Barton and Dunbar, 1997). Thus, it seems that a common relationship between social complexity and encephalization (relationship between brain size and body size) can be found in animal species that live in stable social groups, although each species might live in very distinctive environments, with very distinctive ‘brains’ and very different evolutionary histories.
2.2 Preserving cohesion in primate societies: Grooming and language
Judging from our own experience as a member of human society, communicating via language seems to be the dominant mechanism for the purpose of preserving social cohesion. However, non-human primates in the wild do not seem to use a human-like language. Here, social cohesion is maintained through time by social grooming. Social grooming patterns generally reflect social relationships; they are used as a means to establish coalition bonds, for reconciliation and consolation and other important aspects of primate politics. Social grooming is a one-to-one behavior extended over time, that poses particular constraints on the amount of time an animal can spend on it, given other needs such as feeding, sleeping, etc. Also, cognitive constraints limit the complexity of social dynamics that primates can cope with, as discussed in the following paragraph.
Given the neocortical size of modern humans, Dunbar (1993) extrapolated from the non-human primate regression (relative neocortical volume vs. group size) and predicted a group size of 150 for human societies. This number limits the number of relationships that an individual human can remember and monitor. It is the upper group size limit which still allows social contacts that can be maintained and interaction-histories that can be remembered, thus supporting effective coordination of tasks and information-flow via direct person-to-person contacts. The number 150 is supported by analysis of contemporary and historical human societies. But how do humans preserve cohesion in groups of 150 individuals, a function that (physical) social grooming serves in non-human primate societies? In terms of survival needs (resting, feeding, etc.) primates can only afford to spend around 20% of their time on social interactions and social grooming, much less than a group size of 150 requires. It was therefore suggested by Dunbar (1993) that, in order to preserve stability and coherence in human societies, human language has evolved as an efficient mechanism of social bonding, replacing social grooming mechanisms in non-human primate societies where direct physical contact affords only much smaller groups. Following this argument, language allowed an increase in group size while still preserving stability and cohesion within the group. The next section will elaborate this argument further by analyzing what the particular features of communication via language are that make it an efficient ‘social glue’ in human societies.
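The extrapolation described above can be sketched numerically. The snippet below is an illustrative sketch, not Dunbar's actual procedure: it evaluates formula (1) for a human neocortex ratio of about 4.1, a figure commonly quoted from Dunbar's work but not stated in this chapter, so it should be treated as an assumption. The function name `predicted_group_size` is likewise hypothetical.

```python
import math

# Coefficients of formula (1): log10(N) = 0.093 + 3.389 * log10(CR)
INTERCEPT = 0.093
SLOPE = 3.389

# Assumed human neocortex ratio; this value is NOT given in the chapter.
HUMAN_NEOCORTEX_RATIO = 4.1


def predicted_group_size(neocortex_ratio: float) -> float:
    """Mean group size N predicted by formula (1) for a neocortex ratio CR."""
    if neocortex_ratio <= 0:
        raise ValueError("neocortex ratio must be positive")
    return 10 ** (INTERCEPT + SLOPE * math.log10(neocortex_ratio))


if __name__ == "__main__":
    n = predicted_group_size(HUMAN_NEOCORTEX_RATIO)
    print(f"Predicted human group size: {n:.0f}")
```

Under the assumed ratio, the result lands close to the group size of 150 quoted in the text. Because of the steep exponent, small changes in the assumed ratio shift the prediction noticeably, so the number is best read as an order-of-magnitude estimate.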
3. The Narrative Intelligence Hypothesis
According to the primatologist Richard Byrne (Byrne, 1997), in the context of the evolution of human intelligence, the Social Intelligence Hypothesis offers little explanation for the evolution of specific ape and human kinds of intelligence (e.g., involving mental representations): clear evidence for a systematic monkey-ape difference in the neocortex ratio is lacking. Great apes do not form systematically larger groups than monkeys do, which draws attention to physical rather than social factors (e.g., tool use, processing plant food, etc.). Why have human apes in particular evolved sophisticated representational and mental skills? Are there any candidate factors that could have accelerated the evolution of human intelligence? If the evolution of language played an important role, as suggested by others (e.g., Dunbar, 1993; Donald, 1993), what are the particular characteristics of language that matter?
3.1 What is special about language?
A closer look at the ontogeny of language and narrative, i.e., the role of language in the development of children, can give important hints about the special characteristics of language. Studies in developmental psychology of how children develop narrative skills show that narratives play a crucial role in how young human primates become socially skilled individuals (cf. Nelson, 1993; Engel, 1995). Narrative psychology considers stories the most efficient and natural human way to communicate, in particular, to communicate about others (Bruner, 1987, 1990, 1991). According to Read and Miller “stories are universally basic to conversation and meaning making”, and as developmental and cross-cultural studies suggest, “humans appear to have a readiness, from the beginning of life, to hear and understand stories” (Read and Miller, 1995, p. 143). The Narrative Intelligence Hypothesis (Dautenhahn, 1999) interprets such observations from the ontogeny of human language in the context of primate evolution. It proposes that the evolutionary origin of communicating in stories co-evolved with increasing social dynamics among our human ancestors, in particular the necessity to communicate about third-party relationships (which in humans seems to reach the highest degree of sophistication among all apes; cf. gossip and manipulation (Sinderman, 1982)). According to the NIH, human narrative intelligence might have evolved because the format of narrative is particularly suited to communicate about the social world. Thus, in human evolution, we can observe an evolutionary trend from physical contact (non-human primates) to vocal communication and language (hominids), to communicating in stories (highly ‘enculturated’ humans living in complex societies), correlated with an increase in complexity and sophistication of social interaction and communication.
This trend demonstrates the evolution of increasingly efficient mechanisms for time-sharing the processes of social bonding. While physical grooming is generally a dyadic activity, language can be used in a variety of ways, extending the dyadic use in dialogues to, e.g., one-to-many communication as it is used extensively in the mass media (television, books, email, etc.) today. It can be estimated (Dunbar, 1993) that the human bonding mechanism of language is about 2.8 times as efficient as social grooming (the non-human primate bonding mechanism). Indeed, Dunbar’s studies indicate that conversational groups usually consist of one speaker plus two or three listeners. Of course larger groups can be formed easily but, in terms of actively participating and following different arguments within
the group, 1+2(3) seems to be the upper limit for avoiding ‘information processing overload’ in the primate social brain. Also, because of its representational nature, language affords documentation, preservation in storage media and transmission of (social) knowledge to the next generation, as well as communication between geographically separated locations (Donald, 1993).
3.2 Narrative, the social context and meaning
Discussions in the social domain (e.g., on social relationships and feelings of group members) are fundamentally about personal meaning (Bruner, 1990). Narrative might be the ‘natural’ format for encoding and transmitting meaningful, socially relevant information (e.g., emotions and intentions of group members). Humans use language to learn about other people and third-party relationships, to manipulate people, to bond with people, to break up or reinforce relationships. Studies suggest that people spend about 60% of conversations on gossiping about relationships and personal experiences (Dunbar, 1993). Thus, a primary role of language might have been to communicate about social issues, to get to know other group members, to synchronize group behavior, to preserve group cohesion.
To summarize, the following strategies of coping with a complex social field in primate societies were outlined in the preceding sections:
stage 1: non-verbal, physical, social grooming as a means of preserving group cohesion, limited to one-to-one interaction
stage 2: communicating about social matters and relating to others in the narrative format of transactions with non-verbal ‘enacted’ stories (see Section 4)
stage 3: using language and verbal narratives in order to cope with social life
The Narrative Intelligence Hypothesis suggests that the evolution and development of human narrative capacities might have gone through these different stages, not replacing preceding stages, but adding additional strategies that extend an individual’s repertoire of social interaction.
These range from physical contact (e.g., in families and very close relationships) to preverbal ‘narrative’ communication in transactions with others (not to mention the subtleties of body language and nonverbal social cues, which are not necessarily conscious (cf. Hall, 1968; Farnell, 1999)), to developing into a skilled story-teller within the first years of life and refining these skills throughout one’s life. The next section gives a few examples of where we might find narratives in the behavior of
humans, other animals, and possibly even artifacts. To begin with, we need to have a closer look at the specific canonical format of narrative.
4. In search of narratives
4.1 What are narratives?
Many definitions and theories of narrative exist in the literature. In the following we select and discuss a few definitions. With respect to adult literature and conversation, what we usually mean by narrative is a story with the following structure: First, a certain introduction of the characters is given (making contact between individuals, actors, listener and speaker). Then, the story develops a plot, namely a sequence of actions over time that convey meaning (value, pleasurable, not pleasurable), usually with a high point and a resolution (reinforcement or break-up of relationships), and focuses on unusual events rather than stereotypical events. Note that such a structure is typical for ‘adult’ narratives. Children’s narratives can have rudiments of this structure but still count as narratives that describe personal experience of the story-teller.
Children are not born as perfectly competent story-tellers. The format of narrative story-telling is rapidly learnt by typically developing children during their first years of life. Here, the social environment is crucially important in developing and shaping children’s narrative skills (Engel, 1995; Nelson, 1989, 1993); narrative skills are socially constructed. The narrative styles and abilities of children develop during the daily interactions and conversations that they participate in and listen to. The environment, e.g., parental input, shapes and influences this development. Children’s narrative styles and abilities reflect particular styles, values and conventions used in families, cultures, etc. Story-telling development is a highly social and interactive process; ‘tutoring’ is usually informal and playful (Engel, 1995). The typical ‘adult’ story format of beginning, middle and end is usually mastered by 3-year olds.
The following two examples show the differences between a typical story of a 2-year old girl and a 5-year old boy, both telling the story to their mother:
“We went trick and treating. I got candy. A big red lolly-pop and I lost my hat.” (Engel, 1995, p. 16)
“Once there was a monster that lived where other monsters lived just like him. He was very nice. He made bad people good. He lived always happy. He loved to play with kids. One day he gets caught in a hurricane. The lights went off except there was flashlights there. He jumped into the ocean. He meeted all the fish. And he lived in water.” (Engel, 1995, p. 71)
Thus, children’s stories can occur in rudimentary forms (see first example), or can be elaborated (second example), from which the step towards fully-fledged adult story-telling seems relatively small. Note that even the rudimentary story told by a 2-year old is ‘successful’ in terms of its ‘meaning’ and its communicative function: a significant experience in the child’s life is recalled and reconstructed.
While many discussions of the format of narrative focus on narratives in oral format, in this chapter we refer to narrative in a wider sense, including written and spoken narratives. Note that structural aspects relating to the syntax of a story are not our primary concern. Story grammars (Mandler, 1984), i.e., notational, rule-based systems for describing regularities and formal structures in stories (e.g., in traditional folk-tale or problem-solving stories), are not the kind of narrative formats that are the focus of this article. Instead, we stress the transactional nature of narratives and the way that narratives convey meaning, can create intersubjectivity and are embedded in a social context. According to this perspective, we tentatively propose the following definition of narrative: A narrative consists of verbal (spoken or written) or non-verbally ‘enacted’ social transactions with the following necessary properties:
a. Narratives have an important communicative function. “We use stories to guide and shape the way we experience our daily lives, to communicate with other people, and to develop relationships with them. We tell stories to become part of the social world and to know and reaffirm who we are” (Engel, 1995, p. 25). Thus, the major topic of narratives is the social field, involving transactions of intentional, social agents acting in a social context. Narratives are means to create intersubjectivity between people who communicate with each other, or between ourselves and our former or future ‘self’, which leads us to the second important property of narrative:
b. Remembered experience, when put in the format of a narrative, allows us to think about the past (Engel, 1995, p. 26) and to ‘go back in time’. More generally, narrative extends the temporal horizon from the present (the ‘here and now’), to the past (‘how things used to be’) and to the future (‘how things might be’) (cf. Nehaniv et al., 1999). Narratives allow us to travel back and forth in time, to create imaginary or alternative realities, to re-interpret the past, and in this way are fundamentally different from communicative non-narrative events that are limited to the immediate present. One might speculate that, because narratives extend the ‘temporal horizon’, they are crucial to the development of a ‘self’ (Nelson, 1989, 1993), an autobiographic self. But what is the format of narrative that provides all this, namely, creating intersubjectivity and extending the temporal horizon? We suggest that:
c. The narrative follows a particular transactional format which in its simplest form, found in preverbal children, and possibly other non-verbal non-human animals, consists of the following sequence: canonical steady state, precipitating event, restoration, and a coda marking the end. This transactional format was suggested by Bruner and Feldman (1993), see Section 4.2. Other transactional formats of narrative might exist, but, for the purpose of this paper, we focus on this simple format suggested by Bruner and Feldman.
In the fields of narrative psychology and narrative intelligence Jerome Bruner’s theories and work have been very influential (Bruner, 1987, 1990, 1991). Particularly relevant to this chapter is Bruner’s notion that stories are primarily dealing with people and their intentions; they are about the social and cultural domain rather than the domain of the physical world. Narratives are often centered towards subjective and personal experience. According to Bruner (1991), narrative is a conventional form that is culturally transmitted and constrained. Narrative is not just a way of representing or communicating about reality, it is constituting and understanding (social) reality. Unlike scripts (Schank and Abelson, 1977) that describe regular events, narratives are about ‘unusual events’, ‘things worth telling’ (Bruner, 1991). Narratives describe people or other intentional and mental agents, acting in a setting in a way that is relevant to their beliefs, desires, theories, values, etc., and they describe how these agents relate to each other.
Although narrative capacities (understanding and producing stories) are capacities shaped by society, they clearly develop in an individual (cf. Nehaniv, 1997; Dautenhahn and Coles, 2001) with an important meaning for the individual agent. For example, stories that children tell to themselves play an important part in a child’s abilities to make meaning of events (cf. Nelson, 1989; Engel, 1995). Nevertheless, stories, at least for fundamentally social animals such as humans, are most effective in communication in a social context:
“We converse in order to understand the world, exchange information, persuade, cooperate, deal with problems, and plan for the future. Other human beings are a central focus on each of these domains: We wish to understand other people and their social interactions; we need to deal with problems involving others; and other people are at the heart of many of our plans for the future.” (Read and Miller, 1995, p. 147)
Human culture has developed various means of artistic expression (sequential visual arts, dance, pantomime, comics, literature, etc.) which are fundamentally ‘narrative’ in nature, conveying meaning about people and how people relate to the world. Children who are immersed in human culture, exposed to those narratives, develop as skilled story-tellers, as is shown in the following story called “Jig Jags Day”, written by a 9-year old girl when asked to write a story about a robot. This story and the one mentioned in Section 4.2 were part of a project with typically developing children, summarized in (Bumby and Dautenhahn, 1999). The story fits Bruner’s criteria very well: “Once there was a robot called Jig Jag and Jig Jag lived in the countryside. One day Jig Jag’s lights started to flash, that meant that the robot had an idea. “I think I will go for a walk”, so Jig Jag went into a field with some sheep in it and the silly robot tried to talk to the sheep, “Silly, silly, Jig Jag”. Next Jig Jag saw some cows in the next field, so silly Jig Jag tried to talk to the cows! After that Jig Jag went to the shops, he wanted to buy some bolts and oil. So Jig Jag went into the hardware shop, but poor Jig Jag set the alarm off. So Jig Jag went into another hardware store across the road. So the robot tried to get into the shop but again Jig Jag set the alarm off. So poor Jig Jag had to go home empty handed.”
4.2 Narratives and autism
Traditionally psychologists interested in the nature and development of narratives have a particular viewpoint of narratives in terms of human verbal storytelling. Interestingly, Bruner and Feldman (1993) proposed the narrative deficit hypothesis of autism, a theory of autism that is based on a failure of infants to participate in narrative construction through preverbal transactional formats. Children with autism generally have difficulty in communication and social interaction with other people. A variety of competing theories attempt to explain the specific communication and social deficits of people with autism (Jordan, 1999). Among them is the well known Theory of Mind (TOM) (cf. Leslie, 1987; Baron-Cohen, 1995). TOM models of mindreading have a clear modular, computational and metarepresentational nature. However, the TOM explanation of autistic deficits is controversial and other researchers suggest that primary deficits in emotional, interactive, or other factors central to the embodied and intersubjective nature of social understanding, might be causing
autism (e.g., Rogers and Pennington, 1991; Hobson, 1993). Deficits in narrative skills have been observed in children with autism (e.g., Loveland et al., 1990; Charman and Shmueli-Goetz, 1998). Bruner and Feldman’s theory suggests that autistic deficits in communication and social interaction can be explained in terms of a deficit in narrative communication skills. This theory, which differs from TOM, assumes that transactional capacities, and the lack thereof, are at the heart of autistic deficits. As we discuss later in this chapter, this work gives important hints about the transactional structure of narratives, a structure that we believe is of wider importance, not limited to the specific context of autism.
What exactly is a narrative transactional format? Bruner and Feldman distinguish different stages. They suggest that the first transactional process is about reciprocal attribution of intentionality and agency. The characteristic format of preverbal transactions is, according to Bruner and Feldman, a narrative one, consisting of four stages:
1. canonical steady state
2. precipitating event
3. a restoration
4. a coda marking the end.
An example is the peek-a-boo game where (1) mutual eye gaze is established between infant and caretaker, (2) the caretaker hides her face behind an object, (3) the object is removed revealing the face again, and (4) “Boo”, marking the end of the game. Let us consider the following story called “Weebo”, told by an 11-year-old girl:
“In America there was a professor called Peter Brainared and in 1978 he created a robot called Weebo. Weebo could do all sorts of things: she could create holograms, have a data bank of what the professor was going to do, show cartoon strips of what she was feeling like by having a television screen on top of her head which could open and close when she wanted to tell Peter how she felt. And she could record what she saw on television or what people said to her. Weebo looked like a flying saucer about as big as an eleven year old’s head also she could fly. Peter Brainared had a girlfriend called Sarah and they were going to get married but he didn’t turn up for the wedding because he was too busy with his experiments so she arranged for another one and another one but he still didn’t turn up, so she broke off the engagement and when he heard this he told Weebo how much he loved her and she recorded it, went round to Sarah’s house and showed her the clip on her television screen to show Sarah how much he loved her and it brought Sarah and Peter back together.”
Bruner and Feldman’s four stages of the transactional narrative format are clearly identifiable in this written narrative:
1. introduction of setting and actors
2. Peter misses the wedding and is sad
3. Weebo comes to the rescue: she shows Sarah how much Peter loves her
4. happy ending: Sarah and Peter are back together
Interestingly, although a central protagonist in the above story is a robot, it is depicted as an intentional agent (Dennett, 1987), embedded in a social context and behaving socially. Bruner and Feldman suggest that problems of people with autism in the social domain are due to an inability early in their lives to get engaged in ‘appropriate’ transactions with other people. These transactions normally enable a child to develop a narrative encoding of experiences that allows it to represent culturally canonical forms of human action and interaction. Normally, this leads a child, at 2–3 years of age, to rework experiences in terms of stories until she ultimately develops into a skilled story-teller (Engel, 1995). As research by Meltzoff, Gopnik, Moore and others suggests, transactional formats play a crucial role very early in a child’s life when she takes the first steps of becoming a ‘mindreader’ and socially skilled individual: reciprocal imitation games are a format of interaction that contributes to the mutual attribution of agency (Meltzoff and Gopnik, 1993; Meltzoff and Moore, 1999), immediate imitation creates intersubjective experience (Nadel et al., 1999). By mastering interpersonal timing and sharing of topics in such dyadic interactions, children’s transition from primary to pragmatic communication is supported. It seems that imitation games with caretakers play an important part in a child’s development of the concept of ‘person’ (Meltzoff and Gopnik, 1993; Meltzoff and Moore, 1999), and are a major milestone in the development of social cognition in humans. As we mentioned above, studies by Bruner and Feldman (1993) and others (e.g., Loveland et al., 1990) indicate that children with autism seem to have difficulty in organizing their experiences in a narrative format, as well as a difficulty in understanding the narrative format that people usually use to regulate their interactions. 
People with autism show a tendency to describe rather than to narrate, lacking the specific causal, temporal and intentional pragmatic markers needed for story-making. A preliminary study with high-functioning children with autism, reported by Bruner and Feldman (1993), indicates that, although they understood stories (gave appropriate answers when asked questions during the reading of the story), they showed great difficulty in retelling the story, i.e., composing a story, based on what they knew. The stories they told preserved many events and the correct sequence, but lacked the proper emphasis on important and meaningful events, events that motivated the plot and the actors. The stories lacked the narrative bent and did not conform to the canonical cultural expectations that people have in ordinary social interaction. Such a lack of meaning-making makes conversations in ordinary life extremely difficult, although, as Bruner and Feldman note, people with autism can show a strong desire to engage in conversations (Bruner and Feldman, 1993).
4.3 Narratives in animal behavior?
Stories have an extended temporal horizon: they relate to past and future, and they are created depending on the (social) context. Do animals use (non-verbal) narrative formats in transactions? Studies, e.g., with bonobos, Grey parrots and dolphins, on animal language capacities usually focus on teaching the animals a language (using gestures, icons or imitating human sounds), and test the animal’s language capacities primarily in interactions with humans (Savage-Rumbaugh et al., 1986; Pepperberg, 1999; Herman, 2002). In the wild, the extent to which animals use a communication system as complex as human language is still controversial. For example, dolphins and whales are good candidates for sophisticated communicators. However, we argue that looking for verbal and acoustic channels of communication might disguise the nonverbal, transactional nature of narratives, as shown in preverbal precursors of narratives in the developing child, and possibly evolutionary precursors of (non-verbal) narrative that can be found in non-human animals. Michael Arbib (2002) proposes an evolutionary origin of human language in non-verbal communication and body language that can be found in many social species (e.g., mammals, birds).
He suggests that imitation (and the primate mirror neuron system (Gallese et al., 1996)) provided the major mechanisms that facilitated the transition from body language and nonverbal imitation to verbal communication. Arbib’s work supports the arguments presented in this chapter, namely, a) the existence of a strong link between non-verbal, preverbal and verbal communication, and b) the important role of dynamic formats of interactions, such as imitative games, in the development of social communication. With this focus on interactional structure and non-verbal narratives, what can stories in non-human primate species look like, and how can we recognize
them? To date we are not aware of any ‘hard’ empirical evidence for storytelling capacities in non-human animals. However, it is known that primates are excellent ‘politicians’ in their societies, which involves extensive knowledge about direct (one-to-one) and third-party relationships. Primate behavior is not confined to fulfilling their immediate biological needs. Actions taken by an individual need to consider the social context, the primate social field. Primatologists know numerous examples of interactions that cannot be understood without assuming that the animals are aware of the social context. Note that any description of animal behavior can be biased by the narrative mind of the human observer, the story-teller. When watching a paramecium under a microscope, we can use our imagination to ‘make up’ a story about an intentional agent that is ‘hungry’, ‘chases prey’, ‘searches for a mate’, etc. However, in the case of single-cell organisms, it is safe to assume that their ‘social field’ is far less developed (if at all) than in primate or other social species. Because of this danger of using imagination and anthropomorphism to attribute a narrative structure to animal behavior, below we give examples of stories of animal behavior told by primatologists who have been working for many years with their subjects, and who are more likely than untrained observers to report on observable sequences of events and their own well-informed interpretations of the animal’s intentions and motivations. Let us consider Frans de Waal’s description of an event of reconciliation in chimpanzees.
“On this occasion Nikkie, the leader of the group, has slapped Hennie during a passing charge. Hennie, a young adult female of nine years, sits apart for a while feeling with her hand on the spot on her back where Nikkie hit her. Then she seems to forget the incident; she lies down in the grass, staring in the distance.
More than fifteen minutes later Hennie slowly gets up and walks straight to a group that includes Nikkie and the oldest female, Mama. Hennie approaches Nikkie, greeting him with soft pant grunts. Then she stretches out her arm to offer Nikkie the back of her hand for a kiss. Nikkie’s hand kiss consists of taking Hennie’s whole hand rather unceremoniously into his mouth. This contact is followed by a mouth-to-mouth kiss. Then Hennie walks over to Mama with a nervous grin. Mama places a hand on Hennie’s back and gently pats her until the grin disappears.” (de Waal, 1989, pp. 39, 42)
This example shows that the agent (Hennie) is interacting with an eye to future relationships, considering past and very recent experiences. Hennie, Nikkie and Mama have histories, autobiographic histories as individual agents (Dautenhahn, 1996), as well as a history of relationships among each other and as
members of a larger group. Although the event might be interpreted purely on the basis of behavioristic stimulus-response rules, for many primatologists the interpretation of the event in terms of intentional agents and social relationships is the most plausible explanation. Interestingly, Hennie’s interaction with Nikkie can be interpreted in terms of the canonical format of narrative transactions among intentional agents described in Section 4.2:
1. canonical state: greeting: soft pant grunts
2. precipitating event: Hennie reaches out to Nikkie (attempt at reconciling relationship)
3. restoration: kissing (relationship is restored)
4. end: Hennie is comforted by Mama
The second example we discuss is a different type of primate social interaction, namely, tactical deception whereby the animal shifts the target’s attention to part of its own body. In this particular case the animal (a female Olive baboon) distracts the target (a male Olive baboon) with intimate behavior.
“One of the female baboons at Gilgil grew particularly fond of meat, although the males do most of the hunting. A male, one who does not willingly share, caught an antelope. The female edged up to him and groomed him until he lolled back under her attentions. She then snatched the antelope carcass and ran.” Cited in (Whiten and Byrne, 1988, p. 217)
Here, the analysis in terms of transactional narrative formats looks as follows:
1. canonical state: male brings antelope, female waits
2. precipitating event: distraction by grooming
3. restoration: female snatches food and runs away (resolution, female achieves goal)
4. end: female eats meat (not described)
Episodes of animal behavior as described above are very different from other instances of structured and sequential animal behavior, such as the chase-trip-bite hunting behavior of cheetahs. Also, the alarm calls of vervet monkeys (Cheney and Seyfarth, 1990), although serving an important communicative function in a social group and having a component of social learning, are not likely to be narrative in nature. It is not the short length of such calls that makes it difficult to interpret them in terms of narrative, it is the fact that their primary function is to change the behavior of others as a response to a non-social stimulus, i.e., the sight of a predator, causing an appropriate behavior such as
running to the trees after hearing a leopard alarm. The narrative format in animal behavior, on the other hand, refers to communicative and transactional contexts where communication is about the social field, i.e., group members, their experiences and relationships among them. Narratives are constructed based on the current context and the social context (communicator/speaker plus recipients/audience). The primate protagonists described above apparently interacted with respect to the social context, i.e., considering the social network and relationships among group members, with the purpose of influencing and manipulating others’ mental states. Thus, such kinds of non-verbal narratives are fundamentally social in nature. Table 1 summarizes the role of narratives in human ontogeny and phylogeny as discussed above.
Table 1.
Human Ontogeny | Primate Phylogeny | Primary mechanism for social bonding
Transactions with a narrative format in infant-caretaker interactions (Dyadic interactions, direct relationships, preverbal children) | Non-verbal transactions in narrative format: narratives ‘enacted’ in primate social interactions (Direct and third party relationships, primates) | Grooming
Language in a narrative format: narratives spoken, written (Direct and third party relationships, verbal humans) | Language in a narrative format: narratives spoken, written (Direct and third party relationships, humans) | Language
A lot more work is necessary for a more detailed analysis of narrative formats in animal behavior. For example, the characteristics of the transactional format that Bruner and Feldman (1993) suggested need to be elaborated, possibly revised or replaced, and might need to be adapted to specific constraints of the primate social field. So our interpretation can only give a first hint of what aspects one might be looking for when searching for narrative formats in animal behavior.
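As a first step towards such an elaboration, the four-stage transactional format applied in the examples above can be written down explicitly, which may make analyses of different episodes easier to compare. The following Python sketch is purely illustrative: the class name `TransactionalNarrative`, its field names and the helper `describe` are hypothetical choices made here, not part of Bruner and Feldman’s (1993) proposal, and the stage descriptions are simply those given in Sections 4.2 and 4.3.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TransactionalNarrative:
    """One episode analysed in the four-stage format (hypothetical encoding)."""
    canonical_state: str
    precipitating_event: str
    restoration: str
    coda: str

    def describe(self) -> str:
        """Return the four stages as numbered lines, in their canonical order."""
        return "\n".join(
            f"{i}. {f.name.replace('_', ' ')}: {getattr(self, f.name)}"
            for i, f in enumerate(fields(self), start=1)
        )


# Stage descriptions taken from the analyses in the text.
peek_a_boo = TransactionalNarrative(
    canonical_state="mutual eye gaze is established between infant and caretaker",
    precipitating_event="the caretaker hides her face behind an object",
    restoration="the object is removed, revealing the face again",
    coda='"Boo", marking the end of the game',
)

hennie = TransactionalNarrative(
    canonical_state="greeting: soft pant grunts",
    precipitating_event="Hennie reaches out to Nikkie (attempt at reconciling relationship)",
    restoration="kissing (relationship is restored)",
    coda="Hennie is comforted by Mama",
)

baboon = TransactionalNarrative(
    canonical_state="male brings antelope, female waits",
    precipitating_event="distraction by grooming",
    restoration="female snatches food and runs away (resolution, female achieves goal)",
    coda="female eats meat (not described)",
)

if __name__ == "__main__":
    for episode in (peek_a_boo, hennie, baboon):
        print(episode.describe())
        print()
```

Such an encoding only records an observer’s interpretation of an episode; it does not decide whether the episode has a narrative format in the first place, which is the open empirical question.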
5. How could the narrative intelligence hypothesis be tested?
If human language and narrative intelligence, rooted in nonverbal narrative skills in non-human primates, have evolved to deal with an increasing need to communicate in more and more complex societies, what predictions can be made based on this hypothesis? How could the Narrative Intelligence Hypothesis be tested? What are important research directions based on the importance of narrative in animals and artifacts? Let us first consider how the NIH might be tested or falsified. As with other hypotheses on the origin of primate/human intelligence and language, animal behavior and communicative abilities are not directly documented in the fossil record. They can only be inferred indirectly from anatomical features (e.g., the vocal system that is necessary to produce a human-like language) and remains that indicate social structures (e.g., remains of nests or resting places, or groups of animals that died together). However, recent primate species that could serve as models of ancestors of the human species might give clues of what groups of primate species one might analyze if one wants to trace the origins of human narrative intelligence. Possible narrative structures confirmed in primate behavior might then be correlated with the complexity of the social field in these species. Today’s primates show a great variety of social organizations and group living. The Narrative Intelligence Hypothesis would predict that comparative studies of communicative and, in particular, narrative formats of interactions across primate species with different social organizations can identify a correlation between the complexity of the narrative format and an increasing complexity of the primate social field. Such an increase of social complexity need not be limited to group size.
It could also cover all other aspects of social complexity, such as an increasing number of different types of interactions and roles of group members, and the dynamics of how the social network can change and adapt to changes. Such stages of social organization can be related to behavioral as well as cognitive and mental capacities of primates. The NIH suggests a search for the narrative format in interactions, a format that is so efficiently suited to communicate and deal with the complexity of social life. What kind of research directions and research methods could the NIH inspire?
Testing with robotic and computational models
Robots have been increasingly used as models for understanding behavior, and sensori-motor control, in humans and other animals. Similarly, robots might
have their place in the study of the origins of narrative intelligence. In an initial study (Dautenhahn and Coles, 2001) we investigated precursors of narrative based on episodic memory in autonomous robots. Following a bottom-up, Artificial Life approach towards narrative (Dautenhahn and Nehaniv, 1998; Nehaniv and Dautenhahn, 1998), we studied a single robot that could remember sequences of events (‘pre-narratives’). A particular goal in this project was to study minimal experimental conditions of how story-telling might emerge from episodic memory. An initial experiment (Dautenhahn and Coles, 2001) showed that ‘story-telling’ could be beneficial even to a single agent (cf. Nehaniv, 1997), since it increased the behavioral variability of the robot. The benefit of communicating episodic memory has also been shown in multi-agent simulation studies (Ho et al., 2004). Such research with an experimental, computational and robotic test-bed demonstrates a bottom-up approach towards studying narrative and how it can arise and evolve from pre-narrative formats (e.g., episodic memory abilities and formats that are necessary, but not sufficient, for narratives, as discussed in previous sections) in agents and agent societies. Also, it can provide a means to design and study narrative robots with ‘meaningful’ narratives that are grounded in the robot’s own experiences of, and means of interacting with, the world and other agents (including robots), so as to contribute to the robot’s agenda to survive. The work described above indicates how artifacts might be used as scientific instruments to explore and experimentally test the design space of narrative intelligence. Narratives in this sense need to have a ‘meaning’ for an (intentional) agent.
The approach of using artifacts as experimental test beds has been used successfully for many years in the areas of Adaptive Behavior and Artificial Life, yielding many interesting results that (a) help understand animal behavior and (b) help design life-like artifacts, in this case artifacts with narrative skills.
Study and analysis of animal narrative capacities
Since the Narrative Intelligence Hypothesis does not assume any fundamentally ‘novel’ development in the transition from nonverbal (through evolution) or preverbal (development) to verbal narrative intelligence, a detailed study and analysis of the structure and format of animal narrative communication is required in order to develop a proper theory. Many animal species are highly social and use non-verbal means of body language in interaction and communication. Narrative intelligence has a communicative function (as a means of discourse and dialogue). However, it also has an individual dimension (understanding and thinking in terms of narrative, recreating a ‘self’). Revealing
narrative structure in animal communication might, therefore, further our understanding of meaningful events in the lives of these animals.
Interesting open research questions (this is not an exhaustive list)
– Relationship between preverbal and verbal narrative intelligence in humans (ontogeny)
– Relationship between nonverbal narrative intelligence in non-human animals and narrative intelligence in humans (phylogeny)
– The format of nonverbal narrative intelligence in animals (Species specific? Specific to social organization of animal societies?)
– Can we identify narrative formats of interaction in different animal species?
The work presented in this chapter is a small first step towards developing a theory of narrative that shows the evolutionary and developmental continuum of narrative capacities in humans and other animals. However, if, as we argued above, narrative and the narrative formats of transaction are deeply rooted in our ontogeny and phylogeny, then these provide important constraints and requirements for the design of artifacts that can meet the cognitive and social needs of ‘Homo narratus’.
6. ‘Homo narratus’: Implications for Human Society and Technology
There are many implications of the Social Brain Hypothesis and the Narrative Intelligence Hypothesis for technology development. Human cognitive and narrative capacities are constrained by evolution and development. Even technological extensions and enhancements (new media, new means of communication, new interfaces and implants) need to operate within the boundaries set out by biology. Firstly, ‘imagined relationships’ might stand in for human beings, in particular when the ‘real’ social network is smaller than 150. With the help of books, television or email we can easily know (by name or sight) more than 150 people, e.g., have more than 150 phone numbers stored in our mobile’s database. However, these are not the types of individually known kin, friends, allies, or even enemies who are mutually known over an extended period of time so that the term ‘relationship’ applies. In particular, mass media such as television can give us the illusion that we ‘know’ news presenters, talk show hosts, movie stars, comic or video game characters, etc. The roles of friends and
social partners might be filled by such imagined ‘partners’, and might serve a role similar to real human networks (Dunbar, 1996). However, any such ‘relationships’ are uni-directional; feelings such as love and admiration can only be expressed from a distance and will (realistically) not be returned. Recently emerging interactive agent technology adds another dimension to such ‘imagined friends’: virtual or robotic agents that give the illusion of life, namely, show appearance and behavior of real humans, such as embodied conversational agents (Cassell et al., 2000). However, no matter how many virtual and robotic friends will become members of our social network, these extensions are not without limits. There are biological limits, constrained by the cognitive group size limit of 150 that characterizes the size of social networks of human primates. As Dunbar argues (1996), modern information technology might change a number of characteristics of how and with whom and with what speed we communicate, but will not influence the size of social networks, nor the necessity of direct personal contact that is needed to provide trust and credibility to social relationships. “Yet underlying it all are minds that are not infinitely flexible, whose cognitive predispositions are designed to handle the kinds of small-scale societies that have characterized all but the last minutes of our evolutionary history.” (Dunbar, 1996, p. 207). We cannot escape our biology, as Dunbar (1992, p. 469) put it: “species will only be able to invade habitats that require larger troops than their current limit if they evolve larger neocortices.” Consequently, for us to exceed the magic number 150, our environmental ‘niche’ would have to change so that larger group sizes have a selective advantage and biological evolution (if it still applies to the human species today) can select for larger neocortices. 
Expanding this argument to a hypothetical super-human species that might evolve, we might speculate that this ‘Homo narratus’ would have enhanced narrative intelligence that enables the species to deal with an increasing group size. It is impossible to predict what the stories of the future might look and sound like: Will they be beautifully complex and experience-rich? Will language itself have changed, adapting to an enhanced need to deal with a complex social field? Generally, we can expect that human skills of forming and maintaining social networks might be strengthened by supporting the development of narrative skills in children and adults. As we have shown in this chapter, narratives are not only entertaining and fun; they serve an important cognitive function in the development of social cognition and a sense of self (Dennett, 1989). Humane technology needs to respect human narrative grounding (Nehaniv, 1999).
The narratives of the future might reflect our ability to preserve coherence and structure in human societies that consist of increasingly fragmented, temporally and geographically distributed, social networks. In shaping this development it is important to investigate the evolutionary heritage of our narrative capacities and the natural boundaries it provides. Also, appreciating the stories other non-human animals tell will allow us to put our familiar stories-as-we-know-them into the broader perspective of stories-as-they-could-be.
Notes
* I would like to thank Barbara Gorayska and three anonymous reviewers for very helpful comments on a previous version of this paper. Chrystopher Nehaniv helped with many discussions on narrative over the past few years. Penny Stribling and Tim Luckett gave me very useful pointers to literature on autism and narrative.
1. This article is a modified version of K. Dautenhahn (2001). See also related work in (Dautenhahn, 1999) and (Dautenhahn, 2003).
2. The relationships between narrative, on the one hand, and culture and autobiography, on the other hand, are only touched upon in this chapter but are discussed in more detail elsewhere (Dautenhahn, 1999; Dautenhahn, 2003).
3. Note that group size as such is not the only indicator of social complexity: other researchers have found, e.g., that primate species with relatively larger neocortices exhibit more complex social strategies than species with smaller neocortices (Pawlowski et al., 1998).
References
Arbib, M. (2002). The mirror system, imitation, and the evolution of language. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in Animals and Artifacts. Cambridge, MA: MIT Press.
Baron-Cohen, S. (1995). Mindblindness. Cambridge, MA, London, England: A Bradford Book, The MIT Press.
Barton, R. A. & R. I. M. Dunbar (1997). Evolution of the social brain. In A. Whiten & R. W. Byrne (Eds.), Machiavellian Intelligence II: Extensions and Evaluations, pp. 240–263. Cambridge: Cambridge University Press.
Bruner, J. (1987). Actual Minds, Possible Worlds. Cambridge, MA: Harvard University Press.
Bruner, J. (1990). Acts of Meaning. Cambridge, MA: Harvard University Press.
Bruner, J. (1991). The Narrative Construction of Reality. Critical Inquiry 18(1), 1–21.
Bruner, J. & C. Feldman (1993). Theories of mind and the problem of autism. In S. Baron-Cohen, H. Tager-Flusberg & D. J. Cohen (Eds.), Understanding other Minds: Perspectives from Autism. Oxford: Oxford University Press.
Bumby, K. & K. Dautenhahn (1999). Investigating Children’s Attitudes Towards Robots: A Case Study. In K. Cox, B. Gorayska & J. Marsh (Eds.), Proceedings of the Third International Conference on Cognitive Technology: Networked Minds (CT’99), pp. 359–374. (Available at www.cogtech.org)
Byrne, R. W. (1997). Machiavellian intelligence. Evolutionary Anthropology 5, 172–180.
Byrne, R. W. & A. Whiten (Eds.) (1988). Machiavellian Intelligence. Oxford: Clarendon Press.
Cassell, J., J. Sullivan, S. Prevost & E. Churchill (Eds.) (2000). Embodied Conversational Agents. Cambridge, MA: MIT Press.
Charman, T. & Y. Shmueli-Goetz (1998). The relationship between theory of mind, language, and narrative discourse: an experimental study. Current Psychology and Cognition 17(2), 245–271.
Cheney, D. L. & R. M. Seyfarth (1990). How Monkeys See the World. Chicago: University of Chicago Press.
Dautenhahn, K. (1996). Embodiment in animals and artifacts. In Proceedings of the AAAI Symposium on Embodied Cognition and Action, pp. 27–32. Menlo Park, California: AAAI Press.
Dautenhahn, K. (1999). The lemur’s tale — Story-telling in primates and other socially intelligent agents. In M. Mateas & P. Sengers (Eds.), Proceedings of the AAAI Symposium on Narrative Intelligence, pp. 59–66. Menlo Park, California: AAAI Press.
Dautenhahn, K. (2001). The Narrative Intelligence Hypothesis: In Search of the Transactional Format of Narratives in Humans and Other Animals. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Proceedings of the Fourth International Cognitive Technology Conference, CT2001: Instruments of Mind, pp. 248–266. Berlin: Springer Verlag.
Dautenhahn, K. (2003). Stories of Lemurs and Robots — The Social Origin of Story-Telling. To appear in M. Mateas & P. Sengers (Eds.), Narrative Intelligence, pp. 63–90. Amsterdam & Philadelphia: John Benjamins.
Dautenhahn, K. & S. Coles (2001). Narrative Intelligence from the bottom up: A computational framework for the study of story-telling in autonomous agents. Journal of Artificial Societies and Social Simulation (JASSS) 4(1), January 2001.
Dautenhahn, K. & C. L. Nehaniv (1998). Artificial life and natural stories. In Proceedings of the Third International Symposium on Artificial Life and Robotics, Volume 2, pp. 435–439.
Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press.
Dennett, D. C. (1989/91). The origins of selves. Cogito 3, 163–73, Autumn 1989. Reprinted in D. Kolak and R. Martin (Eds.) (1991), Self & Identity: Contemporary Philosophical Issues. New York: Macmillan.
de Waal, F. (1982). Chimpanzee Politics: Power and sex among apes. London: Jonathan Cape.
de Waal, F. (1989). Peacemaking among Primates. Cambridge, MA: Harvard University Press.
Donald, M. (1993). Precis of Origins of the modern mind: Three stages in the evolution of culture and cognition. Behavioral and Brain Sciences 16, 737–791.
Dunbar, R. I. M. (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution 20, 469–493.
Dunbar, R. I. M. (1993). Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences 16, 681–735.
The origins of narrative
Dunbar, R. I. M. (1996). Grooming, Gossip and the Evolution of Language. London, Boston: Faber and Faber Limited. Dunbar, R. I. M. (1998). The social brain hypothesis. Evolutionary Anthropology, 6, 178–190. Dunbar, R. I. M. & J. Bever (1998). Neocortex size predicts group size in carnivores and some insectivores. Ethology 104, 695–708. Engel, S. (1995/99). The Stories Children Tell: Making Sense of the Narratives of Childhood. New York: W. H. Freeman and Company. Farnell, B. (1999). Moving Bodies, Acting Selves. Annual Review of Anthropology 28, 341–373. Gallese, V., L. Fadiga, L. Fogassi & G. Rizzolatti (1996). Action recognition in the premotor cortex. Brain 119, 593–609. Griffin, D. R. (1976). The question of animal awareness: Evolutionary continuity of mental experience. New York: The Rockefeller University Press. Hall, E. T. (1968). Proxemics. Current Anthropology 9(2–3), 83–95. Herman, L. M. (2002). Vocal, social, and self imitation by bottlenosed dolphins. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in Animals and Artifacts. Cambridge, MA: MIT Press. Ho, W. C., Dautenhahn, K., Nehanv C. L. & R te Boekhors (2004). Sharing memories: An experimental investigation with multiple autonomous autobiographical agents. In F. Groen, N. Amato, A. Bonarini, E. Yoshida & B. Kröse (Eds.), Intelligent Autonomous Systems 8 (IAS8). IOS Press, pp. 361–370. Hobson, P. (1993). Understanding persons: the role of affect. In S. Baron-Cohen, H. TagerFlusberg & D. J. Cohen (Eds.), Understanding other minds, Perspectives from autism, pp. 204–227. Oxford: Oxford University Press. Jordan, R. (1999). Autistic Spectrum Disorders: An introductory handbook for practitioners. London: David Fulton Publishers. Leslie, A. M. (1987). Pretence and representation: The origins of “Theory of Mind”’. Psychological Review 94 (4), 412–426. Loveland, K. A., R. E. McEvoy & B. Tunali (1990). Narrative story telling in autism and Down’s syndrome. British Journal of Developmental Psychology 8, 9–23. 
Mandler, J. M. (1984). Stories, scripts, and scenes: Aspects of schema theory. Hillsdale, New Jersey: Lawrence Erlbaum Associates. Marino, L. (1996). What can dolphins tell us about primate evolution? Evolutionary Anthropology 5(3), 81–86. Meltzoff, A. N. & A. Gopnik (1993). The role of imitation in understanding persons and developing a theory of mind. In S. Baron-Cohen, H. Tager-Flusberg & D. J. Cohen (Eds.), Understanding other minds, Perspectives from autism, pp. 335–366. Oxford: Oxford University Press. Meltzoff, A. N. & M. K. Moore (1999). Persons and representation: why infant imitation is important for theories of human development. In J. Nadel & G. Butterworth (Eds.), Imitation in Infancy, pp. 9–35. Cambridge: Cambridge University Press. Nadel, J., C. Guerini, A. Peze & C. Rivet (1999). The evolving nature of imitation as a format of communication. In J. Nadel & G. Butterworth (Eds.), Imitation in Infancy, pp. 209–234. Cambridge: Cambridge University Press.
151
152
Kerstin Dautenhahn
Nehaniv, C. L. (1997). What’s Your Story? — Irreversibility, Algebra, Autobiographic Agents. In K. Dautenhahn (Ed.), Proceedings of the AAAI Symposium on Socially Intelligent Agents, pp. 150–153. Menlo Park, California: AAAI Press. Nehaniv, C. L. (1999). Story-Telling and Emotion: Cognitive Technology Considerations in Networking Temporally and Affectively Grounded Minds. In K. Cox, B. Gorayska & J. Marsh (Eds.), Proceedings of the. Third International Conference on Cognitive Technology: Networked Minds (CT’99), pp. 313–322. (Available at www.cogtech.org) Nehaniv, C. L. & K. Dautenhahn (1998). Embodiment and Memories — Algebras of Time and History for Autobiographic Agents. In R. Trappl (Ed.), Proceedings of the 14th European Meeting on Cybernetics and Systems Research, pp. 651–656. Vienna: Austrian Society for Cybernetic Studies. Nehaniv, C. L., K. Dautenhahn & M. J. Loomes (1999). Constructive Biology and Approaches to Temporal Grounding in Post-Reactive Robotics. In G. T. McKee & P. Schenker (Eds.), Sensor Fusion and Decentralized Control in Robotics Systems II, Proceedings of The International Society for Optical Engineering (SPIE), Volume 3839, pp. 156–167. Nelson, K. (Ed.) (1989). Narratives from the crib. Cambridge, MA: Harvard University Press. Nelson, K. (1993). The psychological and social origins of autobiographical memory. Psychological Science 4(1), 7–14. Pawlowski, B., C. B. Lowen & R. I. M. Dunbar (1998). Neocortex size, social skills and mating success in primates. Behaviour 135, 357–368. Pepperberg, I. M. (1999). The Alex Studies. Cognitive and Communicative Abilities of Grey Parrots. Cambridge, MA: Harvard University Press. Read, S. J. & L. C. Miller (1995). Stories are fundamental to meaning and memory: For social creatures, could it be otherwise? In R. S. Wyer (Ed.), Knowledge and Memory: the Real Story, pp. 139–152. Hillsdale, N J: Lawrence Erlbaum Associates. Rogers, S. J. & B. F. Pennington (1991). 
A theoretical approach to the deficits in infantile autism. Development and Psychopathology 3, 137–162. Savage-Rumbaugh, E. S., K. McDonald, R. A. Sevcik, W. D. Hopkins & E. Rubert (1986). Spontaneous symbol acquisition and communicative use by pygmy chimpanzees (Pan paniscus). Journal of Experimental Psychology: General 115, 211–235. Schank, R. C. & R. P. Abelson (1977). Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures. Hillsdale, NJ: Erlbaum. Sindermann, C. J. (1982). Winning the Games Scientists Play. New York & London: Plenum Press. Turner, M. (1996). The Literary Mind. Oxford: Oxford University Press. Whiten, A. & R. W. Byrne (1988). The manipulation of attention in primate tactical deception. In R. W. Byrne & A. Whiten (Eds.), Machiavellian Intelligence pp. 211–237. Oxford: Clarendon Press. Whiten, A. & R. W. Byrne (Eds.) (1997). Machiavellian Intelligence II: Extensions and Evaluations. Cambridge: Cambridge University Press.
The semantic web
Knowledge representation and affordance*

Sanjay Chandrasekharan
Carleton University, Ottawa
Introduction

The World Wide Web is a complex socio-technical system, and can be understood in many ways. One dominant view looks at the Web as a knowledge repository, albeit a very disorganized one, and the challenge is to get the maximum knowledge out of it in the minimum time possible. Most of the Semantic Web effort (which develops standards for metadata), and the work on search engines, assume this view of the Web.

However, there is another way of understanding the web, which is to view it as an action-enabling space, where you can buy, sell, bid, book, gamble, play games, debate, chat, etc. There is not much of an effort to understand and classify the web from this point of view. For instance, there are no search engines that allow you to search exclusively for possible actions, like sending_flowers, buying_tickets, booking_rooms, etc., though all these are activities possible over the Web. And there are no action metatags.

The primary reason for this absence is the overarching nature of the first view (the web-as-information one), which subsumes the action-space view. This results in information about actions being treated as just another kind of information. So, if you need to know about buying tickets and booking rooms, you search Google (or Froogle). And if you need to execute the action of buying tickets or booking a room, you search Google again, probably using the same keywords.

In this chapter, I make a distinction between these two ways of approaching the Web, and argue that the design of the Semantic Web should focus more on the possible actions that humans and artificial agents can execute on the Web. This means we should develop ways to distinguish between search for actions and
search for knowledge. In particular, I argue that the current design of top-down, exhaustive ontologies does not consider the representation of possible actions on the Web.

The following are the two major theoretical assumptions of this chapter:

– The web is a world-mediating system (Clark, 2001). According to Clark, “it mediates between users and a part of the world, often by manipulating machine representations of the world. State changes in the software system may cause state changes or side effects in the real world.” In his article in xml.com, Clark explains this notion using the following example: “Consider a web-based banking application. Performing banking tasks by using a web application is functionally equivalent to performing them at the bank’s physical location. There are obvious phenomenological differences to the user in each case, but there aren’t any differences to the user’s bank account. A $100 withdrawal from a teller is equivalent, in all respects relevant to the bank account itself, to a $100 web application withdrawal. A web-based funds transfer just is a funds transfer, as a matter, among other things, of convention and institutional fact.”
– From this view of the web as a world-mediating or action-mediating space, it follows that the development of the Semantic Web (developing tags that allow documents and other entities to describe themselves to programs) involves building action-infrastructure for agents, both human and artificial ones. That is, the design of the Semantic Web is akin to designing environments that support human actions in the world: environments like cockpits, kitchens and studios. The Semantic Web effort is thus about designing action-enabling information structures in the web world, to fit the actions web agents want to perform. The difference from cockpits and kitchens is that the actions performed on the web are based on linguistic acts, and therefore the environment designed to fit those actions is also a linguistic one.
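The distinction drawn in the introduction between searching for knowledge and searching for actions can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `Page` structure, the `actions` field standing in for action metatags, and the example URLs are all hypothetical; only the action names (sending_flowers, buying_tickets) come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    text: str
    # Hypothetical "action metatags": the actions a page lets an agent execute.
    actions: set = field(default_factory=set)


# Invented sample pages; the URLs are placeholders.
PAGES = [
    Page("https://example.org/flower-care",
         "How to keep cut flowers fresh; sending flowers as a gift"),
    Page("https://example.org/florist",
         "Order a bouquet; sending flowers worldwide",
         {"sending_flowers"}),
    Page("https://example.org/rail",
         "Timetables and fares; buying tickets online",
         {"buying_tickets"}),
]


def search_knowledge(pages, keyword):
    """Web-as-information view: every page that mentions the keyword."""
    return [p.url for p in pages if keyword.lower() in p.text.lower()]


def search_actions(pages, action):
    """Action-space view: only pages that declare the action as executable."""
    return [p.url for p in pages if action in p.actions]
```

In this toy setting a keyword query for "sending flowers" returns both the page about flowers and the page where flowers can be sent, whereas the action query returns only the latter.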
The view of the web as world-mediating and action-enabling turns the structure provided by the Semantic Web into affordances for action¹ (Norman, 1993 and 1998; Reed, 1996; Gibson, 1979), or action-oriented representations (Clark, 1997). However, the commonly accepted view is that Semantic Web structures are knowledge representation structures, designed to facilitate knowledge recovery and inference. So should the Semantic Web be creating affordances or knowledge representation? Or both? Is there a distinction between the two? What advantage, if any, do affordances offer? How do
affordances fit in with current approaches to agent design? These are the questions I will be tackling in this chapter. The chapter is structured as follows. In Section 1, I will describe briefly a framework for understanding how humans and other organisms generate structures in the world to help them (or others) perform actions better. In Section 2, I introduce a classification of agent-environment relationships, to understand how adding structure to the world fits in with current agent design methodologies. In Section 3, I consider the Semantic Web as an instance of changing the world for better cognition. Section 4 applies the insights gained from the previous sections to the design of ontologies. Section 5 discusses the design of category-based ontologies and affordance-based ontologies from a cognitive load-balancing point of view. Section 6 considers the question we started with — whether the Semantic Web should provide affordances or knowledge representation — and provides the conclusion.
1. Distributed cognition and the web
When organisms generate structures in the world for action, the result is a world tailored to the agents’ capabilities, in such a way that the world contributes to cognition and action at run-time. The generation of such “congenial” structures for action, in physical and representational environments, is explored by the Distributed Cognition (DC) framework (Hutchins, 1995; Hollan et al., 2000; Kirsh, 2001). Within Distributed Cognition, Kirsh (1996) and, to some extent, Hutchins (1995) have explored such world-changing in detail.

Kirsh’s analysis considers how animals change their environment to make their tasks easier. He identifies two kinds of structure animals create in the environment, physical and informational. An example of physical structure created in the environment for improved action is tools used by animals, for instance New Caledonian crows using twigs to probe out insects from the ground. The crows even redesign their tools, by making probes out of twigs bitten from living trees, and even wires in laboratory conditions, as illustrated by Weir and colleagues recently (Weir et al., 2002).

An example of informational structure created for action is people reorganizing their cards in a game of gin rummy. In this case, the player is using the cards to encode her plans externally. The card grouping “tells” the player what she needs to do; she does not have to remember it. In Kirsh’s terms, the player
makes a “call” to the world when she uses the grouping of the cards, thus making the world part of cognition. The gin rummy algorithm is distributed across the player and the card set. The action of sorting the card set reorganizes the environment for “mental rather than physical savings”. Kirsh and Maglio (1994) term this kind of action “epistemic action”, as distinct from “pragmatic action”. Epistemic action changes the world to provide agents with knowledge; pragmatic action changes the world for the actual physical execution of the task. According to Kirsh and Maglio, the first kind of structures created in the environment, informational structures, further “cognitive congeniality”, as against physical congeniality. We will term such structures, which improve cognitive congeniality for agents, epistemic structure.

Many animals create epistemic structures in the world to reduce their own and others’ cognitive complexity. Wood mice (Apodemus sylvaticus) distribute small objects, such as leaves or twigs, as points of reference while foraging. They do this even under laboratory conditions, using plastic discs. Such “waymarking” diminishes the likelihood of losing interesting locations during foraging (Stopka and MacDonald, 2003). Red foxes (Vulpes vulpes) use urine to mark food caches they have emptied. This marking acts as a memory aid and helps them avoid unnecessary search (Henry, 1977, reported in Stopka and MacDonald, 2003). Ants drop pheromones to trace a path to a food source. Many mammals mark up their territories. Plants develop colors and smells to attract pollinators, sometimes even to fight predators (Heiling et al., 2003; Beck, 2001). The bower bird creates colorful nests to attract mates (Zahavi and Zahavi, 1997). Many birds advertise their desirability as mates using some form of external structure, like colorful tails, bibs etc. (Bradbury and Vehrencamp, 1998).
Other animals have signals that convey important information about themselves to possible mates and even predators (Zahavi and Zahavi, 1997). At the most basic level, cells in the immune system use antibodies that bind to attacking microbes, thereby “marking” them. Macrophages use this “marking” to identify and destroy invading microbes. Bacterial colonies use a strategy called “quorum sensing” to know that they have reached critical mass (to attack, to emit light, etc.). This strategy involves individual bacteria secreting molecules known as auto-inducers into the environment. The auto-inducers accumulate in the environment, and when their concentration reaches a threshold, the colony moves into action (Silberman, 2003).

These kinds of structures (usually termed signaling) form a very important aspect of animal life across biological niches. We will use one case of signaling to analyze the advantages provided by changing the informational structure of the environment. Consider the peacock’s tail, the paradigmatic instance of an
animal signal. The tail’s function is to allow female peacocks (peahens) to make a mating judgment, by selecting the most healthy male (Zahavi and Zahavi, 1997). The tail reliably describes the inner state of the peacock: that it is healthy (and therefore has good genes). The signal is reliable because it pays only a peacock with enough resources to produce a flamboyant tail. If you are a sickly male, you cannot spend resources to produce ornaments. Thus the health of the peacock is directly encoded in the tail; the peacock carries its internal attributes on its tail, so to speak.

To see the cognitive efficiency of this mechanism, imagine the peahen having to make a mating decision without the existence of such a direct and reliable signal. The peahen will need to have a knowledge base of how the internal state of health can be inferred from behavioral and other cues. Let’s say “good dancing”, “lengthy chase of prey”, “long flights” (peacocks fly short distances), “tough beak” and “good claws” are cues for the health of a peacock. To arrive at a decision using these cues, first the peahen will need to “know” these cues, and that some combinations of them imply that the male is healthy. Armed with this knowledge, the female has to sample males for an extended period of time, and go through a lengthy sorting process based on the cues (rank each male on each of these cues: good, bad, okay). Then it has to compare the different results, keeping all of them in memory, to arrive at an optimal mating decision. This is a computationally intensive process.

The tail allows the female peacock to shortcut all this computation, and go directly to the most healthy male in a lot. The tail provides the peahen with a single, chunked cue, which it can compare with other similar ones to arrive at a decision. The tail provides a standardized way of arriving at a decision, with the least amount of computation. The peacock describes itself using its tail.
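The difference in computational load between the two decision procedures described above can be sketched roughly as follows. This is a toy model, not a claim about how peahens actually compute: the numeric scores, the `tail` value, and the three sample males are assumptions made for illustration; the cue names and the good/okay/bad ranking follow the text.

```python
from dataclasses import dataclass

# Cue names follow the text; the numeric scoring is an assumption.
CUES = ("dancing", "chase", "flight", "beak", "claws")
SCORE = {"bad": 0, "okay": 1, "good": 2}


@dataclass
class Peacock:
    name: str
    cues: dict   # cue name -> "good" / "okay" / "bad", sampled over time
    tail: float  # the single chunked signal (hypothetical scale)


MALES = [
    Peacock("A", dict.fromkeys(CUES, "good"), tail=0.9),
    Peacock("B", {"dancing": "good", "chase": "okay", "flight": "okay",
                  "beak": "bad", "claws": "okay"}, tail=0.5),
    Peacock("C", dict.fromkeys(CUES, "bad"), tail=0.2),
]


def choose_by_inference(males):
    """Rank every male on every cue, keep all totals in memory, then compare."""
    evaluations = 0
    totals = {}
    for male in males:
        total = 0
        for cue in CUES:
            total += SCORE[male.cues[cue]]
            evaluations += 1  # one cue assessment per male per cue
        totals[male.name] = total
    return max(totals, key=totals.get), evaluations


def choose_by_signal(males):
    """Compare one self-describing signal per male."""
    best = max(males, key=lambda male: male.tail)
    return best.name, len(males)  # one assessment per male
```

With a reliable signal both procedures pick the same male, but the cue-based procedure needs one assessment per male per cue, while the signal-based one needs a single assessment per male.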
Reliable self-description, like the peacock’s tail, is one of nature’s ways of avoiding long-winded sorting and inference. In using self-describing metadata structures like XML, we are seeking to emulate nature’s design in the Semantic Web. The peacock example (and others above) shows that the reduction of others’ cognitive complexity using externally stored informational structures is very common, and it can be considered one of the building blocks of nature. Signaling exists at all levels of nature, from bacteria to plants, crickets, gazelles and humans. Note that the signal provides cognitive congeniality to the receiver, and not to the sender. The sender, for instance the peacock, gains because he has an interest in being selected for mating. This is very similar to the case of semantic mark-up, where the cognitive congeniality (less processing load) is for the reader; the encoder does the marking up because she stands to gain in some way.
Even though signaling is a basic structure of cognition, it has received very little attention from agent design methodologies. Many researchers have considered the role of stigmergy in changing environment structure. Stigmergy is a coordination mechanism where the action of one individual in a colony triggers the next action by others (Susi, 2001). It is a form of indirect communication, and has been a favoured mechanism for situated AI because it avoids the creation of explicit representations. Signaling, on the other hand, is closer to being a representation, and is therefore more useful in understanding the creation of representations, as in the case of socio-technical systems such as the Web. In the following section I develop a framework to understand how epistemic structures like signals fit in with agent-environment relationships in current agent design.
2. Agent design based on epistemic structure

I categorize agent design into four frameworks. To illustrate these four frameworks, I will use the problem of providing disabled people access to buildings. There are four general approaches to solving this problem.

– Approach I: This involves building an all-powerful, James Bond-style vehicle that can function in all environments. So it can run, jump, fly, climb spiral stairs, raise itself to high shelves, detect curbs, etc. This design does not incorporate detailed environment structure into the vehicle; it is built to overcome the limitations of all environments.
– Approach II: This involves studying the vehicle’s environment carefully and using that information to build the vehicle. For instance, the vehicle will take into account the existence of curbs (and their being short), stairs generally being non-spiral and having rails, the level of elevator buttons, etc. So it will have the capacity to raise itself to short curbs, climb short flights of straight stairs by making use of the rails, etc. Note that the environment is not changed here.
– Approach III: This involves changing the environment. For instance, building ramps and special doors so that a simple vehicle can have maximum access. This is the most elegant solution, and the most widely used one. Here the environment is changed, so that it contributes to the agent’s action. Our analysis will focus on this approach.
– Approach IV: The fourth one is similar to the first one, but here the environment is all-powerful instead of the vehicle. The environment becomes “smart”, and the building detects all physically handicapped people, and
glides a ramp down to them, or lifts them up etc. This solution is an extreme case of approach III; we will ignore it in the following analysis.

Now, the first approach is similar to the centralized AI methodology, which ignores the structure provided by specific environments during design. The environment is something to overcome; it is not considered a resource. This methodology tries to load every possible environment on to the agent, as centrally stored representations (see footnote 2). The agent tries to map the encountered world on to this internal template structure, and when the template structure does not obtain in the world, fails (see Figure 1, centralized AI).

The second approach is similar to the situated AI model promoted by Rodney Brooks (1991).² This methodology recognizes the role of the environment as a resource, and analyses and exploits the detailed structure that exists in the environment while building the agent. Notice that the environment is not changed here. This is a passive design approach, where the environment is considered a given. (See Figure 1, Brooksian AI.)

In the third approach, the designer actively intervenes in the environment and gives structure to it, so that the agent can function better. This is Active Design, or agent-environment co-design. The idea is to split the intelligence load: part to the agent, part to the world. This is agent design guided by the principle of distributing cognition, where part of the computation is hived off to the world. Kirsh (1996) terms this kind of “using the world to compute” Active Redesign. (See Figure 2, Active Design.)

This design principle underlies many techniques to minimize complexity. At Kirsh’s physical level, the Active Design principle can be found in the building of roads for wheeled vehicles. Without roads, the vehicles will have a hard time, or all vehicles will need to have tank wheels. With roads the movement is a lot easier for average vehicles.
This principle is also at work in the “intelligent use of space” where people organize objects around them in a way that helps them execute their functions (Kirsh, 1995). Kitchens and personal libraries (which use locations as tags for identifying content) are instances of such use of space in cognition.

A good example of active design at Kirsh’s information level (the cognitive congeniality level) is bar coding. Without bar coding, the checkout machine in the supermarket would have to resort to a phenomenal amount of querying and object-recognition routines to identify a product. With bar coding, it becomes a simple affair. The Semantic Web enterprise is another instance of Active Design at the information level.³ The effort is to create structure in an information environment (the Web) so that software and human agents can function effectively in it.

The Active Design principle is also at work in the Auto-ID⁴ and the Physical Markup Language efforts, which try to develop low-cost Radio-frequency Identification (RFID) tags and a common standard to store information in such tags. These tags can be embedded in products, quite like meta tags in web pages. Such tagged objects can be easily recognized by agents fitted with RFID readers (for instance, robots working in a recycling plant). The tags essentially create a referable world for such agents (see Chandrasekharan and Esfandiari, 2000, for more on the relation between agents and worlds). I consider the Auto-ID effort as an extension of the web, because most of these tagged objects will be tracked by supply-chain applications over the web; some such applications already exist. These tagged objects would thus become the web’s real-world nodes, and could be manipulated over the web.

Figure 1. Passive approaches to agent design. The environment is considered a given; the designer makes no changes to the environment.
Panel “Passive Design Approach 1: Centralized AI”. Design time: the designer abstracts structure from the environment and stores it within the agent, imposing his/her structure on the agent. Run-time: the agent queries for structure, compares the stored structure with the perceived one, and fails if there is no match.
Panel “Passive Design Approach 2: Brooksian AI”. Design time: the environment is studied and query-action associations are developed, to exploit structure in the environment at run-time; little or no structure is stored within the agent. Run-time: the agent queries constantly for external structure and executes the action if the structure obtains. More robust design.

Figure 2. Active design, or agent-environment co-design. In the third approach, the knowledge is split equally between the agent and the environment. The world is “doped” in a way that it has some necessary properties. The agent and the environment evolve together. In the fourth case (which is not illustrated here), it is the environment that is designed, and the agent is assumed to have minimal capabilities.
Panel “Active Design Approach”. Design time (doping the world): the designer actively intervenes in the environment and adds structure to it. Run-time: the agent queries for the added structure and executes the action when the structure obtains.

The Active Design approach is applied at the social level as well, especially in instances involving Trust. Humans actively create structure in the environment to help others make trust decisions. Formal structure created for trust includes credit ratings, identities, uniforms, badges, degrees, etc. These structures serve as reliable signals for people to make trust decisions. Less reliable, and more informal, structures we create include standardized ways of dressing, talking, etc.

The fourth approach in our agent design taxonomy is the ubiquitous/pervasive computing idea. This is an extreme version of the Active Design approach. Real design can be seen as a combination of two or more of these approaches. As illustrated by the examples, the third approach is the most elegant one: change the world, redesign it, so that a minimally complex agent can work effectively in that world.
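The contrast between the centralized approach and Active Design can be illustrated with a minimal sketch of object identification, in the spirit of the bar-coding example. Everything here is an assumption made for illustration: the feature templates, the `tag` field, and the function names are hypothetical and do not correspond to any real bar-code or RFID interface.

```python
# Approach I (centralized): the agent carries stored templates of the world.
# The templates are invented for illustration.
TEMPLATES = {
    ("round", "red", "small"): "apple",
    ("long", "yellow", "medium"): "banana",
}


def identify_centralized(obj):
    """Match perceived features against stored templates.

    Returns None (the agent fails) when the world does not match a template.
    """
    features = (obj["shape"], obj["color"], obj["size"])
    return TEMPLATES.get(features)


def dope(obj, label):
    """Design-time intervention (Approach III): add structure to the world.

    Returns a copy of the object carrying a hypothetical self-describing tag.
    """
    return {**obj, "tag": label}


def identify_active(obj):
    """Run-time query for the added structure; None if the object is untagged."""
    return obj.get("tag")
```

A green apple defeats the template-based agent, because no stored template matches it. Once the object has been tagged at design time, a much simpler agent identifies it by reading the tag, which is the division of labour between agent and world that Active Design aims for.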
3. Semantic Web and Active Design

Active Design is a minimal design framework based on the principle of distributing cognition to the world. In this design methodology, there is an active effort to split the cognitive load and push it to the world, thereby making the environment work for the agent. The Semantic Web effort can be considered an instance of Active Design, because the effort is to provide machine-understandable structure to documents and programs (the environment in this case) so that other programs (software agents) and people can work better. The structure provided “stabilizes” the environment (Hammond et al., 1995; Kirsh, 1995) for particular functions other agents want to execute. This way of looking at meta-tags (as structure generated to fit agents’ functions) puts meta-tags closer to the ecological psychology notion of built environment and created affordances (Reed, 1996; Gibson, 1979) than to knowledge representation.

All standardizations result in such stable environments for actions. For instance, library classification schemes stabilize document environments for search. In the case of the Semantic Web, there is stabilization at different levels; the different metadata formats create different structure. Thus XML provides standardized syntax, and RDF provides a standardized description of resources. Notice that while low-level standardization (like XML) provides stabilization that supports a variety of functions, high-level standardization (like library organizations and filename extensions) is usually designed to “fit” particular functions, for instance high-level search, instead of cue-based search.

How does the change in the environment reduce complexity? The complexity-reduction happens through the ‘focusing’ of structures to function (see Agre and Horswill, 1997, for an elaboration of this point). The computational and neural basis of this process is extremely complex and poorly understood.
For design purposes, we can say that the created structure in the environment ‘fits’ the function (or action) the agent seeks to fulfil or perform. A good example from the animal world is again the tail of the peacock, which ‘fits’ the mate selection function of the peahen: the tail provides information exclusively for that function, and it exists just for that purpose. An artifact example is bar coding. The information in the barcode is extremely focused on the functioning of the supermarket check-out machine. It provides the machine with a number, which it uses to retrieve information such as the name of the product, weight, date of expiry and price from the point-of-sale computer. It does not retrieve information such as that the container is round, or is made of plastic, or that it was made during foreman Bill’s shift at a factory
in Paradise Falls. There could be functions that need such information. For instance, one can imagine using the barcode to categorize containers (retrieve the type of container: “Made of plastic”), and a machine in a recycling plant using that information to sort plastic containers. Notice that the check-out machine has no use for this information, and that the presence of this information in the file it retrieves will only add complexity to the check-out machine’s decision-making. The optimal design is where the agent (the check-out machine) gets just the information it needs, like price, etc. Similarly, the recycling machine needs only information on the type of material, not the prices, the date of expiry, or during whose shift the container was made.

For Active Design to work best, the structure provided to the environment should focus on the function an agent needs to perform. A uniform, generalised structure that is potentially useful for all agents is not an efficient structure. Such a generalised structure would only add complexity to the processing any single agent needs to perform, because it would increase search and inference, as it is not focused on the function the agent wants to perform.
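The idea of focusing structure on function can be sketched as a lookup that returns only the fields a given agent needs. This is a minimal illustration under assumptions: the barcode number, the product record, and the `VIEWS` table are invented, and real point-of-sale systems are not organized this way as far as this sketch is concerned.

```python
# One generalised product record, keyed by a made-up barcode number.
PRODUCT_DB = {
    "0012345678905": {
        "name": "Orange juice",
        "weight_g": 1000,
        "expiry": "2004-12-31",
        "price": 2.49,
        "material": "plastic",
        "shape": "round",
    }
}

# Hypothetical function-focused views: each agent sees only what its task needs.
VIEWS = {
    "checkout": ("name", "weight_g", "expiry", "price"),
    "recycling": ("material",),
}


def lookup(barcode, function):
    """Return just the fields relevant to the named function."""
    record = PRODUCT_DB[barcode]
    return {key: record[key] for key in VIEWS[function]}
```

The check-out machine never has to skip over material or shape, and the recycling machine never sees price or expiry date; each agent's search space is limited to the structure that fits its function.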
4. Ontologies: Knowledge representation vs. affordance

Formal ontologies, an aspect of the Semantic Web, are currently designed as generalized structures that support any function an agent wants to perform. A formal ontology is a “specification of a conceptualization” (Gruber, 1993). In less formal terms, it is a standardized description of concepts and their relations. An ontology metatag essentially classifies a document as being part of a knowledge domain (or a real world domain). It also supports inference by providing a pointer to a file that describes the categories and the possible relationships between categories that exist within that domain. This is very similar to the functioning of a barcode, except that ontologies are exhaustive. So if you have a document with the metatag USE-ONTOLOGY ID="cs-dept-ontology", this means the agent encountering the document should use the cs-dept-ontology,⁵ which is available at a URL, to categorise the document and its content elements. This ontology formally captures content elements found in computer science department pages and their relationships. Note that this is a generalized structure, and can be used by any agent, for any function.

Imagine the use case of an agent that roams the web and collects the names and e-mail addresses of all the faculty members in computer science departments. Once the ontology part of the Semantic Web is in place, all the agent
164 Sanjay Chandrasekharan
needs to do is look at the ontology meta-tag on a web page. If it says cs-dept-ontology, then the agent looks up the ontology, and then parses the document to see whether the faculty categories mentioned in the cs-dept-ontology exist within the document, and if one of them does, gets that category, name and e-mail of the faculty member. This provides computational efficiency for the agent, because it doesn’t have to infer from the page content whether the page is a computer science professor’s or not. However, notice that this simple agent is using only a snippet of the detailed categories and relationships captured by the formal ontology; the rest of the categories and relationships only add computational complexity for this agent. So even though there is some computational advantage, there is an overall wastage of resources. On the side of the designer and user, notice that the harvesting of computer science professors’ e-mail addresses is not a function that the developer of the ontology, and the designer of the site, necessarily wanted to support.6 If the professors’ pages refer to ontology snippets (as in the case of bar code files), and provide just enough information to support desirable functions, this kind of exploitation of structure would be more difficult. Now, think of the popular use case of a software agent going out on the web to find and book the best room for your holiday. Suppose hotel sites visited by the web agent use the hotel-ontology meta-tag. The hotel-ontology tag allows the agent to know that it is on a hotel site, and to infer that hotels have rooms, with states available and unavailable. It also allows the agent to know the rent for a room, and to know that rooms have features like air-conditioning. However, the hotel-ontology meta-tag and the generalized description of hotels at some URL don’t provide the agent with a computationally efficient way of booking a room, given its preferences.
Categorizing an entity is not enough to decide how to go about using the entity (think of VCRs). To actually execute the action of booking a room, the agent needs a more action-oriented tag set, something like “check_dates”, “compare_prices” and “book_room”. In some cases, generalized structures will just not work. An example from the Physical Markup Language (PML) domain illustrates this. PML allows designers to provide structure to everyday objects using RFID (Radio-frequency identification) tags. If these objects are tracked using the web, they could be considered as real-world nodes of the Semantic Web. Suppose that we want to mark up a coffee cup so that a housemaid robot can find the cup and bring us coffee. We can mark up the cup using current formal ontologies in the following manner:
The semantic web 165
[Object]
  Container
    Cup
      Coffee cup
        Jim’s Coffee cup
[Measure-ont]
  LengthUnit 20 cm
  WeightUnit 200 Grams
  VolumeUnit 77.9 cm3
Let us call this the property model of ontologies. This markup in an RFID tag (or a retrieved file) identifies the cup and provides some of its properties, and the robot can detect this cup using its RFID reader. Like the barcode, this information allows the robot to short cut object recognition routines. However, to execute the robot’s action of filling the cup with coffee, this information is not enough. This is because the markup does not say what the functions supported by the cup are. To use the above information to execute its function, the robot has to know that coffee cups are used for filling coffee, and the procedure to fill a cup with coffee is to hold it open-side up under the coffee machine’s tap after switching it on. It also has to know that it should not hold the cup upside down once the coffee is filled. Finally, the robot has to know where it can find coffee cups and what actions to select from its repertoire of actions to use on the cup to fetch coffee. A much more useful informational structure for the robot would be:
[ontology: cyc (container);
 object: coffee cup;
 properties: radius (3 cm), height (20 cm); volume (77.9 cm3);
 owner: Marvin;
 best_supported_function: get_coffee;
 supported_actions: grasp, lift, hold, fill;
 constraints: (this_side_up, touch_pot_lip);
 default_location: kitchen cupboard 1;]
Let us call this the affordance-model of ontologies. Here the self-description provided by the cup explicitly tells the robot what functions it supports, and leads to a lot less inference by the robot. Of course, the cup can be used for many other functions, such as measuring rice, holding candles, as a paperweight, etc. But these are not the prototypical functions of a cup, and to put all these functions in the tag would make the structure-creation a never-ending exercise. It is better to put in the prototypical functions, and leave the rest to the resourcefulness of the agents encountering the cup, as is the case with humans. This description exploits the “selective representation” model of the world (Mandik and Clark, 2002), where an organism is considered to perceive and cognize a “relevant-to-my-lifestyle world, as opposed to a world-with-all-its-perceptual-properties”. In this view, a rabbit seeing a looming shape does not
take an action by trying to categorize the shape using generalized property-based queries like “does it have spots”, “is it red” or “does it have a snout”, etc. It uses self and action-oriented queries like “is it big”, “is it moving towards me”, “is it acting predator-like”, etc. This kind of querying is much more efficient computationally than property-based querying, because it avoids the assembly of properties into task-relevant structure, look-up of objects based on these properties, and selection of action. Action-oriented queries “chunk” many operations into one. This efficiency in computation is reflected in structures generated in nature by organisms, like markers and mating signals, which are tailored to particular actions, and are “action-oriented representations” (Clark, 1997). On the property side, the above tag just includes the properties that are required for the agent to execute the functions suggested/desired. This makes the job of the tag designer (the PML equivalent of the ontology designer) a lot easier, because function-based tagging means that a lot less information needs to be put in. Interestingly, this affordance-based tag structure has some positive side effects:
1. The agent can find out its location by sending out a query and collating the default locations returned by objects. If most of the objects reply to the query with “kitchen” as their default location, the agent can infer with a high probability it is in the kitchen. This is very similar to how humans infer their location during a power blackout: if the objects you encounter are kitchen objects, you are in the kitchen. A functional location is an assembly of functional objects.
2. States of objects can be inferred from locations (e.g., in sink means dirty).
3. Navigation route for a task can emerge from the objects involved in the task. For instance, cup → coffee pot → user for the coffee-bringing task.
4. Objects can be returned to their stable locations at the end of the day.
None of these can be achieved using the first category-based generalized structure, which does not take into account the locations of the cups.
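A minimal sketch of how an agent might use affordance-style tags, under stated assumptions: the tag fields follow the cup markup above, but the three objects, the simplified location names, and the helper functions `actions_for` and `infer_location` are hypothetical. The sketch covers reading actions off the tags and the first side effect (location inference by collating default locations).

```python
from collections import Counter

# Hypothetical affordance-style tags, modelled on the cup markup above.
# Locations are simplified to room names for illustration.
TAGS = [
    {"object": "coffee cup", "best_supported_function": "get_coffee",
     "supported_actions": ["grasp", "lift", "hold", "fill"],
     "default_location": "kitchen"},
    {"object": "coffee pot", "best_supported_function": "get_coffee",
     "supported_actions": ["grasp", "pour"],
     "default_location": "kitchen"},
    {"object": "newspaper", "best_supported_function": "read",
     "supported_actions": ["grasp", "open"],
     "default_location": "living room"},
]

def actions_for(tags, goal):
    """Read the supported actions off every tag that declares the goal."""
    return {t["object"]: t["supported_actions"]
            for t in tags if t["best_supported_function"] == goal}

def infer_location(tags):
    """Guess the agent's location as the most common default location."""
    counts = Counter(t["default_location"] for t in tags)
    location, votes = counts.most_common(1)[0]
    return location, votes / len(tags)

coffee_actions = actions_for(TAGS, "get_coffee")
location, share = infer_location(TAGS)
print(coffee_actions)
print(location, round(share, 2))
```

The agent does no inference over properties here: the objects relevant to the goal announce themselves, and the share of matching replies gives a rough confidence in the inferred location.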
5. Cognitive load balancing
The basic distinction between the two ontology designs above is the way the cognitive load is carved between the agent and the environment. Traditionally,
functions have been considered as something the agent brings to the world (or objects), and to execute its function the agent needed to know only the properties of the object. The properties are considered to be the object’s only contribution to the action. The agent can infer whether a given object supports the function, based on the properties the object “possesses”. In classical AI methodology, the agent had an internal knowledge-base of properties possessed by objects. Actions were selected based on matching properties in the knowledge base to the properties gleaned from objects. The Semantic Web effort moves away from this design, by shifting the agent’s knowledge base into the world, and storing it in objects (or URLs linked to objects) using self-descriptions. This design does away with the object-identification process and the inference involved in finding relationships between properties. However, the rest of the model remains the same, including the inference-driven model of action-selection and context-identification. This means other problems in agent design, like context identification, action-selection, etc., remain challenging. In contrast, the affordance approach also shifts some parts of the action and context to the object, as possibilities, or preferences, for action. If the actions the agent wants to execute are the same ones the object “affords”, there is a better “fit” between the world and the agent, and there is less cognitive overhead. The context embedded in the object allows the agent to make better decisions. From the point of view of making the environment work for agents, function-based ontologies of this kind are much more useful and efficient than general purpose, exhaustive ontologies, which require extensive search and inference on the part of the agent. The detailed categories and hierarchical relations encoded in such ontologies make them cumbersome to develop, and computationally intensive to use.
In the affordance approach, actions/functions embedded in objects act as “lenses” that edit out unnecessary structure, both for agents and for tag designers. In the Semantic Web, function-oriented ontologies help users, agents and search engines to easily discover web pages that provide functions like buy, sell and bid, allowing them to distinguish the information part of the web from the functional part. Action-oriented ontologies will allow the web to be split along action and knowledge domains, and allow for separate and detailed searches in both. Another advantage of putting actions/functions in web pages is that we can create a network of functions, by linking pages by function, instead of topic. This is an instance of a point made by Cox (1999), who argues that externalized
representations help unconnected cognitive systems interact. This is not possible if the function resides within agents. Another related advantage is the ability of agents to exploit the functions as paths (like in the coffee-bringing case), which is also not possible if the function resides within the agent. Also, as pointed out earlier, by providing focused ontology snippets that support desirable functions, we could also potentially keep out “eavesdroppers” like e-mail harvesting bots.
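Linking pages by function rather than topic can be pictured as building an index keyed on declared functions. The sketch below assumes that each page carries a list of supported functions in its tag; the URLs, the function names, and the `index_by_function` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages, each declaring the functions it supports in its tag.
pages = {
    "http://example.org/auction": ["bid", "sell"],
    "http://example.org/shop": ["buy"],
    "http://example.org/market": ["buy", "sell"],
    "http://example.org/history": [],  # information only, no functions
}

def index_by_function(pages):
    """Group page URLs under each function they declare."""
    index = defaultdict(list)
    for url, functions in pages.items():
        for function in functions:
            index[function].append(url)
    return dict(index)

function_index = index_by_function(pages)
print(function_index["buy"])
```

An agent that wants to buy something consults only the "buy" entry, and the purely informational page never enters its search.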
6. Meta-tag pluralism
Now, to the question we started with. Should the Semantic Web effort be directed at such action-oriented structures exclusively? Interestingly enough, the action-oriented approach does not contradict the design of top-down, exhaustive ontologies, or other top-down category-based structures. This is because affordances and categories exist at different levels, and serve different purposes. Top-down ontologies and other structures, with their exhaustive listings of categories and relationships, establish a standard way of using terms. This leads to better interoperability. The view of function acting as a lens (allowing agents to focus on only the needed environment structure for action) is based on computational efficiency, and functional structuring of task environments. This is a higher-level view, and does not contradict the standardization role of top-down structures like ontologies. The affordance approach just makes a distinction between low-level interoperability structures and high-level function-oriented structures. The design confusion comes from using a low-level structure, designed for interoperability, to enable actions. It can be made to work to some extent, but the design is not efficient. A rough analogy would be trying to develop a user manual using just labels of components and hierarchical ordering of components. The design implications of the affordance view are the following:
– The designer of an individual web page or an RFID tag should not have to put in (or refer to) an entire exhaustive ontology. The designer should be able to put in, or refer to, ontology snippets, focused on the particular functions she wants the object to serve. This is similar to the bar-code design. She should be able to pick and choose these snippets in any way she wants to, without restrictions based on hierarchy or inheritance.
– Two, there need to be ontologies for actions and standard locations, and these ontologies need to be accessible as snippets as well.
– And finally, there need to be graphical tools that allow users to mix and match these ontology snippets to create action-supporting tags, even networks of them. Note that the action-oriented approach makes such tagging more user-friendly, because it is hard for people to think of categories supporting actions, if they don’t know what actions they want to support. If users can pick and choose the actions they want to support, the selection of categories becomes easier. Action-orientation may thus help solve the vexing problem of designing user-friendly tools for generating meta-tags.
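The mix-and-match idea in the list above can be sketched as follows. Everything here is an assumption made for illustration: the snippet libraries, their contents, and the `build_tag` helper are hypothetical, loosely based on the hotel-booking example, and do not correspond to any existing ontology tool.

```python
# Hypothetical snippet libraries; names and contents are illustrative only.
ACTION_SNIPPETS = {
    "book_room": {"requires": ["room", "price", "availability"]},
    "compare_prices": {"requires": ["price"]},
}
CATEGORY_SNIPPETS = {
    "room": "a lettable unit in a hotel",
    "price": "cost per night",
    "availability": "available or unavailable",
    "air_conditioning": "a room feature",
}

def build_tag(actions):
    """Assemble a tag from chosen actions, pulling in only the categories they need."""
    needed = []
    for action in actions:
        for category in ACTION_SNIPPETS[action]["requires"]:
            if category not in needed:
                needed.append(category)
    return {
        "supported_actions": list(actions),
        "categories": {name: CATEGORY_SNIPPETS[name] for name in needed},
    }

tag = build_tag(["compare_prices", "book_room"])
print(tag)
```

The designer starts from the actions she wants to support, and the categories follow from them; a category that no chosen action needs, such as air conditioning here, never enters the tag.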
So the answer to the question we began with (Is the Semantic Web about affordance or knowledge representation?) is this: it is both. The property-based approach seeks to create a common vocabulary, and the affordance-based approach seeks to exploit the existence of this common vocabulary for higher-level functions. The second approach advocates the use of elements of a common vocabulary as affordances in objects, i.e. as action-structures focused on functions. This is similar to the way humans (and other organisms) use ontologies: we access only action-relevant parts of perceived objects, and they are accessed in a customized manner to ‘fit’ tasks or functions. We are selective about the parts of the environment we attend to during a task, and we almost never use all the properties of an object to execute a task. The extension of this principle to the Web is quite natural if we take the stance of the Web as an action-mediating space. The inclusion of function-oriented structure as a “lens” in a self-describing object or website allows agents who need to perform that function to easily detect that structure, and only that structure, and use the structure to efficiently perform the task, or a set of interconnected tasks.
Notes
* A significant portion of this chapter was developed while the author was pursuing a predoctoral fellowship with the Adaptive Behavior and Cognition (ABC) Group of the Max Planck Institute for Human Development, Berlin. I would like to thank Dr. Peter Todd of the ABC group for supporting and sharpening the ideas reported here.
1. I am not committed to the “environmental determinism” inherent in the Gibsonian view of affordances, where organisms are considered not to have any kind of mental representations.
2. Brooks questioned the traditional picture of artificial intelligence, where an idealized representation of the world is stored within the agent and compared with the environment at run-time. The program executes actions based on these comparisons. Brooks observes that to port to a system, this notion of intelligence needs an objective world model provided by the programmer. This model would then be compared against “situations” and agents in the real world. Brooks has convincingly argued that this is not a robust way of building intelligence, because the world does not always fit the models made by the programmer. Instead, Brooks advocates a design where the designer considers the environment’s structure in detail and builds low-level perception-action pairs (like obstacle-run_away) based on that structure. The agent constantly queries the environment to gain information on the structure of the environment, and acts on the basis of that information. This is a more robust design. Unfortunately, in the process of developing this design framework, Brooks took a stance against representations, which has resulted in this design framework not being applied much in representational domains like the Web.
3. Interestingly, if we consider the Semantic Web effort as the most recent development in the history of processing natural language, it follows the three design levels outlined above. First, in the era of NLP, language was considered as something that could be processed by using centrally stored rules and representations (first design approach). Then came automatic classification, the idea of trying to understand the structure of a document based on its context and domain (environment), using pattern analysis and vocabularies (second approach). And now we have the Semantic Web, where the designer actively seeks to change the document environment, by providing structure to the document.
4. http://www.autoidlabs.mit.edu/index.htm
5. This example is based on SHOE. For details see: http://www.cs.umd.edu/users/hendler/sciam/walkthru.html
6. In animal signaling, this kind of undesirable exploitation of informational structures by others is termed “eavesdropping”. For example, the songs male crickets sing to attract females are used by some parasitic flies to locate the male crickets and deposit their eggs on them. The flies kill the cricket when the eggs hatch. The problems the Semantic Web effort faces from signal exploitation and its more potent complement, dishonest signaling, are topics by themselves, but they are beyond the scope of this paper.
References
Agre, P. & I. Horswill (1997). Lifeworld Analysis. Journal of Artificial Intelligence Research 6, 111–145.
Beck, C. (2001). Chemical signal mobilizes reserve units. Max Planck Research, Science magazine of the Max Planck Society 4/2001, 62–63.
Bradbury, J. W. & S. L. Vehrencamp (1998). Principles of Animal Communication. Sunderland, Mass: Sinauer Associates.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence 47(1–3), 139–160.
Chandrasekharan, S. & B. Esfandiari (2000). “Software Agents and Situatedness: Being Where?”. Proceedings of the Eleventh Mid-west conference on Artificial Intelligence and Cognitive Science, Menlo Park, CA, AAAI Press, pp. 29–32.
Clark, A. (1997). Being There: putting brain, body, and world together again. Cambridge, Mass.: MIT Press.
Clark, G. K. (2001). The Politics of Schemas, available at: http://www.xml.com/pub/a/2001/01/31/politics.html
Cox, R. (1999). Representation construction, externalised cognition and individual differences. Learning and Instruction (Special issue on learning with interactive graphical systems) 9, 343–363.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gruber, T. R. (1993). A translation approach to portable ontologies. Knowledge Acquisition 5(2), 199–220.
Hammond, K. J., T. M. Converse & J. W. Grass (1995). The stabilization of environments. Artificial Intelligence 72(1–2), 305–327.
Heiling, A. M., M. E. Herberstein & L. Chittka (2003). Crab-spiders manipulate flower signals. Nature 421, 334.
Hollan, J. D., E. L. Hutchins & D. Kirsh (2000). Distributed cognition: A new theoretical foundation for human-computer interaction research. ACM Transactions on Human-Computer Interaction 7(2), 174–196.
Hutchins, E. (1995). How a cockpit remembers its speeds. Cognitive Science 19, 265–288.
Kirsh, D. & P. Maglio (1994). On distinguishing epistemic from pragmatic action. Cognitive Science 18, 513–549.
Kirsh, D. (1995). The Intelligent Use of Space. Artificial Intelligence 73, 31–68.
Kirsh, D. (1995b). Complementary Strategies: Why we use our hands when we think. Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society, pp. 212–217. Hillsdale, NJ: Lawrence Erlbaum.
Kirsh, D. (1996). Adapting the Environment Instead of Oneself. Adaptive Behavior 4(3/4), 415–452.
Kirsh, D. (2001). The Context of Work. Human Computer Interaction 16, 305–322.
Mandik, P. & A. Clark (2002). Selective Representing and World Making. Minds and Machines 12, 383–395.
Norman, D. A. (1993). Things That Make Us Smart. Reading, MA: Addison-Wesley Publishing Company.
Norman, D. A. (1998). Affordances and Design, available at: http://www.jnd.org/dn.mss/affordances-and-design.html
Reed, E. S. (1996). Encountering the World: Toward an Ecological Psychology. New York: Oxford University Press.
Silberman, S. (2003). The Bacteria Whisperer. Wired Online, 11.04, April 2003, available at: http://www.wired.com/wired/archive/11.04/quorum.html
Stopka, P. & D. W. Macdonald (2003). Way-marking behaviour: an aid to spatial navigation in the wood mouse (Apodemus sylvaticus). BMC Ecology, published online, http://www.biomedcentral.com/1472-6785/3/3
Susi, T. & T. Ziemke (2001). Social Cognition, Artefacts, and Stigmergy: A Comparative Analysis of Theoretical Frameworks for the Understanding of Artefact-mediated Collaborative Activity. Cognitive Systems Research 2(4), 273–290.
Weir, A. S., J. Chappell & J. Kacelnik (2002). Shaping of Hooks in New Caledonian Crows. Science 297(5583), 981.
Zahavi, A. & A. Zahavi (1997). The Handicap Principle: A missing piece of Darwin’s puzzle. Oxford: Oxford University Press.
Part II
Applications
Cognition and body image Hanan Abdulwahab El Ashegh and Roger Lindsay Department of Psychology, University of Cambridge / Department of Psychology, Oxford Brookes University
Introduction
1.
Natural technology and cognition
Though Cognitive Technology is a new discipline, human beings have been developing and employing cognitive technologies for thousands of years. When we consider non-natural information processing devices such as computers, it is generally accepted that a software program enabling some computation to be executed is a functional artefact and hence falls within the domain of technology. If the program embodies knowledge or beliefs, as most non-trivial programs do, then the technology involved is cognitive. Now, suppose that instead of being developed to run on a computer, a ‘program’, consisting of a set of procedures for generating a specific set of outcomes, was developed to allow humans to accomplish a particular mental function: for example, multiplying numbers, or remembering a list of words exceeding the span of immediate memory. The resulting program is no less a contribution to technology, and no less cognitive, just because the relevant procedures were developed to be executed by a human information processing device. There seems to be no principled reason for withholding the term cognitive technology when procedures enabling specific functions to be computed more efficiently or effectively are developed to constrain human information processing operations, rather than those of inorganic devices. Accordingly, we assume from the outset that many contributions to cognitive technology, including, conspicuously, the development of spoken language and arithmetic, consist of procedures developed by humans to enhance their own mental operations. Meenan and Lindsay (2002) introduce the term natural technology to describe those cases in which no physical artefact is involved beyond the mental apparatus itself, for example,
knowing how to multiply numbers, as opposed to getting the same result by using a calculator: “Technology is the use of knowledge with the intention of bringing about positive change in the world. Cognitive technology is centrally concerned with those uses of knowledge which modify mental competencies and capabilities. Sometimes mental capabilities are modified through the use of physical objects external to the agent: writing, and the use of calculators are examples. In other cases, for example in the use of speech, or mnemonic systems to improve memory, cognitive artefacts are used to modify the user’s mind, or the minds and behaviour of other agents without the use of physical mediation. We shall refer to cases of the latter sort as examples of natural technology. Natural technology includes the deliberate use of action sequences to modify the behaviour of other people: the domain of social behaviour.” (Meenan and Lindsay, 2002, pp. 234/5)
Natural technology also includes cases in which individual human physical competence is extended, for example by developing teachable skills such as singing, juggling, or using an abacus. It excludes evolved skills such as eating and walking that do not need to be taught formally. Such competences do not fall within the domain of technology because the procedures that underlie them were not intentionally developed (or fortuitously discovered, but intentionally applied) to achieve an explicit goal. Evolution has no goals and hence produces no technological artefacts. Goldberg (2001) has made similar claims about the relationship between culturally transmitted software and brain processes: “The whole history of human civilisation has been characterised by a relative shift of the cognitive emphasis from the right hemisphere to the left hemisphere owing to the accumulation of ready-made ‘templates’ of various kinds. These cognitive templates are stored externally through various cultural means, including language, and are internalised by individuals in the course of learning as ‘cognitive prefabricates’ …” (Goldberg 2001, p. 52)
Despite more than half a century of startlingly rapid progress in non-natural cognitive technology (e.g., programming computers), humans remain capable of many learned cognitive achievements that still cannot be emulated by machines. The study of natural cognitive artefacts thus offers rich possibilities for transferring technology from the natural to the non-natural domain. Perhaps more importantly, treating learned human mental competencies as “portable” cognitive artefacts promotes a theoretical and methodological stance which presupposes that they are all analysable in terms of information processing
resources and procedures. Perhaps the major obstacle to this approach is a formidable difficulty characterised as the decoupling or unitisation problem. Essentially, this is the problem of identifying the basic components that make up human cognition. Does language comprehension result from the application of a perception module? Or the application of a general intelligence module? Or is it the result of applying many special purpose modules, such as phoneme identification, lexeme identification, syntactic analysis and so on? The term “module” has fallen into some disrepute, at least partly because Fodor’s (1983) identification of modules with “faculties” such as perception and language has not proved to be helpful. Cognitive modularity has also often been identified with symbolic sequential stage models, which in turn are widely believed to have been fatally undermined by the development of connectionist alternatives. The latter belief is simply mistaken. The claim that there are functionally specialised processing units with relatively wide band internal communication and relatively narrow band communication with other units does not imply anything about the nature of the processing operations within a unit. Moreover, there is now a tremendous accumulation of neuropsychological evidence demonstrating selective loss of specific competencies whilst equally complex, and apparently closely related abilities are spared (for example, loss of the ability to recognise human faces whilst object recognition remains normal, and vice versa). This evidence irresistibly suggests that, at least in part, the human cerebral cortex and the information processing operations it carries out are functionally organised. That is to say, specific regions of the brain are specialised for particular tasks.
The trouble is that complex information processing tasks can be accomplished in many different ways, and in most performance domains psychological evidence is too weak to discriminate between the possibilities. One of the main challenges within this research framework at present is to find sets of cognitive operations that map onto the functional organisation of the cortex. Successful mappings confer validity on both neuropsychological theories of brain function and information processing analyses of behaviour. At a deeper level, they support the realist view that the brain actually does process information in a manner similar to non-natural artefacts such as computers. A strong case can be made for a function-specific processing module that is responsible for generating cognitive representations of the physical self. It is easy to assume that human beings have a generic capability to represent entities in the physical world by constructing symbolic models based upon perceptual information. Using essentially the same mechanism, one instant a person might
symbolically model a tree or a dog, and the next, another person or themselves. There is a good deal of persuasive evidence that this account is false. At the hard end of the evidence spectrum, clinical phenomena such as anosognosia and phantom limb experiences demonstrate that people frequently maintain cognitive representations of their own body that are grossly inaccurate. In anosognosia, patients may insist that they have full control over a hand or leg, even though it is actually paralysed and moribund as a result of brain damage. In phantom limb disorder, patients attribute pain to a limb that has long since been removed as a result of accident or surgery. These cognitive failures are highly specific to physical self-image: sufferers do not show general perceptual failure, nor widespread faulty inference, nor yet erroneous judgements about the intactness of other people. There is also a plethora of softer evidence. People often seem to “misperceive” their own physical characteristics, denying with apparent honesty and real indignation the evident fact that they are overweight; or, conversely, subjecting themselves to harsh dietary and exercise regimes to rid themselves of “excessive” weight. Others might claim that an entirely unexceptionable nose is, in truth, intolerably large, or that breasts or buttocks are shamefully small, or embarrassing in their magnitude. Again, clinical syndromes have been identified: eating disorders such as anorexia nervosa are commonly claimed to involve misperception of one’s own body as fat, even when it may be chronically undernourished and when this is shockingly evident to all except the sufferer.1 Such conditions are highly specific to judgements about the physical self, and do not seem to result from any general impairment in ability to judge the dimensions of objects or other people. Another clinical syndrome, body dysmorphia, applies when people unreasonably regard a feature of their body as shameful or disfiguring.2
2. Body schema and body image
In spite of the central part played by physical conceptions of self in determining the quality of people’s lives and even their mental health, theoretical understanding of the processes involved is poorly developed and the relevant academic literature is shot through with confusion. The most striking manifestation of this is the co-existence of two rival constructs, each supposed to underpin a person’s conception of physical self. One of these models, body schema,3 has been developed by neurologists to explain the consequences of brain damage; the other, body image,4 is a construct primarily used to explain psychological
disorders that have own-body dissatisfaction as a central feature. Whilst body schema is usually considered to be a perceptual model of the body, body image is generally believed to be primarily a cognitive model that possesses social and emotional components. Outside neuropsychology ‘body image’ has come to be used as part of a loose characterisation of the ego or social self (Bower, 1977), but such formulations are unfortunately not articulated with sufficient explicitness to be useful. It seems that there is widespread agreement in the neuropsychological literature that we experience and describe our bodies with the assistance of a multidimensional cognitive construct generally designated the body schema (this agreement is not by any means universal).5 When psychological rather than neurological mechanisms (such as emotions and values) are the focus of attention, researchers tend to implicate a second multidimensional cognitive construct known as the body image (Gallagher, 1986; Fisher, 1990; Gallagher and Cole, 1995). The relationship between these two constructs has been little explored. Gallagher (1986) analysed recent psychological studies on the relationship between body image and body schema. He concluded that the operations of the body schema sometimes place constraints on intentional consciousness. In particular, Gallagher suggests that changes in various aspects of body schema can affect the way subjects perceive their own bodies; that is, change in body schema can cause change in body image. Gallagher’s conclusion inevitably raises the question of whether body schema and body image are truly independent at all. Evidence that one construct causally affects the other is prima facie also evidence that there are not two constructs but only one. This suspicion is reinforced by the fact that body image is typically used in the absence of any clear definition.
In an attempt at clarifying matters, Altabe and Thompson (1996) suggested that “one concept of body image is an internalized view of one’s appearance that drives behaviour and influences information processing” (Altabe and Thompson 1996, p. 190). This definition assimilates body image to the more widely accepted concept of a ‘cognitive schema’. Altabe and Thompson report a series of studies, the outcome of which they take to support the idea that body image cognitions act like schemas. Slade (1994) has argued that an individual’s perception of their own body is highly influenced by cognitive, affective, attitudinal, and other variables. Slade proposes a general schematic model of body image as a loose mental representation of the body that is influenced by at least seven sets of factors:
180 Hanan Abdulwahab El Ashegh and Roger Lindsay
“These sets are the history of sensory input to body experience, the history of weight change fluctuation, cultural and social norms, individual attitudes to weight and shape, cognitive and affective variables, individual psychopathology, and biological variables.” (Slade, 1994, p. 501)
The body schema and body image constructs thus seem to have long coexisted as rival constructs supported by overlapping, but distinct bodies of evidence. ‘Body schema’ tends to be preferred by researchers who are interested in brain mechanisms, but not in psychological processes; ‘body image’ tends to be employed when evaluative processes are of most interest. There seems to be an opportunity to integrate these two constructs in a productive way. Specifically, we propose that it is possible to explain phenomena in both domains with greater theoretical economy. Instead of two distinct cognitive representations of self, one underpinning perception and action, the other underlying attitudes to self and social behaviour, we propose that there is only one internal model of physical self, but that this is operated upon by an evaluative process. We now turn to the clinical literature on disordered experience of body.
3. A unified model of body image representation
Two theoretical constructs have been used to provide a basis for mental representations of physical self. Body schema is a construct supported by hard neurological evidence that can explain perceptual and motor dysfunctions caused by brain pathology, but that says little or nothing about psychological dysfunctions involving self-representation. Body image incorporates the judgemental and evaluative information essential to explain psychopathologies such as eating disorders and Body Dysmorphic Disorder (BDD) but has been specified only in vague terms and has relatively little empirical support. We have already suggested above that these two constructs are ripe for integration. The neurological evidence strongly supports the suggestion that the human brain contains a specific representation of physical self that can be directly affected by insults to the cortex. The psychological evidence suggests that own-body representations can be dramatically affected by belief and judgement. Anorexics do not appear to perceive their own bodies accurately, but persist in believing they are overweight despite this perceptual evidence. Nor do BDD sufferers see their noses or buttocks as contoured within normal limits, but insist nonetheless that they are grossly disfigured. Rather, in both of these classes of disorder the evidence is compelling that distortions of judgement accompanied by
obsessive preoccupation with specific bodily features can result in a modification of the manner in which those bodily features are mentally represented. Evidence of this kind seems to suggest that in some individuals, judgement of their own bodily dimensions is defective. This impairment of judgement might result from distortion of a normal magnitude estimation process, such as might be caused by a calibration error, or the basic process may operate normally, but estimation of one quantity, such as size, may be inappropriately affected by another (e.g., self-worth). There does seem to be a strong case for arguing that judgements about the physical self involve a special-purpose body image generator, the operation of which plays an essential part in the cognitive processes underlying body image judgements, and impairment of which helps to explain body image disturbances. More speculatively, it is possible that there are two distinct sub-types of body image disorder, one arising from a negative view of an undistorted internal representation of body, associated with internalized cultural norms and depression, and alleviated by drugs that are effective for the latter condition (see Note 2). The second subtype of body image disorder seems more likely to result from a faulty cognitive representation of body. This type of disorder is expected to be independent of cultural stereotypes and depressive illness, but may be related to defective perceptual and cognitive processes. Investigations of conditions such as anosognosia suggest that knowledge of bodily impairments resulting from neurological lesions may not be cognitively available, despite such impairments being grossly apparent to others. This in turn seems to imply that cognitive awareness of body does not result from direct perception, but is mediated by an internal representational process that may fail to register impairment.
This hypothetical system for representing body will be referred to as the Body Image Generator (BIG), an internal cognitive mechanism that underlies an individual’s view of his/her physical appearance and provides the images towards which evaluations of the physical self are directed. It is important to remember throughout that we are concerned here with cases of pathological belief that one’s body is defective, not those cases in which a person must emotionally adjust to a real bodily defect such as limb loss, obesity or facial disfigurement. The assumption that there is a dedicated cognitive system that has a distinct and independent neurological locus allows the two subtypes of body image disorder distinguished above to be more clearly differentiated.
i. Type 1 BDD: actual body parameters are normal, and BIG generates an accurate image of the body, but individuals judge their body image as defective because of inappropriate norms or the application of negative schemas to themselves.
ii. Type 2 BDD: actual body parameters are normal, but BIG generates an image that is anomalous in some way (e.g., excessively large in some dimensions). Individuals with this impairment may have normal bodies, and normal evaluation criteria, but still perceive their body as defective.
It seems probable that these two cases are not completely independent, in that judgements about one’s body image may be capable of causally modifying the image (Gallagher, 1986; Gallagher and Cole, 1995). This capability might have many psychological advantages: for example, it would allow people to mitigate the distress caused by undesirable physical characteristics that are outside their control. In this way, the disfigured, disabled or aesthetically unfortunate could perhaps avoid or reduce the debilitating effects of depression. Though there is considerable evidence to support the claim that negative attitudes to one’s self can lead to negative changes in body image, there is presently no evidence that positive attitudes to self can enhance cognitive representations of one’s physical self. However, the absence of such evidence is hardly surprising: individuals with an excessively positive image of their bodies are hardly likely to turn up in the psychiatric consulting room, as those with BDD do. The possibility of representations of the physical self being modified via psychological processes creates a clear functional rationale for isolating self-representations from the mechanisms responsible for producing representations of other features of the physical world. It would be unfortunate, for instance, if perceiving oneself as thinner to avoid psychological discomfort resulted in universal misperception of breadth. There is one further implication: we have noted that negative attitudes to self are largely, if not entirely, culturally mediated.
As cultural factors can only begin to operate after considerable learning has already taken place, it seems likely that BIG is not hardwired in from birth but is a cognitive construct that incorporates cultural values and assumptions. Myers and Biocca (1992) refer to the “elastic body image” and have reported research suggesting that a female’s body image is constructed with reference to a number of source models: the socially represented ideal body, the individual’s internalized ideal body, the present body image and the object body image. Myers and Biocca’s research on female university students concludes that an individual’s body shape perception can be changed with less than 30 minutes’ exposure to television. They also found that young females had a tendency to overestimate their body size, which seemed to indicate that they may have internalized the idealized body image
presented by advertising. The evidence that BIG is cognitively malleable and adapts to cognitive context, seems to justify the assumption that it falls under the characterisation of natural technology presented at the beginning of the present paper. Finally, though there might be psychological benefits from enhancing representations of one’s physical self, there is an inevitable Faustian downside associated with a mechanism that offers this possibility. Negative attitudes to self will be equally capable of producing negative distortions in body image. There is apparently an increase in people suffering from negative body image at present. For example, Gordon (1992) reports that: “A survey of 33,000 women, of varying age and employment levels in the early 1980’s revealed that 75% felt that they were too fat, even though according to conservative weight tables only 25% were actually overweight. The particular body parts that caused the most distress were the thighs, hips and stomach. When asked whether they would be happiest a) losing weight b) hearing from an old friend c) a date with a man you admire or d) success at work; 42% indicated losing weight, 21% indicated dating and 22% indicated work success.” (Gordon, 1992, p. 71)
Another survey of over 1000 high school students in Berkeley, California revealed that “56% of 12th grade girls considered themselves to be overweight, whereas objective measure revealed only 25% to be moderately or extremely overweight.” (Gordon, 1992, p. 80). Gordon (1992, p. 32) comments that: “it is not surprising that disorders of body image, in which people have difficulties seeing themselves accurately, have become rampant”. The present review argues that in making this statement Gordon is confusing the two hypothesized subtypes of BDD. It is not surprising, in an era of mass transmission of images of human perfection, that people might tend to judge themselves harshly, but why they should fail to see themselves accurately requires further explanation. In the investigation described below, it is intended to measure the accuracy with which people judge various aspects of their own body and investigate whether such judgements are correlated with general intelligence and more specific spatial and magnitude judgements. It is anticipated that the data will allow us to decide whether defective representations of body result from generic cognitive inadequacies, or might with more plausibility be attributed to a special purpose module such as BIG. In the following section a full account of the methodology employed to test this hypothesis is reported.
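The logic of the planned comparison can be illustrated with a minimal sketch. It assumes, purely for illustration, that accuracy is expressed as a signed percentage error of each size estimate against the measured size, and that the link between own-body error and object error is summarised by a Pearson correlation. The function names and the example figures are hypothetical and are not taken from the study.

```python
from math import sqrt


def percent_error(estimated_cm, actual_cm):
    """Signed misjudgement as a percentage of the true size.

    Positive values are overestimates, negative values underestimates.
    (Assumed accuracy measure; the text does not specify one at this point.)
    """
    if actual_cm <= 0:
        raise ValueError("actual size must be positive")
    return 100.0 * (estimated_cm - actual_cm) / actual_cm


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: (estimate, actual) in cm for four participants.
own_waist = [(36.0, 30.0), (33.0, 31.0), (29.0, 29.0), (40.0, 32.0)]
table_width = [(61.0, 60.0), (59.0, 60.0), (60.0, 60.0), (62.0, 60.0)]

own_errors = [percent_error(e, a) for e, a in own_waist]
object_errors = [percent_error(e, a) for e, a in table_width]
r = pearson_r(own_errors, object_errors)
```

On this reading, a strong correlation between own-body and object errors would point towards a generic estimation problem, whereas large own-body errors that are unrelated to object errors would be more consistent with a special-purpose mechanism such as BIG.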
4. Method
4.1 Participants
The sample was an opportunity sample in that, though a truly random sampling procedure was not employed, the method of selecting participants does not introduce biases relevant to the hypotheses under investigation. Fifty males and fifty females, ranging in age from 18 to 55, participated in the study. Seventy of the participants were undergraduate or graduate students at Oxford Brookes University. The remaining 30 participants were drawn from people who regularly work out at gymnasiums or were acquaintances of participants already in the experiment.
4.2 Materials
The psychometric tests used are listed and described below. A tripod-mounted Olympus CL 840 1.3 megapixel digital camera was used to produce the photographic images on which size judgements were based. Participants made their judgements whilst viewing digital images displayed on a laptop computer. The objects used in this experiment were a table and a chair, also digitally displayed on a laptop screen. To allow quantification of magnitude judgements, a physical measuring tool (the Estimation Caliper) was designed and constructed specifically for the experiment. The Estimation Caliper and examples of objects used
Figure 1. The Estimation Caliper.
Figure 2. Digital image of table.
Figure 3. Digital image of chair.
in the study are illustrated in Figures 1–3. All photographic images were taken under controlled conditions (e.g., standard lighting, standard background, standard distance of 3 metres from the camera, and with the tripod/camera always set at a standard distance of 133 centimetres from the floor).
4.3 Psychometric tests
Psychometric measures were used to allow quantification of a number of independent variables that the research literature suggests might be associated with body-size misjudgement. Participants completed 5 questionnaires and an interview-style VOSP Battery:
– VOSP (Visual Object/Space Perception Battery)
– BDI (Beck Depression Inventory)
– BDDQ-R (Body Dysmorphic Disorder Questionnaire, revised)
– STAI-X (State-Trait Anxiety Inventory)
– EDI (Eating Disorder Inventory)
– EPI (Eysenck Personality Inventory)
The psychometric measures are described more fully below.
4.3.1 The Visual Object/Space Perception Battery (VOSP)
The VOSP was incorporated into the study to provide assessments of the general competence and accuracy of participants on perceptual tasks. The VOSP includes the following sub-tests:
1. Shape Detection
2. Incomplete Letters
3. Silhouettes
4. Object Decision
5. Progressive Silhouettes
6. Dot Counting
7. Position Discrimination
8. Number Location
9. Cube Analysis
Warrington and James (1991), who developed the VOSP, report that: “The (VOSP) Visual Object and Space Perception Battery consists of eight tests each designed to assess a particular aspect of object or space perception, while minimizing the involvement of other cognitive skills. The VOSP will enable an assessor to compare the scores of a subject with those of a normal control sample and those obtained by patients with right- and left-cerebral lesions. Although a theoretical issue was the original motivation for each of these tests, it was their pragmatic strength in terms of their selectivity and sensitivity that determined their selection for inclusion in the battery. They are all untimed and should be administered at a pace suitable to the individual patient. The tests can be administered singly, in groups, or as a whole battery; and, apart from the initial screening test, in any order. This battery of eight visual object and space perception tests (VOSP) has been developed, validated and standardized in the Psychology Department of the National Hospital for Neurology and Neurosurgery, Queen Square, London. The majority of these tests require very simple responses. Each was devised to focus on one component of visual perception, while minimizing the involvement of other cognitive skills. The tests are all untimed and should be administered at a pace suitable to the individual patient.” (See Appendix F.) (Summary information taken from Warrington and James, 1991)
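The scoring rules given in the sub-test descriptions that follow can be gathered into one small sketch. The maxima and the screening cut-off of 15 come from those descriptions; the maxima for Dot Counting and Position Discrimination are inferred here from the stated number of stimuli, which is an assumption, and all identifiers in the code are hypothetical.

```python
# Maximum score per sub-test, keyed by hypothetical identifiers.
VOSP_MAX = {
    "shape_detection": 20,          # screening test, 20 stimulus items
    "incomplete_letters": 20,
    "silhouettes": 30,
    "object_decision": 20,
    "progressive_silhouettes": 20,  # trials needed (10 + 10); lower is better
    "dot_counting": 10,             # inferred: two arrays each of 5 to 9 dots
    "position_discrimination": 20,  # inferred: 20 stimuli
    "number_location": 10,
    "cube_analysis": 10,
}

# Shape Detection scores below this value count as a failed screening test.
SCREENING_CUTOFF = 15


def passes_screening(shape_detection_score):
    """True if the rest of the battery may be administered."""
    return shape_detection_score >= SCREENING_CUTOFF


def out_of_range(scores):
    """Return the names of sub-tests that are unknown or scored outside 0..max."""
    return sorted(
        name
        for name, value in scores.items()
        if name not in VOSP_MAX or not 0 <= value <= VOSP_MAX[name]
    )
```

A record such as `{"shape_detection": 18, "silhouettes": 24}` would pass screening and contain no out-of-range values; a check of this kind is only a data-entry safeguard and says nothing about clinical interpretation.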
4.3.1.1 VOSP subtest 1: Shape Detection. The test stimuli are random patterns, on half of which a degraded ‘X’ is superimposed. The subject is required to judge whether the ‘X’ is present or absent. The 20 stimulus items are preceded by two practice items (A and B), which are used to explain the task. Warrington and James (1991) advise that if participants score below 15 on this subtest, they should be considered to have failed this screening test and it would therefore be inappropriate to administer the remainder of the VOSP battery.
4.3.1.2 VOSP subtest 2: Incomplete letters. Neuropsychological studies have established that patients with right-hemisphere lesions may have a selective deficit in reading degraded letters. Incomplete letters were constructed by photographing a letter through a random mask so that either 30 per cent or 70 per cent of the letter was obliterated. The test stimuli are 20 stimulus letters
(degraded by 70 per cent) and the two practice items F and B (degraded by 30 per cent) that are used to explain the task. Participants are shown the practice items and asked to name them. The test is abandoned if the participant is unable to name or identify the practice items. Participants are then told that the remaining capital letters are rather more incomplete and asked to name or identify each one. The total number correct (maximum = 20) is recorded (Warrington and James, 1991).
4.3.1.3 VOSP subtest 3: Silhouettes. This subtest is based on findings that recognition of common objects from an unusual view may be selectively impaired in patients with lesions in posterior regions of the right hemisphere. Participants are shown animal silhouettes, told that they are drawings of an animal, and asked to name them (e.g., dog, camel). The silhouettes were constructed from outline drawings of each object rotated through varying degrees from the lateral axis. The test was constructed to be of graded difficulty, ranging from very easy silhouettes that could be identified by all participants to difficult silhouettes that only a proportion of the normal sample identified. The test consists of 15 silhouette drawings of animals and 15 silhouette drawings of inanimate objects. The silhouettes of these two sets are arranged in order of difficulty. The total number of silhouettes named or identified (maximum = 30) is recorded as the score (Warrington and James, 1991).
4.3.1.4 VOSP subtest 4: Object decision. The origin of the Object Decision test was the finding that patients with right hemisphere lesions had a significant selective deficit in the ‘selection’ of the real object when presented with a two-dimensional silhouette drawing of an object together with three nonsense shapes (Warrington and James, 1991).
The test stimuli consisted of two-dimensional silhouette drawings of objects constructed from the original 3D shadow images by tracing the projected outline of the object at an angle of rotation at which approximately 75% of a normal control group could identify it. Distracter items were constructed to be similar object-like shapes but are in fact entirely imaginary. The object decision test consists of 20 arrays, each of which displays one real two-dimensional object together with three distracter items. Participants must point to the real object. The number of correct choices (maximum = 20) is recorded (Warrington and James, 1991).
4.3.1.5 VOSP subtest 5: Progressive silhouettes. A series of 10 silhouettes was constructed by varying the angle of view from 90 degrees rotation to 0 degrees
rotation of the lateral axis. The test consists of two series, a gun and a trumpet. The first silhouette of the series is presented and used to explain the task. Silhouette drawings of an object are presented which become progressively easier to identify. With each new silhouette version the participant is asked to name the object. The numbers of trials required to identify each object are summed and recorded as the score (maximum trials = 10 + 10) (Warrington and James, 1991).
4.3.1.6 VOSP subtest 6: Dot counting. The test stimuli consist of arrays of black dots on a white card. There are two arrays each of five, six, seven, eight and nine dots and each array is randomly arranged. The maximum distance of a dot from the centre of a card was 120 mm and the minimum distance between dots was 10 mm. The first array is used to explain the task and to confirm (Warrington and James, 1991).
4.3.1.7 VOSP subtest 7: Position discrimination. Each test stimulus consists of two adjacent horizontal squares, one with a black dot (5 mm) printed exactly in the centre and one with a black dot just ‘off’ centre. In each of the 20 stimuli the ‘off’ centre dot is in a different position within the square; in ten stimuli the centre dot is in the left square and in ten in the right square. The first test stimulus is used to explain the task. “One of these two dots is exactly in the centre of the square. I want you to point to the dot that is in the centre. If you are not certain I would like you to guess”. Subjects who consistently choose the square to the left or the right should be reminded to “Look at both squares before deciding”. The number of correct choices is recorded (Warrington and James, 1991).
4.3.1.8 VOSP subtest 8: Number location. The ten stimuli consist of two squares (62 mm × 62 mm), one above the other with a small gap between them. The top square contains randomly placed numbers (1–9) and the bottom square a single black dot corresponding to the position of one of the numbers.
The position of the dot is different in each of the stimulus cards and there are four different number arrays. The task is to identify the number that corresponds with the position of the dot. There are two practice stimulus cards that are used to explain the task. “One of the numbers in the square corresponds with the position of the dot in the square; tell me the number that matches the position of the dot”. If there is an error on the first card, the subject should be told the correct number before proceeding to the second practice card and the test should be abandoned if the subject fails to get either practice card at least approximately correct. The ten
test cards are presented and the number selected is noted and the total number of correct responses is recorded (maximum = 10) (Warrington and James, 1991).
4.3.1.9 VOSP subtest 9: Cube analysis. The test stimuli consist of black outline representations of a 3D arrangement of square bricks. There are two practice items, which are used to explain the task, and ten stimuli. The two practice stimuli are representations of three bricks. The ten test stimuli are graded in difficulty by increasing the number of bricks from five up to 12 and by including hidden bricks. The subject is told: “This is a drawing of some solid bricks; how many solid bricks are represented in the drawing?” If an error is made on either of the practice items the task is explained again. The task is abandoned if the subject is unable to count the bricks in both practice items. The ten stimulus items are presented and on the first occasion, if an omission error is made to a ‘hidden’ brick, the subject is asked to “Try again and remember the bricks that are underneath the other bricks”. The subject’s response is noted and the total number of correct ‘counts’ (maximum = 10) is recorded (Warrington and James, 1991).
4.3.2 The Eysenck Personality Inventory (EPI)
The EPI was designed around a theory of personality developed by Hans Eysenck (Eysenck and Eysenck, 1964). This is a nomothetic theory, meaning that it is applicable to all people and more or less rigorously testable. Eysenck believed that all people could be placed on a pair of independent continua, dealing respectively with Extraversion-Introversion and Neuroticism-Stability. He argued that the basis of personality was genetic, and, specifically, that the degree of Extraversion depended crucially upon the level of arousal in the Ascending Reticular Activating System of the brain. Eysenck’s personality inventory can be taken as an example of a personality questionnaire.
It is a psychometric test; it aims to measure particular psychological characteristics. In this case it is measuring extraversion and neuroticism, the two dimensions which Eysenck believed to be sufficient to describe an individual’s personality.
4.3.3 The Eating Disorder Inventory (EDI)
The Eating Disorder Inventory (EDI; Garner, Olmstead and Polivy, 1983; Garner et al., 2003) is a 64-item, 6-point forced-choice inventory assessing several behavioural and psychological traits common in two eating disorders, bulimia and anorexia nervosa. The EDI, a self-report measure, may be utilized as a screening device, outcome measure, or part of typological research. It is not reported to be a diagnostic test for anorexia nervosa or bulimia; rather, it is
designed as a self-report measure of “psychological and behavioural traits common in anorexia nervosa and bulimia.” (Garner et al., ibid.). It can be administered to a population of ages 12 and over. There are 8 subscale scores: drive for thinness, bulimia, body dissatisfaction, ineffectiveness, perfectionism, interpersonal distrust, interoceptive awareness, and maturity fears. The EDI is recommended to delineate subtypes of anorexia nervosa in clinical or research settings, and the average administration time is 15–25 minutes (Garner, Olmstead and Polivy, 1983; Garner et al., 2003).
4.3.4 The State-Trait Anxiety Inventory (STAI-X)
The State-Trait Anxiety Inventory (STAI) was designed as a research instrument for the study of anxiety in adults (Spielberger, 1970). It is a 40-item self-report assessment device, which includes separate measures of state and trait anxiety. According to the author, state anxiety reflects a “transitory emotional state or condition of the human organism that is characterized by subjective, consciously perceived feelings of tension and apprehension, and heightened autonomic nervous system activity.” State anxiety may fluctuate over time and can vary in intensity. In contrast, trait anxiety denotes “relatively stable individual differences in anxiety proneness . . .” and refers to a general tendency to respond with anxiety to perceived threats in the environment. Scores on the STAI have a direct interpretation: high scores on their respective scales mean more trait or state anxiety and low scores mean less. Both percentile ranks and standard (T) scores are available for male and female working adults in three age groups (19–39, 40–49, 50–69), and for male and female high school and college students, male military recruits, male neuropsychiatric patients, male medical patients, and male prison inmates (Spielberger et al., 1983).
4.3.5 The Body Dysmorphic Disorder Questionnaire, revised (BDDQ-R)
BDDQ questions are in a self-report format. The BDDQ mirrors the DSM-IV diagnostic criteria for BDD and scores indicate whether these criteria are met by particular patients. The BDDQ assesses whether BDD is likely to be present. (See Appendix III for an example of a completed BDDQ.) The BDDQ can suggest that BDD is present but cannot provide a definitive diagnosis. The final diagnosis must be determined by a trained clinician in a face-to-face interview (Phillips, 1996b).
4.3.6 The Beck Depression Inventory (BDI)
The original version of the BDI was introduced by Beck, Ward, Mendelson, Mock
and Erbaugh in 1961. The BDI was revised in 1971 (Groth-Marnat, 1990). The original and revised versions have been found to be highly correlated (Lightfoot and Oliver, 1985, cited in Groth-Marnat, 1990). The BDI is a 21-item self-report rating inventory measuring characteristic attitudes and symptoms of depression. Participants require a fifth- to sixth-grade reading age to adequately understand BDI questions and the inventory takes approximately 10 minutes to complete (Groth-Marnat, 1990). The content of the BDI was obtained by consensus from clinicians regarding symptoms of depressed patients (Beck et al., 1961). The revised BDI items are consistent with six of the nine DSM-III categories for the diagnosis of depression (Groth-Marnat, 1990). Each item provides an assessment of a specific component of depression. The 21 components are: 1. Sadness; 2. Pessimism; 3. Sense of failure; 4. Social withdrawal; 5. Guilt; 6. Expectation of punishment; 7. Dislike of self; 8. Self Accusation; 9. Suicidal ideation; 10. Episodes of crying; 11. Indecisiveness; 12. Change in body image; 13. Retardation; 14. Insomnia; 15. Fatigability; 16. Loss of appetite; 17. Dislike of self; 18. Expectation of punishment; 19. Loss of Weight; 20. Loss of Weight; 21. Low level of energy. Scores are summed over the twenty-one questions to obtain the total. The highest score on each of the twenty-one questions is three, so the highest possible total for the whole test is sixty-three. The lowest possible score for the whole test is zero. One score is added per question (the highest rated if more than one option is circled).
Interpretation of depression scores
05–09 Normal fluctuations in affect
10–18 Mild to moderate depression
19–29 Moderate to severe depression
30–63 Severe depression
Below 4 = Possible denial of depression, faking good; this is below usual scores for normals.
Over 40 = This is significantly above even severely depressed persons, suggesting possible exaggeration of depression; possibly characteristic of histrionic or borderline personality disorders. Significant levels of depression are still possible (Groth-Marnat, 1990).
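The scoring and interpretation rules above can be expressed as a short sketch. The bands are those of the table; the function names, the input format (one collection of circled option values per item) and the decision to return no band for totals the table does not cover (0 to 4) are assumptions made for illustration only.

```python
# Score bands from the interpretation table: (low, high, label).
BDI_BANDS = [
    (5, 9, "Normal fluctuations in affect"),
    (10, 18, "Mild to moderate depression"),
    (19, 29, "Moderate to severe depression"),
    (30, 63, "Severe depression"),
]


def bdi_total(item_responses):
    """Sum the 21 items, taking the highest circled option (0-3) per item."""
    if len(item_responses) != 21:
        raise ValueError("the BDI has 21 items")
    total = 0
    for circled in item_responses:
        if not circled or any(value not in (0, 1, 2, 3) for value in circled):
            raise ValueError("each item needs at least one option from 0 to 3")
        total += max(circled)
    return total


def interpret_bdi(total):
    """Return (band label or None, list of cautionary notes) for a total."""
    if not 0 <= total <= 63:
        raise ValueError("total must lie between 0 and 63")
    band = None
    for low, high, label in BDI_BANDS:
        if low <= total <= high:
            band = label
    notes = []
    if total < 4:
        notes.append("possible denial of depression")
    if total > 40:
        notes.append("possible exaggeration of depression")
    return band, notes
```

For instance, a participant who circles both options 1 and 2 on one item and option 0 on every other item receives a total of 2, which falls below the lowest band and attracts the "possible denial" note.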
4.4 Procedure
To impose comparability between judgements of own versus other bodies, all judgements of body size were made using digital photographs taken under standard conditions of illumination and from a standard position and distance. The three classes of object photographed (own-body, other-body, objects) were intended to permit inferences about the specificity of underlying cognitive mechanisms (e.g., are all objects, including human bodies, misjudged in the same way, or do human body judgements differ from judgements of nonhuman objects? Does the accuracy of other-body judgements differ from own-body judgements? etc.). Digital photographs were transferred to a laptop PC and later displayed on the computer screen for participants to make size estimations. To eliminate errors associated with verbal processes or the language used to express judgement, size estimations were made using a caliper purpose-built for the present study. There were two sessions, each of about 45 minutes’ duration. In session 1 participants first had their photographs taken in the standard poses required by the design of the study. Physical measurements such as height and weight were then recorded and finally participants were asked to complete the psychometric tests. A second testing session was then scheduled to occur within 7 days of session 1. In session 2, participants viewed their own images on a laptop screen and were asked to make judgements of physical size using the estimation caliper. They were then asked to look at digital images of another person and of 2 objects, and to make size judgements of those images. Instructions for sessions 1 and 2 are presented below.
4.4.1 Session 1
Each participant was tested individually. Participants were asked to stand with their toes on a line of tape stuck to the floor at 250 cm distance from a digital camera.
The line was located so as to ensure that participants were photographed against a plain background with no features that could be used as cues to size. The position of the digital camera was also marked by a tape line on the floor at 250 cm distance from the mark used to position participants. The digital camera was mounted and fixed on a tripod 150 cm from the floor. The room was illuminated by fluorescent strip lights supplemented by natural light from windows set in the wall behind the camera. The digital camera’s flash facility was used for every photograph. Participants were photographed from two positions: facing the camera (front) and in profile (i.e., facing 90 degrees to
Cognition and body image 193
Figure 4. Digital images of participant in standard poses from front and side (panels: shoulders, waist and thighs, each shown front and side).
the camera: side). Examples of standard photographs of participants appear in Figure 4.
4.4.1.1 Initial instructions to participants. The following instructions were read aloud to each participant:
“Please stand facing the camera behind the line on the floor labelled ‘stand behind this line’. One picture of you will be taken facing the camera and another facing to the side. These pictures will be saved onto a computer and used by you later to make body size estimates of yourself. The estimates will be those of your shoulders, waist, and thighs. Then you will be asked to look at images of another person and to make the same estimates of their shoulders, waist and thighs from the front view and side view. Following this you will be asked to look at digital images of two objects, one of a table and one of a chair, and asked to estimate the width and length of both. All these judgements will be made using a measuring caliper, which the experimenter will show you after taking your two photos. If you have any questions please feel free to ask. If you are uncomfortable with any of these procedures you are free to pull out of the experiment at any time.”
After these instructions had been read, photographs of the participant from front and side were taken.
4.4.1.2 Body height and weight. Height was measured by the experimenter and scales were used to record body weight. From weight and height together, the height:weight ratio can be calculated, which is a reasonable indicator of whether an individual is over- or underweight for his/her age, gender and height.
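As an illustration of the ratio just described, the sketch below divides height by weight. The units (centimetres and kilograms) and the function name `height_weight_ratio` are assumptions made for the example; the text does not state which units were used.

```python
def height_weight_ratio(height_cm: float, weight_kg: float) -> float:
    """Return height divided by weight (here cm per kg; units are assumed)."""
    if height_cm <= 0 or weight_kg <= 0:
        raise ValueError("height and weight must be positive")
    return height_cm / weight_kg


# Hypothetical participant: 170 cm tall, weighing 68 kg.
example_ratio = height_weight_ratio(170, 68)
```

On this definition a larger ratio indicates a person who is lighter for a given height, which needs to be kept in mind when reading the sign of any correlation with the ratio.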
4.4.1.3 Psychometric tests. Participants were seated and asked to complete the five questionnaires and the VOSP interview-style psychometric test. Participants were given standard instructions on how to complete the VOSP before attempting it.
4.4.2 Session 2
4.4.2.1 Own body measurement: Procedure and instructions. Participants were shown on a computer screen the standard digital images of themselves taken in Session 1 and asked to make estimates of body width across shoulders, waist and thighs, first as viewed from the front, and then as viewed from the side. To eliminate problems caused by language and measurement terminology, all estimates were made by adjusting the caliper to match estimated body size and reading off the measurements in centimetres.
Instructions
“Please hold the caliper at arm’s length in front of your body at about the height of your shoulders, and look at the digital images on the computer screen in front of you. You are standing exactly 1 metre away from the screen. Please do not move forward or shift your upper body forward, as this will distort the standard arrangements that have been set for all participants. Try to maintain this same distance throughout the experiment. You are going to be making judgements about the width of your own shoulders, waist and thighs as they appear in the image. Please move the caliper to the width you believe to be correct for these three body dimensions and then tell the experimenter what measurement in centimetres it comes out to on the caliper. After making all three estimates from the front view, please repeat the procedure from the side view, also for width of shoulders, waist and thighs.”
4.4.2.2 Other person measurement: Procedure and instructions. Participants were shown two sets of standard digital images of another person on a computer screen, and asked to make the same three size estimates for each set using the caliper as a measuring instrument.
Instructions
“Please hold the caliper at arm’s length in front of your body at about the height of your shoulders, and look at the digital images on the computer screen in front of you. You are standing exactly 1 metre away from the screen. Please do not move forward or shift your upper body forward, as this will distort the standard arrangements that have been set for all the participants. You are going to be making judgements about the width of this person’s shoulders, waist and thighs as they appear in the image. Please move the caliper to the width you believe to be correct for these three body dimensions and then tell the experimenter what measurement in centimetres it comes out to on the caliper. After making all three estimates from the front view of the person you are looking at on the screen, please do the same from the side view, also for width of shoulders, waist and thighs.”
4.4.2.3 Object measurements: Procedure and instructions. Standardised digital images of two common objects (illustrated in Figures 2 and 3 above) were shown to participants and they were asked to make height and width judgements of each.
Instructions
“Please stand behind the line. This line is exactly 1 metre from the computer screen you are looking at. Please do not move forward or shift your body, as this will distort the standard arrangements that have been set for all participants. Please look at each of the images and, by adjusting the measuring caliper, try to estimate how wide you think the chair is and its height from the ground. Then read out the measurement in centimetres as marked on the caliper. Finally, please make similar size estimates for the table. These instructions are also written on the screen in front of you. If you are uncomfortable with these proceedings or want to stop at any time you are free to do so.”
5. Results
Table 5.1 shows that there were significant negative correlations between Eating Disorders Inventory (EDI) & Self Error, and between EDI & Height-Weight Ratio. There were also significant correlations between BDD-R scores and both Other Error and Average Error. Negative correlations are reported between Gender and both Self Error & Other Error. Significant correlations were observed between Overall Front Self Error and the EDI subscales for Drive for Thinness & Body Dissatisfaction. These correlations are presented in Table 5.2 below. There were also significant correlations between Overall Self Error (average of Front & Side) and the EDI subscales for Body Dissatisfaction, Drive for Thinness, & Perfectionism. These 3 subscales are related to body image concern, but not necessarily to Eating Disorders.
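For readers less familiar with the statistic reported throughout the tables, the following is a minimal sketch of how a Pearson correlation and its associated t statistic can be computed. It is not the analysis software used in the study, which is not named here; the function names are illustrative, and only the sample size n = 100 is taken from the table keys.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value (n - 2 degrees of freedom) for testing r against zero."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| >= 1")
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With n = 100, a correlation of magnitude 0.26 corresponds to a t value of roughly 2.67 on 98 degrees of freedom; the significance markers in the table keys would then follow from comparing such a value with the t distribution.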
Self Error
r = −0.260* r = 0.049 r = −0.008 r = 0.013 r = 0.152 r = 0.148 r = −0.436* r = 0.136 r = −0.016 r = 0.006
Key to Table r/−r = Pearson correlation; n = 100 p < 0.05, p < 0.01*, p < 0.005**
Mean Eating Disorders Inventory
Mean Visual Object & Spatial Perception
Beck Depression Inventory
Body Dysmorphic Disorder Questionnaire
State Anxiety
Trait Anxiety
Gender
Eysenck Personality Inventory (Psychosis)
Eysenck Personality Inventory (Neurosis)
Eysenck Personality Inventory (Lie Scale)
r = −0.088
r = 0.140
r = −0.116
r = 0.604*
r = 0.118
r = 0.107
r = 0.192**
r = 0.130
r = 0.055
r = 0.090
Other Error
r = 0.145
r = −0.029
r = −0.054
r = 0.044
r = 0.088
r = 0.089
r = 0.008
r = 0.046
r = −0.029
r = −0.082
Object Error
r = 0.025
r = 0.079
r = −0.046
r = 0.244
r = 0.219
r = 0.213
r = 0.154**
r = 0.120
r = 0.049
r = −0.121
Average Error
r = −0.036
r = −0.014
r = −0.023
r = 0.000
r = −0.060
r = 0.004
r = 0.163
r = −0.020
r = 0.141
r = 0.260*
Height:Weight Ratio
Table 5.1. Correlations between a range of test battery scores and errors in estimating the size of own body dimensions, object size, and the body dimensions of other people. Correlations between test scores and the height:weight ratio of participants also appear in the table.
r = 0.044
r = −0.247
Front Self Error Shoulders r = −0.140
r = −0.295**
r = −0.184
r = −0.295**
r = 0.010
r = −0.258**
r = −0.316**
r = −0.168
r = −0.318**
r = −0.002
r = 0.109
Front Self Error Waist
Front Self Error Thighs
Overall Front Self Error
Side Self Shoulders Error
Side Self Waist Error
Side Self Thighs Error
Overall Self Side Error
Overall Self Error (side & front)
Object Error Chair
Object Error Table
r = −0.058
r = 0.095
r = 0.035
r = −0.267**
r = −0.160
r = 0.009
r = −0.079
r = −0.239
r = −0.197
r = 0.086
r = −0.316**
r = −0.249
r = −0.229
r = −0.157
r = −0.104
r = −0.204
r = −0.142
r = −0.051
r = 0.184
r = −0.262**
r = 0.161
r = −0.139
r = −0.160
r = −0.232
r = 0.083
r = −0.084
r = −0.062
r = 0.270**
r = −0.205
r = −0.073
r = −0.074
r = 0.148
r = 0.094
r = 0.219
r = 0.151
r = −0.220 r = −0.312**
r = −0.022
r = 0.097
r = 0.080
r = 0.100
r = 0.021
r = −0.077
r = −0.175
r = −0.127
r = −0.245
r = −0.012
r = −0.080
r = −0.049
r = −0.071
r = −0.045
r = −0.106
r = −0.207
r = 0.109
r = −0.047
r = −0.016
r = −0.091
r = −0.004
r = 0.010
r = −0.060
r = 0.100
r = 0.114
r = −0.112
r = −0.079
r = 0.271**
r = 0.106
r = −0.069
r = −0.022
r = 0.293**
EDI EDI EDI EDI EDI EDI (Body (Ineffectiveness) (Perfectionism) (Interpersonal (Interoceptive (Maturity Fear) Dissatisfaction) Distrust) Awareness)
r = 0.061
r = −0.260**
r = −0.186
r = 0.112
r = −0.135
r = −0.106
EDI (Bulimia)
EDI Key to Table r/−r = Pearson correlation (Drive for Thinness) n = 100 p < 0.05 , p < 0.01*, p < 0.005**
Table 5.2. Correlation between EDI subscales and error scores when judging one’s own body size and the size of physical objects.
As Table 5.3 shows, there were significant correlations between Overall Front-Other Error & the EDI subscales Drive for Thinness, Body Dissatisfaction & Interpersonal Distrust. Correlations were also observed between Overall-Other Error & the EDI subscale of Interpersonal Distrust. Table 5.4 reports significant correlations between the VOSP Position Discrimination subscale & Front-Self Shoulder errors, Front-Self Waist errors and Side-Self Shoulders errors. There were also significant correlations between the VOSP Number Location subscale and both Front-Self Shoulder errors and Side-Self Shoulders errors. Finally, there was a significant correlation between the VOSP Dot Counting subscale and Front-Self Shoulder errors. Table 5.5 shows that there were no significant correlations between scores on any VOSP subscale and errors in making judgements about the body size of other people. Similarly, Table 5.6 shows that no significant correlations were observed between errors in judging the size of one’s own body or the size of objects and scores on the Beck Depression Inventory, the Body Dysmorphic Disorder Questionnaire-Revised, the State-Trait Anxiety Inventory or the Eysenck Personality Inventory. As with judgements of self, so with judgements of others: Table 5.7 shows that there were no significant correlations between errors in judging the size of other people and scores on the Beck Depression Inventory, the Body Dysmorphic Disorder Questionnaire-Revised, the State-Trait Anxiety Inventory or the Eysenck Personality Inventory.
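The error scores correlated in these tables are percent errors, with summary scores formed by averaging over shoulders, waist and thighs. A minimal sketch of such scoring follows. The signed formula (estimate minus actual, divided by actual) is an assumption chosen so that underestimation yields a negative value; the function names and the sample measurements are hypothetical.

```python
from statistics import mean
from typing import Dict


def percent_error(estimate_cm: float, actual_cm: float) -> float:
    """Signed percent error: negative values indicate underestimation."""
    if actual_cm <= 0:
        raise ValueError("actual size must be positive")
    return 100.0 * (estimate_cm - actual_cm) / actual_cm


def average_error(estimates: Dict[str, float], actuals: Dict[str, float]) -> float:
    """Mean percent error over the body sites listed in `actuals`."""
    return mean(percent_error(estimates[site], actuals[site]) for site in actuals)


# Hypothetical frontal estimates and actual widths, in centimetres.
front_estimates = {"shoulders": 36.0, "waist": 27.0, "thighs": 50.0}
front_actuals = {"shoulders": 40.0, "waist": 30.0, "thighs": 50.0}
self_front_error = average_error(front_estimates, front_actuals)
```

In this made-up case the shoulder and waist estimates are each 10% too small and the thigh estimate is exact, so the averaged frontal error is about −6.7%.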
Psychometric measures were treated as independent variables in interpreting the data and estimation error scores were treated as dependent variables. Estimation error scores consisted of variables directly measured, e.g., Self-Front Shoulder error refers to percent error in frontal estimation of shoulder width. Averaging errors in frontal estimation of shoulders, waist and thighs yields Self-Front Error; averaging profile estimation of shoulders, waist and thighs yields Self-Side Error, etc. Pearson correlations were used to check that individual variables over which averages were computed did not behave differently from the average. For example, when a correlation between Height/Weight Ratio and the summary variable TotalSelf Error is reported, it was established that the correlation with the summary variable was a reasonable reflection of the correlation with the three variables over which the average was computed (the operational criterion for this was that the
r = 0.143
r = 0.249
r = 0.159
Overall Other Error
r = 0.158
r = 0.083
r = 0.058
r = 0.330**
Overall Side Other Error r = −0.063 r = 0.056
r = 0.033
r = 0.011
r = −0.137
r = −0.062 r = 0.129
Side Other Error Waist
r = 0.179
r = 0.143
r = 0.004
r = 0.275**
r = 0.057
Side Other Error Thighs r = −0.056 r = −0.061 r = −0.059
r = 0.000
Side Other Error Shoulders
r = 0.070
r = 0.322**
Overall Front Other Error r = 0.288** r = 0.189
Front Other Error Thighs r = 0.251 r = 0.308**
r = 0.316
Front Other Error Waist r = 0.300** r = 0.233
r = 0.133
r = 0.134
r = 0.012
r = −0.103
r = −0.132
r = −0.093
r = 0.044
r = 0.105
r = 0.148
r = 0.059
r = 0.042
r = −0.073
r = −0.103
r = −0.002
r = −0.028
r = −0.420** r = −0.047
r = −0.157
r = −0.056
r = −0.138
r = −0.135
r = −0.492** r = −0.009
r = −0.319** r = −0.021
r = −0.483** r = 0.038
r = −0.348** r = −0.036
r = −0.046
r = −0.011
r = −0.156
r = 0.074
r = −0.028
r = −0.060
r = −0.053
r = −0.127
r = 0.036
EDI EDI EDI EDI EDI EDI (Body (Ineffectiveness) (Perfectionism) (Interpersonal (Interoceptive (Maturity Dissatisfaction) Distrust) Awareness) Fear)
r = 0.078
Front Other Error Shoulder r = 0.128
EDI EDI Key to Table r/−r = Pearson correlation (Drive for (Bulimia) Thinness) n = 100 p < 0.05, p < 0.01*, p < 0.005**
Table 5.3. Correlation between EDI subscales and error scores when judging the dimensions of other people’s bodies.
r = 0.090
r = −0.037
r = −0.163
r = 0.051
r = 0.080
r = 0.165
r = 0.159
r = 0.178
r = 0.053
EDI Average
r = −0.080
r = −0.043
r = −0.0114 r = 0.047
Side Other Error Thighs R = 0.097
Overall Side Other Error R = 0.074
Overall Other Error
r = 0.115
r = 0.039
r = 0.089
r = −0.009
r = 0.115
R = 0.037
R = 0.060
r = −0.077
r = −0.064
r = 0.035
r = 0.104
r = 0.122
r = 0.079
r = −0.012
r = −0.028 r = −0.037
Side Other Error Waist
r = −0.133
Overall Front Other Error R = 0.027
r = 0.016
R = −0.003 r = 0.020
r = −0.097
Front Other Error Thighs R = 0.040
r = 0.028
Side Other Error Shoulders
r = −0.082
Front Other Error Waist R = 0.027
r = −0.103 r = 0.049
r = −0.155
r = −0.019
r = −0.003
r = −0.016 r = 0.010
r = 0.074
r = −0.072 r = 0.137
r = 0.184
r = 0.066
r = −0.087 r = 0.016
r = −0.070 r = 0.055
r = −0.048 r = −0.092
r = −0.085 r = 0.072
r = −0.059
r = −0.014
r = −0.078
r = 0.114
r = −0.055
r = −0.076
r = −0.084
r = 0.029
r = −0.118
VOSP VOSP VOSP VOSP VOSP VOSP (Progressive (Shape (Incomplete (Dot (Position (Number Silhouettes) detection) letters) Counting) Discrimination)Location)
Front Other Error Shoulders r = −0.001 r = −0.131
VOSP Key to Table r/r = Pearson correlation (Object Decision) n = 100 p < 0.05, p < 0.01*, p < 0.005**
r = −0.087
r = −0.054
r = −0.003
r = −0.019
r = −0.100
r = −0.084
r = −0.055
r = −0.025
r = −0.113
VOSP (Cube Analysis)
r = −0.055
r = −0.058
r = 0.042
r = −0.134
r = −0.042
r = −0.033
r = 0.017
r = −0.004
r = −0.085
r = 0.055
r = 0.104
r = 0.118
r = 0.062
r = 0.012
r = −0.006
r = −0.018
r = −0.010
r = −0.021
VOSP VOSP (Silhouettes) (Average)
Table 5.4. Correlations between scores on the Visual Object & Spatial Perception Battery (VOSP) and errors in estimating one’s own body dimensions.
r = −0.063 r = −0.071
r = 0.068
r = 0.095
r = −0.004
r = 0.061
r = −0.074 r = 0.076
r = 0.197
r = 0.126
r = 0.224
Overall Front Self Error
Side Self Shoulders Error r = 0.237
r = 0.133
Front Self Error Thighs
Side Self Waist Error
Side Self Thighs Error
Overall Self Side Error
Overall Self Error
Object Error
r = 0.125
r = 0.092
r = 0.072
r = −0.010 r = 0.154
r = 0.093
Front Self Error Shoulders r = 0.192
Front Self Error Waist
VOSP (Progressive Silhouettes)
VOSP Key to Table r/−r = Pearson correlation (Object Decision) n = 100 p < 0.05, p < 0.01*, p < 0.005**
r = 0.122
r = 0.100
r = −0.160 r = −0.014
r = −0.077 r = 0.235
r = −0.111 r = 0.221
r = −0.030 r = 0.142
r = −0.101 r = 0.139
r = −0.095 r = 0.209
r = −0.020 r = 0.157
r = 0.073
r = 0.041
r = −0.148 r = 0.091
VOSP VOSP (Shape (Incomplete detection) letters)
r = 0.287*
r = −0.095
r = −0.137 r = 0.140
r = −0.014 r = 0.000
r = 0.021
r = −0.117 r = 0.229
r = −0.170 r = 0.098
r = 0.190
r = −0.005 r = 0.013
r = −0.086 r = 0.107
r = −0.199 r = 0.260*
r = 0.233* r = 0.287*
r = 0.001
r = 0.018
r = 0.129
r = 0.153
r = −0.011
r = 0.091
r = −0.012
r = −0.107 r = −0.042
r = 0.066
r = −0.077 r = 0.007
r = 0.017
r = 0.072
r = −0.192 r = −0.004 r = −0.068
r = −0.175 r = −0.010 r = 0.016
r = 0.296* r = 0.113
r = 0.027
r = −0.029
r = 0.049
r = 0.114
r = −0.107
r = −0.029
r = 0.232
r = 0.014
r = −0.033
r = −0.105
r = 0.142
VOSP VOSP (Silhouettes) (Average)
r = −0.148 r = −0.052 r = −0.001
r = −0.122 r = 0.106
r = 0.304* r = 0.109
VOSP VOSP VOSP VOSP (Dot (Position (Number (Cube Counting) Discrimination) Location) Analysis)
Table 5.5. Correlations between Visual Object & Spatial Perception Battery (VOSP) scores and errors in judging the bodily dimensions of others.
r = −0.078
r = −0.019
r = 0.049
r = −0.007
r = −0.042
r = 0.020
r = −0.008
r = 0.046
Self Front Error Thighs
Overall Front Self Error
Side Self Shoulders Error
Side Self Waist Error
Side Self Thighs Error
Overall Self Side Error
Overall Self Error
Object Error
r = 0.008
r = 0.013
r = 0.079
r = 0.098
r = 0.022
r = 0.082
r = −0.086
r = −0.145
r = −0.007
r = −0.132
Self Front Error Waist
r = 0.089
r = 0.152
r = 0.176
r = 0.149
r = 0.125
r = 0.147
r = 0.049
r = 0.046
r = −0.005
r = 0.088
r = 0.109
r = 0.120
r = 0.112
r = 0.128
r = 0.065
r = 0.109
r=
r = 0.047
r = 0.050
r = 0.000
Self Front Error Shoulders r = 0.148 r = 0.048
Trait Anxiety
Beck Depression Body State Anxiety Key to Table r/−r = Pearson correlation Inventory Dysmorphic Disorder n = 100 Questionnaire p < 0.05, p < 0.01*, p < 0.005**
r = −0.054
r = 0.136
r = 0.138
r = 0.043
r = 0.116
r = 0.122
r = 0.095
r = 0.070
r = 0.019
r = 0.090
r = −0.029
r = −0.016
r = −0.034
r = −0.088
r = −0.093
r = 0.038
r = 0.036
r = −0.010
r = −0.066
r = 0.128
r = 0.145
r = 0.006
r = 0.014
r = 0.014
r = −0.062
r = 0.063
r = −0.008
r = −0.125
r = 0.039
r = 0.093
Eysenck Personality Eysenck Personality Eysenck Inventory (Psychosis) Inventory Personality (Neurosis) Inventory (Lie Scale)
Table 5.6. Correlations between various measures of psychopathology and errors in judging one’s own bodily dimensions and in judging the dimensions of objects.
r = 0.067
r = 0.064
Front Other Error Waist
Front Other Error Thighs
r = 0.237
r = 0.157
Side Other Error Shoulders r = 0.159
r = 0.233
r = −0.142
r = 0.097
r = 0.130
r = 0.046
Side Other Error Waist
Side Other Error Thighs
Overall Side Other Error
Overall Other Error
Overall Object Error
r = 0.008
r = 0.192
r = 0.252
r = 0.128
r = 0.071
Overall Other Front Error r = 0.111
r = −0.050
r = 0.127
r = 0.089
r = 0.107
r = 0.102
r = 0.051
r = 0.090
r = 0.066
r = 0.072
r = −0.004
r = 0.127
r = 0.045
r = 0.127
Front Other Error Shoulders
r = 0.087
State Anxiety
Beck Depression Body Dysmorphic Key to Table r/−r = Pearson correlation Inventory Disorder Questionnaire n = 100 p < 0.05, p < 0.01*, p < 0.005**
r = 0.088
r = 0.118
r = 0.079
r = 0.005
r = 0.086
r = 0.080
r = 0.108
r = 0.016
r = 0.135
r = 0.101
Trait Anxiety
r = −0.054
r = −0.116
r = −0.016
r = 0.032
r = −0.028
r = −0.049
r = −0.159
r = −0.023
r = −0.187
r = −0.159
r = −0.029
r = 0.140
r = 0.091
r = 0.059
r = 0.032
r = 0.097
r = 0.132
r = 0.155
r = 0.114
r = 0.043
r = 0.145
r = −0.088
r = −0.077
r = −0.038
r = −0.004
r = −0.123
r = −0.066
r = −0.099
r = −0.012
r = −0.043
Eysenck Personality Eysenck Personality Eysenck Disorder Inventory Disorder Inventory Personality (Psychosis) (Neurosis) Disorder Inventory (Lie Scale)
Table 5.7. Correlations between various measures of psychopathology and errors in judging the bodily dimensions of other people.
Pearson r value for the average did not differ by more than 0.3 from the r value for any of the components of the average). Pearson correlations were also used as a filter to select those independent variables showing an association with the “summary” dependent variables. Observed correlations between subscales of the Eating Disorders Inventory and average overall error scores are reported in Table 5.8, and correlations between overall averages for other measures of psychopathology and average overall error scores are reported in Table 5.9 below.
5.1 Outcome of factor analysis
Fourteen variables were entered into the factor analysis (Age, Gender, Height/Weight Ratio, Body Dissatisfaction Inventory, Body Dysmorphic Disorders Questionnaire, Eating Disorders Inventory striving for perfection subscale, Eating Disorders Inventory body dissatisfaction subscale, Eating Disorders Inventory drive for thinness subscale, Average Visual Object and Space Perception Test score, Average Self-front error, Average Self-side error, Average Other-front error, Average Other-side error, and Average Object Error). A Principal Components Analysis with Varimax rotation extracted 5 factors with an eigenvalue greater than 1.0, which together accounted for 63.6% of the variance. Factor naming and interpretation were based upon variables with factor loadings of greater than 0.4. Loadings meeting this criterion are summarised in Table 5.10, and the variance explained by each factor is illustrated in Figure 5. These factors are briefly discussed below.
Factor 1 (24.3% of variance) Negative female body image
The variables loading onto this factor were: Gender (0.9), Eating Disorders Inventory striving for perfection subscale (0.6), Eating Disorders Inventory
Figure 5. Scree plot of percent of variance explained by Factors 1–5.
p < −0.01
p < −0.01**
p < 0.01**
p < 0.01**
p < −0.01**
EDI EDI (Perfectionism) (ID)
Overall Self Error (Side & Front)
p < −0.01**
p < −0.01**
EDI EDI (Body (I) Dissatisfaction)
p < 0.01**
p < 0.01**
Side Self Thighs Error
EDI (Bulimia)
Overall Side Self Error
p < −0.01**
p < −0.01**
p < −0.01**
Side Self Waist Error
Side Self Shoulders Error
Overall Front Self Error
Front Self Thighs Error
Front Self Waist Error
Front Self Shoulders Error
EDI (Drive for Thinness)
EDI (IA)
p < 0.01**
p < 0.01**
EDI (Maturity Fear)
Table 5.8. Significant correlations between subscales of the Eating Disorders Inventory and average overall error scores.
p < −0.01**
p < −0.01**
p < −0.01**
p < −.01**
EDI Mean
Gender
Trait Anxiety
State Anxiety
Body Dysmorphic Disorder-Q
Beck Depression Inventory
Average Visual Object & Space Perception Battery
p < −0.01**
Eating Disorders Inventory Average p < −0.01**
Self Error
p < 0.01**
p < 0.005***
Other Error
Object Error
p < 0.01**
Average Error p < 0.01**
Height:Weight Ratio
Table 5.9. Significant correlations between overall averages for various measures of psychopathology and average overall error scores.
0.779
Average Side Other Error
Object Error
0.772
Average Front Other Error
0.762
Average Self Side Error
0.760
Average Visual Object & Spatial Perception Battery
0.715
0.633
Height:Weight Ratio
Average Front Self Error
0.461
0.644
−0.431
Factor 4
Eating Disorder Inventory Average
Factor 3
0.790
0.887
Posneg
0.631
Factor 2
Beck Depression Inventory
−0.507
Gender
Age
Factor 1
Factor Loadings > 0.4
Table 5.10. Factor loadings with Varimax rotation.
0.956
Factor 5
body dissatisfaction subscale (0.6), Eating Disorders Inventory drive for thinness subscale (0.6), Average Self-front error (−0.5), Average Self-side error (−0.7), and Average Other-front error (0.7). The most satisfactory interpretation of this factor seems to be that women who strive for perfection, who are dissatisfied with their body characteristics and who have a high drive towards thinness tend to underestimate their own bodily dimensions. [N.B. The mean value for the self-estimation variables is negative (Front: −0.8; Side: −9.5), so a negative factor loading implies that as values on other variables on which the factor loads positively increase, so scores on this variable get smaller, i.e. become more negative.] There is also an association with overestimation of the dimensions of other people when viewing them frontally.
Factor 2 (11.9% of variance) Age-dependent body-dissatisfaction
The variables loading onto this factor were: Age (−0.5), Body Dissatisfaction Index (0.7), Body Dysmorphic Disorders Questionnaire (0.7), and Eating Disorders Inventory body dissatisfaction subscale (0.4). The factor interpretation seems to be that the younger a person is, the more dissatisfied they are with the physical parameters of their own body.
Factor 3 (10.1% of variance) Self-Objectivity
The variables loading onto this factor were: Height/Weight Ratio (0.8), Eating Disorders Inventory body dissatisfaction subscale (0.5), Eating Disorders Inventory drive for thinness subscale (0.5), and Average Visual Object and Space Perception Test score (0.5). Factor 3 seems to indicate that people who are overweight, and who perceive object and spatial relations accurately, tend to have high body dissatisfaction and a high drive for thinness.
Factor 4 (9.3% of variance) Inattention to body-size of others
The variables loading onto this factor were: Eating Disorders Inventory striving for perfection subscale (−0.4), Average Other-front error (0.4), and Average Other-side error (0.9). This factor indicates that people who do not strive for perfection tend to make errors in estimating the physical size of other people’s bodies.
Factor 5 (8.1% of variance) Accuracy orientation
The variables loading onto this factor were: Eating Disorders Inventory striving for perfection subscale (0.6), Average Visual Object and Space Perception Test score (0.5), and Average Object Error (−0.8). The most satisfactory interpretation for Factor 5 would seem to be that people who strive for perfection also tend to accurately perceive objects and spatial relations, and to make few errors in estimating the physical size of objects.
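The percentages of variance reported for the five factors can be related back to the eigenvalue criterion used for extraction. The sketch below assumes the Principal Components Analysis was run on standardised variables (a correlation matrix), so that total variance equals the number of variables (14); under that assumption, which the text does not state explicitly, each eigenvalue is the factor's share of variance multiplied by 14. The names used are illustrative.

```python
N_VARIABLES = 14  # variables entered into the factor analysis

# Percent of variance per factor, as reported in the text.
PERCENT_VARIANCE = {
    "Factor 1": 24.3,
    "Factor 2": 11.9,
    "Factor 3": 10.1,
    "Factor 4": 9.3,
    "Factor 5": 8.1,
}


def implied_eigenvalue(percent: float, n_variables: int = N_VARIABLES) -> float:
    """Eigenvalue implied by a percent-of-variance figure (standardised data)."""
    return percent / 100.0 * n_variables


eigenvalues = {name: implied_eigenvalue(p) for name, p in PERCENT_VARIANCE.items()}

# Eigenvalue-greater-than-one rule: keep factors whose eigenvalue exceeds 1.0.
retained = [name for name, value in eigenvalues.items() if value > 1.0]

cumulative_percent = sum(PERCENT_VARIANCE.values())
```

Under this assumption the implied eigenvalues run from about 3.40 for Factor 1 down to about 1.13 for Factor 5, so all five pass the threshold of 1.0. The rounded percentages sum to 63.7, which agrees with the reported 63.6% to within rounding.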
6. Discussion
The investigation described in the present report had a broad as well as a narrow purpose. The narrow purpose was to examine experimentally the hypothesis that cognitive representations of the physical self are mediated by mechanisms that are psychologically and neurologically independent of the cognitive operations that model the physical world of space and objects. The broader purpose was to seek evidence that natural technology exists. The rationale here was to consider a cognitive function that seems as basic and fundamental as a cognitive function can be: the set of processes by which a human organism represents itself as a physical and social agent. The survival of an organism depends crucially upon the accuracy and validity of the planning processes that underlie its ability to act in and upon the physical world. In representing the physical self, if anywhere, it might be expected that evolution would have ensured that underlying processes are “user-proof”. Just as physical processes, such as the control of heart or kidney function, are not susceptible to voluntary control, so, if cognitive processes are “hardwired”, self-representation by an agent is so cognitively fundamental that user intervention would surely be impossible.
We have argued for an alternative view. Effective human action depends not only upon efficient and accurate computational processes, but also upon motivational factors involving relatively abstract conceptions such as self-worth. Why should agents who attach no value to their own life develop or implement a plan to avoid death at the hands of a predator or an enemy? In human society self-worth is intimately bound up with conceptions of one’s physical self. Attractiveness to others is not the sole determinant of self-worth, but it is plausible to suppose that it makes an important contribution.
This line of thinking suggests why it might be important that representations of physical self should be cognitively mutable: human agents may sometimes need to see themselves not as others see them. If physical self were just one more spatial object, represented within a general cognitive system for modelling the physical world, it would be difficult to misrepresent the self without constantly calling attention to the deceit by over-riding normal scaling operations whenever the self was the object of cognition. If self-deception were the objective, this process would be self-defeating. Whilst it might be possible to achieve the same end by introducing appropriate distortions into all spatial and object modelling, there would be a high price to pay for this, as all action plans would then be computed from spatial models with reduced validity.
Given the premiss that misrepresentation of physical self can be cognitively desirable under some circumstances, the most obvious way of achieving this end is by developing a special-purpose modelling system that incorporates exactly the sought distortions, but applies them to no object other than the self. However, except in cases such as hereditary disorders, the misrepresentations required will almost always depend upon contingencies within an individual’s life and cultural context: becoming overweight in a culture valuing youth-like slimness; facial disfigurement in a culture valuing intactness and beauty, and so forth. The self-modelling distortions necessary to conceal or eliminate such aberrations could not be known in advance by evolution, nor could the aesthetic values arising from cultural context. Hence, the modelling process must necessarily be developed within the cognitive lifetime of the agent. It follows that the self-modelling process cannot be hardwired and must instead use cognitive software; in other words, it must constitute an example of what we have referred to as natural technology.
What then is to be expected in the data if cognitive representation of physical self is hardwired (or at least, not susceptible to distortion by psychological factors)? And what is to be expected if the system for self-representation has developed with an inbuilt capability to incorporate culturally desirable distortions?
If self-representation is not a special-purpose module, then errors in judgements concerning the physical parameters of one’s own body should be highly correlated with errors in judgement concerning the bodies of others, and with errors in judging the spatial characteristics of inanimate objects. Judgements about body and object size should also be positively correlated with measures of spatial intelligence and measures of perceptual accuracy. None of these things should show any correlation with personality variables such as anxiety or depression scores, or with psychometric measures of attitudes to self or one’s own body. When the data is subjected to factor analysis, accuracy of size and spatial judgements should load onto one factor that is entirely independent of personality and attitude variables. If the natural technology hypothesis is correct, the expected outcomes are quite different. Judgements about physical self should show no or low correlations with judgements about the bodily dimensions of other people and physical objects. Because the self-representation module is hypothesized to be
sensitive to psychological processes (indeed this is its raison d’être), a positive correlation with personality and attitude measures is to be expected. Factor analysis should yield a rather more complex factor solution, with physical self-judgements loading onto the same factor as some personality measures but showing independence from factors associated with size judgements about objects and other people.

The most striking feature of the results we have reported is the outcome of the factor analysis, which suggests that errors in judging one’s own physical dimensions from a photograph, errors in judging the size of other people, and errors in judging the size of objects such as chairs and tables are assigned to quite separate and independent factors. High error scores in judging one’s own bodily dimensions are associated with being female (0.9), with body dissatisfaction (0.6) and with a drive for thinness (0.6), particularly amongst participants who strive for perfection (0.6) [Factor 1, see Table 5.8 above]. Inaccuracy in making size judgements of others’ bodies (Front 0.4; Side 0.9) is also associated with striving for perfection, but this time the association is negative (−0.4) [Factor 4, see Table 5.8 above]: the lower participants score on striving for perfection items, the less accurate their judgement of others tends to be. Finally, error in judging the size of objects (−0.8) is negatively associated with VOSP scores (0.5) and with striving for perfection (0.6) [Factor 5, see Table 5.8 above]; in other words, accurate judgement of object size goes with high VOSP scores and high perfectionism scores. Factor 5 is perhaps least surprising: to the extent that VOSP scores measure what they are intended to measure, namely accurate perception of objects and spatial relations, this is precisely the outcome to be expected. It seems likely that the association with the striving for perfection subscale of the EDI reflects the importance that participants attach to making accurate judgements.
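The contrasting predictions set out above (a unitary perceptual system versus a special-purpose self-representation module) can be illustrated with simulated data. The sketch below is not the analysis used in the study; the latent variables, noise levels, sample size and function names are all hypothetical choices made purely for illustration. Under the "unitary" assumption self-size errors share a source with object-size errors; under the "module" assumption they share a source with a body-related attitude score instead.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(model, n=2000, seed=0):
    """Simulate error scores under one of two hypothetical models.

    model="unitary": self-size errors come from general perceptual inaccuracy.
    model="module":  self-size errors come from a body-related attitude.
    All distributions and weights are illustrative assumptions.
    """
    if model not in ("unitary", "module"):
        raise ValueError("model must be 'unitary' or 'module'")
    rng = random.Random(seed)
    self_err, object_err, attitude = [], [], []
    for _ in range(n):
        perceptual = rng.gauss(0, 1)  # hypothetical general perceptual inaccuracy
        att = rng.gauss(0, 1)         # hypothetical body-related attitude score
        object_err.append(perceptual + 0.5 * rng.gauss(0, 1))
        source = perceptual if model == "unitary" else att
        self_err.append(source + 0.5 * rng.gauss(0, 1))
        attitude.append(att)
    return {
        "self_vs_object": pearson(self_err, object_err),
        "self_vs_attitude": pearson(self_err, attitude),
    }


if __name__ == "__main__":
    for name in ("unitary", "module"):
        print(name, simulate(name))
```

In this toy setting the unitary model yields a high correlation between self-size and object-size errors and almost none with the attitude score, while the module model reverses the pattern, which is the qualitative contrast the two hypotheses predict.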
If people accurately perceive objects and spatial relations, and accuracy is important to them, then they make relatively few errors in judging object dimensions. Conventional theories of size estimation, assuming a unitary perceptual apparatus, applied in a standard manner to all classes of input, would predict the same negative association between VOSP scores and errors for judgements of self-size and other-size. The fact that VOSP scores showed little or no association with error scores in either case (see Tables 5.4 and 5.5) suggests that there are important differences in the cognitive operations underlying perceptual judgements about objects, and perceptual judgements about people. Size estimation requires a human judge to cognitively represent the person or object depicted in a photograph, to retrieve from memory an object of known size, to adjust the two cognitive representations to the same scale, and
finally to read off the size of the unknown parameters in the perceptual representation. The absence of a VOSP connection with errors in estimating the physical parameters of people suggests that individual differences in error rate between participants arise from the cognitive components of the operation (retrieval and comparison), rather than the perceptual component. The accuracy of the scaling operation seems to be related to the striving for perfection subscale of the EDI. Adult participants are highly skilled at perceptual scaling, so the observed error rate may well reflect the effort that they are prepared to expend in achieving accuracy. In estimating object size, if they have accurately perceived and represented the object, and they strive for accuracy in judgement, then low error scores result. In judgements about other people, judgements are accurate except amongst participants who are not predisposed to strive for accuracy. If this account is correct, then judgement errors about objects and judgement errors about others can be explained by the same basic cognitive process, with observed differences in performance resulting from a greater contribution of perceptual factors when objects are judged, and from differences in the extent to which participants strive for accuracy between the two cases. When judging objects, people who try atypically hard make fewer errors than average; when judging other people, participants who do not strive for accuracy make more errors than average.

The anomalous case is the error rate in conditions in which people make size judgements about their own body. In this condition, the tendency to underestimate one’s own body parameters is associated with female participants in particular. There is no association with VOSP scores, so the errors do not result from faulty perception or spatial judgement.
There is a positive association with striving for perfection, so that if this variable can legitimately be interpreted as an index of the effort applied in trying to judge accurately, it would appear that the comparison component of the represent, retrieve, compare model is not the source of the increased error rate. It seems more likely that the errors in self-estimation are related to the retrieval component. If the retrieved image of known size is incorrectly scaled, accurately mapping it onto the perceptual representation derived from the stimulus photograph would still produce incorrect size estimations. We have argued earlier in this chapter that there seems to be a strong theoretical case for supposing that cognitive operations related to the physical self depend upon the existence of a Body-Image Generator. It seems reasonable to use this term to refer to the cognitive process by which an image of one’s own body is produced from memory. The implicit claim that own-body images are not processed by the same image generator that
handles objects and other people, is consistent with the evidence that self-size estimates do not seem to be scaled in the same way as object-size estimates or other-size estimates. Evidence from “raw” correlations seems to support this theoretical analysis. Table 5.1, Table 5.4 and Table 5.5 show that though there are significant correlations between Total VOSP scores and Height/Weight Ratio (see Table 5.1. p. 58), there are no correlations between Total VOSP and any of the other body size and object error estimations (see Table 5.4 and Table 5.5). It was hypothesized in advance of the study that inaccurate perception of one’s own body would not be related to any general deficiency in perception or spatial judgement. Instead, it was suggested there may be two distinct sub-types of BDD, both resulting from faulty operation of the BIG mechanism. One BDD subtype was hypothesized to result from a faulty cognitive representation of one’s own body. This type of BDD (Type 1 BDD) was expected to be independent of cultural stereotypes and depressive illness: cases in which though the actual body is normal, BIG generates an image that is anomalous in some way (e.g., excessively large in some dimensions). Individuals with this impairment may have normal bodies, and normal evaluation criteria, but will still perceive their body as defective. 
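The represent, retrieve, compare account of size estimation discussed above can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the function name, its parameters and the idea of a single multiplicative `reference_scale_bias` are hypothetical and are not taken from the study. The point it shows is the one argued in the text: if the retrieved reference image is mis-scaled, even a perfectly accurate comparison produces a wrong size estimate.

```python
def estimate_size(stimulus_extent, reference_extent, reference_true_size,
                  reference_scale_bias=1.0):
    """Toy version of the represent, retrieve, compare sequence.

    stimulus_extent      -- extent of the judged body or object in the
                            perceptual representation (arbitrary image units)
    reference_extent     -- extent of the retrieved reference object once both
                            representations are on the same scale (same units)
    reference_true_size  -- real size of the reference object (e.g. in cm)
    reference_scale_bias -- hypothetical distortion applied at retrieval;
                            1.0 means the remembered size is veridical
    """
    # Retrieve: the remembered size of the reference may be distorted.
    remembered_size = reference_true_size * reference_scale_bias
    # Compare: ratio of the two representations on a common scale.
    ratio = stimulus_extent / reference_extent
    # Read off: the unknown size in real-world units.
    return ratio * remembered_size


if __name__ == "__main__":
    # Accurate retrieval: estimate follows directly from the ratio.
    print(estimate_size(90, 100, 100.0))
    # Retrieval scaled down by 10 percent: the estimate shrinks by the
    # same proportion although the comparison step is unchanged.
    print(estimate_size(90, 100, 100.0, reference_scale_bias=0.9))
```

Because the bias enters only at the retrieval step, effort spent on the comparison step cannot remove it, which is consistent with the suggestion that self-estimation errors originate in retrieval rather than comparison.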
The second type of BDD (Type 2 BDD) was expected to be associated with a negative view of an undistorted cognitive representation of body, and associated with internalized cultural norms, depression, and possibly side-effects of drugs prescribed to alleviate emotional imbalance: cases in which the actual body is normal, and BIG generates an accurate image of the body, but individuals judge their body image as defective because of inappropriate norms or the application of negative schemas to themselves.

The data from the present study, particularly the factor analysis outcomes summarized above, provide clear support for the existence of Type 1 BDD. As expected, this BDD subtype is associated with high body dissatisfaction scores and self-estimation errors, but is independent of VOSP scores (as perception is not compromised) and of Height/Weight Ratio (as body dissatisfaction is not supposed to result from the actual physical characteristics of the body). The association with gender is extremely strong (0.9); because females were coded as 1 and males as 0 in the database, the positive sign indicates that Type 1 BDD occurs predominantly amongst females. If Type 1 BDD resulted from a structural or genetic deficiency of some kind, it might be expected to occur with roughly equal frequency in both genders. A genetic link between sex and cognitive characteristics is not biologically impossible, but despite an almost obsessive search for male/female differences in cognition, little of substance has
yet been found (see, for example, Maccoby and Jacklin, 1974). If this argument is accepted, the conclusion: that some females in our culture suffer psychological distress because of faulty cognitive representations of their own body, has two implications. Firstly, the characteristics of BIG, the operations of which appear to underlie the problem, are the result of learning, and as such, are likely to be localized in time and culture. This does indeed seem to be indicated by the data — particularly the association with gender. Though the present study cannot support a detailed analysis of the mechanisms giving rise to inaccurate cognitive representations of faulty body image, a process that would be sufficient to do so can readily be imagined. In a culture that assigns value to females according to their body size and shape, those females who attach importance to social approval will experience distress if their bodies do not conform to the most highly valued body-stereotype (image-stereotype mismatch). Distress could be reduced if image-stereotype mismatch can be eliminated, and this can be achieved either by generating and maintaining a distorted cognitive representation of a high-value female body stereotype, or by generating and storing a distorted cognitive representation of one’s own body. The cost of this strategy will come from having to deal with a constant stream of perceptual information that is discrepant with respect to stored cognitive self-representations. This constant challenge to the veridicality of cognitions can be removed either by systematically misperceiving reality, or by bringing reality into line by changing one’s bodily characteristics to conform to the stored self-image, e.g., by dieting. An optimistic implication of this analysis is that if the characteristics of BIG that underlie Type 1 BDD are learned, then they can also be modified via re-learning processes. 
This suggests that in extreme cases of BDD, re-educating BIG through a systematic training regime intended to produce more accurate judgements of own-body parameters might offer an effective clinical intervention strategy. This would contrast sharply with current clinical approaches based upon attempts to modify behaviours such as eating habits directly, or to reduce distress by administering drugs such as tranquillizers. Such approaches have been notoriously unsuccessful. If the source of dissatisfaction with their own bodies amongst females lies in cognitive processes, then only cognitive interventions are likely to work, and it is unsurprising that attempts to solve the problem by modifying consequential behaviour or emotions have not been successful.

The evidence for Type 2 BDD is indicative rather than compelling, as the present study used normal and not clinical participants, the data showing the presence of few or no cases of eating disorder (mean score on EDI = 3.6; the criterion score for ED is > 10 on all subscales. The number of participants
exceeding the criterion score was therefore zero). Nor were there any cases of clinical depression present in the sample (mean score on BDI = 10.5; the number of participants exceeding the criterion score for moderate depression was 13, with no participants exceeding the score of 30 required for severe depression). It is consequently unsurprising that associations between variables directly associated with pathology were insufficiently robust to emerge in the factor analysis. A correlation between EDI Total scores and average overall Front and Side Self estimation error was observed, but as Tables 5.2, 5.3 and 5.8 show, correlations were predominantly between estimation errors concerning Self and specific EDI subscales: Drive for Thinness (EDI-DT), Body Dissatisfaction (EDI-BD), and Perfectionism (EDI-P). These correlations imply not an association between body estimation errors and eating disorders, but rather a correlation between estimations of body size and the EDI subscales related to body image.

The correlational data do, however, suggest that some of the expected relationships are present in the data in a weak form, emerging when the power of the analysis is increased by averaging over variables such as photograph angle (Front or Side) and Self- and Other-judgements (these relationships are reported in Tables 5.8 and 5.9). To take one example, it has been suggested in the research literature (Phillips, 1996b) that there is an association between depression and body over- or under-estimation. Results from the present study weakly support this claim, as there was a significant correlation between scores on the Beck Depression Inventory and the overall average error score in estimating size (r = 0.01; n = 100; p < 0.01; see Table 5.1). However, the fact that the relationship was barely significant, even when all size estimation tasks were combined, means that it would be unwise to place much reliance on this finding.
The correlational data also showed a significant association between scores on the Body Dysmorphic Disorder Questionnaire and overall average error score in body parameter estimation conditions (r = 0.154; n = 100; p < 0.005; see Tables 5.1 and 5.9). This association has also been reported in the research literature (Phillips, 1996b). The general claim is that people scoring high on BDDQ-R are liable to cognitive-perceptual distortions in the way that they see their own bodies: imagining features that are not present or exaggerated size or shape. Such individuals also tend to have preoccupations with imagined defects in appearance, so that if a slight physical anomaly is present, the person’s concern is markedly excessive (DSM-IV, 1997). Dissatisfaction with body does not appear to be correlated with trait or state anxiety, as there is no significant correlation between either measure and errors in size estimation (of Self, Others or Objects).
The theoretical analysis offered in the present study suggests that syndromes involving body dissatisfaction, misperceptions of body size and shape, and eating disorders may have two distinct aetiologies. In Type 1 BDD, the origin of any eventual pathology seems to lie in defensive cognitive distortions strategically introduced to control or suppress dissatisfaction with one’s own body, particularly in females. Cognitive representations embodying scaled-down Self-dimensions or scaled-up Other-dimensions may eventually produce secondary behavioural and emotional problems, for example by misleading a person into being inappropriately tolerant of excessive bodyweight in comparison to the scaled-up representations of others. Type 2 BDD begins with an emotional problem. For example, depression may cause self-misattribution of negative characteristics: ‘Everything about me is bad, being overweight is bad, so I must be overweight’. Alternatively, high anxiety may cause a preoccupation with physical aspects of one’s body that produces scaling errors via attentional processes, just as a mouth ulcer or facial blemish can subjectively assume exaggerated size because of the attention it attracts. Neither Type 1 nor Type 2 BDD is hypothesized to be caused directly by eating disorders, but rather it is supposed that they lead to unrealistic cognitive representations which interfere with the cognitive mechanisms normally involved in body-size regulation.

Taking a broader theoretical perspective on the findings we have reported, the data seem to provide reasonably robust support for two conclusions. Firstly, the cognitive mechanism underlying perception of the physical characteristics of a person’s own body seems to be independent of the mechanisms involved in perceiving objects or other people’s bodies.
The evidence for this takes the form of a double dissociation (Teuber, 1955): some variables affecting own body perception do not affect object and other-body perception, and vice versa. Double dissociation evidence is generally taken to be strongly indicative that two systems are independent. Secondly, the data also show that own body perception is affected by psychological variables and judgements that would seem to depend upon cultural norms. Though somewhat surprising in the context of standard and lay assumptions about the perception of the physical world and one’s own body in particular, these findings are exactly what would be expected if own body perception is the product of natural technology manifest in the form of a special-purpose module. In addition to making sense of some rather surprising data, this theoretical framework is also productive in that at least potentially it leads to new approaches to clinical intervention in a domain within which old approaches have a poor record of success.
Notes

1. Eating disorders are one of two main classes of psychological body image disorder having relatively high prevalence (see Note 2 below for discussion of body dysmorphism, the second main group of disorders). “Eating disorders involve serious disturbances in eating behavior, such as extreme and unhealthy reduction of food intake or severe overeating, as well as feelings of distress or extreme concern about body shape or weight … Eating disorders are not due to a failure of will or behaviour; rather, they are real, treatable medical illnesses in which certain maladaptive patterns of eating take on a life of their own. The main types of eating disorders are anorexia nervosa and bulimia nervosa. A third type, binge-eating disorder, has been suggested but has not yet been approved as a formal psychiatric diagnosis. Eating disorders frequently develop during adolescence or early adulthood, but some reports indicate their onset can occur during childhood or later in adulthood” (Spearing, 2001, p. 1). Eating disorders are far more common amongst females than males. Only an estimated 5 to 15 percent of people with anorexia or bulimia (Andersen, 1995) and an estimated 35 percent of those with binge-eating disorder are male (Spitzer et al., 1993). An estimated 0.5 to 3.7 percent of females suffer from anorexia nervosa in their lifetime (American Psychiatric Association Work Group on Eating Disorders, 2000). Symptoms of anorexia nervosa include: resistance to maintaining body weight at or above a minimally normal weight for age and height; intense fear of gaining weight or becoming fat, even though underweight; disturbance in the way in which one’s body weight or shape is experienced, undue influence of body weight or shape on self-evaluation, or denial of the seriousness of the current low body weight; infrequent or absent menstrual periods (in females who have reached puberty) (Spearing, 2001).
People with anorexia nervosa see themselves as overweight even though they are dangerously thin. The process of eating becomes an obsession. Unusual eating habits develop, such as avoiding food and meals, picking out a few foods and eating these in small quantities, or carefully weighing and portioning food. People with anorexia may repeatedly check their body weight, and many engage in other techniques to control their weight, such as intense and compulsive exercise, or purging by means of vomiting and abuse of laxatives, enemas, and diuretics. Girls with anorexia often experience a delayed onset of their first menstrual period (Spearing, 2001). The mortality rate among people with anorexia has been estimated at 0.56 percent per year, or approximately 5.6 percent per decade, which is about 12 times higher than the annual death rate due to all causes of death among females ages 15–24 in the general population (Sullivan, 1995). The most common causes of death are complications of the disorder, such as cardiac arrest or electrolyte imbalance, and suicide. Approximately 1.1 percent to 4.2 percent of females have bulimia nervosa in their lifetime (American Psychiatric Association Work Group on Eating Disorders, 2000). Symptoms of bulimia nervosa include: recurrent episodes of binge eating, characterized by eating an excessive amount of food within a discrete period of time and by a sense of lack of control over eating during the episode; recurrent inappropriate compensatory behaviour in order to prevent weight gain, such as self-induced vomiting or misuse of laxatives, diuretics, enemas, or other medications (purging), fasting, or excessive exercise; the binge eating and inappropriate compensatory behaviours both occur, on average, at least twice a week for 3 months.
Self-evaluation is unduly influenced by body shape and weight; because purging or other compensatory behaviour follows the binge-eating episodes, people with bulimia usually weigh within the normal range for their age and height. However, like individuals with anorexia, they may fear gaining weight, desire to lose weight, and feel intensely dissatisfied with their bodies. People with bulimia often perform the behaviours in secrecy, feeling disgusted and ashamed when they binge, yet relieved once they purge (Spearing, 2001). Community surveys have estimated that between 2 percent and 5 percent of Americans experience binge-eating disorder in a 6-month period (Bruce and Agras, 1992; Spitzer et al., 1993). Symptoms of binge-eating disorder include: recurrent episodes of binge eating, characterized by eating an excessive amount of food within a discrete period of time and by a sense of lack of control over eating during the episode. Binge-eating episodes are associated with at least 3 of the following: eating much more rapidly than normal; eating until feeling uncomfortably full; eating large amounts of food when not feeling physically hungry; eating alone because of being embarrassed by how much one is eating; feeling disgusted with oneself, depressed, or very guilty after overeating; marked distress about the binge-eating behaviour. Binge eating occurs, on average, at least 2 days a week for 6 months. Binge eating is not associated with the regular use of inappropriate compensatory behaviours (e.g., purging, fasting, excessive exercise). People with binge-eating disorder experience frequent episodes of out-of-control eating, with the same binge-eating symptoms as those with bulimia. The main difference is that individuals with binge-eating disorder do not purge their bodies of excess calories. Therefore, many with the disorder are overweight for their age and height. 
Feelings of self-disgust and shame associated with this illness can lead to further bingeing, creating a cycle of binge eating (Spearing, 2001). Obsessive concern with dieting among adolescents has frequently been linked to a general dissatisfaction with their bodies (Huenemann et al., 1966). Huenemann et al. (1966) found that, among US ninth graders, 50% of boys and 65% of girls said that they were “trying to do something about their weight”. Tobin-Richards, Boxer, and Peterson (1983) reported that adolescent girls were less satisfied with their weight than adolescent boys. This satisfaction was linked to perceived body weight, with girls expressing most satisfaction when they perceived themselves as underweight and least satisfaction when they perceived themselves as overweight. Concern with thinness and dieting has been linked to an increasing prevalence of eating disorders among adolescent girls. Concern with weight, dieting, and body image is apparently associated with eating behaviours such as fasting, crash diets, binge eating, and self-induced vomiting and with the use of laxatives, diuretics, and diet pills (Greenfeld et al., 1995). Furthermore, a strong desire for thinness has been associated with problems of eating behaviour (Lundholm and Littrell, 1986).

2. The second main group of body image disorders is Body Dysmorphic Disorder. A condition known as ‘dysmorphophobia’ was first described as long ago as 1886 as a: “subjective feeling of ugliness or physical defect which a patient feels is noticeable to others, although he/she has appearance within normal limits” (Morselli, 1886, p. 100). Body Dysmorphic Disorder (BDD) was first recognized as a distinct disorder only in DSM-IV (1997) and placed within the category of somatoform disorders (disorders having a pattern of
recurring multiple, clinically significant, somatic complaints; summary of DSM-IV, p. 486). According to DSM-IV the diagnostic criteria for Body Dysmorphic Disorder are:

i. Preoccupation with an imagined defect in appearance. If a slight physical anomaly is present, the person’s concern is markedly excessive.
ii. The preoccupation causes clinically significant distress or impairment in social, occupational, or other important areas of functioning.
iii. The preoccupation is not better accounted for by another mental disorder (e.g., dissatisfaction with body shape and size in Anorexia Nervosa).

Body dysmorphic disorder (BDD) is a distressing and sometimes psychologically disabling preoccupation with an imagined or slight defect in appearance. BDD is regarded as an “Obsessive-Compulsive spectrum disorder” that appears to be relatively common. BDD often goes undiagnosed, however, due to patients’ reluctance to divulge their symptoms because of secrecy and shame. Any body part can be the focus of concern (most often, the skin, hair, and nose), and most patients engage in compulsive behaviours, such as mirror checking, camouflaging, excessive grooming, and skin picking. Approximately half are delusional, and the majority experience ideas or delusions of reference (Phillips et al., 1996). Nearly all patients suffer some impairment in functioning as a result of their symptoms, some to a debilitating degree. Psychiatric hospitalization, suicidal ideation, and suicide attempts are relatively common. Phillips et al. (1996, p. 126) note that depression and suicide are frequent complications of BDD, other manifestations including obsessional preoccupation with an imagined appearance defect; clinically significant distress or functioning impairment; impoverished social interactions because of embarrassment or shame; attempts to camouflage the perceived deformity with clothing, makeup, or hair; use of non-psychiatric treatment (i.e., dermatologic or plastic surgery), with some patients even attempting surgery on themselves.
As clinical depression involves negative attitudes to pretty much everything, it might be thought unsurprising that this may include negative attitudes to features of one’s physical self. However, Phillips (1996a) is clearly of the opinion that the causal direction is from BDD to depression: “Patients with BDD are more prone to major depression…. in clinical settings, 60% of patients with BDD have major depression, and the lifetime risk for major depression in these patients is 80%. Patients with this co-morbid duo are at risk for suicide. Determining if depressed patients have BDD is important because the treatment is different. Usually, major depression occurs as a result of the BDD, not vice versa.” (Phillips, 1996a, p. 156). Body dissatisfaction seems to be systematically related to body size distortion. The relationship between body dissatisfaction and body distortion was examined in an experiment by Gardner-Rick and Tockerman (1993). These investigators assessed Body Image accuracy using the Colour-a-Person Test (Wooley and Roll, 1991), and a computer-based TV video that measured distortions resulting from inaccurate body size estimations and self-ideal discrepancies. Results showed there was a significant correlation between body dissatisfaction and body distortion (Gardner-Rick and Tockerman, 1993). A number of studies by Altabe and Thompson (1996) support the claim that body image acts like a cognitive structure. For example, social comparison enhances the body image schema priming effect. Additionally, high trait distress individuals tended to be more sensitive to priming of both body image information and self-relevant information (Altabe and Thompson,
1996). Altabe and Thompson conclude that these studies “supported the interpretation of body image as a mental representation” (ibid., p. 191). Treatment for people with BDD involves accurate assessment, proper diagnosis, and adherence to a medical regimen. According to Kirksey et al.: “Although reportedly difficult to treat, selective serotonin reuptake inhibitors (SSRIs) and cognitive-behavioural therapy have been effective for some individuals with BDD. Examples of SSRIs that have resulted in improvement are: Prozac™ (fluoxetine), Anafranil™ (clomipramine), Luvox™ (fluvoxamine), Zoloft™ (sertraline) and Paxil (paroxetine)” (Kirksey, Goodroad, Butensky and Holt-Ashley, 2000, http://www.ispub.com). Psychoactive drugs that appear to be clinically effective in treating BDD are substances generally reported to be beneficial in cases of depression. This suggests that drug therapy is associated with BDD resulting from negative evaluations of body image rather than faulty body image representations. The present review is most concerned with BDD resulting from misjudgements of body size and shape, and it is unlikely that such cases would respond to drugs designed to relieve depression. Lack of sensitivity to anti-depressant drugs may even serve as a criterion for BDD resulting from cognitive misrepresentation of body image.

3. Bonnier (1905) seems to have first used the term body schema in reporting observations of patients with brain lesions affecting their bodily experiences. In Bonnier’s study the experiences of interest resulted from vestibular dysfunction, which, according to Bonnier, alters the way that a subject experiences spatial aspects of the body. He referred to the disordered experiences that resulted from this dysfunction as aschematie (Bonnier, 1905, p. 606).
Head and Holmes (1911) studied the pattern of impairment in postural sensation following lesions in various parts of the central nervous system to conclude that: “[t]he final product of the tests for the appreciation of posture or passive movement rises into consciousness as a measured postural change. For this combined standard against which all subsequent changes of posture are measured before they enter consciousness we propose the word schema” (Head and Holmes, 1911, p. 246). Head (1926) claimed that a body schema is a model or representation of one’s own body that constitutes a standard against which postures and body movements are judged. This representation can be considered the result of comparisons and integrations at the cortical level of past sensory experiences (postural, tactile, visual, kinaesthetic and vestibular) with current sensations. This gives rise to an almost completely unconscious “plastic” reference model that makes it possible to move easily in space and to recognise the parts of one’s own body in all situations.

4. The term ‘body image’ seems to have been first employed (also originally within neurology and neuropsychology) to incorporate cognitive elements that were excluded from the body schema concept, such as wishes, emotional attitudes and interactions with others. Schilder (1935) incorporated visual and emotional aspects, referring to a schema as a “three dimensional image”, a “self-appearance of the body” and a “unit of its own”, influencing mental life by means of the emotional value invested in it. Though Lhermitte (1939) emphasised the representational component, derived from memory, he too preferred the term ‘body image’ to ‘body schema’. The Gestalt-oriented neurologist Conrad (1933) regarded body image as a particularly good example of his theory of mental function: the whole (a ‘psychological’ body image) was greater than the sum of the parts (the ‘physiological’ contributions from the various sense organs).
5. Kinsbourne (1993) has attacked the idea that there is a single system responsible for representing the physical self: “There is no evidence for the existence of body schema in consciousness, bits of which can be nibbled away by disease. There is no set of localized deficits of regional body awareness, which, when fitted together like jigsaw pieces, cover the total body surface. Observation suggests that the representation of somatic awareness is object centred, not space-centred – the ‘objects’ in this instance being the body parts. We cannot achieve a simultaneous ‘over feel’ of all the body. We can, however, shift attention to one or another body part at a time, just like to one or other object in a visual display” (Kinsbourne, 1993, p. 71). Despite the intensity with which Kinsbourne asserts his position, the terms ‘body image’ and ‘body schema’ are deeply embedded in psychology and appear to have demonstrable clinical utility. Equally important, Kinsbourne’s argument seems to be more a claim about how mental representations of the body are implemented than a denial that they exist. No-one would say that a document wasn’t represented in the memory of a computer merely because it was spread over a variety of memory locations.
References
Altabe, M. & J. K. Thompson (1996). Body Image: A Cognitive Self-Schema Construct? Cognitive Therapy & Research 20(2), 171–193.
Andersen, A. E. (1995). Eating disorders in males. In K. D. Brownell & C. G. Fairburn (Eds.), Eating disorders and obesity: a comprehensive handbook, pp. 177–187. New York: Guilford Press.
American Psychiatric Association Work Group on Eating Disorders (2000). Practice guideline for the treatment of patients with eating disorders (revision). American Journal of Psychiatry 157 (1 Suppl), 1–39.
Beck, A. T., C. H. Ward, M. Mendelson, J. Mock & J. Erbaugh (1961). An inventory for measuring depression. Archives of General Psychiatry 4, 561–571.
Bonnier, P. (1905). L’aschématie. Revue Neurologique 13, 605–609.
Bower, F. L. (1977). Normal Development of Body Image. New York: John Wiley Medical Publishers.
Bruce, B. & W. S. Agras (1992). Binge eating in females: a population-based investigation. International Journal of Eating Disorders 12, 365–73.
Conrad, K. (1933). Der Konstitutionstypus, theoretische Grundlegung und praktische Bestimmung. Berlin: Springer.
Eysenck, H. J. & S. B. G. Eysenck (1964). Manual for the Eysenck Personality Inventory. London: ULP.
Fisher, S. (1990). The evolution of psychological concepts about the body. In T. F. Cash & Y. T. Pruzinsky (Eds.), Body Images: Development, Deviance and Change. New York: Guilford Press.
Fodor, J. A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Gallagher, S. (1986). Body Image and Body Schema: a conceptual clarification. Journal of Mind & Behaviour 4, 541–554.
Gallagher, S. & J. Cole (1995). Body Image and Body Schema in a Deafferented Subject. Journal of Mind & Behaviour 16, 369–89.
Garner, D. M., M. P. Olmstead & J. Polivy (1983). Development and validation of a multidimensional eating disorders inventory for anorexia nervosa and bulimia. International Journal of Eating Disorders 2, 15–34.
Garner, D. M., M. P. Olmsted & J. Polivy (2003). Handbook of the Eating Disorders Inventory. Lutz, Florida: Psychological Assessment Resources, Inc.
Gardner-Rick, M. & Yale R. Tockerman (1993). Genetic, Social, and General Psychology Monographs 119 (1), 125–145. New York: Heldref Publications.
Goldberg, E. (2001). The Executive Brain. New York: Oxford University Press.
Gordon, R. A. (1992). Anorexia And Bulimia: Anatomy of A Social Epidemic. Oxford: Blackwell Publishers.
Greenfeld, D., D. Mickley, D. M. Quinlan & P. Roloff (1995). Hypokalemia in outpatients with eating disorders. American Journal of Psychiatry 152, 60–3.
Groth-Marnat, G. (1990). The handbook of psychological assessment (2nd ed.). New York: John Wiley & Sons.
Head, H. (1926). Aphasia and kindred disorders of speech. London, Cambridge: University Press.
Head, H. & G. Holmes (1911). Sensory Disturbance from Cerebral Lesions. Brain 34, 102–254.
Huenemann, R. L., L. R. Shapiro, M. C. Hampton & B. W. Mitchell (1966). A longitudinal study of gross body composition and body conformation and their association with food and activity in a teenage population. American Journal of Clinical Nutrition 18, 325–38.
Kinsbourne, M. (1993). Orientational Bias Model of Unilateral Neglect. Evidence from attentional gradients within hemispace. In I. H. Robertson & J. C. Marshall (Eds.), Unilateral Neglect: clinical and experimental studies, pp. 63–85. Hillsdale, NJ: Erlbaum.
Kirksey, K. M., B. K. Goodroad, E. A. Butensky & M. Holt-Ashley (2000). Body Dysmorphic Disorder in an Adolescent Male Secondary to HIV-related Lipodystrophy: A Case Study. The Internet Journal of Advanced Nursing Practice 4 (2), 1–14. Accessed from http://www.ispub.com on 9 Jan 2004.
Lhermitte, J. (1939). L’image de Notre Corps. Paris: Nouvelle Revue Critique.
Lundholm, J. K. & J. M. Littrell (1986). Desire for thinness among high school cheerleaders: Relationship to disordered eating and weight control behaviors. Adolescence 21, 573–579.
Maccoby, E. E. & C. N. Jacklin (1974). The Psychology of Sex Differences. Stanford, CA: Stanford University Press.
Meenan, S. & R. O. Lindsay (2002). Planning and the Neurotechnology of Social Behaviour. International Journal of Cognition and Technology 1 (2), 233–274.
Morselli, E. (1886). Sulla Dismorfofobia e sulla Tafefobia. Bulletino academia della Scienze Mediche di Genova 6, 100–119.
Myers, P. & F. Biocca (1992). The Elastic Body Image: The Effect of Television Advertising and Programming on Body Image Distortions in Young Women. Journal of Communication 42(3), 108–133.
Phillips, K. A. (1996). An open study of buspirone augmentation of serotonin-reuptake inhibitors in body dysmorphic disorder. Psychopharmacology Bulletin 32, 175–80.
Phillips, K. A. (1996b). The Broken Mirror: Understanding and Treating Body Dysmorphic Disorder. Oxford: Oxford University Press.
Phillips, K., K. Atala & R. Albertini (1996). Case study: body dysmorphic disorder in adolescents. Journal of the American Academy of Child and Adolescent Psychiatry 34 (9), 1216–20.
Schilder, P. (1935). Image and Appearance of the Human Body. London: Kegan Paul.
Slade, P. D. (1994). What is body image? Behaviour Research and Therapy 32(5), 497–502.
Spearing, M. (2001). Eating Disorders. NIH Publication No. 01–4901. Bethesda, Maryland: NIMH.
Spielberger, C. D., H. L. Gorsuch, R. E. Lushene, P. R. Vagg & G. A. Jacobs (1983). Manual for the State-Trait Anxiety Inventory (STAI). Palo Alto, CA: Consulting Psychologists Press.
Spitzer, R. L., S. Yanovski, T. Wadden, R. Wing, M. D. Marcus, A. Stunkard, M. Devlin, J. Mitchell, D. Hasin & R. L. Horne (1993). Binge eating disorder: its further validation in a multisite study. International Journal of Eating Disorders 13(2), 137–53.
Sullivan, P. F. (1995). Mortality in anorexia nervosa. American Journal of Psychiatry 152(7), 1073–4.
Teuber, H-L. (1955). Physiological Psychology. Annual Review of Psychology 9, 267–96.
Tobin-Richards, M., A. M. Boxer & A. C. Petersen (1983). The psychological significance of pubertal change: Sex differences in perceptions of self during early adolescence. In J. Brooks-Gunn & A. C. Petersen (Eds.), Girls in Puberty: Biological and Sociological Perspectives, pp. 127–154. New York: Plenum.
Warrington, E. K. & M. James (1991). Visual Object and Spatial Perception Battery. Bury St Edmunds, UK: Thames Valley Test Company.
Wooley, O. W. & S. Roll (1991). The Color-A-Person Body Dissatisfaction Test: Stability, internal consistency, validity, and factor structure. Journal of Personality Assessment 56 (3), 395–413.
Looking under the rug
Context and context-aware artifacts*
Christopher Lueg
University of Technology, Sydney
Introduction
A rather important expectation in research communities with a strong belief in technological progress is Weiser’s (1991) vision that “technologies will weave themselves into the fabric of everyday life until they are indistinguishable from it.” The idea is that “embedded and invisible technology calms our lives by removing the annoyances”. A decade later, technological progress indeed allows for the development of “intelligent” gadgets that are much smaller and more powerful than the bulky desktop computers that were around when the vision was first articulated. Everyday life is shaped by people and what they do, how they do it, and how they perceive what they are doing. Computers, however, still do not have the intuitive understanding of usage situations that humans naturally have. Computational artifacts that exhibit a notion of context-awareness are expected to address this problem. Attributing context-awareness to computational artifacts means that artifacts are to some extent capable of sensing the context in which they are being used. The idea is that artifacts determine this context and adapt their functionality to what might be helpful in the respective context. According to Gupta et al. (2001), tremendous progress in context-awareness is required in order to achieve invisibility in pervasive computing. The idea of a context-aware mobile phone nicely illustrates the potential benefit of context-aware artifacts. It is easy to imagine a context-aware mobile using context aspects to determine the level of intrusiveness that would be appropriate when trying to notify the user of incoming calls (e.g., Lueg 2001). Notifications could range from ringing (quite intrusive) to buzzing or vibrating (less intrusive). The mobile might even suppress notifications of less important calls (not intrusive at all). One could even imagine that the mobile answers
certain calls while presenting others to the user. Context aspects that might be sensed by a context-aware mobile include the user’s identity, the user’s location, and the user’s current schedule, which might be available electronically from his or her personal digital assistant (PDA). Other examples of context-aware artifacts are cooperative buildings, intelligent rooms, personal assistants, etc. Building context-aware artifacts requires operationalizing notions of context. People usually have some kind of intuitive understanding of what context might be, and upon request they are able to list a virtually unlimited number of aspects of their environment that they would consider relevant to a given situation. This means that to a man with the intent to build a context-aware artifact, almost every aspect of the surrounding world might appear to be “context”, the matter to be operationalized in the artifact. This is actually a variation of the aphorism “To a man with a nail everything looks like a hammer”, which itself is a variation of the well-known aphorism “To a man with a hammer everything looks like a nail” (Gorayska and Marsh, 1999). The first expression beautifully captures the problem underlying context-aware artifacts: the ‘generation’ of context and the problem of determining observer-independent descriptions that could be operationalized in computational artifacts. Even describing context in computational terms seems to be rather difficult. Drawing on a range of related disciplines, I will illustrate that the way humans ‘use’ context is quite different from the way computational artifacts might use ‘context’. A less obvious issue underlying the development of context-aware artifacts is that the generation of context involves a notion of responsibility for the course of action.
I will argue that the very idea of context-aware artifacts is closely related to much older ideas about intelligent machines pursued (with limited success) in the realm of classical Artificial Intelligence.
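To make the mobile phone example from the introduction concrete, the following is a minimal sketch of how sensed context aspects might be mapped to a notification level. It is an illustration only, not a description of any existing system: the class and function names, the particular context aspects, and the fixed rules are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Notification(Enum):
    RING = "ring"          # quite intrusive
    VIBRATE = "vibrate"    # less intrusive
    SUPPRESS = "suppress"  # not intrusive at all


@dataclass(frozen=True)
class SensedContext:
    """Hypothetical context aspects a mobile might sense."""
    user_id: str
    location: str               # e.g. "office", "cinema", "street"
    in_scheduled_meeting: bool  # read from the user's electronic schedule


def choose_notification(ctx: SensedContext, call_is_important: bool) -> Notification:
    """Map sensed context aspects to a notification level using fixed rules."""
    if ctx.in_scheduled_meeting or ctx.location == "cinema":
        # Assumed rule: in quiet settings only important calls get through.
        return Notification.VIBRATE if call_is_important else Notification.SUPPRESS
    return Notification.RING
```

The sketch decides purely on the basis of the aspects its designer chose to include. Whatever the user actually considers relevant in the moment plays no role unless it happens to coincide with one of these predefined fields.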
Context-aware artifacts and definitions of context
A sound understanding of how context-aware artifacts are typically implemented helps us to understand the notions of context that are operationalized in these artifacts and allows us to illustrate what can reasonably be expected in terms of human-like context-awareness. In Lueg (2002c) I have argued that context-aware artifacts can be seen as a subclass of socially adept technologies (Marsh, 1995). It is therefore important to note that not all socially adept technologies need to be context-aware. There may be situations in which designers of socially adept technologies can exploit the fact that humans are good at creating
and adapting to situations (consider, for example, how people react when confronted with robots like Kismet or Cog). The 2001 HCI special issue on context-aware artifacts is a rich and highly relevant resource. In the anchor article of the special issue, Dey et al. (2001) start with a definition given in Webster’s Dictionary: “the whole situation, background or environment relevant to some happening or personality” and argue that this definition is too general to be useful in context-aware computing. After considering a number of definitions they finally come up with a definition of context that is based on information that characterizes a situation, and that is relevant to the interaction between a user and his or her application: “Any information that can be used to characterize the situation of an entity, where an entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the application themselves. Context is typically the location, identity and state of people, groups and computational and physical objects.” (Dey et al., 2001, p)
Similar to many other definitions of context in the technically oriented literature, the definition suggests that context is understood as a kind of model or representation of a particular type of situation. The term ‘situation’ seems to comprise ‘everything’, whereas ‘context’ only consists of specific aspects that are ‘distilled’ from a particular situation. Examples of such aspects listed by Dey et al. (2001) include location, identity and state of people, groups and computational and physical objects. Hull et al. (1997) mention identity, locations, companions, vital signs, air quality, and network availability as examples of context aspects. The underlying assumption seems to be that such aspects can be used to identify a user’s current situation, which means it is assumed that the context aspects characterize that situation. Elsewhere (e.g., Lueg 2002a) I have argued that there may be significant differences between what designers of context-aware artifacts define as context in verbal descriptions and what is actually operationalized in context-aware artifacts. These differences directly impact the capabilities of context-aware artifacts, as capabilities of artifacts depend on the context models that are actually implemented.
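The relation between a ‘situation’ and the ‘context’ distilled from it can be illustrated with a small sketch. The three fields follow the aspects named by Dey et al. (2001), namely location, identity and state; the type name, the function name and the sample data are hypothetical and serve only to show that whatever is not part of the model is lost.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ContextModel:
    """A designer-defined model: only these aspects count as context."""
    location: str
    identity: str
    state: str


def distill(situation: Mapping[str, Any]) -> ContextModel:
    """Keep the predefined aspects of a situation and drop everything else."""
    return ContextModel(
        location=str(situation.get("location", "unknown")),
        identity=str(situation.get("identity", "unknown")),
        state=str(situation.get("state", "unknown")),
    )


# A situation description with one aspect the model does not provide for.
situation = {
    "location": "room 4",
    "identity": "alice",
    "state": "seated",
    "mood": "tense, a deadline was missed",
}
context = distill(situation)
```

After distillation the artifact can reason about location, identity and state, but the mood entry, which the participants might regard as the most relevant aspect, does not exist as far as the model is concerned.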
Artifacts, context and situations In what follows I try to explain from a number of different perspectives, such as epistemology, sociology and phenomenology, why it is so difficult to implement
context-awareness in artifacts. From a logic-oriented perspective, the problem of defining context in computational terms is related to the frame problem (e.g., Pylyshyn, 1987) in classical, representation-based Artificial Intelligence (AI). Roughly, the frame problem is about what aspects of the world would have to be included in a sufficiently detailed world model, and how such a world model could be kept up-to-date when changes occur. The frame problem has been under investigation for more than two decades and it seems to be reasonable to state that the frame problem is intractable in realistic settings (e.g., Dreyfus, 2001). The frame problem is often considered a more technical problem as it is about keeping models of the world up-to-date. However, the frame problem can also be interpreted from an epistemology-oriented point of view, in the sense that a world model defines what is “known” about the world. Then the frame problem is also an epistemological problem as richness of the model determines what can be inferred based on the model: aspects of the world not included in the model and not derivable from the model do not exist in the world of the model. Another major problem is defining context in a precise and, in particular, an observer-independent way. One of the main reasons is that situations are not given but negotiated among the persons involved in the situation. Agre (2001) discusses how people use the various features of their physical environment as resources for the social construction of a place, i.e., it is through their ongoing, concerted effort that the place — opposed to space — comes into being. An artifact will be incapable of registering the most basic aspects of this socially constructed environment. Context-aware artifacts may fail annoyingly as soon as a system’s (wrong) choices become significant. 
Dourish (2001) discusses from a phenomenology-oriented point of view how meaning arises in the course of action: the meaning of a technology is not inherent in the technology but arises from how that technology is used. Designers may influence how artifacts are being used but they have no absolute control. Humans are in principle conformists, flexible tool-makers and users (Gorayska and Mey, 2002). In more practical terms, this means that people may use an artifact in a way that is different from what has been envisioned by the artifact’s designer. This is important as artifacts embed certain assumptions and this holds for context-aware artifacts as well. Furthermore, the use of artifacts is socially negotiated. Cars or powerful computers on one’s desktop can be used as examples for illustrating how the use of artifacts may be (re-)negotiated. Both artifacts can be used as effective tools for transporting things and processing
data, respectively, but both may also function as status symbols (Wenger, 1998). Context-aware artifacts, however, would not be involved in such negotiations, which means that they are hardly able to recognize the outcome. Accordingly, artifacts would not be aware of what they represent and what other artifacts represent. A practical example is a socially adept agent (Marsh, 1995) that may not be aware of the above mentioned social status of a car. In a discussion with humans, the agent might treat a specifically equipped sports car as if it were a regular car. The sports car actually is a car but treating it as such may be embarrassing in certain situations. This discussion points to the current understanding that, contrary to artifacts, humans are “situated” in their physical and social environment. The term “situated” has its origins in the sociology literature in the context of the relation of knowledge, identity, and society (Clancey, 1997a). In respect to the core aspects of situatedness, researchers from fields as different as ethnomethodology, cognitive science, and anthropology are arguing in a similar direction although individual positions may still vary significantly. Suchman (1987, 1993), for example, investigated situational aspects of human behavior and has shown that the meaning of situations (and thus the significance of actions) is generated rather than given. The coherence of situated action is tied in essential ways to local interactions contingent on the actor’s particular circumstances. Clancey (1997a) argued in a similar direction by emphasizing the relation of perception, action, and knowledge. He claims that every human thought and action is situated, because what people perceive, how they conceive of their activity, and what they physically do develop together. Lave (1991) emphasized that perception, action, and even knowledge, have to be considered in relation to identity, and culture. 
Lave actually proposed substituting the term “situated activity” for “socially situated practice” or, where appropriate, “situated learning” in order to stress that perception and activity are tightly bound to culture and identity. These days, using the term “situated” is a bit complicated as the term is used with a variety of different meanings in the literature. Clancey (1997a, p. 23) explained that “in particular, the overwhelming use of the term in Artificial Intelligence research since the 1980s has reduced its meaning from something conceptual in form and social in content to merely ‘interactive’ or ‘located in some time and place’”. Even in non-traditional AI research the term “situated” is used in varying ways. In a discussion centered around embodiment, Dautenhahn et al. (2002) concluded that “the concept of situatedness can easily be applied to the social domain, by extending the physical environment to the social environment. A
socially situated agent acquires information about the social, as well as the physical domain through its surrounding environment, and its interactions with the environment may include the physical as well as the social world.” Considering the origins of the term “situated” the notion “socially situated” is a pleonasm, indicating that still much work is needed to combine the different research directions. The difference between physical and social aspects matters in the context of context-aware artifacts, in particular, as sensing physical aspects of the environment is typically much easier than ‘sensing’ the social construction of the world. Robertson’s (2000) study of the social construction of a business situation can be used to illustrate the difference. Conducting a workplace observation in a software company, Robertson attended weekly meetings over a period of seven months, making separate video and audio recordings of relevant meeting activities. One of the questions to be answered was what designers actually do during these meetings: “Amongst the talk, laughter and other activities, there was clearly a pattern to each meeting. Individuals reported what they had done while apart. Others would ask questions and each person’s work would be discussed by the group. Then another person would report on her work. This process continued until everyone, who had worked on the project through the week, had told the others what she had done. Reporting was always followed by a period of shared designing, where the group worked together on some aspect of the design. Then, towards the end of the meeting, the work for the next week would be negotiated and allocated.” (Robertson 2000, p. 126)
Robertson notes that from an observer’s perspective it was easy to divide the group’s meeting into different stages, such as reporting, discussion, shared design, negotiations of future work, and finally allocation of work. One of the central findings of the workplace observation, however, was that the participants in the process did not describe their work with such labels: “[…] they did not bother with names for specific stages in their work, as they lived it, at all.” (Robertson 2000, p. 126) Robertson concludes: “[…] naming the stages in the design work in this way excludes entirely the work of coordination and negotiation that made the process they represent possible in the first place. Moreover, this communicative work had been identified by the designers themselves as the work they most wanted supported.” (Robertson 2000, p. 126)
The most important point for this chapter about context-aware artifacts is how the process was going on: “[…] people did all these kinds of cooperative design work while sitting round a table talking together. At times they moved around the room, entered or left the room and moved various objects around; but there were no formal changes of position, no discernible interactional difficulties and certainly no upheaval when they changed from one kind of work to another. […] Whatever they did was always accomplished by different combinations of their purposeful, embodied actions.” (Robertson 2000, p. 126)
The important point here is that the business situation changed although most ‘context indicators’ that could be sensed by technical artifacts did not appear to undergo any recognizable changes. Robertson’s study demonstrates that situations are negotiated among those participating in the situation. This means that even if a particular situation meets the description of a “business meeting context” at some stage, the situation may change into an informal get-together and vice versa. The idea of a context-aware meeting room (or some other kind of “intelligent” room) can be used to illustrate why the re-negotiation of situations is important in the context of this paper. Using currently available technology, such a room could possibly sense many aspects, such as the electronic schedule, the number of persons in the room, and the prevailing clothing. Based on this information, the room could compute that the “current context” is a “business meeting context” (and not “morning tea” or an “unplanned, informal get-together”) and could instruct attendees’ mobile phones not to disturb the meeting; business-related information like the latest share prices could be projected onto the room’s multi-purpose walls, and so on. The problem is that changes to the situation as subtle as those observed by Robertson are hardly recognizable for currently available technology: the (defined) “business meeting context” would not change while the situation as experienced by those involved would. This means that the once “formal business meeting” may have changed into an informal get-together (or vice versa), unrecognized by the intelligent room. The difference seems to be minor, but once the meeting’s nature has changed it may, for example, no longer be appropriate to project business-related information on walls (as it would be embarrassing to demonstrate that the hosting company’s fancy technology did not recognize this simple change in the meeting situation).
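The meeting room example can be sketched in a few lines. The sensed aspects are the three mentioned above (electronic schedule, number of persons, prevailing clothing); the class name, the thresholds and the context labels are assumptions made for illustration and do not describe any actual intelligent room.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoomReadings:
    """Hypothetical sensor readings of a context-aware meeting room."""
    scheduled_event: str   # entry in the electronic schedule, "" if none
    persons_in_room: int
    formal_clothing: bool  # prevailing clothing as judged by some sensor


def classify_context(r: RoomReadings) -> str:
    """Compute the 'current context' from predefined indicators only."""
    if r.scheduled_event == "meeting" and r.persons_in_room >= 2 and r.formal_clothing:
        return "business meeting"
    if r.persons_in_room >= 2:
        return "informal get-together"
    return "room idle"


# Readings at the start of the meeting and again later on. The participants
# have meanwhile drifted into informal talk, yet no sensed indicator changed.
start = RoomReadings("meeting", 5, True)
later = RoomReadings("meeting", 5, True)
```

Because `start` and `later` are indistinguishable to the room, `classify_context` returns the same label for both. The renegotiation of the situation among the participants leaves no trace in the indicators the model was built on.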
Elsewhere (Lueg, 2001 & 2002a) I have argued that it is, in particular, the social connotation of the term “situated” that allows us to highlight the differences
between “context” as implemented in context-aware artifacts and the “situation” that is modeled. I understand a “situation” as a potentially unlimited resource that is continuously interpreted and re-interpreted in the course of action. I understand Clancey’s (1997b) statement that situations are conceptual constructs, not places or problem descriptions, as support for the cognitive aspect of my definition. The observations made during Robertson’s (2000) study stress the importance of the negotiation aspect. Situations are also observer-relative, by which I mean that there is no single observer who defines what constitutes a situation. Those who are involved in a situation create and maintain their own interpretations of the situation by using the situation as a resource. Again, Robertson’s (2000) observations of the “stages” she as an observer could identify can be used to illustrate the importance of being involved in a situation. By contrast, the notion of context as operationalized in context-aware artifacts is an expression of a certain interpretation of a situation, a model of a situation. Such a model is observer-dependent and not part of the unfolding situation. Thus the model is no longer open to re-interpretation: the meaning of aspects included in a model is more or less determined. This lack of openness to re-interpretation matters as (individual) participants may decide to assign significance to aspects of the environment not considered significant by the model’s designers. As mentioned before, context is about what people consider relevant in a situation. Winograd (2001) summarizes the context “problem” as follows: features of the world become context through their use; something is not context because of its inherent properties but because of the way it is used in (human) interpretation.
Dreyfus (2001) has argued that it should be no surprise that no one has been able to program a computer to respond to what is relevant, as human beings respond only to changes that are relevant given their bodies and their interests. To sum up, I do not see much that suggests that artifacts will soon become context-aware in the sense that they will be able to recognize situations in a non-trivial way. As Erickson (2002) put it: context-awareness exhibited by people appears to be quite different from what can be implemented in computational systems. This does not question, however, the value of research on context-aware artifacts. Goodwin and Duranti (1992, p. 2) have maintained that “it does not seem possible at the present time to give a single, precise, technical definition of context, and eventually we might have to accept that such a definition may not be possible”. They have also noted, however, that providing a formal, or simply explicit, definition of a concept such as context can lead to
important analytic insights because such a definition can expose inconsistencies and insights that were not visible before. Considering these problems, it is hardly surprising that the “artificial intelligence problem” is still largely unsolved. As Michael Dertouzos, Director of the Laboratory for Computer Science at MIT, pointed out in July 2000: “The AI problem, as it’s called – of making machines close enough to how human beings behave intelligently – … has not been solved. Moreover, there’s nothing on the horizon that says, I see some light. Words like ‘artificial intelligence’, ‘intelligent agents’, ‘servants’ – all these hyped words we hear in the press – are statements of the mess and the problem we’re in.” (quoted in Dreyfus, 2001, p. 8). Dertouzos’ statement stresses the importance of carefully looking at what can reasonably be expected from today’s (and tomorrow’s) technology.
Looking under the rug
It is interesting to note that a number of issues discussed in the previous section seem to emerge again and again. Questions concerning the modeling of context and the explaining of inferences (see below) were already discussed during the Seventies and Eighties in the context of artificial intelligence and expert systems. A more recent emergence of these questions could be observed during the hype around intelligent software agents and personal assistants in the mid-Nineties. Interestingly, none of these technologies has delivered what its proponents envisioned. Although it was expected that “expert systems change the way businesses operate by altering the way people think about solving problems” (Harmon and King, 1985, p), it is “fair to say that the vaunted potential of expert systems has never been realized.” (Davenport and Prusak, 1998, p). A few so-called expert systems are around these days, but these systems typically operate in rather constrained settings; the idea of expert systems as universally applicable problem solvers and replacements for human expertise has largely been abandoned. Interestingly, the idea of expert computers is still popular: the news magazine Newsweek reports in its September 2002 issue that according to a poll conducted by George Washington University, experts expect that in 2008 “expert system software competes with lawyers, doctors and other professionals” (Foroohar, 2002, p. 67). A recent review of promises made during the early software agent hype was disillusioning as well: “[…] not much discernible progress has been made post 1994 [the year in which the popular ACM special issue on software agents was published],
perhaps because researchers have failed to address the practical issues surrounding the development, deployment and utilization of industrial-strength production systems that use the technology. We note that once greater effort is placed on building useful systems, not just prototype exemplars, the real problems inherent in information discovery, communication, ontology, collaboration and reasoning, will begin to be addressed.” (Nwana and Ndumu, 1999). In what follows, I discuss a few issues concerning what is being implemented in context-aware artifacts and other technologies incorporating models of human behavior. For example, Bellotti and Edwards (2001) outline that in many situations where context-aware systems have been proposed or prototyped, human initiative is frequently required to determine what to do next. They conclude that intelligibility and accountability are two key features which must be supported by context-aware systems so that users may make informed decisions based on context. Intelligibility means that context-aware systems should be able to present to their users what they know, how they know it, and what they are doing about it. Accountability means that systems must enforce user accountability when they seek to mediate user actions that impact others. As outlined in Lueg (2001), I believe that providing for intelligibility and accountability will help gain a better understanding of responsibilities involved in the design of context-aware artifacts. It is questionable, however, whether intelligibility and accountability help overcome the inherent limitations of context-aware artifacts. Similar demands have been discussed extensively in the artificial intelligence field in the context of expert systems and robotics. The problem is that explaining inferences works best if concerned with rather simple settings, and is increasingly difficult the more complex the setting is. Projecting future implications of proposed actions is even harder. 
Now the crux is that Bellotti and Edwards’ (2001) demands for intelligibility and accountability are only necessary in settings that are already so complex that context-aware artifacts are no longer able to (or not allowed to) make decisions on their own, i.e., without human supervision. Accordingly, demands for intelligibility and accountability are likely to be intractable when applied to complex real world settings. Scenarios from the realm of robotics can be used to illustrate practical impacts of this intelligibility issue. Robots like the ones discussed below can be seen as socially adept technologies. As mentioned before, socially adept technologies are closely related to the idea of context-aware artifacts and arguably the robots discussed below would need to be capable of context-awareness. The large mobile robot described by Brooks (2002) is expected to be capable of
Looking under the rug 235
negotiating who goes first in a tight corridor. It is expected that the robot will understand the same natural head, eye, and hand gestures people usually understand. In a situation in which the robot fails to understand certain gestures, accounting for intelligibility would mean that the robot starts dumping lists of sensor readings and inferences (or more aggregated information like ‘I sensed that you moved your head such and such. According to rule 123 this means xy’) because the robot’s designer hopes it helps the confronted user understand why the robot failed to understand his or her gestures. Another scenario from the realm of robotics is the robotic cab driver featured in the science fiction movie “Total Recall”. Upon arrival on the planet Mars, the hero (played by the actor Arnold Schwarzenegger) is chased by some evil guys. With a bit of luck, he makes it into a fully automated cab. The robotic driver recognizes a new guest having entered his cab and starts querying for a destination. Being on the run, the hero cries something like “just go”. The robot, however, has not been programmed to understand such utterances and accompanying panic gestures. Rather than just going ahead, the robot starts nagging for a destination. Completely unaware of the dramatic nature of the situation, the robotic driver wastes valuable time. The hero finally resolves the situation by kicking the robotic driver out of the cab. In both situations, it is questionable whether intelligibility would help resolve the problematic situation. Furthermore, the ‘robotic cab driver’ scenario is also a nice example of the difficulties designers face when preparing artifacts for real world situations as it is a situation apparently not considered by the robot’s designer. The problem is not that such an ‘escape situation’ is not exactly a common situation but that context-aware artifacts are based on predefined context models. 
This means that the designers of such robots would have to foresee all possible situations, from escape situations to other emergency situations. Considering these issues my conclusion was, and still is, that designers of context-aware artifacts should take care that users are able to overrule a context-aware artifact in such a way that the artifact’s behavior no longer interferes with the situation negotiated among those participating in it. Mobile phones can be switched off, but artifacts as complex as mobile robots may be more difficult to ‘overrule’. The challenge, still, is making artifacts both easier to comprehend and easier to use.
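The two design demands discussed in this section, intelligibility and the possibility to overrule, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not an implementation taken from any of the systems cited: the class `ContextAwareArtifact`, its methods, and the example situations are all hypothetical names chosen here. The artifact acts only on situations listed in its predefined context model, falls back to asking the user when a situation is not foreseen, records a plain-language reason for each action, and stops acting once the user has overruled it.

```python
from dataclasses import dataclass, field


@dataclass
class ContextAwareArtifact:
    """Hypothetical artifact driven by a predefined context model."""

    # Predefined context model: recognized situation -> automated action.
    context_model: dict[str, str]
    overruled: bool = False
    log: list[str] = field(default_factory=list)

    def overrule(self) -> None:
        """The user takes control; automated behaviour stops until released."""
        self.overruled = True
        self.log.append("user overruled the artifact")

    def release(self) -> None:
        """The user hands control back to the artifact."""
        self.overruled = False
        self.log.append("user released the artifact")

    def act(self, sensed_situation: str) -> str:
        """Choose an action and record why it was chosen."""
        if self.overruled:
            action = "stand by"
            reason = "user has overruled automated behaviour"
        elif sensed_situation in self.context_model:
            action = self.context_model[sensed_situation]
            reason = f"situation '{sensed_situation}' matched the context model"
        else:
            # Situations the designer did not foresee are handed to the user.
            action = "ask user"
            reason = f"situation '{sensed_situation}' is not in the context model"
        self.log.append(f"{action}: {reason}")
        return action

    def explain(self) -> str:
        """Intelligibility: report the most recent action and its reason."""
        return self.log[-1] if self.log else "nothing done yet"
```

In this sketch a robotic cab whose model contains only 'passenger states destination' would answer a shouted 'just go' with 'ask user', which is exactly the nagging behavior of the movie scenario; calling `overrule()` makes every later call to `act` return 'stand by'. The sketch also shows the limits argued above: `explain()` can report why the artifact failed to recognize the situation, but that explanation does nothing to resolve it.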
Conclusions
In this chapter I have looked at the limitations of context-aware artifacts and I have discussed some of the implications of these limitations. Exposing limitations does not mean that work on context-aware artifacts is without value; building context-aware artifacts complements more theoretical research into context and may help contribute to gaining a better understanding of the complexity of human behavior and human social life. In this sense, this work is as valuable as work on robotics, which is also increasing our understanding of the amazing complexity of human beings. From a more practical point of view, considering the limitations of the state-of-the-art in context-awareness suggests that we need to be very careful when designing such technologies as they are more likely to fail than to succeed when trying to recognize situations. Socially responsible design (Lueg, 2001) means it should always be possible to ‘overrule’ decisions made by context-aware artifacts. Many such problems could be circumvented, however, if humans were kept ‘in the loop’ (Erickson, 2002; also Lueg, 2002c, d). With regard to the general idea of context-aware artifacts, it is still unclear if such artifacts would actually be able to deliver the benefit expected. In most cases, people are well aware of their situation and have quite some expertise in using artifacts in appropriate ways (e.g., there is no real need to have context-aware mobile phones as most people turn off their mobiles anyway during a theater performance because they know that mobiles ringing during theater performances are annoying). People are also good at recognizing changes to situations, as they are participating in the negotiations that lead to these changes. From a human-computer interaction (HCI) point of view, the question therefore is what the benefit is of making artifacts context-aware over making artifacts easier to use (Lueg, 2001).
In a way, the context-aware artifacts hype (and parts of the closely related ubiquitous computing idea) can be seen as the latest (but almost certainly not the last) wave of (classical) artificial intelligence. Broadly speaking, AI can be seen as the approach to using technology to model and replicate human intelligent behavior in such ways that machines work as if they were human. Research in this area tends to focus on pre-planned behavior and fixed meanings, at the expense of the situatedness of human action and the re-negotiation of situations. AI had to learn the hard way that human behavior does not only involve “thinking” but also acting and “being in the world” (Clark, 1997), and has moved from modeling cognitive processes in isolation to modeling of
behaviors in situ. This is where AI meets context-aware artifacts, ubiquitous computing and the idea of technology “calming our lives by removing the annoyances” (Weiser, 1991). Many researchers in context-aware artifacts and ubiquitous computing do not consider their work as AI research but a closer look reveals that these research directions address quite a few issues that traditionally were investigated in AI. As a consequence, today’s researchers may run into problems, such as the frame problem or the problem of reliably predicting human behavior, that have been haunting AI researchers for decades (see Lueg, 2002c, d for a more detailed discussion).
Note
* This chapter is based on work (Lueg, 2002b) presented at the workshop “The Philosophy and Design of Socially Adept Technologies” at the ACM SIGCHI Conference on Human Factors in Computing Systems (Minneapolis, MN, USA, April 2002). Discussions at the workshop helped shape a number of arguments. The author is grateful to Barbara Gorayska and Toni Robertson for lots of stimulating discussions and to the anonymous reviewers for insightful comments on the draft version of this chapter. The rug metaphor is pirated from Tom Erickson’s CACM article.
References
Agre, P. E. (2001). Changing places: contexts of awareness in computing. Human-Computer Interaction 16(2–4), 177–192.
Bellotti, V. & K. Edwards (2001). Intelligibility and accountability: human considerations in context aware systems. Human-Computer Interaction 16(2–4), 193–212.
Brooks, R. (2002). Humanoid robots. Communications of the ACM 45(3), 33–38.
Clancey, W. (1997a). Situated cognition. Cambridge: Cambridge University Press.
Clancey, W. (1997b). The conceptual nature of knowledge, situations, and activity. In P. Feltovich, R. Hoffman, & K. Ford (Eds.), Expertise in context, pp. 247–291. The AAAI Press.
Clark, A. (1997). Being there. Cambridge, Mass: MIT Press.
Davenport, T. H. & L. Prusak (1998). Working knowledge. Boston, Mass: Harvard Business School Press.
Dautenhahn, K., B. Ogden & T. Quick (2002). From embodied to socially embedded agents: implications for interaction-aware robots. Cognitive Systems Research. Special Issue on Situated and Embodied Cognition. Amsterdam: Elsevier. In print.
Dey, A. K., D. Salber & G. D. Abowd (2001). A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human-Computer Interaction 16(2–4), 97–166.
Dourish, P. (2001). Seeking a foundation for context-aware computing. Human-Computer Interaction 16(2–4), 229–241.
Dreyfus, H. (2001). On the Internet. London; New York: Routledge.
Erickson, T. (2002). Some problems with the notion of context-aware computing. Communications of the ACM 45(2), 102–104.
Foroohar, R. (2002). Life in the grid. Newsweek, pp. 60–67.
Goodwin, C. & A. Duranti (1992). Rethinking context: An introduction. In A. Duranti & C. Goodwin (Eds.), Rethinking context: Language as an interactive phenomenon. Cambridge [England]; Melbourne: Cambridge University Press.
Gorayska, B. & J. Marsh (1999). Investigations in cognitive technology. In B. Gorayska, J. Marsh & J. Mey (Eds.), Humane interfaces: questions of methods and practice in cognitive technology, pp. 17–43. Amsterdam; Oxford: Elsevier/North Holland.
Gorayska, B. & J. L. Mey (2002). Pragmatics of technology. International Journal of Cognition and Technology 1(1), 1–21.
Gupta, S., W. C. Lee, A. Purakayastha & P. Srimani (2001). An overview of pervasive computing. IEEE Personal Communications 8–9.
Harmon, P. & D. King (1985). Expert systems: Artificial intelligence in business. New York: J. Wiley.
Hull, R., P. Neaves & J. Bedford-Roberts (1997). Towards situated computing. In Proceedings of the First International Symposium on Wearable Computers (ISWC ’97), pp. 146–153. IEEE.
Lave, J. (1991). Situated learning in communities of practice. In L. B. Resnick, J. M. Levine & S. D. Teasley (Eds.), Perspectives on Socially Shared Cognition, pp. 63–82. American Psychological Association, Washington, DC, USA. Third Printing April 1996.
Lueg, C. (2001). On context-aware artifacts and socially responsible design. In W. Smith, R. Thomas & M. Apperley (Eds.), Proceedings of the Annual Conference of the Computer Human Interaction Special Interest Group of the Ergonomics Society of Australia (OZCHI 2001), pp. 84–89. ISBN 0-7298-0504-2.
Lueg, C. (2002a). Operationalizing context in context-aware artifacts: benefits and pitfalls. Informing Science 5(2), 43–47. ISSN 1521-4672.
Lueg, C. (2002b). Looking under the rug: on context-aware artifacts and socially adept technologies. Proceedings of the Workshop “The Philosophy and Design of Socially Adept Technologies” at the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 2002). National Research Council Canada NRC 44918.
Lueg, C. (2002c). On the gap between vision and feasibility. Proceedings of the International Conference on Pervasive Computing (PERVASIVE 2002). Lecture Notes in Computer Science (LNCS) 1414, pp. 45–57. Berlin; Heidelberg: Springer.
Lueg, C. (2002d). Representations in pervasive computing. Paper presented at the Inaugural Asia Pacific Forum on Pervasive Computing, 31 October – 1 November 2002, Adelaide, Australia. Paper available at http://www.staff-it.uts.edu.au/~lueg/abstracts/inauguralforum02.html.
Marsh, S. (1995). Exploring the socially adept agent. Proceedings of the First International Workshop on Decentralized Intelligent Multi-Agent Systems (DIMAS ’95), pp. 301–308.
Nwana, H. & D. Ndumu (1999). A perspective on software agents research. Knowledge Engineering Review.
Pylyshyn, Z. (Ed.) (1987). The robot’s dilemma: the frame problem in artificial intelligence. Norwood, N.J: Ablex Publishing Corporation.
Robertson, T. (2000). Building bridges: negotiating the gap between work practice and technology design. Human-Computer Studies 53, 121–146.
Suchman, L. (1987). Plans and situated actions: the problem of human-machine communication. New York: Cambridge University Press.
Suchman, L. (1993). Response to Vera and Simon’s situated action: a symbolic interpretation. Cognitive Science 17, 71–75.
Wenger, E. (1998). Communities of practice: Learning, meaning, and identity. Cambridge: Cambridge University Press. Quotes from first Paperback Edition 1999.
Weiser, M. (1991). The Computer for the 21st Century. Scientific American 265(3), 66–75. Reprinted in IEEE Pervasive Computing 1(1), 19–25, 2002.
Winograd, T. (2001). Architectures for context. Human-Computer Interaction 16(2–4), 401–419.
Body Moves and tacit knowing*
Satinder P. Gill
Centre for the Study of Language and Information (CSLI), Stanford University, USA
Introduction
Body Moves are rhythmic coordinations in communication between at least two people. In performing them, we indicate the state of our connection and understanding, most importantly the degree of our contact and commitment within a communication situation (Gill, Kawamori, Katagiri and Shimojima, 2000). Tacit knowing is the unspoken dimension of human knowledge, formed in practice or experience and our personal self, with others. It is essential for skilled performance. Body Moves enable the formation of tacit knowing, and its performance, in communication. In this chapter we develop this relation between body and cognition through three examples of collaborative design activities, taken from an ethnographic study of landscape architects (UK) (Gill, 1997) and a project on conceptual design activity in the interactive workspaces lab at Stanford University (Gill, 2002; Gill and Borchers, 2003). The analysis of these design activities involves the study of the interaction between people and the objects they manipulate, a core concern of Cognitive Technology (Gorayska and Mey, 1995) that addresses how external and social environments shape cognition. Salient concepts are practice, experience, tacit knowing, representation, parallel-coordinated action, and co-ordinated autonomy.
Practice and experience
In the winter of 1996/1997, Gordon, an apprentice landscape architect with company ‘BETA’, sent a set of completed coloured maps that he had made at the company’s Welsh office, to John, a senior architect based at its headquarters located in North England. The company was going to make a bid for project work to reshape a major road in North Wales where the frequency of traffic
accidents was high, and these coloured maps were part of the depiction of the changes to the road design and their effects upon the landscape. For example, colours depicted old woodland and new woodland. To Gordon’s surprise, John judged the colours that he had used to be ‘wrong’ and decided that the maps needed to be correctly re-coloured. Company BETA had barely two weeks left to submit their bid and re-colouring all these maps was no small task. John brought in other experienced landscape architects at his branch to help, and asked Gordon to travel up from Wales and re-colour the maps with them. It was felt that Gordon lacked experience and the only way he was going to get it was by experiencing the doing of colouring in a shared practice. The problem of ‘seeing’ the colours was partly due to the company’s economic condition. BETA was downsizing, as a result of which Gordon was the sole landscape architect left at the Welsh branch. Architects, however, do not interpret the material in isolation when they first handle it. In talking aloud and moving pens over paper, they engage the other person(s) in their conceiving. This, it is suggested, enables one person to adapt to another person’s view, producing the conditions for a coherent development of the design (Gill, 1997), and a process for ‘seeing-as’ (interpretation) until they come to ‘see’ (unmediated understanding) (Tilghman, 1988). The same holds for colouring activity: as the apprentice colours with the team and more experienced architects, he/she learns how they select, for example, a specific shade of blue to set against a particular shade of green (‘seeing-as’), to create a ‘pleasing effect’ that ‘looks professional’ (Gill, op.cit). Because of the distance between the two branches and because of their commitments, John had been unable to visit Gordon and work with him. Instead, he had sent him a set of previously coloured maps (examples of experience), colour coded keys, and a set of instructions.
These are descriptive and propositional forms of expression, all located in the experience of the architects at the North England Branch. For Gordon, they are outside his experience, and he brings his own to bear in interpreting these fragmented representations of practice. In his study of how a team of geochemists judge when material fibres in a reaction vat are jet black, Goodwin (1997) shows how simply saying ‘jet black’ is not sufficient for helping an apprentice measure and make this judgement competently. Rather, the ‘blackness of black’ is learnt through physically working with the fibre, and in talking about the experience, “transforming private sensations and hypotheses into public events that can be evaluated and confirmed by a more competent practitioner”. Geochemists use their bodies as
‘media that experience the material’ being worked with through a variety of modalities. In the case of the apprentice, Gina, in Goodwin’s study, her interlocutor’s ability to recognize and evaluate the sensation she is talking about requires co-participation in the same activity. Gordon’s ‘failure’ to correctly interpret the forms of expression sent to him is an example of how breakdown can take place when co-participation is missing from the interpretation process, and of how essential co-participation is for repair within a distributed apprenticeship setting. Knowledge clearly becomes more than a matter of applying learnt rules; it is a matter of learning ‘rule-following’ (Johannessen, 1988) within the practices that constitute it. The need for him to colour with the other architects in order to be able to correctly interpret any such future fragments that might be sent to him shows that experiencing in co-presence carries powerful tacit information. Gordon’s acquired knowledge will be evident in his skillful performance of these forms of expression. ‘Forms of expression’ and ‘representations of practice’ are used here with equivalent meaning, and denote a range of human action, artifacts, objects, and tools. Human action includes cues, which may be verbal, bodily, of interaction with a physical material world (tools, e.g., pens, light tables, etc.), and construction of the physical boundary objects (e.g., colour, maps, sketches, masterplan sketches, masterplans, plans, functional descriptive sketches, photographs, written documents, etc.). The dilemma of the distributed setting is that even in the future, any interpreting or understanding that Gordon, as an apprentice, does of similar or different fragments of knowledge, will still take place in isolation, and the feedback from his local colleagues will be based on their ‘seeing-as’ (interpretation based on their experience) and not ‘seeing’ (as they lack sufficient skill in this domain to ‘understand’ without interpreting).
In the rest of this chapter, we develop an analysis of how experiencing the performances of representations of practice and moving with these representations in a joint design activity consists in specific types of behavioural alignments between actors in an environment, that we call Body Moves. This analysis builds on previous work (Gill, 2002; Gill and Borchers, 2003a) by developing the relationship between Body Moves and tacit knowing. This will help to better understand, conceptually and practically, how Body Moves facilitate knowledge transformation and knowledge acquisition. Further, some findings are presented from the study at Stanford that show how the use of artifacts that do not permit designers to act at the surface (e.g., drawing) at the same time, i.e., in parallel, inhibits collaborative activity. Body Moves that have
a parallel movement structure are termed Parallel Coordinated Moves (PCM). They embody autonomy, hence we analyse how coordinated autonomy is part of tacit knowing and is managed in collaboration. In this research, ‘knowledge’ is considered as a process that is dynamically represented in actors’ behaviours with each other and with tools, technologies and other artifacts within an environment. These behaviours involve the senses of touch, sound, smell, and vision. The motivation behind this perspective of knowledge is to understand how we form and transform it in communication. This is located within a framework that sees cognition as a dynamic system that co-evolves and emerges through the interaction of mind, body, and environment. The co-evolution includes Body Moves (Kinesics and Kinaesthetics) that give the cognitive dynamic system meaning.
Body Moves
A focus on the body, specifically ‘body moves’, originated as an attempt at expanding into the area which has been called ‘pragmatic acts’, and thereby widening the ‘narrow conception of strict natural language pragmatics’ (Gill, Kawamori, Katagiri and Shimojima, 2000). Non-verbal communication has already gained strong ground in its significance for understanding the ‘human interface, the point at which interaction occurs’ (Gill et al., 2000), and thereby the design of interactive systems. Body Moves provide us with a further insight into the nature and operation of ‘co-presence’ (Good, 1996), which is an essential component of human understanding. Co-presence denotes, simply, how we are present to each other, be this in the same physical space or in differing physical spaces (e.g., computer mediated spaces, or mobile technology mediated spaces). Being present may be described as a precondition for communication, and the nature of this precondition has a bearing upon how we coordinate with each other. Body Moves are coordinated rhythms of body, speech, and silence, performed by participants orienting within a shared activity. These rhythms create what we term ‘contact’, i.e., a space of engagement between persons, and take sequential and parallel forms. These rhythms are described as being behavioural alignments and they occur at the level of meta-communication (Allwood et al., 1991; Scheflen, 1975; Bateson, 1955; Shimojima et al., 1997). Body Moves are a special case of information flow in dialogue and are considered as a form of interactional synchrony (Birdwhistell, 1979; Kendon, 1970), and as metapragmatic (Mey, 2001). Drawing upon the idea of the composite signal (Clark, 1996;
Engle, 1998), these Body Moves have been conceived as Composite Dialogue Acts, formed of various combinations of gesture, speech, and silence (Gill et al., 2000). Our work on Body Moves indicates the construction/establishment of mutual ground within a space of action. By ‘Body Move’ we do not refer to the physical movement, rather, we target the act that the movement performs.
Metacommunication
In the early work on Body Moves (Gill et al., 1999, 2000), their metacommunicative quality was located within a framework drawn from conversation theory, where information is conveyed upon the triggering of cueing facts that convey a variety of information about the conversation situation (Shimojima et al., 1997), rather than its content. Such cueing facts are fillers and responsives. These function as discourse markers, the particular nature of which can be identified by prosody and ‘phoricity’ (Kawamori et al., 1998). Such interjections in speech determine discourse structures and the nature of the co-ordination taking place (Schiffrin, 1987; Kawamori et al., 1998). Body Moves were seen to have the quality of a cueing fact. However, this posed a challenge for handling rhythmic coordinations that did not fit the sequential structure of a cueing system. In Gill (2001), a non-sequential rhythmic coordination, called the Parallel Coordinated Move (PCM), was further analysed by drawing upon joint activity theory (Clark, 2000) and synchronous communication studies of body and speech coordination (Kendon, 1970). Recently, in Gill (2003c), a visit back to Scheflen’s (1975) and Bateson’s (1955) formative work has led to a further understanding of how the various forms of Body Moves are meta-communicative. To explain Bateson’s (1955) formulation of metacommunicative behaviour, Scheflen considers the relation between kinesics and language: the former (kinesics) can ‘qualify or give instructions’ about the latter (language) in a relation that Bateson called metacommunicative, whereby the ‘movement of the body helps in clarifying meaning by supplementing features of the structure of language’ (op. cit., p. 11). Body Moves show just this, and contribute further to the idea that the structure of language lies in its performance. The theory of Body Moves is intrinsically about meta-communication.
Body Moves and the tacit dimension
In performing Body Moves we engage with the representations of the tacit dimension of another’s actions and move with them, for example, in a design
activity, or to form a shared identity. Action is the performance, whilst its tacit dimension is its basis, which is sensed, grasped, and responded to. The representation of the tacit dimension of action is the structure of the form of its expression. In engaging with the representation of the tacit dimension of another’s actions, we are resonating with the communicative structures being performed. In order to be able to do so, we both draw upon experience and are experiencing in the same moment. It may help here to turn to Polanyi’s discussion of the body and tacit knowing (Polanyi, 1966), particularly in learning tasks that involve the body, such as playing chess, dance, etc. He describes a skilled human performance as a comprehensive entity, and when two people share the knowledge of the same comprehensive entity, two kinds of ‘indwelling’ meet. “The performer co-ordinates his moves by dwelling in them as parts of his body, while the watcher tries to correlate these moves by seeking to dwell in them from outside. He dwells in these moves by interiorising them. By such exploratory indwelling the pupil gets the feel of a master’s skill and may learn to rival him” (Polanyi, op.cit., p. 30). Gordon and Gina are undertaking such exploratory indwelling in discerning colours through movement and their sense, in order to share the knowledge (experiential) of the same comprehensive entity (e.g., colour black, or aesthetic judgment). For Polanyi, the body is the ‘ultimate instrument of all external knowledge’, and ‘wherever some process in our body gives rise to consciousness in us, our tacit knowing of the process will make sense of it in terms of an experience to which we are attending’. In performing Body Moves, the ability to grasp and sense someone’s motions, and respond to them appropriately (skillfully) is based on experience (tacit knowing of the process) and experiencing (experience to which we are attending). It is spontaneous action.
Further, tacit knowing has two terms: the proximal, which includes the particulars, and the distal, their comprehensive meaning. A simple example is a wood full of trees: the wood is the distal term, and the trees are the proximal term, consisting of particulars. When we look at a wood, we are aware of its trees but do not look at each specific tree in order to understand that this is a wood in front of us. How do we achieve this understanding of the entity, the wood? Polanyi develops a theory about the processes of such understanding or comprehensive meaning, called tacit knowing. We draw upon this to consider how we resonate with structures in communication. Polanyi has a specific meaning in using the word ‘comprehension’. A comprehensive entity has a number of levels that form a hierarchy. These levels are structures of sets of particulars that constitute the entity. Each level (set of particulars) has its own principles that operate under the control of the next level up. For example, in performing a speech, the
voice you produce is shaped into words by a vocabulary. However, the operation of higher levels cannot be accounted for by the laws governing its particulars that form the lower levels. For example, you cannot derive a vocabulary from phonetics. The two terms of tacit knowing, proximal and distal, are described as being two levels of reality, and between them, ‘there is a logical relation which corresponds to the fact that the two levels are the two acts of tacit knowing which jointly comprehends them’. There are two important facts here. Firstly, that in resonating with the particulars of each level in the communication structure, we do so by being ‘aware’ of the particulars of the entity such as a gesture, for attending to it. This is distinct from saying we resonate with the particulars, the elements of the gesture, by attending to them. If we did so, this function of the particulars, i.e., enabling us to attend to the entity, is canceled and we would lose sight of the gesture itself because all we would see is fragmented elements. A popular example of this in discussions about skill, is of playing the piano. If in the middle of performing a piece of music we suddenly began to focus on the movements of each of our fingers we would have difficulty in being able to play. Tacit knowing is about achieving the performance of playing the piano such that the finger movements and the piano keys are ‘invisible’ to us, as an extension of our selves.1 In a similar sense, our ability to grasp communication cues and the content that they frame is invisible to us until we feel uncomfortable, in the communication situation, at which point we become aware of its particulars. Understanding how we have this ability, through tacit knowing, would provide a further insight into the nature of ‘experiencing’ and the connecting of self with other self(ves). 
Body Moves are composites of gesture, speech, and silence of the participants together, not of the individuals (as in the idea of a ‘composite signal’ (Clark, 1996; Engle, 1998)). This is an important distinction. Our skill in communication, as an individual, is impingent on our skill in performing with another self and needs to be understood as such. In other words, the understanding of the representations of the tacit dimension of another’s action, is expressed in the skilled performance with the other, be this to agree, disagree, negotiate, acknowledge, or simply, to act at the same moment with the other (simultaneously). The last kind of performance, to act at the same moment, is of a different nature to the others. It is a parallel and coordinated action, whereas the former are sequential actions that take the form of action and response moves. This distinction becomes important in the discussion and analysis that follows. Body Moves necessarily involve at least two people sharing the knowledge of the
same comprehensive entity, namely, of their joint skilled human performance. ‘These comprehensive entities include, apart from our own performance, both the performance of other persons and these persons themselves’ (Polanyi, op.cit., p. 49). In this paper, we distinguish between two variations of tacit knowing that arise from differing conditions of time and space relations between persons moving together. The Body Moves that have been identified so far have two information and knowledge functions. Sequential Body Movements (SBM) carry and maintain the flow of information in interaction (Gill et al., 2000) through action-reaction responses. Parallel Coordinated Moves (PCM), however, facilitate the tacit transformation of this flow, termed knowledge transformation (Gill, 2001). In the ‘Tacit Dimension’, Polanyi described a relation between emergence and comprehension, as existing when ‘an action creates new comprehensive entities’. Parallel Coordinated Moves are multiactivity gestural coordinations, where different but related projects are being expressed in the body actions of the participants at the same time. This fusion provides the conditions for tacit transformation in a new plane of understanding from the prior sequential interactions, and as a result they create new comprehensive entities, expressed in rhythm, body and speech. The collaborative features of these moves enable the participants to negotiate and engage in the formation of a common ground (Gill, 2002).
Tacit and explicit knowing
We know that if the representations of the tacit lie outside one’s experience, then they become what some have termed propositional knowledge (cf. Josefson, 1987; Gill, S. P., 1988, 1995). This is either meaningless for the participant or cannot be interpreted or used by him/her in accordance with the background of understanding and practices against which it has been expressed (as in the case of Gordon). Cooley (1996), Rosenbrock (1996), Gill, K. S. (1996) and Gill, S. P.’s (1995, 1996) work on tacit and explicit knowledge shows that their relationship to each other is that of continuous emergence, where one aspect builds on and is shaped by the other (see Figure 1, reproduced from Gill, S. P., 1995, p. 15). In saying ‘explicit’, they mean a range of things that share the common feature of abstraction, emphasizing data and information, e.g., the formalisation of ideas, organizational rules, text-book type information, and so on. The idea of ‘explicit’
Figure 1. The expansion of knowledge leads to a reciprocal expansion of tacit knowledge required for using the new explicit knowledge. Diagram labels: Tacit Knowledge, Explicit Knowledge. [Reproduced from Gill, S. P. Dialogue and Tacit Knowledge for Knowledge Transfer, PhD Dissertation (1995), p. 15.]
can also be applied to any form of expression, such as a word, ‘black’, that has a specific meaning located in an experience. As we become skilled in practices around, say, the word ‘black’, we will be able to both comprehend this and other forms of expression, gaining a tacit knowing of them, and create new forms of expression based on our experiences. These are new entities that are comprehensive for those who create them, and they need to be comprehended by others.
Co-presence
I asked John whether, if it were possible for his team and Gordon to colour maps together in a distributed setting with the help of some hypothetical computer mediated technology, he would be interested in exploring this possibility. John declared that this was not a matter for technology, but quite simply that Gordon ‘lacks experience’ and that the only way he will acquire it is by colouring with them in the same space. His conviction made me reflect on what it means to share a space and be present, as a precondition to acquiring experience; experience that would have helped Gordon to interpret the examples of previously coloured maps for similar bids, colour keys, and instructions, that had all been sent to aid him in understanding how to colour the maps. Being present is a bodily experience, and involves all the human senses. In various cultures we draw upon various levels of our senses. For instance, the
Maori rub noses in greeting each other, Russians kiss on the mouth, and in some Arab cultures, they bring their faces close enough to smell the breath of the other. All these acts are part of gauging one person’s sense of another, essential to building trust that is required for committed engagement. Placing a glass pane between two people in any of these situations would block their tacit ability to interpret their relation to each other, and thereby comprehend each other’s meaning, through the impacts between their bodies, and would require them to focus on the visual and speech channels that have limited bandwidth for tacit knowing. John was certain that once Gordon had this experience of colouring with him in the same physical space, he would have no trouble in the future in aligning his aesthetic ‘seeing-as’ (Tilghman, 1988) with theirs when given such materials or representations (exemplars) to interpret and ‘see’, wherever he might be. Seeing-as requires interpretation, and Tilghman terms this ‘mediated understanding’. Once you have the skill to see, you can understand without interpretation, and just perform. The tacit knowing that Gordon had acquired would be ‘retrieved and made active by sensing’ (Reiner, 2003) in his act of seeing. The role of mind and imagination is important for such retrieval in sensing that brings together past (memory), present, and future. In such sensing, our minds draw upon our bodies: “wherever some process in our body gives rise to consciousness in us, our tacit knowing of the process will make sense of it in terms of an experience to which we are attending” (Polanyi, op. cit., p. 15). Each situation we are in is also unique, and in engaging with each other we recognize ourselves as persons in the performance. Tilghman (1988) gives a wonderful example of this in Charles Le Brun’s series of faces. Le Brun painted these in the seventeenth century, to illustrate the various emotions that painters could be asked to represent.
What is “striking is that any number of them could be substituted for another without loss. What is missing is any setting or context to make the emotion determinate” (op. cit., p. 312). Persons and contexts together constitute the formation of memories through movements in co-presence that Gordon would draw upon to colour any future maps.
Body Moves and grounding
How we come to ground our communication with each other is core to understanding what collaboration is about (Clark, 1996). Much work on grounding has given us a deep insight into how the sequences in our speech and
body communication are finely tuned in time, and how incrementally they serve to build common ground in the specific communication situation. Body Moves extends the sequence structure to the study of parallel structures, and involves the relationship between these two. Parallel structures create, and operate within, a different time-space connectivity than sequential structures. Sequences emphasise time whilst in parallels time and space are both emphasised. In Body Moves, the grounding process is considered within the communicative frame of the ‘engagement space’. This is a communication space where communicative orientations (e.g., metacommunication) lie in a matrix relation to spatial orientations. Within the engagement space, the dynamics of interaction are considered in terms of both action and meaning. The process of moving from the state of information flow (Sequential Body Moves) to knowledge transformation (possible Parallel Coordinated Moves) is seen as a process of grounding. This grounding occurs along the integration of the two axes of the tacit and explicit dimensions of human knowing (Jerry Gill, 1995), the awareness and the activity axes, that in their integration form a third axis, of cognitivity [see Figure 2]. These axes allow us to move from tacit to explicit knowing whilst retaining the subsidiary awareness and bodily understanding of the tacit dimension of that explicit knowing. Here we need to briefly backtrack. In the discussion on the tacit dimension, we spoke of the relationship between the tacit and explicit dimensions of knowing. Continuing with Polanyi’s conception, tacit knowing has two terms, proximal and distal, that correspond to two levels of reality, the lower one enabling us to attend to the higher one, from particulars to the entity. The explicit dimension of knowing is the ability to refer to the entity.
It lies in language, whether this be an exemplar of the tacit, such as a coloured map, or a coding of the tacit, such as a colour key, or a description of the tacit in rules, such as a set of instructions. The relation between the tacit and the explicit dimensions of knowing is the multi-axis interface that makes communication possible, and that enables us to say ‘black’ and comprehend ‘black’ at the same moment with someone else. Within knowledge transformation, aspects of both the explicit and the tacit dimensions that are particular to the situation of that moment are located in the past, present and future simultaneously. To reflect further on this interface, we explore the idea of skill (tacit knowing) in interaction. It involves understanding how and when to move between the ‘individual’ (self) and the ‘group’ (self with other self(s)) such that one is successful in this performance.
Figure 2. Cognitivity: the axes of mediating and grounding. Diagram labels: Conceptual, Explicit Knowing, Cognitivity, Focal, Activity, Awareness, Bodily, Tacit Knowing. [Reproduced from Gill, J. H., The Tacit Mode (1995), p. 39, Figure 2.1.]
Take Reiner’s example of two basketball players beautifully timing and coordinating their joint act of tossing and catching a basketball (Reiner, 2003). They perform as if their bodies “know how to precisely time actions, assess future movements and impart correct velocities without formalism”. In using this example, Reiner is questioning the idea that cognition consists in symbolic processing. “A ‘robotic player’ would have to assess the velocity and distance of the ball, assess its own velocity, calculate the time until the ball and the player are in the same point in space, calculate the velocity needed for his hand to catch the ball, change the position of his hand accordingly. Simultaneously watch other near-by players, predict their intentions and capabilities, and then plan the appropriate velocities, momentum and forces, all with the tough, precise, timing constraints.” (p. 5). This example captures how the integration of the awareness and activity axes gives rise to the cognitivity to move with accuracy and judge with accuracy, whilst relying on the subsidiary and bodily dimensions of tacit knowing to bound this precision. Skilled cooperative action, be this in basketball, colouring maps, or discerning a particular blackness of black, involves the participants in the communicative situation being able to understand it and to know how and when to respond appropriately for the purpose(s) at hand. This has been described as the performance of knowledge in co-action (Gill and Borchers, 2003) and is a
form of intelligence for sustainable interaction. We are not conscious of it and it is invisible to us, as an extension of our self, until there is a problem that causes us to become aware of it. Skilled communication, or being a skilled performer of knowledge, in a team, involves this ability to move effectively between one’s ‘self’ with another ‘self’, from ‘seeing-as’ to ‘seeing’. In Body Moves, this skill is performed in grasping the cues or representations of the tacit dimension of each other’s action, as bodies move from sequential interaction to parallel coordinated actions and back again. Coordinated structures of these behaviours can be extended and transformed through technology to form more dimensional interactional spaces.
Co-ordinated autonomy
The movement from sequence to parallel action involves the management of coordinated autonomy, i.e., the management of self and self-with-other that is essential to sustain the communication and engagement within collaborative activities. Coordinated autonomy is one dimension of being co-present, and it is culturally determined.2 In parallel coordinated moves, coordinated autonomy has a special quality of awareness. Parallel action itself is not a sufficient condition, as one could be autonomous without awareness of the other. As we will show later in this paper, autonomy without attending to the other can be disruptive to collaborative ‘joint’ activities. Action that is parallel and coordinated, however, involves each person being aware of the other simultaneously. Such quality of awareness is important for tacit knowing.
The Parallel Coordinated Move
The Parallel Coordinated Move (PCM) was first identified whilst analyzing a five-minute video excerpt of landscape architects working on a conceptual design plan. It occurred only once and lasted 1.5 seconds. It was the first time in that session that the disagreement between two architects was able to find a resolution, and it involved both their bodies acting on the surface at the same time, even whilst presenting alternative design plans. One of them was silent and the other speaking. It enabled the grounding in the communication to come into being (Gill, 2002; Gill, 2003b)3 by enabling an open space for the
negotiation of differences. The opening and closure of the PCM is by action-response Body Moves. For example, the ‘focus’ move involves a movement of the body towards the area the ‘speaker’ or ‘actor’ is attending to, i.e., space of bodily attention, and in response causes the listener or other party to move his or her body towards the same focus. In order to understand the PCM further, and to gather more examples for analysis, a number of configurations for a similar task were set up to collect further video data. Part of this study is reported here.4 One task set is for dyads of students to design shared dorm living spaces. We also collected data and made a preliminary analysis of group activity where students are using multiple large-scale surfaces, i.e., SMARTBoards.5 These are electronic whiteboards. The PCM is explored as a category of Body Move that has its own set of variable configurations, where the basic common defining feature is that participants act at the same time. The examples drawn upon are of actions taking place upon various surfaces, and the contexts within which these occur. Such actions, for example, can be to indicate ideas or proposals with a pen or finger or hand. There is also a consideration of those cases where only one participant has physical contact with the surface, in order to glean some understanding of what the function of touching is, and to reflect from that back on the case of parallel action. We have sought to capture the management of both the body and speech spaces within a task where you need to produce something together and agree upon it.
The study
The experiment6 involved two drawing surfaces, used by different sets of subjects: a whiteboard and a SMARTBoard. The SMARTBoard is a large-scale computer-based graphical user interface, and is touch-sensitive (an electronic whiteboard). ‘Smart’ technology does not permit two people to touch the screen at the same time, i.e., it does not allow for parallel action at the surface of the task being undertaken, and therefore makes for a useful case bed of data to analyse how such action affords collaborative activity. The contrast between drawing at the ‘smart’ and the whiteboard was expected to reveal whether or not there are particular differences in body moves and gesture and speech coordination at these interfaces.
The experiment
Two cameras were positioned to capture side views of subjects and one camera to capture a view from behind. Microphones were attached to the subjects. Eleven subjects (3 female, 8 male students) were recruited. Three (seniors) had a general familiarity with the iRoom, while the other 8 (juniors) had a rudimentary concept. All subjects were briefed on the room’s functionality, including an explanation of how to use the SMARTBoards and the related tools. Specifically, the subjects were shown how to draw on the SMARTBoard using the coloured ‘pens’, how to erase using the ‘eraser’, and how to use the computer-based components including a wireless keyboard and mouse. It is important to note that the subjects were not directly informed that the SMARTBoard can only receive one source of input at a time. There were seven sessions: four at the SMARTBoard, three at the whiteboard. This activity takes place in the ‘iRoom’, which is the laboratory of the Stanford Interactive Workspaces project7 (Guimbretiere, Stone, and Winograd, 2001). Our observations8 reveal that the participants’ commitment, politeness, and attention to each other become reduced at a single SMARTBoard, showing behaviours that are in marked contrast to those of users at a whiteboard. Furthermore, the quality of the resulting design is lower when using the SMARTBoard.9 Acting in parallel, e.g., drawing on the surface at the same time, involves a degree of autonomy. We have observed patterns of movement from sequential (e.g. turn taking) to parallel actions, as part of this design activity, and suggest that coordinated autonomous action is part of sustainable collaborative activity.
In a related study, when a group of four users has three SMARTBoards available to them, there appears to be a transposition of the patterns of autonomy and cooperation that one finds between a dyad working on a whiteboard (Borchers, Gill, and To, 2002; Gill and Borchers, 2003), although the body moves and parallel actions that constitute them take a very different form. At the SMARTBoard, when one has to wait one’s turn to act at the surface, it may (a) take longer to build up the experiential knowledge of that surface than if one could move onto it when one needs to, and (b) there is a time lag for the other person working with you to experience with you, in a manner of speaking, your experience of the situation, i.e., there is an awareness lag. As we know, ‘awareness’ is important for tacit knowing. The former and the latter difficulties are, we suggest, linked because of this experiential dimension of tacit or implicit knowledge. With multiple boards in parallel use, awareness of the experience, i.e., of one person of other persons, seems more fluid than that of a
dyad at the one board, evident in the movements around the boards to gather and disperse where rhythms in behavioural alignments were halted. The study suggests that mediating interfaces that could support collaborative human activities to involve sustainable and committed engagement of self and interpersonal self (self with other self(ves)) (Gill and Borchers, 2003a) need to be able to support parallel coordinated activity. One aspect of this engagement is ‘contact’. ‘Contact’ is an important dimension of Body Moves. It indicates the nature and degree of commitment of persons to each other. The configuration of the body’s physical space influences the strategies to gauge and manage contact. In the following examples we consider a) parallel coordinated moves, and b) how coordinated autonomy is managed as a communicative strategy.
Example of a Parallel Coordinated Move
When a designer is making contact with the surface to act upon it, whilst the other person is doing so too, there is an attempt to engage with the body field of the other person, as in the case of the landscape architects. It also happens in the example below (Figure 3). The designer on the right side, closest to us (E), enters the body field of the other one (F) who is currently drawing (action), and uses his index finger to trace out a shape to indicate a bed. He is proposing this idea to (F) who is drawing, to get his opinion (negotiation). Both action and negotiation are operating at the surface. The body field of the person drawing (F) is not disturbed, and as we know from the discussion of the engagement space, this indicates a high degree of contact and is identifiable as a Parallel Coordinated Move (PCM).
Transcription coding scheme
In the example presented below we will use the following conventions to encode the Body Moves (BM) and Communicative Acts (CA).
{}    body movements with speech
[]    body movements as turns (i.e., no speech)
|    indicates the point at which body actions start
(1,2,3,4,5)    tag reference to specific moment of body move in the pictures (1), (2), (3), (4), (5)

E: Do you want us to do like a bed    E: CA:Suggest (1)
{E moves in to the whiteboard, index finger point touches the surface; F at the surface about to draw}.    (1)
E: and then that then
{E’s finger traces the outline of a bed    E: PCM (1–2)
{F is drawing in a line towards the left, his head nods}    F: PCM (1–2)
E: one here    E: CA:Suggest (3)
{E moves his hand down and taps the surface, of the space that he has just traced, with the back of his hand};    E: BM:Dem-Ref (3)
{F traces the outline of the bed in the air, moving his hand straight to the left and back and then down}.    F: BM:B-Check (3–4)
E: and then | one here    E: CA:Suggest (4–5)
{E lifts his hand up and taps the board again with the back of his hand; at | F moves his hand back to original position    E: BM:Dem-Ref (4–5)
Silence
[E lifts hand off and away from the surface, as F is about to touch it with his pen]
F: ye    F: CA:Ack
{F puts pen back on paper};    (5)
[E’s body begins moving back]
E: and maybe do like a dresser between them
{E’s body moves back to rest-reflection position; F is drawing the beds}

Figure 3. A Parallel Coordinated Move. [Pictures (1) to (5).]
(F) acknowledges (E)’s proposal, in tracing the proposed idea above the surface of the board (Figure 3, pictures 3 and 4) with his pen, whilst (E) taps a position of one bed with the back of his hand on the surface to locate it. Through gesture (B-Check), (F) checks the proposal that (E) is making through gesture (Dem-Ref) and speech (Suggest). After tracing, (F) continues to draw, and his pen touches the surface (picture 5) at the same time as (E) begins to lift his hand away. There is no break in the fluidity of the rhythm of the coordination between them (of body and speech).
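As an illustration only, the coding of the transcript above can be written down as a small data structure, which makes it easy to see where moves by (E) and (F) share picture tags. The sketch below is in Python; the class name `Annotation`, its fields, and the helper `parallel_pairs` are hypothetical names introduced for this illustration and are not part of the coding scheme. Overlap of picture tags is a coarse proxy assumed here for co-occurrence and is not offered as the definition of a Parallel Coordinated Move.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    actor: str                 # "E" or "F"
    kind: str                  # "CA" (Communicative Act), "BM" (Body Move) or "PCM"
    label: str                 # e.g. "Suggest", "Dem-Ref", "B-Check", "Ack"
    first_pic: Optional[int]   # first picture tag in Figure 3, None if untagged
    last_pic: Optional[int]    # last picture tag in Figure 3, None if untagged


# Codes copied from the transcript; the acknowledgement carries no picture tag.
ANNOTATIONS: List[Annotation] = [
    Annotation("E", "CA", "Suggest", 1, 1),
    Annotation("E", "PCM", "PCM", 1, 2),
    Annotation("F", "PCM", "PCM", 1, 2),
    Annotation("E", "CA", "Suggest", 3, 3),
    Annotation("E", "BM", "Dem-Ref", 3, 3),
    Annotation("F", "BM", "B-Check", 3, 4),
    Annotation("E", "CA", "Suggest", 4, 5),
    Annotation("E", "BM", "Dem-Ref", 4, 5),
    Annotation("F", "CA", "Ack", None, None),
]


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if two different actors have codes whose picture tags overlap."""
    if a.actor == b.actor:
        return False
    if None in (a.first_pic, a.last_pic, b.first_pic, b.last_pic):
        return False
    return a.first_pic <= b.last_pic and b.first_pic <= a.last_pic


def parallel_pairs(annotations: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Every unordered pair of codes, by different actors, that share a picture tag."""
    pairs = []
    for i, a in enumerate(annotations):
        for b in annotations[i + 1:]:
            if overlaps(a, b):
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    for a, b in parallel_pairs(ANNOTATIONS):
        print(f"{a.actor}:{a.label} ({a.first_pic}-{a.last_pic}) with "
              f"{b.actor}:{b.label} ({b.first_pic}-{b.last_pic})")
```

Listing the pairs in this way separates the moment coded as PCM (pictures 1 to 2) from the later checking and demonstrating moves (pictures 3 to 5), which also share picture tags but are coded as Body Moves.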
Parallel Coordinated Actions and co-ordinated autonomy
When body fields overlap, you have simultaneous coordinated autonomy within parallel coordinated moves. In the study, there are many instances of parallel actions taking place at the surfaces of the table and whiteboard, and attempts to do so at the SMARTBoard when only one such board is available.
Figure 4. Moving to act autonomously in parallel.
In this example [Figure 4 above], (E) is standing back, watching and talking, and (F) is drawing on the whiteboard. (F) has his body positioned to accommodate (E) by slightly opening it, slanted to the right, to share the engagement space with (E). At some point, (E) looks to the left of (F) to an area on the whiteboard and moves towards it [pic 2]. He picks up another felt pen and begins to draw as well [pic 3]. As (E) touches the surface, (F) shifts his body and alters his posture so that it is now open slanted to the left, and increases contact with (E). Both are now acting in parallel. This shift occurs in silence. At the SMARTBoard [Figure 5 below], (C) is standing back whilst (D) is drawing. He looks and moves to a position to the right of (D), on the SMARTBoard. He leans in to the surface but cannot draw because he has to first wait for (D) to end his turn. (D), without looking up, speaks, and his utterance causes (C) to turn his body back to look at him. As he cannot yet act, (C) moves back from the surface and waits, and as he is doing so, he breathes in deeply in frustration. (D) notices him, pauses his drawing, turns to look at him and moves back from the zone of action,10 allowing (C) to move into it [Figure 6, pic 3]. Once (C) is acting, i.e., drawing, (D) continues with his drawing on the SMARTBoard [Figure 7]. The result is a disturbance on the board, and a jagged line cuts across from (D’s) touch point to (C’s), causing them both surprise and laughter [Figure 7, picture 2]. (D) momentarily forgot that you cannot touch the surface at the same time. The need to act whilst another is acting is not a
Figure 5. Waiting to act.
Figure 6. Giving the turn.
conscious one. This autonomy in co-action seems to be part of the coordinated collaborative process but at a metacommunicative level.
Figure 7. Problem in drawing on the SMARTBoard together.
In this example [Figures 5–7], we see that (C’s) attempt to act is frustrated until his need to act is noticed, at which point the turn to act is offered to (C) by (D). It is significant that they recognise each other’s need to act, and signal this need (moving body away, distancing) and respond to it (speech and body), and further, that they forget the limitations of the surface to afford them this need. In contrast, the whiteboard permitted [Figure 4] a more fluid movement around the surface, as there was no enforced pause by the surface, and no turn-taking required on one designer’s part to permit the other person to act. These examples are of parallel coordinated actions that involve autonomy, where autonomy occurring in simultaneity involves awareness of and attendance to the state of engagement in the space between participants and the surface(s). When a designer at the SMARTBoard does not easily give the turn to the other one, we observe various strategies to force it. These include moving close to the board and inside the visual locus of the drawing space in a quick motion, or moving back and forth, or reaching for a pen, or looking at the pen, or simply reaching out and asking for the pen the other person is currently using, or just moving right in front of the body of the person currently drawing, thereby forcing them back, and taking a pen from the pen holder. As either person can act at the whiteboard, there is no need for such strategies.
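The disturbance described for Figure 7 can be pictured with a toy model. The Python sketch below is an assumption-laden illustration and not a description of the SMARTBoard’s actual hardware or software: it simply supposes that a single-input surface appends every reported touch to one stroke, whoever made it, whereas a surface affording parallel action keeps one stroke per person. The class names, coordinates and the helper `longest_jump` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class SingleInputSurface:
    """Toy surface with one input stream: every touch extends the same stroke."""

    def __init__(self) -> None:
        self.stroke: List[Point] = []

    def touch(self, user: str, point: Point) -> None:
        # The surface cannot tell users apart, so the user name is ignored.
        self.stroke.append(point)


class MultiInputSurface:
    """Toy surface that affords parallel action: one stroke per person."""

    def __init__(self) -> None:
        self.strokes: Dict[str, List[Point]] = {}

    def touch(self, user: str, point: Point) -> None:
        self.strokes.setdefault(user, []).append(point)


def longest_jump(stroke: List[Point]) -> float:
    """Largest distance between consecutive points of a stroke."""
    return max((math.dist(p, q) for p, q in zip(stroke, stroke[1:])), default=0.0)


def simulate() -> Tuple[SingleInputSurface, MultiInputSurface]:
    # Made-up coordinates: D draws near x = 10 while C draws near x = 50.
    d_points = [(10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
    c_points = [(50.0, 0.0), (51.0, 0.0), (52.0, 0.0)]
    single, multi = SingleInputSurface(), MultiInputSurface()
    for d, c in zip(d_points, c_points):
        for surface in (single, multi):
            surface.touch("D", d)
            surface.touch("C", c)
    return single, multi


if __name__ == "__main__":
    single, multi = simulate()
    print("single-input longest jump:", longest_jump(single.stroke))
    for user, stroke in multi.strokes.items():
        print(f"multi-input longest jump for {user}:", longest_jump(stroke))
```

Under these assumptions the single stroke zigzags between the two touch points, while the per-person strokes stay smooth, which is one way to read the contrast between the SMARTBoard and the whiteboard in the examples above.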
In contrast to the SMARTBoard, at the whiteboard autonomous performance by one person that is not occurring in co-action can bring a reaction to regain co-action. In an example below [Figure 8, pictures (1–6)], (E) looks up and stands to draw something higher up on the board, just after (F) has knelt down to draw beside him. (E) altered his position such that the contact within the engagement space became too low for (F) to be aligned with him in order to act.
Figure 8. Attempt, disturbance, and regaining of parallel and coordinated action.
(F) attempts to regain contact so that he can work with (E), first by speech [in picture 3] and when that fails, by using Body Moves to attempt contact [picture 4] and focus [picture 5].11
Discussion
The SMARTBoard makes those actions that are invisible, or are extensions of ourselves, when acting at a whiteboard, or a drawing table, visible. The structure in communication “becomes visible only when there is some kind of breakdown” (Winograd and Flores, 1984, p. 68). Visibility is problematic for tacit knowing as it inhibits action and awareness of the focal. We see this structure in communication in the acts of leaning one’s hands in the drawing space, acts of rubbing something out whilst another is drawing, checking something by pointing at it, or touching the surface with a finger or hand to think about an idea, etc. These are part of our capacity to connect our selves with the external world and form meaning. When these actions are inhibited or have to be negotiated, the fluidity of sharing an engagement space in an interactive drawing task becomes altered by the kinds of communication strategies available to persons to collaborate. We have analysed three basic elements of collaboration and cooperation in joint activities: the skill to grasp and respond to the representations of the tacit dimension of our actions (e.g., in Body Moves, gestures, sounds); the ability to coordinate this grasping and responding in a rhythmic synchrony of sequential and parallel coordinated actions; and coordinated autonomy that occurs within
parallel coordinated movements and involves awareness and attendance to the state of engagement in the space between us and interfaces. We have found that simultaneous synchrony in co-action such as drawing, or being able to touch the surface together, provides for a certain kind of awareness of states of contact within an engagement space. We have established that this kind of awareness of the other is essential for tacit knowing and the emergence of tacit knowing, hence parallel coordinated actions are conditions for it. Furthermore, the multi-dimensional expression of ideas in combinations of activity using a pen, hand or finger, to sketch ideas allows them to be located in one’s self and made clear for the other person, whereby contact with each other’s ideas can be made with the body through motion and physical contact at the surface, as well as through speech. The analysis of parallel coordinated movements shows the importance of coordinated autonomous behaviour for sustainable collaborative activity, as it facilitates negotiation and cooperative behaviour. Without it, the designers use disturbance strategies to achieve autonomous action. Autonomy is a function of being able to act together at the same time. Coordinated simultaneous autonomy is a function of being able to act and attend to each other at the same time. In the merging of the differences, each self maintains his/her identity, yet is able to be aware and be with the other without disturbing each other’s action. The moment of a parallel-coordinated move arrives after a set of sequential Body Moves that build up the knowledge base for grounding. In this mergence of action, tacit knowing can be achieved, and can give rise to new ‘comprehensive entities’ or new ideas that embody the tacit knowing of both selves. It is in this mergence that knowledge transformation can take place.
A challenge for designing mediating interfaces is that they afford us our human skills of engaging with each other and forming tacit knowing. One result of the study of the SMARTBoards is the design of more contact affordances, e.g., software to permit the simultaneous operation of multiple functions at the surface. From the study of the landscape architects we conclude that the solution to gaining ‘experience’ lies in the integration of the axes of tacit and explicit knowing, and this makes for our ability to experience a representation of practice and understand it for the purposes at hand. From the analysis presented here, it is clear that an interface, for collaborative activity, that can manage this integration needs to handle both sequential and parallel coordinated Body Moves and the conveyance of the persons. The ‘messy’ details of any situated activity (Goodwin, 1997) make up our capacity for ‘perceptual discernment’, and this involves the various modalities
of our human senses. Body Moves, as multi-modal communicative acts, lie within the rhythmic pulse of interactive space and time that carries these messy details of a setting or context or the ‘spirit’ of a person, and in so doing, they mediate them in their performance, enabling tacit knowing. The processes of integration for tacit and explicit knowing give us an insight into the nature of cognitivity, and a theoretical frame from which to explore the relation between embodied mind, technology and environment, rooted in pragmatics of knowing. This bears directly upon the pragmatics of Cognitive Technology that seeks to understand the effects of technology on users and their environment, and the conditions under which users can cognize their technology in order to realize the effects of their technological efforts (Mey, 1995). The extension of self (the ‘invisible’ interface) operates through awareness and activity in order to achieve that state of cognitivity. It is a frame of cognition within which to reflect on CT’s question about effects of the ‘transparent tool’ (that you use without noticing that you are using it) (Mey 1988; Norman 1999) without the mind-body split. In summary, the discussion develops a conceptual frame for cognitivity and collaborative action that can inform the analysis of human-technology symbiosis and the design of mediating interfaces. The idea of the ‘mediating’ interface supports the argument that ‘the locus of control with respect to the language of expression at the interface, ought to be placed in the user’s mind and not in the machine’ (Gorayska and Cox, 1992), where mind is seen in terms of cognitivity. The discussion has built on the concepts of invisibility, parallel-coordinated moves, tacit and explicit knowing, and coordinated autonomy.
Notes * Thanks and acknowledgements to Jan Borchers for collaborating with me on the HCI study of large surfaces, and to Terry Winograd for his support of this work in the iSpaces Project at Stanford. Thanks to Timo Saari and Seija Kulkki for their support of this work, undertaken whilst the author was with CKIR (Centre for Knowledge and Innovation Research). Thanks also to Masahito Kawamori for his work with me on the fundamental coding of Body Moves whilst at NTT. Lastly, thanks to Jacob Mey for his support of Body Moves as part of Pragmatic Theory. 1. The idea of invisibility and extension of self is drawn from Polanyi’s work on Personal Knowledge (1964). 2. The relationship between coordinated autonomy and culture is being developed by the author in a forthcoming paper.
Body Moves and tacit knowing 263
3. See Gill (2002) for a deeper analysis of the PCM.
4. For a more in-depth and wide-ranging analysis, see Borchers, Gill, and To (2002), and Gill and Borchers (2003a).
5. Borchers, J. O., Gill, S. P., To, T. (2002).
6. Gill, S. P., Sethi, R., Martin, S. (2001).
7. http://graphics.stanford.edu/projects/iwork/
8. For a preliminary discussion of this research, see: Borchers, J. O., Gill, S. P., and To, T. (2002). Multiple Large-Scale Displays for Collocated Team Work: Study and Recommendations, Technical Report, Stanford University. For a more fully developed analysis, see Gill, S. P., and Borchers, J. O. (2003a).
9. This could in part be due to the awkwardness of the interface for producing smooth drawings.
10. In Gill and Borchers (2003a) we have developed the idea of zones of interaction, namely reflection, action, and negotiation, and we describe how these are managed and carried by Body Moves within the Engagement Space. In other words, these are not fixed spatial locations. This is being developed further in another paper by Gill.
11. Gill et al., 2000.
References
Allwood, J., J. Nivre & E. Ahlsen (1991). On the Semantics and Pragmatics of Linguistic Feedback. Gothenburg Papers. Theoretical Linguistics 64, 1–39.
Bateson, G. (1955). “The Message. ‘This is the Play.’” In B. Schaffner (Ed.), Group Processes. Vol. II. New York: Macy.
Birdwhistell, R. L. (1970). Kinesics and Context. University of Pennsylvania.
Borchers, J., S. Gill & T. To (2002). Multiple Large-Scale Displays for Collocated Team Work: Study and Recommendations. Technical Report. Stanford University.
Clark, H. H. & E. F. Schaefer (1989). Contributing to discourse. Cognitive Science 13, 259–294.
Clark, H. H. (1996). Using Language. Cambridge: Cambridge University Press.
Cooley, M. (1996). On Human-Machine Symbiosis. In K. S. Gill (Ed.), Human Machine Symbiosis: The Foundations of Human-centred Systems Design, pp. 69–100. London: Springer.
Engel, R. (1998). Not Channels but composite signals: speech, gesture, diagrams and object demonstrations are integrated in multi-modal explanations. In M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the Twentieth Annual Conference of the Cognitive Science Society, pp. 321–327. Mahwah, N. J.: Erlbaum.
Gill, J. H. (2000). The Tacit Mode. Michael Polanyi’s Postmodern Philosophy. New York: SUNY Press.
264 Satinder P. Gill
Gill, K. S. (1996). The Foundations of Human-Centred Systems. In K. S. Gill (Ed.), Human Machine Symbiosis: The Foundations of Human-centred Systems Design, pp. 1–68. London: Springer.
Gill, S. P. (1995). Dialogue and Tacit Knowledge for Knowledge Transfer. PhD Dissertation, University of Cambridge.
Gill, S. P. (1996). Designing for Knowledge Transfer. In K. S. Gill (Ed.), Human Machine Symbiosis: The Foundations of Human-centred Systems Design, pp. 313–360. London: Springer.
Gill, S. P. (1997). Aesthetic Design: Dialogue and Learning. A Case Study of Landscape Architecture. AI & Society 9, 273–285.
Gill, S. P. (2002). The Parallel Coordinated Move: Case of a Conceptual Drawing Task. Published Working Paper: CKIR, Helsinki. ISBN 951–791–660–4.
Gill, S. P. & J. Borchers (2003). Knowledge in Co-Action: Social Intelligence in Collaborative Design Activity. AI & Society 17(3), 322–339. An adaptation of a conference paper presented at Social Intelligence Design 2003, Royal Holloway, London.
Gill, S. P., R. Sethi & S. Martin (2001). The Engagement Space and Gestural Coordination. In C. Cave, I. Guaitella & S. Santi (Eds.), Oralite et Gestualite: interactions et comportements multimodaux dans la communication. (Proceedings of ORAGE 2001, International Conference on Speech and Gesture), pp. 228–231. Aix-en-Provence, France.
Gill, S. P., M. Kawamori, Y. Katagiri & A. Shimojima (2000). The Role of Body Moves in Dialogue. RASK 12, 89–114.
Good, D. A. (1996). Pragmatics and Presence. AI & Society 10(3&4), 309–14.
Goodwin, C. (1997). The Blackness of Black: Colour Categories as Situated Practice. In B. Lauren Resnick, R. Saljo, C. Pontecorvo & B. Burge (Eds.), Discourse, Tools and Reasoning: Essays on Situated Cognition, pp. 111–140. Berlin, Heidelberg, New York: Springer.
Goodwin, C. (in press). Pointing as Situated Practice. To appear in S. Kita (Ed.), Pointing: Where Language, Culture and Cognition Meet. Hillsdale: Erlbaum.
Gorayska, B. & K. Cox (1992). Expert systems as extensions of the human mind. AI & Society 6, 245–262.
Guimbretiere, F., M. Stone & T. Winograd (2001). Stick it on the Wall: A Metaphor for Interaction with Large Displays. Submitted to Computer Graphics (SIGGRAPH 2001 Proceedings).
Josefson, I. (1987). The nurse as an engineer. AI & Society 1, 115–126.
Kawamori, M., T. Kawabata & A. Shimazu (1998). Discourse Markers in Spontaneous Dialogue: A corpus based study of Japanese and English. Proceedings of 17th International Conference on Computational Linguistics (COLING-ACL98).
Kendon, A. (1970). Movement Coordination in Social Interaction: Some examples described. Acta Psychologica 32, 100–125.
Mey, J. (1995). Cognitive Technology – Technological Cognition. Proceedings of the First International Cognitive Technology Conference, August 1995, Hong Kong. Reprinted in AI & Society (1996) 10, 226–232.
Mey, J. (1998). Adaptability. In: Concise Encyclopedia of Pragmatics, pp. 5–7. Oxford: Elsevier Science.
Mey, J. (2001). Pragmatics. An Introduction. Oxford: Blackwell.
Norman, D. (1999). The invisible computer. Cambridge, Mass.: MIT Press.
Polanyi, M. (1964). Personal Knowledge: Towards a post critical philosophy. New York: Harper and Row.
Polanyi, M. (1966). The Tacit Dimension. Doubleday. Reprinted version, 1983, Gloucester, Mass.: Peter Smith.
Riener, M. & J. Gilbert (in press). The Symbiotic Roles of Empirical Experimentation and Thought Experimentation in the Learning of Physics. International Journal of Science Education.
Rosenbrock, H. H. (1988). Engineering as an Art. AI & Society 2, 315–320. London: Springer.
Scheflen, A. E. (1974). How Behaviour Means. Exploring the contexts of speech and meaning: Kinesics, posture, interaction, setting, and culture. New York: Anchor Press/Doubleday.
Schiffrin, D. (1987). Discourse Markers. (Studies in Interactional Sociolinguistics, 5). Cambridge: Cambridge University Press.
Tilghman, B. R. (1988). Seeing and Seeing-As. AI & Society 2(4), 303–319.
Winograd, T. & C. F. Flores (1986). Understanding Computers and Cognition: A new foundation for design. Norwood, N. J.: Ablex Press.
Gaze aversion and the primacy of emotional dysfunction in autism
Sarah Bowman, Lisa Hinkley, Jim Barnes and Roger Lindsay
Department of Psychology, Oxford Brookes University
Introduction
Autism and Cognitive Technology
Autism is relevant to cognitive technology in two important ways. Firstly, there is now available a wide range of computer-based assistive technology (Dautenhahn, Werry, Salter and te Boekhorst, 2003; Moore and Calvert, 2000; Alcade, Navarro, Marchena and Ruiz, 1998; Huttinger, 1996; Chen and Bernard-Opitz, 1993). Computer-based technologies appear to be well suited to the cognitive limitations of autistic individuals because social interaction is not required; they are rule-governed, predictable and controllable; they incorporate very clear-cut boundary conditions; they are naturally monotropic (one topic at a time); they are able to match the individual’s attention tunnel; they are context-free; error-making is safe; and there are options for non-verbal or verbal expression. Secondly, there is some ground for hope that understanding the cognitive deficits associated with autism will enhance our ability to emulate the ‘natural software’ (or ‘mindware’: Clark, 2000) underlying human social cognition. This would allow computers to be programmed to behave more like human agents, hence facilitating human-computer interaction, and would also feed back into assistive technology, enabling autism sufferers to be better assisted and allowing assistive devices to be designed in a more principled manner. Though the second aspect of the relationship between autism and cognitive technology is the less developed and, perhaps, the more exciting, there appears to be a serious obstacle to progress in this area. This derives from the possibility that the deficits associated with autism are primarily cognitive. As there is increasing evidence that autism has a substantial genetic component, the implication of cognitive primacy is that the cognitive processes that are dysfunctional in autism have a genetic
basis. Contemporary theories of autism (see section below) have almost exclusively taken this view, arguing both that cognitive deficits are primary and that these result from neurological deficiencies in an innate ‘theory of mind module’, an innate Central Executive system, an innate global processing system or an innate mirror neuron system subserving social imitation. If autistic deficits are indeed innate and cognitive, they obviously cannot result from defective learning and hence cannot be examples of ‘natural technology’ (El Ashegh and Lindsay, this volume; Meenan and Lindsay, 2002). Natural technology refers to cognitive software, or ‘mindware’, that is developed and transmitted as a cultural artefact and that is capable of (sometimes substantially) enhancing innate human capabilities. Writing and arithmetic are obvious examples. However, despite the recent emphasis on innate cognitive dysfunction as the source of autistic deficits, we argue below that current evidence does not compel this conclusion, and indeed the preoccupation of researchers with cognitive deficits has caused them to neglect the possibility that cognitive deficits in autism result from strategies developed to deal with genetically based emotional dysfunction. The neurological systems underlying emotion are much more likely to be biologically grounded than cognitive processes. An accumulating body of evidence seems to implicate emotional rather than cognitive dysfunction as fundamental in autism, and hence there appears to be an increasingly realistic prospect that secondary cognitive deficits can be analysed as cases of natural technology (see, for example, Ramachandran, undated MS) and that new and more effective interventions can be developed on the basis of such analyses. 
In the present chapter, we will briefly present arguments against the most influential cognitive theories of the autistic deficit, and then with similar brevity we will summarise the evidence that points to a locus in emotional dysfunction. We will then report an experimental study that seeks to test cognitive theories of autism against theories assigning primacy to emotional dysfunction.
The autistic syndrome
Autism is a pathology of development, probably having a genetic basis or component (Badner and Gershon, 2002; Bailey, Palferman, Heavey and Le Couteur, 1998; Bailey, Le Couteur, Gottesman, Bolton, Simonoff, Yuzda and Rutter, 1995). Some researchers have argued that autism is caused by intrauterine or environmental toxins but the evidence for such attributions remains flimsy. Since the seminal work by Leo Kanner (1943) on Early Infantile Autism,
a cluster of related conditions has grown up around the core disorder that he identified.1 The autistic syndrome may be characterised by a triad of cognitive impairments, in social interaction, communication and imagination (Wing and Gould, 1979), or it may be seen as a point upon, or segment of, a continuum of autistic spectrum disorders which also includes conditions such as Semantic-Pragmatic Disorder, Rett’s Disorder, Childhood Disintegrative Disorder, Asperger’s Disorder and Pervasive Developmental Disorder (e.g., Bishop, 1989; Rutter and Schopler, 1985; Schopler, 1987; Wing, 1988; Wing and Gould, 1979). There is a wide range of autistic deficits. Sensory impairments are frequently present; attentional dysfunction is usual; a third or more of autistic individuals never develop functional speech; when speech is present, echolalia, palilalia and disturbance of intonation and other paralinguistic features are common. There is often intolerance of environmental change, focus of attention on restricted aspects of the stimulus world and stereotyped and repetitive behaviour such as hand flapping and rocking. In the social domain, it has been claimed that autistic disorders affect (1) understanding of the mental-physical distinction (the capacity to differentiate between mental and physical “events”); (2) understanding that the brain controls both mental and physical functions; (3) distinguishing between appearance and reality; (4) comprehending that other people may have false beliefs; (5) understanding that people hold different knowledge about situations; (6) recognising and using mental state words; (7) engaging in imaginative play; (8) understanding emotions and emotional behaviour; (9) following gaze and detecting other people’s intentions, using figurative speech and employing or interpreting deception (Baron-Cohen, 1995; Baron-Cohen, Tager-Flusberg, and Cohen, 2000).
Theories of autism
Explanations of autistic dysfunction are almost as varied as the pattern of symptoms involved. Initially, faulty emotional learning was emphasised, often accompanied by the suggestion that high achieving parents provided poor models for the acquisition of non-verbal communication skills (e.g., Bettelheim, 1967). Since the work of Rimland (1964), however,2 explanations of autism have accepted the view that brain dysfunction lies at the root of autistic disorders, but have made very different proposals as to the form this dysfunction takes. Contemporary theoretical discussions about the origin and cause of autism often seem to conflate and confuse issues from quite different levels of explanation.
One key debate, quite clear in early publications, is whether autism is primarily a pattern of deficits resulting from emotional dysfunction that distorts subsequent cognitive development, or whether acquired brain damage causes cognitive malfunction that directly produces the manifest symptoms. The issue here is whether cognitive deficits are primary or secondary. A related debate focuses on how best to characterise the core cognitive deficits associated with autism. Are they attentional, perceptual, communicative or social? A third debate has tended to overshadow all others in the last decade or so, as transgenerational data on autism has begun to suggest a genetic basis for autism. This debate has focused on what innate cognitive modules, or functions, or systems might be disabled by the genetic malfunction responsible for the disorder. One group of theorists argues that a cognitive module responsible for modelling the mental states of others is selectively damaged; a rival body of research attributes the genetic fault to defects in a cognitive control system; while others claim that deficits result from an inability to switch between local and global processing, or from a lack of special purpose neurones that enable humans to imitate the behaviour of other people. As we illustrate below, the central assumption of recent research has been that some innate cognitive system or module is defective in autistic individuals, and the function of this system can be identified by establishing the core cognitive deficits associated with autism. Though no-one doubts that emotional disorder is a prominent feature of the autistic syndrome, early suggestions that this emotional dysfunction might be fundamental have tended to become lost, as assumptions about the primacy of cognition over affect, and the search for innate cognitive modules have become the dominant ideas shaping the course of research.
The most influential cognitive theories of the autistic deficit are Mindblindness Theory, Executive Dysfunction Theory, Weak Central Coherence Theory and Imitation Deficit/Mirror Neuron Theory. The theory of Mindblindness (also known as ‘lack of a theory of mind’) (Premack and Woodruff, 1978; Baron-Cohen 2002; 2003) claims that autism results from an absent or dysfunctional system for modelling the mental states of other people.3 Executive Dysfunction Theory (Rumsey and Hamburger, 1988; Hughes, Russell and Robbins, 1994) claims that the autistic deficit is associated with the Central Executive (Baddeley, 1986; 1990), a hypothetical cognitive control system widely believed to underlie problem solving, planning processes, the generation of novel responses, and the suppression of irrelevant or intrusive behaviours.4 Weak Central Coherence theory (Frith and Happé 1994; Happé, 1999)5 argues that autistic individuals are locked into a preoccupation with the sensory detail
of a stimulus and unable to attend to the whole gestalt. The most recent theory of autistic dysfunction, the Imitation Deficit/Mirror Neuron Theory (IM/MNT),6 emphasizes neurological structure rather than cognitive mechanisms per se, claiming that specific neurones responsible for the human ability to imitate others (Rogers and Pennington, 1991) are absent or dysfunctional. The four theories differ in the balance they strike between cognitive processes and neurological structures; all are ‘cognitive’ in the sense that they restrict themselves to cold cognition (Abelson, 1963): the mechanisms invoked are those related to attention, perception, learning, memory, knowledge and belief, and the vocabulary is that of information processing rather than that of feelings and affect. This is well illustrated by the main experimental paradigm used to demonstrate cognitive deficits in autism, a ‘false belief’ task known as the Sally-Ann task. In the Sally-Ann task, a child participant is told the following story (or one of a large number of equivalent variants): Sally places her marble in a box and leaves the room. While she is gone, Ann moves the marble into a basket. Sally then returns to the room to get her marble. The child is then asked where Sally will look for her marble. Most normally developing 4-year-olds will answer correctly (that Sally will look for the marble in the box) because they appreciate that Sally does not know the marble has been moved. Autistic children generally get the question wrong, even when their mental age is well above 4 years. Findings such as this were originally claimed to imply that autistic children lack a theory of mind (ToM): they find it difficult or impossible to distinguish what Sally knows from what they themselves know (Baron-Cohen et al., 2000; Frith, 1985).
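The logic of the Sally-Ann task can be made concrete with a small simulation. The Python sketch below is purely illustrative and is not drawn from any of the studies cited; the names `Agent`, `move` and `predict_search` are hypothetical, and the only assumption made is that an agent's beliefs change solely through events it witnesses. A responder who keeps Sally's belief separate from the state of the world answers "box"; a responder who answers from the state of the world alone answers "basket", which is the error typically reported for autistic children.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A story character whose beliefs change only through events it witnesses."""
    name: str
    present: bool = True
    beliefs: dict = field(default_factory=dict)  # object -> believed location


def move(obj, place, world, agents):
    """Put obj in place; only agents who are present update their beliefs."""
    world[obj] = place
    for agent in agents:
        if agent.present:
            agent.beliefs[obj] = place


def predict_search(agent, obj, world, tracks_beliefs):
    """Predict where agent will look for obj.

    tracks_beliefs=True answers from the agent's own (possibly false) belief;
    tracks_beliefs=False answers from the true state of the world.
    """
    return agent.beliefs[obj] if tracks_beliefs else world[obj]


world = {}
sally, ann = Agent("Sally"), Agent("Ann")

move("marble", "box", world, [sally, ann])     # Sally places her marble in the box
sally.present = False                          # Sally leaves the room
move("marble", "basket", world, [sally, ann])  # Ann moves the marble to the basket
sally.present = True                           # Sally returns

belief_answer = predict_search(sally, "marble", world, tracks_beliefs=True)
reality_answer = predict_search(sally, "marble", world, tracks_beliefs=False)
print(belief_answer, reality_answer)  # prints: box basket
```

The sketch captures only the informational structure of the task. It is neutral between the cognitive and emotional accounts discussed in this chapter, since it says nothing about why a responder might fail to track Sally's belief.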
Rival theories have tended to accept the centrality of the Sally-Ann task, but have tried to explain the findings associated with it in terms of other constructs, such as inability to deal with modelling others’ mental states because of impoverished Central Executive function, or inability to imitate other minds because of an absence or paucity of mirror neurons. This focus on knowledge and belief predisposes towards neglect of another aspect of disordered behaviour in people with autism — disorder of emotion and affect.
Autism and emotional dysfunction
In contrast to the four cognitive theories of autism, Stress Overload Theory takes the view that autistic behaviour results from attempts to cope with an excess of fear and anxiety which rise to distressing levels as environmental stimuli increase in novelty or complexity. Because they are intrinsically complex and
unpredictable, social stimuli are particularly liable to overload the autistic individual’s coping mechanisms. On this view, emotional dysfunction is primary, and cognitive deficits are secondary consequences of strategies developed to deal with inappropriate and excessive emotion. Intolerance of change in the environment, repetitive behaviour and avoidance of social inputs are all natural strategies to adopt if an individual’s goal is to moderate novelty and complexity. An explanation of autistic deficits as secondary cognitive consequences of primary emotional dysfunction offers two reasons for optimism. First, this view suggests that the cognitive features of the syndrome may be acquired, and hence reversible through training. To employ the terminology we introduced earlier, autistic behaviour may result from failure to spontaneously acquire the natural technology normally employed to regulate social behaviour, without implying that the cognitive system is intrinsically incapable of acquiring it. Second, if the core problems observed in autism result from emotional disturbances, even if these are driven by genetic factors, the prospects of intervention using drugs become much more positive. A drug that replaces lost insights into the mind of other people is improbable; drugs that assist in emotional control are already available and might be therapeutically effective if administered before inappropriate cognitive adaptations to the core emotional condition have taken place. The suggestion that autistic disorders are caused by emotional dysfunction is as old as autism itself. Unfortunately, in addition to proposing an explanation in terms of affective dysfunction, Kanner (1943) also claimed that this dysfunction arose from the early emotional experiences of a child — specifically from the child’s reaction to cold and inexpressive mothers. The subsequent stigmatisation of parents of autistic children was both cruel and unjustified (Rimland, 1964). 
The realisation that already-suffering parents were inappropriately blamed because of the belief that emotional dysfunction was the primary locus of the autistic deficit had an unexpected consequence. In later debates about autism, the view that emotional dysfunction is not the primary cause but a secondary consequence of a physical disorder of the cognitive system has become transformed into a dogma. However, the case for emotion as the source of autistic deficits remains strong. The DSM-IV (American Psychiatric Association, 1994) criteria for Autistic Disorder include:
– marked impairment in the use of multiple nonverbal behaviours, such as eye-to-eye gaze, facial expression, body postures, and gestures to regulate social interaction
– a lack of spontaneous seeking to share enjoyment, interests, or achievements with other people (e.g., by a lack of showing, bringing, or pointing out objects of interest)
– lack of social or emotional reciprocity
– encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is abnormal either in intensity or focus
– apparently inflexible adherence to specific, nonfunctional routines or rituals
– stereotyped and repetitive motor mannerisms (e.g., hand or finger flapping or twisting or complex whole-body movements)
– persistent preoccupation with parts of objects
Many of these features either directly indicate emotional dysfunction or can be interpreted as behavioural responses to emotional disturbance. For example, avoidance of eye-contact, preoccupation with detail and stereotyped, repetitive behaviour are all strategies that might be expected in individuals trying to avoid the excessive anxiety that results from sensory and emotional overload (Gillingham, 2000).8 It has been widely reported that some of the major difficulties experienced by autistic individuals are associated with the expression and understanding of emotion (Bemporad, Ratey and O’Driscoll, 1987; Loveland et al., 1989; Hobson and Lee, 1989). However, it is not always acknowledged that stimulus contexts associated with emotion invariably involve the processing of novel and complex social information, and that novel and complex information is liable to cause distress to autistic individuals even when it is not social in nature. Recently, explicit evidence of anxiety problems in autistic children has begun to emerge (Amaral and Corbett, in press). Muris et al. (1998) examined the presence of co-occurring anxiety symptoms in 44 children with autism spectrum disorder. The sample included 15 children with autism, and 29 with pervasive developmental disorder-not otherwise specified (PDD-NOS). They found that more than 80% of the children met criteria for at least one anxiety disorder. Gillott et al. (2001) compared high-functioning children with autism to two control groups, children with specific language impairment and normally developing children, on measures of anxiety and social worry. Children with autism were found to be more anxious on both indices. Four of the six factors on the anxiety scale were elevated, with obsessive-compulsive disorder and separation anxiety showing the highest elevations. Despite the emphasis placed by clinicians and practitioners on emotional disturbance, academic theories have focused almost exclusively upon actual or
possible deficits in cognition. Presumably, the emotional dysfunction that is such a central feature of the autistic syndrome, rather than being denied, is being neglected on the grounds that it can be accounted for as a secondary consequence of one or more cognitive deficits. However, the assumption that cognitive deficits are primary, and emotional problems secondary, does not appear to be supported by any significant body of evidence. We therefore believe that it is of considerable importance to bring the possibility that autistic deficits are rooted in emotional disorder back into centre field. Accordingly, we have presented the Stress Overload Theory as an additional possibility that can be tested against the cognitive alternatives.
Neuropsychology of autism
Three different neurological systems of the brain have been proposed as the site of autistic dysfunction: the frontal lobes, the integration of the cerebral hemispheres, and the amygdala. The frontal lobes have been implicated both by theorists who argue that mental state information is processed by generic executive functions (e.g., Frye et al., 1995, 1996) and by Mindblindness theorists (Baron-Cohen, 1994; Happé, 1996). Impairment of executive functions has long been associated with damage to prefrontal areas (e.g., Luria, 1966; Fuster, 1989; Duncan, 1986; Shallice, 1998). There is substantial evidence from brain injury victims, animal lesion studies and functional imaging studies that different aspects of executive functions are subserved by neural systems located in anatomically separated regions of the prefrontal cortex (e.g., Luria, 1966; Fuster, 1989; Robbins, 1996; Shallice and Burgess, 1996). Though such reports are still few in number, imaging studies have suggested that part of the left medial frontal cortex may indeed be implicated in autism (e.g., Frith, 2001). The amygdala, which is a hemispherically bilateral structure, also shows a different pattern of activation in autistic individuals from that in neuro-typical people when they are presented with socially meaningful stimuli such as faces showing emotional expressions. This has led proponents of the Mindblindness theory of autistic deficits to claim that the amygdala is centrally involved in the cognitive processes underlying normal social behaviour (Baron-Cohen et al., 2000). However, recently reported work by Amaral and Corbett (in press) suggests that damage to the amygdala is not sufficient to produce the social deficits observed in autism.7 Amaral and Corbett observe that “[r]ecent data from studies in [their] laboratory on the effects of amygdala lesions in the macaque monkey are at variance with a fundamental role for the amygdala in social behavior” and
conclude that “… an important role for the amygdala is in the detection of threats and mobilizing an appropriate behavioral response, part of which is fear. If the amygdala is pathological in subjects with autism, it may contribute to their abnormal fears and increased anxiety rather than their abnormal social behavior” (Amaral and Corbett, in press). Crucially, though Amaral and Corbett’s work confirms the involvement of the amygdala in autistic dysfunction, it suggests that effects of damage to the amygdala are mediated by impairments to subsystems involved in emotion processing, rather than from direct interference with social cognition. The neuropsychological basis for weak central coherence has yet to be precisely specified. Most work within this framework has generally taken the view that weak central coherence results from the nature of the computational processes underlying cognition, rather than being associable with particular cortical structures.8 Accordingly such neuropsychological evidence as is currently associated with this theory does not rule out a primary locus for autistic dysfunction in the brain areas underlying emotion rather than cognition. It has been noted that IM/MNT benefits from its ability to precisely specify the neurological structures (mirror neurons) responsible for imitation behaviour and to assign them a specific cortical location in areas F4 and F5 in the ventral premotor cortex, with the probable involvement of similar structures in the superior temporal sulcus (Gallese and Goldman, 1998). However, the precision with which relevant cortical structures are specified also exposes this theory to disconfirmation, and a recent PET study by Decety, Chaminade, Grèzes and Meltzoff (2002) has produced findings that directly contradict its claims. 
This investigation found that imitation behaviour is associated with activity in quite different areas of the cortex: “The left inferior parietal is specifically involved in the imitation of the other by the self, whereas the right homologous region is more activated when the self imitates the other… Overall these results favor the interpretation of a lateralisation in the posterior part of the brain related to self versus others-related information respectively in the dominant versus the non-dominant hemisphere.” (Decety, Chaminade, Grèzes and Meltzoff, 2002, p. 271) The conclusion we draw from this review of the neuropsychology of autism is that areas of the prefrontal cortex and the amygdala are probably implicated in the syndrome. These structures undoubtedly play a central role in interfacing the limbic system responsible for emotional experience with systems in the premotor and orbito-frontal cortex that are involved in the formulation of intentions and the generation of action schemas. However, presently
available neuropsychological evidence is not sufficient to justify the view that dysfunctional cognition causes abnormal emotional experiences, as against the view that inappropriate emotional experience is the cause of cognitive dysfunction. Again, there seems every reason to keep under consideration the possibility that deficits in emotion processing are the primary source of autistic disorder.
Emotion and face recognition
Emotion is expressed in distinct facial expressions and particular areas of the face are linked with different emotions such as happiness, surprise, sadness, anger, fear and disgust (Darwin, 1872). Whilst autistic children may understand and express simple emotions such as anger, they typically do not express more complex emotions such as surprise (Dennis et al., 2000) and are significantly impaired at recognising emotions (Hobson, 1986a, 1986b; Weekes and Hobson, 1987). When autistic children are presented with photographs of faces, at least some have been found to have difficulty in using photographs to identify faces and to correctly attribute emotion on the basis of facial expression in photographs (Tantam, Monahan, Nicolson and Stirling, 1989). Autistic children differ from control children in recognising pictured faces when orientation is manipulated (Langdell, 1978), are less able than controls to recognise emotional expressions and are also much less accurate in identifying a right-way-up face (Tantam, Monahan, Nicolson and Stirling, 1989). Autistic children, however, have been found as able as controls to label upside-down faces, and there is a much bigger difference within the control groups between right-way-up faces and upside-down faces than there is within the autistic groups. Tantam, Monahan, Nicolson and Stirling concluded that it was likely that this difficulty would apply to actual faces as well as pictures of faces and might explain some autistic individuals’ difficulty in social interactions. Though high-functioning autistic children are far more able at matching simple emotions than their lower-functioning counterparts (Dennis et al., 2000) they are less able than control participants at identifying emotions when matched for non-Verbal Mental Age (nVMA) (Tantam, Monahan, Nicolson and Stirling, 1989).
However, there are no differences between autistic and control groups when they are matched for Verbal Mental Age (VMA) (Ozonoff, Pennington and Rogers, 1990). Celani, Battacchi and Arcidiacono (1999) suggested that this is because the recognition of facial expressions depends upon the use of analytic (or local) processing and holistic (or global) processing. Celani et al. speculated that analytic processing takes place in the left hemisphere, where
Gaze aversion and emotional dysfunction in autism 277
information is perceived in terms of its properties and an inferential understanding of facial expression is gained. Holistic processing may depend more upon the right hemisphere (Buck, 1984) and involve sub-cortical structures in the limbic system that respond to displays as a whole and hence may enable direct apprehension of emotional meaning. Celani et al. hypothesised that the holistic processing associated with the right hemisphere is impaired or unavailable in autism so that autistic individuals are forced to employ left hemisphere analytic processes to compare faces on the basis of component features of emotional expressions (Celani et al., 1999). ‘The eyes are the window to the soul’ is a statement often attributed to Leonardo da Vinci and, at least for humans, the eyes are indeed extremely important social cues. Direction of gaze is an important non-verbal cue for turn taking in face-to-face conversation, and noting the pattern of a person’s gaze can reveal much about their intentions and their mental state. We learn much about the minds of others by observing their eyes: whether they like us, what they are thinking about, and what they want. This has led to the suggestion that gaze interpretation is a form of ‘mindreading’ (Baron-Cohen, 1994). Such mindreading allows the extraction of clues about the focus of another person’s attention, desires, and even beliefs, and a lack of sensitivity to gaze may underlie the impairments in social and cognitive abilities that are observed in autism. Humans may have an internal eye direction detector (EDD), which may play a particularly important part in ‘intentionality detection’, perhaps the most important component in a mindreading system (Baron-Cohen, 1995). Evidence for the existence of an EDD has come from an experiment showing that infants as young as 3 months of age can detect the direction of a person’s gaze using information from the eyes alone (Hood, Willen and Driver, 1998).
This has been interpreted as support for the theory that an EDD mechanism is present from a young age, and that infants thus have access to a mechanism that allows them to direct their own attention so as to match that of others, a phenomenon sometimes known as ‘joint attention’. Autistic individuals display highly atypical gaze behaviour, particularly with respect to following the eye movements of other people. Hobson et al. (1986a,b) suggest that autistic individuals attend to different features in the face, seeming to make less use of the eye area than do control participants in experimental studies. In early reports ‘gaze avoidance’ was considered a central feature of conditions such as autism and Asperger’s syndrome (O’Connor and Hermelin, 1967). More recent researchers have suggested that rather than avoiding gaze, autistic individuals may restrict gaze-sampling to quick glances at the eyes of
278 Sarah Bowman, Lisa Hinkley, Jim Barnes and Roger Lindsay
others (Volkmar and Mayes, 1990), though Baron-Cohen (1995) appears to believe that gaze in autistic people is unimpaired, and Dickerson has pointed out that the claim that autistic children are able to use gaze selectively requires the assumption that they can use gaze competently to some degree (Dickerson, Rae, Stribling, Dautenhahn, Ogden and Werry, in press). Normal individuals seem to scan other people’s faces by fixating in repeated cycles for approximately a third of a second, dwelling on the eyes longer than other facial areas (Argyle, 1994). The relationship between hypothetical cognitive deficits and atypicalities in gaze and eye-contact is still a poorly understood issue in autism research. Cognitive deficit theories can only explain dysfunctional processing of gaze information as a result of the absence of a theory of mind, inability to employ global processing, insufficient Central Executive resources, or lack of the mirror neurons that allow social behaviour to be interpreted. Another possibility is that “eye-gaze stimuli adversely stimulate the autonomic nervous system in autism, perhaps inhibiting the normal development of a theory of mind. If so, any deficit in autism is likely to be much more low-level and pervasive than a cortical theory of mind module” (Keeley, 2002, p. 4). When autistic children are separated from an adult by a substantial barrier, they look more at that adult than if the barrier were not present even though the adult is in full view and looking at the child in both conditions (MacConchie, 1973, cited in Richer and Cross, 1976). MacConchie suggests that the barrier reduces the probability of a social interaction between the adult and the child, thereby also reducing the threat normally associated with eye contact. Also, children with autism look more at adults and engage in less flight behaviour when the adults have both eyes covered than when the eyes are visible (Richer and Cross, 1976).
These studies lend support to the suggestion that avoidance of eye contact is a means of reducing the emotional threat posed by social contact. Tasks requiring the discrimination of the direction of gaze, such as those in an imaging study by Kawashima et al. (1999), activate an area in the left amygdala equally in both eye contact and no eye contact conditions. However, a region in the right amygdala only becomes activated during the eye contact condition (Kawashima et al., 1999). These results implicate the amygdala in the processing of gaze information: the left amygdala in the general processing of gaze direction, with specific involvement of the right amygdala when another individual directly makes eye contact with the person whose activation levels are being imaged. This reinforces the suggestion that impaired amygdala function may be associated with autism.
We conclude that, even when matched for nVMA, autistic participants are less able to process facial expressions related to emotions than normal controls. As there is abundant neuropsychological evidence associating impairment of the amygdala with the symptoms of autism, it seems likely that such impairments contribute to the disorder. There is some evidence that a special purpose EDD exists and that this system is present even in very young infants. It is possible that EDD does not develop in autistic children as it does in other children. It still seems to be an open question, however, whether deficits associated with cognitive systems such as EDD are primary deficits, carrying other consequences in train, or whether they are secondary problems, resulting for example from abnormal emotional reactions to social stimuli.
Experiments Below we report two studies designed to discriminate between cognition- and emotion-based theories of autistic deficits. Previous researchers have failed to distinguish between a passive failure to make use of high-value information from the eyes of others (cue blindness) and an active tendency to avoid the gaze of others (cue aversion). Cognitive theories predict cue blindness: since, on these theories, autistic children cannot process social cues, there is clearly no reason why they should seek to avoid looking at the areas of the face where such cues are located. Emotion-based theories, on the other hand, predict cue aversion: because social cues are associated with excessive anxiety, autistic individuals will avoid processing them. In both cases important information will be lost, and cognitive processing will be impoverished as a result. But the underlying mechanisms are nonetheless quite distinct. The first study we report was intended to investigate whether the sample of autistic children participating in the study do indeed show impaired ability to identify emotional expressions using information from the eye region of photographed faces in comparison with controls. The design required participants to identify emotional expressions from whole faces, from faces with eyes deleted, or from eyes only with the remainder of the face deleted. The second study compares the regions of complete faces that autistic and control participants attend to. The logic of the study is this: photographs of faces are briefly presented to participants. Superimposed on the face, in the mouth, cheek or eye region, is a heart-shaped target. Participants are required to press a key as soon as they detect the heart shape and reaction times are measured. When the display has terminated, participants are asked where the target was located. If
cognitive deficit theories are correct, the time taken to correctly identify the target location should be equal across locations. Even if autistic participants can’t process social cues, there is no reason why the heart shape should take longer to detect in the eye region than in the mouth or cheek region. Emotional deficit theories, however, predict that autistic participants will selectively avoid attending to the region of the face around the eyes because this area is rich in social cues. On these theories, targets located by the eyes should take longer to detect than targets located by the mouth or by the cheek.
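The two sets of predictions can be restated as a simple decision rule over mean reaction times. The Python sketch below is illustrative only: the function name `classify_rt_pattern`, the `tolerance_ms` threshold and the example values are assumptions introduced here, and the study itself tested these predictions with Analysis of Variance rather than with a fixed threshold.

```python
def classify_rt_pattern(rt_eye, rt_mouth, rt_cheek, tolerance_ms=100.0):
    """Label mean reaction times (ms) by the prediction they resemble.

    tolerance_ms is a hypothetical threshold for treating two means as
    equal; it is not a value taken from the study.
    """
    times = (rt_eye, rt_mouth, rt_cheek)
    # Cue blindness: detection time is roughly equal across locations.
    if max(times) - min(times) <= tolerance_ms:
        return "cue blindness"
    # Cue aversion: eye-region targets are slower than both other regions.
    if rt_eye - max(rt_mouth, rt_cheek) > tolerance_ms:
        return "cue aversion"
    # Any other profile matches neither prediction as stated.
    return "neither"


# Hypothetical values, chosen only to exercise the first two branches.
print(classify_rt_pattern(1800.0, 1790.0, 1760.0))
print(classify_rt_pattern(2500.0, 1900.0, 1850.0))
```

A fixed tolerance is a deliberate simplification; in practice the question of whether two means differ is a statistical one.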
Experiment 1 This experiment examines whether autistic children have more difficulty than normal controls and learning disabled controls at identifying emotions from pictures of the eyes alone. The accuracy of emotion identification is compared using eyes-only displays, whole-face displays and no-eyes displays. It is hypothesised that the autistic group will fare particularly badly in the eyes-only condition.
Method Participants Three groups of participants were used for this experiment. The first group (n = 17) were attending a special school and all had a diagnosis of autism using established criteria. The second group (n = 14) all had learning disabilities and also attended a special school. The third group (n = 18) were control children attending a mainstream school. Participants were matched for non-verbal mental age (nVMA). All participants were males aged between 5 and 16 from one of two schools in South Essex. As autism affects more males than females (ratio 4:1), the study was confined to male participants in order to avoid having to deal with the problem of what proportion of female participants should appear in the study. Learning-disabled controls were used alongside normal controls because 75% of autistic individuals have a learning disability as well and, indeed, learning disability was a criterion for acceptance at the school where the study was carried out. Children with severe learning disabilities, Attention Deficit Disorder (ADD), Attention Deficit Hyperactivity Disorder
(ADHD) and very low-functioning autistic children were excluded from the study, as they would have found difficulty in completing the tasks employed. Experiments 1 and 2 employed the same participants, but for any particular participant there was an interval of at least two weeks between experimental sessions. This interval allowed each session to be kept as brief as possible, as well as minimising the risk of transfer effects and between-group differences in attention or memory span.
Apparatus A laptop computer running a Superlab programme was used to present faces from Ekman and Friesen (1978). Pictures of faces displaying emotion were presented with only the eyes visible (‘just-eyes’ condition), everything but the eyes visible (‘no-eyes’ condition) or the whole face visible (‘whole-face’ condition). Five emotional expressions were used as targets: happy, sad, angry, afraid and surprised. Raven’s Progressive Matrices (standard) was used as the nVMA test. Four different faces were used in a counterbalanced design with each participant viewing two of the four available faces (Appendix 4).
Procedure The experiment was preceded by a training session intended to ensure that all participants were equally competent at identifying facial expressions of emotions and attaching verbal labels to them. Participants were seated in front of a laptop computer and told that they would see a picture of a face followed by a second screen showing five more faces. The sets of five faces were made up of the original facial expression and four different facial expressions. Participants were then asked to name the member of the set of five which looked most like the previous face. The experimenter then named the five different emotions illustrated in the second screen. The display and naming exercise was repeated as often as necessary until participants could confidently identify the emotions. The training phase was useful in familiarising participants with the task and in ensuring that between-group differences in emotion recognition using whole and partial faces were not due to pre-existing differences in recognising or labelling the expressions presented. A pilot study run on five children showed that, to prevent the training phase from requiring any more than a trivial amount of new learning, some synonymous expressions would have to be accepted as equivalent to specific
emotion words for some participants, for example “smiley” for happy, “frightened” for afraid, “cross” for angry, “shocked” for surprised and “upset” for sad. Participants were then told that some of the pictures would show all of a face, some would show just the eyes and others would have the eyes blacked out so that they could not be seen. If they were unsure they were instructed to guess. They were then asked if they had any questions. The programme displayed 30 pictures on the laptop screen (2 face identities × 5 emotions × 3 display types). The pictured faces showed one of five different emotions (happy, sad, angry, afraid or surprised). Each face was displayed for 2000 ms, and was followed by a screen showing the five emotions in words, each with a schematic picture next to it from a standardised bank of emotional expressions used by the school for autistic and learning disabled children. This acted as a prompt for the answer. A set of practice examples was administered first with a gap afterwards for questions or concerns. If the participant said they did not know, they were encouraged to guess. The participant gave the experimenter their answers and the experimenter entered the response on the computer. This was to make the experiment accessible to as many children as possible including some lower functioning autists. Reaction times were not recorded in Experiment 1.
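The factorial structure of the stimulus set can be made concrete by enumerating it. The following Python sketch assumes nothing beyond the design stated above; the face labels `face_1` and `face_2` are hypothetical placeholders for the two identities a given participant saw, and presentation order is not modelled because the procedure does not specify it.

```python
from itertools import product

# Hypothetical labels for the two face identities shown to one participant.
FACE_IDENTITIES = ["face_1", "face_2"]
# The five target emotions and three display types named in the procedure.
EMOTIONS = ["happy", "sad", "angry", "afraid", "surprised"]
DISPLAY_TYPES = ["just-eyes", "no-eyes", "whole-face"]

# Every combination of identity, emotion and display type is one picture.
trials = list(product(FACE_IDENTITIES, EMOTIONS, DISPLAY_TYPES))

# Number of pictures per display type, i.e. the maximum possible score
# for that display type.
per_display = {
    display: sum(1 for _, _, shown in trials if shown == display)
    for display in DISPLAY_TYPES
}

print(len(trials))
print(per_display)
```

The per-display count of 10 is the ceiling against which the scores in Table 1 are reported.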
Results Table 1 shows the mean number of emotional expressions correctly identified by autistic, learning disabled and control participants.

Table 1. Mean number of emotional expressions correctly identified by autistic, learning disabled and control participants. All figures are out of a maximum of 10. Standard deviations are in brackets.

                             Display Type
Group                        Eyes only      No eyes        Whole face
Autistic (n = 17)            3.00 (0.37)    4.41 (0.43)    4.82 (0.45)
Learning Disabled (n = 14)   5.14 (0.41)    1.92 (0.47)    3.36 (0.50)
Control (n = 18)             7.50 (0.36)    5.78 (0.42)    7.44 (0.44)
Analysis of Variance was performed on the mean number of correct responses with display type as a within-participant factor and group as a between-participant factor. There was a significant main effect of display type (F = 10.155, df = (2,45), p = 0.001) and of group (F = 33.201, df = (2,46), p = 0.001). There was also a significant display type × group interaction (F = 9.463, df = (4,90), p = 0.001). Post hoc analyses were conducted to further examine these effects.
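Some of the reported degrees of freedom can be checked directly against the group sizes given in the Method section. The sketch below is a minimal illustration with hypothetical function names; it covers only the between-group effect and two-condition within-group contrasts, because the text does not state which variant of the within-participant test produced the remaining values.

```python
# Group sizes as reported in the Participants section.
GROUP_SIZES = {"autistic": 17, "learning_disabled": 14, "control": 18}


def between_group_df(group_sizes):
    """Degrees of freedom for a one-way between-group F test."""
    n_total = sum(group_sizes.values())
    k = len(group_sizes)
    return (k - 1, n_total - k)


def pairwise_within_df(n):
    """Degrees of freedom for contrasting two conditions within n participants."""
    return (1, n - 1)


print(between_group_df(GROUP_SIZES))
for name, n in GROUP_SIZES.items():
    print(name, pairwise_within_df(n))
```

The first value agrees with the df = (2,46) reported for the main effect of group, and the pairwise values agree with the F(1,16), F(1,13) and F(1,17) contrasts reported for the three groups.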
Autistic Group For the autistic group there was a significant main effect of display type (F(2,15) = 6.631, p = 0.009). Performance was found to be worse in the eyes-alone condition (mean = 3.00) than in the no-eyes condition (4.41; F(1,16) = 8.727, p = 0.009) or in the whole-face condition (4.82; F(1,16) = 11.828, p = 0.003). However there was no difference between the no-eyes (4.41) and whole-face conditions (4.82; F(1,16) = 0.662, p = 0.442). These findings suggest that autistic children find it more difficult to identify emotion when only the eyes are presented as cues to emotional expression.
Learning Disabled Control Group For the learning disabled controls there was a significant main effect of display type (F(2,12) = 11.706, p = 0.002). Performance was superior with eyes (5.14) than no-eyes displays (1.92 — F(1,13) = 25.288, p = 0.001) or whole-face displays (3.36 — F(1,13) = 11.085, p = 0.005). Furthermore recognition of emotion was superior with whole-face displays (3.36) than no-eyes displays (1.92 — F(1,13) = 7.514, p = 0.017).
Normal Control Group For the normal control group there was a significant main effect of display type (F(2,16) = 8.413, p = 0.003). Superior emotion recognition was found with eyes (7.5) than no-eyes displays (5.78 — F(1,17) = 13.038, p = 0.002). However eyes (7.5) could not be distinguished from whole-face displays (7.44 — F(1,17)=0.027, p = 0.871). Furthermore normal controls performed better with whole face (7.44) than no eyes (5.78 — F(1,17) = 17.0, p = 0.001). Further analysis examined differences between the three participant groups for the different types of displays. For the eyes-only displays the autistic group
(3.00) performed significantly worse than learning disabled controls (5.14; p = 0.001) or normal controls (7.5; p = 0.001). Learning disabled controls also performed worse than normal controls (p = 0.001). For the no-eyes displays the autistic group (4.41) performed better than learning disabled controls (1.92; p = 0.001) and their performance on these displays was only slightly worse than that of normal controls (5.78), a difference which verged on significance (p = 0.082). However the learning disabled controls were worse at this task than the normal controls (p = 0.001). For the whole-face displays, the autistic (4.82) and learning disabled controls (3.36) could not be distinguished (p = 0.113). However the normal control group (7.44) performed better than the autistic group (p = 0.001) or the learning disabled controls (p = 0.001).
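The direction of the eye-region effect in each group can be read off the Table 1 means by subtracting the no-eyes score from the eyes-only score. The Python sketch below is a descriptive summary under that assumption only; the dictionary layout and the function name `eyes_minus_no_eyes` are hypothetical, and the differences carry no significance information of their own.

```python
# Mean correct identifications (out of 10), copied from Table 1.
TABLE_1_MEANS = {
    "autistic": {"eyes_only": 3.00, "no_eyes": 4.41, "whole_face": 4.82},
    "learning_disabled": {"eyes_only": 5.14, "no_eyes": 1.92, "whole_face": 3.36},
    "control": {"eyes_only": 7.50, "no_eyes": 5.78, "whole_face": 7.44},
}


def eyes_minus_no_eyes(means):
    """Positive values mean the eye region helped; negative values mean it hurt."""
    return round(means["eyes_only"] - means["no_eyes"], 2)


for group, means in TABLE_1_MEANS.items():
    print(group, eyes_minus_no_eyes(means))
```

Only the autistic group shows a negative difference, which matches the pattern described in the post hoc analyses.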
Discussion The autistic children’s performance was found to be worse in the eyes-alone condition than in the no-eyes condition or in the whole-face condition. However there was no difference between no-eyes and whole-face conditions. These findings suggest that autistic children find it more difficult to identify emotion when only the eyes are presented as cues to emotional expression. By contrast, for the learning disabled control group performance was found to be best for the eyes-only displays, followed by whole-face displays and lastly no-eyes displays. This group were most successful when they could rely on the eye region for successful categorisation of emotion; however, the whole-face displays may have provided extra information that distracted these participants from the all-important eye region, perhaps by providing extra cues. The normal control participants performed best in the conditions when eyes were presented (eyes-only and whole-face); they were much poorer in the no-eyes condition. Unlike in the learning disabled control group, however, performance in the eyes-only and whole-face conditions could not be distinguished, suggesting that the extra information provided in the whole-face condition did not confuse these participants as much as it did the learning disabled control participants. The procedure used in Experiment 1 evidently differs from real life emotion identification in that recognition decisions are based on images of faces that are both still and incomplete; hence it is legitimate to doubt whether the findings of the study generalise to emotion identification in real life. However this may be, the experimental procedures employed in the present study appear to have been successful in demonstrating real differences between autistic and control
participants. Furthermore, the observed differences seem to be in line with expectations based on previous research, including studies that were not as experimentally constrained as the present study. The findings of the present study support those of Tantam et al. (1989), who found that when matched for nVMA autistic participants are less able than control participants at emotion recognition tasks. Celani et al. (1999) had previously found that autistic individuals process information at a more elemental level than control participants do. In the present study the autistic individuals did almost equally well in the ‘no eyes’ and ‘whole face’ conditions, suggesting that they only use part of a face to recognise emotion. It is possible that they employed a feature-based approach to emotion recognition as suggested by Hobson et al. (1988). Similarly, the data are consistent with the hypothesis that the normal controls used a more holistic approach. In summary, autistic participants were less accurate at emotion recognition using just the eyes than were the control participants, supporting the hypothesis that autistic children gain less information about emotion from the eye region than normal or learning disabled control children.
Experiment 2 The second experiment sought to investigate the way in which autistic children pay attention to facial features by measuring the time taken to identify a shape when it is sometimes superimposed on the eye region of a pictured face, sometimes on the cheek, and sometimes on the mouth. The study examined the effects of group (autistic, learning disability control and normal control) on the time taken to locate a heart shape superimposed on the eye, cheek or mouth region of facial displays.
Method Participants The same 49 participants took part in Experiments 1 & 2.
Procedure Participants were told that they would be shown more pictures of faces but this time each picture would have a heart shape hidden somewhere on the face (see Appendix 1 for examples). The experimenter then showed the participant an example of the heart shape. They were told to press the space bar as soon as they saw the shape and then to tell the experimenter where it was. They were told that the shape would be on the eye, the mouth or the cheek. Participants were also informed that the experimenter wanted to see how fast they could respond and so they should press the space bar as soon as they saw the heart. At this point any questions participants had were answered and they were given a few practice trials to ensure they understood the requirements of the study. The experiment, which consisted of 12 pictures shown sequentially on the screen, then proceeded. All participants were 100% accurate at locating the heart shape.
Results Table 2 presents mean response times for locating a superimposed heart shape on the eye, mouth and cheek regions of pictured faces for the 3 groups of participants. A two-factor Analysis of Variance with group as a between participants factor and display type as a within participants factor was carried out on the reaction time data. Significant main effects of display type (F(2,45) = 4.455, p = 0.017) and of group (F(2,46) = 8.096, p = 0.001) were found. The interaction between group × display verged on significance (F(4,90) = 2.263, p = 0.069).
Table 2. Mean reaction times (in milliseconds) for locating a superimposed shape on pictured face by autistic, learning-disabled and control participants. Standard deviations are in brackets.

                             Display Type
Group                        Eye               Mouth             Cheek
Autistic (n = 17)            2575.5 (184.6)    1902.1 (133.0)    2600.9 (202.9)
Learning Disabled (n = 14)   1912.4 (203.5)    1490.5 (146.6)    1617.7 (223.7)
Control (n = 18)             1762.6 (179.5)    1789.4 (129.3)    1627.2 (197.4)
Post hoc analyses were conducted to further examine these effects.
Autistic Group For this group the main effect of display type was found to be significant (F(2,15) = 6.15, p = 0.01). The autistic group took significantly longer to locate the shape in the eyes condition (2575.5) than in the mouth condition (1902.1 — F(1,16) = 12.641, p = 0.003). However there was no significant difference between the eye (2575.5) and cheek conditions (2600.9 — F(1,16) = 0.08, p = 0.931). They also took longer in the cheek condition (2600.9) than in the mouth condition (1902.1 — F(1,16) = 4.831, p = 0.043).
Learning Disabled Control Group For this group the main effect of display type was not significant (F(2,12) = 1.928, p = 0.188).
Normal Control Group For this group the main effect of display type was not significant (F(2,16) = 1.413, p = 0.272). Further post-hoc analysis sought to compare the three participant groups in terms of their response times to the different types of displays. For the eye displays, there was a main effect of group on response times (F(2,46) = 5.547, p = 0.007). Autistic participants were notably slower at responding to eye stimuli, with responses slower than both learning disabled controls (2575.5 vs. 1912.4; p = 0.063) and normal controls (2575.5 vs. 1762.6; p = 0.011). However the learning disabled controls and the normal controls could not be statistically distinguished (p = 0.858). For the mouth displays the effect of group was not significant (F(2,46) = 2.30, p = 0.112). Finally, for the cheek displays the effect of group was significant (F(2,46) = 6.80, p = 0.003). Response times for the autistic children in this condition (2600.9) were significantly longer than those of either learning disabled controls (1617.7; p = 0.013) or normal controls (1627.2; p = 0.008). Learning disabled controls and normal controls could not be statistically distinguished (p = 1.00).
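The size of the between-group gaps at each target location can be summarised from the Table 2 means. The Python sketch below simply subtracts each control group's mean from the autistic group's mean; the data layout and the function name `autistic_minus` are hypothetical, and the resulting differences are descriptive only and say nothing about significance.

```python
# Mean reaction times in milliseconds, copied from Table 2.
TABLE_2_MEANS = {
    "autistic": {"eye": 2575.5, "mouth": 1902.1, "cheek": 2600.9},
    "learning_disabled": {"eye": 1912.4, "mouth": 1490.5, "cheek": 1617.7},
    "control": {"eye": 1762.6, "mouth": 1789.4, "cheek": 1627.2},
}


def autistic_minus(comparison_group, location):
    """How much slower (ms) the autistic group was than a comparison group."""
    autistic = TABLE_2_MEANS["autistic"][location]
    other = TABLE_2_MEANS[comparison_group][location]
    return round(autistic - other, 1)


for location in ("eye", "mouth", "cheek"):
    print(
        location,
        autistic_minus("learning_disabled", location),
        autistic_minus("control", location),
    )
```

The gaps are smallest for mouth targets, the one location at which the effect of group was not significant.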
Discussion Autistic children identified the heart-shaped target fastest in the mouth condition. They were significantly slower in the eye and cheek conditions. In other words, the further away from the eyes the target was located, the more quickly it was detected. For the learning disabled control group and the normal control group there was no effect of shape location on response times. For the eye displays, the autistic group responded more slowly than either control group, and the control groups did not differ from one another in terms of response times. For the mouth displays, the response times of the three participant groups could not be statistically distinguished. For the cheek displays there was also an effect of group, with autistic children being significantly slower to respond than either control group. The between-group differences in response to display type are important. Because the stimuli were semi-naturalistic, visual contrast between the superimposed heart shape and background was not completely identical across facial areas, the complexity of the background probably differed also, and the heart shapes were located at different spatial positions on the screen (e.g., the shape located over the eye was physically higher than that located near the mouth). If all participant groups had found the eye displays to be more difficult, the explanation would probably lie in physical differences between this and the other display types. However, the fact that specific eye display difficulty was confined to the autistic group suggests that physical aspects of the stimuli are not responsible. It remains possible that autistic participants have some hitherto unsuspected difficulty with low contrast, or high location stimuli, but at present this possibility is gratuitous. By contrast, the prior research evidence that autists have difficulty in processing information associated with gaze and eye location is compelling.
The findings are consistent with the hypothesis that autistic children are gaze averse and actively avoid looking at the eye region of faces. Neither control group showed any difference between the three conditions, indicating that they have neither a preference for nor an aversion to the eye (or indeed any other) region of faces. Langdell (1978) carried out an early investigation of the behaviour of autistic participants using whole photographs of faces and parts of photographs. The ability of autistic children to sort photographs of simple emotions (whole photographs and half-face photographs) was examined. Both control and autistic groups were able to sort the full photographs and those of the lower halves of faces but the autistic group were impaired in sorting the photographs of the
upper half of the face. The results of the present study are consistent with Langdell’s findings in that reaction times for the upper half of the face (the eye and cheek conditions) were significantly longer than for the lower half of the face (the mouth condition). In the light of the theoretical discussion presented earlier, this evidence that there is positive gaze aversion in autistic children seems to support hypotheses based upon emotional dysfunction rather than cognitive deficits. Whilst cognitive deficit theories differ in the precise mechanism that is claimed to be defective, all maintain that autistic people are unable to process certain kinds of information in displays. It is difficult to understand why autistic individuals should selectively avoid sources of information to which they are, effectively, blind. Emotion-based theories, such as the Stress Overload Theory, on the other hand, predict exactly the outcome we have reported: autistic people will seek to avoid novel and complex aspects of the stimulus environment that threaten to overwhelm their cognitive coping strategies. In the case of human faces, this would be expected to result in avoidance of the eye region because of the complexity and unpredictability of gaze information.
General discussion A great deal of information is obtained from the eyes of normal individuals. The eyes are used to assess whether someone is being truthful, whether their smile is genuine and what they are planning to do next. The eyes may also be used to flirt, to share a joke, or to discreetly direct the attention of another person (Vertegaal, Slagter, Van der Veer and Nijholt, 2000; Argyle and Cook, 1976; Kendon, 1967). Autistic individuals conspicuously fail to display these abilities under ordinary conditions, though there is no positive evidence that they do not possess them and some evidence that they do demonstrate competence in directing gaze (Dickerson, Rae, Stribling, Dautenhahn, Ogden and Werry, in press). To the extent that the latter is true, the selective use of gaze observed in people with autism may be under some volitional control. Baron-Cohen (1994) believes that an important function of human eye and gaze monitoring is the determination of mental states, or what he calls ‘mindreading’. He also suggests that a mechanism called the ‘Eye Direction Detector’ is absent or defective in autistic individuals, resulting in a lack of sensitivity to gaze and in impairments in social and cognitive abilities (Baron-Cohen, 1995). The data from the present study strongly suggest that these impairments are a
secondary consequence of a primary emotional dysfunction. Autistic individuals do not cognitively process social information because they avoid the sources of such information as part of a stress management strategy. There is considerable evidence linking dysfunction of the amygdala to the symptoms of autism. As mentioned earlier, the experiment by Kawashima et al. (1999) associates the left amygdala with the general processing of gaze direction but implicates the right amygdala when eye contact with another individual is made. The authors conclude that these results demonstrate the involvement of the human amygdala in the processing of social judgements of other people based on the direction of their gaze. The amount and type of activity in the cerebellar, mesolimbic and temporal lobe regions of the brain has also been claimed to differ significantly between autistic participants and control participants when processing facial expressions (Critchley et al., 2000). This seems compatible with the suggestion that the circuits controlling the flow of information between emotion centres and the frontal cortex may be the primary source of dysfunction in autistic people. The review of theoretical analyses of autistic deficits, the review of the neuropsychological evidence, and the new data that we report all seem consistent with the suggestion that the primary dysfunction underlying autistic disorders is associated with the emotional systems rather than the cognitive systems of the brain. This is encouraging for two reasons. Firstly, theories that postulate innate cognitive modules, such as a Theory of Mind module, have a poor track record in neuropsychology (e.g., Fodor, 1983). Such theories, along with others claiming generic cognitive dysfunction, are particularly fragile in the case of autism as the condition does not emerge until the age of three or later, and the evidence for defects present from birth is necessarily inferential.
Though patterns of selective sparing and loss associated with adult brain damage confer plausibility on the suggestion that cognitive processes have some kind of modular organisation in adults, this provides no basis at all for believing that cognitive modules are present from birth. Indeed, Lindsay and Gorayska (2002) have argued that modular structure may well result from mindware organisation: the problem spaces that underlie goal-oriented action planning effectively become independent modules if they share neither goals nor plan elements; this functional independence is described by Lindsay and Gorayska as ‘relevance discontinuity’. Secondly, if the social and cognitive deficits associated with autism are not themselves genetically determined, but result from adaptation to emotional dysfunction, it may be useful to regard them as the result of defective natural technology (Meenan and Lindsay, 2002; El Ashegh and Lindsay, this volume; Ramachandran, Undated MS).
Natural technology appears to exist in both spontaneously developed and induced forms. Speech acquisition, for example, involves mindware representations of grammatical, semantic and pragmatic relationships, but these do not have to be explicitly taught; indeed, they are acquired even in considerably impoverished learning environments (Lenneberg, 1967). Writing, on the other hand, is a natural cognitive technology the development of which must be induced by explicit teaching. Induced natural technology is always likely to be readily modifiable, as external intervention is built into the acquisition mechanism. Social interpretations of gaze and the visual signs of emotion seem likely to develop spontaneously, rather than to require explicit teaching. Competence in these areas may therefore be susceptible to relatively limited intervention. However, all forms of natural technology are cognitive adaptations and therefore, to some degree, modifiable. From the natural technology perspective, the cognitive pathology associated with autism is caused by deficient mindware, and in principle it should be possible to modify the mindware so as to eliminate the pathology. It is certainly worthwhile to continue to evaluate explicit teaching procedures, such as computer-based interventions, that are likely to facilitate the direct acquisition of appropriate natural technologies in people with autism. Perhaps, in the light of the data reported above, new interventions can be developed that focus on emotional dysfunction as the source of faulty mindware. Early intervention might be able to limit initial acquisition of dysfunctional natural technology. For example, control of aversive aspects of the autistic child’s environment, such as novelty and complexity, might assist the development or re-engineering of social information-gathering and interaction strategies.
Additionally, materials that induce as little emotion as possible should be used in explicit teaching of the cognitive strategies underlying gaze and the interpretation of emotional cues. This is almost the converse of effective teaching strategies in most contexts.
Notes
1. Wing and Potter (2002) quote a prevalence rate of up to 60 per 10,000 for autism, and it is likely that the figure is even higher for the whole autistic spectrum. The rate of autism in males is approximately three times greater than in females.
2. Rimland (1966) was one of the first investigators to suggest that autism results from physical causes such as brain damage, though his specific proposal that relevant damage took the form of retrolental fibroplasia resulting from postnatal oxygen administration was rapidly discredited.
3. Mindblindness theory (also known as ‘lack of a theory of mind’) is perhaps the most widely known cognitive theory offered to explain autism (Premack and Woodruff, 1978; Baron-Cohen 2002; 2003). The central claim underlying the idea of mindblindness is that a special-purpose module in the brain is responsible for the human ability to understand the intentions and behaviour of others. In autistic individuals this module is absent or damaged. Some difficulty or abnormality in understanding other people’s points of view is not seen as the only psychological feature of the autistic spectrum, but it is taken to be the core feature and appears to be universal among individuals with autism. The followers of this theory attribute complete mindblindness, or total lack of a theory of mind, only to extreme cases of autism. More commonly, a basic understanding of others’ mental states is supposed to be available to autistic individuals, but not at the level that one would expect from measured ability in other areas. The higher prevalence of autism amongst males is explained by suggesting that male brains have evolved under selection pressure to systematise and categorise the environment, whilst female brains have evolved under selection pressure to support empathy functions (Baron-Cohen 2002; 2003). On this view, autism is the result of an “extreme male brain”.
4. Executive Dysfunction Theory (Rumsey and Hamburger, 1988; Hughes, Russell and Robbins, 1994) claims that the autistic deficit is associated with the Central Executive, a hypothetical cognitive control system widely believed to underlie problem solving, planning processes, the generation of novel responses, and the suppression of irrelevant or intrusive behaviours (Baddeley, 1986; 1990). Central Executive activity is believed to be associated with the prefrontal cortex, and a key motivation for Executive Dysfunction Theory is the observed similarity between autistic individuals and patients with frontal lobe injury. On this theory, difficulties in initiating and sustaining social interaction result from more fundamental cognitive failure in activities such as attention shifting, planning, flexible thinking, and disengaging from current stimulus control. There is also some evidence that autistic children do poorly on tests such as the Wisconsin Card Sorting Test and the Tower of London Task that are claimed to be sensitive to prefrontal injury (Prior and Hoffmann, 1990). The common claim that autistic children are incapable of, or incompetent at, deception may be explained as part of the executive dysfunction because the Central Executive would be expected to play a major part in constructing and maintaining a cognitive model of the world as it is falsely represented to be. Inability to comprehend that other people have incorrect knowledge, or knowledge different from one’s own personal beliefs, is similarly claimed to be due to difficulty in inhibiting ‘reality-based’ responses.
5. Weak Central Coherence theory (Frith and Happé, 1994; Happé, 1999) is a cognitive theory of autism based on the suggestion that there is a deficit at the level of basic perceptual processes, resulting in a failure to construct or effectively use gestalt information. Autistic individuals are supposed to be unable to see the wood for the trees. The term ‘Weak Central Coherence’ is used to explain the finding that autistic children often show an obsessive preoccupation with minute sensory detail; with parts of a stimulus rather than the whole.
This theory is sometimes linked to the view that the left cerebral hemisphere analyses detail, whilst the right hemisphere constructs gestalts, suggesting that the failure of central coherence is a consequence of hemispheric imbalance or inter-hemispheric communication and control. Autistic children often show islets of normal or even superior ability in specific
areas such as mathematics and drawing, despite a generally retarded cognitive profile and low IQ. Weak Central Coherence explains these clinical findings in terms of the completely different way in which the autistic perceptual system processes incoming stimuli.
6. The most recent theory of autistic dysfunction, the Imitation Deficit/Mirror Neuron Theory (IM/MNT), does not emphasize cognitive mechanisms but links behaviour directly to neurological structures. The basic idea was that failure to develop a theory of mind could be accounted for by the lack of the ability to imitate others (Rogers and Pennington, 1991). Other aspects of autistic spectrum disorders, such as general social deficits, echolalia in speech, and stereotyped and repetitive behaviour, can also be seen as the result of imitation when it is inappropriate, or imitation failure when it is helpful. This psychological theory has attracted much more interest, however, since the identification of mirror neurons in area F5 of the prefrontal cortex of monkeys (Gallese et al. 1996; Rizzolatti et al. 1996). Mirror neurons (tellingly labelled ‘Monkey see, monkey do’ cells (Carey, 1996)) are action-coding neurons that fire when a monkey performs a specific action, and, more remarkably, when the monkey sees the same action performed by another monkey or a human. Other related classes of neurons seem to code goal-directed actions and respond selectively to goal-driven body movements, such as reaching for or manipulating an object (Williams et al., in press). It has been speculated that the part of the monkey cortex containing the mirror neurons that deal with hand actions now subserves speech in humans, acting as a bridge between one’s own speech and actions and the utterances and behaviour of others (Rizzolatti and Arbib, 1998). The connection between mirror neuron deficits and imitation failure as a theoretical basis for autism has been made by a number of researchers including Ramachandran (undated) and Williams et al. (in press). Whereas the more cognitive theories of autism have found themselves casting about retrospectively for some physical basis in brain processes, IM/MNT has the considerable advantage of a built-in neuropsychological mechanism.
7. The amygdala is made up of the medial nucleus, the lateral/basolateral nuclei, the central nucleus, and the basal nucleus. The amygdala is located in the temporal lobe and appears to act as some kind of way-station between the prefrontal cortex and the limbic system. There is little doubt that the central nucleus of the amygdala plays a crucial role in the process of determining which complex environmental stimuli elicit emotional responses such as fear. Remove the amygdala from a monkey, for example, and it loses its fear of snakes (Kolb and Whishaw, 2003; Damasio, 1994; 2000; Le Doux, 2000). From the amygdala, fibres project to and from the prefrontal cortex and the areas of the brain responsible for the expression of various emotional responses. Damage to the central nucleus (or the lateral/basolateral nuclei, which provide the central nucleus with sensory information) reduces or abolishes a wide range of physiological reactions and emotional behaviours (Carlson, 1998). Morphological anomalies have been reported in the amygdalas of autistic individuals (Bauman and Kemper, 1985) and structural Magnetic Resonance Imaging (sMRI) techniques have also found reduced amygdala volumes in high-functioning autistic individuals compared with normal participants (Abell et al. 1999, cited in Adolphs et al. 2001). Surgical lesions in the amygdala of monkeys have been claimed to produce behaviour similar to that of autistic children (Bachevalier, 1991), suggesting that abnormal functioning of this area may account for some of the symptoms commonly found in autistic individuals.
Support for the amygdala hypothesis has been sought by studying participants with amygdala lesions and by functional imaging of the amygdala in normal individuals (Adolphs, Sears and Piven 2001; Baron-Cohen, Ring, Bullmore, Wheelwright, Ashwin and Williams, 2000; Adolphs, Tranel, Damasio and Damasio, 1994; Morris et al. 1996). Studies such as those cited have generally been taken to support the idea that the amygdala has an important function in the recognition of emotion from facial expressions and may be involved in complex social judgements such as judging the trustworthiness of a person. However, a more recent study by Amaral and Corbett (in press) has found that surgical lesions in monkeys more precisely confined to the amygdala than those described by Bachevalier (1991), and specific damage to the amygdala in humans (cases SM and HM), do not produce the abnormal social behaviour found in autistic individuals.
8. Weak Central Coherence Theory was briefly associated with a bold claim by Marshall and Fink that the right hemisphere of the brain operates in global processing mode, while the left hemisphere is specialised for local processing. This claim was apparently supported by functional imaging data generated during an object recognition task. Unfortunately, a later attempt to reproduce these findings using letter recognition produced exactly the reverse pattern of results to those predicted by the theory (Fink, Marshall, Halligan, Frith, Frackowiak and Dolan, 1997). Subsequent work within this framework has generally taken the view that weak central coherence results from the nature of the computational processes underlying cognition, rather than being associable with particular cortical structures (de Carvalho, Ferreira and Fiszman, 1999; Thagard and Verbeurgt, 1998; O’Loughlin and Thagard, 2003). This immediately presents the difficulty of explaining why deficits are restricted to particular domains such as social cognition. One option is to claim that social cognition is particularly dependent upon global processing, but there is little theoretical justification for this view. At best, it might be expected that social information processing would be only selectively impaired, and that dysfunction in non-social cognition would also be present in tasks requiring global processing. However, this would predict a more distributed pattern of deficits than those so far reported in autistic individuals. Another option would be to argue that autism is associated with a dual deficit: a mindblindness problem and a central coherence problem. This is somewhat unparsimonious and requires strong evidence that neural computation operates differently in autistic individuals and in normals. Such evidence is not yet available.
9. Emotional disorder and vulnerability to sensory over-stimulation receive little emphasis in the recent academic literature on autism, which has focused almost exclusively on sociocognitive issues such as intentions and beliefs. This is in marked contrast to the emphasis found in reports by autism sufferers themselves or produced by practitioners and carer support networks. A web search carried out on 16th June 2003, using the search terms ‘autism + over-stimulation’ on the Google search engine, generated 1003 items, few or none of which were from academic sources, and most of which were from individuals or agencies concerned with the home or clinical management of autism.
Appendix 1
References
Abell, F., M. Krams, J. Ashburner, K. Friston, R. Frackowiak, F. Happé, C. Frith & U. Frith (1999). The Neuroanatomy Of Autism: A Voxel-Based Whole Brain Analysis of Structural Scans. NeuroReport 10, 1647–1651.
Abelson, R. (1963). Computer simulation of ‘hot’ cognition. In S. Tomkins & S. S. Messick (Eds.), Computer Simulation of Personality, pp. 277–98. New York: Wiley.
Adolphs, R., D. Tranel, H. Damasio & A. Damasio (1994). Impaired Recognition Of Emotion In Facial Expressions Following Bilateral Damage To The Human Amygdala. The Journal Of Neuroscience 15, 5879–5892.
Adolphs, R., L. Sears & J. Piven (2001). Abnormal Processing Of Social Information From Faces In Autism. Journal of Cognitive Neuroscience 13(2), 232–240.
Amaral, D. G. & B. A. Corbett (in press). The Amygdala, Autism and Anxiety. In M. Rutter (Chair) Autism: neural basis and treatment possibilities, Novartis Foundation Symposium 251. New York: Wiley. Web version available at: http://psych.colorado.edu/~munakata/csh/Novartis_paper_6-12-02.doc (accessed 14th June 2003).
Alcade, C., J. I. Navarro, E. Marchena & G. Ruiz (1998). Acquisition of basic concepts by children with intellectual disabilities using a computer-assisted learning approach. Psychological Reports 82(3 Pt 1), 1051–6.
American Psychiatric Association (1994). Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition, DSM-IV). Washington, D.C.: American Psychiatric Association.
Argyle, M. (1994). Bodily Communication (second edition). London: Routledge.
Argyle, M. & M. Cook (1976). Gaze and Mutual Gaze. London: Cambridge University Press.
Bachevalier, J. (1991). An Animal Model For Childhood Autism. In C. A. Taminga & S. C. Schulz (Eds.), Advances in Neuropsychiatry and Psychopharmacology, Volume 1: Schizophrenia Research. New York: Raven Press.
Baddeley, A. D. (1990). Human memory: Theory and practice. Oxford: Oxford University Press.
Baddeley, A. D. (1986). Working memory. Oxford: Clarendon Press.
Badner, J. & E. Gershon (2002). Regional meta-analysis of published data supports linkage of autism with markers on chromosome 7. Molecular Psychiatry 7, 56–66.
Bailey, A., A. Le Couteur, I. Gottesman, P. Bolton, E. Simonoff, E. Yuzda & M. Rutter (1995). Autism as a strongly genetic disorder: evidence from a British twin study. Psychological Medicine 25(1), 63–77.
Bailey, A., S. Palferman, L. Heavey & A. Le Couteur (1998). Autism: The phenotype in relatives. Journal of Autism and Developmental Disorders 2, 369–392.
Baron-Cohen, S. (1994). How To Build A Baby That Can Read Minds. Cahiers de Psychologie Cognitive 13, 513–552.
Baron-Cohen, S. (1995a). Mindblindness. Cambridge, MA: MIT Press.
Baron-Cohen, S. (2002). The extreme male brain theory of autism. Trends in Cognitive Sciences 6(6), 248–254.
Baron-Cohen, S. (2003). The Essential Difference: Men, Women and the Extreme Male Brain. London: Allen Lane, The Penguin Press.
Baron-Cohen, S., H. Tager-Flusberg & D. J. Cohen (Eds.) (2000). Understanding Other Minds: Perspectives From Developmental Cognitive Neuroscience. Oxford: Oxford University Press.
Baron-Cohen, S., H. A. Ring, E. T. Bullmore, S. Wheelwright, C. Ashwin & S. C. R. Williams (2000). The Amygdala Theory Of Autism. Neuroscience & Biobehavioural Reviews 24, 355–364.
Bauman, M. & T. L. Kemper (1985). Histoanatomic Observations of the Brain in Early Infantile Autism. Neurology 35, 866–874.
Bemporad, J. R., J. J. Ratey & G. O’Driscoll (1987). Autism and Emotion: An ethological theory. American Journal of Orthopsychiatry 57(4), 477–485.
Bettelheim, B. (1967). The Empty Fortress: Infantile autism and the birth of the self. New York: The Free Press.
Bishop, D. V. M. (1989). Autism, Asperger’s syndrome and semantic-pragmatic disorder: Where are the boundaries? British Journal of Disorders of Communication 24, 107–121.
Buck, R. (1984). The Communication of Emotion. New York: Guilford Press.
Carey, D. P. (1996). ‘Monkey see, monkey do’ cells. Current Biology 6, 1087–88.
Carlson, N. R. (1998). Physiology of Behaviour (6th Edition). Boston, MA: Allyn & Bacon.
Celani, G., M. W. Battacchi & L. Arcidiacono (1999). The Understanding of the Emotional Meaning of Facial Expressions in People with Autism. Journal of Autism and Developmental Disorders 29(1), 57–65.
Chen, S. H. & V. Bernard-Opitz (1993). Comparison of personal and computer-assisted instruction for children with autism. Mental Retardation 31(6), 368–76.
Clark, A. (2000). Mindware: an introduction to the philosophy of cognitive science. Oxford: Oxford University Press.
Critchley, H. D., E. M. Daly, E. T. Bullmore, S. C. Williams, T. van Amelsvoort, D. M. Robertson, A. Rowe, M. Philips, G. McAlonan, P. Howlin & D. G. Murphy (2000). The functional neuroanatomy of social behaviour: Changes in cerebral blood flow when people with autistic disorder process facial expressions. Brain 123(Pt 11), 2203–12.
Damasio, A. R. (2000). A second chance for emotion. In R. D. Lane & L. Nadel (Eds.), Cognitive neuroscience of emotion, pp. 12–23. New York: Oxford University Press.
Damasio, A. R. (1994). Descartes’ Error: Emotion, Reason, and the Human Brain. New York: Putnam.
Darwin, C. (1872). The Expression of the Emotions in Man and Animals. London: Murray. (Cited in M. Dennis, L. Lockyer & A. L. Lazenby (2000). How High-Functioning Children With Autism Understand Real And Deceptive Emotion. Autism 4(4), 370–381.)
Dautenhahn, K., I. Werry, T. Salter & R. te Boekhorst (2003). Towards Adaptive Autonomous robots in Autism Therapy. IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA’03), Kobe, Japan, July 2003.
de Carvalho, L. A. V., N. de C. Ferreira & A. Fiszman (1999). A neurocomputational model for autism. Proceedings of the IV Brazilian Conference on Neural Networks, IV Congresso Brasiliero de Redes Neurais, July 20–22, 1999, ITA, São José dos Campos, SP, Brazil, pp. 344–349.
Decety, J., T. Chaminade, J. Grèzes & A. N. Meltzoff (2002). A PET exploration of the neural mechanism involved in reciprocal imitation. NeuroImage 15, 265–272.
Dennis, M., L. Lockyer & A. L. Lazenby (2000). How High-Functioning Children With Autism Understand Real And Deceptive Emotion. Autism 4(4), 370–381.
Dickerson, P., J. Rae, P. Stribling, K. Dautenhahn, B. Ogden & I. Werry (in press). Autistic children’s co-ordination of gaze and talk: re-examining the ‘asocial autist’. In P. Seedhouse & K. Richards (Eds.), Applying Conversation Analysis. London: Palgrave Macmillan.
Duncan, J. (1986). Disorganization of behaviour after frontal lobe damage. Cognitive Neuropsychology 3, 271–290.
Ekman, P. & W. V. Friesen (1978). Pictures of Facial Affect. Palo Alto, CA: California Consulting Psychological Press.
El Ashegh, H. A. & R. Lindsay (this volume). Cognition and Body Image, pp. 175–223.
Fink, G., J. C. Marshall, P. W. Halligan, C. D. Frith, R. S. Frackowiak & R. J. Dolan (1997). Hemispheric specialization for global and local processing: the effect of stimulus category. Proceedings of the Royal Society of London, B Biological Sciences 264(1381), 487–94.
Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Frith, U. (1985). Recent Experiments On Autistic Children’s Cognitive And Social Skills. Communication 19, 16–23.
Frith, U. (2001). Mindblindness and the Brain in Autism. Neuron 32, 969–979.
Frith, U. & F. Happé (1994). Autism: beyond “theory of mind”. Cognition 50, 115–132.
Frye, D., P. D. Zelazo & T. Palfai (1995). Theory of mind and rule-based reasoning. Cognitive Development 10, 483–527.
Frye, D., P. D. Zelazo, P. J. Brooks & M. C. Samuels (1996). Inference and action in early causal reasoning. Developmental Psychology 32, 120–31.
Fuster, J. M. (1989). The prefrontal cortex, anatomy, physiology and neuropsychology of the frontal lobe. 2nd edition. New York: Raven Press.
Gallese, V., L. Fadiga, L. Fogassi & G. Rizzolatti (1996). Action recognition in the premotor cortex. Brain 119, 593–609.
Gallese, V. & A. Goldman (1998). Mirror neurons and the simulation theory of mindreading. Trends in Cognitive Sciences 2(12), 493–502.
Gillott, A., F. Furniss & A. Walter (2001). Anxiety in high-functioning children with autism. Autism 5, 277–286.
Gillingham, G. (2000). Autism: a new understanding! Solving the mystery of autism, Aspergers and PDD-NOS. Edmonton, Alberta: Tacit Publishing.
Happé, F. (1996). Studying weak central coherence at low levels: children with autism do not succumb to visual illusions. A research note. Journal of Child Psychology and Psychiatry 37, 873–877.
Happé, F. (1999). Autism: Cognitive Deficit or Cognitive Style? Trends in Cognitive Sciences 3(6), 216–222.
Hobson, R. P. (1986a). The Autistic Child’s Appraisal of Expressions of Emotion. Journal of Child Psychology and Psychiatry 27, 671–680.
Hobson, R. P. (1986b). The Autistic Child’s Appraisal of Expressions of Emotion: A Further Study. Journal of Autism and Developmental Disorders 17, 63–79.
Hobson, R. P. & A. Lee (1989). Emotion-Related and Abstract Concepts in Autistic People: Evidence From the British Picture Scale. Journal of Autism and Developmental Disorders 19(4), 601–623.
Hobson, R. P., J. Ouston & A. Lee (1988). What’s in a Face? The Case of Autism. British Journal of Psychology 79, 441–453.
Hood, B. H., J. D. Willen & J. Driver (1998). Adult’s Eyes Trigger Shifts of Visual Attention in Human Infants. Psychological Science 9(2), 131–134.
Hughes, C., J. Russell & T. W. Robbins (1994). Evidence for executive dysfunction in autism. Neuropsychologia 32, 477–92.
Huttinger, P. (1996). Computer applications in programs for young children with disabilities: Recurring themes. Focus on Autism and Other Developmental Disabilities 11, 105–124.
Kanner, L. (1943). Autistic Disturbances of Affective Contact. Nervous Child 2, 217–250. Reprinted in L. Kanner (Ed.), Childhood Psychosis: Initial Studies and New Insights, Washington, D.C.: V. H. Winston, 1973. Also reprinted in A. M. Donnellan (Ed.), Classic Readings in Autism. New York: Teacher’s College Press, 1985.
Kawashima, R., M. Sugiura, T. Kato, A. Nakamura, K. Hatano, K. Ito, H. Fukuda, S. Kojima & K. Nakamura (1999). The human Amygdala plays an important role in Gaze Monitoring: A PET Study. Brain 122, 217–250.
Keeley, B. L. (2002). Eye-gaze information processing theory: a case-study in primate cognitive neuroethology. In M. Bekoff, C. Allen & G. Burkhardt (Eds.), The Cognitive Animal: Empirical and Theoretical Perspectives on Animal Cognition. Cambridge, MA: MIT Press. Web document version found at: http://bernard.pitzer.edu/~bkeeley/WORK/PUBS/coganimal.pdf (accessed 15th June 2003).
Kendon, A. (1967). Some Functions of Gaze Direction in Social Interaction. Acta Psychologica 32, 1–25.
Kolb, B. & I. Q. Whishaw (2003). Fundamentals of Human Neuropsychology (Fifth Edition). New York: Worth Publishers.
Langdell, T. (1978). Recognition of Faces: An Approach to the Study of Autism. Journal of Child Psychology and Psychiatry 19, 225–268.
Le Doux, J. E. (2000). Cognitive-emotional interactions. In R. D. Lane & L. Nadel (Eds.), Cognitive neuroscience of emotion, pp. 129–55. New York: Oxford University Press.
Lenneberg, E. H. (1967). Biological Foundations of Language. New York: Wiley & Sons.
Lindsay, R. & B. Gorayska (2002). Relevance, goal management and cognitive technology. International Journal of Cognition and Technology 1(2), 187–232. Reprinted in this volume, pp. 63–107.
Loveland, K. A., B. Tunali-Kotoski, D. A. Pearson, K. A. Brelsfeld, J. Ortegon & R. Chen (1989). Imitation and Expression of Facial Affect in Autism. Development and Psychopathology 6, 433–44.
Luria, A. R. (1966). Higher cortical functions in man. New York: Basic Books.
Meenan, S. & R. Lindsay (2002). Planning and the neurotechnology of social Behaviour. International Journal of Cognition and Technology 1(2), 233–74.
Moore, M. & S. Calvert (2000). Brief report: vocabulary acquisition for children with autism: teacher or computer instruction. Journal of Autism and Developmental Disorders 30(4), 359–62.
Morris, J. S., C. D. Frith, D. I. Perrett, D. Rowland, A. W. Young & A. J. Calder (1996). A Differential Neural Response in the Human Amygdala to Fearful and Happy Facial Expressions. Nature 383, 812–815.
Muris, P., P. Steerneman, H. Merckelbach, I. Holdrinet & C. Meesters (1998). Comorbid anxiety symptoms in children with pervasive developmental disorders. Journal of Anxiety Disorders 12, 387–393.
O’Connor, N. & B. Hermelin (1967). The selective visual attention of autistic children. Journal of Child Psychology and Psychiatry 8, 167–79.
O’Loughlin, C. & P. Thagard (2003). Autism and Coherence: a computational model. http://cogsci.uwaterloo.ca/Articles/Pages/autism.pdf (accessed 14th June 2003).
Ozonoff, S., B. F. Pennington & S. J. Rogers (1990). Are There Emotion Perception Deficits in Young Autistic Children? Journal of Child Psychology and Psychiatry 31(3), 343–361.
Premack, D. & G. Woodruff (1978). Does the chimpanzee have a theory of mind? Behavioural and Brain Sciences 1, 515–526.
Prior, M. R. & W. Hoffmann (1990). Neuropsychological testing of autistic children through an exploration with frontal lobe tests. Journal of Autism and Developmental Disorders 20, 581–590.
Ramachandran, V. S. (undated). Mirror neurons and imitation learning as the driving force behind “the great leap forward” in human evolution. Web document found at: http://www.edge.org/3rd_culture/ramachandran/ramachandran_p1.html (accessed 10th June, 2003).
Raven, J. C. (1956). Standard Progressive Matrices. London: H. K. Lewis.
Richer, J. M. & R. G. Coss (1976). Gaze Aversion in Autistic And Normal Children. Acta Psychiatrica Scandinavica 53, 193–210.
Rimland, R. (1964). Infantile Autism: The Syndrome and Its Implications for a Neural Theory of Behavior. New York: Appleton-Century-Crofts.
Rizzolatti, G. & M. A. Arbib (1998). Language within our grasp. Trends in Neurosciences 21, 188–94.
Rizzolatti, G., L. Fadiga, L. Matelli, M. Bertinardi, E. Paulesu, D. Perani & F. Fazio (1996). Localisation of grasp representations in humans by PET: 1. Observation vs. execution. Experimental Brain Research 111, 246–52.
Robbins, T. W. (1996). Dissociating executive functions of the prefrontal cortex. [Review]. Philosophical Transactions of the Royal Society of London, Biological Sciences 351, 1463–71.
Rogers, S. J. & B. F. Pennington (1991). A theoretical approach to the deficits in Infantile Autism. Development and Psychopathology 3, 137–62.
Rumsey, J. M. & S. D. Hamburger (1988). Neuropsychological findings in high-functioning autistic men with infantile autism, residual state. Journal of Clinical and Experimental Neuropsychology 10, 201–221.
Rutter, M. & E. Schopler (1987). Autism and pervasive developmental disorders: concepts and diagnostic issues. Journal of Autism and Developmental Disorders 17, 159–186.
Shallice, T. (1998). From Neuropsychology to Mental Structure. Cambridge: CUP.
Shallice, T. & P. Burgess (1996). The domain of supervisory processes and temporal organization of behaviour. Philosophical Transactions of the Royal Society of London, Biological Sciences 351, 1405–12.
Schopler, E. (1985). Editorial: Convergence of learning disability, higher-level autism, and Asperger’s syndrome. Journal of Autism and Developmental Disorders 15, 359.
Tantam, D., L. Monahan, H. Nicolson & J. Stirling (1989). Autistic Children’s Ability to Interpret Faces: A Research Note. Neuropsychologia 5, 757–768.
Thagard, P. & K. Verbeurgt (1998). Coherence as constraint satisfaction. Cognitive Science 22, 1–24.
Vertegaal, R., R. Slagter, G. C. Van der Veer & A. Nijholt (2000). Why Conversational Agents Should Catch the Eye. In Extended Abstracts of CHI 2000. The Hague, The Netherlands: ACM 2000, 257–258.
Volkmar, F. R. & L. C. Mayes (1990). Gaze behaviour in autism. Development and Psychopathology 2, 61–69.
Weekes, S. J. & R. P. Hobson (1987). The Salience of Facial Expression for Autistic Children. Journal of Child Psychology and Psychiatry 28, 137–152.
Williams, J. H. G., A. Whiten, T. Suddendorf & D. I. Perrett (in press). Imitation, mirror neurons and autism. Neuroscience and Biobehavioral Reviews.
Wing, L. (1988). The continuum of autistic characteristics. In E. Schopler & G. B. Mesibov (Eds.), Diagnosis and Assessment in Autism, pp. 91–110. New York: Plenum.
Wing, L. & J. Gould (1979). Severe impairments of social interaction and associated abnormalities in children: Epidemiology and classification. Journal of Autism and Developmental Disorders 9, 11–29.
Wing, L. & D. Potter (2002). Mental Retardation and Developmental Disabilities Research Review 8(3), 51–61.
Communicating sequential activities
An investigation into the modelling of collaborative action for system design

Marina Jirotka and Paul Luff
University of Oxford / King’s College London
1. Introduction
Recently a number of critiques of Human-Computer Interaction (HCI) have emerged that are directed not only at its primary focus, the individual user of a computer system, but also at its conceptual underpinnings, which are principally drawn from cognitive science. Thus, researchers have begun to consider broader topics of interest, such as how technology can and does support collaborative and communicative activities, and have suggested developments of the conceptual framework, such as social cognition, distributed cognition and cognitive technology. To differing degrees these developments, like HCI itself, have both theoretical and practical concerns, on the one hand trying to develop systematic analyses of technologies in use whilst on the other endeavouring to inform the practices, procedures and methods for system design. This interweaving of description, analysis and prescription has been a longstanding concern, whether in developing particular tools, techniques and guidelines for interface design, or in informing more general approaches to system development. It seems a perennial problem to develop prescriptions that are compatible with the underlying conceptions and concerns of the analytic orientation deployed. In this chapter, we will report on one such investigation. We will draw from a study that utilises one of the orientations employed by those concerned with the analysis of technologies in use: the ethnomethodological orientation (Heath and Luff, 2000). Researchers seeking to draw from this orientation share many concerns with researchers in Cognitive Technology (Gorayska and Mey, 2002). For example, studies are principally concerned with naturalistic settings,
focussing on work activities in context, hence the number of ‘workplace studies’ informed by this orientation (Lueg, 2002, reprinted in this volume, pp. 225–239; Kaushik et al., 2002). Studies are also concerned with analysing social interaction, communication and use of talk in context (Dautenhahn, 2002, reprinted in this volume, pp. 128–152; Dascal, 2002, reprinted in this volume, pp. 37–62). Ethnomethodological studies tend to explore the communicative and collaborative uses of technologies, not only within particular naturalistic settings but with respect to specific innovative collaborative technologies, hence their popularity within Computer-Supported Cooperative Work (CSCW). Researchers drawing from this orientation have undertaken ethnographic studies, drawing upon materials gathered in the field, such as field notes, audio and video recordings, to try to understand the complexities of the everyday actions through which technologies are made sense of, and how collaborative actions are coordinated through such systems (e.g., Heath and Luff, 2000). Drawing upon these understandings, there inevitably has also been a concern to develop consequences for the design and deployment of technologies, whether this is through simple implications for particular technologies or as general guidelines for system development (cf. Plowman et al., 1995). However, for those concerned to maintain the critical features of the analytic orientation, it is difficult to suggest methods that do not divert attention from the features that make that orientation distinctive. What seems to be characteristic of workplace studies are the details of everyday work and interaction that are revealed through them, particularly with respect to the ways in which these actions are situated, contingently produced, and shaped from moment to moment by the participants.
However, it can be hard to direct attention to such details when they are cast as frameworks, guidelines and other techniques for the designer. Such methods can appear much like those suggested elsewhere for system design, methods that themselves have not met with undue success when deployed (cf. Bellotti, 1988). Researchers have thus sought to explore in a more systematic fashion the relationship between studies of technologies in use and the derived implications for design. By considering the relationship in terms of a communication between ethnographers and designers, attention can be paid to the different interests of those concerned. So, proposals have been made for presenting data and materials gathered in fieldwork (Pycock et al., 1998), for representing and summarising the results of studies (Hughes et al., 2000) and for outlining patterns found in previous studies (Martin et al., 2001). In these endeavours there is a concern for the representation of data, but also for remaining sensitive to the
underlying analysis. One recurring resource for facilitating the communication of an analysis is modelling. Designers may build models in order to increase understanding of a domain or a system; these can then be read by others and later discarded. However, models can be more than simply a communication device; operations may be performed on models that provide resources for reasoning about the data being modelled. This ability to reason about a model suggests one way in which the work reported here is related to recent initiatives in Cognitive Technology. The models should form a resource for designers, supporting their activities. The models are artefacts and tools, in much the same way that Dascal (2002, reprinted in this volume) considers language more generally as a cognitive technology. More importantly perhaps, the work has concerns in common with those of researchers in Cognitive Technology regarding the uses of everyday tools and objects by participants in their settings. These may be conventional artefacts such as paper documents, or more complex devices, like computer systems. In ethnographic studies in CSCW and Cognitive Technology, although there may be different ways of conceiving of the ‘users’ activities, there is a common interest in investigating the skills and practices through which these objects are made sense of and produce recognisable actions in the domain (Gorayska and Mey 2002, reprinted in this volume). There is a third way in which this research has shared concerns with those in Cognitive Technology. Alongside studies of communicative and collaborative activities in a number of settings, researchers in CSCW have been involved in (or have studied) a range of ‘technical interventions’, where either systems have been deployed, prototype technologies have been experimented with or models, patterns or schema of system use have been formulated.
These interventions have the eventual aim of either improving the kinds of technologies that could be used in everyday settings, or the methods and approaches through which those technologies are designed. However, they frequently serve another purpose. By undertaking them, researchers can develop their own understandings of human conduct (cf. Pfeiffer, 2002, reprinted in this volume). In the case at hand, in common with the concerns of Cognitive Technology, we are interested in exploring how through attempts at modelling communication and collaboration we might enhance our understanding of social conduct. We begin by drawing upon our previous research (Jirotka, 2000; Jirotka and Luff, 2001; Luff and Jirotka, 1998) and model aspects of social interaction where the structure of the modelling notation resonates with the purpose to which it is being put; namely, developing models of complex collaborative activities. In
order to do this we conduct an investigative exercise drawing upon a particular notation: Communicating Sequential Processes (Hoare, 1985) designed initially for describing concurrent systems and reasoning about parallel and concurrent activities. CSP, as a process language, provides powerful resources to model many of the features identified in previous studies of collaborative work (Luff and Jirotka, 1998; Jirotka, 2000). We discuss the appropriateness of this notation for presenting an analysis of situated, sequential and collaborative activities by drawing on illustrative examples from a workplace study undertaken in a financial dealing room. We then go on to investigate how such an approach could provide tools for reasoning about the consequences of deploying technologies in the workplace. We discuss a potential technological intervention in the trading floor that has consequences for the ways in which information is communicated and distributed around the dealing room. We draw upon the models of trading activities built up previously, to speculate how they could be used to reason about the consequences of the deployment of this technology, and the possible different impacts on existing practices. We draw from this investigation to discuss the general implications for developing prescriptions for design from analysis of naturalistic materials.
2. Background: Developing ethnomethodological analyses and sequential analyses for system design
Workplace Studies, drawing upon a wide range of orientations, are concerned with detailing the practices through which everyday work and interaction are accomplished (e.g., Engeström and Middleton, 1996; Heath et al., 2000; Luff et al., 2000; Plowman et al., 1995). Due in part to their concerns with the situated use of technologies, both mundane and complex, and in part to influential critiques of HCI (e.g., Suchman, 1987), those workplace studies drawing on the ethnomethodological orientation have seemed particularly prominent in fields such as Computer-Supported Cooperative Work (CSCW), fields where there is a parallel interest in designing and developing new technologies (Harper, 2000; Button and Sharrock, 2000; Suchman, 2000). One recurring concern in these studies has been the collaborative resources that are utilised both to understand and to produce intelligible actions and activities through everyday artefacts in the workplace. Although the artefacts that feature in these studies are varied, including whiteboards (Goodwin and Goodwin, 1996), electromechanical displays (Heath and Luff, 1992) and CCTV screens
(Luff et al., 2000), given the nature of many of the workplaces under investigation there inevitably has been a focus on the use of documents, whether these are held and presented on paper or electronically. Examples are the uses of paper flight strips in air traffic control (Harper and Hughes, 1993), of worksheets in printing presses (Bowers and Button, 1995), of paper tickets in financial trading rooms (Heath et al., 1994), of computer systems for telephone call takers (Whalen and Vinkhuyzen, 2000) and of scheduling systems to manage passenger transport (Luff and Heath, 2000). Many of the concerns surrounding these practices have much in common with studies within Cognitive Technology, particularly those that focus on the communicative and interactional practices surrounding the uses of technology. For example, Goodwin and Goodwin (1996) explore the ways that whiteboards are used in control rooms as a shared communicative resource to make sense of both colleagues’ utterances and their other conduct. In a quite different domain, Greatbatch et al. (1993) reveal how the interaction between doctor and patient is co-ordinated with activities on a computer system. Whalen and Vinkhuyzen (2000) outline, in a number of call centre settings, the ways the information displayed on the computer screen is used as a resource within talk with the remote party on the phone, how text typed into the system is co-ordinated with the telephone contributions of the co-interactant and how procedures specified by the system are transformed for the practical circumstances that the participants face.
What is distinctive about these studies is that they build upon a social scientific analytic framework that emerged within the ethnomethodological orientation, namely conversation and interaction analysis, and explore the details of how talk, bodily conduct and the use of material resources accomplish social activities (Sacks, 1992; Atkinson and Heritage, 1984). In particular, they develop a sequential analysis of the coordination of talk that emerged from the initial studies by Sacks et al. (1974). Such analyses draw on participants’ own understandings as they emerge turn by turn and provide for a powerful and detailed analysis of the moment-to-moment production and coordination of collaborative activities. Conversation and interaction analyses draw extensively on the sequential analyses of talk and conduct. Through this sequential analysis, researchers show how one turn of talk (which could be several utterances, a single utterance or merely a word or vocalisation) displays a participant’s understanding of a co-interactant’s prior utterance whilst also providing for its intelligibility to be displayed in the next. Turns therefore provide a public resource for displaying and producing understandings, available to both participants and analysts. Indeed, the organisation of turns in a sequence
requires participants (and therefore other analysts) to monitor their production at all points throughout their course (Sacks et al., 1974).1 As might be gathered, conversation and interaction analyses draw on naturalistic materials: audio recordings of talk, for example, or, in the case of workplace studies, more typically video recordings of everyday conduct. These workplace studies delineate a conception of conduct in everyday settings as collaborative, of collaboration as being principally interactional, and of interactions as being produced and recognised through a sequential organisation. As with other workplace studies drawing on the ethnomethodological orientation, there has been an interest in drawing upon these detailed analyses not only to develop understandings of technologies in use but also to consider the implications of the analyses for proposed technologies and systems. A critical issue for these efforts lies in considering ways of transforming the ethnographic material that preserve the richness of the ethnographic account whilst being sensitive to the practices of engineers and designers of proposed technologies. After all, if such analyses are to be useful within the design process, they too must be presented in ways that resonate with the practices of designers and engineers. In this regard, researchers have considered the questions that engineers and designers are likely to ask of ethnographers (Randall et al., 1994), ‘quick and dirty’ methods for design (Hughes et al., 1994) and notational and representational approaches for presenting analyses (Hughes et al., 2000). One particular software engineering activity has been the focus of much interest: that of modelling. Throughout the development process, software engineers are encouraged to develop models of particular processes, mechanisms and solutions.
These models are not only perceived as aids for communication between designers and others involved in the process, but also for clarification, as particular issues can be focused on, abstracted and unpacked. Though system developers may not explicitly use a formal or structural approach to system design, they frequently find the need to focus on aspects of a problem, develop solutions that are generic to more than one case and utilise notations to communicate issues and potential solutions to others (Booch, 1991). Given the complex details of social conduct revealed in workplace studies, it is therefore not surprising that some researchers have proposed modelling aspects of the social and organisational context in which technologies are to be deployed. In this way it is hoped that it may be possible to communicate the details of ethnographic studies in forms and representations that are familiar to system designers. However, whilst models can be developed as such communication devices, they
also have the potential for reasoning about the data being modelled. Previous investigations into modelling the details of work practices have tended to focus on issues of representation as communication (Viller and Sommerville, 1999). This may be due to the complexity of the models developed, but it would seem to neglect one of the critical uses of the models that have been delineated. Nonetheless, the task of developing descriptive models of activities revealed by naturalistic analyses is fraught with difficulties. Recent attempts have sought to provide frameworks for describing, representing and communicating the social and organisational aspects of work, the contingent and the tacit, the collaborative and the situated. By their very nature these are hard to delineate, formalise and represent. The kinds of activities seeking to be conveyed are produced through informal, interpretative and interactional practices. The challenge to modelling aspects of social conduct lies in the ability to represent the very details of collaboration, coordination and interaction that these workplace studies reveal. This may be particularly apparent if the principal purpose of the model is for communication with designers. It may be that modelling is not, in itself, incompatible with social scientific studies; even though previous attempts in HCI have been criticised for glossing the very phenomena revealed through the analysis (Button, 1990). However, if the analyses are to have a more significant practical purpose, it may be that the models have to be more ‘useful’ to designers. Instead of attempting to simplify the ethnographic information into models as communicative devices for system designers, perhaps we might view the development of models as resources for reasoning about technological intervention in social and collaborative settings. This chapter investigates the use of a modelling notation that provides such capabilities. 
In its structure Communicating Sequential Processes (CSP) (Hoare, 1985) incorporates powerful techniques for expressing issues raised by ethnomethodological studies, particularly those that detail the interactional and sequential production of collaborative activities. Sequentiality is an integral part of CSP. Inherent in its structure, CSP has powerful resources to describe sequential processes. This is not to say that other notations cannot describe or represent such aspects. However, when notational structures do not adequately support the description of relevant phenomena, it is often the case that additional structures must be artificially constructed and these may obscure the phenomena that are being described. CSP seems to provide a more direct notation for particular aspects of collaborative conduct of concern to us here. CSP also allows the development of, and reasoning about, processes that interact with each other and the environment. The notation provides for
various mathematical models and reasoning methods that offer analysts resources to reason about possible states and the consequences of various combinations of states occurring. It takes as critical the notion of a process whose behaviour is described by a sequence of events it might ‘use’. A process may be put in parallel or be interleaved with other processes so that various combinations of events can evolve. Thus, CSP is concerned with sequentiality, with parallel and interleaved activities. It may be then that CSP could provide a way of elucidating particular aspects of social and organisational activities. In the remainder of this chapter we wish to investigate this possibility. We will draw upon illustrative examples from prior analyses of collaborative work practice in financial trading rooms. We consider how aspects of activities-in-interaction may be characterised in CSP, what may be modelled, and what the consequences of doing so would be. Based on this preliminary investigation, we will discuss the potential for the development of such models as a resource for reasoning about technological intervention in workplace settings.
3. Modelling collaborative work practice
The development of CSP involved a movement away from considering computer processes as linear and distinct from the environment around them, to systems where events can happen in parallel and interact with other events (Hoare, 1985). Thus, in concurrent systems the various components are “in (more or less) independent states (and) it is necessary to understand which combination of states can arise and the consequences of each” (Roscoe, 1998, p. 2). The relevance of this approach is perhaps most apparent when considering computer communications and distributed systems, where various processes can occur in different places and at different times. In these cases it is important to develop a model of concurrent processes in order to reason about the consistency between them and whether any potential communication breakdowns could occur.2 As CSP allows reasoning about processes that interact with one another and their environment, the most fundamental object in CSP is a communication event. Communication in CSP has been defined as “a transaction or synchronisation between two or more processes rather than as necessarily being the transmission of data one way” (Roscoe, 1998, pp. 8–9). Thus, communication events are considered as events or actions that can describe patterns of behaviour regarding objects in the world. The set of events viewed as relevant in the description of an object is termed its alphabet. In CSP a process is the behaviour pattern of an object described by the set of events selected in its alphabet.
In the following, we consider the use of CSP notation with respect to the use of models of collaborative and communicative activities. For this we will draw on our previous analyses of a particular complex setting, financial dealing rooms (Heath et al., 1994–5; Jirotka et al., 1993). These are highly sophisticated settings characterised as fast moving, highly complex and relying on complex collaborations between participants. Hence, they seem particularly challenging when considering the development of models of collaborative activities.3 The sequential analyses of different domains of financial trading reveal the highly complex and collaborative nature of dealing, characterised by very frequent (often brief) deals undertaken in parallel and interrelated with other activities. Traders can be engaged in more than one activity at a time with various co-participants, and can participate in these activities in various ways, from being a (counter-) party to a deal to simply overhearing that a deal has been struck by another dealer. It is also apparent that the ways activities are interleaved are complex. Rather than wait until a deal is agreed to record the necessary information (called ‘deal capture’), dealers frequently record items of the deal as they emerge in the interaction.4 For the purposes of this investigation we do not attempt to build up a complete model of the setting in CSP. Rather, we wish to examine the issues that emerge when considering modelling socio-interactional resources using this notation for the purposes of informing design. In this exercise, we shall use simple illustrative examples of fundamental concepts in CSP such as parallelism.5

3.1 Making a deal
To begin, we consider a fragment of activity from the dealing room that reveals the critical features of a deal. Tom, a market maker, is doing a deal with Miles, a salesman, through an intercom system.

        BEEP
Miles:  Hello
Tom:    Hello Miles
Miles:  I want to sell five o seven five Shell, five o seven five Shell for (Kleinwort) whatever you can do Tom
Tom:    Eh four eighty three and a half
Miles:  That’s very kind of you. Three one six. Thank you sir.
        BEEP

Very briefly, in this fragment, Miles, a salesman, calls Tom to sell 5075 units of Shell stock on behalf of his client Kleinwort and he asks Tom to quote a price
(‘whatever you can do Tom’). Tom provides a buying price (£4.83½ per share) which Miles accepts by giving Tom Kleinwort’s customer number (‘That’s very kind of you. Three one six. Thank you sir’); an utterance that also ends the call. At this point we can see the key features of a deal: the buyer (Tom); the seller (Miles on behalf of Kleinwort); the price (£4.83½); the shares being traded (Shell); the number of shares (5075) and the customer number (316). These six components of a deal, the stock, amount of stock, price, buyer, seller, and the customer number are the basic features of interest when trading. It is clear from the fragment that dealers in the course of the communication have agreed a deal. However, it is unclear exactly when the deal, or the components of the deal, are agreed, and if such distinctions in timing are of consequence to the participants. It is critical, however, that the components are conveyed during its course and on termination these, and the deal, come to be ‘agreed’. In CSP we could define ‘agreeing a deal’ in terms of two independent processes: one for ‘agreeing a price’ (PRICE) and one for ‘agreeing the amount’ (AMOUNT) of stock to be dealt.

PRICE = agreePrice → agreeDeal → STOP
AMOUNT = agreeAmount → agreeDeal → STOP

Here agreePrice and agreeDeal are events in the alphabet of the PRICE process and agreeAmount and agreeDeal are in the alphabet of AMOUNT. Processes can participate in events in sequence (e.g., agreePrice and then agreeDeal) and can then be put together in parallel (denoted by the operator ||). When two processes are brought together to run concurrently, events that are in both their alphabets require the simultaneous participation of both processes. However, each process also can evolve its own events independently. So, if we now put PRICE and AMOUNT in parallel:

PRICE || AMOUNT

we can see that the two processes are synchronised around agreeDeal. As these processes have been put in parallel, we cannot agreeDeal until we agreePrice and agreeAmount. The notation allows us to preserve the indeterminacy of the ordering of these two activities. As detailed in the workplace study, it is important for dealers to be able to agree an amount of stock to trade and to agree the price, but the ordering of these two events may be indeterminate. CSP also provides traces that are records of events, that is the sequence of events that occur so we can keep a record of communication between the environment and the process. Thus, traces of the above might be

⟨agreePrice, agreeAmount, agreeDeal⟩
⟨agreeAmount, agreePrice, agreeDeal⟩

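The two traces above can also be derived mechanically. The following Python sketch is an illustration only and is not part of the CSP notation or of the original analysis. It assumes that each process is a plain linear sequence of events, it treats every event name occurring in both sequences as shared, and it lists only complete traces (the traces model of CSP would also include every prefix of these). The function name `complete_traces` is hypothetical.

```python
def complete_traces(p, q):
    """List the complete traces of two linear processes run in parallel.

    p and q are tuples of event names. Events that appear in both tuples
    are shared and need both processes to be ready; all other events may
    happen independently, in any interleaving.
    """
    shared = set(p) & set(q)

    def go(i, j):
        # Both processes have reached STOP: one (empty) way to finish.
        if i == len(p) and j == len(q):
            return [()]
        found = []
        p_next = p[i] if i < len(p) else None
        q_next = q[j] if j < len(q) else None
        if p_next is not None and p_next in shared:
            # A shared event occurs only when both sides offer it.
            if q_next == p_next:
                found += [(p_next,) + rest for rest in go(i + 1, j + 1)]
        elif p_next is not None:
            found += [(p_next,) + rest for rest in go(i + 1, j)]
        if q_next is not None and q_next not in shared:
            found += [(q_next,) + rest for rest in go(i, j + 1)]
        return found

    return go(0, 0)


PRICE = ("agreePrice", "agreeDeal")
AMOUNT = ("agreeAmount", "agreeDeal")

TRACES = complete_traces(PRICE, AMOUNT)
for trace in TRACES:
    print("<" + ", ".join(trace) + ">")
```

Under these assumptions the sketch yields exactly the two orderings listed above: agreePrice and agreeAmount may occur in either order, and agreeDeal comes last in both.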
We can further extend the complexity by defining a dealing process in terms of PRICE, AMOUNT and other processes as:

DEAL = PRICE || AMOUNT || STOCK || BUYORSELL || etc.

where each of the processes PRICE, AMOUNT, STOCK and BUYORSELL may be synchronised around agreeDeal; however, it is not specified in what order they are accomplished. The deal itself may become even more complex when we also consider how the details of the transaction are recorded. Observations of deal production and real-time record-keeping reveal that dealers do not always wait until the deal has been agreed before recording items. Rather, the ways in which components are written down may be tied to the order in which they are produced in the talk. Dealers may record the components of a deal as each item emerges in the talk. However, dealers may also wait until three or four deals have been done before recording items, or they may record four or five deals in an interleaved manner with the deals at different stages of completion. The relationship between recording a deal and agreeing a deal can be further elaborated in the price process outlined above.

PRICE = agreePrice → (recordPrice → agreeDeal → STOP | agreeDeal → recordPrice → STOP)

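The orderings permitted by this extended PRICE definition, alone and in parallel with the AMOUNT process defined earlier, can be listed with a small Python sketch. This is an illustration under stated assumptions, not the authors' method and not a CSP tool: a process is represented as a dictionary mapping each event it can engage in next to the process that follows, the vertical bar is read as a choice between branches that begin with distinct events, and only complete traces are listed. All function names (`prefix`, `choice`, `parallel`, `maximal_traces`) are hypothetical.

```python
# A process is a dict: next possible event -> process that follows.
# STOP engages in no events at all.
STOP = {}


def prefix(event, then):
    """event -> then"""
    return {event: then}


def choice(*branches):
    """P | Q: the first event selects the branch.

    Assumes the branches start with distinct events.
    """
    merged = {}
    for branch in branches:
        merged.update(branch)
    return merged


def parallel(p, q, shared):
    """P || Q, synchronising on the events in `shared`.

    Assumes `shared` is the intersection of the two alphabets, so an
    unshared event belongs to exactly one of the processes.
    """
    result = {}
    for event, nxt in p.items():
        if event in shared:
            if event in q:  # both sides must be ready for a shared event
                result[event] = parallel(nxt, q[event], shared)
        else:
            result[event] = parallel(nxt, q, shared)
    for event, nxt in q.items():
        if event not in shared:
            result[event] = parallel(p, nxt, shared)
    return result


def maximal_traces(proc):
    """Every event sequence that runs until nothing more can happen."""
    if not proc:
        return [()]
    return [(e,) + rest for e, nxt in proc.items() for rest in maximal_traces(nxt)]


PRICE = prefix("agreePrice", choice(
    prefix("recordPrice", prefix("agreeDeal", STOP)),
    prefix("agreeDeal", prefix("recordPrice", STOP)),
))
AMOUNT = prefix("agreeAmount", prefix("agreeDeal", STOP))

PRICE_TRACES = maximal_traces(PRICE)
DEAL_TRACES = maximal_traces(parallel(PRICE, AMOUNT, {"agreeDeal"}))
for trace in DEAL_TRACES:
    print(" -> ".join(trace))
```

Under these assumptions PRICE alone has two complete orderings, and PRICE || AMOUNT has five. In none of them does agreeDeal precede agreePrice or agreeAmount, while recordPrice may fall on either side of agreeDeal.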
Processes may be described as a single stream of behaviour, but they may also be influenced by their interactions with the environment in which they are placed. Thus, CSP provides ways of describing processes that offer a choice of actions in their environment (denoted by the operator |). For instance, in the definition above we have further defined the price process as being a choice between recording the price and then agreeing a deal, or agreeing the deal and then recording the price. CSP provides a modular approach to the composition of each of the components of a deal. There is a constant concern with the sequence or ordering of events; for example, when processes are put in parallel, it is of interest what events happen when, as processes are synchronised around particular events. Critically, the order of these events is not determinate. The notation also provides a means of describing processes that offer a choice of actions in the environment. These choice operators may generate a number of traces and combinations of events that can be used as a resource for reasoning
about the work setting. From the materials at hand we can see that the participants agree the components that make up the deal and the overall deal itself, but there is no specific point where agreement ‘happens’. Rather, it is constituted by the various actions of the participants. Fundamentally, we are not attempting to model all the details of the ethnographic analysis. Rather, we wish to convey the critical issues of interaction and collaboration so often featured in workplace studies. In our study of the dealing room we wanted to demonstrate that various activities are accomplished, but that the ordering in which they are done is not necessarily important. Though certain actions can be seen to accomplish an activity like agreeing a deal, the components may be either too hard or impossible to separate out from the accomplishment itself. What appears to be critical from workplace studies is the open and flexible nature of certain activities; that they can be accomplished in different orders depending on the local circumstances at hand. Furthermore, there may be many ways of producing various activities, through different utterances or indeed through other types of conduct. CSP allows us to represent, in a parsimonious manner, issues of non-determinism, choice and parallelism, as these are built into the notation itself. As we shall see later, our interest in this notation lies not only in its ability to represent and describe activities in the workplace, but also in the possibility of being able to reason about a range of possible outcomes.

3.2 Monitoring the local environment
In explicating the details of workplace activities researchers have frequently noted the importance of monitoring or awareness practices in the domains under investigation.
Whether these are in control rooms (Goodwin and Goodwin, 1996; Heath and Luff, 1992; Harper and Hughes, 1993), offices (Rouncefield et al., 1994) or factory floors (Button and Sharrock, 1997) these observations point to the importance of tacit, implicit and informal communicative practices in the achievement of everyday work. They also raise important issues regarding the interweaving of the private and the public, the individual and shared and the conceptions of each in CSCW and HCI, and potential technological developments to support activities and interactions in work domains. Such studies have revealed how sensitive participants are to the conduct of others, how participants can design their own actions to be monitored and how even what appear to be crude communicative activities can be tuned to the activities in the local environment.
Some sense of these findings can be observed in the trading room. As might be expected this is a noisy environment where each trader may be undertaking numerous interlinked activities at any one time. Whilst undertaking their own activities traders also have to be aware of what their colleagues are doing. Frequently, traders will also shout out various offers, bids or other news for their colleagues. These ‘outlouds’ are usually simple, single utterances shouted into the general milieu, and do not appear to require any particular response from a colleague. They are part of the reason that trading rooms are characterised as noisy and rowdy places. A trader who remains aware of these whilst dealing with his or her own activities, such as multiple deals carried out on several phones at once, is said to have a ‘third ear’, which is seen as critical for undertaking competent trading. In the following fragment, activities in the trading room are particularly hectic. Tom and Richard are each concluding deals of their own on the phone. Stacey, another trader sitting some way distant, is also on the phone; she turns and shouts out information about the call concerning Hanson shares.

Stacey:   Hanson twenty of an eighth (forty by fifteen) Shearson on the bid
Tom:      Do you want to hit them?
Richard:  Um Yes
Richard:  Who’s that? Rene?
Stacey:   Yes
Richard:  Do we want to sell forty
Tom:      We want to sell forty five don’t we? Give them the lot

Stacey’s outloud that Shearson, another market maker in the City, is ‘on the bid’, announces that they are buying Hanson stock. As she is announcing this Richard is closing up a conversation on the telephone and looking at the numerous screens in front of him. Tom has also been dealing, but as Stacey’s shout is being completed, he turns momentarily and slightly towards Richard and asks whether they should ‘hit them’ (i.e., sell their Hanson shares to Shearson). After agreeing with Tom, Richard then turns in the direction of Stacey and shouts to find out more precisely who is at the other end of her phone conversation. Tom and Richard then discuss, briefly, how many shares they should sell. What appears a crude broadcast of some emerging information by Stacey actually invokes a series of delicately produced activities and reveals the ways in which the participants are sensitive to the conduct of their colleagues. For example, Tom’s question presupposes that Richard has both heard the utterance and may
be prepared to collaborate in selling the stock. Despite Richard making no indication that he has heard Stacey’s utterance, the utterance initiates collaboration with him concerning the selling of stock. Richard seems to make sense of the question unproblematically, indeed in later talk with Tom, he displays he has indeed heard Stacey’s utterance and made sense of it. A short time later a substantial amount of stock is sold. Outlouds like Stacey’s in this example whilst, at first, seeming potentially disruptive to others in the room are sensitively designed. Although the initial utterance is shouted, this does not mean that it is necessarily insensitive to the conduct of colleagues. Indeed, as has been mentioned elsewhere (Heath et al., 1993) the design of outlouds and other shouted utterances can be tailored to the demands of the local setting; they can be heard by a range of participants and by their very design they do not demand particular responses. They also provide for potential collaboration as Tom can presuppose that colleagues for whom the information may be relevant will have heard the outloud. So, whilst participants need to monitor the local environment of objects, such as screens, they also need to monitor the activities of their colleagues in relation to those objects. Outlouds though critically considered as public in nature can invoke other shared activities and those considered as individual or private. They can also be sensitively produced for the prospective concerns of potential hearers, recipients and other ‘overhearers’, with different participant statuses (cf. Goffman, 1971) in the setting; the particular activities undertaken being contingent upon the circumstances at hand. CSP can offer resources for reflecting these characteristics. CSP allows us to consider, for example by defining different processes, the other processes that can evolve if an outloud event occurs. 
Events in common to the alphabets of more than one process can be shared and thus made public. The model can thus be used to consider not only what might happen after an activity has occurred, but also what can be seen to be made relevant by other participants, who may then undertake possibly different next actions. We shall not go into detail specifying particular processes and events, but rather give a general outline of how CSP can be used to model the different choices that may be made within the dealer processes. For example, one dealer (Dealer1) may initiate a phone call as a result of hearing the information in the outloud. A second dealer (Dealer2) may check his position in a particular stock. Diagrammatically, this might be represented as follows:
[Figure: the outloud ‘Hanson twenty of an eighth (forty by fifteen) Shearson on the bid’ leads Dealer 1 to initiateTelephoneCall and Dealer 2 to checkAmount.]
The following is a gloss of how the notation might be used at an abstract level:
DEALER1 = (dealerOutloud → initiatePhoneCall → agreeDeal)
DEALER2 = (dealerOutloud → checkAmount → (reduceAmount | increaseAmount))
All such deal processes have the possibility to evolve if an outloud occurs; all have the opportunity to recognise the relevance of an outloud for them and initiate a trading process in a particular stock, but not necessarily. This suggests ways of considering the public and private nature of activities in the workplace. Events, such as outlouds, that are common to the alphabets of more than one process can be shared, and thus be considered publicly available. Where processes do not share events, those events can be hidden from other processes, and therefore be considered private. More importantly, the various means through which activities are made public can be reflected in the ways in which different processes evolve. CSP provides the mechanisms in its structure to describe collaboration and coordination that allow us to consider not just that an activity has occurred, but that it has been seen to be made relevant by other participants who may then undertake possibly different next activities. There is therefore a possibility of distinguishing how activities are made public in collaborative settings and the different ways that those activities engender activities by co-participants. Note that the critical issue is not the notation or the representational quality of the notation, but the mechanisms that underpin it. CSP is a concise notation and, given its provenance, the notation provides for parsimonious representations of sequential processes with non-deterministic orderings, concerns that resonate with issues that emerge from detailed analyses of everyday communicative activities in workplaces. However, by allowing for execution and reasoning about traces, CSP allows for the possibility of providing more than a tool for communicating the findings of workplace studies. CSP has the potential for
reasoning about transformations to work practices, particularly with respect to those that may arise following the introduction of a technology. This offers a potentially powerful resource for designers. If models of activities can be combined with parallel models of proposed technological solutions, manipulating the model can result in generating a number of different traces of events. Such a possibility will be sketched out in the next section where we will outline the potential use of a notation like CSP in the light of a proposed technical intervention. We will outline this with respect to an envisaged technical intervention, one that might have consequences for the outlouds just considered.
3.3 A proposed technical intervention
Often through discussions with organisational members, designers arrive at a set of technological options to support participants in the domain. For example, in this case, one could envisage the potential of novel traders to support communication and collaboration between two or more trading floors of the same bank (e.g., between London and New York offices). A proposal could consider supporting ‘distributed outlouds’ in such a way that colleagues in offices could both produce them for others in the remote offices and hear them from other settings. Different technological solutions could support alternative ways of achieving this. For instance, as outlouds are produced from another setting, it may seem feasible to make announcements in one domain also public to the relevant participants in the other setting(s), say, through an automated announcement system. Such an intervention could transform outlouds so that they could be provided in audio, video, textual or graphical form, or some mixture of these, and could be automatically produced or initiated by others in a remote domain. Given such a scenario we could outline a set of issues that could be relevant for designers considering distributing outlouds.
So, for example, our previous analysis (see also, Jirotka, 2000) raised a number of issues related to broadcasts in the co-present environment that would suggest that:
– Outlouds are coordinated with the activities of those who receive them. They are produced in relation to the activities of others in the dealing room, not just broadcast irrespective of what is happening in the setting;
– Recipients are sensitive to particular outlouds and not to others;
– Dealers are seen to be able to relate their conduct to the conduct of others, so the relationship between one trader’s activity to an outloud can be inferred by other members of the dealing room;
– In their production, outlouds are coordinated with and related to other outlouds.
By interrogating the model, it may be possible to consider these different technological solutions in relation to aspects of producing and recognising outlouds and how these might be accomplished.
4. Technological interventions in the workplace
As we do not expect to capture all possible conduct and information about the setting for the analysis, nor expect to offer a complete model of the work of the setting under investigation, the model produced could be quite simple. It is developed in order to check the understanding of both designers and ethnographers of the phenomena, in this case outlouds. Designers could then build different models of the various technological options to support the solutions. For example, a new CSP process may be defined that models the technologically assisted outloud, as in,
DISTRIBUTEDOUTLOUD
where the process has an event pushButton in its alphabet, representing the pushing of a button that enables a dealer to distribute their outloud. In order to reason about the consequences of different combinations of processes, the designer should be able to evaluate what the consequences of different technological features might be for social conduct from combining the processes and examining the traces. An initial model of the trading room activities could be developed as a set of top level CSP processes in parallel, for example,
P = MARKETMAKER || DEAL || OUTLOUD || SALESMAN
Some of these processes reflect the general activities of individuals in the setting, e.g., market maker and sales person, or they reflect objects or events occurring in the domain, e.g., deal. Other processes reflect the socio-interactional resources, e.g., outlouds. We could then replace the existing outloud process with the new process for distributed outlouds
Q = MARKETMAKER || DEAL || DISTRIBUTEDOUTLOUD || SALESMAN
If T1 is a trace of P, then it should be straightforward to check whether T1 is also a trace of Q. If it is, and the relevant events have been modelled, then it would
appear that the technology to distribute outlouds would have little impact on the scenario represented by T1. However, it may be necessary to modify the other processes in order to allow the processes to evolve in a way that may not be identical to T1 but are similar. Thus, in our example of the restrictive automated technology for distributing outlouds, if the DISTRIBUTEDOUTLOUD process has an event pushButton in its alphabet, the designer could follow the trace of the process up to the point at which the process could not evolve in the desired direction without the button on the system being pressed. At least one of the processes modelling a dealer will also need to participate in that pushButton event. The designer will need to modify the behaviour of the dealer process to get the trace to achieve the same goal. Thus, it would seem that the technological features proposed for distributing outlouds in this case may have a specific impact on the domain. One example of this would be the simple inference that dealers may be required to do at least one extra activity, pressing a button, in order to do an outloud. Drawing on the details of how outlouds are achieved in the current setting, designers can consider how an individual could coordinate the production of the outloud with activities in the remote setting. When deliberating the technological features needed to support these broadcasts, designers will need to consider what is required at the recipient’s site to make the outloud relevant and apparent. Furthermore, they may need to determine what other features are necessary and relevant if the outloud is not produced through audio, and how these features need to be supported. 
For example, if a technology was considered that required constraints on when a dealer can make a broadcast to a remote domain; it may be necessary to consider how such capabilities are provided, how they are accessed and coordinated with other activities (even when considering such simple methods as using a lighted button to coordinate when an outloud could be made). Thus, drawing upon the models, designers could consider what aspects of this outloud are permanent or transitory, and what other items the outloud might coordinate or overlap with — for example, noise or talk in the dealing room or other visual items on a screen. In this case it would be important to consider the different ways in which the outloud may be made public and how it may be produced in parallel with other activities. Features of the technological options may be modelled where designers consider what the CSP processes and corresponding traces might be. In the remote trading example, when combining models of technological features with social conduct, designers may consider, for instance, what might happen when
a remote trader is trying to operate a particular system so that a reasonable set of activities occur. Designers might examine the different ways information from outlouds can be made publicly accessible, and the different sets of resources that should be present. To do this, they might consider the model of outlouds as they are achieved at present, in relation to dealing and other activities in the dealing room. At this point, designers may need to reflect upon what components, if any, of outlouds are necessary to support, what their granularity is, and how they are voiced and parsed. In CSP this is done in relation to modelling the processes of different types of technological support and different media. The traces that are produced when manipulating a CSP model allow for exploration of the consequences of the different combinations of processes. The analysis in the above example just looks at one evolution path of the process. But there are also opportunities to analyse processes for deadlock that would indicate a problem with design that might also be resolved by modifying other processes. In addition, other traces may be produced that have not been drawn from the observed setting and may be interesting for analysis. Designers assessing these traces may decide that the set of events produced do not make sense and therefore the process model may be incorrect. Alternatively, a set of events may never have been observed in the setting during the study, but nonetheless suggest concerns for design and therefore need to be considered further. For this, video fragments collected in the study could support the more conventional use of scenarios for design. Of course, it may be that issues raised may not be addressed in the initial ethnographic investigation. Modeling could then suggest foci for further analysis or even further data collection. In this way the modeling and the social scientific study can be iterative and mutually informing. 
Moreover, the traces generated through CSP models can be a resource for analysis, suggesting various orderings of activities that have not been considered, or more unusual sequences of events that need further investigation.
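The comparison of traces between P and Q described in this section can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' model: only two processes are composed rather than four, the event names `shout` and `agreeDeal`, the state names, and the helpers `step` and `is_trace` are invented for the example, and parallel composition is reduced to the rule that every process with an event in its alphabet must take part in it.

```python
def alphabet(proc):
    """Return every event the process can ever engage in."""
    return {e for moves in proc["moves"].values() for e in moves}


def step(procs, states, event):
    """Advance the parallel composition by one event.

    Every process whose alphabet contains the event must be able to
    perform it now (synchronisation); the others keep their state.
    Returns the new tuple of states, or None if the event is refused.
    """
    if not any(event in alphabet(p) for p in procs):
        return None
    nxt = []
    for proc, state in zip(procs, states):
        if event in alphabet(proc):
            moves = proc["moves"].get(state, {})
            if event not in moves:
                return None
            nxt.append(moves[event])
        else:
            nxt.append(state)
    return tuple(nxt)


def is_trace(procs, trace):
    """Check whether `trace` is a possible sequence of events of the composition."""
    states = tuple(p["start"] for p in procs)
    for event in trace:
        states = step(procs, states, event)
        if states is None:
            return False
    return True


# Hypothetical processes: a co-present outloud can be shouted at any time,
# whereas the distributed version needs pushButton before each shout.
OUTLOUD = {"start": "quiet", "moves": {"quiet": {"shout": "quiet"}}}
DISTRIBUTEDOUTLOUD = {
    "start": "off",
    "moves": {"off": {"pushButton": "live"}, "live": {"shout": "off"}},
}
DEALER = {
    "start": "idle",
    "moves": {"idle": {"shout": "heard"}, "heard": {"agreeDeal": "idle"}},
}
# Dealer modified so that it takes part in the pushButton event.
DEALER_WITH_BUTTON = {
    "start": "idle",
    "moves": {
        "idle": {"pushButton": "ready"},
        "ready": {"shout": "heard"},
        "heard": {"agreeDeal": "idle"},
    },
}

P = [DEALER, OUTLOUD]
Q = [DEALER, DISTRIBUTEDOUTLOUD]
Q_MODIFIED = [DEALER_WITH_BUTTON, DISTRIBUTEDOUTLOUD]

T1 = ["shout", "agreeDeal"]
T2 = ["pushButton", "shout", "agreeDeal"]

print("T1 in P:", is_trace(P, T1))
print("T1 in Q:", is_trace(Q, T1))
print("T2 in Q_MODIFIED:", is_trace(Q_MODIFIED, T2))
```

In this sketch T1 is accepted by P but refused by Q, and the same goal is reached in the modified model only through the longer trace T2, which contains the extra button press. T2 would also be accepted with the unmodified dealer, since pushButton is then outside that dealer's alphabet; the modified dealer simply makes the dealer's participation in the event explicit.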
5. Discussion
Though the examples given here are undoubtedly brief and merely illustrate some characteristics of CSP, they suggest how the notation provides resources for reflecting on the nature of communicative and collaborative activities. We have also considered how the notation could provide resources for conveying:
how the ordering of activities undertaken by individuals can be left open and flexible; how particular actions serve to coordinate actions with those of others; the ways in which the activities of co-participants are interrelated, for example: how visible actions produced by a participant, but not explicitly directed at another, can be utilised by others; how a single action can accomplish multiple interactional activities; and how activities emerge through collaboration. CSP’s consideration of parallel processes, private and public events, and the potential for outcomes to be non-deterministic provide powerful resources for reflecting the contingent, emergent and sequential nature of collaborative activities. This is not to suggest that other notations cannot represent certain kinds of social behaviour. It is certainly possible to characterise aspects of achievements such as mutual awareness in terms of objects and relationships between them (cf. Viller and Sommerville, 1999). It is also possible to provide frameworks for providing access to more detailed materials (Hughes et al., 2000) and patterns that suggest recurrent features found in a number of settings (Martin et al., 2001). However, such notations may not be the most appropriate for modelling the contingent nature of workplace conduct. Not only do the notations often imply a clear delineation of activities that is not suggested in the analysis, their structure and form tend to foreground an association of activities to individuals. Indeed the notations themselves seem to unduly focus attention on the stable features of a domain (or across domains). This seems in marked contrast to the workplace activities that these notations were trying to convey. These, drawing on an ethnomethodological orientation, bring to light how activities emerge, transform throughout their course and where certain features may be left ambiguous for practical purposes.
It seems important to convey these flexible, open, contingent and situated practices to designers if the technologies they develop are to be sensitive to the settings in which they are deployed. The notations adopted may indeed obscure the very aspects they are intending to convey. Attempting to model communicative action in terms of single channels or linear relationships between co-participants, as between ‘speaker’ and ‘hearer’ or ‘writer’ and ‘reader’, for example, may hide from view the interactive and collaborative production of the interaction, from moment-to-moment. With its concern with concurrent activities, CSP does appear to offer resources for conveying critical features of collaborative and interactional workplace activities. Meanwhile, it is also possible to utilise CSP models that still, as in the ordering of particular activities, leave features indeterminate, ambiguous or ambivalent, but which are necessary aspects of many activities in everyday settings.
Of course, this is not to say that it is not possible to convey such features in other notations. Complex models of collaborative work could be developed using object-oriented and algebraic notations, but in doing so, it is important to consider the motivations behind the modelling exercise in the first place. If the modelling exercise is merely intended to convey information to system developers it is unclear why a formal (or semi-formal) notation is necessary, particularly as natural language is quite an effective way of revealing the ambiguous, complex and rich nature of everyday conduct. Even if there are pragmatic reasons for adopting a particular notation, or that it results in parsimonious descriptions, its use may not ensure that the most appropriate aspects of ‘hard problems’ are conveyed. Indeed, it may be that if the objective of the model is to be a communication tool then other resources may be appropriate for this. An analysis of requirements supported by instances from audio-visual recordings captured in the field may both serve to highlight critical issues of relevance to design, but also provide ways of conveying the complexity and richness of the work practices under consideration (Jirotka et al., 1993). Hence, it would be hoped that through adopting a particular notation more will be possible than simply building a communication device, that, for example, provides some resources for considering the implications of a particular domain or requirements analysis within the system development process. One shortcoming of using semi-formal notations is that they provide few resources for considering the consequences of the model developed. If the system analyst wishes to explore systematically the interrelationships between the components of the model, this either has to be accomplished through detailed inspection or by building appropriate tools for analysis. Nor is this necessarily straightforward with formal notations. 
It is possible using a variety of formal notations to define data types and operators that represent certain aspects of naturally-occurring conduct, and these can be used to outline particular objects, activities and their interrelationships of interest. These could provide representations of fine details of interaction and collaboration (Jirotka, 2000). However, it is not necessarily the case that the structure of the modelling notation will resonate with the purposes to which it is put. In order to reason about a model, it is necessary to allow designers to manipulate what has been defined in the model, to explore the consequences of these definitions. CSP not only appears to provide an appropriate structure for reflecting concurrent and collaborative activities, but also a means of reasoning about them. As it is primarily concerned with the representation of concurrent activities, there are a range of resources embedded within CSP for describing
coordinated action. Moreover, the traces inherent in CSP are not merely records of the process, but resources for reasoning about the processes defined. The traces of a CSP model allow for the exploration of the consequences of different combinations of processes. The ability to reason with a model suggests a way in which the modelling activity could provide more than a means of communicating between ethnographers and designers. Undoubtedly there are problems with trying to manage complex descriptions of workplace activities for the purposes of design, particularly given the different analytic resources and practical motivations of the different parties (cf. Randall et al., 1994). It may be that tools and techniques to ‘bridge the gap’ may indeed assist the communication of analyses of complex activities for the purposes of design, but it would be a shame if in assisting the conveyance these approaches obscured the very thing they were intended to convey. Considering the problem of the relationship between the work of social scientists and computer system developers as one of communication may also overlook the possibility for greater collaboration between the two disciplines. If a modelling technique can facilitate the consideration of particular aspects of the setting, without undue work being engaged in defining additional notation and tools, then it may be that the modelling exercise would not just be a resource for communication to others in the development process. Modelling might provide a way to explore the consequences of an emerging analysis early on in the system design process so that through it particular aspects could be considered in more detail in the fieldwork. 
A preliminary analysis of the collaborative conduct of participants, how they are aware of each other’s activities, how a particular tool or artefact figures in an activity or how an activity is initiated, left open or closed, as examples, could provide the resources for an initial model which, when analysed, could raise issues for further investigation and analysis. Thus, the approach might be one where, in the context of business and organisational concerns, modelling and ethnographic analysis are interwoven; the modelling suggesting areas for further analysis and the analysis refining the process models. This iteration could continue to the point at which designers in conjunction with analysts and organisational members discuss different technological design options, model features of these and then reason about the consequences in relation to the video analysis. The traces that are produced when manipulating a CSP model allow for exploration of the consequences of the different combinations of processes. There are opportunities to analyse processes
for deadlock that would indicate a problem with design that might also be resolved by modifying other processes. In addition, other traces may be produced that have not been drawn from the observed setting yet may be interesting for analysis. Though an analysis could go into very fine detail, this may not be necessary. Designers may be able to model quite complex features of social interaction, but this may not be necessary for considering the consequences of different technological choices. For certain cases more detailed analysis may be needed, where there is an interest in examining the sequencing of activities or how one activity relates to another, for example. In some ways, what is being suggested here has parallels in certain engineering fields. Models are not developed to represent a problem or domain in all its details but to focus attention on particular issues, to make apparent the questions that have to be addressed and as a resource for discussing possible solutions. In such cases, it is not just that the model can accurately represent the problem but that it can be used, manipulated and design inferences can be made from it.
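The deadlock analysis mentioned above can be illustrated with a small reachability search in Python. This is a sketch under assumptions introduced here, not a substitute for a CSP model checker: the two dealer processes, their state names, and the function `find_deadlocks` are hypothetical, and any joint state with no enabled event is reported as a deadlock, which is adequate only because the example processes never terminate.

```python
from collections import deque


def alphabet(proc):
    """Return every event the process can ever engage in."""
    return {e for moves in proc["moves"].values() for e in moves}


def step(procs, states, event):
    """Synchronised step: all processes with `event` in their alphabet must accept it."""
    nxt = []
    for proc, state in zip(procs, states):
        if event in alphabet(proc):
            moves = proc["moves"].get(state, {})
            if event not in moves:
                return None
            nxt.append(moves[event])
        else:
            nxt.append(state)
    return tuple(nxt)


def find_deadlocks(procs):
    """Breadth-first search over reachable joint states; return those with no enabled event."""
    events = sorted(set().union(*(alphabet(p) for p in procs)))
    start = tuple(p["start"] for p in procs)
    seen, queue, dead = {start}, deque([start]), []
    while queue:
        states = queue.popleft()
        successors = [s for s in (step(procs, states, e) for e in events) if s is not None]
        if not successors:
            dead.append(states)
        for s in successors:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return dead


# Hypothetical technology: the button must be pushed before each shout.
DISTRIBUTEDOUTLOUD = {
    "start": "off",
    "moves": {"off": {"pushButton": "live"}, "live": {"shout": "off"}},
}
# A dealer whose conduct fits the technology.
DEALER_OK = {
    "start": "idle",
    "moves": {
        "idle": {"pushButton": "ready"},
        "ready": {"shout": "heard"},
        "heard": {"agreeDeal": "idle"},
    },
}
# A dealer who only reaches for the button after an outloud has been heard.
DEALER_BAD = {
    "start": "idle",
    "moves": {"idle": {"shout": "heard"}, "heard": {"pushButton": "idle"}},
}

print("compatible design:", find_deadlocks([DEALER_OK, DISTRIBUTEDOUTLOUD]))
print("mismatched design:", find_deadlocks([DEALER_BAD, DISTRIBUTEDOUTLOUD]))
```

In the mismatched case each process waits for an event the other cannot yet offer, so the initial state itself is reported as deadlocked. This is the kind of finding that might prompt a change to the proposed technology, or a closer look at the fieldwork.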
6. Summary
Modelling may be one way of bridging the gap between those analysing the nature of workplace activities through social scientific studies and those trying to develop appropriate and relevant technologies for such domains. As the studies of social and organisational conduct tend to be rich and detailed, models of various kinds could provide a way of communicating these details to system developers in the early stages of system design. Unfortunately, social scientific studies have revealed the open, ambiguous, tacit and interactional nature of social and collaborative activities, which perhaps inevitably prove difficult to model in formal and semi-formal notations. Moreover, modelling organisational activities may be particularly problematic when the frameworks used to model them are too rigid and inflexible to account for the emergent and situated nature of social and organisational conduct. It may therefore be worth reconsidering the role of, and the resources for, modelling when seeking to utilise studies of social practice for system design. For example, there may be alternative ways of clarifying, communicating and presenting ethnographic analyses for requirements and design. The analyses themselves, particularly when supported by suitable materials, such as video recordings, may be the most suitable way of communicating what is necessary for different participants at different times in the development process.
It has been a recurrent concern in HCI, CSCW and Cognitive Technology not solely to communicate the findings of a study, but also to have a more proactive role in the design process, commonly expressed as moving from description to prescription. This may be in terms of trying to make inferences about how different technologies impact on different users, how requirements for new technologies can be derived, or how potential design solutions can be assessed. For other analytic orientations (Barnard, 1991) this has been a longstanding concern and raises fundamental problems regarding the nature of making predictions from one discipline with consequences for another. In CSCW, as in HCI and Cognitive Technology, it has been recognised that considering the contribution of social and cognitive sciences as producing just a corpus of findings for design may not be the most appropriate way in which these disciplines can be useful. Nevertheless, workplace studies, like a number of other studies of technology in use, are frequently undertaken within pragmatic development processes: as part of a requirements process, as precursors to design, or to assess possible technological options. It is therefore a recurring issue to draw out implications from the empirical studies for certain design. What seems to be required is a way of systematically drawing out implications whilst remaining sensitive to the analytic orientations of the studies. In this chapter, we have provided a sketch of how this process might be accomplished through modelling, in the particular case of ethnographic studies. It has been suggested that the value of these studies for the design process lies in drawing out the details of complexities of the setting, particularly the informal, social and collaborative nature of work practices. However, it may not be most appropriate to merely try and model these details. What seems critical is to reflect the sensitivities that underpin the analyses.
In the case where the analysis explores the coordination of collaborative and communicative activities, it may be most relevant to use a modelling notation that itself is concerned with the coordination and sequencing of events and processes. Modelling, however, may be more critical if, given appropriate tools and notations, it serves to support the analysis of requirements, raising critical issues not just for design but also for social scientific enquiry. Providing ways of reasoning with the models developed, although drawing on social scientific analysis, may serve as examples of, or contrasts to, attempts at developing ‘cognitive’ technologies. In either case, the use of such models would require a more fine-grained collaboration between social and computer scientists, where the problems are not considered merely in terms of how best to convey information from the former to the latter, or indeed how an understanding of everyday work
and interaction can inform system design, but also how the requirements for developing technology can support the investigation of social interaction and work practice. Taking the concept of sequence as a primary resource may go some way to increasing our understanding of technology in action.
Notes
1. It should be re-iterated that conversation and interaction analysis have emerged in the social sciences and so researchers drawing from them are concerned with social actions and activities. Hence, researchers are interested in ‘understanding’, ‘intelligibility’, ‘recognition’ and ‘agreement’ as social actions, rather than, say, cognitive processes. They are interested in how they are made public, displayed and are socially witnessable.
2. Perhaps the most familiar example is when considering deadlock, where no particular component of a system can make progress because each is waiting for a next item of information from the other processes. Livelock is a complementary problem, where a process continues indefinitely because it cannot be interrupted by any external process. CSP provides a notation to represent such processes, examine them, and reason about them.
3. They are also interesting as they have been undergoing major technological transformation over the last decade and as such are rich environments to investigate the design of technology and work practice. Indeed, our study was conducted as part of research with a major telecommunications company concerned with developing new technologies such as voice recognition systems for trading rooms.
4. These recording practices have consequences for technologies developed to support deal capture. We have documented these elsewhere (Jirotka, 2000; Jirotka et al., 1993).
5. There have been a number of previous efforts to model and represent sequential and interactional resources for system design and specification (Cawsey, 1990; Finkelstein and Fuks, 1990; Gilbert et al., 1990; Frohlich and Luff, 1990). These have been particularly interested in modelling sequential features revealed by analyses of naturally occurring turns of talk. These efforts have typically used conventional declarative, and linear, formalisms. Further details of this and other modelling exercises can be found in Jirotka (2000).
References
Atkinson, J. M. & J. C. Heritage (Eds.) (1984). Structures of Social Action: Studies in Conversation Analysis. Cambridge: Cambridge University Press.
Barnard, P. (1991). Bridging Between Basic Theories and the Artifacts of Human-computer Interaction. In J. M. Carroll (Ed.), Designing Interaction: Psychology at the Human-Computer Interface, pp. 103–27. Cambridge: Cambridge University Press.
328 Marina Jirotka and Paul Luff
Bellotti, V. (1988). Implications of Current Design Practice for the Use of HCI Techniques. In D. M. Jones & R. Winder (Eds.), People and Computers IV: Proceedings of the Fourth Conference of the BCS Specialist Group, pp. 13–34. Cambridge: Cambridge University Press.
Booch, G. (1991). Software Engineering Economics. Englewood Cliffs, NJ: Prentice Hall.
Bowers, J. & G. Button (1995). Workflow from Within and Without: Technology and Cooperative Work on the Print Industry Shop Floor. In ECSCW ’95, pp. 51–66. Stockholm, Sweden. London: Kluwer Academic Publishers.
Button, G. (1990). Going up a Blind Alley: conflating Conversation Analysis and Computational Modelling. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 67–90. London and New York: Academic Press.
Button, G. & W. Sharrock (1997). The Production of Order and the Order of Production: Possibilities for Distributed Organisations, Work and Technology in the Print Industry. In ECSCW ’97, Lancaster. Kluwer Academic Publishers.
Button, G. & W. Sharrock (2000). Design by Problem Solving. In P. Luff, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 46–67. Cambridge: Cambridge University Press.
Cawsey, A. (1990). A Computational Model of Explanatory Discourse: local interactions in a plan-based explanation. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 223–236. London and New York: Academic Press.
Engeström, Y. & D. Middleton (Eds.) (1996). Cognition and Communication at Work. Cambridge: Cambridge University Press.
Finkelstein, A. & H. Fuks (1990). Conversation Analysis and Specification. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 175–185. London and New York: Academic Press.
Frohlich, D. M. & P. Luff (1990). Applying the Technology of Conversation to the Technology for Conversation. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 189–222. London and New York: Academic Press.
Gilbert, G. N., R. Wooffitt & N. Fraser (1990). Organising Computer Talk. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 237–260. London and New York: Academic Press.
Goffman, E. (1971). Relations in Public. Harmondsworth: Penguin.
Goodwin, C. & M. H. Goodwin (1996). Seeing as a Situated Activity: Formulating Planes. In Y. Engeström & D. Middleton (Eds.), Cognition and Communication at Work, pp. 61–95. Cambridge: Cambridge University Press.
Gorayska, B. & J. L. Mey (2002). Introduction: Pragmatics of Technology. International Journal of Cognitive Technology 1(1), 1–20.
Greatbatch, D., P. Luff, C. Heath & P. Campion (1993). Interpersonal Communication and Human-Computer Interaction: an examination of the use of computers in medical consultations. Interacting With Computers 5(2), 193–216.
Harper, R. H. R. (2000). Analysing Work Practice and the Potential Role of New Technology at the International Monetary Fund: Some Remarks on the Role of Ethnomethodology. In P. Luff, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 169–186. Cambridge: Cambridge University Press.
Communicating sequential activities 329
Harper, R. & J. Hughes (1993). What a f-ing system! Send ’em all to the same place and then expect us to stop ’em hitting: Making Technology Work in Air Traffic Control. In G. Button (Ed.), Technology in Working Order, pp. 127–144. London: Routledge.
Harper, R., J. Hughes & D. Shapiro (1991). Working in Harmony: An Examination of Computer Technology and Air Traffic Control. In J. Bowers & S. D. Benford (Eds.), Studies in Computer Supported Cooperative Work. Theory, Practice and Design, pp. 225–234. Amsterdam: North-Holland.
Heath, C. C., M. Jirotka, P. Luff & J. Hindmarsh (1993). Unpacking Collaboration: the Interactional Organisation of Trading in a City Dealing Room. In Proceedings of ECSCW 1993, Milan, September 13th–17th, pp. 155–170. London: Kluwer Academic Publishers.
Heath, C. C., M. Jirotka, P. Luff & J. Hindmarsh (1994–5). Unpacking Collaboration: the Interactional Organisation of Trading in a City Dealing Room. CSCW Journal 3(2), 147–165.
Heath, C. C., H. Knoblauch & P. Luff (2000). Technology and social interaction: the emergence of ‘workplace studies’. British Journal of Sociology 51(2), 299–320.
Heath, C. C. & P. Luff (1992). Collaboration and Control: Crisis Management and Multimedia Technology in London Underground Line Control Rooms. CSCW Journal 1(1–2), 69–94.
Heath, C. C. & P. Luff (2000). Technology in Action. Cambridge: Cambridge University Press.
Hoare, C. A. R. (1985). Communicating Sequential Processes. Englewood Cliffs, NJ: Prentice Hall.
Hughes, J. A., V. King, T. Rodden & H. Andersen (1994). Moving out of the Control Room: Ethnography in System Design. In Proceedings of CSCW ’94, Chapel Hill, North Carolina, Oct. 22–26, pp. 429–40. New York: ACM Press.
Hughes, J., J. O’Brien, T. Rodden & M. Rouncefield (2000). Ethnography, Communication and Support for Design. In P. Luff, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 187–214. Cambridge: Cambridge University Press.
Jirotka, M. (2000). An Investigation into Contextual Approaches to Requirements Capture. Unpublished DPhil Thesis. University of Oxford.
Jirotka, M. & P. Luff (2001). Representing and modeling collaborative practices for systems development. In C. Floyd, Y. Dittrich & R. Klischewski (Eds.), Social Thinking — Software Practice, pp. 111–140. Cambridge, MA: MIT Press.
Jirotka, M., P. Luff & C. Heath (1993). Requirements for Technology in Complex Environments: Tasks and Interaction in a City Dealing Room. SIGOIS Bulletin (Special Issue) Do users get what they want? (DUG ’93), 14(2–December), 17–23.
Kaushik, R., S. Kline, P. David & J. O. D’Arcy (2002). Differences between computer-mediated and face-to-face communication in a collaborative fiction project. International Journal of Cognition and Technology 1(2), 303–26.
Luff, P., J. Hindmarsh & C. Heath (Eds.) (2000). Workplace Studies: Recovering Work Practice and Informing System Design. Cambridge: Cambridge University Press.
Luff, P. & C. C. Heath (2000). The Collaborative Production of Computer Commands in Command and Control. International Journal of Human-Computer Studies 52, 669–699.
Luff, P., C. Heath & M. Jirotka (2000). Surveying the Scene: Technologies for Everyday Awareness and Monitoring in Control Rooms. Interacting With Computers 13, 193–228.
Luff, P. & M. Jirotka (1998). Interactional Resources for the Support of Collaborative Activities: Common Problems for the Design of Technologies to Support Groups and Communities. In T. Ishida (Ed.), Community Computing and Support Systems: Social Interaction in Networked Communities, pp. 249–266. Berlin: Springer Verlag.
Martin, M., T. Rodden, M. Rouncefield, I. Sommerville & S. Viller (2001). Finding patterns in the fieldwork. In European Conference on Computer Supported Cooperative Work, Germany. London: Kluwer Academic Publishers.
Pfeiffer, R. (2002). Robots as Cognitive Tools. International Journal of Cognitive Technology 1(1), 125–144.
Plowman, L., Y. Rogers & M. Ramage (1995). ‘What are Workplace Studies For?’ In Proceedings of ECSCW ’95, Stockholm, Sweden, 10–14 September, pp. 309–324.
Pycock, J., K. Palfreyman, J. Allanson & G. Button (1998). Representing Fieldwork and Articulating Requirements through Virtual Reality. In Proceedings of CSCW ’98, Seattle. New York: ACM Press.
Randall, D., J. Hughes & D. Shapiro (1994). Steps towards a Partnership: Ethnography and System Design. In M. Jirotka & J. Goguen (Eds.), Requirements Engineering: Social and Technical Issues, pp. 241–258. London: Academic Press.
Roscoe, A. W. (1998). The Theory and Practice of Concurrency. Englewood Cliffs, NJ: Prentice Hall.
Rouncefield, M., J. A. Hughes, T. Rodden & S. Viller (1994). Working with “Constant Interruption”: CSCW and the Small Office. In Conference on Computer Supported Cooperative Work, Chapel Hill, North Carolina, USA, pp. 275–286. New York: ACM Press.
Sacks, H. (1992). Lectures on Conversation: Volumes I and II. Oxford: Blackwell.
Sacks, H., E. A. Schegloff & G. Jefferson (1974). A simplest systematics for the organisation of turn-taking for conversation. Language 50(4), 696–735.
Sommerville, I., T. Rodden, P. Sawyer, R. Bentley & M. Twidale (1993). Incorporating Ethnography into Requirements Engineering. In Proceedings of RE ’93: International Symposium on Requirements Engineering, San Diego, Jan 4–6.
Suchman, L. (1987). Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge: Cambridge University Press.
Suchman, L. (2000). Making a Case: “Knowledge” and “Routine” Work in Document Production. In P. Luff, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 29–45. Cambridge: Cambridge University Press.
Viller, S. & I. Sommerville (1999). Coherence: an Approach to Representing Ethnographic Analyses in System Design. Human-Computer Interaction 14(1–2), 9–42.
Whalen, J. & E. Vinkhuyzen (2000). Expert Systems in (Inter)Action: Diagnosing Document Machine Problems Over the Telephone. In P. Luff, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 92–140. Cambridge: Cambridge University Press.
Part III
Coda
“The end of the Dreyfus affair”
(Post)Heideggerian meditations on man, machine and meaning*
Syed Mustafa Ali
The Open University, England
1. Introduction
According to Janney, “an underlying assumption of Cognitive Technology [is] that computers can be regarded as tools for prosthetically extending the capacities of the human mind.” (Janney, 1997, p. 1) On this view, Cognitive Technology is not concerned with the replication or replacement of human cognition — arguably the central goal of ‘strong’ artificial intelligence — but with the construction of cyborgs, that is, cybernetic organisms or man-machine hybrids, in which possibilities for human cognition are enhanced (Haraway, 1985). However, it may be necessary to reconsider this position in order to address what might be referred to as the ‘Schizophrenia Problem’ associated with human-computer interaction. Janney describes the essence of this problem as follows:
“As a partner, the computer tends to resemble a schizophrenic suffering from severe ‘intrapsychic ataxia’ — the psychiatric term for a radical separation of cognition from emotion. Its frame of reference, like that of the schizophrenic, is detached, rigid, and self-reflexive. Interacting in accordance with the requirements of its programs, the computer, like the schizophrenic, forces us to empathize one-sidedly with it and communicate with it on its own terms. And the suspicion arises that the better we can do this, the more like it we become.” (Janney, 1997, p. 1)
Crucially, on his view, intrapsychic ataxia is “a built-in feature of computers.” (Janney, 1997, p. 4) Notwithstanding the intrinsic nature of the Schizophrenia Problem, Janney remains optimistic about the possibility of its (at least partial) ‘solution’ within cognitive technologies, as is evidenced by his intent “to encourage discussion about what can be done in Cognitive Technology to address the problems pointed out [emphasis added].”
(Janney, 1997, p. 1) As he goes on to state, “an important future goal of Cognitive Technology will have to be to encourage the development of computer technology that reduces our need for psychic self-amputation.” (Janney, 1997, p. 5) While concurring with Janney that “a one-sided extension of the cognitive capacities of the human mind — at the expense of the user’s emotional and motivational capacities — is technological madness” (Janney, 1997, p. 1), it is maintained that if the Schizophrenia Problem is to be ‘solved’ — by which is meant elimination and not mere reduction of the need for psychic self-amputation — it will be necessary for Cognitive Technology to reconsider its position on the issue of replication of human cognition and emotion. Although efforts are underway in this direction, it is suggested herein that they are unlikely to prove ultimately successful. This is because the Schizophrenia Problem can be shown to be intrinsically, if only partially, related to the ‘hard problem’ of consciousness (Chalmers, 1996), that is, the problem of explaining how ontological subjectivity (or first-person experience) can arise from an ontologically objective (or non-experiential) substrate. For example, Picard (1997) has argued that the problem of synthesizing emotion can largely be bracketed from the problem of explaining (and possibly synthesizing) consciousness. However, as she is careful to point out, consciousness and emotion, while not identical, are “closely intertwined”. While current scientific (specifically, neurological) evidence lends support to the view that consciousness is not necessary for the occurrence of all emotions, Picard concedes that emotional experience “appears to rely upon consciousness for its existence.” (Picard, 1997, p. 73) If consciousness is necessary for emotional experience, then in order to solve the Schizophrenia Problem, cognitive technologies must first solve the ‘hard problem’.
This would seem to suggest that, contrary to one of the underlying assumptions of Cognitive Technology, replication of mind (cognition and emotion) — arguably the central goal of AI (or Artificial Intelligence) — constitutes a necessary condition for IA (or Intelligence Augmentation). In this connection, it might be argued that the thought of the German phenomenologist Martin Heidegger (1889–1976) — more specifically, that aspect of his early thinking concerned with the being (or ontology) of human beings as interpreted by Hubert Dreyfus — is highly relevant to Cognitive Technology in that it appears to suggest how the Schizophrenia Problem can be solved. According to Dreyfus (1991), Heidegger holds subjective experience to be grounded in, and thereby emergent from, a more primitive existential experience — Dasein or being-in-the-world — that is ontologically prior to subjectivity and objectivity. If Dreyfus’ Heidegger is correct, then the Schizophrenia Problem
is solvable because the ‘hard problem’ can be solved by constructing artificial Daseins capable of generating consciousness as an emergent phenomenon. In this chapter, it will be argued that appealing to Heideggerian thought in the context of attempting to solve the Schizophrenia Problem associated with cognitive technologies is problematic on (at least) three counts: First, Dreyfus’ interpretation of Heidegger, or rather, technologists’ selective appropriation of Dreyfus’ interpretation of Heidegger, while (possibly) illuminating from a technological standpoint, can be shown to be distorting when viewed from the perspective of Heidegger scholarship. Crucially, this fact may be of more than merely academic significance; second, Heidegger’s commitment to an empirical-realist conception of nature as intrinsically non-experiential can be shown to undermine the possibility of a Heideggerian ‘emergentist’ solution to the ‘hard problem’; third, it is suggested that because the technical construction of artificial systems — in this instance, synthetic Daseins — occurs under an implicit subject-object (or artificer-artifact) orientation, the primitive components of such systems will necessarily stand in extrinsic (or external) relation to each other. This fact is of critical significance since Heidegger holds that beings are relationally-constituted, thereby entailing a commitment to an ontology grounded in intrinsic (or internal) relationality. In closing, it will be argued that, since Heidegger cannot solve the ‘hard problem’, it is necessary to look elsewhere for a solution to the Schizophrenia Problem. In this connection, Whiteheadian panexperientialism seems promising in that it appears to solve the ‘hard problem’. 
However, this is at the price of a commitment to an ontology grounded in intrinsic (or internal) relationality which undermines the possibility for constructing artificial Daseins capable of consciousness, thereby rendering the Schizophrenia Problem unsolvable.
2. ‘The Dreyfus affair’
Determining the implications of Heidegger’s thought for Cognitive Technology is arguably as difficult a task as determining his standing in Western academic philosophy: On the one hand, Heidegger is (generally) regarded as an intellectual charlatan of consummate proportion (and extremely dubious moral standing) by members of the Anglo-American philosophical establishment; on the other hand, he is (largely) revered as a genuinely original thinker who has contributed both profusely and profoundly to the enrichment of Continental philosophy. Similarly, on the one hand, Heidegger’s later thought, in particular,
his assertion that “the essence of technology is by no means anything technological” (Heidegger, 1977, p. 4), has been regarded by anti-technologists as establishing grounds upon which to mount a universal critique of technology; on the other hand, certain Heideggerian insights have been embraced by technologists in an attempt at resolving intractable problems of long standing. Although the claim that Heidegger has contributed significantly to the debate on the meaning and scope of technology is not, in itself, in question, determining the precise nature of his contribution(s) — in the present context, the implications of his thought for the development and critical evaluation of cognitive technologies — is problematic because there are many ways to interpret and appropriate his meditations on this issue by appealing to different ‘aspects’ and ‘phases’ of his phenomenological inquiry into being. In this connection, Dreyfus’ (1972) seminal critique of ‘GOFAI’ (Good-Old-Fashioned-Artificial-Intelligence), which makes extensive use of the ‘existential analytic of Dasein’ (that is, the situated analysis of the onto-phenomenological structures of human being) presented in Heidegger’s Being and Time (Heidegger, 1962) in order to contest the sufficiency of disembodied, a-contextual, symbolic computation as a means by which to instantiate real yet synthetic intelligence, has played an important, perhaps even decisive, role in motivating practitioners to consider engaged, embedded, and non-representational approaches to computing grounded (at least partly) in Heideggerian thought. It is crucial to appreciate at the outset that Dreyfus’ approach to AI critique was philosophical and not technological, being driven by a desire to draw attention to the perceived failings of an extant technology.
Dreyfus’ primary concern was not — and, arguably, could not be, given his lack of technical expertise — to develop technological solutions to the problems of AI; this task was left to the technologists among his later followers. Connectionist approaches to consciousness (Globus, 1995) and cognition (Clark, 1997), robotic approaches to artificial life (Wheeler, 1996; Prem, 1997), and the (re)conceptualisation of the information systems paradigm in terms of communication rather than computation (Winograd and Flores, 1986; Coyne, 1995) have all benefited from Dreyfus’ engagement with Heidegger. There are (at least) two points to note in connection with the above: First, ‘The Dreyfus Affair’ — that is, Dreyfus’ engagement with Heidegger, on the one hand, and with the AI community, on the other hand — provides a relatively recent example of the social determination of technology, the specifically philosophical character of the determination calling into question more conventional theses on technological determinism; second, perhaps what is
most significant and yet often overlooked, is the fact that Dreyfus’ critique of AI was only finally acknowledged and subsequently integrated into technology theory and practice because it could be so incorporated. In short, it is maintained that Dreyfus — and thereby Heidegger — was eventually taken seriously by technologists because his interpretation of Heidegger allowed the technological project to continue. While this appears to reverse the order of determination described previously, it is not this fact that is especially interesting since the reflexive nature of the relations of determination between society and technology has long been appreciated by sociologists and philosophers of technology. Rather, what is interesting, is the fact that Dreyfus’ critique was ultimately regarded as both valid and relevant because it showed that an embedded developmental history of embodied engagement constitutes a necessary condition for ‘coping’ with the world in an intelligent manner. Crucially, as Olafson (1994) and others have shown, Dreyfus’ notion of ‘coping’ is grounded in an instrumentalist-pragmatist interpretation of Heidegger’s thought. However, as Blattner (1992), Fell (1992), and, significantly, Dreyfus himself (Dreyfus, 1992), have all shown, instrumentalist, pragmatist and/or behaviourist interpretations of Heidegger’s thought are both limited and ‘dangerous’ because partial and hence, distorting. According to Olafson, “it would be a pity if Dreyfus, who has done so much to refute the computer theory of human being, were to be in the painful position of seeing his own formulations give an illusory sense of affinity with Heidegger to people who are utterly at odds with his views.” (Olafson, 1994, p. 52) It is, therefore, somewhat ironic that Dreyfus, who has been charged with misappropriating Heidegger’s thought on various grounds,1 himself ends up being misappropriated by practitioners of technoscience (AI, A-Life, etc.). 
In summary, and philosophically speaking, ‘The Dreyfus Affair’ appears to be over.
3. Heidegger and cognitive technology
The implications of the end of ‘The Dreyfus Affair’ for Cognitive Technology are somewhat unclear since it is possible that Dreyfus’ interpretation of Heidegger remains practically relevant despite its philosophical shortcomings. For example, the application of Heideggerian thought to cognitive technology, with the latter interpreted as artificial (or synthetic) means by which meaning might be extended in the interaction between humans and machines, appears warranted
given (1) the identification of being with intelligibility or meaning, viz. Sein as Sinn (or sense), (2) the mutual dependency of being and Dasein (or being-in-the-world), (3) the ontological priority of Dasein over the conscious subject, and (4) the onto-phenomenological claim that being-with (Mitsein) other Daseins is a constitutive existential structure of Dasein. This is because (1)–(4) ostensibly provide the foundations of a framework for solving the Schizophrenia Problem by allowing for an emergentist solution to the ‘hard problem’ that can be implemented by natural and artificial (or synthetic) Daseins alike. On this basis, it might be argued that it is necessary to shift the goal of Cognitive Technology from constructing ‘instruments of mind’ — what Heidegger would call Zuhandenheit, which Dreyfus translates as ‘availability’ (or that which is ‘ready-to-hand’) in reference to Dasein-centric, pragmatically-functional, transparent ‘equipment’ (Zeug) — to emergent construction of minded-instruments, that is, ‘instruments with mind’.
4. Heidegger and the ‘hard problem’ of consciousness
According to Schatzki (1982), Heidegger is an empirical realist: On his view, what something is ‘in itself’ is what it is independently of its actually being encountered by a Dasein. (Kant, by contrast, is held to be a transcendental realist: On his view, what something is ‘in itself’ is what it is independently of any possible knowledge of it.) It is crucial to appreciate that empirical realism entails that the being of all beings, both human and non-human, is, in principle, publicly accessible to Dasein because this fact assumes critical significance when the ‘other-minds’ problem, that is, the problem of determining whether or not other beings are capable of consciousness (first-personhood, ontological subjectivity, private experience), is considered. The (later) Heideggerian solution to this problem involves recognizing the following as existential facts: (1) being-with other Daseins is a fundamental (or constitutive) structure of Dasein; (2) Dasein (as being-in-the-world) has primacy over consciousness; (3) both Dasein and consciousness are linguistically-constructed. On this basis, the ‘other-minds’ problem is discharged by observing that because (1) Daseins share language and (2) there is a plurality of Daseins, a plurality of consciousnesses (or minds) is possible. However, it is important to draw out the full implications of this approach to solving the ‘other-minds’ problem: Heidegger is forced to conceive subjectivity in objective (or public) terms because, on his empirically-realist view, the subjectivity of a subject is disclosable, in
principle, to and by other subjects. Since it is only Daseins that share language, only Daseins can become consciousnesses (or first-person, private subjects). Crucially, on his view, nature as it is ‘in-itself’ (that is, independent of Dasein) discloses itself ‘in a barren merciless’ as ontologically objective and hence, ‘absurd’ or meaningless. Heidegger (1962) insists that this view of nature is not grounded in a value judgement but reflects an ontological determination that follows from the fact that it is Dasein alone who gives being (intelligibility or meaning) to beings.2 However, this position is contestable on (at least) four grounds: First, it is not at all clear that consciousness is a (purely) linguistic phenomenon, more specifically, an emergent linguistic artifact. Second, and more importantly, it does not follow from the fact that Daseins are the only beings that share language that only Daseins are capable of conscious (or at least some degree of private, subjective) experience. According to Krell (1992), life may constitute a sufficient existential condition for being a ‘clearing’ or ‘opening’, that is, a space of possible ways for things (including human beings) to be. While it might be conceded that the being (sense or meaning) of beings disclosed by Dasein is of a significantly higher order than that disclosed by (other) beings themselves, it simply does not follow from the shareability of language peculiar to Dasein that disclosure of being by other beings is impossible; human-centred meaning is not necessarily coextensive with meaning as such. In short, Heidegger’s position appears untenably anthropocentric. Third, the view that nature is fundamentally ‘vacuous’ or non-experiential is an assumption which is undermined by the empirical fact that, while experiential beings are definitively known to exist, it is unclear whether any non-experiential beings have, in fact, ever been encountered (Griffin, 1998).
Finally, Heidegger’s dualism of meaningful subjects and meaningless objects gives rise to the ‘hard problem’ (Chalmers, 1996), that is, the problem of explaining how ontological subjectivity can arise from an ontologically objective substrate. Heidegger cannot avoid this problem because his empirical realism commits him to the view that science can, in principle, causally explain how things came to be the way they are (Dreyfus, 1991); clearly, this includes explaining how the brain — which Globus (1995) identifies as a necessary condition for Dasein — can give rise to consciousness. Emergentist solutions to the ‘hard problem’, which view consciousness as an irreducible systemic property arising from the interaction of components, none of which possess this
property or properties categorially-continuous with this property in isolation or in other systemic complexes, are problematic because they disregard the principle of ontological continuity, arguably a cornerstone of scientific naturalism (Griffin, 1998).3
5. Post-Heideggerian ontology and cognitive technology
It appears then that Heidegger’s engagement with cognitive technology, at least with respect to the relevance of his thought to the Schizophrenia Problem, is, like ‘The Dreyfus Affair’, at an end. Principally, this is because Heidegger cannot solve the ‘hard problem’ due to what is, somewhat ironically, a phenomenologically-unsound (mis)conception of nature as intrinsically non-experiential. Thus, if the Schizophrenia Problem is to be addressed, it is necessary to consider ‘post-Heideggerian’ conceptions of the being of nature. On Whiteheadian panexperientialism, for example, nature is held to be relationally-constituted and experiential at its most primitive ontological level. However, this does not imply that all beings are experiential in the same way (that is, ontological monism does not entail ontical monism); rather, certain complex beings enjoy a higher-level of experience relative to simpler beings. In addition, all complex beings belong to one of two kinds, experiential ‘compound’ (or ‘societal’) individuals or non-experiential aggregates, depending on the nature of their internal (or constitutive) relational organisation (Griffin, 1998). Crucially, if Whiteheadian panexperientialism is the way that nature is in-itself, then the possibility of constructing an artificial Dasein is radically undermined because artificing (construction, making) involves an orientation in which ‘subjects’ stand in ontological opposition to ‘objects’ (Heidegger, 1977), thereby ‘rupturing’ the nexus of internal (subjective, constitutive) relations constituting natural beings so as to establish — more precisely, impose — external (objective, non-constitutive) relations between ‘primitives’ (components) in the synthetic systemic complex (Ladrière, 1998).
To the extent that Dasein is, ontically-speaking, a natural phenomenon,4 its being must be internally-constituted; however, artificial systems are externally-constituted, which implies that they cannot provide the necessary ontical (causal) substrate for Dasein. In short, genuine Mitsein, arguably a necessary condition for an emergentist solution to the ‘hard problem’ and, thereby, to the Schizophrenia Problem associated with cognitive technologies, cannot be generated technically.
6. Conclusion
If, as has been herein argued, the Schizophrenia Problem is unsolvable in the sense that the need for human psychic self-amputation cannot be eliminated completely but merely reduced in extent, it appears that cognitive technologies are faced with a choice: Either uphold the assumption of phenomenological symmetry underlying Janney’s conflated conception of “the prosthesis as partner” and consider alternative technological prostheses other than the computer, or abandon this assumption and embrace a genuinely pragmatic — and Heideggerian — orientation to the computer taken as tool.5 Irrespective of the choice made, Janney’s project of “finding out where the prosthesis ‘pinches’, so to speak [since] progress will depend on discovering and describing the sources of sensory and psychic irritation at the human-computer interface” (Janney, 1997, p. 5), remains both valid and important.
Notes

* This chapter is a modified version of Ali (2001).

1. For example, with respect to the question of whether Heideggerian phenomenology is anti-individualistic (Olafson, 1994), anti-representational in character (Christensen, 1997, 1998), and at least minimally consistent with some version of scientific naturalism (Christensen, 1997, 1998; Pylkkö, 1998).

2. According to Fell, “[this] is one of the most problematic assertions in the entire Heideggerian corpus.” (Fell, 1979, p. 119)

3. It might be argued that the validity of this argument is called into question by the fact that “‘consciousness’ and its cognates are no longer part of Heidegger’s operative philosophical vocabulary. They have been replaced by a concept of existence as the mode of being of an entity for which the things with which it deals are there [Da-Sein], whether in the mode of perceptual presence or some other form of presence-in-absence.” (Olafson, 1994, p. 52) Crucially, “if Dasein is a unitary entity defined by existence and presence and not a compound of body and mind, then it is utterly obscure how ‘consciousness’ could be reintroduced into it.” For this reason, Olafson maintains that “if the story of the demise of ‘consciousness’ and ‘mind’ had been told in a way that does full justice to the ontology that replaces mind-body dualism, this sort of ad hoc revision [in which Dasein ‘emerges as a conscious subject’] would be quite unnecessary.” (Olafson, 1994, p. 52) However, if Heideggerian phenomenology is indeed a form of empirical realism, and if, as Dreyfus maintains, “later Heidegger could be called a plural realist”, that is, one who asserts that “there can be many true answers to the question, What is real?” (Dreyfus, 1991, pp. 262–263) including a scientifically-naturalistic answer, and granted Heidegger’s ‘disenchanted’ conception of nature in both his early and later thinking, it follows that some such ‘emergentist’ revision as the one proposed by Dreyfus is necessary and yet insufficient as a solution to the ‘hard problem’.

4. On empirical-realism, specific natural (more precisely, biological) conditions are necessary, yet insufficient, for Dasein (Dreyfus, 1991; Globus, 1995).

5. A similar position is adopted by Stojanov and Stojanoski (2001) who argue that computers should not be viewed as interlocutors (or dialogic agents) standing in a symmetric relation with human beings, but rather as asymmetrically-related cognitive prostheses for humans. However, it is important to appreciate that their position is grounded in the assumption that interface ontology is metaphorical, an epistemological stance that conflicts with the ontological orientation of Heideggerian phenomenology in which how something is taken to be is constitutive of what that thing is.
References

Ali, S. M. (2001). “The End of the (Dreyfus) Affair”: (Post)Heideggerian Meditations on Man, Machine, and Meaning. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Proceedings of the Fourth International Cognitive Technology Conference CT’2001: Instruments of Mind, pp. 149–156. Berlin: Springer-Verlag.
Blattner, W. D. (1992). Existential Temporality in Being and Time: (Why Heidegger is not a Pragmatist). In H. L. Dreyfus and H. Hall (Eds.), Heidegger: A Critical Reader, pp. 99–129. Oxford: Blackwell.
Chalmers, D. J. (1996). The Conscious Mind: In Search of a Fundamental Theory. Oxford: Oxford University Press.
Christensen, C. B. (1997). Heidegger’s Representationalism. The Review of Metaphysics 51, 77–103.
Christensen, C. B. (1998). Getting Heidegger Off the West Coast. Inquiry 41, 65–87.
Clark, A. (1997). Being There: Putting Brain, Body, and World Together Again. Cambridge, Mass.: MIT Press.
Coyne, R. D. (1995). Designing Information Technology in the Postmodern Age: From Method to Metaphor. Cambridge, Mass.: MIT Press.
Dreyfus, H. L. (1972). What Computers Can’t Do: A Critique of Artificial Reason. New York: Harper & Row.
Dreyfus, H. L. (1991). Being-in-the-world: A Commentary on Division I of Heidegger’s Being and Time. Cambridge, Mass.: MIT Press.
Dreyfus, H. L. (1992). Heidegger’s History of the Being of Equipment. In H. L. Dreyfus and H. Hall (Eds.), Heidegger: A Critical Reader, pp. 173–185. Oxford: Blackwell.
Fell, J. P. (1979). Heidegger and Sartre: An Essay on Being and Place. New York: Columbia University Press.
Fell, J. P. (1992). The Familiar and The Strange: On the Limits of Praxis in the Early Heidegger. In H. L. Dreyfus and H. Hall (Eds.), Heidegger: A Critical Reader, pp. 65–80. Oxford: Blackwell.
Globus, G. G. (1995). The Postmodern Brain. Amsterdam/Philadelphia: John Benjamins.
Griffin, D. R. (1998). Unsnarling the World-Knot: Consciousness, Freedom and The Mind-Body Problem. Berkeley: University of California Press.
Haraway, D. J. (1985). Manifesto for Cyborgs: Science, Technology, and Socialist Feminism in the 1980’s. Socialist Review 80, 65–108.
Heidegger, M. (1962). Being and Time. Translated by J. Macquarrie and E. Robinson. New York: Harper & Row.
Heidegger, M. (1977). The Question Concerning Technology and Other Essays. Translated by W. Lovitt. New York: Harper & Row.
Janney, R. W. (1997). The Prosthesis as Partner: Pragmatics and the Human-Computer Interface. In J. P. Marsh, C. L. Nehaniv & B. Gorayska (Eds.), Proceedings of the Second International Cognitive Technology Conference CT’97: Humanizing the Information Age, pp. 1–6. IEEE Computer Society Press. Modified version (“Computers and Psychosis”) appears in J. P. Marsh, B. Gorayska & J. L. Mey (Eds.) (1999), Humane Interfaces: Questions of methods and practice in Cognitive Technology, pp. 71–79. Amsterdam: North Holland.
Krell, D. F. (1992). Daimon Life: Heidegger and Life-Philosophy. Bloomington and Indianapolis: Indiana University Press.
Ladrière, J. (1998). The Technical Universe in an Ontological Perspective. Philosophy and Technology 4 (1), 66–91.
Olafson, F. (1994). Heidegger à la Wittgenstein Or ‘Coping’ with Professor Dreyfus. Inquiry 37, 45–64.
Picard, R. W. (1997). Affective Computing. Cambridge, Mass.: MIT Press.
Prem, E. (1997). Epistemic Autonomy in Models of Living Systems. In P. Husbands and I. Harvey (Eds.), Fourth European Conference on Artificial Life, pp. 2–9. Cambridge, Mass.: MIT Press.
Pylkkö, P. (1998). The Aconceptual Mind: Heideggerian Themes in Holistic Naturalism. Amsterdam: John Benjamins.
Schatzki, T. (1982). Early Heidegger on Being, The Clearing, and Realism. In H. L. Dreyfus and H. Hall (Eds.), Heidegger: A Critical Reader, pp. 81–124. Oxford: Blackwell.
Stojanov, G. & K. Stojanoski. (2001). Computer Interfaces: From Communication to Mind-Prosthesis Metaphor. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Proceedings of the Fourth International Cognitive Technology Conference CT’2001: Instruments of Mind, pp. 301–310. Berlin: Springer-Verlag.
Wheeler, M. (1996). From Robots to Rothko: The Bringing Forth of Worlds. In M. A. Boden (Ed.), The Philosophy of Artificial Life, pp. 209–236. Oxford: Oxford University Press.
Winograd, T. & F. Flores. (1986). Understanding Computers and Cognition: A New Foundation for Design. Reading: Addison-Wesley.
Martin Luther King and the “ghost in the machine” Will Fitzgerald Kalamazoo College
Twins of different color

In 1955, John McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon submitted “A proposal for the Dartmouth summer research project on Artificial Intelligence” (McCarthy et al. 1955). The workshop they proposed, which was held in the summer of 1956 at Dartmouth College in Hanover, New Hampshire, was a “two month, ten man study” of “the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” Although there were certainly precursors to artificial intelligence research before this summer (Turing’s paper on intelligence (Turing, 1950) and the field of cybernetics come immediately to mind), this workshop was the first to use the term “artificial intelligence” (AI). Among the conference attendees were men crucial to the development of AI and computing in general. McCarthy, who was at Dartmouth at the time, went on to create the computer language Lisp, of fundamental importance to AI. Minsky and McCarthy founded the AI lab at MIT. Rochester, who worked for IBM, was influential in the design of early IBM computers. Shannon, of course, had already published “A mathematical theory of communication” (Shannon, 1948), the foundational document for information theory. Other attendees were Herbert Simon, who received the 1978 Nobel prize for Economics for his work on decision theory; Allen Newell, who, along with Simon, had created the “Logic Theorist”, an early AI reasoning program that debuted at the conference; Arthur Samuel and Alex Bernstein, also of IBM, who wrote influential programs that played checkers and chess; Ray Solomonoff, who did foundational work in machine learning and theories of induction; Oliver Selfridge, also a machine learning researcher; and Trenchard More, who developed array theory (fundamental to
the programming language APL). McCorduck makes a strong case that the Dartmouth Conference was a “turning point” for the field, and that attendees of the first AI conference effectively “defined the establishment” of AI research, at least in the United States, for the next twenty years (McCorduck, 1979).

Also in 1955, Rosa Parks famously refused to give up her seat for White riders on a bus in Montgomery, Alabama. The bus driver called the police; they arrested Parks for breaking Montgomery’s segregation laws, which required her to give up her seat. The Montgomery Improvement Association was formed, and Dr. Martin Luther King was elected its president. The bus boycott that followed lasted almost a year, despite great financial hardship and violence against King and others, until the United States Supreme Court declared Montgomery’s racial segregation laws unconstitutional. King achieved prominence as a result of the Montgomery bus boycott, and the struggle for civil rights for African-Americans became a national issue.

Both the modern U.S. Civil Rights movement and AI research were born in the mid-fifties. It will surprise no one that these two had little or no influence on each other. All of the attendees of the Dartmouth summer conference were White, were male, and were from a small number of northern institutions: Princeton, MIT, CIT (later CMU), and IBM. The civil rights movement was started in the U.S. South by African-Americans who sought basic human rights: the right to vote, to use public transportation and other public accommodations freely, to work. Still, by taking no notice of the civil rights movement happening around them, the attendees at the Dartmouth conference may have missed some opportunities. What would AI research have been like, for example, if King had attended the Dartmouth conference?
King on high technology

King, of course, was not a technologist, nor did he write extensively about technology. When he did, he tended to be pessimistic about its goals and consequences. For example, he warned that automation might become a “Moloch, consuming jobs and (labor) contract gains” (King, 1986b). In his last Sunday morning sermon before his assassination, he said (King, 1986c):

There can be no gainsaying of the fact that a great revolution is taking place in the world today. In a sense, it is a triple revolution: that is a technological revolution, with the impact of automation and cybernation; then there is a revolution in weaponry, with the emergence of atomic and nuclear weapons of
warfare. Then there is a human rights revolution, with the freedom explosion that is taking place all over the world. … Through our scientific and technological genius, we have made of this world a neighborhood and yet we have not had the ethical commitment to make of it a brotherhood.
Clearly, he hoped that high technology could aid the human rights revolution, but he feared it would not. The first opportunity that AI researchers missed was to build a science that could be of peaceful service to the community of humanity. King saw the issue in stark terms: “In a day when Sputniks and Explorers are dashing through outer space and guided ballistic missiles are carving highways of death through the stratosphere”, he wrote in 1961 (King, 1986c), “no nation can win a war. The choice is no longer between violence and non-violence; it is either non-violence or non-existence. Unless we find some alternative to war, we will destroy ourselves by the misuse of our own instruments.”
The ghost in the machine The philosopher Gilbert Ryle spoke, with “deliberate abusiveness” of Cartesian dualism as “the ghost in the machine.” The conjecture of the Dartmouth workshop (“every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it”) is often described in dualistic terms: intelligence is just software that — so far — runs on the hardware of the human brain, but, in theory, can also run on computer hardware. This is often called the “Physical Symbol System Hypothesis” (Newell and Simon, 1976). I do not wish to discuss whether the Physical Symbol System Hypothesis is a priori philosophically false. The expression “ghost in the machine” is very evocative, however, and many writers have used this image. In particular, I would like to quote from Gail B. Griffin’s Teaching Whiteness (Griffin, 2003), a collection of critical essays, stories and reflections on being White and teaching at a largely White institution. Griffin picks up on the idea of the ghost in the machine to describe how Whiteness haunts: The irony (or paradox, or both) of whiteness is that its failure to name itself, while it arrogates one kind of godlike power (the power of universality and ubiquity), denies another. For to be universal and ubiquitous — to be Everything, Everywhere — is in fact to be Nothing, and Nowhere, in particular….As the absent agent in a passive construction, whiteness erases itself. White
language says, in short, “I am not here; I do not exist.” It does so, of course, to avoid implicating itself in the relations, past and present, of racism. But the price for such exoneration is eternal absence, non-being — ghostliness.
It is, perhaps, naive to think that the researchers who attended the 1956 Dartmouth workshop could have usefully viewed themselves as “White” (or even as male). But it is also, I think, significant that all of the original artificial intelligence researchers were White (and male), and this “ghostliness” of Whiteness hovers over AI research to the present. To pick up a copy of AI Magazine and look at the pictures of authors and participants is still to see mostly White, male faces. Again, this will come as a surprise to no one. And yet this was another opportunity lost. It can at least be imagined that the early AI researchers could have, as the “active agent of an active construction”, reflected on, written about, and modeled the effects of their own Whiteness on their science.

A popular and excellent textbook on artificial intelligence (Russell and Norvig, 2003) states that AI researchers tend to define AI as either a descriptive or prescriptive field of either thinking or action. The word used for prescriptive is rational. A rational thinker draws just the right conclusions from premises; a rational actor acts to maximize its best expected outcome. Does it go without saying that defining intelligence in this way seems especially easy to do from a White and male perspective?
A naive model of idea discovery

As any AI researcher can tell you, doing research in artificial intelligence is hard. Everything seems to be tied up with everything else. It was thought that breaking the field into smaller components would be useful, and, of course, it is. But it turns out that to understand computer vision is just as hard as getting computers to use language as humans do; to program computers to create their own programs (to plan) is just as hard as drawing the right inferences at the right time. There is even an expression for this: “AI-hard.” Language, vision, planning, and every useful subfield of AI seem AI-hard; and the field of AI can use all the help it can get to discover new ideas that push the field forward.

Here’s a deliberately naive view of idea discovery: consider good ideas as being in a Platonic field, waiting to be discovered. Maybe they are in Erdős’s “God’s book of Proofs”; maybe, like Kepler, we hope to think God’s thoughts after him. Consider researchers as essentially experimental trials; each idea is likely to be discovered by a particular researcher with some probability.
Assume, again, with deliberate naiveté, that the ideas are independent of one another, as are the researchers. Further, let’s assume that each of the researchers is as good as each of the others, and that the ideas are all equally easy to discover. That is, we’re assuming the probability (call it p) of any researcher discovering any particular idea is always the same. Making these assumptions gives us a simple equation for how likely an idea is to be discovered by at least one researcher: $1 - (1 - p)^n$, where n is the number of researchers. If the goal of AI is to discover more good ideas, how can we do this?
Increasing the exponent

To discover more ideas, we want to increase $1 - (1 - p)^n$. One way is to increase p, but (given our naive model) this is the same for all ideas and researchers. The only other way to increase this number is to increase the exponent, n. Suppose someone has given you a time machine and allowed you to hand-pick the invitation list to the Dartmouth workshop, and you can invite up to 100 people, but just in proportion to their representation in the general population. In 1940, according to US census figures, about 43% of the population were White, non-Hispanic males: basically, the kind of people who were invited to the original workshop. If you just invite the 43, the probability of an idea being discovered is about 0.35 (see Note 1). If White women are included (bringing the total to 88), this jumps to 0.59, and, with all US citizens represented, 0.63. If, in the present (based on 2000 census figures), you were given the same opportunity, just inviting White non-Hispanic males to the workshop is even grimmer, because the percentage has dropped to about 34%, so the probability is just 0.29. In other words, by including a full complement of people, we more than double the odds of an idea being discovered.
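The arithmetic in this section is easy to check by machine. The following is a minimal sketch, not part of the original chapter: the function name `discovery_probability` and the labels are hypothetical, p = 0.01 is the value given in Note 1, and the group sizes are the ones quoted in the text.

```python
def discovery_probability(p: float, n: int) -> float:
    """Probability that at least one of n independent researchers,
    each succeeding with probability p, discovers a given idea."""
    return 1.0 - (1.0 - p) ** n


P = 0.01  # per-researcher probability, taken from Note 1

# Group sizes quoted in the text (out of a possible 100 invitations).
groups = [
    ("White non-Hispanic males, 1940", 43),
    ("White men and women, 1940", 88),
    ("Everyone", 100),
    ("White non-Hispanic males, 2000", 34),
]

for label, n in groups:
    print(f"{label:32s} n = {n:3d}   P = {discovery_probability(P, n):.3f}")
```

Because p is fixed in this naive model, the only lever is n, which is why the probabilities rise strictly with the size of the invitation list.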
Colorless green ideas, sleeping furiously

We know that people are not colorless; it’s likely that ideas are not colorless either. It’s not surprising that the men of AI made great strides in discovering good ideas about rational action. It’s also not surprising that the men of AI did not begin researching the importance of emotion on human thinking: the
focus on “rationality” makes this a stretch. The intuition here is that the “color” of researchers and ideas affects the likelihood of an idea being discovered.

Let’s make the model just slightly less naive, and add “color” to ideas and researchers. Let’s say that researchers are either blue or green, and ideas are either blue or green. Further, let’s assume that it is much more likely for researchers to discover ideas of the same color than ideas of a different color (see Note 2). Given 43 “green” researchers (representing the percentage of White, non-Hispanic men in 1940) and 0 “blue” researchers, and assuming that “green” and “blue” ideas are equally distributed, the probability of an idea being discovered is 0.21. With 43 “green” researchers and 57 “blue” researchers, this increases to 0.42. Using 2000 figures, 34 “green” researchers and 0 “blue” researchers yields only a 0.17 probability. In other words, using 1940 figures, the odds of an idea being discovered using both “blue” and “green” researchers are about two times as great as using “green” researchers alone. Using 2000 figures, the odds are about 2.5 times as great. If we take a more “social constructivist” stance, modeling this by allowing the ratio of blue to green ideas to be equivalent to the ratio of blue to green researchers, the odds of an idea being discovered by blue and green researchers are, using 2000 figures, about 3.5 times as great as using green researchers alone.
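The colored model can be sketched in the same way. This is an illustrative reconstruction under stated assumptions, not the author's own code: the function names are hypothetical, the same-color and different-color probabilities (0.01 and 0.001) come from Note 2, and the per-researcher probability is taken to be the sum over the four idea-color and researcher-color pairings described in the Appendix.

```python
def per_researcher_probability(green_ideas: float,
                               green_researchers: float,
                               p_same: float = 0.01,
                               p_diff: float = 0.001) -> float:
    """Chance that a randomly chosen researcher discovers a randomly
    chosen idea, given the share of green ideas and green researchers.
    Everything that is not green is blue."""
    blue_ideas = 1.0 - green_ideas
    blue_researchers = 1.0 - green_researchers
    return (green_ideas * green_researchers * p_same      # green idea, green researcher
            + green_ideas * blue_researchers * p_diff     # green idea, blue researcher
            + blue_ideas * green_researchers * p_diff     # blue idea, green researcher
            + blue_ideas * blue_researchers * p_same)     # blue idea, blue researcher


def discovery_probability(p: float, n: int) -> float:
    """Probability that at least one of n researchers discovers the idea."""
    return 1.0 - (1.0 - p) ** n


# Equal numbers of green and blue ideas, 1940 figures.
green_only = discovery_probability(per_researcher_probability(0.5, 1.0), 43)
mixed = discovery_probability(per_researcher_probability(0.5, 0.43), 100)
print(f"43 green researchers only: {green_only:.3f}")
print(f"43 green + 57 blue:        {mixed:.3f}")

# "Social constructivist" variant, 2000 figures: the share of green ideas
# is set equal to the share of green people in the population (0.34).
sc_green_only = discovery_probability(per_researcher_probability(0.34, 1.0), 34)
sc_mixed = discovery_probability(per_researcher_probability(0.34, 0.34), 100)
print(f"ratio, social constructivist: {sc_mixed / sc_green_only:.2f}")
```

Keeping the per-researcher probability separate from the group-level probability makes it easy to vary the idea mix and the researcher mix independently, which is the only difference between the two variants in the text.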
Dr. King is my research advisor

This model of scientific discovery is deliberately naive, as I said, and, as written, can be criticized in two important ways. First, to the extent that the model is oversimplified, it may not be the case that increasing the diversity of a research team will yield new good ideas. Second, by using “blue” and “green”, it can be said that I am avoiding the real issues of race and gender for which they are proxies. And, to both of these criticisms, I will agree. Yet, I think the exercise is a useful one, for it brings out the question: are there research areas of what it means to be “intelligent” or “human” that artificial intelligence should be exploring, but has not? Are there themes of “being human” or “being intelligent” that are not captured by the “rational agent” model? Rereading some of King’s essays (King, 1986a, 1986b, 1986c) makes it clear that this is the case. Among the themes that King addresses are justice, mercy, conversion, forgiveness, violence, revenge, race, politics, resistance, persuasion, honor, dignity, sacrifice, love, and evil. And, of course, there are many more. A really good AI model of forgiveness, for example, is, I suspect, no harder to create than a good AI model of temporal reasoning, and no easier as well.
Now, to be fair, some researchers have investigated themes of this sort, especially AI “scruffies” (who favor experimentation and reflection, in contrast to the AI “neats” who desire mathematical and logical formalization). For example, much of the research by Roger Schank and those associated with him (Schank, 1982; Schank and Abelson 1977) focused on story understanding, and it’s difficult to make progress in real story understanding without focusing on themes such as these. Still, as Russell and Norvig state, “recent years have seen a revolution in both the content and the methodology of work in artificial intelligence. It is now more common to build on existing theories than to propose brand new ones, to base claims on rigorous theorems or hard experimental evidence rather than on intuition.” Perhaps this means that the field is more mature; perhaps it just means the neats have won, and the rigor is rigor mortis, as Birnbaum claims (Birnbaum, 1991). But perhaps AI needs a renewal in the themes that it studies to open itself up to new, big ideas of what it means to think, to act, to be human.
What could AI be?

Martin Luther King had more important things to attend to than to participate in the 1956 Dartmouth workshop on artificial intelligence. Still, had he done so as a full participant, the field of artificial intelligence research might have followed different directions. AI missed several opportunities, but AI researchers can still pursue richer strands of research. I sometimes wonder whether, along with fields like “computational biochemistry” and “computational physics” and all of the other “computational X” fields, there couldn’t be a “computational humanism” that could claim (and reclaim) some of the themes described by King and others, both building models of what it means to be human and being a model of a humane, anti-racist science.
Notes

1. This is with p = 0.01.
2. For the examples below, I use 0.01 and 0.001, respectively. See Table 1 in the Appendix for details.
Appendix: Idea discovery with color

There are Green ideas and Blue ideas. Let P(G) be the proportion of Green ideas and P(B) be the proportion of Blue ideas. There are green and blue researchers as well. Let P(g) be the proportion of green researchers and P(b) be the proportion of blue researchers. Researchers are more likely to discover ideas that are the same color as the researcher. Let P(=) be the probability that a researcher discovers an idea of the same color and P(≠) the probability that a researcher discovers an idea of a different color. Then p, the probability of a given idea being discovered by a single researcher, is:

$$p = P(G)\,P(g)\,P(=) + P(G)\,P(b)\,P(\neq) + P(B)\,P(g)\,P(\neq) + P(B)\,P(b)\,P(=).$$

If N is the number of researchers, then P(I), the probability of an idea being discovered, is $1 - (1 - p)^N$.

Table 1. Naive model of idea discovery for ideas with “color”. P(I) is the probability of an idea being discovered, $1 - (1 - p)^N$.

| Ideas | Year | Researchers | P(G) | P(B) | P(g) | P(b) | P(=) | P(≠) | p | N | P(I) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Equal number of ideas | 1940 | White male | 0.50 | 0.50 | 1.00 | 0.00 | 0.01 | 0.001 | 0.0055 | 43 | 0.211 |
| Equal number of ideas | 1940 | White | 0.50 | 0.50 | 0.49 | 0.51 | 0.01 | 0.001 | 0.0055 | 88 | 0.385 |
| Equal number of ideas | 1940 | All | 0.50 | 0.50 | 0.43 | 0.57 | 0.01 | 0.001 | 0.0055 | 100 | 0.424 |
| Equal number of ideas | 2000 | White male | 0.50 | 0.50 | 1.00 | 0.00 | 0.01 | 0.001 | 0.0055 | 34 | 0.171 |
| Equal number of ideas | 2000 | All | 0.50 | 0.50 | 0.34 | 0.66 | 0.01 | 0.001 | 0.0055 | 100 | 0.424 |
| Social constructivist | 1940 | White male | 0.43 | 0.57 | 1.00 | 0.00 | 0.01 | 0.001 | 0.0049 | 43 | 0.189 |
| Social constructivist | 1940 | White | 0.43 | 0.57 | 0.49 | 0.51 | 0.01 | 0.001 | 0.0055 | 88 | 0.385 |
| Social constructivist | 1940 | All | 0.43 | 0.57 | 0.43 | 0.57 | 0.01 | 0.001 | 0.0056 | 100 | 0.429 |
| Social constructivist | 2000 | White male | 0.34 | 0.66 | 1.00 | 0.00 | 0.01 | 0.001 | 0.0041 | 34 | 0.129 |
| Social constructivist | 2000 | All | 0.34 | 0.66 | 0.34 | 0.66 | 0.01 | 0.001 | 0.0060 | 100 | 0.450 |
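One feature of Table 1 follows directly from the definition of p and may be worth spelling out: in the “Equal number of ideas” rows the p column is constant. The short derivation below is an added sketch rather than part of the original appendix. It writes the different-color probability as $P(\neq)$, sets $P(G) = P(B) = \tfrac{1}{2}$, and uses only the fact that $P(g) + P(b) = 1$.

$$
\begin{aligned}
p &= \tfrac{1}{2}\,P(g)\,P(=) + \tfrac{1}{2}\,P(b)\,P(\neq) + \tfrac{1}{2}\,P(g)\,P(\neq) + \tfrac{1}{2}\,P(b)\,P(=) \\
  &= \tfrac{1}{2}\,\bigl(P(g) + P(b)\bigr)\,\bigl(P(=) + P(\neq)\bigr) \\
  &= \tfrac{1}{2}\,(0.01 + 0.001) = 0.0055 .
\end{aligned}
$$

So when ideas are split evenly between the two colors, the mix of researchers does not change p, and the gain in P(I) in those rows comes entirely from the larger N. The researcher mix changes p only in the social constructivist rows, where the proportion of ideas of each color is unequal.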
References

Birnbaum, L. (1991). Rigor mortis: A response to Nilsson’s ‘Logic and Artificial Intelligence’. Artificial Intelligence 47, 57–77.
Griffin, G. B. (2003). Teaching Whiteness: The End of Innocence. Unpublished manuscript.
King, Jr., M. L. (1986a). The American Dream. In J. M. Washington (Ed.), A Testament of Hope: The Essential Writings of Martin Luther King, Jr., pp. 208–220. San Francisco: Harper and Row.
King, Jr., M. L. (1986b). If the Negro wins, Labor wins. In J. M. Washington (Ed.), A Testament of Hope: The Essential Writings of Martin Luther King, Jr., pp. 201–207. San Francisco: Harper and Row.
King, Jr., M. L. (1986c). Remaining awake through a great revolution. In J. M. Washington (Ed.), A Testament of Hope: The Essential Writings of Martin Luther King, Jr., pp. 268–278. San Francisco: Harper and Row.
McCarthy, J., M. Minsky, N. Rochester & C. E. Shannon (1955). A proposal for the Dartmouth summer research project on Artificial Intelligence. Technical report. http://www-formal.stanford.edu/jmc/history/dartmouth.html.
McCorduck, P. (1979). Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence. New York: W. H. Freeman and Company.
Newell, A. & H. A. Simon (1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM 19(3), 113–126.
Russell, S. J. & P. Norvig (2003). Artificial Intelligence: A Modern Approach. Second Edition. Prentice Hall Series in Artificial Intelligence. Upper Saddle River, N. J.: Pearson Education, Inc.
Schank, R. C. (1982). Dynamic Memory: A Theory of Learning in Computers and People. Cambridge, UK: Cambridge University Press.
Schank, R. C. & R. P. Abelson (1977). Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures. Hillsdale, N. J.: Lawrence Erlbaum Associates.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal 27, 379–423, 623–656.
Turing, A. M. (1950). Computing machinery and intelligence. Mind 59, 433–460.
Name index
A Abell, F. 293 Abelson, R. P. 137, 271, 351 Abowd, G. D. 227 Achinstein, P. 52, 58 Adolphs, R. 293, 294 Agras, W. S. 218 Agre, P. 162, 228 Ahlsen, E. 244 Albertini, R. 219 Alcade, C. 267 Ali, S. M. 12, 333 Allanson, J. 304 Allen, J. F. 88 Allwood, J. 244 Alper, H. P. 53 Altabe, M. 179, 219 Amaral, D. G. 273, 274, 275, 294 Anderson, A. E. 217 Anderson, A. R. 56 Anderson, B. 2 Anderson, H. 308 Appelt, D. E. 88 Arbib, M. 141, 293 Arcidiacono, L. 276, 277, 285 Argyle, M. 278, 289 Ashburner, J. 293 Ashtroh, M. 59 Ashwin, C. 274, 294 Atala, K. 219 Atkinson, J. M. 307 B Bachevalier, J. 293, 294 Bacon, F. 44 Baddeley, A. D. 68, 270, 292
Badner, J. 268 Bailey, A. 268 Baizer, J. S. 71 Balasubramanian, N. V. 2 Baratoff, G. 114 Bar-Cohen, Y. 120 Bar-Hillel, Y. 56 Barnard, P. 326 Barnes, J. 12, 14, 64, 267 Baron, F. 90 Baron-Cohen, S. 64, 138, 270, 271, 274, 277, 278, 289, 292, 294 Barsalau, I. W. 84 Barton, R. A. 131 Barth, E. M. 59 Bates, E. A. 71, 118 Bateson, G. 244, 245 Battacchi, M. W. 276, 277, 285 Beach, K. D. 9 Beck, A. T. 190, 191 Beck, C. 156 Bedford-Roberts, J. 227 Bellotti, V. 234, 304 Bemporad, J. R. 273 Benari, M. 53 Bensaude-Vincent, B. 59 Bernard-Opitz, V. 267 Bernap, N. D. Jr. 56 Bernstein, B. 91 Berry, J. W. 91 Berstein, A. 345 Bertinardi, M. 293 Bettelheim, B. 269 Bever, J. 131 Beynon, M. 13 Biocca, F. 182
Birdwhistle, R. L. 244 Birmbaum, L. 350 Bishop, D. V. M. 269 Black, M. 52 Blackwell, A. F. 13 Blattner, W. D. 337 Block, N. 56 Boekhorst, R. te 113, 146, 267 Bolton, P. 268 Bonabeau, E. 33 Bonnier, P. 220 Booch, G. 308 Borchers, J. 241, 243, 252, 254, 255, 262, 263 Boumann, M. 293 Boussaud, D. 71 Bower, F. L. 179 Bowers, J. 307 Bowman, S. 12, 14, 64, 267 Boxer, A. M. 218 Bradbury, J. W. 156 Broca, P. P. 72 Brooks, P. J. 274 Brooks, R. 109, 110, 159, 170, 234 Britton, C. 13 Bruce, B. 218 Bruce, B. C. 88 Bruner, J. S. 30, 133, 134, 137, 138, 139, 140, 141, 144 Buck, R. 277 Bullmore, E. T. 274, 290, 294 Bumby, K. 138 Burgess, P. 274 Butensky, E. A. 220 Button, G. 304, 306, 307, 309, 314 Byrne, R. W. 130, 132, 143 C Cabrera, J. 58 Calder, A. J. 294 Calvert, S. 267 Campbell, D. T. 2, 95 Campion, P. 307 Cara, F. 67 Carberry, S. 88
Card, S. K. 3 Carey, D. P. 292 Carlson, N. R. 293 Carnap, R. 51 Carpentier, R. 57 Carvalco, de L. A. V. 294 Carver, C. S. 84 Cassell, J. 56, 148 Cawsey, A. 327 Celani, G. 276, 277, 285 Chalmers, D. 30, 334, 339 Chan, H-M. 2, 10, 12 Chandrasekharan, S. 13, 153, 160 Chambers, D. 28 Chaminade, T. 275 Chappell, J. 155 Charman, T. 139 Charniak, E. 93 Chen, S. H. 267 Cheney, D. L. 143 Chittka, L. 156 Christensen, C. B. 341 Churchill, E. 56, 148 Clancey, W. J. 109, 229, 232 Clark, A. 1, 5, 9, 10, 12, 13, 25, 28, 30, 33, 34, 96, 111, 118, 154, 165, 236, 267, 336 Clark, H. H. 244, 247, 250 Cleermans, A. 72 Clubb, O. L. 2 Cohen, M. B. 71 Cohen, P. 88 Cole, J. 179, 182 Coles, S. 137, 146 Condillac, È. de 43, 46, 59 Conrad, K. 220 Converse, T. 34, 162 Cook, M. 289 Cooley, M. 248 Corbett, B. A. 273, 274, 275, 294 Cox, A. 13 Cox, K. 2, 3, 69, 75, 76, 262 Cox, R. 32, 167 Coyne, R. D. 336 Critchley, H. D. 290
Cross, R. G. 278 Cutting, J. 75, 76 D Daly, E. M. 290 Damasio, A. R. 293, 294 Damasio, H. 293, 294 D’Arcy, J. O. 304 Darwin, C. 275 Dascal, M. 12, 37, 56, 57, 58, 59, 304, 305 Dautenhahn, K. 12, 14, 56, 127, 128, 136, 137, 138, 142, 146, 149, 229, 267, 268, 304, 278, 289 Davenport, T. H. 233 David, P. 304 Dawkins, R. 27 Decety, J. 275 Deleuze, G. 57 Dennett, D. 30, 32, 140, 148 Dennis, M. 276 Dertouzos, M. L. 43, 46 Descartes, R. 7, 43, 46 Desimone, R. 71 Devlin, M. 217, 218 Dey, A. K. 227 Dickerson, P. 278, 289 Dolan, R. J. 294 Donald, M. 131 Dourish, P. 33, 228 Draper, W. 4 Dresner, E. 56, 57 Dreyfus, H. 37, 228, 233, 334, 335, 336, 337, 338, 339, 340, 341, 342 Driver, J. 277 Dunbar, R. I. M. 130, 131, 132, 133, 147, 148, 149 Duncan, J. 274 Durranti, A. 232 E Eckerle, J. 120 Edelman, G. E. 118. Edwards, K. 234 Eggenberger, P. 114 Ehn, P. 4, 5
Ekman, P. 281 El Ashegh, H. A. 12, 64, 175, 268, 290 Elman, J. L. 71, 118 Engel, R. 245, 247 Engel, S. 133, 135, 136, 137, 140 Engeström, Y. 306 Erickson, T. 77, 232, 236 Erlbauh, J. 190, 191 Esfandiari, B. 160 Evans, J. T. St. B. 48, 58 Eysenck, H. J. 189, 190 Eysenck, S. B. G. 189, 190 F Fadiga, L. 141, 292, 293 Farnell, B. 134 Fazio, F. 293 Feldman, C. 138, 139, 140, 141, 144 Fell, J. P. 337, 341 Fergusson, G. A. 91 Ferrari, F. 114 Ferreira, C. 294 Fink, G. 294 Finkelstein, A. 327 Fisher, S. 179 Fishman, A. 294 Fitzgerald, W. 13, 345 Flores, F. 57, 109, 260, 336 Fodor, J. A. 64, 97, 177, 290 Fogassi, L. 119, 141, 292 Foner, L. 58 Foroohar, R. 233 Frackowiak, R. 293, 294 Fraser, N. 327 Friessen, W. V. 281 Friston, K. 293 Frith, C. 293, 294 Frith, U. 270, 271, 292, 293 Frohlich, D. M. 327 Frye, D. 274 Fuks, H. 327 Fukuda, H. 278, 290 Furniss, F. 273 Fuster, J. M. 274
G Gallagher, S. 179, 182 Gallese, V. 119, 141, 275, 292 Gardner-Rick, M. 219 Garner, D. M. 189, 190 Gauthier, D. P. 87, 89, 94 Geertz, C. 44, 45 Gershon, E. 268 Gibson, J. J. 154, 162, 169 Gilbert, C. N. 327 Gill, K. S. 12, 14, 241, 242, 243, 244, 245, 248, 249, 251, 252, 253, 254, 262, 263 Gillingham, G. 273 Gillot, A. 273 Girotto, V. 67 Globus, G. G. 336, 339, 342 Goffman, E. 316 Goldberg, E. 176 Goldman, A. 275 Goldstein, L. 2 Good, D. 244 Goodroad, B. K. 220 Goodwin, C. 232, 242, 261, 306, 307, 314 Goodwin, M. H. 306, 307, 314 Gopnik, A. 140 Gorayska, B. 1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14, 37, 55, 63, 68, 69, 75, 76, 88, 90, 92, 95, 100, 102, 124, 226, 228, 241, 262, 290, 303, 305 Gordon, R. A. 183 Gorsuch, R. L. 190 Gottesman, I. 268 Gottlieb, G. 71 Gould, J. 269 Grass, J. 34, 162 Greatbatch, D. 307 Green, T. R. G. 13 Greenfeld, D. 218 Grèzes, J. 275 Grice, H. P. 58, 65 Griffin, D. R. 128, 339, 340 Griffin, G. B. 347 Griffiths, P. E. 31 Gross, A. G. 58
Groth-Marnat, G. 190, 191 Gruber, T. R. 163 Guattari, F. 57 Guerini, C. 140 Guimbretiere, F. 255 Gullvåg, I. 50 Gupta, S. 225 Gurr, C. 13 H Hall, E. T. 134 Halligan, P. W. 294 Hamburger, S. D. 270, 292 Hammond, K. J. 162 Hampshire, S. 87 Hampton, M. C. 218 Happé, F. 270, 274, 292, 293 Hara, F. 112 Haraway, D. J. 333 Hare, R. M. 87 Harmon, G. E. 58 Harmon, P. 233 Harnad, S. 71 Harper, R. 306, 307 Hasin, D. 217, 218 Hatano, K. 278, 290 Hayashi, T. 58 Head, H. 220 Heath, C. C. 306, 307, 311, 314, 327 Heavey, L. 268 Heidegger, M. 44, 45, 57, 334, 335, 336, 337, 338, 339, 340, 341, 342 Heiling, A. M. 156 Hekkert, P. 27 Herbenstein, M. E. 156 Heritage, J. C. 307 Herman, L. M. 141 Hermelin, B. 277 Herskovits, J. M. 2, 95 Heunemann, R. L. 218 Hindmarsh, J. 306, 307, 311 Hinkley, L. 12, 14, 64, 267 Hinton, G. 29, 34 Ho, W. C. 146 Hoare, C. A. R. 305, 309, 310
Hobbes, T. 43, 57 Hobson, P. 139 Hobson, R. P. 273, 276, 277, 285 Hoffmann, W. 292 Holdrinet, I. 273 Hollan, J. D. 5, 155 Holmes, G. 220 Hood, B. H. 277 Horne, R. L. 217, 218 Horswill, I. 162 Holt-Ashley, M. 220 Howlin, P. 290 Hughes, C. 270, 292 Hughes, J. A. 304, 307, 308, 314, 322 Hull, R. 227 Hummond, K. 34 Hutchins, E. L. 5, 30, 33, 37, 155 Huttinger, P. 267 Hyland, M. E. 84 I Ip, H. 69, 96 Irvine, S. H. 91 Ito, K. 278, 290 J Jacklin, C. N. 214 Jacobs, G. A. 190 James, M. 186, 187, 188, 189 Janney, R. W. 13, 333, 334, 341 Jefferson, G. 307 Jenkins, J. 84 Jirotka, M. 12, 13, 14, 303, 305, 306, 307, 311, 318, 323, 327 Johnson, G. 69, 73, 76 Johnson, M. H. 71, 118 Jones, S. 13 Jordan, R. 138 Josefson, I. 248 Joseph, J. 120 K Kacelnik, J. 155 Kadodo, G. 13 Kagitscibasi, C. 91
Kahneman, D. 90 Kanner, L. 268, 272 Kant, I. 7, 88, 110 Karmiloff-Smith, A. 71, 118 Katagiri, Y. 241, 244, 245 Kato, T. 278, 290 Kaushlik, R. 304 Kawabata, T. 245 Kawamori, M. 241, 244, 245 Kawashima, R. 278, 290 Kay, K. 71 Keeley, B. L. 278 Keillor, G. 9 Kelso, S. 33 Kemper, T. L. 293 Kendon, R. 244 King, D. 233 King, Jr M. L. 346, 347, 350, 351 King, V. 308 Kinsbourne, M. 221 Kirksey, K. M. 220 Kirlik, A. 9 Kirsh, D. 155, 156, 159, 162 Kline, S. 304 Knoblauch, H. 306 Kohlberg, L. 85 Kojima, S. 278, 290 Kolb, B. 293 Konczak, J. 118 Kornbluh, R. D. 120 Korzybski, Count 51 Krams, M. 293 Krasnegor, N. A. 71 Krell, D. F. 339 Kreutel, J. 58 Kulkki, S. 262 Kusbit, G. 77 Kuniyoshi, Y. 119 Kutar, M. S. 13 L Ladrière, J. 340 Langdell, T. 276, 288 Latour, B. 30 Lave, J. 229
Lazenby, A. I. 276 Le Couteur, A. 268 Le Doux, J. E. 293 Lee, A. 273, 277, 285 Lee, W. C. 225 Leibniz, G. W. 43, 46, 57, 58 Lennenberg, E. H. 291 Leslie, A. M. 64, 138 Levy, D. M. 88 L’hermitte, J. 220 Lichtensteiger, L. 114 Lindsay, R. 2, 3, 12, 13, 14, 63, 64, 68, 69, 75, 76, 88, 90, 92, 95, 102, 175, 176, 267, 268, 290 Littrel, J. M. 218 Locke, J. 44, 46 Lockyer, I. 276 Lok, W. Y. 69, 76, 96 Loomes, M. J. 13, 136 Loveland, K. A. 139, 140 Lowen, C. B. 149 Lueg, C. 12, 225, 226, 227, 231, 234, 236, 237, 304 Luff, P. 12, 13, 14, 303, 305, 306, 307, 311, 314, 327 Lundholm, J. K. 218 Lungarella, M. 119, 122 Luria, A. R. 274 Lushene, R. E. 190 M McAlonan, G. 290 McCarthy, J. 345 McClelland, J. L. 28, 29, 34 McCorduck, P. 346 McDonald, D. W. 156 McDonald, K. 141 McDougal, W. 57 McEvoy, R. E. 139, 140 McGeer, T. 115 McHugh, M. 25 McLeod, P. 71 MacConchie, R. 278 Maccoby, E. E. 214 Mack, A. 71, 73
Macko, K. A. 71 Maglio, P. 166 Mandik, P. 165 Marcel, A. J. 99 Marchena, E. 267 Marcus, M. D. 217, 218 Marino, L. 141 Maris, M. 113 Marsh, J. 1, 2, 8, 11, 13, 55, 69, 76, 100, 226 Marsh, S. 229 Marshall, J. C. 294 Martin, M. 304 Martin, S. 263 Matelli, L. 293 Matheson, C. 58 Mattson, M. 77 Mayes, L. C. 278 Mead, G. H. 44 Meehl, P. E. 75 Meenan, S. 12, 64, 100, 175, 176, 268, 290 Meesters, C. 273 Meltzoff, A. N. 140, 275 Mendelson, M. 190, 191 Merckelbach, H. 273 Metcalfe, J. 58 Metta, G. 118 Mey, I. 2, 3, 7 Mey, J. L. 1, 2, 4, 5, 11, 12, 15, 37, 100, 228, 241, 244, 262, 303, 305 Mezey, R. 57 Mickley, D. 218 Middleton, D. 306 Miller, L. C. 128, 133, 138 Minsky, M. 345 Mishkin, M. 71 Mishori, D. 58 Mitchell, B. W. 218 Mitchell, J. 217, 218 Mithen, S. J. 12 Mock, J. 190, 191 Monahan, L. 276, 285 Moore, M. K. 140, 267 Moran, T. 3
More, T. 346 Morris, J. S. 294 Morselli, E. 218 Muris, P. 273 Murphy, D. G. 290 Myers, P. 182 N Nadel, J. 140 Naes, A. 50 Nakamura, A. 278, 290 Nakamura, K. 278, 290 Navarro, J. I. 267 Nealon, J. L. 69, 76 Neaves, P. 227 Ndumu, D. 234 Nehaniv, C. L. 13, 136, 146, 148, 149 Nelson, K. 133, 135, 137 Neumann, H. 114 Newell, A. 3, 69, 345, 347 Newton, N. 58 Nicolson, H. 276, 285 Nielsen, P. Q. J. 114 Nijholt, A. 289 Nivre, J. 244 Norman, D. A. 4, 5, 12, 29, 30, 32, 33, 68, 71, 83, 98, 154, 262 Norvig, H. A. 348, 351 Noë, A. 96 Nwana, H. 234 O Oatley, K. 84 O’Brien, J. 304, 308, 322 O’Connor, N. 277 O’Driscoll, G. 273 Ogden, B. 229, 278, 289 Olafson, F. 337, 341 Olmsted, M. P. 189, 190 O’Louglin, C. 294 O’Regan, J. K. 96 Ouston, J. 277, 285 Ozonoff, S. 276
P Pacey, A. 55 Palfai, T. 274 Palferman, S. 268 Palfreyman, K. 304 Parakayastha, A. 225 Parisi, D. 71, 118 Parks, R. 346 Partridge, D. 69 Paulesu, E. 293 Pawlowski, B. 149 Peeters, B. 58 Pellegrino, G. di 71 Penfield, W. 75 Pennington, B. F. 139, 271, 276, 293 Pepperberg, I. M. 141 Perani, D. 293 Perline, R. 120 Perlis, A. 5 Perrault, C. P. 88 Perrett, D. I. 293, 294 Petterson, A. C. 218 Petre, M. 13 Peze, A. 140 Pfeifer, R. 12, 109, 112, 113, 119, 122, 305 Phillips, K. A. 190, 219 Phillips, M. 290 Picard, R. W. 334 Pinker, S. 31 Piven, J. 294 Plowman, L. 304, 306 Plunkett, K. 71, 118 Polanyi, M. 246, 248 Polivy, J. 189, 190 Polyshyn, Z. 228 Prem, E. 336 Premack, A. J. 64 Premack, D. 64, 270, 292 Prevost, S. 56, 148 Prior, M. R. 292 Prusak, L. 233 Pycock, J. 304 Pylkkö, P. 341
Q Quick, T. 229 Quinlan, D. M. 218 Quartz, S. 31 R Rae, J. 278, 289 Ramachandran, V. S. 99, 290, 293 Ramage, M. 304, 306 Randall, D. 308 Rassmussen, J. 4 Ratey, J. J. 273 Read, S. J. 128, 133, 138 Reber, A. 71 Rechenberg, I. 114 Reder, L. 77 Reed, E. S. 154, 162 Reidy, M. 58 Reiner, M. 252 Reisberg, D. 28 Richardson, R. 56 Richer, J. M. 278 Rimland, R. 269, 272 Ring, H. A. 274, 294 Rivet, C. 140 Rizzolatti, G. 119, 141, 292 Roast, C. 13 Roberts, L. 75 Robertson, D. M. 290 Robertson, T. 230, 231, 232 Robbins, T. W. 270, 274, 292 Rochester, N. 345 Rock, I. 71, 73 Rodden, T. 304, 308, 314, 322 Roe, C. 13 Rogers, S. J. 139, 271, 276, 293 Rogers, Y. 32, 304, 306 Roll, S. 219 Rolls, E. T. 71 Roloff, P. 218 Roscoe, A. W. 310 Rosenbrock, H. H. 248 Roth, M. 58 Rouncefield, M. 304, 308, 314, 322 Rowe, A. 290
Rowland, D. 294 Rubert, E. 141 Ruiz, G. 267 Rumelhart, D. E. 28, 29, 34 Rumsey, J. M. 270, 292 Russell, J. 270, 292 Russell, S. J. 348, 351 Rutter, M. 268 Ryle, G. 51, 347 S Saari, T. 262 Sacks, H. 307 Salber, D. 227 Salomon, G. 11 Sandini, G. 114, 118 Salter, T. 267 Samuel, A. 345 Samuels, M. C. 274 Savage-Rumbaugh, E. S. 141 Scaife, M. 32 Schank, R. C. 137, 351 Schatzki, T. 338 Scheflen, A. E. 244, 245 Schegloff, E. A. 307 Scheier, C. 109, 112, 113, 119 Schilder, P. 220 Schlosberg, H. 76, 77 Schneider, W. 71, 83 Schopler, E. 269 Sears, L. 294 Segall, M. H. 2, 95 Sejnowski, T. 31 Selfridge, O. 345 Sellars, W. 43 Sells, S. B. 48, 76 Sethi, R. 263 Sevcik, R. A. 141 Seyfarth, R. M. 143 Shahinpoor, M. 120 Shallice, T. 68, 71, 83, 98, 274 Shanks, D. R. 72 Shannon, C. 345 Shapiro, D. 308 Shaprio, L. R. 218
Sharrock, W. 306, 314 Sheir, M. F. 84 Shiffrin, R. M. 71, 83 Shiffrin, D. 245 Shimamura, A. P. 58 Shimazu, A. 245 Shimojima, A. 241, 244, 245 Shmueli-Goetz, Y. 139 Silberman, S. 156 Simon, H. A. 69, 345, 347 Simonoff, E. 268 Simpson, J. 120 Sinderman, C. J. 133 Singer, M. G. 94 Slade, P. D. 179, 180 Slagter, R. 289 Slovik, P. 90 Smith, A. 47, 57 Smith, J. 120 Smith, L. 33, 118, 122 Smolensky, P. 29, 34, 72 Solomonoff, R. 345 Sommerville, I. 304, 309, 322 Spielberger, C. D. 190 Squire, L. R. 72 Spearing, M. 217, 218 Sperber, D. 63, 64, 65, 66, 67, 68 Spiltzer, R. L. 217, 218 Srimani, P. 225 St. John, M. F. 72 Steerneman, P. 273 Stirling, J. 276, 285 Stojanov, G. 342 Stojanowsk, K. 342 Stone, M. 255 Stopka, P. 156 Stotz, K. 31 Strauss, M. S. 71 Stribling, P. 278, 289 Stunkard, A. 217, 218 Suchman, L. 229, 306 Suddendorf, T. 293 Sugiura, M. 278, 290 Sullivan, J. 56, 148 Sullivan, P. F. 217
Susi, T. 158 T Tantam, D. 276, 285 Teuber, H-L. 216 Thagard, P. 294 Thelen, E. 33, 118, 122 Theraulaz, G. 33 Thomson, J. K. 179, 219 Tilghman, B. R. 242 Tinbergen, N. 79 To, T. 241, 243, 252, 254, 255, 262, 263 Tobin-Richards, M. 218 Tockerman, Yale R. 219 Todd, P. 169 Toepfer, C. 114 Tse, N. 69, 76 Tsur, R. 53 Tunali, B. 139, 140 Turing, A. M. 5, 43, 345 Turner, M. 128 Tversky, A. 90 U Ungerleider, L. G. 71 Urmson, J. O. 87 V Vagg, P. 190 Van Amelsvoort, T. 290 Van der Veer, G. C. 289 Van Leeuwen, C. 27 Vehrencamp, S. L. 156 Vera, H. A. 69 Verbeurgt, K. 294 Verstijnen, I. 27 Vertegaal, R. 289 Viller, S. 304, 309, 314, 322 Vinkhuyzen, E. 109, 307 Volkmar, F. R. 278 Vygotsky, L. S. 30, 44 W Waal, F. de 142 Wadden, T. 217, 218
Wallen, J. D. 277 Walter, A. 273 Ward, C. H. 190, 191 Warrington, E. K. 186, 187, 188, 189 Watson, J. 44 Watson, J. B. 57 Webb, N. 58 Weekes, S. J. 276 Weir, A. S. 155 Weiser, M. 225, 237 Weiskrantz, L. 84 Weizenbaum, J. 58 Wende, M. 114 Werry, I. 267, 278, 289 Wenger, E. 229 Whalen, J. 307 Wheeler, M. 336 Wheelwright, S. 274, 294 Whishaw, I. Q. 293 Whiten, A. 130, 143, 293 Whorf, B. L. 43, 91 Wickens, C. 4 Williams, J. H. G. 293 Williams, S. C. R. 274, 290, 294 Wilson, D. 63, 64, 65, 66, 67, 68 Wing, I. 269
Wing, R. 217, 218 Winograd, T. 57, 109, 232, 255, 260, 262, 336 Wise, S. P. 71 Wittgenstein, L. 10 Wong, A. 13 Woodruff, G. 270, 292 Woodworth, R. S. 48, 76, 77 Wooffitt, R. 327 Wooley, O. W. 219 Y Yanovski, S. 217, 218 Young, R. M. 13 Young, A. W. 294 Yuzda, E. 268 Z Zadeh, L. A. 52 Zahavi, A. 156, 157 Zamir, T. 53, 59 Zelazo, P. D. 274 Zhang, X. H. 69, 76 Ziemke, T. 158 Zue, V. 56
Subject index
A Action planning (systems) 63, 72–75, 69, 81–82, 85, 90, 98 parallel-coordinated 257 coordinated autonomy 257 collaborative activity 308ff Affordance 5, 17, 28, 154, 163ff, 167 Affordance View (see also methodology of design) 17 Anthropology 128 Artificial Intelligence (AI) (see also problem solving; robotics) 3, 63, 69–70, 73, 76, 81, 89, 92, 109, 122, 145, 146, 154ff, 159, 167, 225–227, 233, 334, 336–337, 345–348 Frame Problem 227 The Hard Problem 335, 338ff, 348–349 Autism 267ff autistic disorder/dysfunction 269, 272, 275, 293 autistic syndrome 268–269 Early Infantile Autism 268 emotional dysfunction in 270–273, 276, 284, 283–294 experiments for 279–289 face recognition in 276, 288–289, 294 gaze avoidance in 277–279, 289, 291 gaze aversion/avoidance 277–279, 288 gaze information 278, 290 neuropsychology of 274–275 theories of 268ff
Mindblindedness Theory 270–271, 274, 292 Executive Dysfunction Theory 270–271, 292 Weak Central Coherence Theory 270–271, 294 Imitation Deficit/Mirror Neuron Theory 270–271, 293–294 Stress Overload Theory 271–274, 289 autistic deficits 268–270, 272–274, 290 social deficit in 290, 292, 294 technological intervention in 291 B Body body dysmorphism 217 Body Dysmorphic Disorder (BDD) (see also eating disorders) 180–183, 213, 218–219, 220 Body Dysmorphic Disorder Questionnaire 190, 198 body image 175, 178ff, 204, 220 body image representation 180ff Body Image Generator (BIG) (see also natural technology) 181–183 Body Moves 241, 244ff Parallel Coordinated Move (PCM) 244, 248, 253–254, 256, 261 Sequential Body Move (SBM) 251 body schema 178ff, 220 Brain biological brain 26–29, 32–34
C Carpentered World Hypothesis 2, 95 Cognition 2, 4, 10–11, 14, 29, 37, 45–46, 63ff, 72ff, 116ff, 156–157, 162, 166, 252, 244, 269, 272, 292, 334, 336 aims of 38 cognitive architecture 30–31, 72–78, 82–85 symbol-connection hybridism 74–78 connectionist systems 72–75, 98–99 symbol-processing systems (see also problem solving) 69–70 cognitive artifacts/tools (see also tool) 11 Cognitive Dimensions (see also methodology) 13 cognitive deficits 270, 278, 290 cognitive dysfunction 270 cognitive environments (see also wideware and Fabricated World Hypothesis) 2–3, 9, 18, 34, 44 cognitive heuristics 64–67, 90 cognitive goals (see also goals) 78, 84 cognitive modularity 177, 270 cognitive processing Initial Control System (ICS) 82ff Goal Management System (GMS) (see also goals) 83, 85, 88, 97 cognitive representation of body 209–210, 221 cognitive resource 44 cognitive scaffold 10, 31–33 cognitive schemata 4 cognitive subdomains 72ff cognitive systems 6, 29, 31, 270 cognition-technology interface 4, 7 distributed 155ff, 303 evolution of 29 Physical Symbol System Hypothesis 347 situated cognition 30 social cognition 267ff, 294 Technological Cognition (TC) 7–8 Cognitive Ergonomics 3
Cognitive Engineering 3 Cognitive Technology (CT) (see also humane technology) 1–2, 4, 6–8, 10–11, 18, 25, 37–40, 45, 63, 97–102, 109, 124, 145–146, 148, 175–176, 241, 262, 267, 291, 303, 305, 307, 326–327 CT agenda 11ff CT of goal management 97–102 Cognitive technologies 25–26, 28, 30, 37–38, 40, 53 NL-based technology 38 natural (cognitive) technology 11, 64, 175–177, 210, 267–268, 290–291 typology strong and weak 40 integral and partial 41 complete and incomplete 41 constitutive and non-constitutive 42 external and internal 42 Computational engines 28 Consciousness 334, 338ff Cyborgs 5, 25–26, 333 bio-technological symbionts 25–26 E Eating disorders anorexia nervosa 189, 217 binge eating 218 Body Dysmorphic Disorder (BDD) (see also body) 190, 218–220 Embodiment (see also robotics; Body Moves) 110, 112, 241, 244ff Emotion (see also autism) 334, 349 Engineering Psychology 3 Epistemic Technology (see also technology) 55 Epistemic structure 156–158 Ethics 87–95, 100–102 as cognitive heuristics 90 cognitive functions of 93 ethical engineering 100–102 humane research program 347
F Fabricated World Hypothesis 2–3, 9, 18, 92, 95–97 G Goals (see also motivation) 69–70, 74, 78 cognitive goals 78, 84 goal enjoinment 91 Goal management 3, 9 Goal Management System (GMS) 68, 83, 85, 88, 97–102 metagoals 86 origins & functions of 78 terminal goals 79 I Intelligence Augmentation (IA) 334 Interface communicative interface 54 humane interface 14, 40 user-friendly 3–4 human-computer interface 4, 6 cognition-technology interface 7 K Knowledge 244, 248–249 acquisition of (see also learning) 246 explicit 248–249 parallel coordinated 257 repository of 153 representation of 153–154 action-oriented representations 166 ontology of 163–165 meta-tag pluralism 168 RFID tags 160 tacit 245–246, 248–249 transformation of 251 L Language as environment 44–49 as resource 44, 49–52, 54 as cognitive tool 52 Natural Language (NL; see also natural technology) 38ff
Narrative structure (see also narratives) 45, 133–134 utterance comprehension (see also pragmatics) 65–67, 88–89, 93 relationship to mind 43 Physical Makeup Language (PML) 164 Learning 71–72, 74, 246 connectionist learning 73, 99 implicit learning 72 learning systems 71, 72, 74 pre-symbolic 71 human development of infants 119 M Memory 3, 28, 72, 85 Methodology of design 11, 13–14 Affordance View (see also affordance) 13 agent design 162 active design 159, 161 active redesign 159 Cognitive Dimensions (see also cognition) 13 Communicating Sequential Processes modeling language (CSP) 305, 309–314, 316–319, 321–322, 324 Computer-Supported Collaborative Work (CSCW) 304–306, 314, 326 Goal-Based Relevance Analysis (see also relevance) 13 Empirical Modeling (see also tool) 13 ethnomethodology in design 303–309 Human Factors 3 Human Computer Interface Design (HCI) 227, 236, 262, 303, 314, 326 methodological CT questions 11 modeling collaborative action/work practice 308ff passive design 159–160 Reflective Questioning Perspective 13 Synthetic methodology 110–111, 119–122
tool design 10 User-Centered System Design (UCSD) 5 Workplace Studies 303, 314, 326 Mind (see also thought processes) 278 bio-technological mind/hybridisation 25–36 externalizing mind 7–9, 15 technologized mind (see also Technological Cognition) 7, 8 philosophy of 9, 43ff mental prostheses (see also prosthetics) 11 mind/tool co-adaptation 32 mindware 9, 267, 291 thought processes 7–9, 15, 28, 43 externalizing process 7–9, 15 externalization-internalization-externalization loop 7 internalization processes 7–8 Theory of Mind (TOM) 138, 290, 292 off-loading 33 wideware 9, 18, 96 Motivation (see also goals) 80–81, 85 N Narratives (see also language) 127ff and Autism (see also Autism) 138 definition of 134–138 (transactional) format of 128, 133, 135, 139 Homo Narratus 147 in animal behavior 141, 146 Narrative Intelligence Hypothesis (NIH) 128–129, 132–134, 145–147 origins of 127–129, 133 robotic and computational models 145 social context of & meaning 134ff O Ontologies 163ff, 168ff affordance/property models of 165 formal
P Perception 3, 28, 71–72, 85 Philosophy 43ff, 334ff, 340, 347 Planning Systems (see also action) 63, 69, 72–75, 81–82, 85, 90, 98 Pragmatics 4, 9, 11, 39, 63–68, 88–89, 93, 291 context 226ff awareness 225–229, 234 indicators 231 modeling 233 Composite Dialogue Acts 245 dialectic interaction 5, 11 the inferential comprehension module 64, 65–67 metacommunication 245 metapragmatics 244, 251 pragmatic view of technology and cognition 9, 11 seeing (unmediated non-verbal understanding) 242 situatedness (see also situated cognition) 30, 229–230, 232 tool use (see also tool) 10 utterance comprehension 65–67, 88–89, 93 Presence 249 Primatology 128, 130 Problem solving (see also AI) 69–70 search space paradigm 69 states space 69 symbol-processing systems 69–70, 74 problem space 69, 290 Prosthetics 11, 14 mental prostheses (see also mind) 11 Psychometric tests Visual Object/Space Perception Battery (VOSP) 185, 198, 211–212 Eysenck Personality Inventory (EPI) 189 Eating Disorders Inventory (EDI) 189, 195, 198, 211–212 State-Trait Anxiety Inventory (STAI-X) 190 Beck Depression Inventory (BDI) 190
R Relevance definition of 63–68, 69, 73 Goal-Based Relevance Analysis 13 ontogenesis of 68 relevance discontinuity 12, 98, 290 theory of 63–68, 69–78 Robotics (see also AI) 111–124, 234–235, 336 developmental robotics 116 S Schizophrenial Problem 13, 333–334, 340 Semantic Web 153 SMARTBoard 254ff Social evolution 127–128, 132–133, 141 intelligence 132 interaction 305, 309, 314–318 primate societies 131 policy 32 Social Brain Hypothesis (see also brain) 121 implications of 147–148 Symbol grounding 250, 268 T Technology 3, 12, 38 Collaborative 304 Epistemic Technology 55 high technology 346 humane technology 12, 14, 148
Information Technology (IT) 3–4, 38 Multi-media development 3 natural technology 12 NL-based technology 38 Technological intervention in the workplace 305–306, 318–319 Tool 4–6, 12 transparent tool (see also ubiquitous computing) 5, 12, 225 cognitive tool (see also cognition) 11, 53, 111 robots as cognitive tools (see also robotics) 12, 116 modeling languages (see also methodology of design) 12 Empirical Modeling (see also methodology of design) 13 Natural Language (see also natural technology) 12 tool overbinding 12 tool use (see also pragmatics) 10 Turing Machine 5 U Ubiquitous computing (see also transparent tool) 225 W World Wide Web 153–154 action-enabling space 154 world mediating system 154 stable environments 162