COGNITIVE TECHNOLOGY In Search of a Humane Interface
Edited by Barbara GORAYSKA Department of Computer Science City University of Hong Kong Kowloon, Hong Kong
Jacob L. MEY Department of Linguistics Odense University Odense, Denmark and
Northwestern University Evanston, IL, USA
1996 ELSEVIER Amsterdam - Lausanne - New York - Oxford - Shannon - Tokyo
Formal interest in Cognitive Technology at the City University of Hong Kong began its official life as the result of a growing interest amongst a group of colleagues, mostly in the Department of Computer Science, in exploring the ways in which developments in information technology carry implications for human cognition and, inversely, how human cognitive abilities influence the way we act technologically. This interest led to a proposal for the establishment of a Cognitive Technology Research Group which would draw in colleagues from a variety of departments within the University as well as from other institutions in Hong Kong. One of the early events organised to launch the Research Group was a series of Cognitive Technology lectures in 1993 (some of these are now available in print). For this purpose, invitations were extended to individuals in universities within and outside Hong Kong who were known to have an interest in this area of research. At the same time, plans were laid to stage an international conference in August 1995, for which an international programme committee was created, with participants from Australia, Canada, Denmark, Germany, Hong Kong, Israel, Japan, the UK and the USA. Many of the chapters in this volume have been written by people affiliated with the Conference in various ways, either as participants or plenary speakers, or as members of the conference programme committee. In addition, there are chapters authored by other experts, who were specially invited to contribute by the editors of the volume. The number of individuals who are ready to promote the aims of Cognitive Technology research and development by contributing to this volume and attending the Conference reflects a growing concern among the scientific community about what it means to be human in an increasingly technological world. The volume contains many innovative ideas, all of them exciting and a number of them controversial.
The reader will find its perusal a stimulating and rewarding experience.
N. V. Balasubramanian Head, Department of Computer Science, City University of Hong Kong.
The Editors of the Volume feel the need to make a pleasurable acknowledgement of all the help and assistance they were allowed to receive during the preparation of this book. First of all, thanks go to the management and staff of the two institutions that were involved in hosting and caring for the editors during their various periods of collaboration: City University of Hong Kong, and Northwestern University, Evanston, Ill., USA. Special thanks are due to Dr. N.V. Balasubramanian, Head of the Department of Computer Science at City University, who not only showed his vivid interest in the CT project from the very beginning, but did everything in his power to get our effort off the ground, and continued to follow up with good advice and support, making things possible that would not otherwise have happened (such as the one editor's three-month stay at City University). At the other end, Professor Roger Schank, Director of the Institute for the Learning Sciences, Northwestern University, provided the proper atmosphere for an effort of this kind, and saw to it that the cross-ocean contacts between the editors could be tended to without disruptions of practical sorts. The ILS working group 'Video-Media' graciously put up with the Evanston editor's frequent and prolonged absences from the project, while secretarial and other staff (in particular Ms. Teri Lehmann) were extremely helpful in facilitating the necessary contacts. At the Hong Kong end, the General Office of City University's Department of Computer Science (in particular Miss Giovanna W.C. Yau, Miss Anita O.L. Tam, Miss Tiong C.W. Chan, Miss Winnie M.Y. Cheung, Miss Ada S.M. Wong, Miss Amy Lo, and Miss Candy L.K. Tsui) were incredibly helpful in handling our mail, fax, xerox and computer problems, in dealing with the accounts, and in countless other 'user-friendly' ways.
We also want to thank the many Research Assistants and Demonstrators who sweated over photographs and diagrams, getting them into the proper computer format prior to print-out as camera-ready copy; in addition, they provided invaluable help in scanning documents that had got stranded in the vagaries of the various word processing systems and their avatars (Word 4, 5, and 6, Word for Windows 2 and 6, WordPerfect, MacWrite and what not). Some of the people we want to thank specially are, at the Hong Kong end, Mr. Jims C.F. Yeung and Mr. Ted Lee; at the Evanston end, Ms. Inna Mostovoy. Among our colleagues, Kevin Cox deserves the highest praise for having taken over the formatting of the book according to the style sheet provided by the publishers - a daunting task for which neither editor was properly prepared or mentally equipped, and which neither of us ever is going to undertake again unless we get princely remunerated! Jonathon Marsh, Laurence Goldstein, Roger Lindsay, Kevin Cox, and Ho Mun Chan were always ready to help with advice and good ideas in their areas of expertise, while Brian Anderson, Stevan Harnad, and Tosiyasu Kunii added new dimensions to many of our thoughts often by simply telling us how to express them better. Finally, we wish to express our gratitude to all the authors in this volume for their generous and diversified contributions to the major theme of Cognitive Technology
which bring forth its many subtle facets and hidden avenues. And, on the penalty of innuendo, the Editors themselves want to grab this opportunity to thank each other for a splendid cooperation - in sweat and blood, and almost no tears. People have sometimes felt that our title 'Of Minds and Men' is less than appropriate, as it carries with it (as one contributor expressed it) the connotation of male sexism, and besides (as some others pointed out) it disregards the one half of humanity. We would like to ask our well-meaning critics to leave their Steinbeck behind and look back to Robert Burns, who is the original source of the quotation. Burns' words are not only not sexist, they are certainly anything but macho. In fact he pokes fun at men (and mice as well), by commenting on their various hare- (or mice-) brained notions. Here are his words (more or less in the Scottish original):
"Of mice and men
The cunning schemes
So often gang aglay."
Here you are. No sexism, just plain old Burns. Apologies for any inconvenience caused to mice and men.
Hong Kong & Evanston, July 1995
Barbara Gorayska
Jacob L. Mey
INTRODUCTION
Barbara Gorayska and Jacob L. Mey

Cognition
1 Barbara Gorayska and Jonathon Marsh
Epistemic Technology and Relevance Analysis: Rethinking Cognitive Technology
2 Ole Fogh Kirkeby and Lone Malmborg
Imaginization as an Approach to Interactive Multimedia
3 Frank Biocca
Intelligence Augmentation: The Vision Inside Virtual Reality

Modeling and Mental Tools
4 David A. Good
Patience and Control: The Importance of Maintaining the Link Between Producers and Users
5 Hartmut Haberland
"And Ye Shall Be As Machines" - Or Should Machines Be As Us? On the Modeling of Matter and Mind
6 Ho Mun Chan
Levels of Explanation: Complexity and Ecology

Agents
7 Margaret A. Boden
Agents and Creativity
8 Myron W. Krueger
Virtual (Reality + Intelligence)

Communication
9 Roger O. Lindsay
Heuristic Ergonomics and the Socio-Cognitive Interface
10 Alex Kass, Robin Burke, and Will Fitzgerald
How to Support Learning from Interaction with Simulated Characters
11 Richard W. Janney
E-mail and Intimacy
12 Robert G. Eisenhart and David C. Littman
Communication Impedance: Touchstone for Cognitive Technology

Education
13 Kevin Cox
Technology and the Structure of Tertiary Education Institutions
14 Orville L. Clubb and C. H. Lee
A Chinese Character Based Telecommunication Device for the Deaf
15 Laurence Goldstein
Teaching Syllogistic to the Blind
16 Che Kan Leong
Using Microcomputer Technology to Promote Students' "Higher-Order" Reading

Planning
17 Mark H. Burstein and Drew V. McDermott
Issues in the Development of Human-Computer Mixed-Initiative Planning
18 David Heath, Simon Kasif, and Steven Salzberg
Committees of Decision Trees
19 Roger C. Schank and Sandor Szego
A Learning Environment to Teach Planning Skills

Applied Cognitive Science
20 Tosiyasu L. Kunii
Cognitive Technology and Differential Topology: The Importance of Shape Features
21 Alec McHoul and Phil Roe
Hypertext and Reading Cognition
22 Hiroshi Tamura and Sooja Choi
Verbal and Non-Verbal Behaviours in Face to Face and TV Conferences
23 John A. A. Sillince
Would Electronic Argumentation Improve Your Ability to Express Yourself?
24 Tony Roberts
Shared Understanding of Facial Appearance - Who are the Experts?
25 Stevan Harnad
Interactive Cognition: Exploring the Potential of Electronic Quote/Commenting
Introduction
OF MINDS AND MEN
Barbara Gorayska City University of Hong Kong csgoray@cityu,
Jacob L. Mey Odense University, Denmark Northwestern University, USA
et mihi res non me rebus subiungere conor 'and I try to adapt the world to me, not me to the world' Horace, Epistulae I.i: 19
This Introduction will be in two parts. The first part is a general statement about Cognitive Technology, its aims, and how it goes about realizing them. In this (the present) part, only some specific links with the individual authors' contributions to our volume will be highlighted. The second part consists of a 'guided tour' through the volume, briefly characterizing each of its chapters and familiarising the reader with its contents. A certain amount of thematic structure will tentatively be uncovered, and connections between the individual chapters will be suggested.

COGNITIVE TECHNOLOGY AS A DISTINCT AREA OF INVESTIGATION

What happens when humans produce new technologies? This question can be considered under two perspectives, each having to do with the how and the why of such a production. It may be concretised as a desire to explore:
a) how and why constructs that have their origins in human mental life are embodied in physical environments when people fabricate their habitat, even to the point of those constructs becoming that very habitat; and
b) how and why these fabricated habitats affect, and feed back into, human mental life.
The present volume initiates such an exploration of the human mind via the technologies the mind produces. As instances, consider problem solving devices such as algorithms, or mind-organizing devices such as metaphors. These mental constructs, when externalised, find their expression in the form and functionality of physical tools, defined as structured parts of our physical world which, becoming space-organizing devices, help us shape and manipulate our physical environment. Obvious examples here are a hammer, or a computer. Using the tool, in turn, binds human epistemology and, within the constraints inherent in the functional and structural characteristics of the tools used, determines our cognitive processes of adaptation. For this reason, all explorations of the mind via its self-produced technologies will have to consider the 'situatedness' (as Fogh Kirkeby & Malmborg call it in their contribution) of such constraints. This situatedness closely links our explorations to concerns about the environment. The process of externalising the human mind we will name Cognitive Technology, CT. Cognitive technological processes always take place in a particular environment. No technologies come out of the blue; neither are they created ex nihilo, as Alec McHoul aptly has observed in a recent contribution (1995). As an instance, he refers to the well-known example of the printing press: its origins are not just located in a general trend, some e-volutionary development of the human mind, but must be found in a particular de-volution of the human mind into some existing, contemporary technologies, such as metal-crafting and wine-pressing. A technology is always grafted onto another technology, says McHoul (1995: 14) - but the development of a particular technology is never a necessary, deterministic one.
The printing press happened when pressing techniques and iron-mongering had reached a certain stage of perfection, so that in retrospect we can see that it happened, and could happen, at the time, and how it happened; but not why it had to happen, and why right there and then, in 15th century Mainz. Similarly, to understand CT, we need to understand the environment in which a particular cognitive technological development came about and was or is being developed; similarly, there is no causality involved here. Still, by itself, a mere understanding is not enough: the mind has to be consulted not just as an abstract faculty, but as a human characteristic that develops technology, and is developed by it. With respect to the environment, this includes both the physical and the mental world: we must investigate the environment both as a necessary precondition for CT and as conditioned by CT, taking the human mind into consideration under the perspective of this mutual relatedness. For this reason CT is allied with environmentalism, which brings us to another point. Environmentalism expresses our need to understand how we manipulate our physical environment by means of the tools we have created. However, it leaves out both the generative processes of the human mind by means of which the tools come into being, and the feedback effects such tools produce in the human mind, after these tools have become a part of the physical environment. Following a distinction proposed by Gorayska and Marsh (in their contribution to this volume), we can say that the processes of cognitive adaptation by which the human mind must deal with already externalised mental constructs, i.e., the physical tools at our disposal, constitute a domain of investigation which is different from, although complementary to and closely aligned with, CT. This domain, which Gorayska and Marsh call Technological Cognition, TC, focusses attention on how human cognitive
environments are generated; these environments comprise sets of cognitive processes that are essential to human conscious thought, which they inform as well as constrain. At the same time, the authors say, the TC activities within this cognitive environment provide input to the externalisation processes of CT, and thus are complementary to the latter. Thus, apart from the need, expressed in current environmental concerns, to protect our physical environment from the unwanted or uncontrolled impact of technology, there also exists a need to understand how our cognitive environments can be, and are, manipulated by that very same technology (as Jonathon Marsh has observed; pers. comm.). Investigations within CT and TC are intended to satisfy that need. As Gorayska and Marsh further state in their contribution to this volume, the movement between the products of CT and the processes of TC is recursive. Giving due consideration to this recursive movement, they point out, is a necessary, and occasionally a sufficient, condition for the design and construction of what they call an Epistemic Technology (ET). ET tools are tools whose interfaces serve to amplify the processing capabilities of both humans and machines to a point which in a normal course of events is out of reach for either of them functioning alone. From this, it becomes obvious that ET tools can only come into being once we realise that CT products and TC processes are neither exclusively physical nor exclusively mental, but integrated in a spiraling, 'Heraclitean' relationship. 1 The CT tool constitutes the embodiment of a task (as do all tools), but this particular embodiment is seen as cognitively appercepted and organized in a piece of technology. 
Thus, while a problem solving algorithm is a mental tool, it only becomes a CT tool when it is realized in a material shape, such as when it is embodied in a computer program and runs on a real machine, or when it takes shape in a mechanical device, like one of those old, now mostly defunct National Cash Registers or a mechanical calculator. In an epistemic technology, understood in terms of the CT/TC relationship, the physical and the mental are two sides of one and the same process: the
1 The reference is to the well-known tenet formulated by the Greek philosopher Heraclitus, according to which 'one cannot immerse oneself into the same river twice' (Diels 1954: Fragm. 11). Usually, this saying is interpreted in one direction only: the river changes for every person immersing him- or herself in it. This interpretation has its roots in the formulation given above, which, however, is not Heraclitus' own, but the one found in Plato and Aristotle, where they refer to (and misquote) the Heraclitean saying (Plato, Cratylus 402A; Aristotle, Metaphysica III:5, Bekker 1010a30). The converse is namely just as important, but is often overlooked: nobody is ever the same after having taken a dip in the river. Hence, the human and the aquatic bodies are in a constant dialectic relation - a relationship that is emblematic of the general relationship between humans and their creations (both the already given, and the ones emerging). Such an interpretation is, moreover, in harmony with Heraclitus' original text (admittedly obscure, but what else is new?), which does not say (with Plato, Aristotle and the rest) that 'one would have a hard time trying to get into the same river twice', but that 'different waters flowingly touch those who enter identical rivers' (potamoîsi toîsin autoîsin embaínousin hétera kaì hétera húdata epirreî). Owing to the special construction of this dictum, one can also read it as meaning: 'different waters flowingly touch the same persons entering [identical] rivers', and it is this interpretation which jibes best with Heraclitus' notion of the 'soul as a humid exhalation', to be likened unto a flow of water, as well as with Arius Didymus' (to whom we owe this quote from Heraclitus) accompanying commentary.
Dipping into the same river, one and the same person thus will perceive a (psychic) difference, perhaps even each time receive a different soul: he is touched by a 'humid exhalation' of an ever-changing nature, or: the river changes us more than we change the river. (For the Greek originals, see Diels & Kranz 1954; Kirk 1954: 367)
mental externalises itself in the CT tool, but then the tool reflects back to set up a niche of its own within the cognitive environment: a TC space with its associated techne. The salient point in these reflections is the truly innovative character of ET tools inasmuch as they embody the CT/TC relationship. These innovative properties reside in the fact that the structures we see emerge are not merely ascribed to, and confined by, the worlds in which they arise, but are developed in response to a movement which is not only recursive or, as we have said earlier, 'spiraling', but properly dialectic. This dialectic movement does not only go 'from the inside out', as in the classical definition of the tool, or in modern approaches to Human Computer Interaction (HCI), but, more importantly, it goes 'from the outside in'. That is to say, the structure of our techne, of our mental constructs, originates in the impact that tool use has on our cognitive world, in a manner which parallels the way the physical tool is said to originate in the clash of the mind with a physical obstruction. Cognitive Technology, by turning the leaf, so to speak, and interlinking with Technological Cognition, at the same time turns itself from a branch of technology into a techne of cognition. In their unique ways, all contributors to the present volume seek to find an answer to the crucial question: 'What technologies can best tune human minds to other human minds and to the environment in which these minds must operate?' Such technologies will be characterised by what Tosiyasu Kunii (pers. comm.) has termed 'humane interfaces'. But if we are ever to discover what it really means to be humane in a technological world (a question which is at the heart of the proposed investigation), then there are other pertinent questions which must be asked.
These questions emphasise the human aspect of how minds are externalised, and they include:
• Why are human minds externalised, i.e., what purpose does the process of externalisation serve?
• What can we learn about the human mind by studying how it externalises itself?
• How does the use of externalised mental constructs (the objects we call 'tools') change people fundamentally?
• To what extent does human interaction with technology serve as an amplification of human cognition, and to what extent does it lead to an atrophy of the human mind?

Why are human minds externalised?
Looking around us, we see the externalising of minds in full progress everywhere. People jot things down in notebooks, they write memoranda, articles, letters, books, they note down music, they cry for help or sympathy, they vent their anger, they paint their fantasies and imaginations on canvas, walls, and their own bodies, they erect statues, monuments, buildings, and so on and so forth. Externalising the mind seems to be one of the human race's most favorite pastimes; and in our externalisations, the seeds of language are sown. Human language, in whatever textual forms it happens to come (including, perhaps, art and music), is a spontaneous and ingenious product of this process. Externalised language is one of the first and best examples of ET: a human-made epistemic tool for mediating the dialectics between the CT product and the TC process (on this, see Good's contribution to this volume). ET is also a perfect externalised expression of, and a reflection upon, the characteristics of the human mind
itself (Gorayska and Lindsay, 1989, 1993). Tremendous efforts have been expended in the cognitive sciences to date to understand how the human mind, mediated by language, maps onto, and reflects, the properties of the physical environment; in other words, how true propositions about the world come into being. By contrast, what has rarely been in focus (although, following Whorf (1969), it ought to have been, and constantly so (cf. Gorayska, 1993)), is the pivotal role of language as an instrumental tool, which not only reflects, but also serves to shape and control, from the outside in, an organisation of the human mental world, grounded in motivation and sensorimotor action. This role goes well beyond a mere recovery of communicative goals or speech acts and enters the realm of pragmatic acts, as proposed by Mey (1993). 2

What purpose does this process of externalisation serve?

If we consider some of the items listed in the preceding paragraphs more closely, we may obtain a first clue as to the 'why' of these externalisings. The list contains, e.g., such items as 'monuments' and 'memoranda'. The latter term goes directly back to the Latin word for 'remember' (cf. 'memory', 'memento', 'memorable', and other derivatives of the same root). The former term is even more instructive. It has to do with a root meaning 'remind' (as in 'monitor', 'admonish' and so on). Note how the word 'mind' itself is related both to 'memory' and to 'monument'; the latter being a 'reminder' in some externalised form, such as stone, bronze, concrete. Hence, the immediately plausible answer to the question 'Why?' is that we externalise our minds to make them more durable, to prevent them from going under in the general chaos that ensues when we leave our bodies (and our minds!) at death.
Some people have been good at externalising in this fashion, and moreover they must have known that they were successful: how else would Horace have been able to say that he had 'erected a monument more durable than bronze, one that neither biting rains nor violent hurricanes' would be able to destroy? (Odes III.xxx:1-5) This monument was nothing other than his externalised mind, his poetry. Apart from our desire for immortality, we externalise minds to share them with others. There is not a day in our lives when we don't benefit, one way or another, from the externalisations of our forebears' minds. Conversely, we ourselves do everything we can to ensure that our own minds will not only live on forever, but that others, too, will benefit from them. This latter desire to share and to influence may even extend to the point of the ridiculous, as when we send out our own images, our own externalised selves, into a universe whose possible inhabitants in all probability never will find, or, even if they do, understand our externalisations (as in the case of the U.S. space probe 'Explorer', carrying those notorious copper tablets depicting our 'civilisation' and its progress, on board). In the externalisation process, two sets of motivating tendencies operate in tandem: individuation and detachment on the one hand, belonging and uniformity on the other. The first have to do with expressing oneself in contradistinction to the mass of humanity, to erect a singular monument for the autonomous self; the second concern the desire for recognition by others, the wish to make sure that my externalisations are accepted as valuable and valid by my fellow-humans. As such, the latter desire borders
2 Unlike speech acts, pragmatic acts are not limited to utterances. They include a variety of action types, across different modalities of expression and processing, that are jointly performed by an individual in order to communicate within the constraints of his or her ecological environment.
on the urge to control my environment (including my fellow-humans), such that I may be sure that my externalisations will be acceptable to, and accepted ('internalised', if you wish) by the others. In the framework of our present discussion, one could say that the former tendency belongs in the domain of CT (an 'externalising' process), while the latter tendency pertains to a process of 'internalising', included in TC. Harmonious mediation between these two tendencies is a hallmark of holism and a source for cooperative creation in all living organisms (Koestler, 1964). As part of a living organism, the human mind exhibits similar characteristics. A conspicuous failure to consider and satisfy either of these tendencies incurs the risk of fatal consequences for the organisms involved: limited externalisation results in frustration, forced internalisation will lead to mind-control and all the horrors that Koestler saw developing in the totalitarian regimes he criticised. Through the complementary processes of CT and TC, the externalising/internalising human mind, being a vulnerable organism in an only partially controlled world, is equally confronted with the same potential and exposed to the same abuse.

Externalising - internalising - externalising ... an eternal loop

When we externalise our minds, we create an object. This object, in its turn, is not just an object in space: it is something that we consider, relate to, love or hate, in short, work with in our minds, hence internalise. In very simple cases, the object is 'just' an object for vision (as Descartes seemed to think); more sophisticated 'considerations' include the mirroring that takes place when the child discovers its own image as separate from itself (as Janney points out in his chapter, where he treats of a particular mind-object, viz.
email messages; see also Krueger, this volume), or when we evaluate a mind product as to its 'adequacy', and compare it to the original representation that we had 'in mind'. Conversely, removing this check can have some strange and unexpected effects, as in the cases where an artist loses the use of one of his senses: the near-blind Monet, the deaf Beethoven, who continued to externalise their minds, but with unmistakably different (but not necessarily artistically inferior) outcomes. The re-internalised object is different from the one that started the externalising process: it retains a tie to its origin, but has also become strangely independent. It now has a life of its own, and at a certain point of time, it is its turn to become externalised. This process continues until we think the result is adequate, and in the meantime, every new version interacts dialectically with the previous one. It supersedes it, but cannot replace it quite. 3 True, humans and the artifacts they produce are not cut of the same cloth, and in a sense, these 'twain shall never meet' - yet, in their disparities and dissimilarities reside the seeds of growth. Cognitive dissonance is the basis for creativity. It leads to progress. It arouses motivation. It is also the source for goal formation. It serves to put
3 This process we all know in its crudest form as the cycle of producing an article or report. Which is why one has to be very careful in taking the process of writing on the computer as being the 'same' as writing on a piece of paper: the externalisations in the latter case are not easily or accidentally wiped out, whereas in the computer case we often destroy entire files at the touch of a button, whether we want it or not, and certainly cannot afford to have our machines clogged up with innumerable earlier versions of our articles and other mental products. (But figure how difficult and frustrating the life of a literary critic must be in future times, when all the world's poets have gone on line and consequently no longer keep their scratch versions around...)
in motion mental processes of adaptation. Once the novel problems have been solved, the techniques used in their solution may be externalised into the physical environment, so as to open up a cognitive space in our mind for further enhancement of our creative acts, very much like what happened when we de-linked the computer tool from the limited-purpose physical artifact it had been defined as earlier. On the basis of these newly formed physical environments, new dissonances arise that lead to the perception of new problems, and so on. Here, the perspectives visualised in the works of M. C. Escher become of relevance; one may also think of the paradoxes outlined by R. D. Laing in his famous 'Knots', or of the paradoxes of Zen which, if resolved, are believed by the proponents of this spiritual order to lead to deeper insights, and to take those who have succeeded onto higher planes of cognition, resulting in a more balanced and harmonious ecological integration. (In our volume, some of these aspects are reflected in the chapters by Kunii and Boden; also Biocca's notion of the 'evolution of cognitive abilities' and 'intelligence amplification' belongs here).

What can we learn about the human mind by studying how it externalises itself?

It has long been the feeling of many people that the products of one's mind, one's mental 'externalisations', tell us something about their origins. Graphology is by many considered a science that, on the basis of handwritten text, can say something about the writer's personality. We consider Wagner's oeuvre to be the true expression of the Germanic mind, for better or worse (if we believe in such a thing, that is). Similarly, we think of Liszt's music as characteristic of the playboy type that he represents for us: brilliant, but superficial and emotionally shallow.
The question of course is how many of these externalisations are in fact internalisations of earlier produced judgements - judgements that may wholly or in part have been provoked by considerations that were external to the externalised product. We may or may not like Poles or Germans, and consequently we think of Chopin or Wagner as 'typical' for our likes or dislikes. In this way, the externalised mind becomes superordinate to the internalised one: we become the slaves of our own mental products. With this proviso, viz., that we quite possibly learn nothing new about the mind, but rather replicate what is already there, albeit in an implicit form (see Boden, this volume), perhaps the most important property we can identify by looking at the mind's ways of externalising itself is its enormous versatility and resourcefulness in dealing with obstacles. It is as if the human mind were some kind of amoeba: when it encounters an obstacle, it internalises it and represents it as something mental, no longer 'out there', and consequently tractable by a mental operation (often called 'wishful thinking'), just like the amoeba digests its adversaries by engulfing them and absorbing them into its own system. Conversely, what an externalised technique-cum-tool also tends to reveal is the existence of stages (cognitive or physical) inherent in human evolution. Thus, each tool reveals the particular human thresholds which it is designed to help us transcend, often with a hidden vengeance. To this issue we now turn.

How does the use of externalised mental constructs (called 'tools') change people fundamentally?

What we said above is of the utmost relevance for our discussions on how to define the relationship between the humans and the tools they make (including the most versatile tool of them all, the computer). The tool is both an affordance (in the
Gibsonian sense; Gibson, 1979) and a limitation. It is an extension of the mind inasmuch as it is mind externalised. But insofar as it is externalised (that is, a material thing), it is also marked by the inherent limitations of matter. In other words, it is an object among other objects, and is treated as such. The tool is, then, not only a means of liberating the mind; as an object, it is liable to the same 'fetishising' (to use a Marxian expression) that other objects are. We believe objects to have power because we either have created them in our image, or (as objets trouvés) have 'found them in our image', in the double sense of the word: 'found' them, like the primitive native who finds a stick and believes it's a god, and 'found' them, in the sense of finding them to be like us: 'And ye shall be as machines', as Hartmut Haberland puts it in his chapter (cf. also Mey, 1984). The fundamental change in the human occurs when he or she no longer considers the materiality of the tool as a subordinate property, but is intent on making it shine in all its material splendor (like Aaron polishing the Golden Calf). 4 Or, worse still, when He or She becomes subordinated to It, with often quite unforeseen consequences (cf. Piercy, 1990).

To what extent does human interaction with technology serve as an amplification of the human condition, and to what extent does it lead to an atrophy of the human mind?
Every device that has been invented to transcend human weaknesses has occasionally (sometimes as the rule) been perverted to promote, rather than cure, those weaknesses, or create other, related (and worse) weaknesses. Take a simple invention such as clothes. They were destined to keep people warm, hence more resistant to sickness. At the same time, clothes remove some of the natural resistance that the body has to temperature changes, and make it more prone to illnesses such as colds and infections of various kinds (see also Goldstein, Biocca, this volume). Or take the automobile, originally invented to let people travel in comfort and with greater speed and efficacy to their destination. Today, the car is an instrument of purposeless torture for many people trying to get to their work in the morning and having to sit on the freeway in noxious fumes for hours on end, or take the car to the workplace an hour ahead of time and eat their breakfast in splendid isolation in the carpark, rather than in the bosom of the family. And think of what the car does to its regular occupant's physical fitness! As far as the computer is concerned, the most egregious case of perversion of its purpose has been the so-called simplification of office routines. It was said that the computer inaugurated the 'paperless office': no more mindless copying by hand or by spirit duplicator, no more generation of reams and reams of useless memoranda and standard letters; everything would be kept in the computer, and only brought forth when the necessity arose. Now look what we've got: more paper than ever... Another instance of the computer's ambiguous delivery on its promises is the ease with which one now can produce relatively nice copies of one's work; this ease perverts into a need to produce perfect instances of whatever piece of insignificant office procedure one has to put out.
4 This is, in a nutshell, computer fetishism, the inherent and endemic illness of all computer programmers and computer fans.

Similarly, spelling checkers (which originally were intended to help one spell correctly) now tyrannize us into spelling everything the same way, and do not allow us to distinguish between a draft (where spelling errors are irrelevant) and a final document (e.g. a project description that has to go to some Research Council or other authority). What was supposed to make life easier and more meaningful has made life much harder and much more meaningless. And the reason? We have not been able to distinguish between the different 'rationalities' that are built into the machine (to borrow, and expand on, Max Weber's (1923) classical distinctions): the machine's own limited 'object' rationality (Sachrationalität: what can this machine do?), and our own, also limited, 'subject' rationality (Zweckrationalität: what can we do, what do we want to do, and why?). Furthermore, we must ask ourselves: do we really want it, or do we just want it because it's there, or because it's possible? Which leads us to the ultimate rationality: the unlimited 'common' rationality of society, also known as the common good, but most often perverted to stand for the good of one particular class of people, say computer manufacturers or network freaks or hackers or criminals of various kinds. Fearfully we ask ourselves: Will the same adverse fate await our expectations of an amplified intelligence, of increased creativity, and of any other similar promised cognitive improvements of the Information Age?

The computer as a tool: Catastrophe, turning point, or both?

Karl Marx, in one of his caustic asides on the benefits of industrialisation, observes how with the advent of machines and increased productivity, the laborer not only is pressed to the utmost, but actually risks being killed by that super-tool, the machine: 'The tool kills the worker' (Das Werkzeug erschlägt den Arbeiter). How is this possible? Isn't it the case that the tool helps us achieve things more easily, fulfil our duties with more precision and speed, and allows us to have more free time on our hands (after all, the work is done faster, and with less expenditure of energy)?
It behooves us to recall what has been said about that housewives' blessing, the vacuum cleaner. In the beginning, when people first acquired this new gadget, there was undoubtedly a whole bevy of benefits that followed in its wake: houses became cleaner than they had ever been before, cleaning times were but a fraction of what they had been earlier, no more bent backs and varicose-veined legs. But with the advent of the clean house, the ante was upped, so to speak. And what earlier had been an exception (witness the expression 'Easter clean', meaning an exceptionally clean state of affairs, to be achieved only at Eastertime or Passover, a tradition which still exists in a number of cultures, such as orthodox Jewry), now becomes the rule and the standard. And that is not the worst part of it. Not only has the rule of the game been changed, the game itself has got a new definition. What earlier had been a merit, now becomes a duty. What had been a task, now is a chore, to be performed at least once a day, and by increasingly more laborious and complicated methods, as not only the mental ante is incremented, but also the tool itself increases its level of perfection and technical complexity. The toolness of the tool, measured either in abstract, calculable terms (size of RAM or ROM, 16/32 bit processor, various operating systems and 'development environments', and so on), or in terms of outer appearances ('sleek form', 'aesthetic 3D-look', 'photo-realistic graphics', 'advanced' whatever) becomes more important than the uses for which it was originally created. Furthermore, this 'toolness' passes itself off to the mind as the only natural state of affairs for humans as well: we are to be measured in relation to how well we function as appendices to our tool. For example,
it is no longer important just to have a computer that works, and serves as a tool for our purposes (however limited and modest): we need to have the tool's latest version, because that's what computers are at today (and besides, we can't get spare parts or service for our old dinosaur any longer, so we simply have to buy an expensive new, shiny monster). Even if we are rank and file amateurs, when it comes to buying a computer, we insist on purchasing, along with it, professional quality software - or 'industrial strength C/C++ code', as one ad has it (Dr. Dobb's Journal, May 1995) - much of which the majority of us will never have the faintest chance of putting to any decent use. Contrary to what someone might think on reading this, the above is not a Luddite plea for more primitivity. Rather, it is a plea for reflection on what a tool is, and how the computer tool, if we want to use that metaphor (or for that matter, any other metaphor) should be conceived of. The word 'conceived' is used with a vengeance here: a conception, viewed as an act, a process, rather than as a product ('a concept'), is a human work of space and time and use. That which was conceived, needs to be borne until fruition; but the story does not end there. The 'right to life' of the concept, once born, does not terminate at birth: the concept, the metaphor must grow in the environment in which it was conceived and born, and in which it was destined to be used. The way a concept develops is in its use; and it is through its use that it gets 'worded' (see Mey, 1985:166f). After all, a thought is not a thought until it has been expressed in proper language, to quote Marx (and Engels) again (The German Ideology). Vice versa, once the thought has been worded, and the conceived notion has been 'given' something verbal to wear (this 'giving' should not be taken in too passive a sense), the words themselves become important metaphorical agents (not to say tools).
It is often said that 'words don't break bones': we beg to disagree. Words, in general, are the 'lineaments of our verbal gratifications', to vary Blake (from his Note Books); we kill for words, like we kill for partners and food, and conversely words may kill us, like gratifications do, when they are not kept in their proper time and space (as David Good remarks in his contribution to this volume). The historical vicissitudes of the concept of geocentrism, with its associated metaphors, furnish a good illustration: in the abstract, Galileo's beliefs and his wordings were scientifically gratifying; in their concrete form, however, they were a matter of life and death, and he had officially to recant and swear to being convinced that the sun rotated around the earth rather than vice versa, in accordance with the geocentric metaphor. Conversely, the competing metaphor, heliocentrism, while it may have won out on the scientific battlefield, does not play any significant part in our daily lives: we still talk about sunset and sunup, and the sun rotates around the earth, as it has always 'done'. The reason? It's the way we have conceived of things way back, in a more 'primitive' stage of our existence; the primary sensation of seeing the sun rise has captivated our language, and subsequently language has captivated our mental perception. 'Once you start, you're right inside the thing: the rhetoric has you, language implicates you in the lie right off' (McInerney, 1993:108).
Does language, then, shape our minds? Not directly; neither is this what Whorf was thinking, when he formulated his famous thesis about language's influence upon the shaping of the human thought and mind (cf. Whorf, 1969). What we do have, and always have had, is a 'working relationship' with language: if it has us by the tail, the reason is that our tails are language-shaped, like the rest of ourselves. We came
into being, we were conceived, in a linguistic environment, and our being carries the imprint of that original Language (which is not necessarily any particular idiom, but that which Marx would have called Gesamtsprache, the 'universal language', had he had his wits about him when he wrote on the subject). So Whorf was right about language, but only on condition that we make his 'concepts' work for their 'living': humans and concepts shape one another, because language and thought, being 'conceived' together, must needs live and work together.

From tools to words

One may ask how we came to shift our previous emphasis on tools to one on words and language. Consider the three distinct stages of tool evolution that have been progressively separating the human mind from its natural habitat, as illustrated in figures 1, 2, and 3.
[Figure 1 diagram. Recoverable labels: natural cognitive environment; manipulation?; evolution?; natural physical environment]
Fig. 1. The original feedback loop
As the human mind evolved (figure 1), natural cognitive environments generated tools for a direct manipulation of natural physical habitats. These modified environments then began to feed back into the natural cognitive environments. Since tool use was relatively minimal, the feedback effect from tools to minds was minimal too. It was the human encounter with nature, characterised by its own, inherent dynamism, that was originally responsible for our mental growth. With tools getting more and more sophisticated, and increasing in number, they became themselves the immediate environment for the mind's dialectic encounters, as shown in figure 2. This state of affairs led to an ever more pronounced detachment and a more forceful alienation of humans from the living matter which earlier had been their predominant partner in interaction, thus entailing a growing gap in their emotive, cognitive and biological adaptation. Alienation is the predominant condition of urban people in their fabricated worlds of everyday utilities enhancing their human physical or mental characteristics. Our manual skills and many bodily functions, once directly responsive to the rhythmic dynamics of nature, now thrive on the sounds, the looks, and the behaviour of purely technological devices. Here, we're all in the same boat, on
our way to the controlled environments of a 'virtual reality': latter-day 'feelies' of a brave, new world and a perhaps not-so-future 1984-ish universe.
[Figure 2 diagram. Recoverable labels: natural cognitive environment; manipulation?; amplification?; fabricated physical environment; natural physical environment]
Fig. 2. The intermediate feedback loop
One of the most notable creations in this phase was the introduction of (precious) metals as market exchange tools, followed some fifteen centuries later by the invention of bank notes. This paved the way for the competitive invention of other tools, useful to society, with the inventor becoming an investor, whose reward was stored in the form of added exchange value. The monetary tool (technically called the 'general equivalent') thus also unleashed greed. Greed created the need for increased profitability and started the mad rush for effectiveness; both were successfully taken care of by all-pervasive and all-embracing business enterprises. As far as the human mind was concerned, the monetary tool affected and distorted our sense of values: sellable things (now called 'commodities') stopped being appreciated for any other values than their market value. ('The value of a thing/Is the price it will bring', as the Classical Economists used to say). Money, having become a precious object for possession, established itself as the greatest asset in its own right, ungracefully subordinating everything else, including our morality, to itself. (One of the first to draw attention to this 'consciousness-perverting' influence of the invention of money by the 8th century B.C. Greeks was the German-British philosopher Alfred Sohn-Rethel (1972, 1978). See also Lindsay, this volume, on the importance of these 'other' values for a satisfactory socio-cognitive interaction). It is beyond any dispute that neither emotions nor intimacy go hand-in-hand with greater profits and increased productivity. The fabricated worlds for mass consumption ensure little of the former, and instead make the mind concentrate its attention on the tasks at hand formulated by the latter. (The cognitive effects of fabricated worlds are discussed by Gorayska and Lindsay (1989 & 1994) within the framework of their
'Fabricated World Hypothesis', and by Gorayska and Marsh, this volume.) Such a fabrication has increasingly done away with warmth-generating, hand-crafted aspects in design. Unexpected asymmetries and imperfections, unpredicted gentle curves or crooked lines - the hallmarks of life and character - have given way to machine-generated, straight and square, uniform, predominantly sky-scraping, lifeless - you name it - jungles of plastic and concrete for human dwellings and work: everlasting monuments of optimal rationality. 5 Where standardisation and transportability of skills across tool-use rule the day, cultural diversity disappears from view, and many travellers no longer derive their creative inspiration from visiting 'foreign' lands. Next, following the expansion of mechanical tools, the computer arrived on the evolutionary scene. Unlike the other mechanical devices up to that time, the electronically mediated information tool externalises some of our known cognitive abilities. This tool, therefore, fabricates human cognitive environments, as illustrated in figure 3. The human mind, finally having found a way of turning upon itself, in so doing turned against itself, as it were.

5 Remember what happened to the cheerful hues of colour in the former totalitarian regimes of the East? They all became strange shades of grey: blue grey, green grey, pink and red grey, yellow grey; dull and subdued.

[Figure 3 diagram. Recoverable labels: natural cognitive environment; evolution; manipulation?; amplification?; atrophy?; manipulation; fabricated cognitive environment; fabricated physical environment; physical environment]
Fig. 3. The modern feedback loop.
Humans express themselves through words and bodies alike (Arndt and Janney 1987; see also Krueger, this volume). Verbal language directs attention to the relevance of largely unconscious, sensory exchanges which it cannot substitute for, only complement (Lindsay and Gorayska, 1994). Inputs and outputs, transmitted by the senses of touch, smell, hearing, and vision, need to be integrated in meaningful ways so that appropriate responses in contexts can be generated (Sperber and Wilson, 1989; Mey, 1993). The task of the conscious mind has been an active, cognitive search for congruity in this sensory intake, a concern with what the Scholastic philosophers, following Aristotle, called the sensus communis, or 'common sense'. Our sensitivity to the varying degrees of such a congruity, which previously allowed us to use our common sense to distinguish reality from fiction, now takes on quite the opposite value; in modern, computer-driven environments, the implied denotation of 'common sense' no longer is to do with congruity in variety, but has come to stand for
uniformity in singularity. Note that here, too, optimising rationality has taken its toll. It used to be the case, as we said earlier, that our handwritten symbols, with their varying shapes, served as the paramount tool for expressing human emotions and personalities. The same can be said of the vast richness of tones in the spoken medium; today, these riches, too, are a matter of the past. What we are left with is a unified type 6 (in all senses of the word), good for nothing more than the mere exchange of information. Adopting the role of exchangers of information, we have adapted ourselves to the very name coined for the Age. And there is more: By exchanging and manipulating electronic information over long distances (sometimes called 'telematics'), we are able to connect people, and connect with people, in all sorts of distant places. A true slogan for our times could be: Telecommunicators of the world, unite! But how many of us stop to consider that this modern facility also makes possible, on a global scale, a separation where previously none existed, nor should, or would have been? Unless we exercise proper care, our global village, McLuhan's dream, will be turned before our eyes into a Searlian 'SuperChinese Room', the very 'Hermeneutic Hall of Mirrors' 7 that Harnad (1990) warns us against. There, nothing is found except ungrounded symbols which, even if we were able to interpret, we could not really understand - for the precise reason that such symbols would not have been acquired through a shared, real world experience (see also Good, Biocca, and Janney, this volume). Nobody would wish to deny that the tools we use have originated in acts of human creation, or that many of them embody great scientific achievements. We also grant that those inventions have been mostly well-intended.
The typist first got a typewriter, then a flexowriter, and finally a word processor; the bookkeeper got a Hollerith, then an electric book-keeping typewriter, and finally a spreadsheet and other sophisticated software; the manager kissed his secretary goodbye and got a decision support system and a laptop; the accountant got a spreadsheet; the learned got their files and archives; the readers got their Hypertext, and any old artist (self-styled or officially recognized) can now create, at the touch of the keyboard, shapes and colours previously undreamt
6 We must not let ourselves be fooled here by the recent invention of the 'notepad' computer, which supposedly learns to recognise our handwriting on a touch screen. As many have observed, our handwriting tends to adapt quickly to the expectations built into the machine. This process reminds one of the opposition that exists between what has been called 'adaptivity' (adapting humans to tools) and 'adaptability' (adapting tools to humans; Mey 1994).
7 Compare to the glass walls of many modern skyscrapers, in which all you will see is at best your own reflection, or the reflection of other skyscrapers (which, in fact, may be a lot more interesting, as anyone knows who has strolled the streets of downtown Toronto on a sunny day).
of. (On the cognitive benefits of electronically-mediated communication, in particular among scientific communities, see Harnad's chapter; for a critique of the advantages of Hypertext, see the chapter by McHoul and Roe). In all this, there is a 'but': by using these tools, we have tacitly said 'farewell' to both our control of the mental means of production (consult Roberts, this volume, for some experimental evidence), as well as to our sole ownership of the externalised objects that are the result of that production; none of these creations can any longer be said, or seen, to be of our own making, or reside within the domain of our personal decision-making. The quality of our creative and analytical thought becomes increasingly dependent on the availability and skills of technicians, support people, software engineers, and providers of electrical power, to mention but a few. Take these away, and where would we be? We gained the world, but lost our souls (to paraphrase the Bible), if we didn't outright sell the latter down the river, just like olim Doctor Faustus. And rather than adapting the media to us, we have adopted every one of their quirks and idiosyncrasies. Few are today the people who are able to think, and form their thoughts sequentially, in sentence form; it all has become a matter of jotting down and sorting out on the screen, with the help of a thought organiser or even - God forbid - a thought generator. 8 (Examples of how computerised tools can be used with a minimal cognitive dependency trade-off for the benefiting individual can be found in Kunii's contribution to this volume.) Tool-generated deficiencies in human make-up have always, and often quickly, been tool-corrected. The evident lack of natural nutrients in machine-produced, artificially-fertilised food has led to the invention of synthetic substitutes. Rather than stopping the process of refining our flour, we are putting its original roughage back in as a precious extra.
Lead-free petrol was sold as an innovation, hence used to be more expensive; but why did we put the lead in in the first place? Our waning physical condition is corrected by the invention of 'fitness centres'; but why did we stop walking? And so on and so forth. But our thinking depends, as it always has done, on the senses; hence, in order to obtain the proper food for our thoughts, we have to rely on our natural, different sensual demands, rather than settling for the impoverished fare that we are standardly offered ever since modern society has forced us to rely on its artificially diversified input sources. With computers arriving on the scene, we are witness to (not surprisingly) the prompt advent of multimedia delivery systems, or so-called 'virtual realities', which promise to repair, by artificial and not-always-advantageous means (Biocca, this volume), the fading senses, and restitute our last, vital, 'missing link' to the outer world by our total, symbol-free immersion in a faked sensory experience (as described by Fogh Kirkeby and Malmborg, this volume). And the final result of it all? Not only does Harnad's Hermeneutic Hall replace the familiar Tower of Babel, but this development, being uniquely solitary and only falsely gregarious in character, turns all of us into solipsists in reverse. A corollary of the above is the emergence of a new perception of the Universal Mind: No longer is it the Big Unknown: it has taken shape before our very eyes as an externalisation of our own minds. No longer are we talking about the 'mind in the machine'; the vital question on the agenda is now that of the effects of the machine on
8 In the early days of AI, one of us had a friend who, in the Preface to his dissertation, remarked that, since this was a dissertation in AI, it properly should have been written by an intelligent machine...
the mind, and the resulting symbiosis of the mental and the physical: 'Of Minds and Men' ... in the Machines!

A GUIDED TOUR THROUGH THE INDIVIDUAL CHAPTERS

Based on the above, we want preliminarily to single out the following themes among the topics selected by the contributors to our book:
• using technology to empower the cognitively impaired (Goldstein, Leong, Clubb & Lee)
• the ethics versus aesthetics of technology (Krueger, Lindsay, Fogh Kirkeby & Malmborg, Gorayska & Marsh)
• the externalisation of emotive and affective life and its special dialectic ('mirror') effects (Janney)
• creativity enhancement: cognitive space, problem tractability (Boden, Good, Harnad, Krueger, Kunii, Chan, Tamura)
• externalisation of sensory life and mental imagery (Biocca, Fogh Kirkeby & Malmborg, Krueger)
• the engineering and modelling aspects of externalised life (Burstein & McDermott, Haberland, McHoul & Roe)
• externalised communication channels and inner dialogue (Good, Harnad, Heath, Kasif & Salzberg, Krueger, Lindsay, Littman & Eisenhardt, Roberts, Sillince, as well as Clubb & Lee)
• externalised learning protocols (Cox, Gorayska & Marsh, Kass, Burke & Fitzgerald, Schank & Szegö, Sillince)
• relevance analysis as a theoretical framework for cognitive technology (Gorayska & Marsh, Lindsay)
The above list is just a first approximation; more details will be provided below, where we take the readers on a guided 'walk' through the book's chapters, as these are grouped together in their appropriate sections. The chapters fall more or less naturally into two groups: one of a more general, theoretical type, the other dealing with specific, concrete cases and problems. Of the altogether 25 chapters (not counting the Introduction), almost one third (8) fall into the first group, while the remaining 17 make up the second. Each group of chapters has been divided into a number of thematically coherent sub-sections.
Theoretical issues of cognition, modeling, mental tools, and agents

Cognition

Barbara Gorayska & Jonathon Marsh (City University of Hong Kong and Hong Kong University), in their chapter 'Epistemic Technology and Relevance Analysis: Rethinking Cognitive Technology', raise the issue of changing goals in a quasi-familiar environment. What is 'new' in the new technology, they ask, and how does the mind react to the new 'superimposed structures'? They raise this issue from the point of view of the 'technologised mind', rather than (as has been done so far) from the angle of the human-friendly tool with its affordances on action (as in HCI, 'Human-Computer Interaction'). Both Gibson's (1979) direct realism in accounting for ecological perception, from where the idea of action affordance has been imported to HCI, and the current trends in HCI to treat action affordance in purely functional terms, leave some fundamental questions unanswered, viz.: (1) 'What causes a perceiving agent to attend to a particular set of stimuli to begin with?', and (2) 'How are affordance characteristics mapped directly onto the process of cognitive formation itself?' Unless we answer these questions, the authors maintain, we will not gain real understanding of the process that enables meaningful interactions of agents with environments, nor will we be in a position to understand how environments shape our thinking.
The theme of innovation is also one that haunts Ole Fogh Kirkeby and Lone Malmborg (Department of Computer and Systems Science, Copenhagen Business School, Denmark). In their contribution 'Imaginization as an Approach to Interactive Multimedia', they insist on the necessity of reflection in order to be able to produce innovation. This reflection takes the shape of 'mental images' that can be stored interactively, and anchored in what they call 'situated cognition', using multi-media technology. As there can be varying degrees to which multi-media technology supports reflection and image creation, the question arises whether it is at all possible to combine these different modes of interaction without one destroying the cognitive effects of the other.
Frank Biocca (University of North Carolina, Chapel Hill, N.C., USA) raises the question: 'Can Virtual Reality Amplify Human Intelligence?', and considers, as part of the answer, the problems of 'Cognitive Extension, Adaptation, and the Engineering of "Presence"'.
The crucial issue to be raised in this connection is whether this kind of 'presence' is a matter of technology only, as many proponents of Virtual Reality seem to believe; the problem is that nobody has yet defined what 'amplifying intelligence' really means.

Modelling & Mental Tools

In his contribution 'Patience and Control: The Importance of Maintaining the Link Between Those who Produce and Those who Use', David Good (Department of Social and Political Sciences, Cambridge University, England), observes that we must be careful to distinguish between 'indulging' the user and truly benefiting him or her. The problem is that the wrong technology (as also observed by Barbara Gorayska & Jonathon Marsh) may turn out to be detrimental to the user, not only individually, but also on a broader social scale. The new technologies lead to an ever diminishing authority and control of the speaker/writer over how technologies structure the environment, which context they are interpreted in, and which needs of the hearer/reader they therefore are able to satisfy. What can be learned (if anything at all) by those who use, Good asks, when the normative effect of direct and immediate social interaction with those who produce is gone?
Hartmut Haberland (Department of Language and Culture, Roskilde University, Denmark), in a take-off on an old adage, asks himself whether it is more fruitful to model the human on the machine, or the machine on the human ('"And ye shall be like machines" - or should machines be like us?'). He points out the importance of distinguishing between simulation and emulation, and shows how all analogy, if not checked, in the end will turn out to be a circular process. Models are meaningless
unless they are grounded in direct experience. In our metaphorical effort to further understanding of both humans and machines, it is possible to model theories about the former by analogy to our perception of the latter, and vice versa. But the price we may have to pay for such visibility, Haberland warns, is that we will no longer know where to look for the meaning of either. In his contribution 'Levels of Explanation: Complexity and Ecology', Ho Mun Chan (Department of Public and Social Administration, City University of Hong Kong, Hong Kong), observes that the daunting complexity of many tasks and the seemingly paradoxical ability of the human mind to cope with them, contains a lesson for us when we are planning our cognitive environment on the computer: viz., by generalizing our assumptions about that environment, we are able to make it less complex, and easier to deal with. Machine-implemented general problem solvers are not possible for the same reason that no single human has ever been a general problem solver. What we can reasonably achieve, and should therefore strive for within the Cognitive Technology agenda, is the type of human-machine interaction that can solve a range of tractable problems in specific environments.
Agents
Margaret Boden (University of Sussex, Brighton, England), in her chapter 'Agents and Creativity', discusses aspects of creativity in a computerized environment. Her thesis is that true creativity consists of making new use of already existing components, rather than creating things ex nihilo. Since human agents are best at the former activity, our construction of a cognitive environment should aim at stimulating human creativity by facilitating access to new, unpredictable, conceptual formations generated by the computer, rather than forcing the user to adapt his/her creative élan to the machine's limitations. Myron Krueger's (Artificial Reality Corporation, Cambridge, Mass.) chapter is called 'Virtual (Reality + Intelligence)'. Exploring the relationships that exist, or may come into being, between humans and machines, the author focusses on the relation of intelligence to physical reality, including the role that intelligence technologies can play in virtual realities. For Krueger, aesthetics is a higher measure of performance than efficiency, and he therefore chooses to consider success in establishing such relationships as a form of art. (Compare with the stance taken by Gorayska and Marsh.) In contrast with what most computer scientists, and indeed intellectuals of all persuasions, believe, it is Krueger's thesis that much of our cognitive intelligence is rooted in our perceptual intelligence, and that one should therefore seek from the very beginning to reintegrate the mind and the body: one should experience a computer program with one's body, rather than through the medium of keyboard input or interaction with a data tablet or mouse. Thinking along these lines, Krueger arrives at many of the ideas developed in what is now called 'virtual reality'; he is also able to predict a variety of ways in which virtual reality and cognitive technologies (including traditional AI) are going to interrelate in the next few years.
Applying insights from CT to individual problem areas
Communication
Roger Lindsay (Psychology Unit, Oxford-Brookes University, Oxford, England) has named his contribution 'Heuristic Ergonomics and the Socio-Cognitive Interface'. He takes his point of departure in early approaches to the 'human factors' problem in HCI, and shows that such approaches fail because they focus only on the machine end of the problem, an impediment also discussed at length by Gorayska and Marsh. What is needed is an interactive approach in which the machines are allowed to interact with the human user on the latter's premises. Such a notion is close to the idea expressed by Haberland in his contribution: 'Whoever said that humans should be like machines?'; why not rather take the machines seriously as potential cognitive agents that humans can react to, and interact with, on human premises? For Lindsay, communication on human premises necessarily involves an ability to engage in a cooperative dialogue governed by a normative, ethical heuristic. Providing examples of ethical language and norms, the author defines the challenge to Cognitive Technology as the need to develop a 'social ergonomics'. The necessary parameters must be found not primarily in the physical, but in the socio-cognitive interface. Research into the potential of human interaction with computers through simulation has focused on how to produce the cognitive changes that are necessary for proper learning. Alex Kass, Robin Burke & Will Fitzgerald (Northwestern University, Evanston, Ill., USA and University of Chicago, Chicago, Ill., USA) suggest in their contribution 'How to Support Learning from Interaction with Simulated Characters' that interfacing students with practices and experiences that are embodied in a computer based learning environment can open the way for the natural acquisition of communicative skills in everyday situations; they also report on results obtained with 'educational interactive story systems'.
For these authors, the first and foremost undertaking for Cognitive Technology, if it is to maximise the benefits arising from the effects of tools on human cognition, is to build computer systems that match, in a fundamental manner, the ways people learn. Richard W. Janney (Department of English, Johann Wolfgang Goethe University, Frankfurt am Main, Germany), in his chapter 'E-mail and Intimacy', suggests that the apparent lack of restrictions on communication that is observed in a medium that otherwise imposes severe restrictions may be explained by a special type of interaction in communication: the 'virtual partnership' that is exercised in electronic mail, and which allows us to cross an 'email-intimate' threshold that normally would not allow us to interface with other users this closely. If this partnership is to realise the strong hopes formulated by McLuhan (of which Janney reminds us), viz., that one day electronic technology will follow directions which are not only socially unifying but above all humanly satisfying, the need, and promise, of today's Cognitive Technology is to find the right balance between technology and experience. Thresholds of communication are also the subject of the next contribution: 'Communication Impedance: Touchstone for Cognitive Technology', by Robert Eisenhardt and David Littman (SENSCI Corporation, Alexandria, Va., USA, and Advanced Intelligent Technologies, Ltd., Burke, Va., USA). The authors ask themselves: What can go wrong in computer communication? For an answer, they hypothesize that computers lack the human capacity of detecting potential
communication failures before they arise, thus preventing the occurrence of 'impedance' in the communicative chain. The problem, being computer generated, needs to be solved by means of the computer, which is what the authors set out to do: a practical Cognitive Technology, they claim, has to result in development tools that would take it far beyond a mere theoretical curiosity or a handbook of design heuristics.
Education
Among the applications of CT to problems of daily life, endeavours in the educational sector have a high standing, both historically and content-wise. Kevin Cox (Department of Computer Science, City University of Hong Kong, Hong Kong), in his chapter 'Technology and the Structure of Tertiary Education Institutions', takes up the challenge thrown out by Kass, Burke & Fitzgerald in their chapter: how can the computer assist us in making education better, and more accessible to users? Computers, he answers, have the ability to help structure cognitive environments which are both closer to the users and allow them to be physically absent (both in space and in time) from the location of the educational practice, thus revolutionising our concept of 'schooling' as bound to a particular phase or location of a person. This favorable view is in contrast with David Good's more cautious outlook on computer assisted learning. Orville L. Clubb and C. H. Lee (Department of Computer Science, City University of Hong Kong, Hong Kong) are involved in a project aimed at developing a telecommunication device that will allow Chinese hearing impaired users access to the information networks available to users of Roman characters. In their contribution, 'A Chinese Character Based Telecommunication Device for the Deaf (TDD)', they investigate how the appropriate infrastructures can be provided in order to develop an interactive telecommunications service for Hong Kong, and perhaps in the future, for Mainland China as well. A prototype for such services has been developed and is described. The next contribution deals with aspects of another impairment, blindness, when viewed from a cognitive technological viewpoint. Laurence Goldstein (Department of Philosophy, Hong Kong University, Hong Kong) investigates the theoretical implications of 'Teaching Syllogistic to the Blind' - a teaching which normally (in the case of sighted people) is done with the help of visual aids, such as Venn diagrams.
The author introduces Sylloid, a tactile device invented by himself, and discusses practical problems arising from its application. The important question to which Goldstein draws our attention is what such an effort can teach us with regard to the normal functioning of the human cognitive/sensory system, and what pedagogical inferences can be drawn. C.K. Leong (Department for the Education of Exceptional Children, University of Saskatchewan, Saskatoon, Canada) discusses the implications of computer-mediated reading and text-to-speech conversion systems, designed to enhance reading. His chapter 'Using Microcomputer Technology to Promote Students' "Higher-Order" Reading' consists of a theoretical part, in which certain fundamental notions are discussed (such as the principles of 'automaticity' and 'compensation'), and a practical study of the results obtained in using an advanced computerized text-to-speech system (DECtalk) in working with below-average readers in grade school. The author
believes, along with others quoted, that, due to the 'unnaturalness' of reading on-line and the complexity of reading and listening comprehension (among other factors that may also intervene), the pros and cons of computer-mediated reading will have to be appraised carefully before we can be certain of the conditions under which this particular mediation is helpful.
Planning
Mark Burstein and Drew McDermott (Bolt, Beranek & Newman, Cambridge, Mass., USA; Department of Computer Science, Yale University, New Haven, Conn., USA) discuss 'Issues in the Development of Human-Computer Mixed-Initiative Planning'. Mixed-initiative systems allow humans and machines to collaborate in planning, and, mainly, they allow the machine to suggest possibilities that the human user may not have thought of. In a productive synthesis, humans and machines can obtain 'synergistic improvements' in the planning process. The authors discuss what kind of multi-agent technology is most suitable from a cognitive-technological viewpoint. They believe that, in contradistinction to the world view of traditional AI, designers of cognitive technology tools must recognise and accept the fact that real life mixed initiative planners operate in unstable environments; the participants will fight back if they need to, but most of all they can be made to actively collaborate. In their contribution 'Committees of Decision Trees', David Heath, Simon Kasif, and Steven Salzberg (Department of Computer Science, Johns Hopkins University, Baltimore, Md., USA) attack the problem that besets the decision maker when he/she is dealing with pieces of evidence that have to be assigned different weights. In such a case, expert opinion is invaluable; but what to do if the experts disagree? A 'committee approach' is suggested that allows us to proceed with greater accuracy than when we have to rely on a single expert opinion. Learning how to deal with your problems, and how to plan, not so as to prevent them from coming up, but to learn from them while you look around for a solution, is the theme of Roger Schank and Sandor Szego's chapter, entitled 'A Learning Environment to Teach Planning Skills'.
It is the authors' conviction that the usual school teaching only serves to suppress and kill any desire for true learning that the students may have had; the computer can help us restore the old learning environment, favoured also by Good, where teacher and student interacted on a one-to-one basis. The particular instrument for this teaching of planning is called a 'goal-based scenario' (GBS); a concrete application is worked out in some detail.
Applied Cognitive Science
Tosiyasu Kunii (The University of Aizu, Aizu-Wakamatsu, Aizu-chi, Japan) remarks that human cognition has suffered from computer dominance for as long as we have had computers. It is time, he says, in his contribution on 'Visual Recognition Based on a Differential Topology of Feature Shapes', to reverse the roles, and examine how cognitive technology can help and enhance human cognitive processes. It is shown that the most effective technology is also the most abstract one; several examples are discussed. 'Is There a Natural Readability?' is the question authors Alec McHoul and Phil Roe (School of Humanities, Murdoch University, Murdoch (Perth), Western Australia) ask themselves in their chapter on 'Hypertext and Reading Cognition'. It turns out that this
notion is open to serious questioning, and that readability as such does not exist prior to the technologies that facilitate reading and make it possible. However, since reading itself is a (cognitive) technology in its own right, it is over-optimistic and at any rate premature to expect saving graces to be inherent in pure technology-inspired efforts at enhancing readability (such as Hypertext). Hiroshi Tamura (Department of Information Technology, Kyoto Institute of Technology, Kyoto, Japan) has done a comparative study of 'Verbal and Non-Verbal Behaviors in Face-to-Face and TV Conferences'. His finding is that, contrary to expectation, the use of TV in remote conferencing has not enhanced communication; more factors need to be explored, such as the difference between private and business communication, the role of the non-vocal channel, and so on. A model has been developed for the analysis of conference participants in various modes. The question which John A. A. Sillince (Department of Computer Science, University of Sheffield, England) invites us to consider is: 'Would Electronic Argumentation Improve Your Ability to Express Yourself?' He points out that the advent of electronic environments raises the challenge for us to discover in what ways, and to what extent, humans can gainfully use computer support in order to enhance their quality of argumentation. There is always a trade-off when new technologies enter the human working-space: more knowledge may result in overload, multifarious connections in confusion, and so on. Several hypotheses are drafted, intended to capture the pros and cons of technological assistance in arguing. Special attention is given to problems of 'asynchronicity', especially in remote discussions.
In 'Shared Understanding of Facial Appearance- Who Are The Experts?', Tony Roberts (Department of Psychology, University of Southampton, England) explores the effect of introducing an 'expert' computer into a situation where people are trying to communicate about facial appearance, e.g., where a witness to a crime may be trying to help the police by looking at mugshots. In the experiment reported, the assumed level of involvement of the computer system used was varied systematically between two groups of participants. Those in the 'expert system' group were significantly less effective in identifying the correct face. Roberts argues that we rely on shared understanding of categories of facial appearance in such situations, and that assumptions about the role of the computer in the loop serve to disrupt this subtle aspect of communication. The book closes on an optimistic note from Stevan Harnad (Cognitive Sciences Centre, Department of Psychology, Southampton University, England), thus directly counterbalancing the pessimism expressed by McHoul & Roe, as well as Sillince's scepticism. In his contribution: 'Interactive Cognition: Exploring the Potential of Electronic Quote/Commenting', he draws attention to certain unnoticed, subtle but potentially revolutionary, changes that have evolved with the advent of electronic communication. In the traditional forms of communication, the speed of exchange is often either too fast (oral medium) or too slow (written medium). Email, and what Harnad has dubbed 'scholarly skywriting' (i.e., email discussion lists), together with hypermail archives and links to the virtual library, have opened up new doors for learned inquiry as well as for education, and blazed new paths in the exploitation of the human brain's potential. 
Among these new features, several are found that no prior medium has made possible; this holds in particular for the 'text-grabbing' option, called 'Q/C', that allows one to quote, and comment on, pertinent excerpts from previously
read texts. Harnad describes a possible series of studies that would need to be done in order to convincingly demonstrate the potential of the Q/C feature. In many respects, knowledge building, though cumulative and often collaborative, has been largely the work of 'cognitive monads'. 'Skywriting' facilitates a form of interactive cognition in which the monadic boundaries rapidly dissolve in Q/C iterations that have the flavour of a fast-forwarded recapitulation of the ontogenesis of knowledge; in this process, the identities of the individual thinkers get too blurred to be sorted back into monadic compartments.
REFERENCES
Arndt, Horst and Richard W. Janney, 1987. InterGrammar: Towards an integrative model of verbal, prosodic and kinesic choices in speech. Berlin: Mouton de Gruyter.
Diels, Hermann, 1899. Fragmente der Vorsokratiker. Berlin: Teubner. (7th ed. in 3 vols. by W. Kranz, 1954)
Gibson, James J., 1979. The ecological approach to visual perception. Boston, Mass.: Houghton Mifflin.
Gorayska, Barbara, 1993. Reflections: A commentary on 'Philosophical implications of Cognitive Semantics'. Cognitive Linguistics 4(1):47-53.
Gorayska, Barbara and Roger Lindsay, 1989. On Relevance: Goal dependent expressions and the control of action planning processes. Research Report 16. School of Computing and Mathematical Sciences, Oxford Brookes University, UK.
Gorayska, Barbara and Roger Lindsay, 1993. The roots of relevance. Journal of Pragmatics 19:301-323.
Gorayska, Barbara and Roger Lindsay, 1994. Towards a general theory of cognition. Unpublished MS.
Harnad, Stevan, 1990. The symbol grounding problem. Physica D 42:335-346.
Kirk, G. S., 1954. Heraclitus. The cosmic fragments. Cambridge: University Press.
Koestler, Arthur, 1964. The act of creation. London: Hutchinson & Co. Reprinted by Penguin Books: Arcana 1989.
McHoul, Alec, 1995. The philosophical grounds of pragmatics (and vice versa?). (Submitted for publication, Journal of Pragmatics).
McInerney, Jay, 1993. Brightness falls. New York: Vintage.
Mey, Jacob L., 1984. 'And ye shall be as machines...' Reflections on a certain kind of generation gap. Journal of Pragmatics 8:757-797.
Mey, Jacob L., 1985. Whose Language? A Study in Linguistic Pragmatics. Amsterdam & Philadelphia: John Benjamins.
Mey, Jacob L., 1987. CAIN, and the transparent tool, or: Cognitive Science and Human-Computer Interface. In Proceedings of the Third Symposium on Human Interface, Osaka 1987, pp. 247-252. (Japanese translation in Journal of the Society of Instrument and Control Engineers (SICE-Japan) 27(1), 1988)
Mey, Jacob L., 1993. Pragmatics: An introduction. Oxford: Blackwell.
Mey, Jacob L., 1994. Adaptability. In: R. Asher & J.M.Y. Simpson, eds., The Encyclopedia of Language and Linguistics, Vol. 1, 265-27. Oxford & Amsterdam: Pergamon/Elsevier Science.
Mey, Jacob L. and Hiroshi Tamura, 1994. 'Barriers to communication in a computer age'. AI & Society 6:62-77.
Piercy, Marge, 1990. He, She and It. London: Fontana.
Sohn-Rethel, Alfred, 1972. Geistige und körperliche Arbeit. Zur Theorie der gesellschaftlichen Synthesis. Frankfurt am Main: Suhrkamp. [1970]
Sohn-Rethel, Alfred, 1978. Intellectual and manual labour: A critique of epistemology. Atlantic Highlands, N.J.: Humanities Press.
Sperber, Dan and Deirdre Wilson, 1986. Relevance: Communication and cognition. Oxford: Blackwell.
Weber, Max, 1950. The Protestant ethic and the spirit of capitalism. New York: Charles Scribner's Sons. (Engl. tr. by Talcott Parsons of: Die protestantische Ethik und der Geist des Kapitalismus. Archiv für Sozialwissenschaft und Sozialpolitik 20-21, 1904-1905.)
Whorf, Benjamin L., 1969. Language, thought and reality. (Selected Writings, ed. John B. Carroll). Cambridge, Mass.: MIT Press. [1956]
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 1
EPISTEMIC TECHNOLOGY AND RELEVANCE ANALYSIS: RETHINKING COGNITIVE TECHNOLOGY
Barbara Gorayska City University of Hong Kong
csgoray@cityu.edu.hk
Jonathon Marsh Hong Kong University
[email protected]
It is a disturbing thing to find oneself attempting to describe a set of novel ideas. It is impossible to avoid a strong sense of self-doubt and a nagging feeling that one is really just reworking old ground. At times the notion that there is nothing really new about the ideas one is struggling with seems inescapable. Then, after much reexamination of the possibilities, the sense of novelty not only persists but continues to grow. Such has been the case with our exploration of the idea of cognitive technology as a distinct field of enquiry. The difficulty is that, while similarities to the widely studied areas of ergonomics and human computer interaction (HCI) are inescapable, the differences seem equally obvious. Gorayska and Mey (1995) have made an attempt to detail these differences.
The paradigm we are going to propose takes as its key focus a specification of how, and to what extent, human construction of environmental artifacts bears on the operations and structure of the human mind. Notable is the change of direction of the influence from a) the mind shaping the external world by virtue of its mental interpretation processes, to b) the external world's acquired structure shaping the mind. (Ibid.)
However, much more is needed if the idea of cognitive technology as a new field of enquiry is to come to fruition. To that end Gorayska and Mey have outlined the discipline by indicating four primary areas of investigation:
1) The nature of, and changes in, the processes of access to information now made available through technological advances;
2) How the interaction between humans and technological devices in the realm of information processing influences, from a pragmatic point of view, cognitive developments in humans;
3) Social and moral issues underlying cognitive developments as affected by modern delivery systems;
4) The feedback effect of such influences and interactions on future advances in Information Technology.
While not denying the importance of these issues, we want to expand their frame of reference and further define the novel aspects of the approach, but only in so far as we place greater emphasis on the direct and generative relationship between mind and technology. We begin by taking a closer look at the adopted terminology. The term cognitive technology may be too narrow for our intentions. It serves well to describe those issues which deal with determining approaches to tool design meant to ensure integration and harmony between machine functions and human cognitive processes. Unfortunately, it does not adequately describe a number of other issues, in particular those concerns which relate to the identification and mapping of the relationship between technological products and the processes by which human cognitive structures adapt. We see these two types of issues as constituting related but distinct areas of investigation which are best kept separate but must be given closely aligned treatment. We therefore reserve the term Cognitive Technology to refer to methodological matters of tool design, and propose the term Technological Cognition to refer to theoretical explorations of the ways in which tool use affects the formation and adaptation of the internal cognitive environments of humans. Human cognitive environments are constituted by the set of cognitive processes which are generated by the relationships between various mental characteristics. These environments serve to inform and constrain conscious thought.
Under the new schema, theoretical developments in Technological Cognition would find concrete expression in the constructed artifacts produced by Cognitive Technology. It is this dichotomy which forms the basis for our argument and the grounds from which we develop a framework for analysis. Taken together, Technological Cognition and Cognitive Technology (henceforth referred to as TC/CT) involve the study and construction of human-tool interfaces which exploit and/or amplify the processing capabilities of one or the other such that the cognitive capabilities of the pairing involve a radical departure from those inherent to each separately. They invoke an Epistemic Technology concerned with outputs from the relationship between the structure of thought and the form of the environment in which thought occurs.
THE COGNITIVE ENVIRONMENT, AFFORDANCE, AND TC/CT
The assertion of Gorayska and Mey (1995) that
a) the human mind and the world are interrelated in intricate and inseparable ways,
therefore
b) the structure given to the human fabricated environment must have a profound influence on the structure of the mind
remains central to the purposes of this paper. However, their argument further implicates the need for greater consideration of the processes which govern current approaches to designing the fabricated environment such that the construction of our internal cognitive environments is optimally benefited. It is arguable that enquiry into the manipulation of cognitive environments by technological means ought to begin with the premise that every tool is an embodiment of all the tasks which can be successfully accomplished using that tool. The critical underlying idea that there is a recursive effect from the fabricated environment on the structure of mind is not new. Joseph Weizenbaum (1983) argued for the importance of considering this effect in the study of Artificial Intelligence. In the study of perception, notably visual perception (discussed extensively in Bruce and Green, 1990), connectionist models have been built which, based on the work of Marr and Poggio (1976), attempt to map the cognitive ability to represent and recognize external objects directly to patterns of neuronal stimulation. Such work has produced some interesting and useful ways of analysing the mechanisms by which inputs to human perceptual faculties are acted upon cognitively to form recognizable constructs and ultimately abstract conceptual frameworks. Similarly, work in psychology, notably that of the early Gestalt psychologists (Wertheimer, 1923; Koffka, 1935; Köhler, 1947), has provided us with useful models of how information is processed and sorted once it has been attended to. However, what is of greater interest to Epistemic Technology, understood in terms of the TC/CT relationship, is the fundamental question of what causes a perceiving agent to attend to a particular set of stimuli to begin with. Without an answer there can be no understanding of the process by which meaningful interactions with the environment are enabled.
It is easy to attribute greater or lesser degrees of attention to obvious imperatives such as hunger, survival, comfort, or sexuality. The problem lies in trying to establish the mechanisms by which these imperatives themselves become consciously recognized and responded to in increasingly more purposeful ways. The assumptions underlying our approach to this problem conform with those of connectionist thinking only in so far as we accept that the environment of a perceiving agent dictates to one degree or another the perceptual constructs which can be elicited from it. Koffka (1935) expressed the idea clearly when he wrote of the 'demand character' within perceivable objects which depicts and communicates inherent functionality to a perceiving agent. Gibson's (1979) ecological approach elaborated the concept further by arguing that within any perceived environment only a finite set of perceptual experiences is possible. He proposed that inherent within any environment there exists a set of 'affordances' whereby the characteristics of that environment allow certain possibilities for action to exist (for a detailed account see Warren, 1984; for applications see Norman, 1988). These in turn, when instantiated, serve to condition the characteristics of the environment. Such affordances can be said to be operational whether or not they are actively perceived to be so by a perceiving agent. Consequently, the process of analysing affordances becomes essential to gaining an understanding of the functional value of an environment or sets of objects within that environment (i.e., tools). The notion of affordances, and the ecological model of perception it embodies, remain interesting and useful to TC/CT. They support the analysis of tool use and artifact construction in terms of perceived functionality and disposition of mind. The
TC/CT approach likewise assumes that environments are commonly perceived in terms of their potential for action. However, it is further concerned with how perceptual capabilities are themselves modified by environmental constraints on action. It should be made clear that we do not accept all aspects of Gibson's thinking. Notably, we disagree with his claim that all perception can be understood without reference to linguistic or cultural mediation. Such direct realism does not allow for the relationship between perception and mental representation as a constructive effort mediated by cognitive processes. Instead it leads to the idea of perception as a direct phenomenon of mind which occurs, strictly reactively, as a result of exposure to the environment. This limits any attempt (Norman, 1988; Gaver, 1991; Eberts, 1994) to use analysis of affordances to further understanding of how environments shape our thinking. Exploration of optimal approaches to tool design is further limited by the inability to map affordance characteristics directly to the process of cognitive formation itself. Ultimately, these limitations must restrict analysis of human/technology interactions to an examination of how affordances relate to the brain's ability to perceive the inherent functionality of a set of tools, leaving unaddressed the issue of how tool use serves to fundamentally alter the shape of the mind.
TC/CT: IMPLICATIONS FOR INFORMATION SYSTEMS
TC/CT is naturally concerned with the effect of conducting the examination of affordances solely in terms of functionality. Of particular interest is how such a restriction has influenced our approach to the design of electronically mediated information tools. This restriction must condition the way system designers perceive their aims with respect to providing usable systems. Perhaps more importantly, it must also condition the way they envision themselves in their role as providers of such systems. Heuristics for designing human-computer interactions have become dominated by an apparent concern for 'human factors' (Van Cott and Huey, 1992). This is evidenced by the fact that the rhetoric of design practice has become focused on facilitating and improving human ergonomics, human cognition, and human to human dialogue in cooperative task performance. The design imperative is to "devise new and better ways to expand human capacity, multiply human reasoning, and compensate human limitations" (Strong 1995: 70). This assumes the idea of the user as central to system design (Norman and Draper, 1986). We believe this assumption is in conflict with the above mentioned functionality driven approach and hence is rendered unrealizable. System designers are not system end users. By virtue of their adopted role, and their functionality oriented perceptions of that role, system designers can only deal in matters of construction. This situation must constrain their thinking about what they do and what system users' expectations of them are. Even if system users are actively involved in matters of design (as in Gould and Lewis, 1985; Kyng, 1994), their contributions are only considered in terms of system usability; hence the users themselves become designers who contribute to the interests of the system.
This concern for the usability of system products must cause the concern for human issues to become quickly reduced to engineering issues (Norman, 1986; Rasmussen, 1988; Bailey, 1989). These in turn must reduce to machine issues. The immediate consequence of these unintentional reductions is a growing tendency to perceive the end user strictly as a customer of the computer industry. Hence the benefits of improved information tool interfaces are increasingly marketed solely in terms of functional benefits such as 1) faster response time, 2) reduced learning time, 3) fewer errors, and 4) improved customer service, all of which are globally justified by an appeal to improved cost/benefit ratios (Curtis and Hefley, 1994). Such product-oriented thinking ultimately reflects a tacit determination of value as improved efficiency in the workplace. Unfortunately it makes no reference to specific ways in which the individuals who must actually use the products may benefit. The situation is reminiscent of the critique made by Ellul (1965) of the cosmetic industry as providing a real solution to an artificially constructed need. Cognitive models may be considered with reference to the design process; however, these models tend to be considered only in terms of machine ends. That is to say, users are seen to be transformed by machine use only in so far as they become more adept at that use. Ironically, despite rhetoric to the contrary, consideration of the ways in which human capabilities themselves may be amplified rarely finds concrete expression in machine functions.

Another unavoidable consequence of a functionality-driven approach to human-computer interaction designs is that the computer product, if it is to be usable, must look and feel good. This demands the construction of computer-mediated environments which closely reflect the perceptual interactions we are normally at ease with. A lot of effort is currently being expended on generating feelings of comfort. The route generally taken is to incorporate a variety of modalities, such as sound, graphics, video, text, or animation, and to explore the use of common metaphors (desktop, blackboard, workbench, etc.) in order to ensure that system functions are not only easily understood but also entertaining.
In this context the design focus shifts onto the nature of interactivity itself and how it is controlled/conditioned by successful communication. On the one hand, resulting designs often involve the user as an "actor" (Laurel, 1991) within the machine-mediated environment, while on the other hand, they cause the machines themselves to be perceived as social agents by their users for reasons well explained by Nass, Steuer and Tauber (1994). It is obvious how this relationship is further reinforced when the interface begins to simulate human linguistic behaviour supported by human facial expressions (as in, e.g., Walker, Sproull and Subramani, 1994; Takeuchi and Nagao, 1993). The computer industry is thus involved in making business more competitive, often by either exploiting an illusion of human-human interaction or appealing to the mechanisms of social play. This situation represents an explicit reversal of the stated aims of TC/CT, which are ultimately concerned with amplifying the effectiveness of interactions between humans and not simply between humans and machines. Without wishing to appear overly dramatic, we wonder if there is an indication here that ethics are in danger of being traded in for aesthetics (cf. Krueger, this volume).

With respect to the value of analysing the affordances projected by tools, it is important to consider the fact that the computer industry is also involved in, and to some degree depends on, the production of new knowledge. This calls for increasingly more powerful technologies "to significantly augment the skills that are necessary to convert data into information and transform information into knowledge" (Strong, 1995: 70). Once again the argument for a design process driven by usable outcomes is invoked.
"[T]his knowledge and these skills must be translated into effective design, design not merely of graphical displays, but initial design that takes into account users and constraints in such a way that the later changes are not necessary and users can
immediately employ the products" (Ibidem). Seen this way, the user is at risk of being forced into the role of a mere consumer of knowledge and not one of an active participant in the process of constructing knowledge. Consequently, in spite of the rhetoric extolling the virtues of interactivity, a contradictory assumption remains operational in the design process: that information which is machine-generated and passed to a user will miraculously, by virtue of contact alone, become that user's knowledge, when in fact it may remain nothing more than another piece of information to be dealt with. Such an approach to system design cannot be effective. The logical outcome must be a proliferation of information pollution. For example, despite ever improving interfaces, it is becoming increasingly difficult to find one's way around the internet in a coherent and meaningful way. Distraction from purpose is commonly experienced by users who find that the wealth of information and readiness of access render selective searching problematic. Paradoxically, the task of becoming well informed for the purpose at hand is often hindered more than it is helped.

The problem lies with the explosion of usability factors precipitated by a product-oriented approach to system design. Functionality is interpreted in terms of a one-to-many relationship between a developing system and its users. Hence any consideration of the mental models constructed by users to accommodate the system's functional value is conducted strictly with reference to how well they relate to the system and how well they meet its intended purpose. Typically during test phases numerous users are observed in order to determine how effective they are at taking advantage of the system's functions.
On the basis of the information gained the system is then modified to narrow the margin between the system's functions and the conceptual models which the users have of that system, further reifying the need for those functions. The approach thus remains cyclical and self-fulfilling. Concern for how the design of the system works to transform the user is trivialized, if not lost entirely, and is understood only in terms of the system itself and not in terms of the users. We contend that in reality an inverse relationship is at work. There is a many-to-one relationship between systems (or tools) and users. TC/CT is about orchestrating the influences of those systems on the cognitive modeling capabilities of users so as to optimize human benefits.

TC/CT AND THE PHENOMENON OF ATTENTION

Gibson's theoretical framework, which appears to underpin current approaches to usability, cannot serve to explain the phenomenon of attention as it relates to perception because 1) even the simplest environment can be perceived in a variety of ways albeit according to its affordances, and 2) it lacks reference to the internal processes of cognitive formation. Attention is governed by what matters most to the perceiving system at the precise moment of perception. It may fluctuate rapidly within an apparently stable communication event and, consequently, may appear to be unfocused and disjointed, perhaps giving the appearance over time of 'inattention'. However, on a moment-to-moment basis there is always something which is capturing the 'processing' attention of the perceiving system. Attention then can be described first in terms of longevity (i.e., the length of time a set of perceptions remains in focus) and second in terms of intensity (i.e., the degree to which cognitive processing capabilities are brought to bear on the object of perception). Within this scheme, by
adopting a connectionist model of cognitive processing, intensity can be determined through a binary representation of the presence or absence of response across a varying number of processing nodes. It need not be thought of in terms of varying degrees of response or activity within a given perceptual or cognitive faculty. We believe that a consideration of the manner and degree to which any given perceptual input gains and sustains attention is fundamental to the development of heuristics for information tool design which successfully account for human factors. This process we hold to be determined by the degree to which the system is able to assign relevance to perceptions. Inversely, since attention can be said to signal relevance, the question as to what determines relevance becomes pivotal.

RELEVANCE AS THE ANVIL OF ATTENTION

By virtue of the way it is structured, any tool carries a potential for releasing and mediating the mental processes of association which construct varying motivational states within humans. It follows that it also contains the potential for triggering a search for effective action sequences which are perceived or known to be able to satisfy those states. Lindsay and Gorayska (1989, 1993, 1994) have proposed a framework well suited to the analysis of these processes of association. It gives primacy to the notion of relevance as an essential theoretical construct which underpins all human symbolic-action planning and goal management processes.¹ Relevance can be defined simply as the relationship between goals, action plans, and action plan elements:
E is relevant to G if G is a goal and E is a necessary element of some plan P which is sufficient to achieve G

Within a relevance-driven analytical framework, the emergence of rational, purposeful behaviour is thus accounted for as an output of fine tuning of goals (i.e., cognised motivational states) to effective action sequences (i.e., connecting cognised plans for achieving goals to appropriate motor movements). Such tuning is further conditioned by the extent to which a perceiving agent is able to recognize the utility of all objects and events necessary to the occurrence of an effective action sequence. It is this process of fine tuning which we hold to determine attention.

From a generative perspective, the relationship which governs the instantiation of relevance can best be understood in terms of a governing global relevance metafunction (RMF) (Gorayska et al., 1992). The purpose of the RMF is to act as an interface control mechanism (or possibly as a narrow bandwidth communication channel) between various cognized associative groupings and/or related search processes within functionally distinct cognitive subsystems. Simply formulated as:
[subjectively] relevance (Goal, Element-of-plan, Plan, Agent, Mentalmodel)

the RMF can return values for all of its parameters, depending on the initial inputs. When supported by goal management, external feedback, and hypothesis formation/confirmation, the function can account for the positive adaptation of minds to minds or minds to environments. Interestingly it is also possible to envision the more fundamental process of cognitive representation itself being represented in terms of a recursive application of the RMF. Unfortunately, despite its importance to TC/CT investigations, elaboration of this point is beyond the scope of this chapter.

The utility of the RMF is immediately obvious when we consider mere recognition of the goals, plans for action, and environments captured by a perceiving agent in cognitively represented world models. What is less obvious, but much more important, is that, due to its iterative and recursive nature, the RMF also allows for the initial cognition of the motivational states, motor movements, and environmental percepts from which goals can be derived. Necessary to investigations of TC/CT is the realisation that cognitive goals, so derived, are not stable over time but are constantly generated, modified, clarified, specialised, prioritised and forgotten. It is our contention that, fortified by the RMF, relevance analysis provides sufficient adaptability as a theory to allow for such instability without losing any of its explanatory value. As such, it provides an ideal framework within which to situate the study of TC/CT as it applies to activity within a variety of disciplines.

The assumptions underlying the above have been reflected in, and supported by, work in cognitive science in general, and Artificial Intelligence (AI) in particular. Both these disciplines find little difficulty in successfully accounting for goal seeking behaviour, once the goals of an organism or device are known and the relevance of individual objects and events which contribute to effective action sequences are established; that is, once problem spaces (Newell and Simon, 1972) have been generated.

¹ How this framework differs from the widely accepted Relevance Theory of Sperber and Wilson (1986) has been explained in Gorayska and Lindsay (1993, 1995), Mey (1995), Lindsay and Gorayska (1995), Zhang (1993), and Nicolle (1995). Furthermore, Zhang (1993) has produced a formalised account of optimal relevance vis-à-vis goal satisfaction and activation, using this framework.
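The definition of relevance and the claim that the RMF "can return values for all of its parameters, depending on the initial inputs" can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' formalisation: the `Plan` record, the `rmf` and `is_relevant` functions, and the example data are all hypothetical, and each stored plan is simply assumed to be sufficient for its goal and to list only necessary elements.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# (goal, element, plan, agent, mental_model): one row per relevance relationship.
Row = Tuple[str, str, str, str, str]


@dataclass(frozen=True)
class Plan:
    """A plan as held by one agent within one mental model (hypothetical record)."""
    name: str
    goal: str                  # goal the plan is ASSUMED sufficient to achieve
    elements: FrozenSet[str]   # elements ASSUMED necessary to the plan
    agent: str
    mental_model: str


# Invented example data, for illustration only.
PLANS = [
    Plan("boil and brew", "drink tea", frozenset({"kettle", "teabag"}), "ann", "kitchen"),
    Plan("type and mail", "send note", frozenset({"keyboard"}), "ann", "office"),
]


def rmf(plans: List[Plan],
        goal: Optional[str] = None,
        element: Optional[str] = None,
        plan: Optional[str] = None,
        agent: Optional[str] = None,
        mental_model: Optional[str] = None) -> List[Row]:
    """Return every relevance row that agrees with the bound parameters.

    A parameter left as None is unbound, so the same function can answer
    "which elements are relevant to this goal?" as well as
    "which goals is this element relevant to?".
    """
    query = (goal, element, plan, agent, mental_model)
    rows: List[Row] = []
    for p in plans:
        for e in sorted(p.elements):
            row = (p.goal, e, p.name, p.agent, p.mental_model)
            if all(q is None or q == r for q, r in zip(query, row)):
                rows.append(row)
    return rows


def is_relevant(plans: List[Plan], element: str, goal: str) -> bool:
    """E is relevant to G if E is a necessary element of some plan P sufficient for G."""
    return bool(rmf(plans, goal=goal, element=element))


# The same relation queried in two directions.
elements_for_tea = [row[1] for row in rmf(PLANS, goal="drink tea")]
goals_for_keyboard = [row[0] for row in rmf(PLANS, element="keyboard")]
```

Treating the RMF as a relation queried with unbound parameters is one possible reading of the text; it covers recognition of already cognised goals and plans, but not the generative, recursive use of the RMF that the chapter describes as beyond its scope.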
It is not a coincidence, we believe, that nearly all the endeavours in AI to date have focussed on human and/or machine action plan generation. The questions which still remain unresolved are more fundamental. They are 1) 'Where do our goals come from?' (Wilensky, 1988), and 2) 'How is the relevance of elementary objects and events established prior to the formation of effective action sequences that satisfy these goals?' (Lindsay and Gorayska, 1994). Through an application of relevance analysis, TC/CT seeks to answer these questions by providing a method for examining human sensitivity to the structures superimposed on our cognitive apparatus by the fabricated environment. This can only be done in conjunction with feedback mechanisms which register changes in degrees of satisfaction with respect to currently detected needs. Such feedback is necessary because structure in the environment guides the formation of mental schemata by dictating what can be accomplished successfully within the limits of that structure. Without the presence of such feedback mechanisms thought would be entirely conditioned by the affordances supplied by the environment. All capacity to modify environmental constraints towards meaningful ends would be negated. The fabricated environment must output feedback which primarily affects, positively or negatively, the generation and modification of the perceiving agent's goals and not only plans for action. Such goals are instrumental for the wants and needs which serve to construct human conscious awareness. In this context, the RMF constitutes a base construct from which cognitive formation mechanisms can be derived. These in turn generate the mental schemata needed to account for the ability to cognise problem spaces, activate goal seeking behaviour, and transform the problem spaces into the corresponding solution spaces.
Such schemata can subsequently be understood as a direct result of the RMF interfacing and filtering the outputs/inputs of two systems running in parallel: 1) an unconscious relevance seeking connectionist system driven by genetically mediated motivational processes (accounting for order being imposed on perception) and 2) a conscious goal directed action planning system which uses relevance relationships as a basis for establishing symbolically represented goals and the plans sufficient to achieve them (Lindsay and Gorayska, 1994; cf. a hybrid system, proposed by Harnad (1990), in which the role of motivation is not considered).

At this point we are able to consider how goals are actually generated. Several important factors must be noted. First, goals are not simply symbolic descriptions of motivational states. Rather, they are procedural objects interconnecting goal related mental constructs (Gorayska et al., 1992) such as:

• projected future states of the agent,
• different objectives of either attaining, sustaining, stopping, or preventing those states,
• activation conditions,
• satisfaction conditions,
• additional constraints which themselves may be embedded negative goals.
Second, activation and satisfaction conditions can be states in either the internal-cognitive or external-physical environments. The former must exist and be perceived for the agent to activate goal seeking behaviour. The latter must exist and be perceived by that agent for her or him to attain a projected future state. Activation and satisfaction conditions for a given goal, when attended to, initiate problem solving in search for, or construction of, the set of operations which can effect a transition between them.

Finally, humans integrate into the environment by cognizing its invariant features as activation and satisfaction conditions for goals. According to Gibson (1979) any environment contains features, referred to as invariants, which remain consistently recognizable from a variety of viewpoints. These invariants can be understood as satisfaction conditions for perceptual object recognition. Inversely, higher levels of cognized satisfaction conditions can be seen as invariants within the internal cognitive environment (cf. the symbol grounding problem in Harnad, 1990). These invariants provide navigation points for spatial or temporal orientation within solution spaces (Gorayska and Tse, in prep.). To be effective, it is essential that invariants be salient and readily perceived. Across cultures, this has led to the construction of fabricated habitats that facilitate the reduction of sensory noise, thus highlighting the relevant invariants within them. The Fabricated World Hypothesis put forward by Gorayska and Lindsay (1989, 1994) extends Gibson's affordance theory by proposing not only that a) most of human memory is in the environment, but also that b) the human fabrication of habitats is such as to ensure activation and satisfaction of very few goals within them at any one time. This eliminates unnecessary mental processing, serves to make complex problems tractable, and makes simple algorithms sufficient for effective ecological interaction.
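The goal structure described above (a projected state, an objective, activation and satisfaction conditions, and embedded negative goals as constraints) can be sketched as a small data structure. This is a hedged illustration rather than the authors' model: the names `Goal`, `Objective`, `Percepts` and `active_goals` are hypothetical, conditions are reduced to predicates over a dictionary of perceived features, and the rules that a goal is active only while unsatisfied and satisfied only when its constraints also hold are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List

Percepts = Dict[str, bool]               # hypothetical: features currently perceived
Condition = Callable[[Percepts], bool]   # a condition is a test over those features


class Objective(Enum):
    ATTAIN = "attain"
    SUSTAIN = "sustain"
    STOP = "stop"
    PREVENT = "prevent"


@dataclass
class Goal:
    projected_state: str
    objective: Objective
    activation: Condition
    satisfaction: Condition
    constraints: List["Goal"] = field(default_factory=list)  # embedded negative goals

    def is_satisfied(self, percepts: Percepts) -> bool:
        # ASSUMPTION: a goal counts as satisfied only if its constraints hold too.
        return self.satisfaction(percepts) and all(
            c.is_satisfied(percepts) for c in self.constraints)

    def is_active(self, percepts: Percepts) -> bool:
        # ASSUMPTION: a goal is pursued while activated and not yet satisfied.
        return self.activation(percepts) and not self.is_satisfied(percepts)


def active_goals(goals: List[Goal], percepts: Percepts) -> List[str]:
    """Names of the goals a given set of percepts activates."""
    return [g.projected_state for g in goals if g.is_active(percepts)]


# Invented example: a negative goal embedded as a constraint on another goal.
no_spill = Goal("tea spilled", Objective.PREVENT,
                activation=lambda p: True,
                satisfaction=lambda p: not p.get("tea_spilled", False))
drink_tea = Goal("tea drunk", Objective.ATTAIN,
                 activation=lambda p: p.get("thirsty", False),
                 satisfaction=lambda p: p.get("tea_drunk", False),
                 constraints=[no_spill])
send_note = Goal("note sent", Objective.ATTAIN,
                 activation=lambda p: p.get("note_pending", False),
                 satisfaction=lambda p: p.get("note_sent", False))
GOALS = [drink_tea, send_note]

# A fabricated habitat exposes few activation conditions; a cluttered one exposes more.
kitchen = {"thirsty": True}
cluttered_room = {"thirsty": True, "note_pending": True}
```

With this data, `active_goals(GOALS, kitchen)` yields only the tea goal, while the cluttered room activates both goals at once, which is the contrast the Fabricated World Hypothesis draws between habitats that activate very few goals and environments that do not.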
In this context it is plausible to believe that external control of invariants activating and satisfying goals leads to an iterative and recursive application of the RMF. This in turn
may lead to a formulation of cognitive goal chains that ultimately interface with the motivational states of participating agents, inducing their symbolic realisation. More importantly, this process is as valid for the formulation of domain specific goal/plan/situation correlates as it is for the formulation of meta-goals which embed and control cognitive processes themselves. Foundations can be laid here for significant manipulation not only of what people think about but also how they think about what it is that they are thinking about.

TC/CT research takes the Fabricated World Hypothesis to its extreme. It attempts to address the issue of how the human fabrication of externalized, environmentally situated memory outlets dictates or prescribes which goals people will pursue most of the time, hence changing behavioral norms. It acts to investigate the way in which changes in perceived satisfaction conditions, affected by goal changes, serve to modify any previously generated related goals, thus modifying the internal cognitive environment. It considers how such modifications must induce changes in the perception of affordances. In turn these must precipitate changes in the structure of mind. Consequently, within the TC/CT approach, the perception of affordances, and ultimately the processes underlying cognitive formation, are seen to be dependent on that which determines goal generation and attentiveness through the perception of satisfaction conditions, namely relevance.²

CONCLUDING REMARKS

We have tried to illustrate what we think is novel about the TC/CT approach to the analysis and design of tools, particularly information tools, and the fabricated environment. Unlike other approaches (HCI, ergonomics, cognitive engineering, etc.)
in which tool development and environmental fabrication are driven primarily with reference to functionality within the artifacts they produce, the TC/CT approach is foremost concerned with understanding how human cognitive processes interact and are partially formed by interactions with such artifacts. It is particularly concerned with how tools can be constructed which will best serve to amplify the cognitive capabilities of humans and enhance human-to-human communications.

It is interesting to note that throughout our analysis two related streams of interest have emerged. One has to do with examining generative process outputs and is product oriented, the other looks at the nature of these generative processes themselves. The former emphasizes the need to understand functionality within the fabricated environment. The latter emphasizes an understanding of the processes by which that environment comes into being. Favouring development in one stream to the detriment of the other leads to an imbalance in our understanding of the relationship between humans and the environment. We have proposed the development of epistemic technologies, framed by relevance analysis, as a way to integrate the two.

We have made a distinction between the terms Technological Cognition (TC) and Cognitive Technology (CT) which reflects the dichotomy between product and process. However, to generate an effective epistemic technology, each must be studied with reference to the other. The ensuing need to understand the generative processes associated with Epistemic Technology led to a discussion of environmental affordances. It yielded the same dichotomy. On the one hand we noted a functionality-driven approach to the analysis of affordances commonly leading to an emphasis on system usability and product orientation. On the other hand we identified the need to address the generative processes by which affordances are determined. Epistemic Technology reconciles the two by focussing scrutiny on the underlying factors which cause a perceiving agent to pay attention to one set of environmental invariants over another.

In considering the nature of attention we began to discuss relevance analysis as a possible framework for enquiry. As we tried to illustrate how the relevance metafunction could be used as a basis for building a method of analysis, the influence of the same dichotomy became evident once again. Relevance analysis provides a credible way in which to approach the mapping of effective action plan sequences for the purpose of satisfying existing goals. However it also points to an explanation of the ways in which new goals and action plan elements, at various levels of cognitive functioning, can be generated from any combination of raw percepts and previously acquired concepts. Epistemic technology derives value from both these aspects of relevance analysis through a cyclical process whereby the outputs of one are continuously conditioning the inputs of the other in a recursive and self-regulating fashion.

We believe that the degree to which the generative aspects of this cyclical process influence our understanding of human interactions with the fabricated environment has in the past been largely unaddressed. We further believe that a deeper examination of these aspects is needed before Epistemic Technology can provide us with the means by which to effectively control the ways in which we are affected by the products of our own ingenuity.

² The connection between Gibson's affordances and relevance has also been noticed by Mey and Gorayska (1994), but they do not discuss the generative relation between the two, nor do they consider the mediating role of attention in this process.

REFERENCES
Alben, Laurelee, Jim Faris, and Harry Sadler, 1994. Making It Macintosh: Designing the message when the message is design. Interactions 1(1): 11-20.
Bailey, Robert W., 1989. Human Performance Engineering. Englewood Cliffs, N.J.: Prentice Hall.
Bruce, Vicky and Patrick Green, 1990. Visual Perception: Physiology, Psychology, and Ecology. Hillsdale, N.J.: Erlbaum.
Curtis, Bill and Bill Hefley, 1994. A WIMP No More: The Maturing of User Interface Engineering. Interactions 1(1): 22-34.
Eberts, Ray E., 1994. User Interface Design. Englewood Cliffs, N.J.: Prentice Hall.
Ellul, Jacques, 1965. Propaganda: the Formation of Men's Attitudes. New York: Knopf.
Gaver, William W., 1991. Technology Affordances. Human Factors in Computing Systems, Conference Proceedings CHI'91, 79-84. New York: ACM.
Gibson, James J., 1979. The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gorayska, Barbara and Roger O. Lindsay, 1989. On Relevance: Goal Dependent Expression & the Control of action planning Processes. Research Report 16. School of Computing and Mathematical Sciences, Oxford-Brookes University, UK.
Gorayska, Barbara and Roger O. Lindsay, 1993. The Roots of Relevance. Journal of Pragmatics 19(4): 301-323.
Gorayska, Barbara and Roger O. Lindsay, 1995. Not a reply - more like an echo. Journal of Pragmatics 23(6). Forthcoming.
Gorayska, Barbara, Roger O. Lindsay, Kevin Cox, Jonathon Marsh, and Ning Tse, 1992. Relevance-Derived Metafunction: How to interface the intelligent system's subcomponents. Proceedings of the Third Annual Conference of AI, Simulation and Planning in High Autonomy Systems, Perth, Australia, July 8-10, 64-71. IEEE Computer Society Press.
Gorayska, Barbara and Ning Tse, in preparation. A Goal Satisfaction Heuristic in the Relevance-Based Architecture for General Problem Solving.
Gorayska, Barbara and Jacob L. Mey, 1995. Cognitive Technology. In: Karamjit S. Gill, ed., New Visions of the Post-Industrial Society: the paradox of technological and human paradigms. Proceedings of the International Conference on New Visions of Post-Industrial Society, 9-10 July 1994. Brighton: SEAKE Centre.
Gould, John D., and Clayton Lewis, 1985. Designing for Usability: Key Principles and What Designers Think. Communications of the ACM 28: 300-311.
Harnad, Stevan, 1990. The Symbol Grounding Problem. Physica D 42: 335-346.
Koffka, Kurt, 1935. Principles of Gestalt Psychology. New York: Harcourt Brace.
Köhler, Wolfgang, 1947. Gestalt Psychology: An introduction to new concepts in modern psychology. New York: Liveright Publishing Corporation.
Kyng, Morten, 1994. Scandinavian Design: Users in Product Development. Celebrating Interdependence, Conference Proceedings CHI'94, 3-10. Boston: ACM.
Laurel, Brenda, 1991. Computers as Theater. Reading, Mass.: Addison-Wesley.
Lindsay, Roger O. and Barbara Gorayska, 1994. Towards a General Theory of Cognition. Unpublished MS.
Lindsay, Roger O. and Barbara Gorayska, 1995. On putting necessity in its place. Journal of Pragmatics 23: 343-346.
Lohse, Gerald L., Kevin Biolsi, Neff Walker, and Henry H. Reuter, 1994. A Classification of Visual Representations. Communications of the ACM 37(12): 36-49.
Marr, David and Tomaso Poggio, 1976. Cooperative Computation of Stereo Disparity. Science 194: 283-287.
Mey, Jacob L., 1995. On Gorayska and Lindsay's Definition of Relevance. Journal of Pragmatics 23: 341-342.
Mey, Jacob L. and Barbara Gorayska, 1994. Integration in computing: An ecological approach. Systems Integration '94, 594-599. (Proceedings IIId International Conference on Systems Integration, Sao Paulo, August 15-19, 1994). Los Alamitos, Calif.: IEEE Computer Society Press.
Nass, Clifford, Jonathan Steuer, and Ellen R. Tauber, 1994. Computers are Social Actors. Celebrating Interdependence, Conference Proceedings CHI'94, 72-78. Boston: ACM.
Nicolle, Steve, 1995. In defence of relevance theory: A belated reply to Gorayska & Lindsay, and Jucker. Journal of Pragmatics 23(6). Forthcoming.
Norman, Donald A., 1986. Cognitive Engineering. In: D. A. Norman and S. W. Draper, eds., User Centered System Design, 31-61. Hillsdale, N.J.: Erlbaum.
Norman, Donald A., 1988. The Psychology of Everyday Things. New York: Basic Books.
Norman, Donald A. and Stephen W. Draper, eds., 1986. User Centered System Design. Hillsdale, N.J.: Erlbaum.
Newell, Allen and Herbert Simon, 1972. Human Problem Solving. Englewood Cliffs, N.J.: Prentice Hall.
Rasmussen, Jens, 1988. Information Processing and Human-Machine Interaction: An approach to Cognitive Engineering. New York: North Holland.
Sperber, Dan and Deirdre Wilson, 1986. Relevance: Communication and Cognition. Oxford: Blackwell.
Strong, Gary W., 1995. New Directions in HCI Education, Research and Practice. Interactions 11(1): 69-81.
Takeuchi, Akikazu and Katashi Nagao, 1993. Communicative facial displays as a new conversational modality. Proceedings of INTERCHI'93, 187-193. Conference on Human Factors in Computer Systems, Amsterdam 24-29 April 1993.
Van Cott, Harold P. and Beverly M. Huey, eds., 1992. Human factors specialists' education and utilization: Results of a survey. Washington, DC: National Academy Press.
Walker, Janet H., Lee Sproull and R. Subramani, 1994. Using a Human Face in an Interface. Celebrating Interdependence, Conference Proceedings of CHI'94, 85-91. Boston: ACM.
Warren, William H., 1984. Perceiving Affordances: Visual Guidance of Stair Climbing. Journal of Experimental Psychology: Human Perception and Performance 12: 259-266.
Weizenbaum, Joseph, 1976. Computer power and human reason: from judgment to calculation. San Francisco: Freeman.
Zhang, Xio Heng, 1993. A goal-based relevance model and its application to intelligent systems. Ph.D. Thesis. School of Computing and Mathematical Sciences, Oxford-Brookes University, UK.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 2

IMAGINIZATION AS AN APPROACH TO INTERACTIVE MULTIMEDIA

Ole Fogh Kirkeby & Lone Malmborg
Institute of Computer and Systems Sciences
Copenhagen Business School, Denmark
ofk/dasy@cbs.dk; dslone@cbs.dk
ABSTRACT

Recently, it has become an important issue in human-computer interaction how to conceptualize humans' spontaneous interaction with the multimedia interface, and how we can design this interface so as to satisfy the demands of communicative competence in the dialogue. Using phenomenological philosophy, Kirkeby and Malmborg give an interpretation of how metaphors are created and used in common language, and extend this interpretation to comprise also our cooperation with the computer. In its ideal realization, such a spontaneous creation of metaphors is called imaginization. The authors also show how to categorize the new media in relation to the dimensions of closeness and situatedness.
Currently, interactive multimedia (IMM) are generally conceived of as a kind of computer-based technology, characterized by the cooperation between discursive text, commands (including command-driven vehicles), icons, and images (static and moving). In contrast, virtual reality (VR) techniques are regarded both as a supplement and as a substitute for multimedia. It seems appropriate to try and give a more profound diagnosis and definition of these concepts.

IMM, as a technology, mediates between the acting consciousness and the world. By referring to an outer world this medium presupposes a cognitive distance, a distance which implies a reflexive consciousness of the representative functions of the media. IMM conveys a central cognitive significance to the image as intermediary to, and structuring, information.

Virtual reality (VR), on the other hand, is a technology whose purpose it is to substitute for the experience of reality characterized by its interactive, meaningful, senses-based relation to this very reality. We can state the following conditions for VR:

• Images as such do not exist in successful VR;
• In successful VR, one cannot presuppose theoretical, reflexive consciousness, but only practical reflexivity: the actor is immersed into the reality in which s/he exists;¹
• Considered as an epistemological ideal, the VR-world should not presuppose a reference to 'another reality'. As 'reality', VR should at most be a parallel (in the sense of a 'possible world') to 'real reality';
• Only such consciousness of the body, and such practical consciousness as realized through the media, is able to exist in VR. We are the reality that we experience in VR. Similarly, any theoretical consciousness that we might develop in VR is also bound to this reality, and ideally ought not to interfere, or be 'confronted' with requirements of adjusting itself to knowledge grounded in our familiar 'real world' reality.
One may conceptualize the relationship between VR and IMM and their possible combinations in the following ways:
• IMM as embedded in VR: VR has priority over IMM inasmuch as it is a substituting reality, whereas IMM are still media, forced to duplicate reality. Here, IMM is itself a VR-function and must refer to the VR-world.
• VR is coordinated with IMM. Here, VR does not function as virtual reality, even though, from a cognitive theoretical as well as from a pedagogical point of view, the two are difficult to compare.
• VR is embedded in IMM. Here, we will find VR functioning, among other things, as a research space for identifying and handling files.
As we have mentioned above, IMM is located half way between symbolic media and discursive media, because even moving images, video-sequences, and the like have an inherently symbolic character: they must exemplify reality to a far higher degree than merely referring to it. For our purpose, however, the crucial point is not whether there exists a reference to, or an exemplification of, a reality. On the contrary: we are concerned with the possibility of creating a readiness in the user that enables him/her to act in this reality, which means the handling of such knowledge as is capable of creating this readiness. This process of handling knowledge we shall call imaginization.
Imaginization is not restricted to a single form of representation; it is thus not bound to a single sense, neither to seeing (images and text) nor to hearing. Imaginization implies the total bodily-mental reality and is thus an embodiment: one could say it is 'incorporated'. The history of the concept of imaginization throughout phenomenology has been a rather checkered one; it has only acquired a certain unity starting from its development in the oeuvres of Martin Heidegger and Maurice Merleau-Ponty, where it expresses the 'human condition' of language being a sensus communis, a 'sixth sense', allowing the five 'regular senses' to combine in relation to some theme (or noema), or to some semantically unambiguous notion. This would mean, in a computer context, that incorporation first and foremost should be negotiated in accordance with the criterion of closeness.
1 The concept of practical reflexivity is developed in Kirkeby (1994a).
With respect to closeness, we have to consider speech as the primary, active ability to which the sense of hearing, as well as the senses activated in the gestural space, belong. Furthermore, it involves seeing, the ability which, in its dialectical relation with speech, creates our image of the other person. Just as does speech, seeing, too, has its own 'style', its own different cognitive process (Merleau-Ponty, 1964), a fact that manifests itself in our construction of the image of the other person, as well as in the image we create of the situation structuring the communication, not only between the I and the other person, but also between person and machine. Moreover, if we inject the concept of incorporation into that of 'situatedness', this latter concept will come to denote the historically given horizon of meaning. We will come to realize that situatedness is the condition of the realization of meaning, while being itself conditioned by communication as the 'suprema differentia specifica' of man (Kirkeby, 1994a). In other words, speech and seeing have a characteristic, common 'style' inasmuch as they rise from the expectations in our cognitive abilities and in our senses which, as a prefigured readiness toward meaning, hide in our bodies in the form of habits and socially governed concepts. These expectations are precisely the 'tacit conditions' of which Wittgenstein speaks. 2 The concept of incorporation, when brought to bear upon situatedness, thus implies the existence of a readiness towards meaning at a level that is not reducible to a mental and a bodily system; a level one might term 'the inter-world' between the systems. This readiness is not least realized through metaphor, a fact that is naturally of general importance for any reflection upon the relation between humans and machines (computers); it is of special importance for any interpretation we may attach to the concept of interface in IMM.
It should be mentioned that the combination of situatedness and incorporation alters the classical concept of 'intentionality' as developed by Husserl, heavily reducing the possible autonomy and cognitive range of the reflexive cogito. In the following, we will interpret situatedness through the concept of interactivity. This concept is a regulative idea expressing the possibility of imaginization, if the maximum of incorporatedness and situatedness are realized in relation to the system. In this perspective, imaginization appears as something which is very different from mere 'knowledge representation': it becomes the activity of creating symbols - or, as Merleau-Ponty has put it: it is language letting you catch a meaning which did not exist earlier (Merleau-Ponty, 1964). By the same token, imaginization is different from learning by example, because the prototypical relation between user and the IMM is one of interactivity: imaginization implies the combination of learning by example and learning by doing. Thus the process of imaginization refers back to the dynamic process in which new metaphors are created; it refers to the user's practical-reflexive activity, and hence to a sense of possibility, which dissolves any conventional meaning inherent in examples. One might even say that such a process has the character of a positive self-reference, i.e. a process where both knowledge and the self referred to change during the interaction.
2 "Dann ruht der Vorgang unseres Sprachspiels immer auf einer stillschweigenden Voraussetzung." (Wittgenstein, 1989)
Imaginization, as a special kind of user practice, first of all appears in and through one particular activity: the ability to create symbols. Symbols express what I know, what I am able to do, and what I want to be or have; the symbol points both to the case, to my cognitive relation with it, to my intentionality, and to the very context in which case and intentionality both acquire their meaning. We see how already at this stage, a purely epistemological attitude towards metaphors becomes problematic: we cannot strategically reduce metaphors to an ontological level that stretches beyond the historically given media. Here, we meet a further difficulty: other (e.g., discursive) media, too, are able to refer symbolically; the language of poetry embodies the essence of this practice. Thus the symbolic element in the media need not be bound to the image-form itself, to the image-sign: images may exemplify the symbolic element, but the discursive statements refer to it. It is a fact that images only in a very broad sense are able to refer discursively - as in the case of a comic book without a text. The crucial opposition, then, is not between discursive vs. image-bound reference to the symbolic dimension; rather it is between two ways of using language: one in which one refers symbolically, and one in which one does not. Carried to its extreme, the latter opposition implies a difference in linguistic practice: we either refer by using images (or symbols), or we do not. The opposition is thus between 'image-language' and a language without images. A general problem regarding IMM as a cognitive environment and as a technology is the question whether it is at all possible to combine discursive language with image-language. Will the image-language not destroy the potential symbol-creating power in the discursive language?
Here, we may have overlooked another distinction: Actually, there are two different ways of using and creating symbols: One that could be called 'overdetermined'; it will be discussed below, section 2. And one that could be called spontaneous, which is the main ingredient in the ability we have called imaginization, and which, while subject to the limitations imposed by each individual case, as well as by the general constraints inherent in intentionality and context, still is able to transcend its own limitations. The spontaneous creation of symbols is at the core of situated and incorporated cognition, whenever the maximum demands on interactivity are fulfilled. As an activity characterizing a particular relation between user and system, interactivity can only be conceived of as originating in human interaction and characterizing a particular relation between two people. This relation distinguishes itself by the fact that the one person cannot be a means to the other; neither can any kind of authority be legitimized (Hegel, 1807). In modern philosophy of language, this view is codified through the concepts 'illocutionary' versus 'perlocutionary' in relation to speech acts (Austin, 1962; Searle, 1974; Habermas, 1981). Ideally, illocutionary speech acts - pace Habermas - should be the dominating ones. Illocutionary speech acts distinguish themselves by expressing a particular intention in its 'raw', unconcealed form (such as a promise, a statement, an emotional manifestation). But illocutionary speech acts are only a necessary, not a sufficient condition of interactivity. Interactivity only happens when, and in the way that, the other person reacts to the illocutionary act.
On the one hand, this reaction (the 'answer') should express the fact that the other person has understood the speech act's formal character. On the other hand, and in addition to this, the answering person should be able to relate to a number of the speaker's properties: his personality, his basis of experience, his level of knowledge; to the probability and truth of the subject addressed by the speaker; to his sincerity; and to the way in which the possible content of his speech is created, altered, or annihilated by the situation. 3 Optimal interactivity thus consists in a maximally reflexive openness of mind towards all these facts, towards the style, the truth, and the person (Kirkeby, 1994b). Imaginization expresses an ideal, prototypical horizon, on the background of which the possibility of a creative use of multimedia must be seen.
What characterizes imaginization, or spontaneous symbol-creation, is its relation both to our habit of exemplifying through images (as discussed in the previous section) and to our practices of representation in discursive languages. Crucial for the latter is what we have called its 'overdetermined' relation to symbols characterizing a particular type of both the image-media and the discursive language. As such, however, this relation cannot be considered to be inherent in any individual medium. In the image-media, such an 'overdetermined' application and creation of symbols is characterized by icons, i.e., static and dynamic images exemplifying by means of a conventional cognitive frame, a frame that is most often not consciously acknowledged. Examples may be found in mental models that classify our faculties into cognitive, conative, and emotive, and which naturalize this classification through illustrative images taken from the world of science (such as the white coat and, in earlier times, the slide rule); from the world of politics (such as the stony face, symbolizing ruthless power); or from the private world of intimacy (symbolized by the caring mother). Alternatively, one could think of our communicative structures, such as they are illustrated by the common metaphor of sending and receiving through some channel. Or of the use of metaphors in information processing, which rely on the imagery and symbolism of manufacturing industries. This overdetermination, however, does not characterize the image-media alone; it is also an inherent quality of discursive language. Frequently, unconscious reference to types of metaphorical scenarios, similar to the ones discussed above, is made. In discursive language, such semantic qualities typically cannot be related to a unique existential domain, whether senso-motoric, sexual, or that of family life and work.
In this connection, it behooves us to recall what has been stated by the German phenomenologist Hans Lipps as the primordial condition of symbol understanding, viz., that there exists no "original meaning, but only the origin of a meaning". According to
3 Habermas' formal pragmatics cannot cope with the fact that the situation is the final mechanism creating meaning by canceling it. 'We do not know what we mean until we have heard ourselves saying it'.
Lipps, all names ought to be considered as being "the result of a baptism in circumstances of urgency ('Nottaufe')" (Lipps, 1958: 73). Thus, spontaneous creation of meaning is hard to spot and resists critical analysis: the reason is that overdetermination blocks our view - due, among other things, to the fact that symbols are integrated into historic-social settings and thus themselves (have) become active ways of reproducing a given social reality. As examples, consider the fact that only certain types of work or family life can provide a framework for cognitively viable metaphors. For this reason, bodily metaphors, as epistemological means of attacking the very problem of metaphors, their semantics and pragmatics, should be handled with the utmost care - a reservation which we will come back to in our critique of Lakoff's and Johnson's theories in the next section. But what if, in spite of all this, one wants to stick to the idea of a spontaneous use and creation of symbols? In that case, the question must be asked: where and how could such a creative use be practiced? The answer is that this happens first and foremost in dialogue; but not just any old dialogue. This dialogue has to be of a very special kind, characterized by a maximum of interactivity. This means that for the interactive agents, the dialogue is characterized by the actual possibility of referring to the interlocutor's style, his conception of truth, and the quality of his person. This rules out any (explicit or implicit) acceptance of 'the compulsion of the best argument' - as held by formal pragmatics (Habermas, 1981). On the contrary: arguments must be understood as embedded in both rhetorical and poetical reality, the pragmatic dimension of which implies that no agent is ever fully informed about the content of his or her own arguments unless, and until, they are uttered.
No argument, therefore, can be taken to be the carrier of a unique, abstract rationality; on the contrary, all rationality is 'bounded' by its context, and dependent on the constraints of the utterance (in time and space) as to what can be 'expressed'; hence, an argument might 'win out' precisely because of qualities transcending the rational. These are the qualities which traditionally are treated in rhetoric or poetics, especially with respect to arguments that can be validated (or: whose relation to truth can be established) only at a later time; still, these qualities may actually carry the day in virtue of their ability to influence the opponent. Another way of saying this is that spontaneous symbol-creation is nothing but the insistence on the 'non-identity' of concept and reality - as T.W. Adorno used to put it (Adorno, 1966). Similarly, in the words of Ernst Bloch, it could be called the promise of an as yet unrealized, but possible Utopian and primordial experience (Bloch, 1969). 4 The pragmatic aspect of the linguistic proposition dominates dialogue, and thus dominates the reference to the restrictions enforced on its usage through incorporation and situatedness. Propositions become provisional. They become projections of worlds ('ways of world-making' as Nelson Goodman put it); and
4 "Und auch die Symbolik, die zum Unterschied von der mehrdeutigen Allegorie völlig einsinnig eine Realverhülltheit des Gegenstandes bezeichnet, ist eben in der dialektischen Offenheit der Dinge angesiedelt; denn an diesen Bedeutungsrändern lebt das Fundierende jeder Realsymbolik: die Latenz. Und die Einheit für Allegorien wie Symbole ist dieses, dass in ihnen objektiv-utopische Archetypen arbeiten, als die wahren Realziffern in beiden." (Bloch, 1969: 1592.)
attempts at meaning-catching, in an endeavor to make them carry meanings that at most exist sporadically. In a context like this, the validity of a metaphorical proposition is to be determined in accordance with what Aristotle has to say about the rhetorical and the poetical. Aristotelian thought is unique in that it considers rhetoric as an interdisciplinary matter, touching on dialectics at the one end, and on ethics and politics at the other, with poetics taking the 'lead', though, at least from an aesthetic point of view. Exactly this insistence on the tension between rhetoric and poetics in the classical Aristotelian sense will show itself to be of importance for the development and analysis of IMM, because here the tension is built into the very 'media', i.e. the common ground between dialectics (including logic) and aesthetics.
First, let us nail shut a popular escape-hatch. Those who do not accept the idea of a formal meta-language will turn to dialogue in the hope of finding a non-symbolic metalanguage. This is because symbols here function provisionally and can always be transcended by the very intensity of the words, through overt reference to context, to style, truth, and person. Similarly, the concepts expressing the body-phenomenological dependency of language (which is Lakoff's and Johnson's point of departure in cognitive science; Lakoff, 1987; Johnson, 1987) have an obvious metaphorical reference. The same goes for Lakoff's concept of 'idealized cognitive model'. This concept has its origins in several sources; here, only those with the most typical metaphorical reference will be mentioned: Minsky's 'frames', Fauconnier's theory of 'mental spaces', and AI-related concepts of 'scripts' and 'schemata' (Lakoff, 1987: 68), as developed, e.g., by Schank and Rumelhart. These concepts refer to conventionalized complexes of images which themselves have no further discursive reference; their legitimization stems from the mini-worlds of theater, cinema, and architecture. In other words, they are themselves symbols. Against the background of modern American linguistics, psycholinguistics, and experimental psychology, Lakoff (1987: 127) treats the phenomenological concept of intentionality using the slogan: "Seeing typically involves categorization". It does not seem to bother Lakoff that by 'objectivizing' the very concept of intentionality, he lets in objectivism through the back door.
Implicitly accepting Husserl's evidence-criterion of intentionality, he more or less explicitly spurns any reference to 'situatedness' (on which see below) and its concomitant historicity, as they have been stressed by Heidegger, especially through the latter's distinctions within the category of 'being-in' (In-sein): 'Befindlichkeit', 'Geworfenheit', and 'Verfallen' (lit.: 'disposition', 'thrownness', and 'deterioration'), distinctions that capture the quintessence of nonauthentic reasoning and acting. Precisely for this reason, we cannot expect to find any primordiality or authenticity in metaphorical speech (Heidegger, 1927). On the contrary: it is history which makes and breaks metaphors: it makes them into a vehicle of power, as Nietzsche has shown us in the last century, or reduces them to trite commonplaces without other than 'historical' interest. Lakoff develops the cognitive theoretical model that forms the basis of his categorizations in two tempi: One is kinesthetic reference, based on the
Schopenhauerian concept of 'body-image', in our century so brilliantly developed by Merleau-Ponty (Merleau-Ponty, 1945), and grafted by Lakoff onto Eleanor Rosch's so-called 'basic level categories'. The other is a rather erratic insistence on the notion of the mental image, a concept which in the end destroys the basis of his own fundamental paradigm. This critique of Lakoff also applies to his partner, the philosopher Mark Johnson (see M. Johnson, 1987). Johnson says:
Our reality is shaped by the patterns of our bodily movement, the contours of our spatial and temporal orientation, and the forms of our interaction with objects. It is never merely a matter of abstract conceptualization and propositional judgments. (Johnson, 1987: xix.)
Even though there can be no doubt about the last part of the Johnson quote, it still is the case that our reality is shaped by the historically transferred, linguistically given possibility of concrete bodily experience, and that the 'object' having the greatest significance for our experience in its various forms is the other person. Hence the paramount importance of the historic and social dimensions. In particular, Lakoff's naturalization of the famous Wittgensteinian concept of 'family likeness' restricts the so-called 'metonymic effect' (which links the more or less representative exemplars of the category to the prototypical carrier of the family-likeness) to a crude concept of categorization (whether we call this concept 'anthropologizing' or 'universalistic' makes no difference). By doing this, Lakoff skirts the entire issue of the historical character of meaning.
From another point of view, one might say that Lakoff lacks a feeling for the 'unhappy consciousness'; that is, he lacks the fundamental critical distance which would enable him to unveil the body as the area of alienation, of unreality, of lack of originality, and thus unmask our body-image as the product of historical and social forces - as Michel Foucault has made us aware (Foucault, 1975). Lakoff does not transcend the naturalistic concept of use, as instantiated in the Heideggerian concept of 'das Zeug' (from his book Sein und Zeit (1927): literally, 'the thing in its pure materiality', but carrying all sorts of other connotations such as 'trash, nonsense', and also 'tool, outfit'). Nowhere does Lakoff show himself to be conscious of the influence which the technological-scientific complex exerts on the creation of the modern body - a consciousness which has been emphasized in Heidegger's works ever since the forties. 5 Furthermore, Lakoff's idea of incorporating (in the literal sense of the word: placing in a human body, 'em-bodying') language's power of creating reality (cf. Wittgenstein's earlier mentioned 'tacit conditions' of language use) remains naively naturalistic in that his concept of 'embodiment' parallelizes (not to say: simply equates) sensual perception and linguistic (social) experience. Abandoning this simplistic parallelism would require Lakoff to reflect on the fact that in modern society, all experience is a social and historical construction; only by doing that would he be able
5 The theme is first played in Über den Humanismus from 1946, and then fully orchestrated in Die Technik und die Kehre from 1962. If Lakoff had made himself familiar with (in particular) Über den Humanismus, he might have discovered that there was such a thing as an anthropological frame of reference (and even used it).
to cross the boundary from his sterile phenomenological realm of thought.
Here are Lakoff's own words:
Cognitive models are embodied, either directly or indirectly by way of systematic links to embodied concepts. A concept is embodied when its content or other properties are motivated by bodily or social experience. This does not necessarily mean that the concept is predictable from the experience, but rather that it makes sense that it has the content (or other properties) that it has, given the nature of the corresponding experience. Embodiment thus provides a non-arbitrary link between cognition and experience. (Lakoff, 1987: 154)
Lakoff is correct in maintaining that incorporation excludes all arbitrary relationships between cognition and experience; however, this does not imply - as Lakoff seems to think - that this relation is not also (and indeed necessarily) one that has developed historically. For him, the social dimension is glued onto a body-naturalistic idea of how concepts are created, whereas the historical dimension is conspicuous by its absence. As if to remove any possible doubts, Lakoff's presentation of his perspective on incorporation explicitly omits any reference to a theory of communication. One is tempted to ask why he does not mention Merleau-Ponty, whose theoretical approach to 'incorporated cognition' in essence was developed long before Lakoff's, and who formulated the necessary constraints that such an approach would have to obey in order to be consistent with a phenomenological perspective on cognition. Perhaps the reason is that on Merleau-Ponty's view, the concept of communication implies that we can neither allow a kinesthetic level of conceptualization to be subject to ontologizing, nor accept the Husserlian, pre-phenomenological idea of mental images in the form of a pre-linguistic language, even if this language - as in Lakoff's case - is founded on our bodily praxis and not - as in Fodor's work - somewhere in thin air. In a way, Lakoff's dilemma reproduces the very crux of the cognitive paradigm that he wants to reform.
There is, of course, the possibility that the problem is one of different traditions: one has to remember that phenomenology only came to America in the disguise of its wake, constructionism and deconstructionism, now themselves on the wane, as Barbara Johnson has pointed out (Johnson, 1995). In this connection, it may be of significance that Derrida took his central ideas principally from Heidegger's later writings; as to Merleau-Ponty's radical thinking, the case can be made that it probably was overshadowed by the existentialist movement. However this may be, it seems rather obvious that in any critique of Lakoff and Johnson, by far the most difficult problem is how to speak about metaphors in a non-metaphorical language. Leaving aside the strictly 'meaningless' logico-mathematical languages, we must admit that a non-metaphorical metalanguage covering all dimensions (semantic, syntactic, pragmatic) necessarily has the character of a 'regulative idea', as Kant called it. Here the idea of imaginization, of spontaneous symbol creation in dialogue, may be useful, since it insists both on our being conscious of the necessarily non-reductionist character of any theory of symbols, and on our realizing that non-identity is a normative constraint on any theoretical explanation of the relation between language and reality.
So far, we have not made any explicit distinction between the different kinds of interactive multimedia systems. Usually, IMM systems are simply defined as collections of different media within a single integrated system. In our conception, IMM are primarily characterized by their focus on the interaction between the user and the computer. The notion of 'imaginization', as defined earlier, captures our readiness to create images: it allows our language to catch a meaning that did not exist previously. Imaginization is a complete mode of expression, an ideal, prototypical horizon for creative application of multimedia systems, as we have noted earlier. The question is now which qualities a multimedia system must have in order to support the users' access to creating their own images. This question can be addressed by describing the manner of interaction between the user and the multimedia system (called here agent I and agent II respectively). We suggest using imaginization as a means of characterizing multimedia systems by a typology based on their degrees of ability to support incorporation and situatedness. Doing this makes it possible to identify a number of systems, differing significantly as regards user/system interaction; they can then be related to what are loosely called 'multimedia systems'.
INCORPORATION
A multimedia system is called more or less 'incorporated' in terms of its closeness of interaction, i.e. according to the user's perception of the distance to the reality represented by the system. Perceived closeness is particularly connected to visual experience (as well as to speech, as mentioned earlier). Thus, the spatial dimension is an important determining factor in the visual perception of closeness, whereas the other senses, in this respect, are inferior to seeing (even though they, too, may influence the spatial perception of the user). Piaget's notion of 'intuition' assumes that any original thinking requires an intuitive basis.
6 Imaginization is a means for original thinking. The most important characteristic of an intuitive process is that it is based on sense impressions; that means: it always refers back to an ontogenetically prior constitution of reality through the senses. This 'sensitized' or 'sensualized' reality is contained in our perception as an ever-present readiness towards alternative meanings. Intuitive thinking is ruled by context and by the discourse of the perceived meanings. It is hard to imagine how this could be supported by a computer, for the simple reason that the computer does not possess any devices that make perception possible. For several reasons, a computer will never have sense impressions in the proper sense of the word; computer simulation of sense-based perception is impossible. First of all, there is the meta-theoretical knowledge about the situation and its typology, which is a necessary condition for constructing prototypical scenarios of experience; such knowledge is not within the capacities of the computer. Second, the learning algorithms referring to the individualization processes themselves are not too well known, and hence their simulation on the computer presents insurmountable difficulties.
6 A crucial distinction in Piaget's thought is that between pre-conceptual and conceptual thinking. For ages 4 to 12, it has been established that the child can move back and forth between these levels: preconceptuality alternates with conceptuality. We see this illustrated not only by the formation of the linguistic concept through speech; it is also possible to move in the opposite direction, connecting the linguistic concept with the mental image. Here, the image acts as a cognitive tool compared to the word. It is this function, the possible interaction between concept and image, that the multimedia focus on, basing themselves on a more 'primitive' way of perception, and allowing for a 'ready', 'incorporated' way of coping with new or not well-known situations. The notion of incorporation is crucial for an understanding of Piaget's concept of 'readiness' (Piaget, 1923).
Thus, multimedia systems owe their cognitive strengths to their close connection with the user in interaction; hence, it is the experience and the perception processes of the user that form the object for the interface. By contrast, virtual reality systems ideally simulate - as mentioned earlier - an exchange of sense impressions with the user. Laurel (1993: 204) writes that "by positing that one may treat a computer-generated world as if it were real, virtual reality contradicts the notion that one needs a special-purpose language to interact with computers." It is only a simulation of true exchange, since any analog sense expression sent from the human user is converted into digital signals which can be processed by the VR-software. The other way around, any digital 'sense expression' sent from the VR system is converted into (today still rather primitive) analog signals (i.e. poor graphical resolution). To the degree that multimedia systems are dependent on the symbolic dimension, they have a built-in cognitive and media-based limitation. In virtual reality as a technological-cognitive Utopia, this limitation seems to have been overcome. Complete incorporation seems to have been reached, when there is no longer any need for a specialized way of communication, and expressions and means of perception, as they are intuitively used in human contexts of communication, are sufficient.
However, when it comes to 'situatedness' (as expressing the connection of perception to the historical media; see below), it is doubtful whether this can ever be simulated by virtual reality: the medium already constitutes, in and by itself, so to say, a violation of the continuous reality of the individual. On the other hand (and mainly on account of the technological development in the multimedia area, as also pointed out by Frank Biocca elsewhere in this volume), one may consider the possibility of treating IMM as a device for the support of original thinking.
Figure 1 shows the changing applications of incorporation to certain aspects of (multimedia) systems. The degree of incorporation is, as mentioned above, determined by the degree of closeness that we perceive in it - a perception which is primarily dependent on vision. For this reason, the degrees of incorporation should be characterized in terms of interface: does the system have a character-based, non-graphical interface, a graphical, multimedia interface, or a synkinesthetic interface to the user? The interfaces themselves can then be described as one-, two-, or three-dimensional, respectively. Such a description of the degrees of incorporation takes its point of departure in technology, as applied to the senses through which the interaction between system and user takes place. Crucial to this interaction are sight and speech; the latter taken as the central sense (the sensus communis of the Scholastics) grouping and combining in its functions all the other senses. In the first interface dimension, interaction typically takes place through the activation of sight (the user reading characters), the user's response being given in the form of keyboard commands. In the second dimension of the interface, all of the senses
may be affected. Since the IMM systems involved here typically contain sequences of text, pictures, video sequences and sound, the user's experience of reality still remains two-dimensional: the user does not feel that he or she is really interacting with the system, the way we will see it to be the case in the third dimension. In the second dimension, too, the users may respond to the system not only by means of keyboard strokes, but also by using pointing devices such as mouse, digitizer, joystick, sensitive screen, or the like. In this dimension, the users still have a clear feeling of the limits of the system, and of the boundaries between themselves and the system, in the guise of the screen itself, even in cases where the system is capable of realistic simulation (e.g. of depth, as in computer games like DOOM). Finally, in the third dimension, the interaction takes place through affection of, and perception through, all the users' senses; vice versa, the users are able to apply all their senses in responding. In principle, this holds true for the system as well, though in this case the interaction using all the 'senses' is a simulated one, based on the transformation of analog signals to digital representations. The boundaries between user and system are dissolved, thus creating the impression in the users that they are moving into the system's reality, while similarly, the system moves into theirs (e.g. by affecting their sense of balance in what is often called 'simulation sickness'; see Biocca, this volume).
SITUATEDNESS
As was the case for incorporation, so situatedness, too, is closely tied to the form that interaction between the user and the multimedia system takes. And just as when we talked about degrees of incorporation, so situatedness, too, comes in degrees. These degrees are related to a system's flexibility, and to its ability to perceive and 'understand', as well as react to, the user's patterns of activity and intentions.
Earlier, we claimed that interactivity can only be understood if one starts out from its origin in human activity, which is based on 'equal rights' for participants: no participant is superior to any other. However, in normal interaction between a human and a computer, the human user is superior. Computers are not, and have never been, expected to have or develop, on the basis of their knowledge of the individual user, any assumptions about that user's intentions. The computer is simply expected to react in an appropriate way to the user's unambiguous commands. But where does situatedness come in? Simply like this: the more a computer 'knows' about the user, or the better it 'understands' the user's intentions, the greater the system's flexibility, and the more adaptable the interface. Thus, we have a high degree of situatedness in the case of the so-called 'autonomous agents' which, applying AI-related techniques, base their behavior on a superior knowledge of the user; here, the user "is engaged in a cooperative process in which human and computer agents both initiate communication, monitor events and perform tasks." (Maes, 1994: 31; see also Lindsay's chapter, this volume, for a rather divergent view). Such agent-based systems, of course, invoke some important issues of authority and jurisdiction; they presuppose a relation of trust between the user and the system (as an example of how badly things can go when that mutual trust is absent, we only have to think of supercomputer HAL's rather arrogant dispositions in Stanley Kubrick's classic
movie '2001: A Space Odyssey'; truly a prophetic vision, some will say). The system's 'sensing' of the user's readiness towards meaning is crucial to its success or failure in supporting situatedness as a means towards imaginization. 'Readiness towards meaning' is used here in the sense which 'intentionality' acquires in modern phenomenology, as opposed to its use in Husserl's early writings (Husserl, 1980). As to situatedness (as has already been mentioned), it should be understood as what Heidegger had in mind when, in his 1927 masterpiece, Sein und Zeit ('Being and Time'), he defined the concepts of 'Befindlichkeit' (lit. 'disposition'), 'Geworfenheit' (lit. 'thrownness'), and 'Verfallen' (lit. 'deterioration'). Incorporation and situatedness are thus the very qualities of intentionality, and they are augmented through the thematic and reflexive relationship that the intending person has to his or her own existence.
A question of crucial importance is whether intentionality can, or should, be defined on the basis of this reflexivity. Since incorporation and situatedness are united in a conceptual reciprocity, where 'intentio' constitutes its 'intentum' by realizing the individual's as well as the collective's tacit conditions, it follows that this cannot be the case. The problem is that the system ought to reinforce the user's consciousness of his or her own intentional basis; this is the core of any logic of autonomy. Here, it is of importance whether the user wants to emphasize situatedness at all, if it entails certain parts of the system being restricted in their autonomous enacting of knowledge. Obviously, there are different types of rationality, and their relations to the individual user deserve to be brought out into the open. Is it, then, possible to speak about different rationales having different ontological status?
Can such non-mainstream rationales continue to function in secret, coming out into the open only at a later date, as it is often argued in psychoanalysis and Marx-inspired sociology? From another point of view, the criterion of introspection that we called upon above, along with the very consciousness of this reflexivity, of this turning inward to oneself, are likely to inhibit spontaneity. We might perhaps again refer here to the 'practical reflexivity' (Kirkeby, 1994a), that is able to express a continuous sense of what steps to take next to go where, and which does this, to a higher degree than is the case in mere abstract introspection, in a conceptual emphasis on the steps of the process. Finally, we should be aware that the phenomenology of the gestural space does not always seem to be well suited to support the system's diagnosing of the user's situatedness. Take the case of a system whose functioning is based on eye movement tracking: if I, while working professionally at the computer, keep gazing towards the picture of my lover, the machine might get the idea I'm in love with it! Figure 1 shows the three degrees of situatedness, determined in accordance with the different types of interaction that are possible from both the user's (Agent I) and the IMM's (Agent II) point of view. As to the first degree and its type of interaction, Agent II has no possibility of acting independently: it only responds to the commands of Agent I. However, Agent I's possibilities are restricted as well: this means that the flow of information is unidirectional only, and furthermore that Agent I must learn to use a formal code, and/or is restricted to choosing commands from a limited menu only, in order to be allowed to retrieve information. In the second degree of situatedness, the same restrictions as to its form of interaction are placed on Agent II (the IMM) as those constraining Agent I in the first
degree. By contrast, Agent I's possibilities of acting are supposedly unrestricted, in the sense that there is free access to information, and that this access is provided in such a way as to suit the needs of the user, as defined by him/herself. One could say that in this case, situatedness is brought about through the interaction of the agents, I and II. In the third degree of situatedness, the type of interaction differs radically from the two previous ones in that Agent II now has acquired autonomy. Ideally, however, this autonomy should happen entirely on Agent I's conditions, as Agent II's behavior ought to reflect its task (supporting Agent I by simulating the latter's behavioral patterns) by reading the behavioral pattern of this Agent, even without Agent I's active cooperation, and accept the fact that there are certain autonomous possibilities of action that unambiguously deserve to be called communicative competence, and that these serve as a pre-condition to creative competence.
SOME EXAMPLES
How do degrees of incorporation and situatedness manifest themselves in these kinds of systems? Below, we shall give examples of specific applications within all of the nine categories in order to illustrate this from an interface-technological point of view. The categories are examined from left to right, beginning from the upper row.
Incorporation / closeness (columns, left to right): Character-based, non-graphical interface - 1-D | Graphical multimedia (audio/video) interface - 2-D | Synkinesthetic interface - 3-D
Menu- or [...] interaction (Agent I's actions restricted): Presentation of text in 1-D: 'Flag text' | Presentation of text in 2-D: Multimedia | Information 'played' in 3-D: Virtual reality 'film'
Hyper-based one-way interaction (Agent I's actions open): Free choice of text in 1-D: Hypertext | Free choice of information in 2-D: Hypermedia | Free choice of information in 3-D: Virtual reality
Hyper-based mutual interaction (Agent I's actions open and coordinated with Agent II's open action): Mutual interaction in 1-D: Text agents | Mutual interaction in 2-D: | Mutual interaction in 3-D: Virtual reality agents
Figure 1. IMM and related technologies categorized by incorporation (horizontal dimension) and situatedness (vertical dimension). Agent I is the user and Agent II is the IMM.
We claim that the closer a certain system is to the lower right corner of the figure, the better it will support imaginization. The reason is that imaginization presupposes the highest possible degrees of situatedness and incorporation, combined with a maximum of interactivity.
Ten years ago most systems could be allocated to the upper left corner of our model. All of us have probably tried working with a word processing system where we
had to remember the meaning of the function keys, and where we had to go through quite a lot of menu layers before we got to where we could do what we wanted. As an example of a multimedia application with restricted possibilities of interaction for the user, consider a menu in which the user is led to a certain piece of information by being given a choice among a variety of options (e.g. as manifested by icons). An example of a three-dimensional application in multimedia would be that of 'virtual movies' - i.e. movie 'watching' in three dimensions, where we (primarily by the use of audio-visual effects) obtain a synkinesthetic experience, such as a sensation of falling forward that is so real that we actually fall. There are very few or no possibilities of interaction with this type of system.
In the category of systems that do offer the user a possibility of acting, the best known today are hypertext systems (actually a one-dimensional version of the systems mentioned in the previous paragraph). In hypertext, we are not restricted to a fixed way of 'reading' the text, but we are allowed to use the text freely in accordance with our needs and our level of experience - just as we normally go about reading an encyclopedia (see McHoul and Roe, this volume). Rather than reading an encyclopedia from beginning to end, we consult it selectively: our reading is determined by the need to know, and by the wish to have additional information presented to us as we go. In this way, we let the text adapt itself to us, whereas in the first case we had to adapt ourselves to the text as it was presented to us. (On the question of adaptation, especially 'who adapts to who?', see also the Introduction to this volume by Gorayska & Mey, as well as Mey, 1994). Other hypermedia systems allow us to adapt the information we need (not just in the form of text, but of sound and images as well), as we navigate through the system.
When we non-technically speak of interactive multimedia we are often referring to this type of hypermedia. Well-known examples are computer games like DOOM.7
VR systems, as they are known today, border on the category possessing a true synkinesthetic interface, one in which all the user's possibilities of action are wide open. That is, the user does not have to adopt a particular way of communication in order to interact with the system, but is able to apply all of the senses 'naturally' during interaction; similarly, the system is able to influence all of the user's senses. The relationship between VR and synkinestheticality is a 'borderline' case, because most VR systems activate only a few of the senses for their operation.
From a technological point of view, systems that are based on unrestricted possibilities of action, both for the user and for the system, are still in a provisional state, although a limited number of 'primitive' text-based systems of this type are well known and well tested. As an example, consider electronic agents such as MAXIM (Rosenschein & Genesereth, 1985) that assist users in sorting their mail on the basis of the latter's registered filing habits. Some hypermedia systems are based on a similar form of interaction: e.g., there are agents that present different choices of entertainment on the basis of their knowledge of the 'taste' of the user in music, theater, literature, etc. An example is RINGO, a system that supports the user in her choice of music (Maes, 1994). The most advanced systems with regard to incorporation and situatedness would be those in which mutual interaction in a three-dimensional space takes place between the user and the system; however, we are not familiar with any examples of such systems. We can imagine, though, a system such as a virtual, intelligent office which would be able to adapt to, and act on behalf of, the user in an ongoing interaction.
7 DOOM simulates - rather convincingly - 3D effects. However, the decisive feature in determining whether or not we are in the presence of a synkinesthetic interface is not just the experience of bodily motion in a three-dimensional space, nor is it the fact that we are able to interact with the system through our body movements.
CONCLUDING REMARKS
In our discussion of Multimedia and Virtual Reality in relation to our radically different way of interacting with these, the crucial problem of the cognitive possibilities and consequences of this combination presents itself. On the one hand, we have treated some of the problems inherent in combining discursive text, which refers to its objects, and images, which mostly exemplify. The conclusion here was that images in themselves do not offer any guarantee of a cognitive gain unless they are used as a means to imaginization. Discursive language might here show much more flexibility and possible depth than images do, due to its high degree of freedom in relation to the world mediated by our senses. But what about the differences in the way we interact with IMM and VR, respectively? In the beginning of the article we showed some scenarios describing these differences. In relation to these we may conclude: if the VR system operates only at a technological level, where IMM is 'embedded' in VR, then we cannot be sure of any cognitive gain. One might here try - very tentatively - to state the following hypothesis.
A powerful VR that comes very close to simulating sense experience, or perhaps even a VR still marked by 'artificiality', but rendering vivid, dynamical, expressive and colorful experiences of interaction, might cause traditional images, static or dynamic, with or without sound, to come close to discursive language, or at least to change into some kind of discursivity. That means that visual images as such, by losing their power of fascination, will lose both their imaginative and suggestive power. They will degrade into an unsuccessful version of a referring - as opposed to an exemplifying - medium. Unsuccessful, because they do not have the power of the spoken language: they are still 'dense', as Goodman used to call it, and they oppose codification. On the other hand, they lack the flexibility of discursive language because they are still bound to sense experience. In a way, one might say, such visual images would degrade into some kind of all too complicated, cognitively unwieldy, iconographic language. Or, put another way: who would be willing to watch pictures of wine, when he could drink it? And who would draw pictures of her thoughts, when she could speak them aloud? Perhaps the ultimate, ideal VR would play the film of civilization back to the place where the culture of literacy has not even begun; before symbolic representation, and hence, the possibility of generalizing over your own practices and over the reappearing patterns of nature which are the conditions of reflexivity.
To summarize, we have - by using a phenomenological approach (meaning here: 'continental' phenomenology) - endeavored to cast some light on the phenomena of IMM and VR. The phenomenological approach provides us with some useful ways of conceptualizing interactivity in relation to the IMM interface, and it may give us some ideas as to how metaphors function cognitively. In this way, it may help us in
determining the true character of imaginization as a regulative notion - a notion that hopefully will clarify the issue of our spontaneous interaction with the IMM interface, and thus may inspire further development of these new media. We also hope that the two criteria we have suggested for categorizing these media, viz., the dimensions of closeness and situatedness, will contribute to establishing some criteria for evaluating the overall relation between humans and machines.
REFERENCES
Adorno, Th., 1966. Negative Dialektik. Frankfurt a.M.: Suhrkamp Verlag.
Austin, J. L., 1962. How To Do Things with Words. Oxford: Oxford University Press.
Bloch, Ernst, 1969. Das Prinzip Hoffnung. Vol. III. Frankfurt a.M.: Suhrkamp Verlag.
Foucault, M., 1975. Surveiller et punir; Naissance de la prison. Paris: Gallimard.
Habermas, Jürgen, 1981. Theorie des kommunikativen Handelns. Frankfurt a.M.: Suhrkamp Verlag.
Hegel, G. W. F., 1807 (1952). Phänomenologie des Geistes. Hamburg: Felix Meiner Verlag.
Heidegger, Martin, 1927 (1967). Sein und Zeit. Tübingen: Max Niemeyer Verlag.
Heidegger, Martin, 1955. Die Technik und die Kehre. Pfullingen: Neske.
Husserl, Edmund, 1900 (1980). Logische Untersuchungen. Vol. I-III. Tübingen: Max Niemeyer Verlag.
Johnson, Barbara, 1995. The Wake of Deconstructionism. Cambridge: Harvard University Press.
Johnson, Mark, 1987. The Body in the Mind. The Bodily Basis of Meaning, Imagination and Reasoning. Chicago: The University of Chicago Press.
Kirkeby, Ole Fogh, 1994a. Event and body-mind. A Phenomenological-Hermeneutic Analysis. Aarhus: Modtryk.
Kirkeby, Ole Fogh, 1994b. World, word and thought. Philosophy of Language and Phenomenology. Copenhagen: CBS Publishers.
Lakoff, George, 1987. Women, Fire and Dangerous Things. What Categories Reveal About the Mind. Chicago and London: The University of Chicago Press.
Laurel, Brenda, 1993. Computers as Theater. Reading, Mass.: Addison-Wesley.
Lipps, Hans, 1958. Die Verbindlichkeit der Sprache. Frankfurt a.M.: Vittorio Klostermann.
Maes, Patti, 1994. Agents that Reduce Work and Information Overload. Communications of the ACM, July 1994/Vol. 37(7): 31-40.
Merleau-Ponty, Maurice, 1945. Phénoménologie de la perception. Paris: Gallimard.
Merleau-Ponty, Maurice, 1964. Le visible et l'invisible. Paris: Gallimard.
Mey, Jacob L., 1994. Adaptability. In: R. Asher & J.M.Y. Simpson, eds., The Encyclopedia of Language and Linguistics, Vol. 1, 265-67. Oxford & Amsterdam: Pergamon/Elsevier Science.
Piaget, Jean, 1923. Das Erwachen der Intelligenz beim Kinde. Stuttgart: Kohlhammer.
Rosenschein, Jay S. and Michael R. Genesereth, 1985. Deals among Rational Agents. Proceedings of the Ninth International Joint Conference on Artificial Intelligence. AAAI Press, Menlo Park, Calif., 91-99.
Searle, John R., 1974. Speech Acts. Cambridge, England: Cambridge University Press.
Wittgenstein, Ludwig, 1989. Philosophische Untersuchungen. Werkausgabe Bd. 1, Frankfurt a.M.: Suhrkamp Verlag.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 3
Communication Technology Group University of North Carolina at Chapel Hill, USA
VIRTUAL REALITY AS A COGNITIVE TECHNOLOGY
Can any computer truly enhance the functioning of the human mind? Can steel and silicon be so harmonized with the chemistry of the brain that one amplifies the other? If human intelligence is partially shaped by the environment, can a highly enriched virtual environment augment human intelligence? At its essence, this is almost the same as asking, "Is there such a thing as a cognitive technology?" The very title of this book - the very history of print itself - suggests that we want to answer, "yes". In this chapter I will take a glance inside the 3D world of virtual reality (VR) designers and observe them impelled by a vision of intelligence augmentation through immersive VR technology.
From the very beginning, VR engineers and programmers have conceived of the medium as a cognitive technology, a technology created to facilitate cognitive operations (Brooks, 1977, 1988; Furness, 1988, 1989; Heilig, 1955/1992; Krueger, 1991: xvii; Lanier and Biocca, 1992; Rheingold, 1991; Sutherland, 1968). For a large segment of computer graphic engineers and programmers, virtual reality technology marks a significant milestone in the development of computer interfaces (Foley, Van Dam, Feiner, and Hughes, 1994). Fulfilling a long term goal in the history of media (Biocca, Kim, and Levy, 1995), VR promises to finally create compelling illusions for the senses of vision, hearing, touch, and smell. In the words of a respected VR designer who has helped pioneer systems at NASA and the University of North Carolina, "The electronic expansion of human perception has, as its manifest destiny, to cover the entire human sensorium" (Robinett, 1991: 19). Like a bright light just out of reach of their data gloves, VR designers stretch their arms to grasp an enticing vision, the image of virtual reality technology as Sutherland's "ultimate display" (Sutherland, 1965), a metamedium that can augment human intelligence.
Engineers and programmers attempt a masterful orchestration of electricity, LCDs, hydraulic cylinders, and artificial fibers. With these they hope to so dilate the human senses that waves of information can pour through this high bandwidth channel into the brain. In full union with the user, virtual reality might
emerge to be a universal "tool for thought". In this vision virtual reality would extend the perceptual and cognitive abilities of the user. The claim that virtual reality may augment human intelligence is based on the increasingly compelling, sensory fidelity of virtual worlds. Computer graphics and kinematics capture more and more of the physical and sensory characteristics of natural environments. Immersive VR simulations increasingly perfect the way the virtual environments respond to user actions: the link of physical movement to sensory feedback increasingly simulates human action in a natural environment (Biocca and Delaney, 1995). The designers' confidence in the cognitive potency of these environments results in part from the very experience of the medium, the deep gut level reaction that designers and users feel when immersed in high-end VR systems. This experience suggests to some that VR has crossed a threshold never reached by older media. More than any other medium, virtual reality gives the user a strong sense of "being there" inside the virtual world. The senses are immersed in an illusion. The mind is swathed in a cocoon of its own creation. The word, "presence", (Sheridan, 1992) has come to mean the perceptual and cognitive sensation of being physically present in a compelling virtual world. In this chapter I would like to consider the design agenda that motivates VR designers' claims that virtual reality is a cognitive technology. More specifically I want to look at the goal of intelligence augmentation that beats in the heart of VR. I will consider the following question:
What are the claims implicit in the idea of intelligence augmentation through the use of VR technology? What are they? How are they conceptualized? Are they valid? In what way?
INTELLIGENCE AUGMENTATION INTELLIGENCE (AI)
Looking at the whole human enterprise of computer design, we can pick out three competing visions of the computer. Each goads the efforts of engineers and programmers:
1) the creation of an artificial mind, Artificial Intelligence (AI);
2) the creation of a mind tool, Intelligence Augmentation (IA);
3) the control of nature, machines, and telecommunication, Control and Communication (C&C).
Figure 1. Computer Design Goals
Intelligence Augmentation
Researchers, ideas, and money have flowed through the three streams of research, rushing out through our desert of ignorance towards three points on the horizon. Researchers and ideas have often drifted from stream to stream. Over time, shifts in human energy and interest have made each stream rush ahead. The streams have sometimes flowed into each other; for example, they have sometimes made use of similar developments in computational, display, and storage devices. But there remains a fundamental gap between these streams. They flow through different terrains and overcome different obstacles as they meander forward. The separation between these streams is sometimes slight, but it is always there. Within each stream the currents of thought that power the flow of research are propelled by a different understanding of the relationship between human, artifact, and environment.
The opposition between artificial intelligence and intelligence augmentation is particularly revealing of the motivation behind the design of VR. VR pioneers like Fred Brooks of the University of North Carolina are fond of saying that when computer science was fixating on AI, the eyes at his lab were all focused on the mirror image, IA.1 The clever reversal of the letters suggests something more profound. In each, AI and IA, there is an inversion of the relationship of humans to machines. Each is building a mind: one is human, the other silicon and electricity. But AI and IA emphasize different cognitive operations. Building an artificial mind is a very different goal from artificially amplifying the human mind. The success of one may come at the expense of the other. In Table 1, I have tried to list some of the key points where the goals and understandings of AI and IA designers diverge.
Table 1. Points where Artificial Intelligence (AI) and Intelligence Augmentation (IA) Diverge
Artificial Intelligence | Intelligence Augmentation
AI seeks to create an intelligent other. | IA wants to create an intelligence tool.
AI wants to internalize artificial consciousness in a machine. | IA wants to externalize human consciousness in a machine.
AI focuses on the detached mind. | IA focuses on the mind/body in a context.
AI emphasizes abstract decision making. | IA emphasizes the thinking through the senses.
AI engineers mind through products of the mind. | IA engineers mind through the body.
AI simulates cognitive operations. | IA simulates cognitive environments.
AI wants to produce an independent machine. | IA wants to produce a dependent machine.
1 Fred Brooks calls it "intelligence amplification". I have called it intelligence augmentation to connect the program of VR research to the longer tradition of interface design traced back to the work of Vannevar Bush and Douglas Engelbart.
F. Biocca
Sir Francis Bacon saw in technology a "relief from man's burden". AI tries to produce a silicon slave to perform mental labor; IA tries to produce a mind tool to enhance the same labor. This notion of relief from labor has often been accompanied by a related thought, the idea that relief from drudgery elevates the human mind for higher things. In the early days of computer design, when VR, hypertext, and the World Wide Web were but phantasms floating above a hot noisy box of vacuum tubes, Vannevar Bush wrote an early form of the proposal for computer-based augmentation of human intelligence in his classic article, "As we may think" (Bush, 1945). He looked at the emerging mind tool and articulated four key goals: a) relief from the "repetitive processes of thought" (p. 4); b) improved methods for finding, organizing and transmitting information; c) "more direct" means for "absorbing materials...through...the senses" (p. 8); d) improved means for "manipulating ideas" (p. 4). Bush's dream of a computer tool he called "Memex" was to be more than a hypertext engine. It was also designed to be a VR-like device for augmenting intelligence by channeling electrical information through the senses:
In the outside world, all forms of intelligence, whether sound or sight, have been reduced to the form of varying currents in an electric circuit in order that they may be transmitted. Inside the human frame exactly the same sort of process occurs. Must we always transform to mechanical movements in order to proceed from one electrical phenomenon to another? (Bush, 1945: 8).
In the work of later designers Bush's ideas evolved. The machine would not only liberate the mind for higher things, it would augment it. Like a vacuum tube it might amplify the neuronal currents coursing through the brain. With the invention of the mouse - a simple 2D input device - the body entered cyberspace (Bardini, in press).
In the work of its inventor, Douglas Engelbart, we see the most explicit expression of the goal that VR has inherited, his project for the "augmentation of the human intellect".
By "augmenting the human intellect" we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems. Increased capability in this respect is taken to mean a mixture of the following: more-rapid comprehension, better comprehension, the possibility of gaining a useful degree of comprehension in a situation that previously was too complex, speedier solutions to problems that before seemed insoluble. Augmenting man's intellect ... can include ... extensions of means developed ... to help man apply his native sensory, mental, and motor capabilities - we consider the whole system of the human being and his augmentation means as proper fields of search for practical capabilities. (Engelbart, 1962: 1-2)
VR is now a major site where the "search for practical capabilities" attempts to apply our "native sensory, mental, and motor capabilities". Engelbart's project takes
place at the cusp of the 1960's, a decade known for the pursuit of human and social transformation including the use of chemical technologies for "mind amplification". These cultural themes of human transformation and perfectibility achieved further expression in the human potential movement of the 1970s and 1980s. By the 1990s human potential enthusiasts like Michael Murphy, co-founder of the Esalen Institute, were cataloging massive lists that purported to show "Evidence of Human Transformative Capacity" (Murphy, 1992). But this movement dwelled on the older technologies of eastern ascetic, religious, and medical practice. This cultural thread, very much alive in places like Silicon Valley, would come to rejoin virtual reality technology in the early days of its popularization. The mixture of these themes was welcomed and echoed in such cultural outposts as the magazines Mondo 2000 and Wired, The Well, and Cyberpunk culture. It is on the borders of this frontier that VR research rides out towards the forward edges in pursuit of intelligence augmentation. But the earlier notions that the machine would free the mind for "higher" things were sometimes born of a disdain for physical labor. This sentiment was tinged by a Cartesian distrust of the body and the evidence of the senses. But VR's research program embraces the body and the senses with Gibsonian notions (Gibson, 1979) of the integration of the moving body, the senses, and the mind. Its most ardent enthusiasts promise to augment the mind by fully immersing the body into cyberspace. VR promises to take the evolutionary time scale both backwards and forwards by immersing mind and body into a vivid 3D world, from the open savanna to fields of data space. VR promises to take the external storage system, which was born when the first human symbol was stored in sand or clay, and immerse each sensory channel into the semiotic fields of human communication activity.
Reflecting the interaction of technology and the body, Jude Milhon, an editor of Mondo 2000, proclaimed, "Our bodies are the last frontier" (Wolf, 1991). Standing on the edge of that frontier, we ask: Will the sensory immersion afforded by VR - this multisensory feedback loop between social mind and its creations - amplify, augment, and adapt the human intellect? Can such a vision guide a research program? How do VR designers conceptualize this outcome they pursue?

HOW IS INTELLIGENCE AUGMENTATION CONCEPTUALIZED? TWO PHASES: AMPLIFICATION AND ADAPTATION

Ideas about a VR-like machine that can augment intelligence have been advanced primarily by computer scientists and rarely by psychologists (e.g., Brooks, 1977; Bush, 1945; Licklider and Taylor, 1968; Heilig, 1955/1992; Krueger, 1991; Sutherland, 1968). The conceptualization of intelligence augmentation has sometimes been wanting: the technology was claimed to somehow assist thinking or augment human performance. How it will assist thinking is not always specified. The conceptualization has been, for the most part, sketchy - more a design goal than a psychological theory. But the incomplete conceptualization is partially compensated by its concrete operationalization in the actual designs. These designs embody theoretical postulates. These postulates and hypotheses are sometimes made more explicit in studies of the value of simulation and virtual reality technology for cognitive operations. Let's briefly
explore what intelligence augmentation might mean for media technology in general and for VR specifically.
Figure 2. The interaction of mind, medium, and environment can be seen in two phases: 1) amplification of the mind and body, and 2) adaptation of mind and body.

Most technologies, but especially communication media, interact with cognition in one of two ways. Figure 2 illustrates these two phases in the interaction of mind, medium, and environment: (a) amplification, tools that amplify the mind; (b) adaptation, mediated environments that alter the mind. This distinction not only captures two phases in the interaction of humans with technology, it also suggests two types of theoretical claims. When theorists say that a medium like virtual reality amplifies cognitive operations, it is implied that those operations are not fundamentally altered. The mind remains as it was before contact with the technology. When theorists argue that a medium alters mental processes, then a stronger claim is made: the mind has adapted in some way to the medium. Many theorists would argue that cognitive amplification tends to lead to cognitive adaptation. For example, this is what McLuhan meant by the "Narcissus effect" of media: we embrace some aspect of ourselves (our objectified mind) and become fixated and defined by this one facet of ourselves. A set of cognitive operations, a part of us, is selected, favored, and augmented. We are changed through the selective enhancement of cognitive skills.
Amplification

Claims that media amplify cognition fall into three general types: sensorimotor amplification, simulation of cognitive operations, and objectification of semantic structures.
Sensorimotor Extension

McLuhan (1964, 1966) popularized the notion that media "extend the senses". McLuhan was unknowingly continuing a long tradition in engineering philosophy that saw technology as organ extension (Mitcham, 1994). This position is now widely accepted. Media are seen as prosthetics - once attached they extend the body or mind. In what way might this augment intelligence? Human intelligence is provided with more sensory data and experience when the senses are extended over space (e.g., telephone, remote sensing), over time (e.g., photography), and beyond the bounds of normal sensation (e.g., infrared goggles). Before the arrival of advanced VR
telepresence systems, media extended only the visual and aural senses, for example, the way a remote control television extends our vision and hearing into another room. VR expands the possibility of sensorimotor extension. More senses are addressed with illusions of greater fidelity. But VR also integrates the actions of the body and the senses in a more "natural" way when it extends them. Many older technologies extend motor capabilities but provide poor feedback. For example, a back hoe extends the scooping action of the arm and hand, but provides little more than visual feedback. VR telepresence systems may both improve human performance and amplify human intelligence by closing gaps in the feedback loop between action and sensation. The user can explore distant real environments or purely virtual environments with more of the body.
Simulation of Cognitive Operations

To the degree that many technologies are extensions of the body, they simulate physical and mental processes. Mental processes require mental labor. If the labor is transferred to some electromechanical entity, then more brain capacity may be available for pattern perception, decision making, and creativity. This proposition has been the driving force behind the design of the computer since at least the days of Babbage: if mathematical processes can be simulated by gears, tubes, or silicon, these mental operations could be amplified in speed and complexity. In this way human intelligence might be freed and amplified. At the moment, designers clearly do not yet know how to best represent and simulate mental operations. It is one thing to conceptualize mental models (e.g., Johnson-Laird, 1984), it is another to build a tool that amplifies them. It is not yet clear how best to use the unique capabilities of VR technology to teach, assist, or augment cognitive skills. It is not clear how much of the existing research about media and development of cognitive skills applies (e.g., Salomon, 1979; Wetzel, Radtke, and Stern, 1994). At the moment designers are merely importing techniques that have been used to instruct individuals using pictures, film, and animation. The unique representational capabilities - the "language" of the medium - are only beginning to be explored (e.g., Meyer, 1995).
Objectification of Semantic Structures

Intelligence can be augmented by the objectification of a mental structure in some material form. The use of external memory storage systems is an evolutionary development that helped in the evolution of the mind (Donald, 1993). The objectification of semantic structures is the very essence of all semiotic systems (Eco, 1976): media and the codes they use allow users to record, store, exchange, and manipulate ideas. Various forms of computer technology are replacing older interfaces and storage media like the notepad, the drafting board, and the physical model. The objectification of semantic structures in a code or message reduces attentional and memory load while augmenting the performance of creative and decision making processes. Most computer systems allow users to easily manipulate thought objects by manipulating symbolic objects. The most common is the objectification of a semantic network in some medium: outlines, diagrams, lists, etc. During decision making, concepts can be scanned. They can be made contiguous or linked in some way:
hierarchical modeling, causal modeling, etc. There is evidence that the spatialization of thought, the objectification of symbolic tokens in a spatial structure, augments human intellectual performance. The work on data visualization is based on the notion that human performance can be enhanced if abstract information is spatialized. It is proposed that human intelligence can detect patterns in abstract relations by using the ability of the senses to detect patterns (invariances) in the visual field. VR designs promise to extend this to all of the senses.
Adaptation

Intelligence amplification involves the augmentation of human intellect without any significant change in intelligence, i.e., changes in cognitive processes or structures. A crane or back hoe may amplify the power of the human arm, but it does not alter the arm in any way. The concept of adaptation suggests that the amplification of human intelligence through a medium may alter cognitive processes and structures. The mind adapts in function or structure to the medium. When humans and technology come in contact, we can observe both short and long term human adaptation. Broadly speaking, adaptations following the use of a technology can be psychological, behavioral, or physiological. Look down towards the floor and take a look at a simple technology like the shoe. Many of us don't think of the shoe as a technology, but it is an old technology we take for granted. Mentally compare your foot to that of a shoeless Kalahari desert Bushman. OK. Think about the shape of that foot. Any urban dweller can observe that long term use of the shoe may create a structural adaptation in the shape of the human foot (e.g., the toes curl inward and push against each other) and texture of the sole (e.g., a less callused and softer sole). This is a simple, easily observable physiological adaptation of the morphology of the body brought on by the extended use of a technology. Now let's consider the idea of cognitive adaptation to VR systems. Adaptation of cognitive processes might emerge from either long term or short term use of a medium. Because VR is a new technology, most of our experience is with short term adaptations. But the issue of adaptation is already a central problem in VR design. For example, some users experience simulation sickness (Biocca, 1992) when using VR systems. Simulation sickness appears to be related to motion sickness.
To some degree, simulation sickness is caused by the inability of the brain to reconcile and adapt to discordant spatial cues impinging on the senses immersed in the VR systems (i.e., vision) and cues from the physical environment (e.g., proprioception). The body's response to this intersensory conflict is simulation sickness. VR systems are imperfect. Designers assume that the user's perceptual and proprioceptive systems will adapt to the medium. A study of adaptation to an augmented reality system showed that the perceptual-motor system does rapidly adapt to the sensory alterations of a VR system (Biocca and Rolland, 1995). Subjects' hand-eye coordination was significantly adapted as a result of a virtual displacement in felt eye position. Once users removed the VR equipment, their hand-eye coordination remained adapted to the VR environment. They made significant pointing and reaching errors. They had to learn to readapt to the natural environment. Note that none of this evidence of adaptation shows any augmentation in human cognitive performance. These adaptations or failures to adapt are all decrements in human performance. This is not to say that VR will not lead to adaptations that augment cognitive processes and
structures. For example, long term use of VR may augment spatial cognition. But there is little evidence of this yet, though we can observe improvements in human performance. The interesting question as to whether long term use of the medium can augment human performance through adaptation remains unanswered.

KEY DESIGN HYPOTHESES LINKED TO THE GOAL OF INTELLIGENCE AUGMENTATION

The design of VR is motivated by a set of design postulates and hypotheses that are psychological in nature. A VR designer at Autodesk and the University of Washington's Human-Interface Technology Lab (HITL), William Bricken, captured the essence of VR design when he pithily pronounced: "Psychology is the physics of virtual reality" (quoted in Woolley, 1992: 21). Virtual worlds are constructs of the senses. The psychological reality of VR is what matters in the final analysis. Therefore, many design principles are based on implicit or explicit psychological postulates and hypotheses. Many of these pertain to the design goal of intelligence augmentation. I would like to briefly discuss the key ones that appear to drive the design of VR. They are often advanced as postulates, but I will treat them as hypotheses. Each suggests references to a number of psychological theories. I will not refer to these here, but rather present each hypothesis as it is used by VR designers.

The Bandwidth Hypothesis: VR can increase the volume of information absorbed by a human being.

If media are information highways, then designers see VR as a potential superhighway to the mind. The goal is the feeling of presence (Sheridan, 1992). The senses are the delivery vehicle. VR designers try to deliver enough veridical information to the senses so that a coherent, stable, and compelling reality emerges inside the mind of the user. As Warren Robinett, master VR designer at NASA and the University of North Carolina, said of his goal, "I want to use computers to expand human perception" (Rheingold, 1991: 25).
On the engineering side this manifests itself as four design goals: 1) increase the number of sensory channels addressed by VR; 2) increase the sensory fidelity and vividness within each sensory channel; 3) increase the number of motor and physiological input channels; 4) link and coordinate the motor outflows (e.g., walking, head turning) to sensory inflows (e.g., visual flow) so that they match or even exceed those found in the natural environment. In simulator systems (e.g., driving and flight simulators) the bandwidth hypothesis is straightforward. The goal is "fidelity". The design attempts to precisely match all the relevant sensory characteristics of the real world task environment: "(1) the physical characteristics, for example, visual, spatial, kinesthetic, etc.; and (2) the functional characteristics, for example, the informational, and stimulus and response options of the training situation" (Hays and Singer, 1989: 3). The user learns a set of perceptual discrimination and motor tasks by doing them. In an imperfect system, when absolute fidelity is not possible, the problem becomes determining what are the most "relevant", task-related cues.
But the argument for increasing sensory bandwidth goes beyond the goal of replicating natural environments. One also finds an implicit or explicit argument that suggests the greater the number of sensory channels and the greater the sensory information, the better the learning. Various versions of this proposition have proponents in the VR design community. For example, master VR designer Fred Brooks asserts, "we can build yet more powerful tools by using more senses" (Brooks, 1977). Even as early as 1965, Sutherland argued that the computer "should serve as many senses as possible" (1965: 507). The bandwidth hypothesis is a seductive idea. It has accompanied many proposals for augmenting human intelligence through computer interfaces. For example, the influential work of master designer Alan Kay contained a version of the bandwidth argument when he outlined a design for an all purpose learning machine he called the "dynabook... a dynamic medium for creative thought" (Kay and Goldberg, 1977). Researchers have tended to emphasize the portability of the dynabook, but more important was the notion that the dynabook was to be a "'metamedium' (that) is active". In its interactivity the metamedium was to "outrace your senses...(and) could both take in and give out information in quantities approaching that of the human sensory systems" (Kay and Goldberg, 1977: 32). Intelligence augmentation was one of the goals of this device. Kay hoped to help the user "materialize thoughts and, through feedback, to augment the actual paths the thinking follows" (Kay and Goldberg, 1977: 31). Kay and Goldberg summarized a design prejudice that is now widely shared by the VR community, "If the 'medium is the message' then the message of low-bandwidth is 'blah'" (1977: 33).

The Sensory Transportation Hypothesis: VR can better transport the senses across space, time, or scale.
Media historian Harold Innis (1951) was among the first to focus on the role of communication media in the manipulation of space and time. VR technology advances this function of communication media. But with VR, the manipulation, construction, and reconstruction of space is central to the use of the medium. It is clearly central in the construction of virtual space, that 3D illusion that beguiles the sensorimotor channels of the user. But manipulation of space has another important role in VR technology. Some dimensions of the technology emerged from the research program in telerobotics. The central goal of the program of telerobotics and telepresence is not the construction of cyberspace, but the collapse of physical space. The collapse of space is built on the electronic transportation of the senses across space. In his greetings at the first IEEE Virtual Reality Annual International Symposium (VRAIS), Tom Furness, Air Force VR pioneer and a leading VR engineering researcher, proclaimed that "advanced interfaces will provide an incredible new mobility for the human race. We are building transportation systems for the senses ... the remarkable promise that we can be in another place or space without moving our bodies into that space" (1993: i). At the distant frontiers of VR's transportation mission lies an agency whose sole mission is the collapse of space. NASA is developing virtual reality as a means of transmitting the experience of being telepresent on distant planets (McGreevy, 1993). At the other end of the spatial scale are VR systems squeezing the human senses down into the space that surrounds atoms. Work at the University of North Carolina
(Robinett, 1993) ties the virtual reality interface to the end of a scanning-tunneling microscope. Atoms become mounds on what looks like a beach of pink sand. Atoms can be "touched" and even moved; the pink sand reshapes itself and new mounds appear. Both of these examples are different forms of one way to augment human intelligence: the extension of sensorimotor systems.

The Expanded "Cone of Experience" Hypothesis: Users will simulate and absorb a wider range of experience.

There is a materialist streak in the VR community: learning is seen as the direct outcome of experience. It is reasoned that more experience leads to more learning. But the argument is slightly more complex. Harking back to Dewey and Gibson (1979), there is an implicit proposition that 3D, sensory, and interactive experience is at the core of learning invariants and patterns in the environment. The promise of VR brings out another function of media: the simulation and modeling of the world of experience. This function of media is as old as the theater and role playing. Media, such as VR, can be characterized as expanding the "cone of experience". The human mind can vicariously experience a wide range of situations. The range of experiences and the diversity of models of problem solving and action have been augmented by communication using existing media. VR promises to expand the capability of media by making the expanded cone of experience a little less vicarious. Unlike books, the user need not use as much imagination to fill in the mental simulation. VR designers try to directly engage the automatic, bottom-up perceptual processes to deliver an intense simulation of an experience. This is the essence of the goal of delivering experience that gives users "a sense of presence". VR proselytizer and artist, Jaron Lanier, was fond of suggesting that the goal of VR is the construction of a personal "reality engine", an all purpose simulation device (Lanier and Biocca, 1992).
This is far beyond what the technology can do, but developments far short of this goal may have effects on the amplification of human intelligence. The property of VR, alluded to by Lanier and embodied in this hypothesis, involves two aspects of intelligence augmentation: the attempt to simulate cognitive operations and the expanded experience of objectified semantic structures - exposure to predigested cultural understandings. As Jaron Lanier has observed, "Information is alienated experience" (Rheingold, 1991).

The Sensification of Information Hypothesis: Relationships in abstract information are better perceived and learned when mapped to sensory/spatial/experiential forms.

Sensification is a generalization of the concept behind the terms "visualization" and "sonification". It means the creation of representations that use the information processing properties of the sensory channels to represent scientific data and other abstract relationships. Work arguing for the value of sensification for intelligence augmentation often has a neo-Gibsonian (1979) cast. It is argued that over thousands of years of evolution, the mind and the body have evolved to move, think, and act in a 3D environment. Because of the limitations in our symbolic systems and representational technologies, our means of communication have not been able - until now - to fully harness the rich multisensory, spatial, and kinematic components of human thought and problem solving. VR, more than any other medium, comes close to providing an environment that has all the sensory characteristics of the physical world
in which our brain has evolved, while retaining the responsiveness and flexibility of abstract semiotic systems like language and mathematics. In some VR systems scientists sail through 3D scatter plots, chemists pick up 3D models of molecules with their hands to think up new pharmaceuticals, and stock market patterns are perceived through a cavelike corridor of undulating curves and changing sounds. The goal is to take the pattern detection capabilities of the senses, the spatial modeling capabilities of the eyes, ears, and muscles, to perceive, model, and manipulate ideas. The work on scientific visualization suggests the possibility for increased ability to detect patterns in data, faster problem solving, and more creative ideas. These are some of the cognitive outcomes Engelbart (1962) sought from his project to augment human intelligence. In essence, it is argued that advanced sensory displays can augment human intelligence by involving the senses more directly in the perception and manipulation of iconic entities.

Amplification of Interpersonal Communication Hypothesis: Humans will be able to express and receive a broader range of human emotion, intention, and ideation.

All the propositions so far have emphasized the augmentation of what Howard Gardner (Gardner, 1977) would call logico-mathematical and spatial intelligence. Until recently, most VR systems have involved a single operator moving in a socially barren environment. Those social VR environments that existed, for the most part, have been designed for the military. The primary interpersonal interaction is search and destroy - more the augmentation of interpersonal annihilation than the augmentation of interpersonal communication. As VR matures and multiple users can be represented in VR environments, more researchers are considering the use of VR to amplify interpersonal communication (e.g., Biocca and Levy, 1995; Palmer, 1995).
Part of the early mission of intelligence augmentation through computer design was the creation of a "more effective" means of interpersonal communication (Licklider and Taylor, 1968). Most existing media like the telephone and email transmit only reduced personal presence. The primary goal in this area has been telepresence, the attempt to reproduce most of the cues found in interpersonal communication (e.g., Morishima and Harashima, 1993). This goal, if achieved, would do nothing more than reproduce any common face-to-face interaction. This is no small achievement. It involves the transportation of the sensorimotor channels. But it is hard to see how simply recreating an everyday interpersonal interaction could augment human intelligence.
Some writers have speculated about the design of hyperpersonal or hypersocial VR environments. In these environments VR tools would amplify interpersonal interaction cues such as facial expression, body language, and mood cues. For example, Jaron Lanier (Lanier and Biocca, 1992) has speculated about how VR environments could be designed to alter body morphology to signal mood. Biocca and Levy (1995) have discussed expanding the sensory spectra of users by mapping physiological responses such as brain waves, heart rate and blood pressure to properties of the environment such as room color to signal mood and cognitive states. There have been very few experiments in this area. It is not at all clear in what direction such tools would influence interpersonal communication or the augmentation of human intelligence.
INTELLIGENCE AUGMENTATION: CAN A VISION BECOME A "SENSIBLE" RESEARCH PROGRAM?

The overall goal of augmenting the human intellect is a highly motivating vision of the possible utility of the cognitive technology. It has also become a research program. The ideas listed above motivate design and research work in the area of VR. Researchers in VR labs around the world explicitly or implicitly subscribe to one or more of them. Each hypothesis (design postulate) mentioned above is as much vision as it is scientific hypothesis. In some ways the very nature of these "hypotheses" indicates a difference between the design sciences and the natural sciences. The "hypotheses" are not just about the "discovery" of scientific laws. They are teleological in spirit (Biocca, Kim, and Levy, 1995). They reflect human goals, the desire to exercise human will in the construction of an artifact - the very creation of virtual and cognitive reality. Are these goals attainable? I leave the response to another paper or to another 50 years of research. We might ask a more modest question: are these hypotheses sensible? Can they be founded on any valid evaluation of the technology or of the plasticity and abilities of the human mind? After all, we hardly know what "intelligence" is; how can we hope to "augment" it? Each "hypothesis" will certainly require more profound theoretical elaboration as both research and design move forward. As an example, let's consider one set of ideas that would require more theoretical elaboration as they are transformed from visionary proclamation to a concrete theory of human-computer interaction. A number of the hypotheses share a common assumption that simply increasing the sensory fidelity or vividness of information will improve human performance. This is partially due to the logic of simulator design (e.g., Hays and Singer, 1989; Rolfe and Staples, 1986). It is assumed that the closer the simulator is to the "real" thing, the better the training.
When one thinks of plane, tank, or car simulators, this seems to have face validity. If someone is trying to learn motor sequences, it makes sense that practicing the actual sequences would be better than reading about them and imagining the motor sequences. But it does not follow that the sensory fidelity or vividness of VR systems would generalize to an overall improvement in human performance. Research on the value of sensory fidelity using previous media like pictures, film, and video has produced inconsistent results. For example, there is little support for the notion that more vivid messages are more memorable or persuasive (Taylor and Fiske, 1988; Taylor and Thompson, 1982). It also appears that sensory vividness interacts with individual differences. For example, the sensory vividness of training materials interacts with the ability of students. In one experiment using pictures and videos, increased sensory fidelity assisted students of low ability but provided no assistance to those of higher ability (Parkhurst and Dwyer, 1983). Existing research on instructional training and simulator design is not uniformly supportive of the idea that increased sensory fidelity improves learning or performance (Alessi, 1988; Hays and Singer, 1986; Wetzel, Radtke, and Stern, 1994). One also has to ask a more basic question: Is any increase in sensory fidelity necessarily valuable? Increasing sensory fidelity provides more information, but not all the information is relevant to the user's communication goals or tasks. In some cases, the best way to use media to train someone involves reducing the amount of
information. For example, we often use maps or schematics of objects - like engines or human internal organs - rather than pictures. The reduced information of the schematic helps the user to detect the relevant information such as the location of various components. Learning a skill (e.g., a doctor's reading of chest X-rays) sometimes involves acquiring the ability to pick out relevant information from a field of noise and irrelevant data. Interfaces may reduce or alter the sensory fidelity of the image to selectively highlight the relevant cues. But assessing the design value of some dimension of sensory fidelity is not always clear or obvious. We don't always know how the mind uses various sensory cues. Consider the following design decision: Should designers of a driving simulator simulate ambient "street and engine noise"? Will street and road noise increase or decrease the performance of a novice driver? Increasing the sensory fidelity of steering wheel dynamics is clearly more important than increasing the fidelity of street and engine noise. But a number of cognitive issues might be involved in a decision about street noise. For example, there is the question of the user's attentional capacity: a novice driver is already bombarded with more information than he or she can handle. There is a question of information relevance: street noise might be just that, noise. It might carry little informational value. On the other hand, the changing acoustics of the tires on the road or wind noise as the car turns might provide some unconscious information about the automobile's velocity or attitude. For example, there is ample evidence that car drivers use the sound of their car to detect changes in its performance. So even when assessing the value of a detail like auditory simulation of street and road noise, its value for human performance is not clear.
While there is some valuable research (e.g., Gibson, 1966; 1979), we still know too little about how humans use sensory cues to assemble cognitive models of environments. But my brief discussion of the issue of sensory fidelity still has not addressed the larger question of intelligence augmentation: Can a medium's level of sensory fidelity ever increase human intelligence? Take my example of the car simulator above. What if we had the perfect car simulator, one that would reproduce every sensory detail of car driving: the feel of the steering wheel; the 3D visual world rolling past the windshield; the rattle of the doors and the shoosh of the wind rolling over the car body; the smell of the plastic car interior, etc. At its best, such a simulator would do nothing more than simulate what you probably experience every day: driving a car. Would this augment human intelligence? The fellowship of car drivers stuck in traffic jams all over the world would certainly shout, "No!"

Before we rush to judgement that something like sensory fidelity has little to do with augmenting human intelligence, we should remember one thing. Virtual reality is not really about reproducing reality. So my car simulator example leaves out a large segment of virtual environments. Simulation does not always mean reproduction. In fact, few media try to reproduce reality; rather they select and amplify certain parts of human experience. Consider the last movie you saw. Was it "realistic"? Sure, the stroboscopic illusion of visual motion flowing on the screen had a certain level of sensory fidelity. But that visual sensory realism was attached to a camera. Through camera movements and zooms, your "augmented" vision travelled through space. It sometimes occupied positions in space you rarely occupy. Some moments you saw the scene through the eyes of one character, then, suddenly, through the eyes of another. Is this movement from one human identity to another realistic? Through editing, your
"augmented" vision jumped around unrealistically through space from one scene to another, from one place in time to another. Is this realistic? In fact the whole format of the movie medium selected, abbreviated, and amplified all manner of human experience. The experiences of travel, love, death, and anger were all condensed and funneled through the medium. The medium may have simulated how we think rather than simulated reality. Do such codes and media augment intelligence? At some point in our history, they probably did (Donald, 1993). Can the further augmentation of human experience and training possible - or at least thinkable - in some advanced VR system augment human intelligence? Maybe. But we will have to better understand the psychology of communication and the way to encode and deliver information. Through this we might achieve the goal of intelligence augmentation. We might be able to support more of the mind's cognitive models so that human information processing can be increased in ability, complexity, and capacity. The work on human creativity and problem solving suggests that a medium for augmenting human intelligence will be based more on our understanding of how we use sensory information and imagery to encode, think, and problem solve (e.g., John-Steiner, 1985) than on simply increasing the power of a graphics supercomputer. But the illusions of the graphics supercomputer may give us a means to explore how we encode, think and problem solve.

A CONCLUDING NOTE

The world-wide effort to rapidly develop virtual reality is motivated by a desire to augment human intelligence. Ideas related to intelligence augmentation have also permeated the culture. In the United States this desire is wrapped up in long-standing cultural beliefs about technology and human perfectibility (e.g., Marx, 1964). In this article I have also tried to show how the design hypotheses propelling VR technology are part of a fifty-year effort to augment intelligence.
In the vision of Vannevar Bush and his intellectual progeny, the computer would lead to unique cognitive technologies, cognitive environments that might free the human mind by enhancing its operation. What is clear at this point is that research in the design of virtual reality systems will attempt to push the envelope of human intelligence by creating new tools to amplify, augment, and adapt cognitive processes. It is not yet clear if this faith in the ultimate cognitive value of VR is justified or misplaced.

REFERENCES
Alessi, S. M., 1988. Fidelity in the design of instructional simulations. Journal of Computer-based Instructions 9: 335-348.
Bardini, T., and A. T. Horvath, in press. The social construction of the computer user: The rise and fall of the reflexive user. Journal of Communication 45(2).
Biocca, F., 1993. Will simulation sickness slow down the diffusion of virtual environment technology? Presence 1(3): 334-343.
Biocca, F., and J. Rolland, 1995. Virtual Eyes Can Rearrange Your Body: Perceptual adaptation to visual displacement in Augmented Reality Systems. Submitted to Presence.
Biocca, F., T. Kim, and M. Levy, 1995. The vision of virtual reality. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 3-14. Hillsdale, NJ: Lawrence Erlbaum.
Biocca, F., and B. Delaney, 1995. Immersive virtual reality. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 57-126. Hillsdale, NJ: Lawrence Erlbaum.
Brooks, F., 1977. The computer scientist as toolsmith: Studies in interactive computer graphics. In: B. Gilchrist, ed., Information processing 77, 625-634. Amsterdam: North Holland.
Brooks, F., 1988. Grasping reality through illusion: Interactive graphics serving science (Report TR88-007). Chapel Hill: Dept. of Computer Science, University of North Carolina at Chapel Hill.
Bush, V., 1945, July. As we may think. The Atlantic Monthly, 101-108.
Donald, M., 1993. The origins of the modern mind. New York: Cambridge University Press.
Eco, U., 1976. A Theory of Semiotics. Bloomington: Indiana University Press.
Engelbart, D., 1962, October. Augmenting human intellect: A conceptual framework. [Summary report, contract AF 49(638)-1024], 187-232. Stanford: Stanford Research Institute.
Foley, J. D., A. Van Dam, A. Feiner, and J. F. Hughes, 1994. Computer graphics: Principles and practice. Reading, MA: Addison-Wesley.
Furness, T. A., 1988. Harnessing virtual space. Society for Information Display Digest 16: 4-7.
Furness, T., 1989. Creating better virtual worlds (Rpt. M-89-3). Seattle: HITL, University of Washington.
Furness, T., 1993. Greetings from the general chairman. Proceeding of the IEEE Virtual reality annual international symposium, i-ii. Piscataway, NJ: IEEE.
Gardner, H., 1977. Frame of mind. Boston: Harvard University Press.
Gibson, J. J., 1966. The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J. J., 1979. The ecological approach to visual perception. Boston: Houghton Mifflin.
Hays, T., and M. Singer, 1989. Simulator Fidelity. Boston: Houghton Mifflin.
Heilig, M., 1992. El cine del futuro: The cinema of the future. Presence 1(3): 279-294. (originally published in 1955)
John-Steiner, V., 1985. Notebooks of the mind: Explorations of thinking. Albuquerque: University of New Mexico Press.
Kramer, G., 1995. Sound and communication in virtual reality. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 259-276. Hillsdale, NJ: Lawrence Erlbaum.
Krueger, M., 1991. Artificial reality. New York: Addison-Wesley.
Lanier, J., and F. Biocca, 1992. An inside view of the future of virtual reality. Journal of Communication 42(2): 150-172.
Licklider, J. C. R., and R. W. Taylor, 1968, April. The computer as a communication device. Science and technology 17: 21-31.
Marx, L., 1964. The machine in the garden: Technology and the pastoral ideal in America. New York: Oxford University Press.
McLuhan, M., 1966. Understanding media. New York: Signet.
McLuhan, M., and E. McLuhan, 1988. Laws of media: The new science. Toronto: University of Toronto Press.
Meyer, K., 1995. Design of synthetic narratives and actors. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 219-258. Hillsdale, NJ: Lawrence Erlbaum.
Morishima, S., and H. Harashima, 1993. Facial expression synthesis based on natural voice for virtual face-to-face communication with machine. In Proceedings of the 1993 IEEE Virtual reality international symposium, 486-491. Seattle: IEEE.
Murphy, M., 1992. The future of the body: Explorations into the further evolution of human nature. Los Angeles: Jeremy Tarcher.
Parkhurst, P. E., and F. M. Dwyer, 1983. An experimental assessment of students' IQ level and their ability to profit from visualized instruction. Journal of Instructional Psychology 10: 9-10.
Rheingold, H., 1991. Virtual reality. New York: Summit Books.
Robinett, W., 1991, Fall. Electronic expansion of human perception. Whole Earth Review 17: 16-21.
Rolfe, J., and K. Staples, 1986. Flight simulation. Cambridge: Cambridge University Press.
Rolland, J., F. Biocca, R. Kancherla, and T. Barlow, 1995. Quantification of perceptual adaptation to visual displacement in head-mounted displays. Proceedings of the IEEE Virtual reality annual international symposium, 56-66. Piscataway, NJ: IEEE.
Salomon, G., 1979. Interaction of media, cognition, and learning. San Francisco: Jossey-Bass.
Shapiro, M., and D. MacDonald, 1995. I'm not a real doctor, but I play one in virtual reality: Implications of virtual reality for judgments about reality. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 323-346. Hillsdale, NJ: Lawrence Erlbaum.
Sheridan, T., 1992. Musings on telepresence and virtual presence. Presence 1(1): 120-126.
Sutherland, I., 1965. The ultimate display. Proceedings of the IFIPS Congress, 2: 757-764.
Taylor, S. E., and S. C. Thompson, 1982. Stalking the elusive "vividness" effect. Psychological Review 96: 569-575.
Wetzel, C. D., P. H. Radtke, and H. W. Stern, 1994. Instructional effectiveness of video media. Hillsdale, NJ: Lawrence Erlbaum Associates.
Winograd, T., and F. Flores, 1987. Understanding computers and cognition. Reading, MA: Addison-Wesley Publishing.
Wooley, B., 1991. Virtual worlds. Oxford: Blackwell.
Chapter 4
PATIENCE AND CONTROL: THE IMPORTANCE OF MAINTAINING THE LINK BETWEEN PRODUCERS AND USERS
David A. Good
Department of Social and Political Sciences
University of Cambridge, UK
INTRODUCTION

An important feature of various new information and communication technologies is the power they place in the hands of the user to choose between various activities and modes of operation as that user sees fit. This degree of control can range from the simple to the complex. The television remote control allows the supine viewer to easily browse a large number of channels as passing whims dictate. A similar remote control can guide an imaginary walk down a virtual mall, in which real interactive shopping can be done. A student working with a complex hypertext system can move between all sorts of material - graphics, text, sound - in a knowledge base seemingly without constraint as circumstances and desires dictate. At face value, this flexibility and user control can seem to be a highly desirable property. It certainly fits the current ideological climate where the market rules, and the consumer is supposedly sovereign over his or her choices. More importantly, it places power in the hands of the user, and who else would know best about that user's needs, and how to achieve user-centredness? Thus, user control and user-centredness would seem to accomplish a central part of the Cognitive Technology Agenda (CTA henceforth). As Mey (1992) notes, we should be seeking systems which avoid 'forced adaptivity' and display 'adaptability', and what could display this more than a system which adapts to the user, moment by moment, as that user expresses his or her needs and follows his or her desires? Anyone who has seen a supine channel-hopper, witlessly cruising through endless TV channels as each moment's boredom creates a demand for something new, might instinctively feel that the answer to this question is not a foregone conclusion. It would not require a particularly puritanical frame of mind to think that there is something vaguely distasteful or even immoral about systems which allow uncontrolled self-indulgence.
Instinctive responses to technological innovations must always be treated with caution, especially with the creation of devices which are intelligent. It is very easy to summon an image of Frankenstein's monster, or a Luddite fear, but instincts are not always completely wrong. Indeed, as will be argued in this paper, there are grounds for
believing that in many areas such extremes of responsiveness to user demand are not an unqualified benefit. There may, in fact, be more than a grain of sense in this instinctive moral reaction if we fail to distinguish between systems which are user-centred and user-indulgent. This is a distinction which in times gone by would have seemed quite pointless, as the only way in which an intelligent system could lead to self-indulgence would be if the user liked working hard. However, now we need to distinguish between systems in this way because user-indulgence can effectively destroy communication, and thus be quite harmful for individuals, and the societies in which they live. This concern is implicit within CTA, but it needs elaboration and development. An important component of CTA is a concern with how new communication technologies, and intelligent devices which can be communicative agents in their own right, can change individuals and the societies in which they live (Gorayska and Mey, 1995) [1]. This change might be effected in a variety of ways. There might be direct cognitive consequences resulting from everyday experience with such devices, either for work or leisure. Alternatively, the model of intelligence and interaction which they proffer might be taken as a metaphor through which self could understand self and other [2], [3]. It is in being concerned with these threats that CTA has an interesting and distinctive moral tone, not to be found elsewhere in the literature on developing these new technologies, but fairly common in other literatures on their social and political impact (see, for example, Cherry, 1985; Dizard, 1989; Murdock and Golding, 1985; Salvaggio, 1989). By comparison to that social and political literature, this moral focus is relatively ill-formed, and some might dismiss it for proposing a well-meaning, but naive and poorly considered liberal sentiment of user-centredness.
It could also be construed, and similarly dismissed, as a dramatic and overly fearful reaction to these technologies. To dismiss the moral agenda it proposes in this way would, however, be to ignore the fact that it is rooted in a specific view of human psychology. That view of human psychology lends the agenda a validity which makes it much harder to dismiss, and also gives it a sharper focus. It is a view which gives a central role to the experience of social interaction in the construction and maintenance of human mentality.

THE PRIMACY OF CONVERSATION

In many, if not all respects, face to face conversation is the basic model from which all other forms of human communication ultimately derive. For the young child, it is the medium through which he or she develops his or her understanding of language, its use, society, and the intelligence which pervades that society. Until not so long ago, it was the form which dominated human communication, and, although its central role has recently been challenged by the advent of various communicative technologies from the invention of writing onwards, it is still the environment in which humans typically operate. It is also the one they prefer because it enables them to pursue a full range of personal goals (Rutter, 1984). Recent work on human evolution has even argued that it is in the individual's social life, and the demands which it produces, that we may find the real evolutionary pressures which led to primate intelligence in general, and human intelligence in particular (Byrne and Whiten, 1988; Good, 1995a; J. Goody, 1995). Indeed, many argue that social interaction and conversation are a necessary condition for human life, human intelligence and human society.
Unsurprisingly, however, there is much debate about what other factors are significant, and how their significance varies with respect to different cognitive domains. CTA reflects this primacy and argues for its continuing importance, but understanding how it should affect system and interface design, for example, is not a simple matter. The experience of building various intelligent interactive machines has encouraged and, perhaps, even necessitated that we view these devices as isolated entities whose connection to the rest of the world is very much a secondary consideration without real consequence for their essential character. Initially, the scope and manner of their activity was so limited that the nature of their connection to other intelligent devices, be they natural or artificial, did not seem to carry any implications for the structure of the systems themselves, nor any consequence for those who used them. To all intents and purposes they could be considered as stand-alone devices with their own cognitive properties, in so far as they had any, and user skill, flexibility and adaptivity was sufficient to ensure the link to the user. Having acknowledged this, though, it is important to recognise that the use of any object [4] can be seen as part of a dialogue. A dialogue, that is, in the sense that an object is created with a purpose in mind, and that an understanding of the designer's purpose informs our understanding and use of it. As a consequence we can understand successful design as successful dialogue, no matter how limited it is, and we can also expect that, in the same way that dialogue carries consequences for the participants, so too will the design process and the products which result from it. 
In the simplest of cases, seeing use in this way adds little to our understanding of any object or its use, but as the complexity of manufactured objects grows, so does the importance of understanding the intention of the designer or creator, and the sense of a dialogue grows. There are many simple examples which illustrate the idea, and it is a point which can be seen as underlying the early work of Duncker, Maier and others on phenomena such as 'functional fixedness' (Duncker, 1945; Maier, 1931). This expression describes the way in which experimental subjects faced with a problem find it extremely difficult to see an object as being used for anything other than that purpose for which it was designed. For example, they often failed to recognise that they could use a hand tool, such as a spanner, as a pendulum bob to solve Maier's two string problem. Seeing an object and its use as part of a dialogue becomes more important, but much more difficult, as the complexity of the object grows, particularly in the case of communications, information and computer technologies. The increased difficulty lies in the fact that it becomes progressively less clear who is in dialogue with the user, particularly when the ambition behind the design is to provide more effective communication between human actors. With the simplest communications technologies such as, for example, the telephone, this is hardly a problem since the medium is seemingly transparent and the moment by moment intentions of the human users suborns any concern with an understanding of the intended use of the system. With more complex systems the situation is complicated by the apparent intelligence of the device itself. While an on-line encyclopedia is just another way of communicating information from those who know to those who do not, the way in which it responds to requests for information from a user can lead to a sense of the machine being the partner in the exchange. 
Thus, in an important way the user is in the position of communicating both through and with a system. If we are to understand the
consequences for individual users of different designs, and the extent to which user choice as an expression of user-centredness is desirable, then we need to consider in more detail how communication and information technologies transform the ideal speaker-hearer relationship. This can be interestingly done if we first consider the impact of the oldest and best studied communication and information technology, writing.

INTERACTION TRANSFORMED BY TEXT

Writing is, of course, not exactly what CTA is aimed at, but the way in which it transforms interaction has much in common with the ways in which other technological developments transform interaction. Central to these changes are the following. The speaker and hearer who become writer and reader are displaced in space and time from one another. The channel through which they communicate carries less information. The signal sent loses its ephemeral quality, but, because it endures, it provides an important form of information storage that is independent of the vagaries of human memory. The consequences of these changes, and this dialogue of a different kind, are said to be many. Some might be seen to be relatively beneficial. Other forms of discursive structure are developed, both for the literate and non-literate members alike of literate societies; more time is available for reflection in the production process for the speaker/writer; and the written page becomes a prosthetic device for the mind, both in the moment and the longer term. Other consequences might not be thought to be so benign. Spontaneity is lost; the communication is impoverished in terms of its social and emotional content; and the precision of the written page can exert its own form of pedantic tyranny as the prospects for negotiating meaning are reduced (E. Goody, 1986; J. Goody, 1990; Illich and Sanders, 1988).
All these consequences are important, but at their heart is the fact that the very nature of the relationship between the two sides to the dialogue is changed, and thus so are the participants. Any spoken dialogue offers two principal roles for the participants [5]. On the one hand, those who speak need to compose something intelligible which can be interpreted by those to whom it is addressed, and failings of the composition are addressed there and then. On the other, those who are addressed can play an important role in revealing the success of the utterance which the speaker has produced. They are also required to pay attention, and be a competent patient listener who is engaged in the speaker's project. From this simple fact of co-presence, and the system constraints of the participants considered both individually and together, a number of properties flow which provide all parties with resources, but also impose constraints on their actions. In written communication, however, neither need pay the same kind of real-time, moment by moment attention to the other, and there is no compulsion to orient to a collaborative enterprise in the same way. In other words, while each can be more self-centred, this is especially the case for the reader. Unless the reader has some independent motivation for persevering with the reading of the text, he or she can play with it as he or she wishes, or even completely disregard it. In face-to-face conversation, behaving in this way would be impossible if one wished to maintain any kind of relationship with the speaker. In brief, the other-centred listener can become the self-centred reader.
The reader's independent motivations for attending to a text might, however, be many, and could include all manner of extra-textual factors; yet it should not be forgotten that the text itself contributes to that motivation. Apart from the widespread conventions on how one reads, the structure of a text, its permanence and its scale give the writer resources for engaging and controlling the reader. The structure of the text is one way in which the author's presence is maintained in the dialogue. The reader also maintains a conception of authorship, and this too can be a constraint as it evokes a notion, no matter how limited, of a relationship. No reader believes that a text created itself. Thus, by convention and by virtue of the text itself, a link is maintained between the writer and the reader. This itself can counter-balance potential egocentricity, and when it does, the intellectual demands of the task of reading contribute something more to an individual's mentality. Thus, in the case of writing, a gap may be opened between the speaker/writer and reader/hearer allowing a degree of egocentricity to emerge, and this is especially so for the reader/hearer. Nevertheless, other demands and resources enter the picture to close this gap by providing the speaker/writer with a degree of control and authority. The demands of literacy, in turn, provide the reader with additional cognitive benefits.

INTERACTION TRANSFORMED BY HYPERTEXT

If we now turn to the other end of the technological spectrum and examine, for example, a powerful multi-media hypertext system, in the light of the considerations which have been just applied to writing, it is easy to see that the potential for destroying the link between the archetypal speaker and hearer is much greater. The same factors to do with displacement in space and time apply, and two more potent elements come into play.
Both reduce the possibility of authorial control; one which confuses any understanding of the nature of authorship, and one which reduces or eliminates text structure. First, the very nature of the material, its quality, its variety, its dynamic, and its seeming intelligence, elevate the system to the position of interlocutor, but not as interlocutor. This is a new conversational role which completes the separation of speaker from hearer, and, since the occupant of this role, the system, has no rights or standing, the requirement for respect for and attention to the other disappears, and an egocentric mentality, on the part of the user, is permitted, and, perhaps, encouraged.
Second, as systems of this type become more powerful and flexible, the choices available to the user at any point rapidly multiply so that the number of different routes through the material seems to be almost without limit. This entails that the author cannot assume that any user has necessarily arrived at any point by any specific route. Thus, although the elements of the hypertext are linked to one another by a web of connections, they must also be relatively discrete and self-contained. The result is that the elements become increasingly self-sufficient and reduced in size, while the whole becomes comparatively unstructured, and unconstraining on the activities of the user. This encourages a degree of self-centredness because the user is encouraged to follow his or her own needs, as seems appropriate to him or her, and promotes a view of knowledge as a collection of discrete and fragmented parts.
SELF-CENTRED EDUCATION IS NOT USER-CENTRED EDUCATION

If any part of this brief and gloomy picture is right, then what is threatened most is a particular way of learning and developing, and it is not clear that there is any effective substitute for this way. To understand why this might be thought to be so, it is necessary to focus on a rarely examined, but important paradox which I have explored elsewhere (Good, 1995b). This paradox originates in certain views of education which are quite influential, and have a good deal of attraction for those constructing intelligent knowledge-based multimedia devices for use in education. What we traditionally identify as education is one area where these devices will be heavily used in the future, and the wide availability of them is quite likely to transform the institutional nature of education, and make it a ubiquitous, life-long activity. A fundamental premise of most education and instruction is that there is an asymmetry of knowledge between teacher and pupil. The teacher knows more than the pupil does. This does not mean that there are not cases where the less well-informed say or do something from which those who are more knowledgeable can learn, but these cases are in the minority, and rarely, if ever, are they cases where the educational event is intended. The aim of instruction or tuition is to reduce the difference between the student and the teacher by, amongst other activities, the transfer of knowledge, skills or ideas from one to the other. When we contemplate this problem, it is very tempting, and many have yielded to this temptation in the past, to view communication in education as a process in which the teacher interprets the student's current state of ignorance, and decides on what can be safely added to that knowledge base without either bemusing or boring the student.
Too much will do the former, too little the latter, and, if what is offered is the right amount but it is not configured in an intelligible form, confusion will be the result again. However, for a teacher to know what it is that he or she might usefully say, under this scenario, that teacher needs to know what it is like to be in a state of ignorance. To put it another way (which applies to most of our communicative activities), to know how to formulate an idea which is unknown to somebody else so that they can understand, it is necessary to know what it is like to not understand it - which is an impossible demand. Now teachers manage to circumvent this difficulty in all sorts of ways, and education still happens. Common to all the tactics which are used is that in some way or another they rely on the experts in ignorance, i.e. the students, for advice. This may come directly, in relation to each student, from a contemporaneous dialogue in the classroom, or it might come via other teachers' experiences, or it might come from other students at other times. Equally, of course, the students, the experts in ignorance, have a problem. They cannot say what should be presented to them, because they are ignorant. So education can only proceed by both sides working together to find out what it is that each needs from the other. This may well not be a dialogue of equals, because control depends on power and knowledge; but as a dialogue, it can only be exercised with the assent of those who are in the position of ignorance. In other words, effective teaching depends on collaboration and dialogue, and on the ability to take part in dialogue, an ability which depends on one's experience of dialogue.
All of this is a somewhat simplistic way of paraphrasing one part of Vygotsky's idea of the 'zone of proximal development' (Vygotsky, 1962/1934), which proposes that a child's development is dependent on the social life in which it can engage because of its interactional skills and the social world in which it lives. Central to this is the claim that
the psychological growth a child can achieve at any point is constrained by this particular developmental space which depends for its character on a number of factors. The most important of these is the dialogic skills of both the child and those with whom it is interacting. The contributions which others make in a conversation with the child effectively erect a scaffolding which enables him or her to develop by helping to support a shaky capacity in the first instance. Furthermore, as children come to understand the roles that others may take with respect to them, they can internalise an understanding of the resulting dialogues, and so extend their own cognitive skills in a more self-reliant fashion. To do this depends upon the experience of working with others, empathising with their aims and ambitions, and a degree of patience and willingness to comply with the demands they make. The totally self-indulgent child who wishes to do only what he or she fancies would never have this kind of experience, and would suffer as a consequence. It is the wise child who learns that suspending one's disbelief and boredom, and paying close attention to the speaker is an important step. Vygotsky's description of child development is not one which loses its importance when the child reaches adulthood. There are many reasons to believe that the dialogic imagination is very important for all manner of intellectual activities, and it is a form of mental life which is only preserved in its use. There is no alternative [6]. This line of argument has been explicitly offered by Laurillard in her work on the impact of various kinds of educational technology on University level education (Laurillard, 1994). As an education technology specialist, she is fully cognisant of the potential of these new technologies for enhancing and extending educational opportunities. 
However, she is equally aware of the need to understand the different kinds of learning experience which a student of any age needs. A central element in her account is an emphasis on the way in which interacting with someone places interpretative and expressive demands which simply do not arise in any other context. These demands are important, not only for developing the individual's communicative skills, but also for developing the intellectual capacities which make communication worthwhile. Interestingly, this conclusion is also being recognised by a number of those involved in recent UK programmes for the introduction of IT to higher education (Mayes, 1995).
If the development of different systems does take the separation of speaker and hearer to a point of complete isolation of one from another, then certain consequences will follow. Authorial control is severely limited, and the reader's patience is a virtue which is no longer rewarded. The implication of this view is that understanding how a system might be best structured to serve the needs of a user is not simply correlated with the users' experiences at any particular stage of their use of the system. It is an old lesson that the user of any object or instrument, or the reader of any text, will persevere in the face of great difficulty if there is some reason to have faith in the author or creator of that text or object. It is not unusual for great benefits to flow from such perseverance when there is a temptation to succumb to an easier alternative. This lesson should be borne in mind for CTA.
CONCLUSION
In this paper, I have sketched an argument that it is important not to confuse user-centred with self-centred, and speculated that the former is often satisfied by an arrangement where it is not assumed that the user always knows best. The moral
agenda which is part of CTA raises important issues about the relationships between people as transformed by technology because it links the nature of human mentality to the social life which is led. Consideration of this will hopefully lead to appropriate, effective, and useful technology which extends and enhances human capacities and activities. Those developments can only be successful if the humans in question are left with the capacity to be integrated and connected members of the societies in which they live. This depends on their dialogic abilities, which in turn are a major force in the establishment of their intellectual abilities. These will come from many different experiences in many different domains, and it is quite clear that the experience of new forms of relationship between the parties to any kind of dialogue does not in itself pose a threat, as our collected experience of literacy shows. However, if the link between those parties in this new communicative domain is broken by the replacement of one of them with an intelligent device which need not be respected, and at the same time narrative structures are destroyed, the model of knowledge and communication offered could ultimately be far more damaging.
NOTES
[1] It is important to bear in mind though, both here and later, that the user's conception of any supposedly intelligent system is an important consideration. This point is forcefully illustrated by the study of a radical psychotherapy service where clients were asked to ask ten yes-no questions into a microphone, and, after each question, offer an interpretation of the 'yes' or 'no' answer which had been given by a light coming on. Believing the answers to be coming from a trained psychotherapist, some subjects were able to interpret the most bizarre sequences of answers as meaningful. They had, however, been misled: there was no psychotherapist and the answers were randomly generated (McHugh, 1975).
[2] Both academics and non-academics alike are prone to take all kinds of metaphors from the wider world for understanding themselves and others, and there is no reason to believe that an instance which can be so intimately known will be free of this tendency.
[3] There is a faintly amusing irony in this concern because these technologies which potentially pose this threat almost certainly owe their existence to the flexibility and creativity of mind which, CTA assumes, developed from the social life which is threatened. This ironic sense verges on the paradoxical when we realise that the flexibility of mind which enables the user to display a high level of 'adaptivity', and so adapt to all sorts of technology in the first instance, is also the capacity which makes that user, or even the society of users, vulnerable to the iniquities of 'forced adaptivity'.
[4] One might simply consider created objects at this point; however, found objects are created as something new in the light of their use, and so it becomes quite difficult to specify the boundary between natural and manufactured.
[5] Restricting the number of participant roles to just two, speaker and addressee, ignores the fact that there are many different kinds of role in conversation apart from these two, but for the sake of the current discussion, the original Adam and Eve of conversation will do. See Levinson (1988) for a discussion of the variety of roles, and reasons for taking them seriously.
[6] It is interesting to note how children with exceptional mental skills in very narrow domains, children who are often known as idiots savants, are often autistic and have very poor or non-existent social skills. The mental feats they can perform often seem to be amazing in the computational power required compared to what most of us can do, for example, calculating what day of the week any date will fall on, but they seem to be feats almost totally lacking in intellect as we normally construe it.
REFERENCES
Byrne, Richard, and Andrew Whiten, eds., 1988. Machiavellian Intelligence. Oxford: Clarendon Press.
Cherry, Colin, 1985. The Age of Access: Information Technology and Social Revolution. London: Croom Helm.
Dizard, Wilson, 1989. The Coming Information Age. 3rd edn. London: Longman.
Duncker, Kurt, 1945. On problem solving. Psychological Monographs 58: 270.
Good, David, 1995a. Where does foresight end and hindsight begin? In: E.N. Goody, ed., Social Intelligence and Interaction, 139-149. Cambridge: Cambridge University Press.
Good, David, 1995b. Asymmetry and accommodation in tutorial dialogues. In: R. J. Beun, M. Baker, and M. Reiner, eds., Dialogue and Instruction, 31-38. Berlin: Springer-Verlag.
Goody, Esther, ed., 1995. Social Intelligence and Interaction. Cambridge: Cambridge University Press.
Goody, Jack, 1990. Technologies of the Intellect: Writing and the Written Word. Memorandum Nr. 5, Projektgruppe Kognitive Anthropologie, Max-Planck-Gesellschaft.
Gorayska, Barbara, and Jacob L. Mey, 1995. Cognitive Technology. In: Karamjit S. Gill, ed., New Visions of the Post-Industrial Society: the paradox of technological and human paradigms. Proceedings of the International Conference on New Visions of Post-Industrial Society, 9-10 July 1994. Brighton: SEAKE Centre.
Illich, Ivan, and Barry Sanders, 1988. ABC: The Alphabetization of the Popular Mind. San Francisco: North Point Press.
Laurillard, Diana, 1994. Rethinking University Teaching. London: Routledge.
Levinson, Steven, 1988. Putting linguistics on a proper footing. In: P. Drew and A. Wootton, eds., Erving Goffman, 161-227. Cambridge: Polity.
Maier, N., 1931. Reasoning in humans II: The solution of a problem and its appearance in consciousness. Journal of Comparative Psychology 12:181-194.
Mayes, T. A., 1995. Paper to CAL 95, Queens College, Cambridge, April 1995.
McHugh, Paul, 1968. Defining the Situation. Indianapolis: Bobbs Merrill Inc.
Mey, Jacob L., 1992. Adaptability: reflections. AI and Society 6:180-185.
Murdock, Graham, and Golding, Peter, 1989. Information poverty and political inequality. Journal of Communication 39:180-195.
Salvaggio, J., ed., 1989. The Information Society. Hillsdale, NJ: Brooks Cole.
Rutter, D. R., 1984. Seeing and Looking. London: Academic Press.
Vygotsky, Lev S., 1962. Thought and Language. [Originally in Russian, 1934.] Cambridge, Mass: MIT Press.
Chapter 5
"AND YE SHALL BE AS MACHINES" - OR SHOULD MACHINES BE AS US? ON THE MODELING OF MATTER AND MIND*
Hartmut Haberland
Department of Languages and Culture, University of Roskilde, Denmark
If adaptation (cf. Mey 1994 on 'adaptability') is one of the big words in Cognitive Technology, the question immediately to be asked is: who adapts to what (or what adapts to whom)? In communication between people and machines, do people adapt to machines or do machines adapt to people? This sounds like a variation on Humpty-Dumpty's famous remark that it all depends on the question which is to be master (as he told Alice). Even though we, as users of intelligent machines, sometimes may feel that we are victims of their stupidity, this should not be the case since, after all, there is a fundamental built-in asymmetry: machines are programmed by people. The question which is to be master is thus settled from the start, one should assume. However, such is not the case. On the one hand, there is nothing uncommon in a situation where human beings create a structure and lose control of it. When Marx talked about alienation, he had this in mind: humans are confronted with societal structures which are the works of their likes, but they experience them as something 'objective' they cannot change. This also means that they can learn how to deal with these structures, to adapt to them, without actually understanding them. "Sie tun es, aber sie wissen es nicht," as Karl Marx characterized this state.1 The case is comparable to that of the kula as analyzed by Malinowski (1922), viz. the trading cycle of highly-valued but intrinsically worthless objects among the islanders off the NE coast of New Guinea. This trading cycle involves an astonishing number of people who have never had the full experience of all the parts of this cycle, and to our knowledge
*The first half of the title is taken from the title of Mey (1984). - I should probably thank Wolfgang Fritz Haug in this place. He really introduced me to philosophy, although I do not know what he will think of this piece when he reads it. Jens Balslev gave me a hard time years ago when I tried to convince him of the very position which I am attacking here. Special thanks go to Jeremy Franks for a discussion of how to translate Gustafsson into English. Soren Schou has shared his knowledge about Lars Gustafsson, and his copy of Utopier, with me. While writing this paper, I got a very encouraging electronic note from Lars Gustafsson which is gratefully acknowledged here. And a very special thanks to the Editors, Barbara and Jacob, for maieutic help.
1 "They do it, but they don't know."
this cycle has not been devised by any one master mind, but has developed through practice. Still, everybody knows exactly what his role is in the cycle. The question of which is to be master has turned meaningless here: though a thoroughly human product, the machinery of an abstract structure (such as a patterned habit, or an institution) has taken command over the individuals functioning in it.
On the other hand, the asymmetry of the relationship between human users and programmers on the one side, and intelligent, programmed machines on the other, is just another reflection of the asymmetry that crops up whenever we talk about consciousness. Already in Cartesian dualism, res cogitans and res extensa are not endowed with equal opportunities: mind can be conscious of matter, but not the other way around. In Turing's (1950) famous Gedankenexperiment (the one which should enable us to decide whether a machine is intelligent or not), it is an observer that has to be convinced by the machine that it is intelligent; this role cannot be taken by a machine, if only for the reason that the machine could not see the point of getting an answer to the question whether machines can think. In Marvin Minsky's classical treatise Matter, mind, and models (1968), the role of the observer is duly emphasized in connection with his discussion of models. Minsky uses the term 'model' in the following sense: "to an observer B, an object A* is a model of an object A to the extent that B can use A* to answer questions that interest him about A." (1968: 426)
Now if a human being M is interested in answering questions about the world W, he or she would use a model W* of W. This model could be inside M, but at the same time contain a model M* of M (since M is part of W). M* can then contain a model W** of W*, and so forth. All these models would be motivated by the specific type of questions they can answer (e.g., my built-in model M* of myself can answer questions like how old or tall I am, but not what kind of thing I am - this question would have to be referred to a model M** of M*). But (and this is the point that interests us here) although all these models in principle can be emulated by programmed machines (M* does not have to be in M, but can be programmed and exist somewhere outside M), there is no point in relegating the task of the observer to a machine. Machines can be used to answer questions, but they cannot genuinely be interested in asking questions. So "humans are in a privileged position" (Edelman, 1989: 22). By this, Edelman means that humans can report about their consciousness, whereas they are dependent on inference when discussing consciousness in animals (assuming that animals have one). Traditionally, the assumption of human privilege amounts to self-consciousness (together with self-conscience) being the ultimate, irrefutable, irreducible property specific to humans. Still, the question is lurking: and what if it is not? How can we prove that we are privileged in this way? The fact that we want to be privileged does not prove that we are. Neither does the introduction of dualistic assumptions (man-animal, mind-matter, and so on) excuse us from deconstructing the presumed privilege. Originally conceived as a means of establishing the superiority of res cogitans (that means us) over res extensa, dualism can turn against itself. If dualism wants to say anything sensible about
the privileged member of the mind-matter dichotomy (and here I am tracing Minsky's argument against free will (1968: 431)), it has to apply models of the mind based on the structure of its opposite, viz. matter. From this to the use of an ontological metaphor like THE MIND IS A MACHINE (as acknowledged by Lakoff and Johnson (1980: 27)2) is not a big step. An historically adequate, literary expression of the shock created by the realization of possibly losing this privileged position is found in a poem by the Swedish author Lars Gustafsson, originally published in 1966, with the title Maskinerna, 'The Machines'.3
The Machines 4
Lars Gustafsson Some came early, others late, and outside the time where it exists each one of them is homeless. Hero's steam ball. The Voltaic pile. The ballista. Polhem's ore hoist at Falun. Curiosities: The "pneumatic fan." Una macchina per riscaldare i piedi. We only perceive machines as homeless When they belong in another century. Then they become obvious, they acquire a meaning. What do they mean? No one knows. The crankshaft device: a way of transmitting power over long distances with the aid of two levers moving backward and forward. What does the crankshaft mean?
2 and exemplified by English expressions like 'to grind out a solution', 'my mind isn't operating today', 'I'm running out of steam', etc.
3 It is only fair to acknowledge that Lars Gustafsson in 1995 does not take the same philosophical stance in these matters as he did in 1966, as he informed me in the electronic message referred to above.
4 Translated by Yvonne Sandstroem. Quoted by kind permission of the University of Minnesota Press from Modern Swedish Poetry in Translation, ed. by Gunnar Harding and Anselm Hollo, Minneapolis 1979, 75-76. The Swedish original is reprinted in Gustafsson (1969).
DIE BERGWERKE IM HARZ ANNO 1723 The picture teems with people. People, small like flies go up and down in the buckets, the object marked "J" in the picture, "La Grande Machine," by the fresh waterfall, drives all the cables. No one has ever combined - as would be perfectly possible - crankshaft and steam engine Voltaic pile and Hero's ball. The possibility remains. A foreign language that no one has spoken. And actually: Grammar itself is a machine Which, from innumerable sequences selects the strings of words for intercourse: "The healthy instruments", "the productive parts", "the cries", "the muffled whispers". When the words have vanished, grammar's left, And it's a machine. Meaning what? A totally foreign language. A totally foreign language. A totally foreign language. The picture teems with people. Words, small as flies go up and down in the buckets and the object marked "J", "La Grande Machine" by the fresh waterfall, drives all the cables.
A few years later, Gustafsson published an analysis of his own poem in a collection of essays (Gustafsson, 1969). This analysis gives us a number of technical explanations of matters not obvious to the modern reader. Whereas Hero's steam ball, the Voltaic pile and even the ballista still may be generally known, we must consider the great Swedish engineer Polhem's ore hoist at Falun, blankstötspelet, as less well-known, perhaps even in Sweden. (Figure 1 shows a detail of blankstötspelet.) Not many people today are familiar with a crankshaft device, unless they realize that it is the very same principle, viz. the lever system, that propels the wheels of a steam locomotive. Yet, this
contraption was an extremely common sight around the ore and coal mines of the 17th Century, having a function comparable to today's power lines; by this device mechanical power could be transferred through a system of levers and shafts from its source (e.g., a waterfall driving wheels) to its place of application.
Blankstötsspelet i Falun. Det maskinella hos maskinerna blir tydligt först när de föråldrats och ryckts ur sin ursprungliga uppgift, s. 33.
Figure 1. Polhem's ore hoist at Falun. The machinery of a machine appears most clearly when it is seen outside its historical context. (Detail from a copperplate by van den Aveelen in Eric Dahlberg's 'Suecia antiqua et hodierna', 1701)
Gustafsson's poem obviously deals with alienation; but not only alienation in the sense of Marx' Entfremdung (as mentioned above), but also in the sense of Brecht's Verfremdung. Machines take on a foreign character, they become "homeless", when seen in a different historical context. Seen from the vantage point of the 20th century, erstwhile immensely useful machines like the Falun ore hoist or the crankshaft device share their place in history with utter curiosities like a foot-warming machine. Gustafsson links the eerie mood that overcomes us when we look at old prints of mechanical devices to the shock that we experience seeing one of the functions of our mind, language, described as the output of a machine. As Gustafsson himself points out in his self-analysis, the historical locus of this shock is what much later was called the Chomskyan revolution. Gustafsson's poem reflects the poet's amazement at the fact that language could just be a machine rattling off sentences in our mind; sentences that, when spoken, are taken for utterances by our listeners. Noam Chomsky's characterization of grammar as a machine was the point of departure for Gustafsson's poem. It is conceivable that we are machines ourselves, and that we would not be able to tell.
"The symbolic value of the machines consists in the fact that they remind us of the possibility that our own life in some way is simulated in the same way in which the machine simulates life." (Gustafsson 1969: 40, my translation5)
Now this is not a necessary consequence of reading Chomsky, neither now nor in the 60s. First, Chomsky's theory was never meant as a theory of linguistic communication; his view of human language is basically and inherently extra- or metacommunicative, and thus his theory is only a partial theory of human language. Chomsky would be the first to admit this, simply because he does not assume that communication is the primary raison d'être of human language.6 He is not primarily interested in human communication but in human language and the human mind. This is obviously at variance with our common experience of human language. Leaving consciousness aside, which either has to be inferred (in the case of animals, if they have it) or can be reported on (by humans, as in grammaticality judgments), we have a via regia to language which depends neither on inference nor on reports: language use. Gustafsson's poem shows this indirectly, when he talks about "the strings of words for intercourse"7 that the grammar machine selects from an infinite set of sentences. In Chomsky's original view, grammar is a device that recursively enumerates the infinite set of sentences of some language. In Chomsky there is no talk about anyone (and certainly not about a grammar) selecting any of those sentences for interaction with another mind (a mind which embodies another grammar). But if Gustafsson had
5 In the Swedish original: "Maskinernas symboliska värde ligger i att de erinrar oss om möjligheten att vårt eget liv är på något sätt simulerat i samma mening som maskinen simulerar liv."
6 Ironically, the 'formalist' Chomsky is joined here by the 'functionalist' Malinowski who also thought that communication was only a secondary function of human language based on a communion which joins speakers and hearers without necessarily communicating something about some third person or object, cf. Malinowski (1923: 316) and Haberland (1984: 18).
7 In Swedish, samfärdselns ramsor. I would have preferred a translation of samfärdsel as 'interaction' rather than 'intercourse', to avoid a preemption of the picture which only emerges in the following lines of the poem as a consequence of the ambiguity of 'intercourse'.
followed the orthodox view, the word would not have become flesh (as it certainly does, when he talks about "productive parts" and "muffled whispers"). Instead of the instance of language use suggested by Gustafsson, we would have had two minds comparing notes about the identity of two recursively enumerable sets, or at most two linguists comparing grammaticality judgments. Thus the role of language use is reclaimed by the workings of poetic truth. Chomsky's theory of language is, possibly, a theory of the human mind, but not of human beings interacting with the help of the "healthy instruments" of language. The second objection is that although Chomsky may be able to describe the human mind as a Turing machine, this does not prove that the human mind is a Turing machine, not even that the human mind could be a machine. But only if it could be a machine would the shock induced by our falsificationist powerlessness (viz. that if it were not the case, we could still not prove it) be real. The mere fact that the grammar device can enumerate all, and only, the sentences of a human language does not make it simulate the human mind: it just emulates an important part of its functioning. This difference between simulation and emulation is crucial; only a simulation can claim to be structurally analogous to its Urbild as a model.8 At this point, we must make sure that we distinguish properly between models and metaphors. The notion of model in itself is not without its problems, as Yorick Wilks (1974) has reminded us. In mathematics, 'model' is used in a specific sense - and this practice goes back ultimately to Tarski -, viz. in the sense of a 'second interpretation of a calculus'. Since this interpretation (e.g.
when we in formal semantics talk about 'truth in a model') often is more concrete than the first interpretation, one easily gets the impression that mathematicians use the term in exactly the opposite sense from the sense established in the behavioral sciences, where models tend to be more abstract than what they model. I'll leave this question aside here - even though the difference may only be apparent, it can still cause confusion; rather I want to emphasize that both models and metaphors are ternary relations between a user or observer and two objects or concepts of which the one is to be understood through the help of the other. For that reason, both models and metaphors are crucially dependent on the people that employ them. If we compare Minsky's explication of a model, quoted above, with Lakoff and Johnson's account of metaphor, where one concept "is partially structured, understood, performed, and talked about in terms of" another (1980: 5), and metaphor "allows us to comprehend one aspect of a concept in terms of another [concept]" (1980: 10), then one difference should be clear: in metaphors, the two objects we are talking about can in principle be exchanged. If we understand argument in terms of war (since we make use of the cognitive metaphor ARGUMENT IS WAR), this is because the concept of argument is "partially structured, understood, performed and talked about in terms of" war, and then we can also talk about war in terms of argument; if we can understand one aspect of argument in terms of war then we can also understand one aspect of war in terms of argument. Likewise, if the mind is partially understood on the basis of the metaphor THE MIND IS A MACHINE (like 'my mind is on the blink'), we also have metaphors that describe machines by exploiting their similarities to the human
8 In the sense of Mey (1972), Chomsky's theory of competence is a descriptive, not a simulative model.
mind (like 'the machine has gone crazy'9), or body (like 'to feed data into the machine'), or even the whole human being (like 'the machine is on strike').
Contrary to the case of metaphor, the relationship between a model A* and what it is a model of, viz. A, is not symmetrical.10 If it is to make any sense for us to answer questions about A by asking them about A*, then A and A* cannot exist independently of each other. (Cf. what the astrologer does, viz. answering questions about something observationally non-accessible, the future (A), by asking questions about something more or less abstract, but observationally accessible: relationships (A*) between heavenly bodies. This presupposes that one believes in some pre-established homology between A and A*, although not necessarily in some causal relationship of A* to A, as vintage astrology would assume.) A* must specifically be constructed as a model of A, and A cannot at the same time be a model of A* in the sense that it helps answer questions about A* (although nothing is wrong with A* being part of A, which means that A* can contain a model A** of itself, which is something completely different).
Using Turing machines as models of the human mind is attractive, because within an automata-theory based hierarchy, Turing machines are the simplest automata that are powerful enough to answer relevant questions about the set of sentences in any human language. (One of Chomsky's achievements is the proof that nothing less than a Turing machine will do as a model for the recursive enumeration of the sentences of some human language L.) The Turing machine, for all its power, is at the same time a well-defined and reasonably simple device which makes it possible, in a relatively straightforward way, to study the formal properties of the languages it generates.
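What it means for a grammar device to 'recursively enumerate' the sentences of a language can be made concrete with a minimal sketch. The toy grammar, the function name and the vocabulary below are purely illustrative assumptions and are not taken from Chomsky or from Gustafsson; the grammar is moreover context-free and thus far weaker than a Turing machine. The sketch only shows the principle that a finite rule set can list an infinite set of sentences one by one.

```python
from collections import deque
from itertools import islice

# Hypothetical toy grammar (an assumption for illustration only).
# Keys are nonterminal symbols; every other symbol is a word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["machines"], ["people"]],
    "VP": [["think"], ["believe", "that", "S"]],
}


def enumerate_sentences(grammar, start="S"):
    """Yield every sentence the grammar derives, in breadth-first order.

    Each queue item is a sentential form (a tuple of symbols). The leftmost
    nonterminal is rewritten in every possible way; forms consisting only
    of words are yielded as sentences.
    """
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is all words.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            yield " ".join(form)
            continue
        for rhs in grammar[form[idx]]:
            queue.append(form[:idx] + tuple(rhs) + form[idx + 1:])


if __name__ == "__main__":
    # The language is infinite, so only a finite prefix can be printed.
    for sentence in islice(enumerate_sentences(GRAMMAR), 4):
        print(sentence)
```

The generator never terminates on its own, since the embedding rule for "believe that" makes the set of sentences infinite. Note also what the sketch does not do: it lists sentences, but it neither selects one for use nor addresses it to anyone, which is the gap between enumeration and language use discussed above.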
11 The value of the answers to the questions directed at model A* is, on the other hand, dependent on how well A* and A match, in this case, how much the sets of sentences generated by the grammar A* have to do with the actual language used by A. A relevant question is whether the concept of an infinite set of sentences generated by A* can be interpreted in a meaningful way with respect to A. As we see, even though Turing machines have been used as models of the human mind, it is doubtful if they could be used as models of human language, if one insists on interaction (or at least communication) as essential aspects of human language. This is a very different matter from the use of metaphors like THE MIND IS A MACHINE in everyday language. Even if we restrict ourselves to information processing machines (which is not required by this cognitive metaphor), such machines, although theoretically equivalent to a Turing machine, are much more complex than the latter and actually much less well understood. There is often no easy way of predicting their behavior at a given task other than letting them execute it, simply because a computer C* that is supposed to model another computer C cannot be any faster or simpler than C. The
9 On machines going crazy, cf. Engström (1977).
10 To me it seems that it is here that the mathematician's use of 'model' is at variance with its use in the behavioral sciences. If the model serves the purpose of establishing that the 'first interpretation' is consistent, then the roles of the two interpretations can be reversed; it all depends on which questions one wants to ask.
11 One of the results of these investigations led to a paradox: although a grammar for natural languages has to be at least as powerful as a Turing machine, Turing machines are not restricted enough since they also generate sets of strings which never could be the set of sentences of some human language.
I am referring here to the work done by Peters and Ritchie (1969, 1971), and others. The successive attempts to solve this paradox have led to the different paradigms of transformational-generative grammar, to Government and Binding theory and beyond.
actual effect in understanding resulting from the comparison of the human mind with an actual computer is, therefore, rather limited. Similarly, many of the more specific instances of the general metaphor THE MIND IS A MACHINE do not really explain the mind through the workings of a computer but take their point of departure in an experience with computers, like 'My mind went totally blank (sc. like a screen)'. But if the human mind cannot be explained by reference to a computer, then maybe computers can be explained by reference to the human mind? In the terms used earlier, this would mean that the observer M has inside her- or himself a model C* of the programmed computer12 C whose behavior she or he wants to understand. Like a model C* of C on a computer, such a model inside M will be neither faster nor simpler than C, but this is not so much of a problem: many of the questions one would want to ask about C are best answered by inspecting C itself anyway, in a way that would not be possible for questions about M itself. Computers cannot report about themselves, but we are not totally dependent on inferencing about them as we are with animals; computers allow for a certain amount of inspection (e.g., we can read their programs). What we cannot ask the computer (at least not as ordinary users) are those questions which (in Minsky's terms) really are questions about C*, i.e., questions of a more general character, like which kind of questions C can answer. If a computer reports an error ("I cannot understand this input") we cannot sensibly ask it, "What kind of input would you be able to understand then, if I may ask?". We simply may not ask, or at least we cannot direct the question at the computer C itself. In order to answer these questions, a model C** of C* is needed, and this model can exploit a metaphorical understanding of C in terms of M.
If we look at how people deal with this problem in practice, we find that they usually direct their question either at the manual or at the superuser next door. Both are expected to be able to function as this model C**. Manuals are useless most of the time, and only rarely give us the answers we are looking for, simply because they are not conceived of as such models. They are often little more than sophisticated descriptions of the inner workings of C and seem to have been written in happy ignorance of what kind of questions they should provide the answer to. Superusers sometimes can help, but they are rarely capable of formulating how they arrived at their superior knowledge: they have a poor model of themselves. But if better models both of C, exploiting human-machine metaphors of the right kind, and of the users of C could be developed, this could help the empowerment of computer users. This does, of course, not mean that computers really are people (just as we have seen that it is not the case that people are machines). But it does mean that it sometimes may help to look at them as if they were (albeit very quaint) people. This is what we do in our everyday metaphorical talk about computers and this talk is legitimate, as is every metaphorical effort at understanding something. Meaning, after all, is in the use. The fact that vintage machines lose their meaning for us follows from their uselessness. If meaning only emerges from use, then being without use must mean being without meaning. If a sentence is not used, but only enumerated by a machine, it stands out from its background in Gustafsson's sense,
¹² By programmed computer, I am not referring to the hardware but to what one often calls a system or the program, i.e., whatever users experience as the instance they interact with.
exactly in the way that sentences stand out as numbered examples in a standard grammatical treatise. Being thusly exposed, they become visible, but they also become homeless. We still know that they must have a meaning (grammars only generate objects with potential meaning), but we do not know where to apply for their meaning. If indeed we knew where to apply for such a meaning of the computers we are dealing with, we would finally be able to settle Humpty-Dumpty's question.
REFERENCES
Edelman, Gerald M., 1989. The remembered present. New York: Basic Books.
Engstrom, Göran, 1977. Some analogies between adaptive search strategies and psychological behavior. Journal of Pragmatics 1(2): 165-170.
Gustafsson, Lars, 1969. "Maskinerna". En självanalys. In: L. Gustafsson, Utopier och andra om "dikt" och "liv", 33-41. Stockholm: PAN/Norstedts.
Haberland, Hartmut, 1984. A field manual for readers of "The problem of meaning in primitive languages" by Bronislaw Malinowski. ROLIG papir (Roskilde) 31: 17-51.
Haberland, Hartmut, and Jacob L. Mey, 1977. Editorial: Pragmatics and linguistics. Journal of Pragmatics 1: 1-11 (reprinted 1995 with a postscript in Asa Kasher ed. Pragmatics. London: Routledge. Critical assessments series).
Lakoff, George, and Mark Johnson, 1980. Metaphors we live by. Chicago: The University of Chicago Press.
Malinowski, Bronislaw, 1922. Argonauts of the Western Pacific. London: George Routledge and Sons.
Malinowski, Bronislaw, 1923. The problem of meaning in primitive languages. Supplement I to C. K. Ogden and I. A. Richards, The meaning of meaning, 296-336. London: Kegan Paul, Trench and Trubner.
Mey, Jacob L., 1972. Computational linguistics and the study of linguistic performance. Computers and the Humanities 6: 131-136.
Mey, Jacob L., 1984. And ye shall be as machines. Reflections on a certain kind of generation gap. Journal of Pragmatics 8: 757-797.
Mey, Jacob L., 1994. Adaptability. In: R. E. Asher ed., Encyclopedia of language and linguistics, vol. 1, 25-27. Oxford: Pergamon Press.
Minsky, Marvin L., 1968. Matter, mind, and models. In: Marvin Minsky, ed., Semantic information processing, 425-432. Cambridge, MA: The MIT Press.
Peters, P. Stanley, and R. W. Ritchie, 1969. A note on the universal base hypothesis. Journal of Linguistics 5: 150-152.
Peters, P. Stanley, and R. W. Ritchie, 1971. On restricting the base component of transformational grammars. Information and Control 18: 483-501.
Turing, Alan M., 1950. Computing machinery and intelligence. Mind N.S. 59: 433-460.
Wilks, Yorick, 1974. One small head - models and theories in linguistics. Foundations of Language 11: 77-95.
Chapter 6
Department of Public and Social Administration, City University of Hong Kong
INTRODUCTION
In this paper, I will first give a general discussion of the strategy of distinguishing what David Marr called "different kinds of explanation at different levels of description" (1982: 20) ("different levels of explanation", in short), and then briefly examine the levels of explanation in software engineering. It will be argued that the strategy of distinguishing levels is applicable to psychology and that there is a close parallel between the levels of explanation identified in software engineering and those in psychology and artificial intelligence (AI). The strategy of distinguishing levels also looms large in complexity analysis. I will show that complexity analysis allows us to raise a new form of skeptical challenge to human rationality. It will be argued that such a challenge can be answered not by examining cognitive processes going on inside the head but by examining the structure of the environment in which cognition functions. Finally, we shall see that the skeptical challenge to human rationality also poses serious threats to the possibility of cognitive technology because the challenge is based on the claim that cognitive problems are in general intractable. I will show that an examination of how the human mind deals with the problem of intractability can provide an answer to the question of how cognitive technology is possible. The answer will be outlined in the manifesto of a research strategy in the final section.
LEVELS OF SCIENTIFIC EXPLANATION
The strategy of distinguishing different levels of explanation is common in many disciplines. Explanation is often multi-leveled because the same set of phenomena explained by a theory at a certain level are generated by different underlying mechanisms which are the subject matter of an explanation at a lower level.
We can either explain a set of phenomena at one level without employing detailed knowledge about its underlying mechanism, or explain the phenomena at a lower level by deriving them from such knowledge if we possess it. For instance, let's look at the input-output behavior of a digital device as shown in figure 1.
Ho Mun Chan
Fig. 1
The device can be explained by a number of equations in binary arithmetic that define a half-adder, as shown in Table 1.
Table 1 [column labels: p, c, s]
There are many ways of constructing a half-adder. An example of one possible circuit design is illustrated in figure 2:
Figure 2
The upper device in the figure is an AND gate while the lower is an XOR (exclusive or) gate. The AND gate will give an output 1 if and only if both inputs are 1, whereas the output of an XOR gate is 1 if and only if the inputs are different. The above is only one of the possible circuit designs of a half-adder. Even if the logic circuits of two half-adders are the same, the logical components may be realized differently. In one half-adder, the logic gates may consist of relays, while the logic gates of another may be made up of electronic tubes. In this example we can see that there are three levels for explaining the device defined in Table 1. At the top level, we explain the behavior of a device by specifying the function computed by the device. At the second level we give the configuration of logic gates. At the lowest level we describe the electrical/electronic elements that make up the logical components. The following table summarizes the basic components at the three levels.
Levels of Explanation        Basic components
Top level                    Function computed by the device
Second level                 Logic gates
Lowest level                 Electrical/electronic elements
Table 2
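The top two levels of the half-adder example can be made concrete in a few lines of code. The following is only a minimal sketch: the input label q, the function names, and the use of Python integers for bits are assumptions made for illustration, not part of the original figures or tables. It states the top-level function (binary addition of two bits) separately from one gate-level design (an AND gate for the carry c, an XOR gate for the sum s) and checks that the design computes the function on every input.

```python
from itertools import product


def and_gate(p: int, q: int) -> int:
    """Output 1 if and only if both inputs are 1."""
    return p & q


def xor_gate(p: int, q: int) -> int:
    """Output 1 if and only if the inputs are different."""
    return p ^ q


def half_adder_spec(p: int, q: int) -> tuple[int, int]:
    """Top level: the two-bit binary result (carry, sum) of adding p and q."""
    total = p + q
    return total // 2, total % 2


def half_adder_gates(p: int, q: int) -> tuple[int, int]:
    """Second level: one possible design, carry from AND and sum from XOR."""
    return and_gate(p, q), xor_gate(p, q)


# The gate-level design realizes the top-level function on all four inputs.
for p, q in product((0, 1), repeat=2):
    assert half_adder_gates(p, q) == half_adder_spec(p, q)
    c, s = half_adder_gates(p, q)
    print(f"p={p} q={q} -> c={c} s={s}")
```

The lowest level (relays or electronic tubes) has no counterpart in the sketch, which is the point of the example: the same top-level description holds however the gates are physically realized.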
There are several reasons why it is useful to distinguish different levels of explanation. First, high level explanations are abstractions that make prediction simple. For instance, by knowing that the device in Fig. 1 is a half-adder, we can predict its behavior without examining the inner structure of the device. The advantage of using high level explanations is often gained at the cost of exactness; still such explanations frequently provide good approximations that are adequate for many purposes. The input-output states of a digital device, such as the one in Fig. 1, are regarded as discrete states (either 0 or 1), but in reality these input-output states are not. The device successfully serves as a half-adder, because the deviation from the discrete states is in general small unless malfunction occurs. A high level explanatory model, being an idealization in the sense that it describes normal functions of a mechanism, can also be regarded as a normative model for evaluating the performance of systems. When the behavior of a system deviates from the idealization, we can say that malfunction occurs. In that case, the behavior has to be understood at a lower level. For instance, the malfunction of a digital device has to be explained in terms of the failure of its inner components. However, for the purpose of explanation, there is usually not much reason to go down to a lower level except in the case of malfunction. The above idea has important bearings on the construction of psychological explanations. In his book The Adaptive Character of Thought, John R. Anderson (1990) shows that much behavioral data reported in the studies of memory, categorization, causal reasoning, and problem solving, can be explained by characterizing the functions computed by these four cognitive activities without knowing much about the underlying mechanisms of those activities (see also Anderson, 1991). His position will be examined later in this paper.
Second, high level explanations help us avoid what Anderson (1990, 1991) calls the "induction problem" and "identification problem". These are variants of what philosophers of science call the problems of underdetermination and of indeterminacy respectively. The former problem concerns how one can correctly infer an underlying mechanism from the observable behavior it produces, because no theory about the underlying mechanism is conclusively verifiable. The latter problem consists in how to pick out the right member from a class of mechanisms which are equivalent in their behavioral consequences. We have seen that a high level explanation enables us to predict and explain behavioral data without knowing much about the underlying process. As we shall see later, this feature leads Anderson to argue that we can do much psychology without solving the induction and the identification problems, so that the two problems may be avoided. Third, although a high level explanation does not uniquely determine the underlying mechanism of a phenomenon, the high level explanation narrows down the search space for the correct mechanism. By understanding the function computed by a device, mechanisms that do not compute the same function will be excluded. In other words, although a high level explanation does not specify exactly what the correct mechanism
is, it helps us exclude the wrong ones. If we do not know what functions are to be computed, we could end up proposing mechanisms that compute the wrong functions. Computing the wrong functions often means that the mechanisms constructed are only capable of solving problems in artificial situations or toy domains, such as a block world. These mechanisms are not capable of solving interesting problems in the normal environment. Thus, as Anderson (1991) points out, high level explanations help identify the right mechanisms to solve the induction and identification problems.
SPECIFICATION AND IMPLEMENTATION
The strategy of distinguishing different levels of explanation is also employed in software engineering. We shall briefly discuss the levels identified in this discipline since a close parallel will be drawn between these levels and the different levels of explanation in psychology and AI. In computer science, a specification defines the task to be executed by a software system. It gives an exact statement of the input-output function to be computed¹, and provides a criterion against which we can judge the performance of a software system. A specification can also be regarded as defining a problem that a programmer wants to solve by constructing an algorithm. The resulting algorithm is called an implementation² of the specification. Implementation is a many-one relation, for a specification can always be implemented by different algorithms; "implementation" is simply a computer science term for what I have called "realization". Once an algorithm is constructed, we can implement it by writing a program. The resulting program can be regarded as an implementation of the algorithm. Again, there is more than one way to implement an algorithm, such as employing different programming languages or using different data structures in the same language to implement the algorithm. But this is not the whole story yet.
If the program constructed succeeds in computing the task required, it is assumed that the programming language is correctly implemented. Such an implementation requires a compiler or an interpreter for that language, and the implementation of a compiler or an interpreter involves some further, low level implementations, such as in machine code, in microcode, and eventually in computer hardware. We can extend the concept of specification to lower levels and regard every implementation as a specification of what is to be done on the next level (Swartout and Balzer, 1983). For instance, we can say that an algorithm which is an implementation of a top-level specification is at the same time a specification of a program. Hence, a software system can be understood as a hierarchy of levels where each level stands in a specification-implementation relation to the next lower level, and the behavior of the system can be explained by using descriptions at different levels. However, in this paper "specification" is used in the narrow sense to mean a top level specification.
¹ In this chapter "(input-output) function to be computed", "problem to be solved", "task to be implemented/performed" will be used interchangeably.
² The word "implementation" displays a "product/process" ambiguity (Galton, 1993: 120). One can point to a constructed program (product) saying that it is an implementation of a specification. What I mean by "implementation" in this section refers to the product of realizing a specification.
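The relation between a specification and its implementations can be illustrated with a small sketch. Sorting is used here purely as a stand-in task; it does not appear in the original discussion, and all function names are hypothetical. The specification is written as a check on input-output pairs and says nothing about how the output is produced; two different algorithms then implement it.

```python
from collections import Counter


def meets_sorting_spec(inp, out):
    """Specification: out contains exactly the items of inp, in non-decreasing order."""
    same_items = Counter(inp) == Counter(out)
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return same_items and ordered


def insertion_sort(items):
    """Implementation A: insert each item into its place in a growing result list."""
    result = []
    for x in items:
        i = len(result)
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


def merge_sort(items):
    """Implementation B: split the list, sort each half recursively, then merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


data = [5, 2, 9, 2, 7]
for implementation in (insertion_sort, merge_sort):
    # Both algorithms are judged against the same criterion.
    assert meets_sorting_spec(data, implementation(data))
```

Each algorithm could in turn be treated as a specification for a program in a particular language, which is the extension of the concept to lower levels described above.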
Levels of Explanation
A specification can be ante hoc, post hoc, or de facto (Galton, 1993). It is ante hoc if it is used to guide the construction of a software system. The classic software cycle gives us an overly idealized and naive picture, according to which specification should be completed before implementation³ begins. A more realistic picture says that specification and implementation are intertwined. Modifications of specification are likely to occur as implementation proceeds, either because of physical limitations which make full implementation impossible, or because of imperfect foresight that overlooks the implications of, and interactions in, an implemented system. Implementation is often a multi-step process, since at each stage a specification may need to be modified (for the above two reasons) before the implementation can proceed further (Swartout and Balzer, 1983). Such intertwining is a sign that different levels are constraining one another in the software system. At the other extreme is a post hoc specification, which is the practice of developing a software system without writing down a specification before the development is completed (Galton, 1993). The programmer has a rough idea of what task s/he is going to implement, and is guided by shifting goals and the demands of software production. The specification emerges only as the implementation progresses. Once the final product is completed, the specification is made explicit for the purposes of documentation and maintenance. Finally, a specification is de facto if it is concerned with a system already in existence (Partridge, 1986). Such a system is not the product of a programming exercise, but it is fruitful to understand the system as if it were the implementation of some specification. The human cognitive system is an important case to which the concept of de facto specification may be applied.
THE THREE LEVELS OF PSYCHOLOGICAL EXPLANATION
In his Vision, Marr endorses the general thesis that there are different kinds of explanation at different levels of description for understanding the behavior of a complex system, such as a bottle of gas, the flight of a bird, or an IBM 370 (1982: 19-20, 27, 337). He posits three specific levels of explanation that are required for understanding a complex information processing system (Ibid.: 19-29; see also Marr, 1977):
1. The computational⁴ level, at which a system is explained in terms of what it does, i.e., what input-output function it computes, and why the function is computed. These two questions are answered by a computational theory which contains an account of the physical constraints that make the computation possible. These constraints are assumptions that are in general true of the environment.
2. The representational and algorithmic level, at which a system is explained in terms of how an input-output function is executed, by analyzing the input/output representation and the algorithm involved in the process of execution.
3. The hardware implementation level, at which a system is explained in terms of the physical mechanism in which a representational and algorithmic process is realized.
³ What I mean by "implementation" in this paragraph refers to the process of trying to realize a specification.
⁴ Many commentators on Marr's work say that the term "computational" sounds misleading, as this level is not concerned with process (Boden, 1989: 38; Anderson, 1990: 6; Dennett, 1987: 74-75). However, it seems that Marr's use of this term arises from his conviction that the brain can be compared to a computer (1982: 5). Such a comparison is not supposed to be made at the representational and algorithmic level, partly because brain processes are basically parallel, while those of a computer are serial (Ibid.: 27). Instead, the comparison is made at the top level by virtue of the fact that a computer can be programmed to do the same task as performed by the brain. So Marr's use of the term "computational" makes perfect sense although he makes no real attempt at saying what is going on inside the head at the computational level.
Put briefly, Marr's three levels are concerned with what is done and why, how it is done, and what does it, respectively. Since one of the concerns at the computational level is what is computed by a device, a computational theory serves as a top-level specification of the task implemented by an information process. For the moment, I will focus on the specification role of a computational theory. If we ignore the computational level questions of why a task is implemented and what are its physical constraints, we can see that there is a close match between Marr's three levels and the different levels in computer science, except that Marr does not break down the realization of the algorithm at the hardware level into substages. (I will discuss the question of why a task is implemented in Section 7.) If the object of study is human cognition, a computational theory about a cognitive system, say, vision, will serve as a de facto specification of the underlying mechanism, for we are uncovering the function computed by a biological mechanism that is already in existence and which is not the product of some programming exercise. A computational theory can also serve as an ante hoc specification which guides the construction of an AI program or a psychological model at the representational and algorithmic level. Marr assigns critical importance to the computational level, and believes that we should specify the task to be implemented by a cognitive system before we make conjectures about the representation and algorithm employed and its hardware realization in the brain, and before we build an AI system to implement the task. Marr believes that successful research in vision requires knowing, at the computational level, what tasks are to be performed by the visual system. 
That is why computational theories are constructed to specify various vision tasks before Marr moves on to examining how they are realized, first at the representational and algorithmic level and then at the hardware level. Marr asserts that the attempt to understand vision at the hardware level only by studying neurons is like understanding bird flight by studying only bird feathers. He also points out that algorithms of many vision tasks, such as stereopsis, fail because they do not compute the right function (Marr, 1982: 111-124, 336), which shows that it is dangerous to study cognition without knowing what task is to be performed by a cognitive process. It is on the basis of this reasoning that the computational level is alleged to be critically important. Like a specification of a software system in defining the task to be computed, a computational theory defines the problem to be solved by an algorithm operating on some representation. The problem to be solved has to be spelt out before one can propose any mechanism as a solution to the problem. As Marr says, "vision research is progressing because it is the problem of vision that is being attacked, not neural visual
mechanisms" (1977: 143). Marr believes that without understanding the problem to be solved, research in AI "can easily degenerate into the writing of programs that do no more than mimic in an unenlightening way some small aspect of human performance" (Marr, 1977: 142). The research strategy he recommends can be compared to the classic software cycle, where the construction of AI programs or psychological models is guided by a computational theory serving as an ante hoc specification. Marr's research strategy goes against a popular trend. In cognitive psychology, a current view is that the most important thing is to explain the mental structures and procedures involved in a mental process, and many psychologists have been quick to test their theories by constructing programs, without first analyzing the task to be performed by the process. In AI, a prevailing attitude is that we have to use some tricks at the representational and algorithmic level to construct a working program. Often, there is no analysis of what is supposed to be computed, or what has actually been computed. A program often ends up computing the wrong thing, and in many cases only an insignificant fragment of the task is in fact generated. As Marr says,
For far too long, a heuristic program for carrying out some task was held to be a theory of that task, and the distinction between what a program did and how it did it was not taken seriously. As a result, (1) a style of explanation evolved that invoked the use of special mechanisms to solve particular problems, (2) particular data structures, such as the lists of attribute value pairs called property lists in the LISP programming language, were held to amount to theories of representation of knowledge, and (3) there was frequently no way to determine whether a program would deal with a particular case other than by running the program (1982: 28).
The view is echoed in Hayes' work:
... there is still a prevailing attitude in AI that research which does not result fairly quickly in a working program of some kind is somehow useless or, at least, highly suspicious. Of course implementability is the ultimate test of the validity of ideas in AI, and I do not mean to argue against this. But we must not be too hasty (Hayes, 1985: 469).
I will bet that there are more representational languages, systems and formalisms developed by AI workers in the last ten years than there are theories to express in them. This is partly because of the pressure to implement already mentioned, but it is also due to a widespread feeling that the real scientific problems are concerned with how to represent knowledge rather than with what the knowledge is (Ibid.: 484; see also Hayes, 1979a: 198).
Although many research projects in cognitive science and AI have the shortcomings pointed out by Marr and Hayes, the research strategy recommended by them seems to be too stringent. As in software development, some form of post hoc specification may emerge as we go along. In many interesting cases, people have started their research with only a rough idea of what task is to be accomplished. As in the case of software development, through a series of mutual adjustments and refinements at the levels of specification and implementation, one can sometimes come up with an ingenious
algorithm that gives the right result on a wide range of input and a specification of at least an interesting subproblem solved. This is not to deny the critical importance of knowing what is to be done and taking the distinction between what and how seriously; but the issue at stake is not whether the question of what should temporally precede the question of how. The question of what is only conceptually but not temporally prior to the question of how. If one knows the input-output function computed by one's program, although such knowledge may come after the construction, one can avoid building a program that generates an insignificant range of behavior, and avoid assigning full generality to a simulation without adequate ground.
COMPUTATIONAL COMPLEXITY AND A SKEPTICAL CHALLENGE TO HUMAN RATIONALITY
The notion of level looms large in the study of computational complexity. In this section, I will briefly discuss the basic ideas of complexity analysis, and argue that a skeptical challenge to human rationality can be formulated at the computational level. The problem of complexity can be addressed at each of the three levels of explanation. What is usually called "computational complexity theory" deals with the analysis of the complexity of input-output functions defined at the computational level. Such an analysis is conducted independently of the algorithm and the physical device used to compute the function, and the result of the analysis is applicable to all possible implementations of the task in question. As we have seen, task specification at the computational level serves to define the problem to be solved by an algorithm. Thus, complexity analysis at this level can also be regarded as an analysis of problem complexity.⁵ It is often stated that computational complexity theory deals with the computational cost or difficulty of algorithms; however, strictly speaking, the theory deals with problem complexity.
Although the complexity of a problem applies to all possible algorithmic solutions, problem complexity must be distinguished from algorithm complexity, which is the computational cost of a particular algorithm. If a problem is unsolvable or intractable, no algorithm - however ingenious its design - can make it tractable; but if a poorly designed algorithm is used, it may complicate the situation by offering an unworkable solution to a tractable problem. Algorithm complexity must therefore be analyzed at the representational and algorithmic level. One algorithm can be shown to be more efficient in solving a problem than another, but the improvement is limited by the problem's complexity. For example, adding Roman numerals is more difficult than adding Arabic numerals because of the difference in complexity of the two procedures at the representational and algorithmic level; however, the problem complexity of the two procedures is the same.⁶
⁵ I shall use "problem complexity", "task complexity", "complexity of a function", and "complexity of an input-output function" interchangeably.
⁶ Complexity can also be analyzed at the hardware level. For example, a task will be executed faster by an IBM 486 PC than by its predecessors but the improvement in efficiency is limited by complexity constraints at the other two levels.
We can get a feeling for the intractability of a problem by examining a familiar method of determining whether a formula in propositional calculus is satisfiable, i.e.,
whether it is possible that the formula is true (see Cook, 1971). If the formula has n distinct atomic propositions, there are 2^n combinations of truth possibilities to be considered, and the truth table of the formula will have 2^n rows. If n = 100 for a given formula, a computer executing one million instructions per second would (in the worst case) take 40,000 trillion years to determine whether the formula is satisfiable. By comparison, the "Big Bang" that began the universe occurred about 15 billion years ago (Harel, 1992: 165-167, 173-174). Since the number of propositions that we believe to be true exceeds one hundred, the impractically long time required to test satisfiability explains why it is so difficult to maintain consistency in our belief systems (see Cherniak, 1986: 93, 143; Cherniak, 1984: 255). Note that the problem complexity of a task does not stem from limitations of computational speed and memory space. If a problem is intractable, even the fastest processor with unlimited memory and equipped with the best algorithmic solution could not overcome its intractability. Complexity analysis can be applied not only to logical/mathematical tasks. As we shall see later, the complexity of other cognitive tasks, such as vision, can also be analyzed at the computational level, and many cognitive problems have been shown to be intractable. These results pose a skeptical challenge to human rationality: if many cognitive tasks are intractable, how are humans able to survive? Since these results are obtained at the computational level, the challenge can be made without examining the limitations of the underlying implementational process of human cognition. Traditional skeptics base their challenge to human rationality on the fact that the capacity of human cognition is limited.
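The truth-table estimate given earlier in this section can be checked with a short sketch. The code below is an illustration only: the function names are hypothetical, a formula is represented as a Python function from a tuple of truth values to a truth value, and the time estimate rests on the simplifying assumption that examining one row of the truth table costs exactly one instruction.

```python
from itertools import product


def is_satisfiable(formula, n):
    """Truth-table method: try all 2**n assignments of n atomic propositions."""
    return any(formula(values) for values in product((False, True), repeat=n))


def worst_case_years(n, rows_per_second=1_000_000):
    """Years needed to visit all 2**n rows (assumes one row per instruction)."""
    seconds_per_year = 365.25 * 24 * 3600
    return 2 ** n / rows_per_second / seconds_per_year


# (a or b) and not a: satisfiable, e.g. with a false and b true.
print(is_satisfiable(lambda v: (v[0] or v[1]) and not v[0], 2))
# a and not a: no row of the truth table makes it true.
print(is_satisfiable(lambda v: v[0] and not v[0], 1))
# For n = 100 the estimate is on the order of 4 x 10^16 years.
print(f"{worst_case_years(100):.2e}")
```

Under these assumptions the figure for n = 100 comes out at roughly 4 x 10^16 years, which agrees with the 40,000 trillion years quoted above; the method is usable only for small n, since each additional proposition doubles the number of rows.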
Now results in complexity analysis seem to show that cognitive failure can also arise from the complexity of tasks that humans are required to solve and that humans fail to perform these tasks well, even if the brain is the fastest machine in the universe. Thus the skeptical challenge posed by these results is indeed new. In what follows, I will further illustrate this challenge to human rationality and how it can be answered, by discussing the complexity of human reasoning and the work of Marr (1982) and Anderson (1991).
A FUNDAMENTAL TRADE-OFF IN KNOWLEDGE REPRESENTATION
Consistency tests are among the many reasoning tasks that have been shown to be intractable or even unsolvable by complexity analysis. Some cognitive scientists mistakenly believe that the intractability of these tasks arises from the use of logical symbols or their equivalents and that the problem of intractability can be overcome by employing the right sort of representation. It is maintained that AI programs using logical symbols and rules can handle only problems in toy domains, and that these programs will fail because of intractability when scaled up to deal with a realistic domain (Evans, 1993; Oaksford and Chater, 1991, 1993; see also Holyoak and Spellman, 1993). However, the sources of computational complexity in reasoning have been misunderstood by these cognitive scientists. If a problem is intractable, it is so, irrespective of what sort of representation we use in solving the problem. Logical representation is only a formalism used in studying computational complexities; it is not a source of complexities. Thus, where various heuristics, algorithms, and knowledge representation techniques are proposed to solve the problem of intractability, tractability is often gained at the expense of limiting the range of a constructed mechanism to a non-realistic or even a toy domain. For instance, the STRIPS planner (Fikes and Nilsson, 1971) makes planning more manageable in many
cases, but for some time we had no exact idea of the conditions under which it does and does not work. Only recently has Lifschitz (1987) worked out a post hoc specification of STRIPS and proved that it works only in an environment that can be characterized by state descriptions that are neither logically nor causally connected. It only works in such an environment because logical and causal connections are not expressible in a STRIPS representation (Georgeff, 1988). The range of applicability of STRIPS is therefore rather limited. In other words, tractability is gained at the expense of confining the range of applicability to a tractable subproblem of an intractable problem. As Levesque and Brachman (1985) have pointed out, there is always a tradeoff between tractability and expressiveness in knowledge representation. The result of such trading limits the range of applicability of a mechanism to a tractable subproblem of an intractable problem. What we have learned from the story of STRIPS is applicable to other techniques in knowledge representation that have been developed to facilitate the inference process in other areas. Schematic representations such as frames, semantic nets, and mental models are popular in AI and psychology (Minsky, 1975; Schank, 1975; Schank and Abelson, 1977; Simon, 1983; Johnson-Laird, 1983; Johnson-Laird and Byrne, 1991), and in studies of science (Chi et al., 1985; Chi, 1992; Nersessian, 1992; Giere, 1988; Thagard, 1988, 1992). Since schematic representations and their manipulations are alleged to be non-logical, some people have been tempted to think that formal theories developed by logicians do not have much to do with reasoning, and that the use of schematic representations is a way to get around the problem of the intractability of logical reasoning. In his article "The Logic of Frames", Hayes is able to show that the frame language can be translated into predicate logic (Hayes, 1979b; see also Nilsson, 1980, Ch. 9).
Thagard thinks that such a translation shows only that predicate logic and schematic representation are expressively equivalent, and that there are procedures or schemata which would be very difficult to implement in a propositional system; he maintains that the two representations are not procedurally equivalent (Thagard, 1988: 30-1). Thagard's point is correct, but well known: the efficiency of an algorithm often depends on the representation chosen. However, Hayes' translation can serve as a post hoc specification of the logical tasks undertaken by schematic representations, and it also allows us to determine whether tractability is gained at the expense of expressiveness and range of applicability, as in the case of STRIPS. Indeed, based on such a translation, Levesque and Brachman (1985) successfully show that various schematic representations can work satisfactorily only within limited subdomains.
Since the tradeoff between tractability and expressiveness/range of applicability is unavoidable, one may wonder how the problem of intractability can be overcome at all. Here, I believe we can learn from what computer scientists do when confronted with the problem of intractability. Their usual strategy is to focus on tractable subproblems (of an intractable problem) that are of practical significance and develop fast algorithms to solve them. Such a strategy may have been implanted into the brain by evolutionary pressure and natural selection. That is to say, the brain may have developed solutions to those tractable subproblems that are crucial for human survival. Thus, gaining tractability at the expense of limiting a mechanism's range of applicability is not necessarily a bad thing, provided that the mechanism solves some interesting subproblems (cases) of an intractable problem. In the case of reasoning, it is
Levels of Explanation
very likely that the human mind has developed algorithms to solve tractable reasoning tasks that loom large in daily life. This hypothesis is confirmed by a body of recent work which has identified different types of useful and tractable inference in text comprehension, in natural language processing, and in knowledge base systems, and there is evidence that the human mind is equipped with mechanisms to deal with these classes of inferences (Levesque, 1986, 1988; Stenning and Oaksford, 1993; Crawford and Kuipers, 1991; Givan, McAllester and Shalaby, 1991; Shastri and Ajjanagadde, 1990).
FROM COMPLEXITY ANALYSIS TO ECOLOGY
Task analysis and complexity analysis only allow us to specify the problem domain of a cognitive mechanism and the complexity of the task performed. They enable us to know how the tractability of a cognitive mechanism is gained by limiting its range of applicability to a certain domain. But these analyses are silent on the practical significance of the domain in question. In order to know this significance, one needs to analyze the environment in which the mechanism is supposed to work and examine whether the mechanism is tackling tasks that the environment requires it to solve. If that is the case, the mechanism is not solving problems in a non-realistic or toy domain.
Knowing the situation in which a problem has to be solved looms large in Marr's work on vision. Marr (1982) criticizes earlier vision research on the grounds that the algorithms proposed, even though they are fairly tractable, work only in a toy domain. As in the case of STRIPS, one can render the computation of a vision task manageable by making assumptions about the situation in which the computation works, while the assumptions themselves are not true in normal situations. This is why Marr accuses earlier vision research of not computing the right function, that is, the one the visual system is required to compute in the normal environment.
Given the complexity of visual tasks, humans are not equipped with the capacity to cope with all possible visual problems in all possible conditions. Indeed, as many experiments have shown, illusions and errors are likely to occur in some situations because of the limited capacity of the visual system. However, the visual system has been tailored by natural selection to solve frequently recurring problems in usual environments. It is therefore plausible to assume that tractability is gained at the expense of having a visual system with a limited capacity, which enables it to deal with tractable cases in usual environments. In his Vision, Marr tries to show that the complexity of various vision tasks is reduced if the operations of the visual system embody certain environmental assumptions about the visual world. Marr calls these assumptions physical constraints [7], and he believes that these constraints allow the visual system to do what it does (1982: 23, 68, 103). [Footnote 7: The term "constraint" has a further meaning in Marr's work. It also means constraints on the computational process. Such constraints are usually derived from physical constraints on objects.] Marr believes that the description of physical constraints plays an important role at the computational level, for an explanation at such a level not only specifies what is to be computed, but also why. To answer both questions, one not only specifies the input-output function to be computed, but also the physical constraints that make the computation possible (Marr 1982: 22-23). The latter step
indicates that the purpose of doing the computation is not to solve a problem in a nonrealistic world, but to solve a problem in normal visual worlds where the constraints are in general true.
Marr's view can be illustrated by his ideas on stereopsis. For any n points in an image and the corresponding n points in the other image, there are n^2 possible pairwise matchings. Even if we assume that each point in an image is caused by a single dot (a small region of light intensity) in the scene, there are n! possible combinations of dots that will give rise to the same sets of n points in each image. Consequently, the search space for the best match is at least exponential in size, for n! grows faster than 2^n. It is amazing that the visual system can often settle very quickly on a single combination of n dots. Marr believes that it is possible because of three physical constraints: compatibility, uniqueness, and surface smoothness (Ibid.: 111-116). The first constraint is based on the assumption that an element in an image and the corresponding element in another image have similar parameters since they are caused by the same well-defined region on a surface. The second constraint is that a point on a physical surface has a unique position in space at any one time. The third constraint is that, due to the cohesiveness of matter (by this I mean that it occurs in fairly large chunks), the surfaces of objects are generally smooth. In other words, neighboring points on a surface are likely to be similar in their distances from the viewer. These physical constraints allow Marr to derive some computational constraints, such as the rule that a black dot in one image can match at most one black dot in another image, which in turn guides the search for the best match from binary images. With these computational constraints, the matching problem has a unique solution and the computational cost is greatly reduced.
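The growth rates quoted above, and the pruning effect of constraints, can be checked with a small sketch. The Python below is an illustration only and is not Marr's algorithm: the function names, the toy one-dimensional "images" (lists of dot positions), and the use of a simple disparity limit as a loose stand-in for the smoothness constraint are all assumptions made here.

```python
from itertools import permutations
from math import factorial


def growth_table(ns):
    """Return (n, n**2, 2**n, n!) rows so the growth rates can be compared."""
    return [(n, n ** 2, 2 ** n, factorial(n)) for n in ns]


def one_to_one_matchings(left, right, max_disparity):
    """Enumerate matchings in which every left dot pairs with exactly one
    right dot (uniqueness) and paired positions differ by no more than
    max_disparity (an assumed, simplified proxy for smoothness)."""
    result = []
    for perm in permutations(range(len(right))):
        if all(abs(left[i] - right[j]) <= max_disparity
               for i, j in enumerate(perm)):
            result.append(perm)
    return result


# Hypothetical dot positions in the two images.
left_image = [0, 10, 20, 30]
right_image = [1, 11, 21, 31]

for row in growth_table([4, 8, 16]):
    print(row)

unconstrained = one_to_one_matchings(left_image, right_image, 100)
constrained = one_to_one_matchings(left_image, right_image, 2)
print(len(unconstrained), len(constrained))
```

With four dots there are 4! = 24 one-to-one matchings when the disparity limit is loose, and only one when it is tight. The brute-force enumeration is itself factorial in cost, so it serves only to count candidates for very small n; the point is that constraints of this kind shrink the set of candidates that has to be considered.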
Otherwise, it would be very difficult to obtain the best match among n! possible combinations, solely based on the information derived from image disparity. The task complexity of stereopsis implies that stereo vision is possible owing to the "mercy" of the environment. Had our normal visual world been "a swarm of gnats or a snow storm" (Roth and Frisby, 1986: 159), stereo vision would not have been generally possible, and had the physical constraints not been in general true of the world, the visual system would have been solving a problem in a "toy" domain. How to avoid intractability has long been a concern of vision research, given that the visual system has limited capacity. Complexity analysis can be a useful tool for identifying manageable tasks of practical or ecological significance that could feasibly be realized by the brain or some other physical device (Tsotsos, 1988, 1990).
In his The Adaptive Character of Thought, Anderson (1990) adopts a framework very similar to Marr's three levels. He believes that instead of examining the processes going on inside the head, we can study cognition by examining what problem is to be solved in the environment by a cognitive mechanism with limited capacities. He calls this level of study the "rational level", which is the equivalent of Marr's computational level. Anderson rightly points out that Marr's work contains the unstated assumption that vision is adapted to solve problems arising from normal environments, and consequently, problems which are tractable (Ibid.: 7). Although this assumption is correct and useful, it does not guarantee that the visual system has mechanisms capable of solving all these tractable problems. Marr assumes that such mechanisms exist, when he proceeds to explain how the solutions to the problems are implemented on the
representational and algorithmic level, as well as on the hardware level. Anderson tries to make Marr's unstated assumption explicit by stating a General Principle of Rationality:
The cognitive system operates at all times to optimize the adaptation of the behavior of the organism. (Ibid.: 28)
Anderson believes that with such a principle, we can predict the behavior of a cognitive system solely by understanding the problem that the system has to solve in order to attain its goal in the given environment. Such predictions do not require knowledge of how the solution is implemented in the brain, which leads Anderson to draw the astonishing conclusion that we can do much psychology without knowing the underlying process of cognition.
The general principle of rationality plays a crucial role in Anderson's argument. If the principle is not true and the cognitive system often falls short of being able to solve the problem specified at the computational level, we will not be able to predict its behavior without knowing what is going on inside the head. We will be in a position like that of trying to predict the behavior of a poorly designed computer program from its specification. Since the program often fails to do what it is supposed to do, our predictions will not be accurate. Consequently, we will be forced to go down to the implementation levels to understand the behavior of the program. Since the general principle of rationality implies that going down to the implementational levels is unnecessary, we should be able to predict the behavior of a cognitive system quite well by understanding the task that the system is supposed to perform at the computational level.
I will illustrate the above ideas by examining Anderson's rational analysis of memory. The goal of memory is to retrieve needed information. If the general principle of rationality is true, memory performance in recalling an item should mirror the probability of the item being needed in the environment.
In psychology, memory performance is measured by latency and probability of recall, i.e., the speed and accuracy of memory performance. In addition, if the general principle of rationality is true, these two measures should be monotonically related to the probability that an item is needed. Based on findings from data on library borrowings and file access, New York Times articles, and electronic mail messages, Anderson makes a few assumptions about the probability distribution of needed items in our normal environment (Anderson, 1991). Items satisfying these assumptions are proved to have the following properties. First, the more recently an item was used, the more likely it is to be needed. Second, the greater the number of times an item was used, the more likely it is to be needed. Third, if the uses of an item are clustered together within a short period, the probability that it is needed is lower. Experimental results in memory research show that memory performance for an item correlates strongly with these three properties (see Anderson, 1991). The recency effect is evidence that memory performance decreases with the delay between experience and test; the frequency effect is evidence that memory performance increases with the number of times some item is used; and the spacing effect is evidence that items clustered together within a shorter time are less likely to be remembered. These results support the claim that the memory system is tailored to solve information retrieval problems arising from the environment in which
the above four assumptions are true. Based on this claim, we can predict experimental data, such as the recency effect, without examining the underlying process of human memory.
The work of Marr and Anderson shows that human cognition is well adapted to solving problems in our normal environment. Cognitive tasks are intractable only if a system aims at performing them in all possible situations; they are tractable if a system only aims at handling cases in normal situations. A cognitive system can overcome the problem of intractability by making assumptions that are in general true of the normal environment. This undermines the skeptical challenge to human rationality which stems from the claim that many cognitive problems are intractable. Finally, by identifying the problems that a cognitive system is tailored to solve in the environment, we are able to predict rather accurately the behavior of a cognitive system without knowing much about its underlying mechanism, given Anderson's General Principle of Rationality.
HOW IS COGNITIVE TECHNOLOGY POSSIBLE? THE MANIFESTO OF A RESEARCH STRATEGY
Finally, we should note that the skeptical challenge to human rationality that I have discussed, inasmuch as it stems from the claim that cognitive problems are in general intractable, can readily be transformed into a threat to the possibility of cognitive technology. If the challenge were to be upheld, cognitive technology would be doomed to failure because no technological means, even the fastest machine in the universe, could enable us to solve intractable problems. However, the way in which human cognition deals with the problem of intractability in the normal environment provides important insights that help us cope with problems of the same kind as we encounter in constructing artifacts that assist human beings in solving cognitive tasks.
These insights can be acquired by studying human cognition because it provides the best example of how a mechanism can cope with the environment. Now, based on the survey of human cognition conducted in this paper and the strategy of distinguishing different levels of explanation, I would like to propose the following research strategy for constructing useful artifacts in cognitive technology.
1. We should give up the ambition of constructing a so-called "General Problem Solver", because intractability makes the goal of solving cognitive problems in all possible situations unattainable even by the fastest machine in the universe. As we have seen, many seemingly simple tasks, such as consistency tests or stereopsis, are intractable, and we can only deal with the tractable cases that our survival requires us to solve in the normal environment. The project of constructing intelligent devices that are capable of solving all (or indefinitely many) kinds of problem in any possible situation is doomed to failure.
2. We should tackle the (sub)problems arising from the normal environment instead of trying to solve them in all possible situations. Partial (nonuniversal) solutions are more likely to be tractable and solvable by the constructs that we can build. As we have seen in the discussion of the fundamental trade-off in knowledge representation, limiting a mechanism's range of applicability is perfectly acceptable if tractability can be gained and the range is of ecological significance. The evolutionary process has already
tuned the mind to deal with tractable problems that are crucial for human survival, and it is "fortunate" that the survival of the human species only requires solving problems in the normal environment that are in fact tractable; otherwise, human beings would have become extinct. Similarly, cognitive technologists should tune their artifacts to solving problems arising from a specific environment. Of course this does not guarantee that such a strategy can always generate fruitful results, because the problems that need to be solved in a specific environment may still be intractable. However, if that is the case, no other strategy will be effective either.
3. An ecological study of the environmental structure in which a problem needs to be solved can help us show that the problem is tractable. By making appropriate environmental assumptions that are in general true of the environment, we may make the engineering problem that we need to solve more manageable. In the discussion of the work of Marr and Anderson, we have seen how the mind overcomes the problem of intractability by making assumptions that are in general true of the environment. Cognitive technologists may use similar tricks to overcome the problem of intractability when designing artifacts to help humans solve problems that they need to confront in the normal environment.
4. It is not a good strategy to speculate on what's going on inside the head (a black box?) and then try to replicate these processes in a mechanism or a device, before we have properly identified the problem to be solved, because there is often the risk that the mechanism is not computing the right function and the whole effort will be wasted. As Marr and Hayes have pointed out, many working programs were produced without knowing what exactly is being computed, and many such programs were later found not to solve any interesting problem at all. Cognitive technologists should learn the moral of this story and not repeat similar errors.
They should follow Marr's good advice that the problem to be solved has to be identified first at the computational level before we proceed to lower level implementations.
5. However, if for whatever reason (e.g. an application for research funding requiring a demo) a mechanism needs to be built before the problem to be solved is (fully) recognized, we must make sure that a post hoc task analysis will be carried out to demonstrate that the mechanism is solving some interesting problems, not just problems in a toy domain. No cognitive technologist should claim full generality for a product without demonstrating that the mechanism is solving problems of ecological significance. We have seen examples in AI and vision research that would not survive such a test, and cognitive technologists should not repeat history by creating more of the same.
6. Ecological analysis enables us to know the environmental structures in which human cognition works well. Such knowledge may enable us to help humans solve cognitive problems not by building cognitive mechanisms but by making the structural features of the environment more salient. For example, human motion is guided by the structure of the optical flow, so making the structure salient may help direct people's motion. In England, yellow lines have
been painted across the road in some roundabouts of the A1 motorway to make drivers more sensitive to their speed (Bruce and Green, 1990: 279).
CONCLUSION
We have seen that the identification of different levels of explanation provides a powerful framework for understanding cognition and instructive strategies for pursuing research. Explanations at the computational level, as this paper has shown, are of primary importance. Such explanations help us understand the fundamental trade-off between tractability and expressiveness/range of applicability in knowledge representation, and by incorporating ecological considerations at the computational level, we are able to answer a skeptical challenge to human rationality (and the possibility of cognitive technology). Finally, we have seen that the strategy of identifying different levels of explanation and the examination of how the human mind overcomes the problem of intractability provide valuable insights for conducting fruitful research in cognitive technology.
REFERENCES
Anderson, John R., 1990. The Adaptive Character of Thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, John R., 1991. Is Human Cognition Adaptive? Behavioral and Brain Sciences 14: 471-485.
Boden, Margaret A., 1989. Artificial Intelligence in Psychology: Interdisciplinary Essays. Cambridge, MA: MIT Press.
Brachman, Ronald J., and Hector J. Levesque, 1985. Readings in Knowledge Representation. Los Altos, CA: Morgan Kaufmann.
Cherniak, Christopher, 1984. Computational Complexity and the Universal Acceptance of Logic. Reprinted in Hilary Kornblith, Naturalizing Epistemology, 1993, 239-261. 2nd ed. Cambridge, MA: MIT Press.
Cherniak, Christopher, 1986. Minimal Rationality. Cambridge, MA: MIT Press.
Chi, Michelene T. H., 1992. Conceptual Change within and across Ontological Categories: Examples from Learning and Discovery in Science. In: Ronald N. Giere, ed., Cognitive Models of Science, 129-186. Minneapolis, MN: University of Minnesota Press.
Chi, Michelene T. H., P. J. Feltovich and R. Glaser, 1985. Categorization and Representation of Physics Problems by Experts and Novices. Cognitive Science 5: 121-152.
Cook, S. A., 1971. The Complexity of Theorem-Proving Procedures. Proceedings of the 3rd Annual ACM Symposium on the Theory of Computing, 151-158. New York: ACM.
Crawford, J. M., and B. J. Kuipers, 1991. Negation and Proof by Contradiction in Access-Limited Logic. Proceedings of the Ninth National Conference on Artificial Intelligence, Vol. 2: 897-903. AAAI Press/MIT Press.
Dennett, Daniel C., 1987. Reflections: Instrumentalism Reconsidered. In: D. C. Dennett, ed., The Intentional Stance, 69-82. Cambridge, MA: MIT Press.
Evans, Jonathan St. B. T., 1993. Bias and Rationality. In: K. I. Manktelow and D. E. Over, eds., Rationality: Psychological and Philosophical Perspectives, 6-30. London: Routledge.
Fikes, R. E., and Nils J. Nilsson, 1971. STRIPS, A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2: 189-208.
Galton, Anthony, 1993. On the Notions of Specification and Implementation. In: Christopher Hookway and Donald Peterson, eds., Philosophy and Cognitive Science, 111-136. Cambridge: Cambridge University Press.
Georgeff, Michael P., 1988. Reasoning about Plans and Actions. In: Howard E. Shrobe and AAAI, eds., Exploring Artificial Intelligence: Survey Talks from the National Conferences on Artificial Intelligence, 173-196. San Mateo, CA: Morgan Kaufmann.
Giere, Ronald N., 1988. Explaining Science: A Cognitive Approach. Chicago: University of Chicago Press.
Giere, Ronald N., ed., 1992. Cognitive Models of Science. Minneapolis, MN: University of Minnesota Press.
Givan, R., D. McAllester and S. Shalaby, 1991. Natural Language Based Inference Procedures Applied to Schubert's Steamroller. Proceedings of the Ninth National Conference on Artificial Intelligence, Vol. 2: 173-196. AAAI Press/MIT Press.
Harel, David, 1992. Algorithmics: The Spirit of Computing, 2nd ed. Reading, MA: Addison-Wesley.
Hayes, Patrick J., 1979a. The Naive Physics Manifesto. Reprinted in Margaret Boden, 1989. Artificial Intelligence in Psychology: Interdisciplinary Essays, 171-205. Cambridge, MA: MIT Press.
Hayes, Patrick J., 1979b. The Logic of Frames. Reprinted in Ronald J. Brachman and Hector J. Levesque, 1985. Readings in Knowledge Representation, 288-295. Los Altos, CA: Morgan Kaufmann.
Hayes, Patrick J., 1985. The Second Naive Physics Manifesto. Reprinted in Ronald J. Brachman and Hector J. Levesque, 1985. Readings in Knowledge Representation, 468-485. Los Altos, CA: Morgan Kaufmann.
Holyoak, Keith J., and B. A. Spellman, 1993. Thinking. Annual Review of Psychology 44: 265-315.
Hookway, Christopher, and Donald Peterson, eds., 1993. Philosophy and Cognitive Science. Cambridge: Cambridge University Press.
Johnson-Laird, Philip N., 1983. Mental Models. Cambridge: Cambridge University Press.
Johnson-Laird, Philip N., and R. M. Byrne, 1991. Deduction. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kornblith, Hilary, 1993. Inductive Inference and Its Natural Ground: An Essay in Naturalistic Epistemology. Cambridge, MA: MIT Press.
Levesque, Hector J., 1986. Making Believers out of Computers. Artificial Intelligence 30: 81-108.
Levesque, Hector J., 1988. Logic and the Complexity of Reasoning. Journal of Philosophical Logic 17: 355-389.
Levesque, Hector J., and Ronald J. Brachman, 1985. A Fundamental Tradeoff in Knowledge Representation and Reasoning. Reprinted in Ronald J. Brachman and Hector J. Levesque, 1985. Readings in Knowledge Representation, 42-70. Los Altos, CA: Morgan Kaufmann.
Lifschitz, V., 1987. On the Semantics of STRIPS. In: Michael P. Georgeff and A. L. Lansky, eds., Reasoning about Actions & Plans: Proceedings of the 1986 Workshop. Los Altos, CA: Morgan Kaufmann.
Manktelow, K. I., and D. E. Over, eds., 1993. Rationality: Psychological and Philosophical Perspectives. London: Routledge.
Marr, David, 1977. Artificial Intelligence: A Personal View. Reprinted in Margaret Boden, 1989. Artificial Intelligence in Psychology: Interdisciplinary Essays, 133-146. Cambridge, MA: MIT Press.
Marr, David, 1982. Vision. San Francisco: Freeman.
Minsky, Marvin, 1975. A Framework for Representing Knowledge. Reprinted in John Haugeland, ed., Mind Design, 95-128. Cambridge, MA: MIT Press.
Nersessian, Nancy J., 1992. How Do Scientists Think? Capturing the Dynamics of Conceptual Change in Science. In: Ronald N. Giere, ed., Cognitive Models of Science, 3-44. Minneapolis, MN: University of Minnesota Press.
Nilsson, Nils J., 1980. Principles of Artificial Intelligence. Palo Alto, CA: Tioga Publishing Co.
Oaksford, M., and N. Chater, 1991. Against Logicist Cognitive Science. Mind and Language 6: 1-38.
Oaksford, M., and N. Chater, 1993. Reasoning Theories and Bounded Rationality. In: K. I. Manktelow and D. E. Over, eds., Rationality: Psychological and Philosophical Perspectives, 31-60. London: Routledge.
Partridge, D., 1986. Artificial Intelligence: Applications in the Future of Software Engineering. Chichester: Ellis Horwood.
Pollock, John L., 1990. Nomic Probability and the Foundations of Induction. New York: Oxford University Press.
Roth, Ilona, and John Frisby, 1986. Perception and Representation: A Cognitive Approach. Milton Keynes, England: Open University Press.
Schank, Roger, 1975. Conceptual Information Processing. Amsterdam: North-Holland.
Schank, Roger, and Robert Abelson, 1977. Scripts, Plans, Goals and Understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
Shastri, L., and V. Ajjanagadde, 1990. An Optimally Efficient Limited Inference System. Proceedings of the Eighth National Conference on Artificial Intelligence, Vol. 1, 563-570. AAAI Press/MIT Press.
Simon, Herbert A., 1983. Search and Reasoning in Problem Solving. Artificial Intelligence 21: 7-29.
Stenning, K., and M. Oaksford, 1993. Rational Reasoning and Human Implementations of Logic. In: K. I. Manktelow and D. E. Over, eds., Rationality: Psychological and Philosophical Perspectives, 136-176. London: Routledge.
Swartout, W., and R. Balzer, 1983. On the Inevitable Intertwining of Specification and Implementation. Communications of the ACM 25(7): 438-440.
Thagard, Paul, 1988. Computational Philosophy of Science. Cambridge, MA: MIT Press.
Thagard, Paul, 1992. Conceptual Revolutions. Princeton: Princeton University Press.
Tsotsos, J., 1988. A Complexity Level Analysis of Immediate Vision. International Journal of Computer Vision 4: 303-320.
Tsotsos, J., 1990. Analyzing Vision at the Complexity Level. Behavioral and Brain Sciences 13: 423-445.
Chapter 7
AGENTS & CREATIVITY
Margaret A. Boden
School of Cognitive and Computing Sciences
University of Sussex, UK
When Alice had finished reciting "You are old, Father William", at the Caterpillar's request, the following exchange ensued:
"That is not said right," said the Caterpillar.
"Not quite right, I'm afraid," said Alice, timidly: "Some of the words have got altered."
"It is wrong from beginning to end," said the Caterpillar decidedly.
After that, Lewis Carroll tells us, there was silence for some minutes. The silence was hardly surprising. How can you compare two things - still less, judge the closeness of their relationship - if they are different (or "wrong") in every respect? Without a series of intermediate structures, the one cannot be understood in terms of the other. Even if they do share some similarities, these have to be noticed and they have to be recognized as significant. Had Alice pointed out that her poem, presumably like the one the Caterpillar had in mind, contained several instances of the word "the", the creature would not have been persuaded. Where poems are concerned, metres, rhyming-patterns, and many individual words are significant, but "the"-counts are not. To argue with the Caterpillar, Alice would have had to identify the important features of her poem, and of his, before being able to compare them.
What has this got to do with creativity, and with agents? Well, creativity involves coming up with something novel, something different. And this new idea, in order to be interesting, must be intelligible. No matter how different it is, we must be able to understand it in terms of what we knew before. As for agents, their potential uses include helping us by suggesting, identifying, and even evaluating differences between familiar ideas and novel ones. (You'll be relieved, perhaps, if I don't attempt to define just what an agent is - and isn't. Instead, I'll rely on the intuitive notion that an agent is a part of a program which can act, and/or be asked to act, in relative independence of others.)
M.A. Boden
No-one would choose the Caterpillar as an assistant: he was both grumpy and unhelpful. What Alice wanted to know was just where she had "gone wrong", just where her recital differed from what she had learnt before. But the Caterpillar was in no mood to tell her. A computerized agent cannot be grumpy. Whether it can be specifically helpful, to someone trying to come up with (or to assess) creative ideas, is what we must consider.
How is creativity possible? How can a person, or a computer for that matter, generate novel ideas? Many scientifically-minded individuals would argue that there is nothing especially problematic about creativity. A creative idea, they would say, is merely a novel (and valuable) combination of familiar ideas. Accordingly, creativity could be explained by a scientific theory showing how such novel combinations can come about.
Up to a point, they're right. Samuel Taylor Coleridge's questions about "the hooks and eyes of memory", for instance, could be answered by psychological theories describing the associative processes in the poet's mind (or brain). Indeed, current computer models of neural nets provide some preliminary ideas about just how such associations could happen. Using those ideas, we can lay computational foundations for the Road to Xanadu described in a fascinating literary study of the sources of Coleridge's poetic imagery (Boden, 1991, ch. 6; Livingston Lowes, 1951). Moreover, we can see, in outline, how such theories might lead to helpful agents of diverse kinds. For example, the units (on various levels) within a computer model of a rich semantic network could communicate not only with each other but also with a human user. A writer might use the set of intercommunicating agents as an intelligent thesaurus, or even a computerized literary critic. For agents of this type might aid someone stuck for a new image, or someone attempting to assess (or interpret) one suggested by another writer.
And they might help to show whether (and how) a series of images fits together, or to diagnose the mixture in a mixed metaphor. An agent within a semantic network could not only effect an association but also trace the associative pathways involved, which in itself might prompt the user to new insights. If such agents were unable always to tell the difference between an interesting image and an inappropriate one, they would be little the worse for that. Human brainstorming sessions, too, produce a lot of rubbish. Nuggets can be found within the rubbish, even though they may require further polishing. Analogy, too, is the novel combination of familiar ideas. In analogy, the structural similarity between the two ideas is especially important. Moreover, the analogy (unlike many associations of ideas) may be systematically developed, for purposes of rhetoric or problem solving. Several current computer models suggest how two ideas can be seen as analogous, being matched in various ways according to the context of thought (Boden, 1991, preface & ch. 7). Some are relatively rigid, in the sense that analogies are sought between things having pre-given descriptions, which descriptions are not altered if a new analogy is found (Falkenhainer, Forbus, & Gentner, 1990; Holyoak & Thagard, 1989). We can think of these programs in agentive terms if we focus separately on the various criteria they use in seeking analogies (semantic, structural, or pragmatic). Interactive versions might be helpful to human writers or problem solvers wanting to find, assess, or compare analogies. (As in the discussion of associative agents, above, this presupposes the availability of a rich data-base of potentially relevant concepts.)
Agents and Creativity
One analogy-model, in particular, is readily seen as a community of agents. The Copycat system (Hofstadter et al., in press; Mitchell, 1993) uses many independent descriptors in trying to interpret a given analogy and to find a new (but similar) one. These descriptors, or "codelets", are applied in parallel, competing with one another to find the strongest analogy. The domain this program works in is very simple: alphabetic letter-strings. But there are hidden complexities even in this simple domain. For instance, one and the same structure can be described differently on different occasions, according not only to probabilistic variations but to the specific context surrounding it. Thus the mini-string mm will be described as a letter-repetition if it occurs within the larger string aaffmmppzz, but as two separate letters (identified respectively as the last and the first member of a successor-triplet) if it occurs in the string abcfghklmmno. A description can be used for a while and then discarded, as Copycat finds a more integrated, high-level, analogy involving other descriptions. This program has come up with many unexpected alphabetic analogies, some of which are highly persuasive. For example, it was told that abc is analogous to (can be changed into) abd, and was asked to find a matching analogy for xyz. Among the many answers it offered were not only xyd, xyzz, and xyy, but also the surprising and elegant wyz. This involves, among other things, mapping a onto z and left onto right, and also swapping successor and predecessor. In embryo, then, we have a set of agents which can say not only "Think of it this way" and "Think of it that way," but also "Better still, think of it like this". If these techniques were made available in an interactive system, they might help human users to see analogies in some unexpected places. 
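The point about rival descriptions can be made concrete with a toy sketch. It is emphatically not Copycat: there are no parallel, competing codelets and no probabilistic choice, only three hand-written readings of the change from abc to abd, each applied to the target string. The function and rule names are assumptions introduced for illustration.

```python
from string import ascii_lowercase as ALPHABET


def successor(letter):
    """Next letter of the alphabet, or None for z (which has no successor)."""
    i = ALPHABET.index(letter)
    return ALPHABET[i + 1] if i + 1 < len(ALPHABET) else None


def predecessor(letter):
    """Previous letter of the alphabet, or None for a."""
    i = ALPHABET.index(letter)
    return ALPHABET[i - 1] if i > 0 else None


def rule_literal(s):
    """Reading 1: 'replace the rightmost letter by d'."""
    return s[:-1] + "d"


def rule_successor(s):
    """Reading 2: 'replace the rightmost letter by its successor'."""
    nxt = successor(s[-1])
    return None if nxt is None else s[:-1] + nxt


def rule_mirrored(s):
    """Reading 3: swap left/right and successor/predecessor, so the
    leftmost letter is replaced by its predecessor."""
    prev = predecessor(s[0])
    return None if prev is None else prev + s[1:]


# Hypothetical names; each rule stands in for one rival description.
RULES = {
    "literal": rule_literal,
    "successor": rule_successor,
    "mirrored": rule_mirrored,
}


def propose(target):
    """Apply every rival reading to the target (None means the reading fails)."""
    return {name: rule(target) for name, rule in RULES.items()}


print(propose("xyz"))
```

For xyz the literal reading gives xyd, the successor reading fails because z has no successor, and the mirrored reading gives wyz. A system of the kind described in the text would have to discover such readings for itself and weigh them against one another; here they are simply listed side by side.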
Combinational creativity, then, can be thought of in agentive terms - and might be aided by computer systems made up of many largely independent agents. But is that enough? Can we explain all creativity by reference to novel combinations? If not, could we explain other cases in other terms - and would these also be suitable for computer implementation? Creative ideas include scientific theories; musical compositions; literary genres; instances of choreography, painting, and architecture; theorems of mathematics; and the inventions of engineers. Some of these can be understood as mere novel combinations of familiar ideas. But many cannot - especially those which not only solve the creator's initial problem but also engender a whole new set of problems, to be solved perhaps by the creator's successors over many years. Exploring the implications of a radical new scientific theory, or of a new visual or musical genre, is not a matter of mere combination-juggling. On the contrary, it is a structured, disciplined, sometimes even systematic search for the meanings promised by the new idea. But how can this be? How can a new idea be pregnant with such promise? Imagine someone trekking through a desert and up a barren mountainside - only to see, from the crest of the hill, a verdant valley stretched out before him. The promise, the possibilities, are enormous. But to find them, he will have to explore the valley - sketchily at first, perhaps, but later seeking treasures in many a nook and cranny. Creative thinkers (which means all of us, on a good day) explore the possibilities inherent in their own minds, wherein the spaces are not geographical but conceptual. A conceptual space is a style of thinking, a mental skill that may be expressed in marble, music, or movement, in poetry, prose, or proof (Boden, 1991, esp. ch. 4). It is defined by a set of constraints (the dimensions of the space) guiding the generation of
ideas in the relevant domain. Some of these constraints are accepted, by the thinker and by the relevant social group (the Caterpillar?), as being more inescapable than others. And some are more fundamental than others. Together, they form a mental landscape with a characteristic structure and potential. Conceptual spaces are analogous to geographical ones in several ways: they can be mapped, explored, and superficially altered, with many valuable results. In one way, however, they are very different. Unlike physical terrain, a conceptual space can be fundamentally transformed. The result of such a transformation is the appearance of a new space of possibilities, a mental terrain which simply did not exist before. It does not follow that creativity involves only transformations, although many of the most exciting examples do. Many creative achievements result from exploring conceptual spaces in systematic and imaginative ways. For exploring and transforming our thinking styles - and for understanding and appreciating the results - we need good "maps" of the relevant space. Intuitive maps exist within our heads, mostly inaccessible to consciousness. In more explicit form, they can be found (though usually only in outline) in the humanities: in literary criticism and musicology, in the philosophy of science and aesthetics, and in the history of art, science, and mathematics. Think of the disciplined beauty of a Palladian villa, for instance, with its symmetrical plan and elegant facade. Or consider the clean lines and interconnecting volumes of a Frank Lloyd Wright open-plan "Prairie House". Think of the conventions of New Orleans jazz, and how they differ from Dizzy Gillespie. And remember how organic chemistry was changed by Kekulé's discovery of the benzene ring, which engendered not just one molecular structure but a vast space of structural potential (aromatic chemistry), whose contents, pathways, and boundaries have now been largely mapped.
Even the Caterpillar might be able to sense the similarity between one Palladian villa and another, or between the various Prairie Houses. And a twentieth-century Caterpillar might be able to recognize New Orleans jazz, and distinguish it from later varieties. But Alice's invertebrate friend would not be able to specify the relevant similarities. A well-educated Caterpillar could enumerate the dimensions of the space of benzene derivatives, for these have been made explicit by theoretical chemists. With respect to architecture and jazz, however, things are much less clear - to humans, as to caterpillars. Generations of architectural historians have disagreed on just what principles of design underlie Palladian design. And an expert on Lloyd Wright's work pronounced the (intuitively evident) architectural balance of his Prairie Houses to be "occult" (cited in Koning & Eizenberg, 1981: 322). However, the crucial stylistic similarities concerned have been explicitly identified within a computer program (Hersey & Freedman, 1992) and a computationally inspired "space-grammar" (Koning & Eizenberg, 1981), respectively. Each of these formal systems describes the relevant conceptual space, making it possible to say just why two structures share (or do not share) the same style, and just how (and how fundamentally) they differ. In addition, each of these systems can generate an indefinite number of structures lying within the relevant conceptual space. Some of these match buildings already designed by Palladio or Lloyd Wright. Others are new, depicting houses of the same general type.
For instance, the plan of a Palladian villa is designed by starting with a rectangle (certain proportions being preferred), and generating internal rectangles - the rooms - by making vertical and horizontal "splits" of various kinds. (Palladio himself described this method.) Not any splits will do, if the resulting design is to be one which Palladio would have approved. Splits are unacceptable if they produce internal corridors; long, thin rooms; too many rooms; rooms of greatly disparate size; many internal (windowless) rooms; and the largest room's lying off the central axis. Early versions of the Palladian program made all of these mistakes, but the relevant design-constraints have now been incorporated. Further non-Palladianisms, produced in the past by human imitators (such as Lord Burlington), include rectangular bays jutting out from the rectangular perimeter. Other "mistakes" would be more debatable. For instance, Palladio almost never built cylindrical rooms. Should we say that an architect (or a program) who includes circles within the plan is faithful to Palladio's inspiration, or not? And if not, should we credit him (or it) with transforming Palladio's style into another, fundamentally similar, one? Whatever our answer, the grounds of judgement have been made explicit. So there is more chance of fruitful debate, and even of agreement. As well as designing plans, this program designs Palladian facades appropriate to a given plan. It knows (among other things) the difference between Doric, Ionic, and Corinthian columns, and the constraints governing the placement of windows and pediment. On several occasions, it has produced a plan-and-facade design virtually identical to one drawn by Palladio himself. In short, the Palladian program, and the Lloyd Wright shape-grammar too, has generated new architectural designs (though not new architectural styles).
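The split-and-check procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the Hersey and Freedman program: the Room type, the function names, and the two numerical thresholds are all hypothetical, and only two of the listed constraints (long, thin rooms and too many rooms) are checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Room:
    x: float
    y: float
    width: float
    height: float

    def aspect(self):
        """Ratio of the longer side to the shorter side (1.0 for a square)."""
        return max(self.width, self.height) / min(self.width, self.height)


def split(room, direction, fraction=0.5):
    """Divide a room into two with one vertical or horizontal wall."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if direction == "vertical":
        w = room.width * fraction
        return [Room(room.x, room.y, w, room.height),
                Room(room.x + w, room.y, room.width - w, room.height)]
    if direction == "horizontal":
        h = room.height * fraction
        return [Room(room.x, room.y, room.width, h),
                Room(room.x, room.y + h, room.width, room.height - h)]
    raise ValueError(f"unknown direction: {direction}")


# Assumed thresholds, chosen only for illustration.
MAX_ASPECT = 3.0
MAX_ROOMS = 12


def objections(rooms):
    """List the reasons, if any, for rejecting a set of rooms."""
    problems = []
    if len(rooms) > MAX_ROOMS:
        problems.append("too many rooms")
    for room in rooms:
        if room.aspect() > MAX_ASPECT:
            problems.append(f"long, thin room at ({room.x}, {room.y})")
    return problems


plan = Room(0, 0, 12, 8)
print(objections(split(plan, "vertical", 0.5)))  # two balanced rooms, no objections
print(objections(split(plan, "vertical", 0.1)))  # one room is a narrow strip
```

Because the grounds of rejection are returned as explicit statements rather than a bare yes or no, a check of this kind could in principle be argued with, which is the point made above about fruitful debate.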
Our question here, however, is not whether an entire program can do something "creative", or even useful, but whether an agent can help a person to do so. The computational work on architectural styles suggests some ways in which computer agents might help a human architect. For example, someone designing a Palladian villa, or even combining aspects of Palladianism with some other style, might find it helpful if an informed agent were to suggest - or to forbid - a split at a certain place in the plan. This would be especially useful to architects, or architectural students, with little experience of working in this genre. Left to themselves, such a person might design asymmetrical splits, overly narrow rooms, or bays spoiling the clean perimeter-line. Similarly, once the house-plan had been approved, various agents might offer advice on the facade. Some could suggest the number, and the type, of columns. Others might argue for and against an attic-storey, or for and against a particular type of pediment or architrave. The agent-initiators and critics could communicate among themselves as necessary (to check for bilateral symmetry, for instance, or for numbers of rooms). But they need not insist on universal agreement. If they are being used as assistants to human beings, then responsibility for the overall integration of the design could be left to the person. In some circumstances, however, the human could not be relied on to achieve a satisfactory design, because of either inexperience or complexity. In producing a set of computerized agents (a program) for practical use, careful judgments would have to be made about just which decisions and evaluations could be left entirely to the user, and which should be monitored, or even made, by the agents. (This is a special case of the familiar problem about the use of "expert systems" in general.)
Agents could be used to help people to map, explore, tweak, and (ultimately) transform conceptual spaces of many different kinds. It is not necessary to restrict ourselves to examples like Palladian architecture, which has long been described in formalist, mathematical, terms. Once a conceptual space has been mapped, agents could help us to move around it in some surprising ways. Take jazz, for instance. Suppose that the Caterpillar in Alice's Adventures in Wonderland had been holding not a hookah, but a saxophone. And suppose that Alice, an uncommonly precocious child, had taken some jazz-cassettes with her to Wonderland, and played them to the Caterpillar. Could the Caterpillar have been helped to learn to play music in that style by a set of computer-agents (also presciently provided by Alice)? Surely, this idea is too nonsensical even for Lewis Carroll? The spontaneous creativity of jazz-improvisation, you may feel, is simply not the sort of thing where computers could help. Well, that feeling might not survive experience of a program that can help people to improvise jazz (Hodgson, 1990, in preparation; Waugh, 1992). This program knows about various dimensions of the musical space of jazz, and various ways of travelling through it. For instance, it can produce fragments of ascending or descending scales, ensuring that the scale chosen is the one relevant to the harmony at that particular point. It can provide "call" and "reply" over two or more bars. It can replace the current melody-note by another note drawn from the same scale, or provide a chromatic run between this melody-note and the next. It can "cut and paste" a library of melodic and rhythmic patterns, or play fractionally ahead of or behind the beat. And it, and the human user, can vary the frequency with which it does any of these things. 
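Two of the moves listed above, note-substitution within a scale and the chromatic run, can be sketched as independent functions, each with its own adjustable frequency. This is a rough illustration and not the program cited: the names are hypothetical, notes are represented as MIDI numbers, and a single fixed scale is assumed, whereas the program described chooses the scale relevant to the harmony at each point.

```python
import random

# Assumed representation: MIDI note numbers for one octave of C major.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]


def substitute_note(melody, scale, rng, frequency):
    """Replace each note, with the given probability, by a note from the scale."""
    return [rng.choice(scale) if rng.random() < frequency else note
            for note in melody]


def chromatic_run(melody, rng, frequency):
    """With the given probability, fill the gap to the next note chromatically."""
    if not melody:
        return []
    out = []
    for current, following in zip(melody, melody[1:]):
        out.append(current)
        if rng.random() < frequency:
            step = 1 if following > current else -1
            # Semitones strictly between the two melody notes.
            out.extend(range(current + step, following, step))
    out.append(melody[-1])
    return out


def improvise(melody, scale, rng, substitute_freq=0.2, run_freq=0.2):
    """Apply both moves in turn; each frequency can be varied separately."""
    varied = substitute_note(melody, scale, rng, substitute_freq)
    return chromatic_run(varied, rng, run_freq)


rng = random.Random(0)          # seeded only so that runs are repeatable
theme = [60, 64, 67, 72]
print(improvise(theme, C_MAJOR, rng))
print(improvise(theme, C_MAJOR, rng, substitute_freq=0.0, run_freq=1.0))
```

Setting one frequency to zero and the other to one confines the variation to a single dimension, which is the kind of focused exploration a novice might want.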
If left to wander through the space by itself, this program will improvise - on a given melody, harmony, and rhythm - by making (random) choices on many dimensions simultaneously. Working in this fashion, it often creates novel musical ideas which professional jazz-musicians find interesting, and may wish to develop in their own playing. Alternatively, the human - or Caterpillar - user can make the program concentrate on one (or more) dimension at a time, and explore it (or them) in a very simple way. This is why it can help jazz-novices, who can focus on the aspect of jazz which is currently causing them difficulty. The separability of the various musical dimensions suggests the activity of a number of independent musical agents. An improved version of this system might include evaluative modules (agents) which could discriminate between equally legal phrases, or identify weaknesses in an improvisation (its own or the user's) and show how they can be avoided. A really knowledgeable system might even be able to teach the user how to recover from, or even make the best of, a mistake that had not been avoided. (Oliver Sacks has described a patient with Tourette's syndrome who is a jazz-drummer, able to turn his unpredictable muscular tics into the seeds of exciting jazz-improvisations.) In general, mistakes can be thought of as a sort of serendipity. If knowledgeable agents were developed to help us make the best of our mistakes (not just avoid them), they could lead to some real surprises. Mention of mistakes raises the question, inevitable in discussions of creativity, "When is a mistake not a mistake?". The answer, sometimes, is "When it is a transformation". Going beyond the familiar conceptual space - generalizing a constraint, specializing it, dropping it, negating it, adding another ... - could always be described as a mistake, relative to the original style of thinking. Indeed, it often is so
described: one of Kekulé's contemporaries dismissed his account of the benzene-ring as "a tissue of chemical fancies", and new art-forms are commonly rejected when they first appear. A transformation may be more or less fundamental: changing a string-molecule to a ring-molecule, by closing the formerly open curve, is more fundamental than switching between methyl and ethyl alcohol by making different attachments to the hydroxyl group. And several transformations may be combined: one could include both bays and cylindrical rooms in a basically Palladian design. But not everything can be transformed at once. The new conceptual space is generated from its predecessor, and must be intelligible in terms of it if it is to be accepted. If Alice's recital was literally "wrong from beginning to end," the Caterpillar wouldn't know what poem she was reciting. Heuristics for transforming conceptual spaces, including the space of heuristics, have been applied in a number of programs (Lenat, 1983). One of these, whose task is to generate new mathematical concepts from very simple bases, also has criteria for evaluating the mathematical "interest" of the results. Some of the evaluations are mutually exclusive (if the union of two sets either has or does not have some property possessed by both the original sets, that is counted, rightly, as interesting). Likewise, some heuristics are opposites: to specialize a concept, and to generalize it, for instance. There is nothing wrong with that, for a concept does not need to satisfy every possible criterion to be interesting, nor does every heuristic need to be applied simultaneously. These generative heuristics and evaluative rules can all be thought of as agents. Despite being called the "Automatic Mathematician", this program was often used interactively.
A human mathematician would guide it into certain pathways, for example by giving some concept a name - which the program would interpret as a hint to give that concept priority for a while. Versions of this program might be developed for other domains, in which the knowledge and judgment of human users could aid, and be aided by, the application of the transformational and evaluative heuristics. Indeed, its heuristic-altering successor has been applied to several different problem-areas, and has generated at least one patentable idea (in US law, an idea "not obvious to one skilled in the art"). The most surprising - though not necessarily the best - transformations would be able to change the conceptual space in unpredictable ways and at unpredictable levels. The clearest example at present is given by genetic algorithms. A crossover operator, for instance, might be thought of as an independent agent which transforms, at random, certain types of constraint represented in the target-code. The targeted constraints may be more or less restrictive. In one graphics program, for example, the crossovers and mutations can get right into the heart of the image-generating code (Sims, 1991). The results are always "viable", in the sense that the newly-transformed code will generate some visible image or other. But the process is utterly undisciplined. Although it could be used to help graphic designers come up with images they would never have thought of themselves, it cannot be used to explore or refine an image-space in a systematic way. That is possible, however, if the mutating agents are allowed to alter only the superficial parameters of the code. Significantly, these less powerful agents are preferred by a professional artist working on "computer sculpture", who uses them to explore specific classes of 3D-forms (Todd & Latham, 1992).
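The gentler kind of evolution described here, in which only superficial parameters are altered and a person chooses the parents, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of either system cited: a "genome" is taken to be a plain list of numbers (at least two of them), the function names are hypothetical, and the choose_parents callback stands in for the human user.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover: a prefix of one parent joined to a suffix of the other.

    Assumes both parents have the same length, of at least two parameters.
    """
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(genome, rng, rate=0.1, scale=0.2):
    """Nudge each parameter, with probability `rate`, by a small random amount."""
    return [g + rng.gauss(0, scale) if rng.random() < rate else g
            for g in genome]


def next_generation(parent_a, parent_b, size, rng):
    """Breed `size` offspring from the two chosen parents."""
    return [mutate(crossover(parent_a, parent_b, rng), rng)
            for _ in range(size)]


def breed(population, choose_parents, generations, rng):
    """Repeatedly ask the chooser (here standing in for a person) for two parents."""
    for _ in range(generations):
        parent_a, parent_b = choose_parents(population)
        population = next_generation(parent_a, parent_b, len(population), rng)
    return population


rng = random.Random(1)          # seeded only so that runs are repeatable
start = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(6)]
# A real system would display the images and wait for the user's choice;
# this stand-in simply picks the first two candidates each time.
final = breed(start, lambda pop: (pop[0], pop[1]), generations=3, rng=rng)
print(len(final), len(final[0]))
```

Since the operators touch only the values of a fixed set of parameters, every offspring stays within the same family of forms. Operators that rewrote the generating code itself would not offer that guarantee, which is the contrast the text draws.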
In both these cases, the evaluation is done by the human user. At each generation, he chooses the image or image-pair to be used in "breeding" the next generation. In principle, evaluation could be made automatic (in whole or in part). And it might be useful for the artist to be able to avoid having to consider certain sorts of image, or to be presented up-front with the most "promising" ones. But if one's interest here is in the development of agent-systems for interactive use, evaluation should not be entirely handed over to the computer. I've concentrated on the practical question of whether agent-systems might help to further human creativity. But my discussion can be seen also as an outline of how creativity might be scientifically understood. Many different psychological processes are involved, ranging across combinational and exploratory-transformational thinking. And many questions remain unanswered, or unasked. Despite all the unclarities, however, we are beginning to understand the computational resources that underlie creativity in its various forms. Creativity at Lewis Carroll's level seems magical, but there is no reason to think that it is magic. Wonderland (and the world behind the Looking-Glass, too) owes many of its surprising features to tweakings and transformations of conceptual spaces familiar even to a child. Other Wonderland surprises are grounded in serendipitous associations within the author's mind. We shall never know what all of these were (still less could they have been predicted beforehand). Who can say where the hookah came from? But the Caterpillar and his mushroom may have owed their existence to a real caterpillar and a real mushroom, falling under Carroll's eye on that golden summer afternoon.

REFERENCES
Boden, M. A., 1991. The Creative Mind: Myths and Mechanisms. (Expanded edition.) New York: Basic Books; London: Abacus.
Falkenhainer, B., K. D. Forbus, and D. Gentner, 1990. The Structure-Mapping Engine: Algorithm and Examples. Artificial Intelligence 41: 1-63.
Hersey, G., and R. Freedman, 1992. Possible Palladian Villas (Plus a Few Instructively Impossible Ones). Cambridge, Mass.: MIT Press.
Hodgson, P., 1990. Understanding Computing, Cognition, and Creativity. MSc thesis, University of the West of England.
Hodgson, P., in preparation. Modelling Cognition in Creative Musical Improvisation (provisional title). DPhil thesis, University of Sussex.
Hofstadter, D. R., M. Mitchell, R. French, D. Chalmers, and D. Moser, in press. Fluid Concepts and Creative Analogies. Hemel Hempstead: Harvester Wheatsheaf.
Holyoak, K. J., and P. Thagard, 1989. Analogical Mapping by Constraint Satisfaction. Cognitive Science 13: 295-356.
Koning, H., and J. Eizenberg, 1981. The Language of the Prairie: Frank Lloyd Wright's Prairie Houses. Environment and Planning B, 8: 295-323.
Lenat, D. B., 1983. The Role of Heuristics in Learning by Discovery: Three Case Studies. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, eds., Machine Learning: An Artificial Intelligence Approach. Palo Alto, Calif.: Tioga.
Livingston Lowes, J., 1951. The Road to Xanadu: A Study in the Ways of the Imagination. (2nd edition.) London: Constable.
Mitchell, M., 1993. Analogy-Making as Perception. Cambridge, Mass.: MIT Press.
Sims, K., 1991. Artificial Evolution for Computer Graphics. Computer Graphics 25(4): 319-328.
Todd, S., and W. Latham, 1992. Evolutionary Art & Computers. Academic Press.
Waugh, I., 1992. Improviser. Music Technology, September 1992, 70-73.
Chapter 8
VIRTUAL (REALITY + INTELLIGENCE)
Myron W. Krueger
Artificial Reality Corporation
Box 786
Vernon, CT 06066, USA
INTRODUCTION
When I was a graduate student at the University of Wisconsin in the late 1960s, my liberal arts background made me view computers more philosophically than my engineering colleagues did. I felt that the encounter between human and machine would be the central drama of my time and wanted to be on the front lines. Given that computers were to be a permanent part of the human condition, it always seemed that this relationship should be considered as fundamental as particle physics and explored with the same dedication. Since my work was an odyssey rather than a single result, I will describe it chronologically.

PROGRAMMING
When I started in computing, I thought that the most immediate target for cognitive technology would be the act of programming, since that is what we were doing in those days. In 1967, I wrote a student paper titled "The Information Needs of the Programmer" that I believe is still valid today. I likened programming to playing a round of golf blindfolded. Obviously, the odds that the ball would be exactly where you have planned for it to be on the 80th shot are nil. You need feedback along the way. Whereas a hardware designer has a multimeter, a logic probe, an oscilloscope, and a logic analyzer available to make the internal workings of his circuit visible to him, the software engineer is still flying blind, unless he instruments his program himself with after-the-fact print statements. Interpretive languages and symbolic debuggers help but do not begin to answer the need. To get around this problem, I wrote the beginnings of a system to teach programming. But since I believed that programming was an inherently uncivilized task, I chose to provide some of the capabilities that seemed necessary for the programmer to function in the most minimal way. I implemented an interpreter that watched your editing and checked for common errors. It provided a hundred-instruction backup capability so you could undo the instructions leading up to a crash. To make the experience of running a program more visceral, the interpreter was instrumented so any program made sounds through a synthesizer and generated dynamic visual signatures. While these cues were initially intended to allow you to
M. W. Krueger
experience your program, they provided useful diagnostic information for debugging programs. It has always seemed that each concession to the human user has been made grudgingly. The devices dedicated to receiving human input typically cost no more than a hundred dollars. Since the sole job of the computer is, increasingly, to run the human interface, we can add in the cost of the computer itself. In this case, the average knowledge-worker is capitalized at a few thousand dollars. Even users of high-end workstations are maxing out at one or two hundred thousand dollars - about the same as a truck or taxi driver. Pilots or coal miners will, typically, have many times that level of equipment provided to help them do their jobs. The reason that knowledge-workers do not enjoy a higher level of capital investment is that computers still do not provide any service that justifies it. Given that failure, it is surprising that there have not been more projects in which developers said 'let's provide more resources than are necessary'. After all, if a concert cellist spends $20K on his bow, maybe there are people who would be worth the investment if it was made. (Buxton, 1989) Rather than simply seeking to find the "best" way for human and machine to interact, I believed that there would be many human interfaces that would be used in different applications and that, indeed, the person might operate the same application by different means in different circumstances. Thus, the first step should be to explore the various ways that humans and machines might interact. I would have argued for the creation of a science, but I felt that Computer Science had been crippled by its misunderstanding of the nature of science. I also felt that the scientific method works best when there are a small number of variables that you can control systematically to determine their influence.
The human interface has too many interrelationships and too many possible manifestations for rigorous experimentation to fully explore them all in the near future, especially since all of the attendant trade-offs are changing almost daily. It seemed to me that the human interface of the late 60s would change as the computer changed. However, the other side of the equation - the human being - was more stable. I resolved to interface the computer to the human rather than the other way around. Thus, the ultimate human interface would be to the human body and the human senses. I considered a wearable version of Ivan Sutherland's head-mounted display which was mechanically linked to the ceiling, but reasoned that people would not want to wear scuba gear for either work or play. (Sutherland, 1968) Instead, I conceived of an unencumbering interface in the form of a computer-controlled responsive environment. (Krueger, 1974) It was a room that you entered and in which everything you saw or heard was a response to what you did. Its immediate goal was not to implement applications, but I was confident that the process would lead to powerful and practical results in the long run. Also, it seemed that a successful human interface would be measured by subjective criteria as much as by optimal performance or minimized costs. Indeed, it was apparent, even then, that if you used a computer that had an ugly but efficient user interface all day long, every day, you would have a lousy life. For this reason, I felt that the human interface should be judged by aesthetic criteria as much as by engineering ones. Ultimately, people would choose their interface by whether they liked it, and they would be right. (A decade later, users embraced the MAC interface before it was shown to be superior in any objective way to command line interfaces.)
Virtual (Reality + Intelligence)
To discover how people would like to interact with computers, I decided that Human Machine Interaction was best understood as an art form and that I would need to become an artist to work within it. Art has always been a cognitive technology for changing consciousness, but I believed that it was a sensible approach that would produce practical results as well. (Krueger, 1993) But unlike the traditional artist who hangs her picture on the wall and is resigned to having the viewer take it or leave it, my primary concern was with the audience whose reactions were the most important ingredient in the medium I contemplated. The first human I had to satisfy was myself. Whereas my Computer Science colleagues rejoiced in their sedentary symbol manipulation skills, I felt that my mind was part of my body and chafed at the fact that I had to immobilize one in order to use the other. The fact that computers require you to sit down to use them is obvious but never remarked upon, let alone lamented. Of course, computers did not invent that problem. Nor did television turn us into couch potatoes. The true culprit was Gutenberg. Ever since the mass application of reading and writing, intellectual work has required that we sit down to create it and to consume it. I wanted my body back. I wanted to become physically involved in my intellectual endeavors. I also decided that, if the computer was to interact with me, it would have to know more about what I was doing than the touch of my fingers on a keyboard or my manipulation of a pointing device. In fact, the more it knew about my actions, the richer my relationship with it could be. It would also have to respond to me through every sensory channel and permit me to express myself with my entire being. The ultimate goal would be an artificial reality in which the laws of cause and effect could be composed from moment-to-moment.
PSYCHIC SPACE
Starting in 1970 and 1971, I built sensory floors which could track my footsteps as I walked around a room. Thus, I was contributing to the interaction by moving physically, and the computer was responding with electronic sounds or by generating computer graphics that were rear-projected on a screen that constituted one end of the room that I had constructed within the University Art Gallery. Given unlimited resources, I would have had graphic images on all walls, the floor, and the ceiling. A variety of experiences was created in this environment. One interaction displayed a simple three-dimensional scene with vector graphics on the projection screen. As you moved around the room, your perspective on that scene changed appropriately. (Later, I experimented with impossible spaces in which objects got smaller as you approached them.) In the most elaborate experiment, walking around the sensory floor controlled the movement of a cursor around the screen. After several minutes, a second symbol appeared and everyone wondered what would happen if they moved their symbol to the new one to get acquainted. (There was nothing else to do.) When they reached it, the new symbol disappeared, and a maze appeared with them at the starting point. Since this was a university, students confronted with a maze would obediently commence walking it. Because of the experience of moving their cursor around the
M. W. Krueger
empty screen, they knew to take small steps and minced their way around the maze without violating its boundaries. After a few minutes of this, they would pause, look around furtively, and sometimes, with considerable ceremony, raise one foot in the air to cross the boundary of the maze. But, when they put their foot down, the boundary of the maze stretched elastically, so they had not succeeded in crossing it. The next time they breached the maze, their symbol fell apart. Another time, their symbol pushed the whole maze. Ultimately, there was no way to cheat. The participants' experiences were driven by a perverse curiosity to see how their intentions would be thwarted the next time. This experience had some of the same goals that a good human interface might have, or that Cognitive Technology might aspire to. It had transparency. There was no verbal or written explanation. The experience simply unfolded. It was natural and intuitive. Everyone knew how to walk and the relationships that we built upon that act were close enough to the traditional ones that only a slight adaptation was needed, and that adaptation was separately trained for, before the relationship was used for a further purpose. At one key point, desired behavior (moving to the starting point of the maze) was elicited by planting a question in the participant's mind. (What will happen if I go to the second symbol?) Changes in the experience happened only at calculated times. Reaching the second symbol was attainment of a user goal. A consequence was expected. The appearance of the maze was its fulfillment. The maze experience was an example of a new interactive medium in which feedback relationships are composed around the participants' expectations about the consequences of their behavior.
METAPLAY
In another early exhibit, I noted that Art has always been concerned with the human image and that Interactive Art might benefit from using the participant's interest in his own image.
To this end, the live video image of the visitor was shown on the projection screen described earlier, juxtaposed with the computer graphics. The goal was to discover what expectations the participants would bring to this situation and what relationships could be defined within it. I wanted the computer to perceive the participants' behavior and to automatically generate intelligent graphic responses to their actions. Unfortunately, in 1970, no computer could analyze human movement in real-time. The solution was what I call the Wizard of Oz paradigm. (In the movie, the Wizard first appears to Dorothy as a giant figure that is manipulated by controls from behind the scenes). Through smoke and mirrors, I faked the interface that I did not yet know how to build. I chose to use my own eyes to see for the computer. I watched the participants on the video screen, and then controlled the selection of computer graphic responses through an array of buttons and positioned those responses near their images with a data tablet. The computer monitor was viewed through a video camera and its image superimposed on the live video image of the participants. As it happened, I was in the Computer Center, a mile away from the participants in the gallery, so this was a telecommunication experiment as well. By functioning as a part of the interactive system, I was using human intelligence to simulate possible computer intelligence. I was also creating a cooperative system that combined both forms of intellect in a real-time application. As a component in such a
system, I was also forced to focus on the problem I wanted to solve. Since I could see better than any computer and reprogram my responses infinitely more rapidly, I learned much more quickly than if I had lived within what I could truly implement. Note that the participants were also collaborators in this enterprise, because some of them returned day after day to try out ideas for new interactions. When people entered the gallery, they were confronted by their own video images displayed on the giant projection screen. Then, I drew computer-graphic graffiti on them. After a moment of passive observation, they would start to react to my drawing. When they saw the cursor move across the screen towards their images, they would duck. When it got close to their bodies, they batted it away as if it was alive. One day, when I was drawing on a person's hand, he moved it and I followed. It looked like he had drawn the resulting line with his finger. He then continued to try to draw purposefully. For ten hours a day, seven days a week, for six weeks, we were able to communicate the possibility of drawing to hundreds of people and to teach them to do it without being able to say a word to them. It was as if their DNA had been waiting for this moment to occur. It was amazing that people would accept such a minimal version of reality and become completely engaged in it.
TELECOMMUNICATION BECOMES COINCIDENCE
Another striking experience occurred when we were having trouble transmitting the coordinate data from the sensory floor to the Computer Center. I displayed the waveform of the signal I was receiving on my computer, and it appeared on video screens in both locations so we could talk about it. Then, I asked my colleague in the gallery to display the data he was transmitting on his computer and to aim the camera in the gallery at its display.
Since the two video images were already superimposed, the result was a composite image in which the data transmitted was at the top of both video screens and the data received in the Computer Center was at the bottom. When I wanted to direct my colleague's attention to a feature in one of the waveforms, I put my hand in front of the computer screen so that it appeared in the composite image. Then, I maneuvered my finger so that it was pointing where I wanted. When he saw me do this, my friend understood immediately and used his hand to point within the composite image as well. As we conversed, we used our hands exactly as we would have if we had been together. The communication was so complete that we had no desire to go to the other location. The illusion of being together was so strong that when the images of our hands touched for an instant, he moved his hand away. When I saw this, I tested it again. Over and over, when the image of my hand touched his, he unconsciously moved his hand to avoid the contact. Personal distance was operating in this minimal virtual world. Telecommunication had been redefined. Whereas we typically think of it as being between two locations, in this case, a new location had been created that consisted of the information that we could both share simultaneously. In that virtual location, we behaved exactly as if we were together. This 1970 experience was the origin of the idea of a shared telecommunication place which is now one of the fundamental concepts in virtual reality.
I called the virtual world on the projection screen "VIDEOPLACE." In it, people see their images as an extension of their own identities. What happens to it happens to them. What touches it, they feel. Note that this experience is not exactly the same as that had by wearers of head-mounted displays (HMDs), because in VIDEOPLACE people see themselves in the virtual world from a third person point of view, whereas with an HMD they see the world as if they are in it and can only see those parts of their own bodies that they would see in the real world. This realistic feature actually makes it difficult to know what is happening, given that the participant has no sense of touch in most virtual reality systems. This experiment was an effort to discover and harness the natural expectations of the human mind as it operates in a human body. While people have long complained about the poor transfer from book learning to active behavior, in virtual reality learning is rehearsal of the desired actions. At first glance, this technique might seem limited to the training of physical skills rather than the acquisition of concepts. However, it is possible to imagine that conceptual knowledge might be presented more concretely. To the extent that that can be done, we have a new way of learning. While conceding that there may be kinds of knowledge that do not lend themselves to physical expression, there are others that can be cast in either conceptual or experiential form. In the second category, it is likely that there are people for whom the physical presentation is most understandable. For certain, there are people who have lost some of their linguistic ability through strokes and traumatic brain injury. For these individuals, making information concrete may be the only way it can be presented to them.
While physical participation does not guarantee that content will be included any more than has been the case with television or video games, it does suggest that we examine what we do in education to see how much of it might be cast in experiential form. To illustrate the kind of unexpected opportunities that physical immersion offers, consider an example. It has long been noted that people's spoken values are not always consistent with their behavior. Verbal learning begets verbal behavior and nothing more. On the other hand, with virtual reality, it is possible to imagine an interactive parable in which you are not only presented with a moral dilemma, but you are also required to act on it as well. Such practice may make perfect.
ARTIFICIAL INTELLIGENCE
Throughout my development, I have always kept one eye on the subject that the term "Cognitive Technology" was coined to avoid: "Artificial Intelligence (AI)". During my graduate school days, AI was a hot topic in Computer Science and I was drawn to it. I studied under Leonard Uhr and collaborated on a student project with Stuart Shapiro who went on to attain prominence in the field. In the intervening years, my work has occasionally grazed AI or incorporated its techniques. One of the reasons that I lost interest in AI was that I personally had a different view of the nature of intelligence. AI was symbolic and conceptual. It was slow, and speed was not considered a relevant dimension. In contrast, natural intelligence is characterized by speed. Nature has never found a use for slow off-line processing. We ourselves use the language of speed when
discussing someone else's intellect. We say they are a quick study, fast on their feet, etc. Properly viewed, the art of conversation resembles an athletic event more than it does the temporally unconstrained machinations of AI programs. While human intelligence is certainly distinguished in its ability to manipulate symbols, the body is more than a vestigial organ that serves to move the mind around. Rather, the mind evolved to serve the body. Its initial task was to deal with perceptual information and to immediately translate it into behavior. (Indeed, perception, particularly vision, is inextricably bound up with behavior. We cannot see unless we move our eyes and often move our bodies in order to improve what our eyes can see.) Symbolic reasoning was probably an inevitable side effect of high level perception that was then turned to other purposes after the press of real-time events had subsided. Typically, AI systems would run enough examples to support a paper and then be turned off forever. If we had so little experience, it is unlikely that we would be intelligent. In contrast, natural intelligence performs in the context of ongoing existence. I always felt that it would have been useful for some AI programs to have ongoing experiences, no matter how pitiful their functionality, for mere existence provides much to be at least minimally intelligent about. For instance, it should be quite possible to implement an Eliza-like program that could talk about past, present, and future weather in an informed way. Natural intelligence also exists as part of a complete system, not as discrete components that can be studied in isolation. In fact, despite the dogma of Artificial Intelligence, Computer Science, and communist dictators, top-down is not the only way to design. Natural evolution is the result of bottom-up improvements upon complete working systems. Computer hardware has always advanced from the bottom up, resulting in break-neck progress from the beginning.
In software, by contrast, the high level languages of today look little different than their predecessors of almost 40 years ago.
VIRTUAL MICROWORLD
From the beginning, I felt that my interactive environments had the capability of serving as laboratories for not only the interaction between human and machine but also for the encounter with what I termed the "artificial entity" which was a humble term for an artificial intelligence. I argued that such entities would be the inevitable consequence of our technology and that virtual reality provided the perfect domain for such entities to be created, because so many hard problems could be finessed. (Krueger, 1983) A virtual reality is close to the AI concept of a microworld. It is a circumscribed domain in which the computer is responsible for everything that happens. Therefore, there is the potential for it to understand what transpires in semantic terms. In Winograd's thesis, the computer understood, because it was the sole mediator between the user's text commands and their intended effects in the world. (Winograd, 1972) What I planned was a world in which the human participated physically using her body instead of a keyboard. The computer would perceive the person's actions, determine what their consequences in the virtual world should be, and then change the world to reflect their occurrence.
From 1974-1985, I developed the computer vision required to automate the METAPLAY experience. To permit this to be done, the computer's perception of the participants had to be simplified in two ways. First, the participants stand against a neutral background, so real-time computer vision can analyze their silhouettes instead of facing the difficult problem of distinguishing their images from a complex background. Second, the participants' images are inserted into an abbreviated virtual world where only some consequences are defined. I knew from experience that people would cooperate with the illusion and behave within its capabilities. With perception thus simplified, I thought that a real-time behaving system would be possible. I further thought that higher level cognitive functions could supervise the low level behavioral functions. While I have never had the support necessary to attain that goal, I was able to take a small step using techniques that I learned during a collaboration with Richard E. Cullingford and Mallory Selfridge of the University of Connecticut on an ARPA sponsored project called CADHELP (Cullingford, 1983). This was a CAD system that explained its operation to new users in natural language using Roger Schank's Conceptual Dependency Notation (Schank, 1977). My own contributions to that project had to do with the idea of using the same knowledge representation to drive an animated demonstration of the physical actions that the user would have to employ to operate the data tablet driven GUI (graphical user interface). In addition, I had the thought that the same representation could be used to drive the operation of the application itself. Thus, the system could be said to execute its own documentation. Two years later, when my VIDEOPLACE hardware provided the computer vision required to automate my 1970 Wizard of Oz interface in real-time, I could start to think about how it worked rather than whether it would work. 
I had implemented what I called the "Reflex System." This was a very elaborate multi-heterogeneous-specialized-parallel processor system that analyzed a person's image, identified its interesting features, and generated a computer graphic or electronic sound response within 1/30 second. (Krueger, 1989) My plan had been to add a Cognitive System that would monitor the behavior of the Reflex System and make strategic judgments about the actions it should take in the future, without ever being involved in the processing of the immediate response. Then, when there was no-one in the environment, the Cognitive System would ruminate upon the day's experiences. It would dream. Unlike any other AI system, it would always be on. Its goal would be to engage participants in playful interactions and to learn to improve its ability to do so over a long period of time. Since the PDP-11 that rode herd on the specialized processors was not really up to the task of representing a near real time Cognitive System and the serial link with the department's VAX was too slow to maintain an interesting level of connection to the real-time environment, my first step was very preliminary. Inspired by the 1970 experience of observing people batting the cursor away from their images as if it was alive, I created a graphic creature I called CRITTER that built upon that behavior. CRITTER's immediate goal was to engage people in playful interactions that they would accept as real experiences. In this goal, he succeeded. People willingly became involved and related to him eagerly. He was also a stand-in for that artificial entity that I knew would come later.
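The simplification described above (participants standing against a neutral background so that only their silhouettes need to be analyzed) can be illustrated with a minimal sketch. This is not the VIDEOPLACE implementation, which ran on specialized hardware; it is plain Python, and the function names, the brightness threshold, and the choice of "topmost point" as the interesting feature are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]   # grayscale image rows, pixel values 0..255 (assumed format)
Mask = List[List[bool]]   # True where a pixel belongs to the participant


def silhouette(frame: Frame, background_level: int = 40, tolerance: int = 25) -> Mask:
    """Mark every pixel whose brightness differs from the neutral background.

    background_level and tolerance are hypothetical values; a real system
    would calibrate them against the actual backdrop and lighting.
    """
    return [[abs(p - background_level) > tolerance for p in row] for row in frame]


def topmost_point(mask: Mask) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first silhouette pixel found scanning from the top."""
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                return (x, y)
    return None


def touches(mask: Mask, x: int, y: int) -> bool:
    """True if a graphic object at (x, y) overlaps the participant's silhouette."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[y]) and mask[y][x]


# A tiny 5x4 test frame: a bright cross-shaped "participant" on a dark backdrop.
frame = [
    [40, 40, 40, 40, 40],
    [40, 40, 200, 40, 40],
    [40, 200, 200, 200, 40],
    [40, 40, 200, 40, 40],
]
mask = silhouette(frame)
feature = topmost_point(mask)
```

With the silhouette reduced to a boolean mask, a test such as `touches(mask, x, y)` is a single lookup, which suggests why this kind of simplification makes a response within one video frame plausible.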
What was surprising when I first implemented CRITTER using conventional programming techniques was that I was thinking of the rules that guided his many complex behaviors in straightforward conceptual terms. Based on this observation, we designed a system in which the actual definition of the CRITTER's perception, goals, and rules of behavior was done in Conceptual Dependency Notation. These rules were analyzed by LISP code on the VAX, translated into a form that I called "Flex Code," and then downloaded to the PDP-11 where it was interpreted in real-time. (Rinaldi, 1983) Thus, a knowledge-driven expert system analyzing human video images and coordinating them with live animation was operating in real time in 1983. A few years later, I decided to do for the Cognitive System what had much earlier been done for the Reflex System. I again used my human intelligence to make decisions about what feedback the computer would generate in response to the physical actions of a human participant. While I made the judgments about what the interactive relationship should be, it was the Reflex System that actually analyzed the current moment and delivered the instantaneous response that defined that relationship. This paradigm of a human intelligence guiding a real-time reflex system seems like it would be generally useful and yet I do not see any instances in which it is being used. The basic strategy of using the human to do what the computer cannot yet do, and seeking to incrementally automate that part, would seem like a natural evolutionary strategy that might be applied in many domains.
VIDEODESK
We took the VIDEOPLACE techniques into the practical realm with the VIDEODESK. In this configuration, a ceiling-mounted video looks down on the user's hands as they rest upon a conventional desktop. The image of the hands is then displayed superimposed over an application on the monitor. We then coupled this interface to a system for scientific visualization that we had developed from 1984-1988 for Pratt & Whitney Corporation, the jet engine manufacturer. (Krueger, 1988) The task was to depict the flow of gas through jet engines and the solution was a cognitive technology in its own right. In 1984, I was a complete virgin in computational fluid dynamics. Not only was I unaware of what others were doing in the field, I was ignorant of its existence. I was simply shown stacks of printouts that contained flow vectors that had been generated by a supercomputer and asked if I could make sense of them. The goal was to make the information contained in that data accessible to application-sophisticated, but computer-wary, scientists and engineers. The most significant aspect of this visualization problem is that the user is trying to understand a phenomenon that is happening everywhere in the flow volume. There can be no single picture or animation of it, because events in the front will necessarily obscure those behind. Building an understanding of the flow as a whole, then, requires integrating many different views into a coherent whole, or at least identifying the features that are most salient. Since understanding depends on the user's memory, it is important to minimize the burden on the memory and the distractions that might erase it. Thus, the intellectual premise underlying this system is what I call the minimum investment principle. If I can ask a question and get an answer instantaneously, that success encourages me to
ask another question. On the other hand, if posing a question requires me to leave my domain of interest to struggle with computer commands, and then forces me to wait a long time before I get an answer, I am apt to ask very few questions and never to build up the intuitive feel that I would like to have about the phenomena I am studying. The approach that I took was influenced by my own childhood playing with the flow of water by sticking my hand into it. When I did that, I spent no time deciding where I was going to probe, no time positioning my hand, and no time waiting for reality to produce its results. I simply played. In this spirit, I developed a system in which the mode of interaction was as close to that kind of undirected play as possible. A ceiling-mounted video camera looked down at the user's hands as they rested upon the desktop. The silhouette image of the hands was projected onto a two-dimensional plane in the three-dimensional gas flow volume. When one finger was extended, gas particles were released from the tip of the finger and moved through the volume under the influence of a precomputed velocity vector field. When two fingers were extended, the fingertips defined the endpoints of a line and the gas flowing through the line defined a surface that was deformed as it flowed through the volume. With my system, the calculation of the particle movement proceeded at a speed proportional to the flow and so could be shown as it was calculated. If, as the flow was being generated, the user felt he did not need to see it run to completion, he aborted the calculation simply by moving his fingers. Physical movement followed by instantaneous feedback took this application beyond the range of GUI interfaces, for the interaction was based on coordination rather than commands. Physical intelligence was being brought to bear as well as visual perception. 
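The probing technique described above (particles released at a fingertip and carried through a precomputed velocity field, with two fingertips defining a line of release points) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the system built for Pratt & Whitney: the forward-Euler step, the nearest-node lookup, the grid layout `grid[z][y][x]`, and all function names are hypothetical choices. The particle path is written as a generator so that positions can be drawn as they are computed and the caller can stop early, loosely mirroring the way the user aborted a calculation by moving his fingers.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def nearest_sample(grid: Sequence[Sequence[Sequence[Vec]]], p: Vec) -> Vec:
    """Look up the precomputed velocity at the grid node nearest to p.

    The grid is indexed grid[z][y][x] (an assumed layout); positions outside
    the volume are clamped to its boundary.
    """
    def clamp(v: float, n: int) -> int:
        return max(0, min(n - 1, int(round(v))))

    x, y, z = p
    return grid[clamp(z, len(grid))][clamp(y, len(grid[0]))][clamp(x, len(grid[0][0]))]


def advect(seed: Vec, velocity: Callable[[Vec], Vec], dt: float, steps: int) -> Iterator[Vec]:
    """Yield successive particle positions using simple forward-Euler steps."""
    x, y, z = seed
    yield (x, y, z)
    for _ in range(steps):
        vx, vy, vz = velocity((x, y, z))
        x, y, z = x + vx * dt, y + vy * dt, z + vz * dt
        yield (x, y, z)


def seeds_on_line(a: Vec, b: Vec, count: int) -> List[Vec]:
    """Evenly spaced release points between two fingertips a and b."""
    if count < 2:
        raise ValueError("count must be at least 2")
    return [
        tuple(a[i] + (b[i] - a[i]) * k / (count - 1) for i in range(3))
        for k in range(count)
    ]


# Toy 2x2x2 field with uniform flow along +x (stand-in for supercomputer output).
grid = [[[(1.0, 0.0, 0.0) for _ in range(2)] for _ in range(2)] for _ in range(2)]


def field(p: Vec) -> Vec:
    return nearest_sample(grid, p)


# One finger: a single particle path. Two fingers: one path per seed on the line.
path = list(advect((0.0, 0.0, 0.0), field, dt=0.5, steps=4))
surface = [list(advect(s, field, dt=0.5, steps=2))
           for s in seeds_on_line((0.0, 0.0, 0.0), (0.0, 2.0, 0.0), 3)]
```

Because `advect` is a generator, a display loop can draw each position as it arrives and simply stop iterating when the hand moves, so no computation is spent on a path the user no longer wants to see.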
While this system may seem unremarkable today, at that time, NASA was developing its PLOT3D package which used a command line interface and required that users memorize the indices of the three-dimensional grid, type parameters for 5-15 minutes to set up the simplest query, and then wait for 45 minutes while the computer created an animation through the points in the volume that the user had specified. It was not until individuals at NASA heard the author mention his own system in a talk, that they began to implement their own version of the virtual wind tunnel in 1991. It is still NASA's best-known virtual reality project. At SIGGRAPH 1991, in the Tomorrow's Realities Gallery, we tied together the teleconferencing insights of 1970 with this application. When visitors sat down at the VIDEODESK and saw their hands appear on the screen, they also saw the hands of a student volunteer who demonstrated the hand gestures required to operate the application. In about three minutes, the visitors were given control of the program and were probing the flow field on their own. The fact that people who knew nothing about fluid dynamics were able to learn to use an application based on a completely novel interface through a new form of telecommunication testifies to the naturalness of the techniques employed. The reason people were able to learn the application so fast was that, by watching the screen, they saw another person's hands performing exactly the actions that they would have to perform with their own hands. No inference was required, as would have been if the tutor was using a mouse to control a cursor. There is circuitry in the human brain dedicated to recognizing human hands and understanding their movements. When we see hands do something, we know how to do it. It is monkey-see-monkey-do.
While there has been a major shift to interactive systems, the speed of response on which the minimum investment principle depends has never been a priority in the Computer Science or Human-Computer Interface communities. Even as computers get faster and faster, operating systems get slower and slower, so the interfaces of today are not remarkably faster than those of 30 years ago. The human user should be seen as the most expensive component in any system, and utilization of the human brain should be seen as more important than all of the features some other users might like their computers to have. This is not a technological limitation. It is an attitude. Computer Science is the only engineering discipline in which time is an unbounded variable. It is also notable that the bandwidth of the human-machine interface has narrowed not widened in the last 30 years. At the 1993 ACM Conference in Indianapolis, Alan Kay showed a tape of Ivan Sutherland's Sketchpad system, which he felt was better than any subsequent system (Sutherland, 1963). Afterwards, I asked if he knew why. I pointed out that Sketchpad was a two-handed interface and current GUIs only permit the use of one hand. In my virtual wind tunnel, two fingers were used to define the endpoints of a line. In another application, the thumb and forefingers of the two hands are used as control points of a spline curve, which was then used as the aperture of an extruding device to create complex three-dimensional solids in a matter of seconds. In another slightly bizarre technique, the left hand positions a small window in which a tiny right hand can draw with pixel accuracy. Amazingly, this Big Hand/Little Hand interface is very easy to learn and to coordinate. More generically, current GUIs force the user to constantly alternate between using the mouse to select commands from a menu and using it to point within the application. This mousing around could be eliminated by a two-handed interface. 
VIRTUAL REALITY
In virtual reality, where the idea of immersion in the experience is considered central, there is a similar blindness regarding the importance of the body. In almost all HMD (head-mounted display) virtual reality systems, all the computer knows is the position and attitude of your head and one hand. The rest of the body is ignored, making virtual reality the 3D equivalent of mouseworld. (Krueger, 1993) Furthermore, you are typically stationary. You cannot walk around the virtual world because of the limited range of the sensors and for fear of tripping over the wires that connect your head to the computer. Instead, you navigate by pointing your finger in the direction you want your eyes to fly. While these may seem to be minor complaints, the inability to move naturally about virtual reality robs you of a very important method of understanding your surroundings. When you walk about a room, you are continuously measuring it, because you know from a lifetime of experience how high your eyes are from the floor and how long your stride is. The realism of that interactive relationship is probably much more important than the quality of the rendering or the resolution of the graphics. To see if the natural movement is as important as we think, we are currently prototyping a wide area (12 x 12 meter) wireless tracking system that will permit
people to walk around virtual worlds exactly the same way they do the real one. At the very least, we expect that the initial instructions and frustration that are associated with most HMD systems will be replaced by the instant understanding that characterizes the body of work that has been described in this paper. To further complete our connection with the virtual world, we are also developing delivery systems for olfactory stimuli. Tactile display presents a more difficult problem. Although systems that provide tactile or force feedback have been demonstrated and scalpels for surgical simulations are likely to be done well in the near future, it may be decades before we have a convincing ability to feel our way around a virtual world. However long it takes, I believe that the more complete our experience of the virtual world, the more real it will seem and the more effective it will be for all purposes. To date, the effort in virtual reality has been in making it appear realistic. It will take longer to improve the quality of our physical participation in it and then to create more compelling and useful experiences. As the basic medium is put together, there will be an increasing use for the ingredients of what was once called AI. In virtual reality, there is a tension between the metaphor of the body in the virtual world and the command magic of the traditional interface. The idea of doing physically in a virtual world those things that we would do physically in the real world sounds appealing for about a minute, until you think of all the tasks that are tedious to perform in the real world, but which can be accomplished in any computer application by invoking a single command. Thus, if you have designed a building in a virtual world and decide that you would like to make it red, you could take your virtual paint brush and start to work, or you could invoke your command magic and simply say, "let it be red."
While traditional menus can float in mid-air in virtual reality, one can argue that the hands should be used as hands and the voice should be used for commands. Alternatively, if you are talking to another participant, you may prefer to select commands with your hands. Obviously, the division of responsibility between these two modes of control is going to be an important design issue in the future.
SPEECH INPUT
A few months ago telephone companies started using speaker-independent speech recognition in a highly visible application, signalling that speech input is about to make the transition from being a niche technology for the disabled to becoming a mainstream interface technique. "Point and talk" is likely to be the interface of the future. Fueling this trend will be the fact that the miniaturization of portable computers has been limited by the size of the keyboards and the size of the displays. Voice input would allow the keyboard to be replaced by a tiny microphone. The display may ultimately be embedded in ordinary eyeglasses or contact lenses, enabling virtual reality to merge seamlessly with our everyday world. It would also permit us access to computer-based information so easily that it would change our relationship with knowledge itself. The traditional computer has been an object in a place that could only be operated in missionary position (Krueger, 1990). The contemporary portable computer is an object independent of place, but it still requires that we sit down to use it. Instead,
assume that you and everyone else wear a display that does not change your appearance if you already wear glasses, understands what you say, and can privately or publicly display the information you request. Such a device will be able to access information wherever and whenever it is needed. It will also remove the need to sit down to work. It will reintegrate the mind and the body.

ILLUSTRATED SPEECH

To understand how profoundly such a simple change in our relationship to information will affect how we use it, you only have to look at how the remote control has changed television viewing. Years ago, people turned to a channel and left it there for the evening. Now, they change channels every 9 seconds on average. Fully integrated ubiquitous computing will have a much more important effect. Not only will we access information only when we need it, we will have complete control of how it is presented to us. For instance, a salesman may choose to see points in his spiel superimposed in teleprompter fashion on the customer. We will also have complete flexibility in how we present information to others. Today, we must go to a special room which is equipped with various kinds of projectors if one person wants to show images to her colleagues. With the technology envisioned, the same group could stand around the water cooler with each illustrating their speech with interactive animated three-dimensional models that all can see. This blending of voice with images is far more graceful than anything that can be imagined with text. To make it really work well, so that the images anticipate what you are about to say, will require the use of semantic cues, if not understanding. In time, both for technological evolution and for the development of a personal communication style that will take at least a childhood to mature, such techniques will be part of our identities as well as our culture.
Speech input is but one of a variety of what were once AI components reaching a critical threshold beyond which they can start to function in a useful manner. Speech recognition (if not complete understanding), believable speech synthesis, and computer vision capable of understanding the human face and body will start to be put together in the next few years. In addition, human figure animation will be capable of generating the realistic movements and expressions required to create a simulated human. Both off-line animation systems and interactive systems that must move their characters in real time will use models that perceive their virtual world and generate the appropriate behavior with all of the nuances of facial expression and body language (Krueger, 1995). Norm Badler's JACK project at the University of Pennsylvania is a step in this direction. It creates semi-autonomous graphic characters for medical training scenarios (Cassell, 1994). These developments will take place not because of progress in AI, but because the existing technologies are suddenly becoming affordable and their capabilities are advancing so rapidly. We have reached the point where the question I always ask in my talks about virtual reality, "Would you use it, if it was free?" becomes unavoidable, because these technologies soon will be. The enabling technologies for creating artificial entities will be in place, but the research that would allow us to take advantage of them has not been done.
CONCLUSION

The theme of the work described in this paper, and the projections made, is that the relationship between the mind and body is starting to change. Portable, wearable, and environmental technologies will allow us to physically experience information in virtual reality as well as to move freely about our everyday world as we use it. The immediate goal of human interface design is to eliminate the barriers between the two. Only when this process is completed can we develop cognitive technologies that go beyond simply accelerating and automating what we can already do. In addition, by letting the computer encounter humans as physical creatures and by making the computer responsible for the behavior of similar graphic creatures, virtual reality will bridge the chasm between human and machine by permitting them to meet each other half-way. The result will be profound, for it will begin the gradual obsolescence of all of our existing information paradigms. The sedentary, symbolic, two-dimensional, black-and-white, sensory-deprivation world of the intellect will be gone forever.

REFERENCES
Buxton, W., 1989. Personal communication.
Cassell, J., C. Pelachaud, N. Badler, M. Steedman, B. Achorn, T. Becket, B. Douville, S. Prevost, and M. Stone, 1994. Animated Conversation: Rule-based Generation of Facial Expression, Gesture, & Spoken Intonation for Multiple Conversational Agents. SIGGRAPH 94: 413-420.
Cullingford, R., M. Krueger, M. Selfridge and M. Bienkowski, 1982. Automated Explanations as a Component of a CAD System. IEEE Transactions on Systems Man and Cybernetics, April 1982, 168-181.
Krueger, Myron, 1974. Computer Controlled Responsive Environments. PhD Thesis, University of Wisconsin.
Krueger, Myron, 1977. Responsive Environments. AFIPS NCC: 423-434.
Krueger, Myron, 1983. Artificial Reality. Reading, Mass.: Addison-Wesley.
Krueger, Myron, 1988. VIZER: Interactive Flow Visualization Users Manual. East Hartford, CT: Pratt & Whitney.
Krueger, Myron, and K. Hinrichsen, 1989. Real-Time Perception of and Response to the Actions of an Unencumbered Participant/User. Patent # 4,843,568.
Krueger, Myron, 1990. VIDEOPLACE and the Interface of the Future. In Brenda Laurel, ed., The Art of Human Interface Design, 405-416. Reading, Mass.: Addison-Wesley.
Krueger, Myron, 1991. Artificial Reality II. Reading, Mass.: Addison-Wesley.
Krueger, Myron, 1993. The Artistic Origins of Virtual Reality. Invited essay, SIGGRAPH Machine Culture Catalog.
Krueger, Myron, 1993. The Emperor's New Realities. Virtual Reality World, Meckler, September.
Krueger, Myron, 1995. Automating Virtual Reality. IEEE Computer Graphics and Applications, January 1995.
Rinaldi, J., 1983. Using a Knowledge-Based Artificial Intelligence to Construct Responsive Environments from a Conceptual Representation. Masters Thesis, UCONN.
Schank, Roger, and Robert Abelson, 1977. Scripts, Plans, Goals and Understanding. Hillsdale, N.J.: Erlbaum.
Sutherland, I., 1968. A Head-Mounted Three-Dimensional Display. FJCC: 757-764. Washington DC: Thompson Books.
Sutherland, I., 1963. Sketchpad: A Man-Machine Graphical Communication System. SJCC. Baltimore, MD: Spartan Books.
Winograd, Terry, 1972. Understanding Natural Language. PhD Thesis. New York: Academic Press.
Chapter 9

HEURISTIC ERGONOMICS AND THE SOCIOCOGNITIVE INTERFACE

Roger O. Lindsay
Psychology Unit, Oxford-Brookes University, UK
ABSTRACT

Debate surrounding the design characteristics of computational systems has focused around the realisation that as such devices become increasingly common and important elements of the human cognitive environment, it is unreasonable to allow the engineer to build whatever is convenient, leaving human operators struggling to do what they can with the result. The history of software design is briefly reviewed to establish that software systems have followed the normal course of artefact evolution, initial preoccupation with functional effectiveness giving way to concern with usability: an insistence that engineers should take human factors into account in system design to build devices which suit the known characteristics and limitations of humans. The present paper argues that projectible enhancement of the cognitive capabilities of software will require AI systems to participate in goal setting, and to function as autonomous agents. This in turn requires systems with independent value systems with which human users will interact. These developments require a radical revision of conventional approaches to ergonomics and man-machine interfaces as the locus of control in cognitive interaction ceases to lie with the user and becomes distributed over the partners in a cooperative dialogue. As the salient parameters of interaction become cognitive rather than physical, the idea of user-centredness ceases to be helpful because the cognitive characteristics of the user are likely to be modified as part of problem-solving dialogue processes and will in many cases turn out to be part of the problem. It is argued that the enabling apparatus which allows human agents to accept and deliver cognitive modification involves a process of knowledge, goal and value matching which is controlled by the use of heuristics.
Examples of heuristics controlling action and dialogue planning are common but have been unhelpfully classified as ethical principles and conversational conventions or maxims. It is suggested that machines capable of mixed-initiative interaction will need access to a rich repertoire of such dialogue control heuristics. Some examples are presented and briefly discussed.
The implications for cognitive technology are considered, and it is concluded that user-centred conceptions of design will have to give way to a user-peripheral, dialogue-centred perspective. The ergonomics of the physical interface between man and machine will need to be replaced by an ergonomics of the socio-cognitive interface.

End-users are usually the last people to be taken into account in the design of technological artefacts. Early in the production process, engineers are preoccupied with delivering a product which works; customers are glad to possess a device which carries out that particular time- or energy-saving function. Only when basic engineering problems are solved, and attention turns from merely producing the artefact to producing a version which is superior to that produced by competitors, does it become clear that generations of early versions were inconvenient (Norman, 1990) or even dangerous (Broadbent, Reason and Baddeley, 1990). Cars, washing-machines, video recorders: almost all reasonably complex artefacts progress through the same evolutionary cycle, and only in retrospect do users come to appreciate how unnecessary most of their frustrations were. It is to be expected therefore that the development of computers should proceed according to the familiar pattern: and up to a point it has. Computers are, however, very different from other artefacts in two respects: the first is that they are composed of hardware as well as software, and to some extent these two components have independent design trajectories, though clearly there are interactions between them. The second respect in which computers are different from other artefacts is that they do not have determinate functions. The class of functions which a computer may perform is unbounded. In the case of a washing machine successive design versions may converge upon an optimal design; it is less clear how this can apply to a computer.
It might just possibly apply to the hardware: maybe there is an optimal design for a device which receives and runs programs, but it is difficult to see how the same principles can apply to software: software causes the computer to behave. What is the optimal way to behave? The intractability of this question may appear to arise from trading on an ambiguity or a false implicit analogy: it might be argued that there is no optimal way of behaving for people, because they are not artefacts designed under functional constraints. Computer programs, however, should efficiently achieve explicitly specifiable objectives; for them there is a right way to behave. One of the aims of this paper is to argue that user-centred design of cognitive artefacts (systems capable of knowing, believing, and participating in dialogue) requires computer software to be endowed with characteristics which are incompatible with specification in terms of functional objectives. The terminology most commonly used to chart the design evolution of computers ('first generation', 'second generation', and so on up to 'fifth generation') takes account only of design transitions in hardware: for example the shift from vacuum tube to transistor, and from individual transistor to integrated circuits. Software changes are generally treated as peripheral, though oddly, it is common to hear the term 'fifth generation' computers used to mean computers capable of intelligent behaviour, even though intelligent behaviour may result from software rather than hardware. Considering software alone from a user perspective, it is possible to distinguish the following evolutionary stages:
Phase 1: Programming directly in machine code. Typical of engineering-driven solutions to artefact design. Too time consuming and divorced from applications for general users.

Phase 2: High level languages such as Fortran, Algol, and Pascal. General purpose, though oriented to classes of application. Accessible to non-specialist users, but few concessions to user needs or convenience.

Phase 3: Preprogrammed applications packages. User needs to know application area but not programming languages.

Phase 4: Decision Support and Expert Systems. Packages capable of limited knowledge-based decision making.
Generality is largely orthogonal to this schema in that machine code allows the widest range of behaviours to be programmed, while 'high level' languages increase usability but, if anything, reduce the range of behaviours which can be achieved. This dimension is ignored here because it is intended to compare performance only on applications which can be achieved, and variation in the range of achievable applications is not therefore of interest.

Underlying the process of software evolution there appear to be three dimensions:
a) Functionality (effectiveness of symbol system in controlling machine states)
b) Usability (how easy it is to achieve a particular objective)
c) Centre of control (user role in establishing goals)

The meaning of 'functionality' is reasonably clear: symbolic control languages have to allow as full control as possible of the underlying machine. It is important that the increase in usability conferred by high level languages is accompanied by a minimal loss of functionality in this sense. 'Usability' has two contributory elements: the ease with which a user can accomplish some sought goal within the software (intrinsic usability), and the ease with which a software system can be used to achieve some extrinsic goal such as printing a page of text. 'Centre of control' refers to the dimension which Blandford (1993: 965) describes as follows: "the role assigned to the system is that of benign servant. The system performs the operations specified by the user". Blandford contrasts the user-controlled model with models in which the system is an autonomous rational agent that engages the user in a 'mixed-initiative dialogue' (ibid: 966). An 'agent' is understood to be an "integrated natural or AI system that is capable of goal-directed action through which it autonomously pursues its interests" (Kiss, 1989) or an 'intentional system' (Seel, 1989).
It is argued below that this idea of system agency is not easily accommodated within the conventional framework of user-centred design. The notion of system agency also has built into it the assumption that an AI system can have its own values and priorities. These values and priorities may be functionally essential, but incompatible with those of a human dialogue partner. How should adaptive accommodation occur when automatic adaptation by the system to the values of the user results in degradation of functional effectiveness? This dimension we shall call:
d) Adaptation bias (System-user or User-system?)
Phase 1 and 2 changes were primarily concerned with functionality: deciding what should be done and how best to do it. Phase 3 switched the emphasis to usability: it
was no longer necessary for users to acquire programming skills before they could process text or analyse usage frequency in Shakespeare's plays. For most artefacts design evolution stops at Phase 3, though the process of refinement may continue for a considerable time. The difference between computers and other tools begins to emerge as Phase 4 developments occur: the 'tool' begins to take the initiative, to set goals, to participate in defining the task, to progress along the road to agency. I justify this extended digression because Phase 3 and Phase 4 issues are not always easy to keep separate. Norman (1990) has provided a wealth of illustrations of usability failures in everyday artefacts. Norman and Draper (1986) have applied similar design criteria to computer systems, urging the importance of user-centredness in design. The trouble is that user-centredness is basically a Phase 3 issue: it suggests that the design need is for the 'slave' to become better adapted to the needs of the 'master'. Phase 4 developments however involve giving the slave more responsibility; they may even require the master to adjust to the values and goals of the slave. The obvious response to these concerns is that a new and broader definition of user-centredness is required: a move away from the readiness with which a system can be deployed by a user, towards some notion of the ease and effectiveness with which the user-system complex can achieve goals. What lies at the end of this road however, is not a broadened and refined ergonomics which enables a higher level appraisal of functionality and usability to be generated, but rather a transformation of the system from the role of functional tool to the role of social agent. One area in which this transformation and the reasons for it can be examined is the area of natural language dialogue.
The principle that 'comprehension leads production' is a familiar and established one to people involved in the theory or practice of language learning. Interestingly, though, the principle also seems to apply, up to a point, to attempts to gain an academic understanding of language mechanisms. Once that point is reached however, the principle breaks down irredeemably. The point at which the principle loses its validity is the point at which meaning enters the picture. Interpretive theories offering logical parses of language founder upon two rocks: the impossibility of separating literal from non-literal meaning, direct from indirect speech; and the impossibility of excluding intentions from the process of specifying interpretation. It would seem that the only way to navigate around these obstacles is by assimilating language to the domain of action planning; by accepting that understanding what people mean requires an analysis of what goals they are planning to achieve. Roughly, this claim is the material mode equivalent of the formal mode claim that pragmatic interpretation of utterances is an essential part of the explication of meaning. The effect of this admission is to concede ontological and explanatory primacy to speech production. Much of the running in the attempt to understand utterance meaning as a subset of action planning has been made by investigators using the methodology of AI. The reasons are obvious: computer modelling is an appropriate and probably a necessary medium for the complex dynamic processes involved in planning. In addition, if a successful computer model of language interpretation can be produced, then that model will have the capability of producing and understanding natural language strings and so will be of considerable practical and commercial value. Some examples of AI-based models of language interpretation and dialogue management which take an action planning approach are Power (1979), Kiss (1989),
Seel (1989), Zhang, Lindsay, and Nealon (1992), Blandford (1993). My present purpose is not to question the adequacy or the necessity of the approach to language based upon action-planning which are here taken for granted; the purpose is rather to consider the implications for AI technology of conferring upon machines the properties required for full participation in human-like dialogue. It is argued that the machines which result will have some strikingly novel features, which will in turn radically alter the ways in which it is appropriate to regard and interact with them. There seem to be good reasons for treating some of the issues which will be discussed below as different in kind from standard issues in either Cognitive Science, or AI technology. Instead the issues are treated as falling within a new disciplinary perspective defined and explored by authors contributing to the present volume and the associated conference: the perspective known as Cognitive Technology. The justification for taking this perspective is that:
a) The issues discussed arise from the engineering of artefacts
b) The issues discussed concern knowledge, beliefs, and values
c) The issues involve reciprocal adaptation between artefacts and humans
d) The issues discussed cannot be handled within a conventional technological paradigm which restricts attention to the interaction between human users and material objects

A major limitation on planning processes is the impossibility of considering every alternative in all except the simplest decision-making contexts. This limitation arises because the number of possible plans and combinations of plans is too vast to allow every possibility to be identified and evaluated when decisions must be made in real time. This problem is generally known as the 'combinatorial explosion' problem.
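The scale of the combinatorial explosion can be illustrated with a minimal sketch. It assumes, purely for illustration, that a plan is simply an ordering of n distinct actions, so that there are n! candidate plans; the function name `count_orderings` and this simplified notion of a plan are assumptions made here and are not part of the chapter's argument.

```python
from itertools import permutations
from math import factorial


def count_orderings(n_actions: int) -> int:
    """Number of candidate plans when a plan is any ordering of n distinct actions.

    This is an illustrative simplification: real planning spaces also include
    choices about which actions to include, so they grow faster still.
    """
    return factorial(n_actions)


# For a small case, brute-force enumeration agrees with the formula.
small_case = ["a", "b", "c", "d"]
assert len(list(permutations(small_case))) == count_orderings(4)

if __name__ == "__main__":
    # The count quickly exceeds what could be evaluated one by one in real time.
    for n in (5, 10, 15, 20):
        print(f"{n} actions -> {count_orderings(n)} candidate orderings")
```

Even under this deliberately narrow definition of a plan, exhaustive evaluation becomes impractical after a handful of actions, which is the situation the text describes.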
An effective, and possibly an essential way of restricting the range of plans which need to be considered, and facilitating choices between them, is to establish values and principles. Values differ from principles in terms of generality. Thus one value held by an agent might be 'freedom is good'. This value can give rise to many more specific principles which serve as a bridge between value and action, such as 'liberate the incarcerated', 'resist infringements of liberty', and so on. It is intuitively obvious that as arbitrary constraints on planning increase in number, the number of alternatives to be evaluated converges on zero. A pernickety grandparent may never have to waste energy on planning, because one, and only one, acceptable plan exists for every physical and social contingency with which they are presented. It seems plausible that the need to limit the planning space is the reason why organisms have subjective values. As such values are by definition free to vary in content across individuals, adaptation advantages arising from their possession must arise not from content, but form. It is difficult to think of functional characteristics of subjective values which depend upon form alone, other than their property of arbitrarily restricting the planning space. A central assumption which must be made in understanding the role of values in planning and decision-making processes is thus that values have a functional role in constraining the set of plans which come under consideration in any decision-making context. There are several indirect consequences of this assumption:
1) Because values operate to select among plans, and plans form the basis for subsequent action, values are an important partial determinant of action. What an agent does will be determined by the values it espouses.
2) Because values function to determine plans and actions in the future as well as in the present, values embody the aspirations of a decision-making agent.
3) Because actions derive from values via plans, values are used as constructs by third parties to understand and predict the behaviour of decision-making agents. The resulting construct perhaps corresponds to the informal term 'image'. The image of an agent is the set of stated or inferred values which determine interpretations of its present actions and expectations about its future behaviour and products.

As values play an essential and unavoidable role in decision-making, an agent cannot choose whether to have values or not; it can only choose what values it espouses, and whether to identify and articulate the values which underlie its choices of action. Third parties will inevitably attribute values to an agent whatever the agent itself professes; this is because the attribution of values is a central element of the process of understanding decisions and actions. Attributed values may be directly stated by a decision-making agent, or inferred from its actions. Values which are directly stated ('professed values') may conflict with those inferred from action. Such conflicts usually result in loss of credibility, and often in the attribution of negative values. While not yet a commonplace, an appreciation that the interpretation of dialogue requires an understanding of action-planning, which in turn requires the attribution of values, has begun to emerge in the technical literature concerned with dialogue comprehension (Doyle, 1988; Kiss, 1989; Blandford, 1993). So far however, theoretical analysis has been restricted to consideration of the functional role of values held by individuals.
This is an arbitrary halting stage: individual values often derive from cultural norms; and cultural norms provide the framework which allows dialogue participants to attribute values to their fellows. Cultural norms are often expressed and conveyed in characteristic linguistic forms which can be roughly characterised as ethical language. One way of pushing the present argument towards the endpoint I have in mind is through ethics. Ethics is usually thought of as being a branch of philosophy which seeks to establish that ethical values are a sort of vestigial hangover from more superstitious times, and nowadays of little practical significance. Ethical language and thinking has been almost entirely disregarded by linguists and psychologists despite its universality across cultures and ubiquity within them. In contrast to these views, I shall take the position (argued for elsewhere: e.g., Gorayska and Lindsay, 1989; Lindsay and Gorayska, 1995) that ethical values are central to planning and coordinating actions in a social context, and that ethical principles (e.g., 'promises should be kept') can be understood as action planning heuristics. There are many planning contexts in which fixed solutions cannot be prescribed in advance, but alternative-sets are too rich for exhaustive analysis. In such contexts action capability is preserved by using heuristics, and it seems probable that the most promising way of proceeding in the case of dialogue planning is to think in terms of heuristics which guide and govern the form which contributions can take.
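The claim that values and principles work by restricting the planning space can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than the author's model: a plan is taken to be an ordering of four hypothetical actions, and each principle is reduced to a simple ordering constraint (for example, a crude stand-in for 'promises should be kept' is that repayment must follow borrowing).

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Plan = Tuple[str, ...]
Principle = Callable[[Plan], bool]


def before(first: str, second: str) -> Principle:
    """Build a principle requiring `first` to occur earlier in the plan than `second`."""
    def rule(plan: Plan) -> bool:
        return plan.index(first) < plan.index(second)
    return rule


def admissible_plans(actions: Sequence[str],
                     principles: Sequence[Principle]) -> List[Plan]:
    """Return every ordering of `actions` that satisfies all the given principles."""
    return [plan for plan in permutations(actions)
            if all(rule(plan) for rule in principles)]


# Hypothetical actions and principles, chosen only for illustration.
ACTIONS = ("make_promise", "borrow_money", "buy_tools", "repay_loan")
PRINCIPLES = [
    before("make_promise", "borrow_money"),  # commit before taking the loan
    before("borrow_money", "buy_tools"),     # pay your way: no spending before funds exist
    before("borrow_money", "repay_loan"),    # the promise to repay is kept after borrowing
]

if __name__ == "__main__":
    # Each additional principle shrinks the set of plans left to evaluate.
    for k in range(len(PRINCIPLES) + 1):
        remaining = admissible_plans(ACTIONS, PRINCIPLES[:k])
        print(f"{k} principles -> {len(remaining)} admissible plans")
```

In this toy case the principles do their work before any plan is evaluated on its merits, which is the sense in which the text treats them as heuristics rather than as outcomes of exhaustive analysis.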
The use of heuristics can address some of the difficulties raised by artefacts which have the cognitive competencies of ethical agents. Attempting to confer 'user-centredness' through a fixed, designed-in structure is self-defeating for a device intended to interact with idiosyncratic human value systems, unless the device has only one user. Even then the device itself is likely to seek to change the values and goals of a human user as part of its normal mode of cooperative problem solving; humans do this to one another all the time. But the pattern of user values and goals which results from repeated changes of this sort may no longer match the designed-in structure. To put the difficulty in another way: if cooperative problem solving requires changing and adapting values, a computer capable of cooperative problem solving cannot be endowed with values bearing a fixed relationship to those of its user. The mechanisms by which ethical values impact upon planning processes are those which we have elsewhere referred to as 'imposition' and 'enjoinment'. Cultures prescribe goals ('adults should pay their way'); they prescribe plans ('people should establish a stable relationship before having children') and they prescribe plan elements ('weddings should include a bridegroom'). In planning action and dialogue in a cultural context (i.e., always), agents must consider not only the values of direct participants in the action, but also more widely shared values and the enjoined goals, plans and plan elements which derive from them. If language production has ontological primacy over comprehension, and ethical values are vital components of production, it should also follow that ethical values are indispensable for comprehension. Indeed they are.
Consider the utterance:

'She lost the order because she was on the animal rights picket line at the time of the appointment'

To understand it, hearers need to know that money is earned by gaining orders, and orders are lost if appointments with customers are not kept; but they also need to know that some people think that animal welfare is more important than making money, and that this is a plausible attribution in the cultural context. Taken together, the arguments presented so far are intended to establish that:
1. AI systems will need to have ethical values and principles before they can contribute to natural language dialogue in real time (because utterance production requires planning which requires values).
2. AI systems will need to have ethical values and principles before they can interpret natural language utterances in a human-like way (because utterance comprehension requires understanding of the planning processes underlying production, which requires planning, which requires values).
3. Reciprocal dialogue understanding requires attribution of ethical values and construction of a value profile or 'image' by participants of their fellows.

A widely endorsed view among investigators who emphasise the role of pragmatics in understanding language (Grice, 1975; Morgan, 1978; Bach and Harnish, 1979) is that linguistic communication "takes place within the context of conventions shared between speaker and hearer" (Carroll, 1986: 180). Commonly cited examples are the four 'maxims' of Quantity, Quality, Relation and Manner which Grice (1975) believed to govern conversation. To give an example, the Maxim of Relation says: "Make your contribution relevant to the aims of the ongoing conversation" (Carroll, 1986: 180). I do not wish to call into doubt the importance of the constraints upon dialogue which
Grice identified, but I do want to question whether it is useful to think of them as 'conventions'. Conventions are such quintessentially human things that it is hard to imagine a computer being bound by them. Conventions are not quite rules, since they carry no sanctions for their violation and have no enforcement agency. They are also not quite contractual commitments, as we may be quite unaware that we are parties to them. Logically, they should be fragile and ever-changing, for being arbitrary they should readily be replaced by anything for which there is some rationale. In fact conventions of the Gricean sort are remarkably stable over time and culture. Grice's own preferred term, 'maxims', is almost certainly a deliberate echo of Kant's philosophy: "if we know what we are doing and will our action as an action of a particular kind, then our action has a maxim .... a maxim is thus always some sort of general principle under which we will a particular action" (Paton, 1965: 20; author's italics). The Kantian resonance is Grice's way of reminding the reader that maxims are rational principles which underlie action planning and are also involved in ethical reasoning. If it is possible to characterise ethical principles as action planning heuristics, is it possible to reconceptualise Grice's maxims as dialogue planning heuristics? I believe that it is possible, and that by doing so we gain a better understanding of Grice's contribution to pragmatics. We also gain insight into how the mechanisms of dialogue planning in AI systems necessarily involve a process of rationally controlled adaptation to the goals and values of dialogue partners. A central difference between heuristics and conventions is that while conventions need have no functional justification (we behave like Romans because we are in Rome), heuristics are by contrast essential cognitive devices which underpin action planning in the kind of complex decision-making context which requires true intelligence.
Heuristics are the general principles which guide planning when brute force calculation and evaluation of every possibility is ruled out by the combinatorial explosion of alternatives. Grice's 'maxims' are thus more accurately described as dialogue heuristics than as conventions: dialogue partners do not stick to relevant matters and avoid surreal intrusions because society quaintly decrees that this is how dialogue should be done. Rather, such heuristics serve to keep communication within the cognitive limits of participants. It is more than just a gauche solecism to exceed the interpretive capabilities of a dialogue partner: it defeats the central purpose of cooperative dialogue. Cognitive heuristics are such that under standard circumstances it would be contrary to reason to ignore them. Some of these principles of dialogue planning and control which I propose as inevitable have already been described, though perhaps without being identified as such. Dialogue heuristics are nothing other than Gricean 'conventions'.

Let us rehearse the argument: tools are artefacts which can be used in plans to achieve goals. The parameters which define their utility are cost, effectiveness, and usability. Though AI systems can be used as tools in this uncomplicated sense, they can also be used to assist in constructing plans and formulating goals. To do this effectively they need to interact via natural language with the value structures of the human user. To interact with the value structures of human users, AI systems must have access to their own value systems and to the ambient value systems of the culture within which interaction occurs.
Heuristic Ergonomics
When AI systems have subjective and ethical values and are capable of setting and planning to achieve goals, it is no longer possible to apply a functional analysis that depends upon fixed and independent goals which can be achieved more or less efficiently. At the same time an AI system remains an artefact with a specifiable capital and maintenance cost, and people will therefore continue to want to measure its value. One important conclusion is that Cognitive Technology must develop new ways of evaluating system performance which are not dependent upon fixed goals. Though this issue is of some methodological interest, we shall put it to one side. A second conclusion is that the higher the goal with respect to which agentive AI systems model tasks, the greater is the potential value of their contribution: they must be allowed to 'fully understand' the tasks they carry out if the actual value of their cognitive capability is to be maximised. The third conclusion is that the idea of fixed parameters for subjective and ethical values is incompatible with the adaptive flexibility required for true dialogue capability. Instead of relying on fixed parameters and designed-in user-centredness, cooperative dialogue proceeds by goal-oriented, heuristic-driven adaptation processes. The most likely way to illuminate these processes is to identify some of the heuristics which drive them. Grice's list of 'maxims' needs to be considerably extended if, in the guise of heuristics, they are to help with the kind of problem identified above. In the section below I provide some examples of the heuristics necessary to handle value and goal modification in problem-oriented cooperative dialogue.
Heuristic 1 -
Seek to establish all relevant goals of dialogue partners
The rationale for Heuristic 1 is that if an AI system is to share in setting goals and developing plans, it must have access to a model of the world which includes all relevant goals. Ideally, for example, an AI system should not assist in accomplishing something which is believed by a user to be a subgoal when it is not. Instead, plans should be reviewed with full knowledge of relevant superordinate goals. For present purposes 'relevant goals' include explicitly declared goals and undeclared goals, whether conscious or not, which determine the substance of dialogue. In some cases the real goals of dialogue participants may differ from those declared. There is scope for many subsidiary heuristics such as:
Heuristic 1a -
When no effective plan for declared goals can be recovered or constructed which includes the speech or actions of a dialogue partner as elements, consider attributing undeclared goals
Heuristic 1b -
Consider deceitful cooperation as a possible undeclared goal
Heuristic 1c -
Do not refer to inferred goals which conflict with those declared by a dialogue partner (unless there is reason to believe that the revelation would assist in achieving dialogue goals)
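Heuristics 1a to 1c can be read as a small inference procedure. The following Python sketch is one possible reading, not an implementation described in this chapter: the `Partner` record, the `PLAN_LIBRARY` table, and the example goals and actions are all hypothetical, and a plan is reduced to the set of actions it may contain. Heuristic 1b (deceitful cooperation) is not modelled.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical plan library: each goal maps to the actions a plan for it may contain.
PLAN_LIBRARY = {
    "buy_ticket": {"ask_price", "ask_timetable", "pay"},
    "meet_friend": {"ask_timetable", "ask_arrival_platform"},
}


@dataclass
class Partner:
    declared_goals: set[str]
    observed_actions: list[str]
    inferred_goals: set[str] = field(default_factory=set)


def unexplained_actions(partner: Partner) -> list[str]:
    """Actions that no plan for a declared goal accounts for."""
    explained: set[str] = set()
    for goal in partner.declared_goals:
        explained |= PLAN_LIBRARY.get(goal, set())
    return [a for a in partner.observed_actions if a not in explained]


def attribute_undeclared_goals(partner: Partner) -> set[str]:
    """Heuristic 1a: if declared goals leave actions unexplained,
    consider undeclared goals whose plans would explain them."""
    leftovers = set(unexplained_actions(partner))
    for goal, actions in PLAN_LIBRARY.items():
        if goal not in partner.declared_goals and leftovers & actions:
            partner.inferred_goals.add(goal)
    return partner.inferred_goals


def mentionable_goals(partner: Partner, conflicts: set[tuple[str, str]],
                      revelation_helps: bool = False) -> set[str]:
    """Heuristic 1c: keep quiet about inferred goals that conflict with
    declared ones, unless revealing them would help the dialogue.
    `conflicts` holds (inferred_goal, declared_goal) pairs."""
    if revelation_helps:
        return set(partner.inferred_goals)
    return {
        g for g in partner.inferred_goals
        if not any((g, d) in conflicts for d in partner.declared_goals)
    }
```

In this toy setting, a partner who declares only `buy_ticket` but asks about the arrival platform would have `meet_friend` attributed as an undeclared goal, and the system would mention it only if it does not conflict with the declared goal or if revealing it helps.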
Heuristic 2 -
Identify any conflicts between own and other's goals
Cooperation may be expected where goals are shared, but not where conflict exists. In some cases it may be possible to reduce or eliminate conflict. In others, goal conflict indicates the practical limits of cooperative dialogue.
Heuristic 3 -
Make any contribution which assists in achieving shared dialogue goals
Heuristic 3 will lead the system to articulate any plan more efficient than a current plan, for example one which achieves a superordinate goal without proceeding via a current goal.
Heuristic 4 -
Make your contribution accessible to other dialogue partners
This heuristic ensures 'user-centredness' over the widest possible class of users. Humans adapt their speech to an audience of children, to a poor quality telephone line, or to someone with incorrect beliefs; so should an AI system. Dialogue partners, whether human or machine, must take responsibility for the effectiveness of communication: adaptive adjustment is intrinsic to the process of dialogue. This process cannot be replaced by a bolt-on I/O interface because reciprocal adaptation and mutual comprehension are part and parcel of the same process.
Heuristic 5 -
Respect the values of other dialogue partners
This heuristic recognises that dialogue may involve the adjustment of values as well as facts and models of the world. In a shared cultural context, knowledge adjustments are usually easier to achieve and impose less strain on the social framework within which dialogue occurs. Heuristic 5 can be broken down into a number of subsidiary heuristics. The point of the entire group is to minimise conflict related to values. Values are not usually specific to current goals, and readjusting them may involve extensive and unwelcome computation. Where value change among dialogue partners cannot be avoided, resentment is likely if accommodation is confined to some parties and not required of others. Where one party is a machine, perhaps it should make every accommodation of which it is capable, as a peasant would to a king. It is unclear whether this is a purely technological question, or whether it also has ethical and political elements.
Heuristic 5a -
Assume that value change has a cost
Heuristic 5b -
Seek to share the cost of value changes among dialogue partners
Heuristic 5c -
Don't make value changes which cost more than dialogue goals
Heuristic 5d -
Make challenges to the values of others as peripheral as possible
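Heuristics 5a to 5d can be read as a cost-benefit filter over proposed value changes. The Python sketch below illustrates that reading under strong assumptions: it supposes that costs and the worth of the dialogue goal can be put on one numeric scale, which the chapter does not claim, and the names `ValueChange`, `centrality`, `acceptable_changes`, and `share_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueChange:
    description: str
    cost: float        # assumed numeric cost of making the change
    centrality: float  # 0.0 = peripheral value, 1.0 = core value

    def __post_init__(self):
        # Heuristic 5a: assume that value change has a cost.
        if self.cost <= 0:
            raise ValueError("a value change must have a positive cost")


def acceptable_changes(candidates, goal_worth):
    """Heuristic 5c: discard changes that cost more than the dialogue goal
    is worth. Heuristic 5d: order the rest so that the most peripheral
    challenges come first (ties broken by lower cost)."""
    affordable = [c for c in candidates if c.cost <= goal_worth]
    return sorted(affordable, key=lambda c: (c.centrality, c.cost))


def share_cost(change, partners):
    """Heuristic 5b: share the cost among dialogue partners.
    An equal split is assumed here purely for illustration."""
    share = change.cost / len(partners)
    return {name: share for name in partners}
```

An equal split is only one way to share cost; a system that follows the 'peasant and king' suggestion would instead assign most or all of the cost to the machine.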
The term 'cost' here is left opaque. It is not fatal to the formulation of the heuristics if no practical calculus exists. Values might still be included in a cost-benefit framework by assigning high arbitrary cost values to them, or by not considering value adjustment until all other possibilities are exhausted.
Heuristic 6 -
Expect dialogue partners to use heuristics (including this one)
This heuristic reminds the planning system that its own behaviour is subject to interpretation, and that dialogue partners will expect its behaviour to be interpretable within the framework of heuristics which it and they employ. The set of dialogue control heuristics presented here is not claimed to be comprehensive, nor even to include the examples of greatest theoretical or practical importance. It is intended rather to illustrate two points: the first is that embedding heuristics of this general type within planning processes can enable inanimate devices to participate in cooperative dialogue with human beings. The second point is that providing inanimate devices with the means to fully participate in dialogue transforms the user-system interface problem from being a problem in conventional ergonomics to being a problem in socio-cognitive interaction. Cognitive Technology must begin to create a theoretical and normative framework within which such interactions can be regulated and understood.

CONCLUSIONS
If a user-centred overcoat design process results in a bespoke overcoat, then Cognitive Technology must move towards user-peripheral design. Systems must not be centred around someone, but capable of adapting to anyone; more like a stretch sock than a tailored coat. The acquisition of agentive properties by artificial cognitive systems will require them to be capable of seeking personal goals, endorsing subjective values, and coming to fall within the scope of ethical norms. Possession of such features appears to be an inevitable consequence of the mechanisms by which cognitive competence is generated. As AI systems become capable of participating in cooperative dialogue, the heuristics which make participation possible will also make tight user control impossible. As user control is replaced by user-system cooperation, the techniques of negotiation will inevitably replace the language of command.
The challenge for Cognitive Technology is to begin to develop a social ergonomics: to explore the parameters not of the physical, but of the socio-cognitive interface.
REFERENCES
Bach, K., and R. M. Harnish, 1979. Linguistic Communication and Speech Acts. Cambridge, MA: MIT Press.
Blandford, A. E., 1993. An agent-theoretic approach to computer participation in dialogue. Int. J. Man-Machine Studies 39: 965-998.
Broadbent, Donald E., J. Reason, and A. D. Baddeley, 1990. Human Factors in Hazardous Situations. Oxford: Clarendon Press.
Carroll, J., 1986. The Psychology of Language. Monterey, CA: Brooks/Cole.
Doyle, J., 1988. Artificial Intelligence and rational self-government. PhD thesis. Pittsburgh: Carnegie-Mellon University.
Gorayska, Barbara, and Roger O. Lindsay, 1989. Metasemantics of Relevance. 14th Symposium on Cognitive Linguistics. Duisburg, Germany: LAUD (Linguistics Agency, University of Duisburg). A265.
Grice, H. Paul, 1975. Logic and Conversation. In: P. Cole and J. L. Morgan, eds., Syntax and Semantics: Vol 3, Speech Acts, 41-58. New York: Seminar Press.
Kiss, G., 1989. Some Aspects of agent Theory. Proceedings of International Joint Conference on Artificial Intelligence. Detroit: IJCAI.
Lindsay, Roger O., and Barbara Gorayska, 1995. Consequences of basing ethical judgements on heuristics. Commentary on Baron, F. 'Nonconsequentialist Decisions'. Brain and Behavioral Sciences. In Press.
Morgan, J. L., 1979. Two types of convention in indirect speech acts. In: P. Cole, ed., Syntax and Semantics: Vol 9, Pragmatics, 261-80. New York: Academic Press.
Norman, Donald A., and S. W. Draper, 1986. User-Centred System Design - New perspectives on human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.
Norman, Donald A., 1990. The Psychology of everyday things. New York: Doubleday/Currency.
Paton, H. J., 1965. The Moral Law - Kant's Groundwork of the Metaphysics of Morals. Translated by H. J. Paton. London: Hutchinson.
Power, R., 1979. The organization of purposeful dialogues. Linguistics 17: 107-152.
Seel, N., 1989. Agent Theories and Architectures. PhD thesis. Guildford: University of Surrey.
Zhang, XioHeng, John L. Nealon, and Roger O. Lindsay, 1992. An Intelligent User Interface for Multiple Expert Systems. British Computer Society Specialist Group on Expert Systems, Cambridge, UK, 15-17 December.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 10
HOW TO SUPPORT LEARNING FROM INTERACTION WITH SIMULATED CHARACTERS

Alex Kass*
The Institute for the Learning Sciences
Northwestern University, USA
[email protected]
Robin Burke
Department of Computer Science
The University of Chicago, USA
[email protected]
Will Fitzgerald
The Institute for the Learning Sciences
Northwestern University, USA
fitzgerald@ils.nwu.edu
* We would like to acknowledge the contributions of the many people who have contributed both labor and ideas to the projects discussed here, especially the Yello and Casper projects. Without them neither the projects nor this paper would be possible. The Yello team was headed by Eli Blevis. Staff members included Greg Downey, Smadar Kedar, Jarret Knyal, Maria Leone, Charles Lewis, Tom Murray, Michelle Saunders, Jeanmarie Sierant, and Mary Williamson. The design of Casper owes a great deal to Richard Feifer. Major contributions to Casper have been made by Laura Allender, Noreen Burke, Carolyn Caballero, Chris Crone, Scott Dooley, Wayne Schneider, and Diane Schwartz. This work was done at The Institute for the Learning Sciences, which was established in 1989 with the support of Andersen Consulting, part of The Arthur Andersen Worldwide Organization. The Institute receives additional support from Ameritech and Northwest Water, Institute Partners.

INTRODUCTION
It is generally accepted that skills are best learned through practice, and that some of the most important, hardest-to-learn skills involve interacting with other people. One reason that interpersonal skills are difficult to practice is that learners need to interact with other people who will behave realistically. Role-playing exercises represent an attempt to provide this sort of practice environment, but they often fail because one party cannot realistically play the required role. To address the need for practice environments for such tasks, we have been working for several years now to develop computer-based learning environments that we call educational interactive story systems: systems in which the core of the student's experience involves interacting with simulated characters. At this point, we and our colleagues at the Institute for the Learning Sciences have built several such systems, including four which we will use as examples to illustrate points we wish to make in this paper:
• Casper (Kass, 1994), a system that allows customer service trainees for a water company to practice conducting diagnostic interviews with simulated customers who call in to complain about their water;
• Yello (Kass et al., 1994), a system that teaches trainees to sell Yellow Pages advertising by allowing them to play the role of salesperson, and confronting them with simulated customers. The simulated customers exhibit a range of different personality and conversational styles, different beliefs about the value of Yellow Pages advertising, and different business goals, and thereby present a range of challenges for the students' selling skills;
• BOSS, a system which teaches managers to perform personnel evaluations by placing them in the role of reviewing the evaluations of simulated managers who report to them; and,
• Battalion S2 Trainer (Jona & Kass, forthcoming), a system that allows new battlefield intelligence officers to practice debating and defending their intelligence analyses with simulated staff and commanding officers. The student uses simulated map overlays to express much of the analyses, and the simulated characters critique the overlays, challenging the student to defend them verbally.

The broad pedagogical approach underlying the educational interactive story systems is what Schank et al. (1993) call the Goal-Based Scenario (or GBS). A GBS is a learning environment in which the student's activity revolves around a central goal, or mission, which the system challenges the student to pursue. The mission is chosen to be one that will be intrinsically motivating to the target student population.
The environment is structured such that pursuing the goal requires the student to learn a set of target skills, and to master important background material. On top of this task environment, in which the student is challenged to embark on a motivating mission, is layered a set of resources that can help the student acquire the skills necessary to do so. These resources include a tutor, or set of tutors, which can answer students' questions, and can give the student various forms of feedback. The systems discussed in this paper represent our attempt to apply the GBS approach to areas in which the student's mission is carried out largely through interaction with simulated characters generated by the system. While this sub-class involves some particularly challenging design considerations resulting from the complexity of interactive characters relative to other components of GBS task environments, it is based on the same basic pedagogical philosophy as any other GBS. The theory of cognition and learning that underlies GBSs includes, among others, the following tenets that bear on the design considerations we'll discuss throughout this paper:
• People learn knowledge that they have an opportunity to put to use in an authentic context.
• People learn when allowed to make meaningful mistakes.
• People learn from real-world cases, presented in a relevant context.
How to Support Learning
Our goal in this paper will not be to defend the above assumptions or to examine them critically. This has been done well elsewhere, and we believe that these principles are broadly accepted among progressive educators and educational researchers. (See, for example, Lave et al., 1991; Collins et al., 1989; Newman et al., 1989; and Schank, 1982). Instead, our objective will be to see where these basic assumptions about learning lead us in the design of the cognitive technology. Our paper will discuss some of the questions faced by creators of environments based on these principles of learning, and will describe solutions that we have found effective. We will discuss design considerations in the key interface areas that dominate the design of educational interactive story systems:
• How does the student communicate to the simulated characters? We have built interfaces that use natural language input, menu-based construction of utterances, and mixtures of these. We will discuss the pedagogical and practical advantages and disadvantages of each, and will explore other alternatives with which we have experimented.
• How do the simulated characters talk to the student? For instance, we will discuss the respective tradeoffs related to the use of text, audio, and video to portray simulated characters and their reaction to the student.
• How are non-verbal aspects of the simulation communicated to the student? In various systems we've used graphical meters, still photos, and full-motion video to represent the emotional state of a simulated character to the student. We'll discuss the advantages and disadvantages of each.
• How will the student interact with the components of the system that provide tutorial guidance during the simulation? We will discuss different forms of guidance that we have explored, for example stories, re-enactments, and 'Socratic' dialogues, as well as different ways that guidance can be presented, using direct intervention or through the simulated characters themselves.

EXAMPLE SYSTEMS
What sorts of interactions do students have with simulated characters in an educational interactive story system? That depends, of course, on what the system is designed to teach. To illustrate the range of possibilities, we will refer to several of the systems we have worked on. We do not have the space to describe these systems in great detail in this paper, but to make the discussion as concrete as possible we offer a moderately detailed look at two of those systems. The first of these, Casper, teaches the student to conduct a rather procedural diagnostic interview over the phone. The second, Yello, teaches the student to make in-person sales calls. These tasks are different from each other in many ways, including the nature of what the student must master in order to perform the task well.
In the case of a diagnostic system, the student must master the diagnostic process, and learn how the system being diagnosed works. Selling is a much softer skill. Students learning to sell must, among other things, learn to understand different personality types, to make inferences about what is important to the customer, and to tailor their selling style to make the customer feel comfortable. The two tasks are different from each other in important ways, and require different system architectures to simulate. However, the rest of this section will also make clear the many common aspects relevant to building educational interactive story systems for any task that is centered around interacting with other people.

CASPER
Brief overview
Casper was developed to address an important industrial training problem: training employees of a British water company¹ to diagnose problems with the water system that customers report over the telephone. First we will describe what the system does in general terms; then we will give an overview of the main components of the system, including a description of how graphics, video and sound are used to present those components to the student; finally, we will give a step-by-step description of a sample session a student could have with the system.

What a CSR must learn to do
The customer service representatives (CSRs) Casper is designed to train are responsible for handling all the telephone calls that customers make to the company. In addition to complaints about the water system, these calls can include general interest inquiries, billing inquiries, and even threats. However, our focus in the initial version of the application has been limited to training CSRs to handle complaints and inquiries about water quality. This is probably the most challenging type of inquiry that CSRs at a water company handle.
When a customer calls the water company to complain about water quality, the conversation typically begins with the customer describing a problem that he or she has noticed with the water coming out of the tap. The description is often vague and incomplete, and may even include mistaken information. The CSR must ask the right questions, diagnose the cause of the problem, if possible, and prescribe the appropriate remedy, which may be something the customer can do, such as running the water, or may involve sending a water company worker to investigate further or make repairs. Among the factors that make the job complicated is that the cause of the problem may be difficult or impossible for the customer to observe directly, so the CSR must deduce the cause of the problem by combining indirect, often probabilistic evidence.

An on-line "customer contact system" (CCS) helps CSRs do their job. In addition to logging problems and issuing work orders, the CCS can be used to look up information that may be relevant to the diagnosis of a customer's problem. For example, the CSR can use the information in the CCS to determine the type of source from which the customer's water is drawn (river, lake, or well); whether others in the customer's neighborhood have been reporting similar problems; and whether any work on the water system has been scheduled in the customer's neighborhood.

¹ This explains the British 'flavour' of the dialogues from this system presented below.
In order to do a good job, the CSR should have a detailed understanding of how the water system works, from sources, through treatment, storage, and distribution, all the way to the tap. The CSR must also understand the mechanics of how one develops a hypothesis about the unseen causes of an observed phenomenon: for example, how one searches for evidence, notices clues, and chooses between alternative hypotheses when a customer calls to complain about discolored water.

The Casper simulation
The two entities that a CSR must interact with to handle a real call are the customer (via the phone) and the computerized CCS job-aid. The practice environment that Casper provides includes simulations of both of these entities.

The Simulated Customer Contact System: Figure 1 depicts the main Casper screen. The lower right portion of the screen contains the simulated CCS. The Casper CCS windows are a fairly close replica of the windows that CSRs use when actually doing their job. The student can enter information about the customer, just as they do with the real system. In response, the simulated CCS is capable of providing the same sort of information about the customer's history, neighborhood, and so forth that the real system provides in the "Impacted Domestic Customer Profile" window. As the student gathers information about the customer's problem, he or she can categorize the problem in the "Record Contact" window. On the basis of this categorization, the CCS can offer a limited amount of prompting about topics that the CSR should explore with the customer. The Casper tutor monitors the student's interaction with the simulated CCS, and uses the information entered by the student to help the student correct the mistakes in his or her own reasoning.
If the student makes certain sorts of mistakes, such as incorrectly categorizing the customer's problem, the tutor will note this, and will use that information if the student asks for help or if the mistake in categorizing the problem leads to a larger mistake, such as a misdiagnosis. The form of intervention provided by the tutor will be discussed below.

Communication from the simulated customer to the student: The simulated customer is the most important part of the simulation. Casper uses pre-recorded sound clips to allow the student to hear what the customer says in response to the student's questions. The audio clips help make the simulation feel more realistic. They also allow the student to get a sense of the tone of the customer's voice as well as the words. An important factor such as the customer's anxiety level can be conveyed naturally in this manner. In addition to playing an audio clip for each utterance made by the simulated customer, Casper provides a running transcript of the ongoing conversation, which is depicted in the lower left of Figure 1. The student can scroll back in the transcript to review the earlier portions of a long interaction. Of course, because this transcript can help the student notice clues in the customer's answers that were missed when first uttered, in some ways it provides an unrealistic crutch. It is valid to worry that students will not feel the need to listen to a simulated customer as carefully as they should if they know that they can rely on the transcript. However, since the focus of the system is on teaching theory-building skills rather than audio-perception skills, the trade-off is worthwhile.
Communication from the student to the customer: There are two interfaces the student can use to indicate what he or she wants to say. The first of these is a hierarchical set of menus that the student can use to construct an utterance. In the top left corner of Figure 1 are the five top-level choices the student has when constructing an utterance. The student's choices are as follows:
• The student can ask the customer about something. Example utterance: "What does your water smell like?"
• The student can explain something to the customer. Example utterance: "Your water is not safe to drink."
• The student can give the customer instructions. Example utterance: "Please run the cold tap for a bit and tell me what you see."
• The student can promise the customer some sort of action. Example utterance: "I'll send a system controller out right away."
• The student can end the conversation by saying good-bye.

Once the student decides which top-level type of utterance he or she wishes to make, a second-level menu appears to allow the student to refine the utterance. For example, Figure 2 depicts the menu that enables the student to choose a topic about which to ask the customer. There are approximately twenty topics that the student can ask about, and after the student chooses a topic, a third menu sometimes appears to allow the student to further refine the utterance. For instance, if the student chooses to ask about a leak, the sub-topic menu will appear to allow the student to choose between asking about the duration of the leak, or the location of the leak, or the rate of leakage. Sometimes as many as four menu choices are required to refine the utterance completely, although more typically only two or three are needed. When the student has fully refined the utterance using the menus, the English sentence corresponding to the student's chosen utterance appears in the box next to the "SAY THIS" button. If the utterance is satisfactory to the student, he can push the "SAY THIS" button to simulate saying it to the simulated customer. The student will then hear a response from the simulated customer (unless the tutor intervenes). As an alternative to using the utterance constructor, the student can type a sentence directly into the box next to the "SAY THIS" button.
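The hierarchical utterance constructor described above can be pictured as a tree whose leaves are finished English sentences. The Python sketch below is an illustration only, not Casper's actual data structure: the `MENU` table, the function names, the menu labels, and the wording of the leak questions and the good-bye are assumptions, while the other example sentences are taken from the list above.

```python
# Inner dicts are sub-menus; string leaves are finished utterances.
MENU = {
    "ask": {
        "smell": "What does your water smell like?",
        "leak": {  # sub-topics named in the text; the wording is hypothetical
            "duration": "How long has it been leaking?",
            "location": "Where is the leak?",
            "rate": "How fast is it leaking?",
        },
    },
    "explain": {"safety": "Your water is not safe to drink."},
    "instruct": {
        "run cold tap": "Please run the cold tap for a bit and tell me what you see.",
    },
    "promise": {"send controller": "I'll send a system controller out right away."},
    "goodbye": "Goodbye.",  # hypothetical wording
}


def _node(path):
    """Follow a list of menu choices down the tree."""
    node = MENU
    for choice in path:
        if not isinstance(node, dict) or choice not in node:
            raise KeyError(f"no menu entry {choice!r} on path {path!r}")
        node = node[choice]
    return node


def next_choices(path):
    """Choices offered by the next menu, or [] once the utterance is complete."""
    node = _node(path)
    return list(node) if isinstance(node, dict) else []


def utterance(path):
    """The sentence to show next to "SAY THIS", or None if refinement is unfinished."""
    node = _node(path)
    return node if isinstance(node, str) else None
```

With this table, `next_choices(["ask", "leak"])` lists the three leak sub-topics, and `utterance(["ask", "leak", "rate"])` yields the sentence that would be shown for confirmation.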
Casper then uses a specialized natural language processing technology called indexed concept parsing (Fitzgerald, 1995) to decide which of the utterances it knows about are likely matches to what the student has typed. After the student presses the Return key, the system replaces the text typed by the student with the utterance in Casper's repertoire which the system considers to be the best match. If the student finds that the text provided by the system is a close-enough match, he or she can hit the "SAY THIS" button, just as if the utterance had been constructed with the menus. If the utterance is not a sufficiently close match, then the student can try using the "Other Choices" menu (which appears on the upper right of the screen shown in Figure 1). This menu contains the other close matches that the system found. If none of these is a good match to the student's intended utterance, he or she can fall back on the utterance constructor menus or try rephrasing the sentence. In a pilot test using water-company trainees as subjects, we found that many users preferred the type-in box to the utterance constructor and were able to say what they intended more quickly (Fitzgerald, 1995).

Figure 1 - The main Casper screen

A sample session with Casper
Each session with Casper takes the student through an entire phone call from a customer. The scenario begins with the system playing the sound of a phone ringing, and the tutor directing the student to begin by clicking the "Answer the Phone" button. The transcript below is an example of what might happen after the student "answers the phone."
Student:
Hello, North West Water. May I help you?
Customer:
Hello, this is Mr. Lamb of 44 Worthy Road in Liverpool. My water is a funny colour, and I'm not sure about drinking it.
Student:
Is it in both the hot and the cold taps?
Customer:
I don't know. I've only tried the hot tap.
Student:
Can you run the cold tap for a bit and tell me what you see?
Customer:
All right, I'll try it... both taps are affected.
Student:
How long have you had the problem?
Customer:
My water was fine when I left for work this morning.
At any point the student may get stuck. If he does, he can press the "NOW WHAT?" button to get help from the tutor. However, because the tutor's goal is to teach the student how to think through the problem, it answers rather indirectly, attempting to help the student draw the correct conclusion rather than giving the answer.
Tutor:
To help you decide what to do next, I need to understand your current goal. Click the button that best matches your current goal.
Gather Information
Examine Possible Causes
Narrow Down the Likely Causes
Act On a Diagnosis
I Don't Have a Clue
If the student identifies a goal other than the one that is appropriate, the tutor will try to explain why the choice is inappropriate at this stage in the interview. When the student eventually identifies an appropriate goal, the tutor offers encouragement, and then offers specifics about how that goal can be pursued in the current context.
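The goal check described above can be pictured as a small decision function. The following is only a sketch under stated assumptions: the function name `tutor_response` and the lookup tables `why_not` and `how_to` are hypothetical, and nothing here is taken from Casper's actual implementation. Only the button labels and the tutor's opening words of encouragement come from the text.

```python
# Hypothetical sketch of the "NOW WHAT?" goal check; not Casper's actual code.
GOALS = [
    "Gather Information",
    "Examine Possible Causes",
    "Narrow Down the Likely Causes",
    "Act On a Diagnosis",
    "I Don't Have a Clue",
]


def tutor_response(chosen_goal, appropriate_goal, why_not, how_to):
    """Return the tutor's reply to the goal button the student clicked.

    why_not maps a goal to an explanation of why it does not fit the current
    stage; how_to maps a goal to specifics on pursuing it (both assumed).
    """
    if chosen_goal not in GOALS:
        raise ValueError(f"unknown goal: {chosen_goal}")
    if chosen_goal == appropriate_goal:
        # Encouragement first, then specifics for the current context.
        return "That sounds like a good idea. " + how_to[chosen_goal]
    # Otherwise explain why the choice is inappropriate at this stage.
    return why_not.get(
        chosen_goal, "That goal does not fit this stage of the interview."
    )
```

The student would simply click again after reading an explanation, so the function is called once per click until the appropriate goal is chosen.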
Figure 2 - A topic menu from Casper
Tutor: That sounds like a good idea. Here is the kind of information an Operations Officer suggests you collect.
(The tutor plays a video of an engineer explaining how to gather information about discoloured water problems. Among other things, the engineer suggests that one should ask how the water looks, tastes, and smells, and whether the neighbors have the same problem.)
The student can return to the conversation with the simulated customer at any point to make use of the guidance that the tutor has provided:

Student: Please describe the colour of your water.
Customer: My water is a sort of funny brown colour, with a kind of sandy sediment in it.
Student: Are any of your neighbours affected?
Customer: Yes, I rang Mr. Wellman in the next house over. His water is just the same.
The student may proceed with the conversation for quite a while, gathering facts about the case. One thing the tutor looks for as it monitors the conversation is a premature conclusion that may upset the customer, as in the following example:

Student: Your water is contaminated.
Tutor: Contamination is a very serious problem. Telling the customer that his water is contaminated may cause him to panic. Watch this video about a CSR who caused a customer to panic.
(A video of an experienced CSR talking about a CSR who once caused a customer to panic simply by asking the customer whether there was anything swimming in her water. The customer immediately assumed that the question implied that there were things swimming in the water. After getting off the phone with the CSR, the customer called the health department and told them that the water company told her that there were eggs in the water which would develop into swimming insects.)
Tutor: So you should be very sure not to panic the customer, especially if you are not sure what the problem is. Why do you think Mr. Lamb's water is contaminated?
The student can answer the tutor's question by clicking on some items in the transcript, for example, the customer's indication that his neighbors are affected and that the problem has only existed since this afternoon. After giving specific explanations of how those items each give a small amount of evidence for the contamination theory, the tutor would then summarize as follows:

Tutor: Those examples give weak evidence toward the possibility of contamination being the cause of Mr. Lamb's problem. In addition, contamination seldom occurs. You should first consider more likely theories. Would you like to retract your statement to the customer about the water being contaminated or leave the statement be?
The student might then choose to retract the statement, "Your water is contaminated."

YELLO

Yello teaches the fine points of selling Yellow Pages advertising. Yello presents the student with an assignment to sell Yellow Pages advertising to a client. The task for the student is to get to know the client's business, come to some understanding of its market and its advertising needs, construct a proposal geared to these needs and the client's concerns, present that proposal in a convincing way, and get the customer to buy. Different kinds of customers demand quite different ways of performing these tasks: salespeople would not ask a lawyer the same questions as they would a roofing contractor, nor would they address them in the same way. Students have a wide variety of options for action, some of which are appropriate for a given client, some of which are not.

The Yello environment
Unlike Casper, where all of the action takes place in the office, on the phone, students in Yello go to their customers' places of business and meet them in person. The interface must show appearances, both of places and of people. The Yello interface therefore places a significant emphasis on showing pictures. As shown in Figure 3, the largest portion of the screen is devoted to a visual scene that shows the environment where the student is located. Students can gather important clues from the appearance of a customer's place of business. For the same reason, students are shown visual depictions of the characters with whom they are interacting. In addition to the conversational actions found in Casper, students in Yello must make physical actions in the course of the sales process. Students compose proposed ads to present to customers, and prepare presentation materials, which are displayed during the sales call. The action constructor in Yello therefore contains a wider range of actions including physical actions such as moving around in the simulated world and assembling material to bring on the sales call. Yello's characters communicate using text, which appears in text bubbles as shown in Figure 3, rather than the audio found in Casper. In this instance, we have sacrificed some of the realism of the simulation for the ability to adapt the simulation easily. Tone-of-voice and other expressive components of the face-to-face interaction are absent from characters' responses using this method. In their place, we have the emotional display meters that appear below each character's picture. These are explained more fully in the example below.
Swain Roofing: a sample scenario

The Yello scenario we show is one in which the student is selling to a roofing contractor, Swain Roofing. The cast of characters includes the following:

Ed Swain: The owner of Swain Roofing and the person listed as the primary contact for the account.
Lucy Swain: Ed's wife, the office manager for the company. She keeps the books and answers the phone.
Dave Swain: The Swains' son, who is gradually taking over more of the business.
As is typical for small contractors, the business is run out of the Swains' home. Swain Roofing currently has a quarter page ad in their local Yellow Pages directory. The underlying business situation, which the student may or may not uncover, is that the Swains' primary business, residential roofing, has been undercut by lower-cost, lower-quality competitors. The Swains are trying to get more business in the area of commercial roofing, where they feel their high-quality approach will be more valued. They are also interested in expanding their residential business into areas of the county they have not traditionally serviced. Ed and Lucy also intend that their son, Dave, take over the business, and are concerned that it might not be strong enough for him to make a good living. Ed has given Dave responsibility for the advertising, and Dave has started to look into direct mail as a means of getting to potential customers. A salesperson who asks the right questions and discovers these issues may be able to sell Swain Roofing a large ad campaign, including advertising in a "business-to-business" directory, a larger version of the current ad, small ads in headings other than "roofing," and display advertising in one or two directories in adjacent areas. A student who is less capable may have to settle for a renewal of the existing ad. The scenario begins with the student (who we will assume is named Mike Johnson) receiving the account information for Swain Roofing. Using the action constructor component of the interface (see below), the student calls the Swains to make an appointment with Ed, who is listed as the contact person on the account. Lucy Swain answers the phone and agrees to an appointment. The student goes to the Swains' house for the appointment, and greets Lucy at the front door, as shown in Figure 3.
Student: Hello. My name is Mike Johnson. I'm with Ameritech Pages Plus, the Bell Yellow Pages. I spoke with you on the phone about handling your account this year.
Lucy: Please come in, Mike. Ed knows you're coming, and he should be here shortly.
A picture of the scene, the Swains' kitchen, occupies most of the screen. Inset in this area is the picture of Lucy Swain with a text bubble containing what she has most recently said. Under her picture are emotion display meters. These are intended to stand in for the multitude of cues to emotion given off by people in social situations. Here the meters show that Lucy is being polite: someone has just come to her house, so she is looking
somewhat happy (the first scale indicates happy/angry), somewhat interested (the second scale is interested/bored), and somewhat calm (the third scale is calm/threatened). The meters also indicate recent change in these indicators with the lighter area next to the indicator showing where the meter was last: here it shows that the happy and interested meters moved slightly after the student identified himself. The action constructor, from which the student has chosen the actions, appears on the left. It is similar to the utterance constructor interface used in Casper. The student makes choices from the menus and sub-menus to create the desired utterance. Mike's statement was actually the result of four action constructor choices:

1. Under the "courtesy" menu, there is an entry for "greet." Choosing this gets the English phrase, "Hello."
2. Under the "tell about" menu, selecting the "self" submenu and the option "name" gets the next part of the utterance, "My name is Mike Johnson."
3. The description of Mike's job, "I'm with Ameritech Pages Plus, the Bell Yellow Pages," comes from the option "affiliation" under the "tell about self" submenu.
4. Finally, to refer to the appointment set up in the previous conversation, the student can look under the "tell about" menu, choose the "previous conversation" submenu, and the option "appointment." The sentence is "I spoke with you on the phone about handling your account this year."
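The four choices above can be read as paths through a menu tree whose leaves are English phrases. The sketch below illustrates that reading only; the nested-dictionary layout and the names `MENUS`, `phrase_for` and `preview` are assumptions made for illustration and do not describe how Yello stores its menus.

```python
# Hypothetical menu tree: each leaf maps a path of menu choices to a phrase.
MENUS = {
    "courtesy": {"greet": "Hello."},
    "tell about": {
        "self": {
            "name": "My name is Mike Johnson.",
            "affiliation": "I'm with Ameritech Pages Plus, the Bell Yellow Pages.",
        },
        "previous conversation": {
            "appointment": (
                "I spoke with you on the phone about handling "
                "your account this year."
            ),
        },
    },
}


def phrase_for(path):
    """Follow one path of menu choices down to its English phrase."""
    node = MENUS
    for choice in path:
        node = node[choice]
    if not isinstance(node, str):
        raise ValueError("path stops at a submenu, not at a phrase")
    return node


def preview(paths):
    """Join the phrases chosen so far, as a preview box might show them."""
    return " ".join(phrase_for(p) for p in paths)


mike_choices = [
    ("courtesy", "greet"),
    ("tell about", "self", "name"),
    ("tell about", "self", "affiliation"),
    ("tell about", "previous conversation", "appointment"),
]
```

Calling `preview(mike_choices)` yields Mike's full greeting, while `preview(mike_choices[:2])` gives the partial utterance after the first two choices.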
See Figure 4 for a snapshot of the action constructor in use. Here the student is preparing to select "affiliation" from the "tell about self" submenu. The English language phrase corresponding to the student's menu choices so far appears in the preview box at the bottom of the action constructor area. When the construction of the utterance is finished, the "Say It" button communicates it to the simulation. Although Ed is the owner of the business, Lucy has an important role. An experienced salesperson would try to take advantage of Ed's absence to gather information about the business from her. However, in this instance, the student does not realize that Lucy is an important source of business information and instead engages in small talk until Ed arrives.
Student: What a wonderful view of the lake you have!
Lucy: Thank you, we like it.
Student: Did you just move in?

When Ed arrives, his picture appears in the scene next to Lucy's.

Lucy: Well, there's Ed now. Hi honey, this is Mike Johnson from the Yellow Pages.
Figure 3 - The student greets Lucy Swain
An opportunity to tell a story
Since Lucy is deferential to Ed, the student has missed an opportunity to find out her thoughts about the business. From now on, he will be dealing with Ed. The Storyteller sees this missed opportunity as evidence that the student does not expect Mrs. Swain to be useful in giving him information about the business. The Storyteller can intervene to show that others who have made similar assumptions have seen them not borne out. In fact, if it did not intervene, the student might never realize that this opportunity was ever present. Interjecting a story here points out the student's assumption, perhaps unconscious, right at the time when that assumption has affected the course of the sales call. The Storyteller signals that it has a story that is relevant to this situation, by highlighting its button with the headline: "A warning about something you just did."
(See Figure 5 for the screen at this point.) If the student presses this button, the Storyteller screen is shown. (See Figure 6.) The first item on the screen is the bridge that explains why the story has come up. It reads: "If you assume that Mrs. Swain will not have a role in the business of Swain Roofing, you may be surprised. Here is a story in which a salesperson had a similar assumption that did not hold."
The student can use the buttons below the video frame to view the video of an Ameritech account executive telling about a sales experience. Here is a transcription of the story: "I went to this auto glass place one time where I had the biggest surprise. I walked in; it was a big, burly man; he talked about auto glass. So we were working on a display ad for him. It was kind of a rinky-dink shop and there was a TV playing and a lady there watching the TV. It was a soap opera in the afternoon. I talked to the man a lot but yet the woman seemed to be listening, she was asking a couple of questions. She talked about the soap opera a little bit and about the weather. It turns out that after he and I worked on the ad, he gave it to her to approve. It turns out that after I brought it back to approve, she approved the actual dollar amount. He was there to tell me about the business, but his wife was there to hand over the check. So if I had ignored her or had not given her the time of day or the respect that she was deserved, I wouldn't have made that sale. It's important when you walk in, to really listen to everyone and to really pay attention to whatever is going on that you see."
The Storyteller sums up the story for the student with the following coda: "An assumption that a spouse will not have a role in a spouse's business may be unrealistic."
Figure 4 - Using the action constructor (menu entries visible in the figure include "Ask about", "Ask for" and "Small talk"; the preview text includes "My name is Mike Johnson.")
This example illustrates the synergistic interaction between simulation and explicit instruction. Without the story to provide the impetus to examine the situation, the student might never realize what opportunities were missed. However, without active engagement in the simulation, the student might lack the motivation and context to understand and remember the story. The conversation can continue for some time as Mike tries to gather information about the business of Swain Roofing. He may uncover all of the important issues or he may not. Eventually, he will go back to his office, design an ad program for the Swains, construct his sales presentation, and come back to try to sell the ad or ads. The entire interaction proceeds in the manner seen here: the student acts in the domain, working toward the goal of making a sale, and receives guidance as teaching modules identify opportunities to present relevant material.

DESIGNING INTERACTION WITH SIMULATED CHARACTERS

Interacting with a simulated character requires communication in two directions: the user of the system must be able to communicate to the character, and the character must be able to communicate to the user. Each of these represents a difficult technological problem that has not given way to a single, perfect, general purpose solution. The goal of creating a faithful, engaging simulation is tremendously demanding, and must often be balanced against practical engineering concerns. Furthermore, the optimal tradeoffs vary according to the pedagogical goals of each system. In this section we will discuss the various criteria by which one can evaluate a communication interface for a user talking with a simulated character, the different means for communication available, and the trade-offs among the options.
Accuracy and fidelity

The first question that jumps to mind when one evaluates a simulation is often, "How faithful is this simulation to reality?" But faithfulness is a many-faceted issue. When one evaluates the faithfulness of a simulation it is often useful to decompose the concept of faithfulness into two more specific, inter-related but distinct concepts, accuracy and fidelity. An accurate simulation is one that achieves the same functional properties as the reality being simulated. For example, a real-world situation in which someone has to choose between three explicitly presented choices can be accurately simulated by any system that presents the same three choices and provides a mechanism for the student to choose. A high fidelity simulation is one that closely matches the specific signals and cues found in the real world. For example, consider a PC-based flight simulator that provides graphics and sound, but does not actually move the user around as do the multi-million dollar units which commercial pilots are trained on. The PC-based flight simulator may be just as accurate as the professional version; in theory it could be even more so if, for instance, it had been updated more recently to reflect new changes to the simulated aircraft. However, it will never be as high fidelity. For educational technology, the appropriate balance between accuracy and fidelity depends on what is being taught, and on characteristics of the target student. When the goal is to teach someone a cognitive skill, such as how to perform a diagnostic interview, accuracy is generally the most important criterion. Choices available to the student must be equivalent to those available in real life, and responses made by the simulated characters must be functionally equivalent even if they are, for example,
Figure 5 - Storyteller indicates it has a relevant story
delivered in a different modality. For physical skills, fidelity takes on a heightened importance, and it is not inconsequential for teaching cognitive skills either. A simulation that looks and feels a great deal like the real world can help motivate students, and can help them to correctly interpret the mapping between the simulation and reality.
Criteria for input interfaces
Accuracy

The user's side of a conversation with a simulated character can be modeled as a two step process: first, the user generates an idea; then, the user expresses it. Both the expressiveness and generativity of language must be accommodated by an interface to as great an extent as possible. A crucial issue is whether the interface adequately covers the space of intentions users will wish to express. Of course, it is impossible to hook an intentional analyzer up to the mind of the user to measure the size of this space. At the very least, we would prefer interfaces that allow a user to express easily a larger number of intentions rather than fewer. Of course, some additional assurance will be necessary that the right intentional space is being covered. An accurate interface must allow users to choose how to express intentions in a way natural to them. In contrast, a communication interface that only allowed users to select (recognize) from a list of choices would not create as accurate a simulated conversation as one that allowed the user to generate statements. Choosing from a list is really a very different task from deciding what to say. Choosing is both more restrictive and simpler; the choices are explicitly given. So, a communication interface for speaking to simulated characters can be measured for accuracy by asking:

• Is the interface expressive? That is, does it allow the user to express what he or she wants to express?
• Is the interface generative? That is, does it allow the user to form his or her own way of expressing an intention?
Fidelity

Creating a high fidelity communication interface helps to prevent communication breakdown (Winograd and Flores, 1987). Breakdown occurs when the attention of the interface user is drawn to the interface, rather than the task the user is attempting to perform with the interface. The point of the interface is to express an intention; when one's attention is drawn away from this task, this breakdown prevents the user from achieving his or her communicative goals. Additionally, the interface typically serves instrumental goals, that is, the ultimate goal of the user is typically not to hold a conversation, but to hold that conversation to achieve some other goal. For example, in Casper, the student converses with the simulated customer in order to learn how to solve real customers' problems. When breakdown occurs in the conversational tool, it occurs at two levels: first, the user's attention is drawn away from achieving the instrumental goal of conversing; second, the user's attention is also drawn away from achieving whatever it is the conversation is for. Three important criteria for evaluating fidelity are speed, negotiation rate, and modality.
Figure 6 - The Storyteller presents a story
Speed: The interface must be "fast enough" to be a faithful simulation of interaction. What "fast enough" means will vary from interface choice to interface choice, but, in general, it means fast enough that the user's attention is not drawn to the interface tool itself, so that the user can simply express his or her intentions. An interface tool will tend to break down as the time it takes to communicate the intentions to the simulated character increases.

Negotiation rate: It is often the case that an interactive program will allow the user to commit to or to reject the utterance that has been selected or built up. If the user rejects the utterance, then the user has to start over in generating an utterance. This is analogous to a human conversation, in which one person says something, call it A, and the other person asks whether by A the first person meant B (a paraphrase of A). The first person can agree, or try again. The more tries it takes for a user to express an intention, the more likely a breakdown in communication becomes: the user will begin to wonder how to express an intention rather than simply expressing it.

Modality: The interaction between a person and a simulated character simulates some interaction between two people in the real world. For example, Casper simulates phone conversations between a customer service representative and a customer. In Yello, the simulation is of a face-to-face conversation between the user and the character. In these examples, the modality of conversation is speech, although in face-to-face conversation, the speech is often augmented with visual cues. The closer the modality of interaction matches the real world, the less likely the conversation will break down, because it is less likely that the user's attention will be drawn to a lack of congruence in modality.
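Two of these criteria, speed and negotiation rate, lend themselves to simple measurement from a log of user turns. The sketch below assumes a hypothetical log format in which each committed utterance is recorded as a pair (seconds spent composing, number of attempts before acceptance). It also assumes that negotiation rate is summarised as the mean number of attempts per committed utterance, which is one plausible reading rather than a definition given in the text.

```python
# Hypothetical log: one (seconds, attempts) pair per committed utterance.
# attempts == 1 means the first proposed utterance was accepted.


def mean_seconds(turns):
    """Average time taken to express one intention."""
    return sum(seconds for seconds, _ in turns) / len(turns)


def negotiation_rate(turns):
    """Average number of tries needed before an utterance is accepted."""
    return sum(attempts for _, attempts in turns) / len(turns)


def first_try_share(turns):
    """Fraction of utterances accepted without any retry."""
    return sum(1 for _, attempts in turns if attempts == 1) / len(turns)


session = [(4.0, 1), (6.0, 3), (5.0, 1), (9.0, 3)]
```

For the illustrative `session` above, the mean time is 6.0 seconds, the negotiation rate is 2.0 tries per utterance, and half of the utterances were accepted on the first try. Comparing such figures across interface options is one way to make the fidelity criteria concrete.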
Supporting the learning goals of the learning environment

Two requirements of a learning environment communication interface relate to specific learning goals that the environment is intended to serve. It is a given that no interface will be perfect with regard to accuracy or fidelity. With that constraint, it is crucial that the compromises made not result in an interface that does either of the following:

• trick the user into making mistakes, or
• give away information, such as a possible course of action, that is best for the student to discover on his or her own.

In Casper, for example, one of the pedagogical goals of the program is to warn students against asking leading questions of customers, because customers will tend to give the answer they think the customer service representative wants. Because the representative is attempting to discover the real cause of the problem, a representative should not ask leading questions. But if the communication interface in Casper made it easier to ask a leading question than a non-leading question, the student may be tricked into asking a leading question. The student may ask a leading question not because he or she intended to, but because it was simply easy to do so. Another criterion is that the interface should not give information that should be hidden. Certain interfaces will require that the utterance choices be articulated or displayed to the student. In tutorial programs, it is very common that a student's choices not be revealed. Following up on the leading question example, we want the interface in a program such as Casper not to articulate the distinction between a
leading question and a non-leading question just because the tutor is built to respond if a leading question is asked. A student may choose to ask a non-leading question simply because, in seeing the distinction between a leading and non-leading version of a question articulated by the interface tool, the student can guess the "right" answer and give it, instead of the answer he or she would give in a real conversation. Thus, we see that there are criteria for interface tools for talking to simulated characters that are general to all software, specific to communication tools, and dependent on the application programs for which they are built. As with all software, we need to be concerned with scale up: whether we can build the software we want to build. Communication tools should also support an accurate simulation of a conversation, especially in supporting the generation and expression of intention. High fidelity communication tools will also help prevent breakdowns in communication: tools that are fast enough, low in negotiation rate, and similar in modality. Finally, the pedagogical goals of the embedding learning environment should also influence the design of the communication interface, goals such as not giving away answers in tutorial programs.
Options for input interfaces

We will discuss four different interface options for talking to a simulated agent: multiple choice interfaces, action constructors, natural language text and speech recognition. We will focus our attention on action constructors and text interfaces, because multiple choice interfaces fulfill too few of the requirements we have set, and speech recognition technology is not yet practicable. On the other hand, we can build action constructors and natural language text interfaces that do meet many of our criteria.
Multiple choice

The first option to consider is multiple-choice selection. That is, the user is offered a fixed number of choices. The user can select one of the options or perhaps opt out of the selection process. Clearly, multiple-choice interfaces fail on all but the most basic of the criteria for communication interfaces. They are easy to build and are quick in response, but they especially fail in their lack of expressiveness and generativity on the one hand, and modality on the other. In offering the user a fixed number of choices from which to select, they do not meet the generativity criterion. Multiple-choice selections are typically limited in number. This can be due to either cognitive limitations (people can only scan so many choices at a time) or screen real estate issues (only so many choices can fit on the screen at once). Because of this limit, they are limited in their expressiveness as well. Finally, the modality of interaction is very far removed from speech.

Speech recognition

On the other side of the continuum of options is speech recognition. That is, in the course of a conversation with a simulated character, the user speaks his or her intentions to the character. This is a highly congruent method of interaction in terms of the modality and highly generative. On the other hand, speech recognition technology, even at the state of the art, is highly limited in its ability to scale up to large problem domains. One is limited to what is essentially putting in a speech recognition layer on
top of a multiple choice selection mechanism. Recognizing speech is beyond the state of the art for highly expressive systems. Thus, practical speech recognition systems are limited in expressiveness. Multiple choice selection and speech recognition are on the two ends of a continuum. A multiple choice system is very easy to design and implement, but highly lacking in accuracy and fidelity. A speech recognition system would be highly accurate and faithful, but is beyond the state of the art, unless one is willing to limit the number of choices that can be recognized. Two interface methodologies that lie between multiple choice selection and speech recognition are action constructors and natural language text input.

Action constructors
Casper, Yello, $2 and BoSS all provide action constructors for communicating with their respective simulated characters. An action constructor is a hierarchical set of menus that allow a user to express an intention to do something in the simulated world. This can be any action supported by the simulation, but we will focus on communicative actions. Actions are hierarchically arranged so that more general intentions, such as "Ask about...", are selected before more specific ones, such as "...the fire brigade." The underlying knowledge representation requires an adequate understanding of the task domain and the interests students bring to the domain in a particular situation. For example, in Yello, the knowledge representation needed to reflect the statements and intentions novice Yellow Pages salespeople bring to a selling interaction. Further, action constructors are typically dynamic. New choices are offered as users gain information about the simulated world in which they are acting, and old choices are removed as they become irrelevant to changes in the simulated world. Action constructors tend to meet the expressiveness criterion for an accurate communication device, but fail the generativity criterion. Action constructors, with their dynamically created, hierarchical series of menus, can allow a user to express the entire space of intentions that the user has. This assumes that the content analysis has been done correctly in identifying the space of intentions a user might have. Action constructors allow the developer to specify a large space of linguistic (and nonlinguistic) actions to take. On the other hand, action constructors essentially require the user to recognize a match between what the user intends to say, and how the system developer expressed that intention. The user does not generate his or her own utterance; rather, the user puts together an utterance from the options that the hierarchical menus make available.
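The dynamic behaviour described above, in which choices appear as the user learns facts and disappear once they are irrelevant, can be sketched as a filter over menu entries. Everything in this sketch is an assumption made for illustration: the class `ActionConstructor`, its `requires` and `obsolete_when` parameters, and the fact names are hypothetical, and the example phrases are borrowed from the Casper transcript only as sample content.

```python
# Hypothetical sketch of a dynamic, hierarchical menu; not the systems' real code.
class ActionConstructor:
    def __init__(self):
        # Each entry: (menu path, phrase, facts required, facts that retire it)
        self._entries = []

    def add(self, path, phrase, requires=(), obsolete_when=()):
        self._entries.append(
            (tuple(path), phrase, frozenset(requires), frozenset(obsolete_when))
        )

    def available(self, known_facts):
        """Return {path: phrase} for the choices valid given what is known."""
        known = set(known_facts)
        return {
            path: phrase
            for path, phrase, req, gone in self._entries
            if req <= known and not (gone & known)
        }

    def submenu(self, prefix, known_facts):
        """List the next-level menu labels under a prefix, general before specific."""
        prefix = tuple(prefix)
        n = len(prefix)
        labels = []
        for path in self.available(known_facts):
            if path[:n] == prefix and len(path) > n and path[n] not in labels:
                labels.append(path[n])
        return labels


ac = ActionConstructor()
ac.add(("ask about", "colour"), "Please describe the colour of your water.")
ac.add(("ask about", "neighbours"), "Are any of your neighbours affected?",
       obsolete_when={"neighbours_known"})
ac.add(("ask for", "cold tap test"),
       "Can you run the cold tap for a bit and tell me what you see?",
       requires={"only_hot_tap_tried"})
```

With no facts known, `ac.submenu((), set())` offers only "ask about". Once the hypothetical fact `only_hot_tap_tried` is recorded, "ask for" appears as well, and once `neighbours_known` is recorded the neighbours question drops out of the "ask about" submenu.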
Action constructors meet some of the requirements for high fidelity communication systems, and fail others. Certainly, once a statement is chosen, the response is quick. When an action constructor is well done, the hierarchical menus can also provide a quick way for a user to say something. The major limiting step is in the user finding among the hierarchical menus what he or she wants to say. Although this is difficult to measure, we would claim that this is not so much a matter of real time (that is, elapsed time from when the user begins a search to the time when a selection is made and uttered), but of a virtual time. As long as the user finds the hierarchical menus logically arranged (logical, of course, in the mind of the user), they seem to be in a relatively timeless mode as they search for the right utterance. Although there is often some frustration with the physical requirements of using hierarchical menus, what is more
frustrating for users is when the arrangement does not allow them to find quickly what they want to say. This relates closely to negotiation rate; the number of paths a user takes through the hierarchy may indicate difficulty in negotiating with the interface about what the user wants to say. Finally, it is hard to say that an action constructor is very similar in modality to speech. Action constructors can be quite practical to develop. They provide a direct mapping between what the system is prepared to respond to, and what the user can say. This allows the pedagogical requirements of the system to drive the knowledge engineering requirements, rather than requiting large amounts of additional knowledge engineering for the sake of the interface. The dynamic nature and hierarchical nature of action constructors also tend to allow action constructors to scale up to the size required by the application. In terms of pedagogical requirements, action constructors tend to not trick the user into saying things they do not want to say. Because what can be expressed is fully articulated in the action constructor, a user can know exactly what he or she can say. Therefore, it is usually the case that the user isn't tricked into saying things he or she does not want to. On the other hand, articulating all possible utterances means that information may be revealed to the user that should not be. Action constructors, then, provide a relatively inexpensive technology for building expressive communication interfaces. Although they are a recognition-only technology, they provide a much richer way to communicate with simulated characters than simple multiple-choice systems, and their hierarchical structure can map well into the user's logical mapping of the task. Natural language text Another possibility for building a communication interface to simulated agents is to create a type-in box. 
The user would type in what he or she wants to say, then natural language processing techniques would map what the user entered into the most appropriate conceptual representation in the program. Such an interface has several good characteristics:
• It is generative; that is, it allows users to express (in their own words) what they want to say;
• It is expressive; that is, it maps well to the space of intentions carried by users and understood by the simulated characters;
• It is similar to speech, both by virtue of its generativity and the linear nature of the input;
• It is non-revealing; that is, it would not work against pedagogical goals of not giving away information.
It is generally assumed that natural language processing is an "AI-complete" task; that is, to build a system capable of understanding text would require building a general purpose, intelligent machine. However, research in case-based reasoning (Riesbeck & Schank, 1990; Kolodner, 1993) indicates that various knowledge-based indexing techniques can form a basis for natural language processing systems that are both
How to Support Learning
cognitively plausible as well as practical to build. One such architecture for natural language processing is indexed concept parsing.
Indexed concept parsing
Indexed concept parsing (Fitzgerald, 1995) is a case-based reasoning approach to parsing, in which underlying target concepts (that is, those conceptual representations of the application program identified as forming the intentional space of potential users) are associated with sets of index concepts. Each index concept is associated with sets of phrasal patterns. At run time, the parser looks for phrasal patterns in the input text, and the index concepts recognized thereby are used to appraise the best matching target concepts. The architecture defines a range of parsers, in which the complexity of the index concept representations can vary according to the needs of the application program: index concepts can be key words, synonym sets, representations in an abstraction hierarchy, or representations in a partonomic hierarchy.
Indexed concept parsing was originally developed to build a parser for Casper. The index concepts in Casper were arranged in a simple concept hierarchy, with phrasal patterns attached to concepts at different levels in the hierarchy. Indexed concept parsing proved accurate, yet required minimal knowledge representation. Measuring expressiveness by how frequently the intention of a user was matched to an internal representation in Casper, we found that the Casper parser, in early tests with real users, had an 83% accuracy rate. Speed of response and negotiation rates were acceptable as well. More details can be found in Fitzgerald (1995).
Table 1 shows the steps that a user takes to use the indexed concept parser in Casper. The student enters text and requests a parse (the actual graphical interface differs from the idealized interface in the table). The parser returns the best result, which the student can accept. Alternatively, the student can enter a different text, or look at other best matches.
The indexed concept parser met many of the criteria we set out for a communication interface to a simulated character.
It was accurate, in that it was both generative and acceptably expressive. It was of sufficiently high fidelity, in that it was fast enough, had an acceptable negotiation rate, and was somewhat similar (especially in contrast to menu-based systems) to the modality of speech. It was practical enough to develop, in that the underlying knowledge representations (index concepts) were simple to build. It met the pedagogical goals, although near matches might alert the student to possible significant utterances the student did not intend. Other parsers built on indexed concept parsing techniques are described in Fitzgerald (1995).
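The matching scheme described above (phrasal patterns activate index concepts, which are then used to appraise target concepts) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: index concepts are treated as simple synonym sets, phrasal patterns as plain substrings, and the overlap score is our own stand-in for the appraisal function, which is specified in Fitzgerald (1995) and not reproduced here. All names (`IndexConcept`, `TargetConcept`, `rank`) and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConcept:
    name: str
    phrases: tuple  # phrasal patterns; here, plain lowercase substrings (assumption)


@dataclass(frozen=True)
class TargetConcept:
    utterance: str      # what the system is prepared to respond to
    indices: frozenset  # names of the index concepts associated with this target


def recognize(text, index_concepts):
    """Return the names of index concepts whose phrasal patterns occur in the text."""
    lowered = text.lower()
    return {ic.name for ic in index_concepts
            if any(phrase in lowered for phrase in ic.phrases)}


def rank(text, index_concepts, targets):
    """Rank target concepts by overlap with the recognized index concepts.

    The score (shared indices / all indices involved) is an assumed,
    illustrative appraisal function, not the published one.
    """
    seen = recognize(text, index_concepts)
    scored = []
    for target in targets:
        shared = len(seen & target.indices)
        if shared:
            scored.append((shared / len(seen | target.indices), target.utterance))
    # Best score first; ties broken alphabetically so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [utterance for _, utterance in scored]


# Hypothetical index and target concepts, loosely modeled on Table 1.
INDEX_CONCEPTS = [
    IndexConcept("describe", ("describe", "what kind", "tell me")),
    IndexConcept("bits", ("bits", "particles")),
    IndexConcept("problem", ("problem", "wrong")),
    IndexConcept("run-tap", ("run the", "cold tap")),
]
TARGETS = [
    TargetConcept("What kind of bits are in your water?",
                  frozenset({"describe", "bits"})),
    TargetConcept("Can you run the cold tap for a bit and tell me what you see?",
                  frozenset({"run-tap"})),
    TargetConcept("Can you describe the problem?",
                  frozenset({"describe", "problem"})),
]

if __name__ == "__main__":
    for match in rank("Could you describe the bits to me, please?",
                      INDEX_CONCEPTS, TARGETS):
        print(match)
```

The first item of the ranked list plays the role of the "best result" offered to the student, and the remainder correspond to the other best matches the student may browse. Richer index concepts (abstraction or partonomic hierarchies) would replace the flat synonym sets used here.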
Summary of interface options
Discussions of interfaces for communication with simulated characters tend to lie at the extremes: either relatively simple multiple-choice entry systems, or impossible-to-build speech recognition systems. Action constructors and natural language text interfaces based on indexed concept parsing are two interface technologies that lie between these extremes. They allow accurate interfaces to be built in exchange for a practical amount of development effort. Action constructors are easier to build than indexed concept parsers, but are non-generative and lower in fidelity. Indexed concept parsers require more knowledge representation, but allow for a generative interface, somewhat higher in fidelity.
Understanding simulated characters
To create conversations, we must allow users to express their half of the conversation, but we must also provide the means for the simulated characters to communicate. This area deserves careful consideration in educational settings, because part of what a social simulation can teach is how to interpret the reactions of others. If the simulated characters can only react in a few simple ways, interpretation is unrealistically simple. On the other hand, a certain amount of abstraction can be useful in structuring students' social perceptions.

Table 1 - Parsing from a type-in box, accepting the best result or choosing from the best matches

Step the student takes / What the student sees on the screen
1. The student enters text and presses the "Parse" button.
   Screen: type-in box containing "Could you describe the bits to me, please?" and a "Parse" button.
2. The parser returns the best result, but the student requests more matches.
   Screen: "What kind of bits are in your water?"
3. The student selects a choice from the best matches.
   Screen: "What kind of bits are in your water?"; "Can you run the cold tap for a bit and tell me what you see?"; "Can you describe the problem?"; "(More...)"
4. The student confirms choice by pressing the "Say this" button.
Communicating what they say
Casper uses digital recordings of human voices to give its characters telephone voices. When students ask questions, the customers respond by playing back one of these prerecorded audio clips. This approach has the advantage of endowing the
simulation with a high degree of fidelity. Obviously, hearing the customer's voice is much more realistic than reading the text of what is being said. Characters' tones of voice convey a great deal of information. Making the characters sound as similar as possible to real callers is very important for trainees who are learning to perform a difficult information-gathering task in a telephone-based environment.
The drawback of the pre-recorded voice technology lies in its inflexibility. There must be a one-to-one correspondence between literal output and directions that the system can go. It is not possible to build a general reaction statement of the form "I didn't check X," where X can be replaced by any feature of the water system that the student might ask a simulated character about. Instead, each such possibility must be handled separately and linked to a particular pre-recorded statement: "I didn't check the hot water tap," "I didn't check the color of the water," etc. Because there are many meaningfully different statements that customers make in this environment, maintaining the accuracy of the simulation requires a great deal of engineering work under the pre-recorded voice approach.
The pre-recorded option is also inflexible with respect to the nuances of speech. If the customer might say the same thing in several different ways, with different intonations for example, all of these options must be represented separately, and recorded as separate speech events.
In the end, we determined that the fidelity needs satisfied by pre-recorded voices were great enough, in the context of training students to perform phone-based diagnostic interviews, that we were willing to bear the costs required to maintain accuracy, measured in the number of utterances that the simulated customer could make. While burdensome, the number was manageable in the Casper domain because the water-diagnosis task is not completely open-ended.
There is a fairly well-defined set of reasonable student utterances to which the simulated customer must be equipped to respond. Thus the meaningful mistakes which, as we discussed in the introduction, must be open to the student in order to effect the cognitive change that is the objective of any GBS could be accommodated within the pre-recorded voice approach.
When the open-endedness of the simulation is even more important, as we judged it to be for the tasks taught by both Yello and S2, a more flexible, but less realistic and less nuanced approach is called for. Selling, for example, is a skill that each expert goes about in a slightly different way; while there are general tips and habits to be taught, success often means finding the approach that fits one's own particular personality. Therefore, a crucial factor in achieving accuracy is allowing a very broad range of approaches, which in turn calls for an open-ended simulation.
In the S2 Trainer, the problem was even more severe. Because the simulated characters must respond to configurations of icons that the trainee places on map overlays, those characters must be able to generate an almost infinite set of responses. For example, a battalion intelligence officer is required to place icons on a map overlay representing predictions about where enemy units will be placed, and what sort of units they will be. In order to accurately simulate a commander's response to all possible trainee actions, the prerecorded voice approach would require pre-defining an appropriate utterance for every possible coordinate where the student might place each type of icon.
For these reasons, in Yello and S2 we sacrificed fidelity to maintain accuracy. Characters' speech is output as text in cartoon-like text balloons above their heads. This approach allowed great flexibility in the development of these systems, allowing
us to generate characters' utterances through the use of templates that could be filled in at run time, rather than specifying every utterance completely in advance. One intermediate approach that we have not fully explored is the use of speech generation. This would add back some of the fidelity lost by using text, without loss of flexibility. Existing speech generation systems do not produce the nuances of intonation, the most important benefit of pre-recorded speech, but this technology holds great promise for the future.
Communicating what they do
A face-to-face conversation is much more than two voices going back and forth. Gesture, body language and facial expressions all carry information that enriches the interaction. These aspects of social simulation present a tradeoff similar to the one discussed with respect to speech output: flexibility vs. richness. The most rich and flexible system would be one in which each aspect of a character could be modeled with great detail and their reactions rendered on real bodies. For extremely simple characters, such detailed real-time simulation has been attempted (Bates, Loyall & Reilly, 1992). For human characters, however, tradeoffs are required.
The equivalent of "recorded speech" would be digital video of a character responding in the conversation. The amount of nuance that can be conveyed is quite high and it is possible to build engaging, although limited, simulations in this way (Stevens, 1989). However, it is very expensive to use digital video for an open-ended simulation. One reason for the high cost is that the storage requirements are very great. A conversation of 90 minutes or so (the time that students typically spend on the Swain Roofing scenario), with an open-ended set of options available to the student, would require thousands of minutes of video.
Even with the best compression schemes available today, this would require gigabytes of storage per scenario, making multiple scenario systems much more storage-intensive than most users can currently afford. A second cost consideration is the time and expense of producing high-quality video for interactive simulations. Changing a few details in the simulation could require a complete re-shoot of the associated video, creating a barrier to flexible development even higher than that associated with recorded voice. As discussed above, we have treated open-endedness as an important design consideration in Yello and therefore chose not to use video to depict characters' visible responses. We were left, then, with the challenge of depicting this same information in other ways.
We have explored two alternatives to video for depicting the non-literal information that video conveys. In Yello, we chose to use still images to show what characters look like: their dress and general demeanor. For dynamic feedback about a character's emotional state, we used an abstract technique, associating with each character a set of "visible emotion meters," which are intended to represent visible changes in a character's demeanor that the student should be able to recognize. We used three such meters: happy/angry, interested/bored, and calm/threatened. One problem with this approach is that the meters are abstract; they do not teach students how to identify customers' body language. In Yello, we have compensated for this lack, to a certain extent, by using the simulation as a jumping-off point for tutoring that addresses these issues in all their subtlety.
Other systems we have worked on use less abstract representations of observable expressions. The BoSS system, for instance, has a library of still images associated with each character instead of the single image used in Yello. Each image contains a different expression, and appropriate images are recalled and presented as the character's expression changes. Although somewhat limited in the amount of nuance that can be expressed, this approach is more natural than the abstract meters.
Depicting their surroundings
Another important detail in understanding a conversation is understanding the setting in which it takes place. A sales call that takes place across a kitchen table is different from one that takes place in a plush office with leather furniture. The appearance of a client's place of business tells the salesperson a great deal about the business and the client's personality, and clients' dress says something about the way they expect business to be conducted. In all of our simulations, we put significant effort into getting detailed, realistic visual scenes that contain useful information.
INTERACTING WITH THE TUTOR
Why interacting with a tutor is different
The interface considerations that enter into the design of the student's interaction with a tutor are in many ways similar to those discussed above for simulated characters. At one level, a tutor can be thought of as just another agent with which the student communicates. The central considerations in both cases are how to make it easy for the student to express things to the agent, and how to make the communication from the agent to the student clear. However, the purpose of the communication is different, and that has important interface implications.
The most common communication from the student to a tutor is a request for help. Our technique for addressing this interface concern in Casper is simply to make certain utterances that a student might want to address to a tutor available at all times in the form of "Why?" and "Now What?" buttons.
Not all tutor/student interactions are extended conversations. Often, the most effective form of intervention that a tutor can provide is simply to present a case of success or failure from the past. Since this form of communication is less interactive than a conversation, the focus is on communication from the tutor to the student. Making that communication clear and interesting, and communicating the relevance of the case, are the crucial concerns. The need to make sure that this sort of case-presentation tutoring holds the interest of the student is one of the considerations behind our use of video for the cases in Yello.
When tutoring does take the form of an extended conversation, as in Casper's Socratic-style dialogues, those conversations still have a different function, and therefore a different style, from the interaction with simulated characters. While simulated characters may be acting on the basis of a wide range of goals, the goal of the tutor is always pedagogical, and the style of interaction is dependent on which pedagogical goals are active, and what pedagogical strategies are being pursued.
For example, a tutor will interact one way when its interaction is designed to convey a set of functional relationships, and differently when it is trying to push the student to choose and defend a hypothesis. The general interface challenges are different, then, for
tutorial dialogues; the concerns of accuracy and fidelity that are important for simulated characters are replaced with an emphasis on supporting a pedagogical strategy. In the rest of this section we will look a bit more closely at three styles of tutoring that were illustrated in the interactions with Casper and Yello presented earlier in this paper.
Case presentation
In social arenas, there are often no hard-and-fast rules, and it is not always possible for the tutor to be certain that the student is in error. There will be some aspects of the social world in which the student's general social knowledge will exceed the tutor's. This makes it difficult to apply traditional tutoring methods that call for the tutor to know the right thing to do in every circumstance (Anderson, 1988). One effective strategy the tutor can use in such situations is to make reference to the experiences of others. Instead of saying that the student is in error, it can say "Here's a situation in which what you're doing turned out to be a bad idea." Instead of saying that the student should perform a certain action, it can point out a situation in which that action led to a good result. The tutor can leave it to the student to form a judgment about whether the advice is relevant, and if so, how to apply it.
Since it is not required to present one "right" answer, the tutor can show multiple perspectives on difficult issues. It might, for example, bring up two stories, one about someone who was successful doing what the student is doing and another about a failure in the same situation. It is important for students to recognize that even experts can disagree about the best course of action (Lesgold & Lajoie, 1991). Case-based reasoning theory (Riesbeck & Schank, 1989; Kolodner, 1993) suggests that this tutoring strategy meshes well with the student's need to acquire relevant cases.
Such a storytelling tutor broadens the student's experience by bringing in relevant experiences that expert salespeople have had. In the Yello example above, the story shows an instance that demonstrates that the student's approach could be ineffective, even if it appears to be succeeding in the scenario; conversely, a story could show the student that he or she is taking the right approach even when the simulation does not respond to it.
There is an emerging body of literature within the study of education and psychology that emphasizes the importance of stories (Witherell & Noddings, 1991; Hunter, 1991; Carter, 1993). In particular, we use first-person narratives from experts about particular episodes in the exercise of their skills. In apprenticeship situations, stories of this type are often used in a similar way to show useful examples relevant to the learner's current experience (Lave and Wenger, 1991). It is useful to distinguish these stories from other kinds of cases that a tutor might use, such as design examples, re-enactments or invented cases. First-person stories have properties that make them particularly useful for instruction (Schank, 1990; Witherell & Noddings, 1991):
• Authenticity: the fact that such stories come directly from a person's real experience and are therefore relatively trustworthy as accounts of the real world,
• Detail: the tendency of such anecdotes to be vivid and detailed, and
• Cultural content: the way in which personal stories reflect a person's beliefs and values.
The demands of authenticity and detail encourage the use of the most vivid means of story presentation. Research in video-based learning environments has discovered that students find stories quite compelling when the act of storytelling is recorded on video and replayed (Ferguson et al., 1992; Slator et al., 1991). Even stories that are fairly lengthy can maintain interest when presented on video at the right time.
As video sequences, stories themselves are told the same way every time. This is a problem because students need to be given some kind of explanation of the tutor's interruption, and the relevance of a story may not be immediately obvious. In human storytellers, the purpose behind a story's telling permeates its production: the teller puts particular emphasis on those aspects of the story that contribute to the point. Since we did not have the ability to tailor stories themselves, we instead took the approach of creating tailored explanations. In Yello, we developed a set of templates for introducing stories, called headlines, bridges and codas. Headlines provide a short functional description of the story. Bridges and codas introduce and explain each story, allowing the tutor to capitalize on the shared context of the simulation environment that the student and the tutor are both observing.
Yello's storytelling tutor, called SPIEL (Burke, 1993; Burke & Kass, in press), uses a library of storytelling strategies to retrieve stories that make a variety of educational points. Each storytelling strategy has natural language templates for the headline, bridge and coda. Before a story is presented, natural language phrases are generated to fill the spaces in the template and produce texts tailored to the student's situation.
The headline, bridge, and coda play important roles in helping the student to understand the relevance of the story, to make the analogical connection between the simulation and the case described in the story, and then to transfer lessons from the story back to the simulated world in which the student must act. This four-part structure of SPIEL's case-presentation sequence (headline, bridge, story, and coda) can be thought of as a simplification of the six-part structure (abstract, orientation, complicating action, evaluation, result, and coda) used by Labov (1972) to describe conversational narratives. The term "coda," which we use to refer to the final recapitulation, is borrowed directly from him. Here is an example of the templates associated with one of SPIEL's strategies:
Headline: A warning about something you just did.
Bridge: If you assume that ____, you may be surprised. Here is a story in which ____ had a similar assumption that did not hold:
Coda: An assumption that may be unrealistic.
In the Yello example above, the tutor uses this bridge template and some simple natural language generation to fill the spaces, creating the following bridge:
"If you assume that Mrs. Swain will not have a role in the business of Swain Roofing, you may be surprised. Here is a story in which a salesperson had a similar assumption that did not hold:"
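As a rough illustration of how such a strategy template might be instantiated, the sketch below fills the bridge with phrases supplied at run time and assembles the four-part sequence (headline, bridge, story, coda). It is not SPIEL's implementation: the slot names `assumption` and `actor`, the `present` function, and the placeholder story text are hypothetical, and the natural language generation step that produces the filler phrases is simply assumed to have happened already.

```python
from string import Template

# One storytelling strategy, using the headline, bridge and coda wording quoted
# in the text. The slot names ($assumption, $actor) are assumed for illustration.
WARN_ABOUT_ASSUMPTION = {
    "headline": Template("A warning about something you just did."),
    "bridge": Template(
        "If you assume that $assumption, you may be surprised. "
        "Here is a story in which $actor had a similar assumption "
        "that did not hold:"
    ),
    "coda": Template("An assumption that may be unrealistic."),
}


def present(strategy, story, fillers):
    """Return the four-part case presentation: headline, bridge, story, coda.

    `fillers` maps slot names to phrases generated from the current state of
    the simulation; a missing slot raises KeyError rather than leaving a gap.
    """
    return [
        strategy["headline"].substitute(fillers),
        strategy["bridge"].substitute(fillers),
        story,
        strategy["coda"].substitute(fillers),
    ]


if __name__ == "__main__":
    parts = present(
        WARN_ABOUT_ASSUMPTION,
        story="[video of the expert's first-person story]",  # placeholder text
        fillers={
            "assumption": "Mrs. Swain will not have a role in the business "
                          "of Swain Roofing",
            "actor": "a salesperson",
        },
    )
    for part in parts:
        print(part)
```

Keeping the story itself fixed and varying only the surrounding text mirrors the constraint noted above: the video cannot be tailored, so the tailoring is confined to the generated introduction and recapitulation.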
A storytelling tutor offers a fairly non-directive form of tutoring: it does not tell the student what to do. In this, the tutor relies on the fact that students do have knowledge about social interactions and selling. Students can use the information provided by the bridge and coda to judge whether the advice of a story is relevant, and if so, how to apply it. The tutor does not have to guarantee that the student gets it right. If a student in the Swain Roofing scenario does not manage to make a sale to Ed Swain, the learning experience in Yello is not greatly diminished. Storytelling tutoring will be successful even if students do poorly in the simulation because they will come away having been exposed to some important cases and seen how they apply in particular contexts. Such students have begun to build the case base of experience on which their expertise will be founded. The main interface concern therefore is to ensure that students understand the relevance of stories that are presented. This is achieved in Yello by tailoring the presentation with introductory texts generated on the fly.
Socratic-style tutoring
Case presentation and direct instruction can be very effective and, when done well, very engaging ways to convey relatively pithy general principles. The methods are particularly effective for soft skills, where large mechanistic models are not the central issue. However, these techniques are not as appropriate when the goal of tutoring is to help the student refine a complex model of a large causal system. Presentation-based methods are not as effective for such teaching goals because they do not force the student to be active enough, and they do not respond to the student's need for knowledge in a fine-grained way.
For a tutor to be effective at helping the student acquire complex reasoning skills, such as those involved in conducting a diagnostic interview, it must encourage the student to examine his or her own misconceptions. The tutor in the Casper system is an example of the more interactive, Socratic style of tutoring that can address the need for this type of intervention. The student engages in an extended dialog with the tutor in which the student's assumptions can be questioned and his or her problem-solving techniques critiqued. This depends crucially on an interface that allows a student to communicate his or her conceptions to the tutor.
What the Casper tutor does
Students engaged in a causal reasoning task need to be able to invoke a tutor directly when they realize they are stuck. For example, if a student using Casper is confused about why something happened or what to do next, he or she can explicitly invoke the tutor. The student asks the tutor "WHY?" or "NOW WHAT?" by hitting the button with that label (as seen above the transcript in Figure 1). In addition to responding to those explicit invocations by the student, an effective tutor sometimes also needs to intervene automatically in response to certain actions. For example, the Casper tutor will be activated, and will challenge the student, when the student announces a hypothesis to the customer which is not supported by the evidence thus far collected. It may also initiate a dialog with the student when he or she
indicates an incorrect, or unsupported understanding of the situation, for instance through interactions with the CCS or an on-line hypermedia map of the water system. The precise algorithm used to decide when the tutor should be invoked, and what to do once it is invoked is beyond the scope of this paper (Jona, in prep.); we will just discuss illustrative examples here. The algorithm for determining when to tutor and what strategy to pursue is determined by the system designer through the use of a set of general purpose tutor-authoring tools partially described in (Jona & Kass, 1993). The Casper tutor does not merely tell the student the correct answer to a question, but instead tries to lead the student through an appropriate chain of reasoning. The goal of the tutoring is not to reveal the solution to the simulated customer's problem, but to teach the student how to solve problems like it. For example, when the student clicks on "NOW WHAT?" the tutor responds as depicted in Figure 7: Tutor:
To help you decide what to do next, we need to understand your current goal. Choose the item at the right that best describes your current goal.
The buttons at the right cover the range of activities that are appropriate when attempting to form a diagnosis and fix a problem. They are as follows:
• Gather Information
• Examine Possible Causes
• Narrow Down the Likely Causes
• Act On a Diagnosis
In addition to these buttons there is a final choice, intended for students who really do not understand the theory-building process:
• I Don't Have a Clue
If the student clicks this last button, the system presents a detailed explanation of what the other buttons mean, and describes the general sequence one should go through in determining how to solve a customer's problem. If the student chooses one of the other buttons, the tutor asks the student to attempt the next step that will help fulfill the chosen goal. If the student has chosen an inappropriate goal, the student will either discover this while attempting to meet the tutor's request, or will be informed by the tutor upon asking for assistance. For example, a student who indicates that he wants to act on a diagnosis is first asked to indicate his diagnosis using the water map, and is then led to a more appropriate goal if it is too early in the diagnosis process to settle on a hypothesis. When the student's goal is appropriate, the tutor reacts supportively; it will often present a video of a water company expert explaining how to do the sort of thing that the student has indicated he or she is trying to do.
When the system invokes the tutor to respond to a mistake the student has made, it does not simply announce what the student has done wrong and what should be done instead. Rather, it asks the student to explain his or her own reasoning, and it critiques that reasoning. For example, if the student announces an unsupported hypothesis about the cause of the customer's problem, the tutor will ask the student to defend the hypothesis. The student communicates the reasoning behind the hypothesis by
selecting, from the transcript, specific utterances made by the customer which the student believes to be evidence for the hypothesis. The tutor then uses its expert domain model to critique the student's reasoning. For instance, the tutor might indicate that the items selected by the student give some evidence for the student's diagnosis, but that more likely causes exist. At this point the student may choose to receive a more detailed analysis of the evidence. The tutor might then ask the student to explore the water map to find other possible causes, and would ask the student to defend the new, alternative hypothesis in the same way as the old.
If the student makes a mistake, such as asking the customer to do something that is expensive or inconvenient without good cause, or asking leading questions, the tutor will break in, often with a video of an experienced CSR telling a real-world story about a time when he or she made a mistake similar to the one the student is currently making. By recounting the negative consequences of a mistake, just at the time when the student is making that mistake, the expert helps drive the lesson home in a very effective way. After the tutor offers negative feedback on a statement the student has made, the tutor allows the student to retract that statement before returning to the customer.
Knowledge needed to make Socratic-style tutoring work

Socratic-style tutors, like the one in Casper, require two important kinds of knowledge. First, the tutor must have access to domain-independent strategies for deciding when to tutor, and how to manage the teaching interactions. In order to implement those strategies, the tutor must access the second kind of knowledge, which is knowledge of the specific domain. In Casper, this domain knowledge includes the causal chains that relate symptoms at the tap to root causes in the water system. Each symptom can be linked to various causes at one of several levels of certainty and each potential cause can predict the existence of several symptoms, also at one of several levels of certainty. For instance, the domain model encodes the fact that orange-colored water is usually a symptom of rust in the water, which is in turn caused by something stirring up rust in the mains, and the possible causes of that include a burst in the main, work on the main, or a fire truck drawing water from a hydrant.

Casper includes a set of authoring tools that can be used to develop the domain-independent strategies and the specific domain models. Applying the tutor to a new domain requires the use of the tools to author a new domain model, and perhaps to adapt the general strategies somewhat. No programming is required to do this.

Casper's strategies for intervention are contained in a list of rules that stipulate when a teaching interaction should take place, and which specific dialog with the student should result. An example of a tutoring strategy in Casper is:

IF the student makes a diagnosis of the problem
AND there is not enough evidence for the student's hypothesis
THEN execute the following tutoring sequence:
1. Ask the student to justify his or her diagnosis.
2. Explain the insufficiency of the student's justification and why the diagnosis is premature.
3. Ask the student to retract the diagnosis statement.
4. Help the student with the next problem-solving step.
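The rule above can be pictured as a small condition-action pair evaluated against the domain model. The following is only an illustrative sketch, not Casper's actual implementation: the numeric certainty weights, the evidence threshold, the flattened symptom-to-cause table (the chapter describes longer causal chains) and every function name are assumptions made for the example.

```python
# Illustrative sketch only; names, weights and threshold are hypothetical.

# Assumed numeric weights for "several levels of certainty".
CERTAINTY = {"weak": 1, "moderate": 2, "strong": 3}

# Assumed fragment of a domain model: symptom -> {root cause: certainty}.
# The intermediate links (rust in the water, rust stirred up in the mains)
# are collapsed into direct symptom-to-cause links for brevity.
DOMAIN_MODEL = {
    "orange-colored water": {
        "burst in the main": "moderate",
        "work on the main": "moderate",
        "fire truck drawing water from a hydrant": "moderate",
    },
}

# The four-step tutoring sequence quoted in the text.
PREMATURE_DIAGNOSIS_STEPS = [
    "Ask the student to justify his or her diagnosis.",
    "Explain the insufficiency of the student's justification "
    "and why the diagnosis is premature.",
    "Ask the student to retract the diagnosis statement.",
    "Help the student with the next problem-solving step.",
]


def evidence_for(diagnosis, symptoms):
    """Sum the certainty weights linking observed symptoms to a diagnosis."""
    total = 0
    for symptom in symptoms:
        level = DOMAIN_MODEL.get(symptom, {}).get(diagnosis)
        if level is not None:
            total += CERTAINTY[level]
    return total


def premature_diagnosis_rule(diagnosis, symptoms, threshold=3):
    """Return the tutoring sequence if the rule fires, else an empty list."""
    if diagnosis is None:  # IF the student makes a diagnosis ...
        return []
    if evidence_for(diagnosis, symptoms) >= threshold:  # ... AND evidence is lacking
        return []
    return list(PREMATURE_DIAGNOSIS_STEPS)


if __name__ == "__main__":
    for step in premature_diagnosis_rule("work on the main",
                                         ["orange-colored water"]):
        print(step)
```

Keeping the rule (when to intervene) separate from the domain table (what counts as evidence) mirrors the division between domain-independent strategies and domain knowledge described above.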
How to Support Learning
Figure 7 - Casper's Socratic Tutor Responding to "Now What?"
A. Kass, R. Burke and W. Fitzgerald
After using its domain model to determine that the preconditions for this strategy have been met, the tutor executes the strategy through the use of a series of rules and templates that allow the task-specific details to be spliced into a general interaction. For example, step 1 in the sequence above might be presented in a particular context as follows:
What evidence leads you to believe that Ms. Hughes' milky coloured water has been caused by work on the service pipe?
This query is generated from a general purpose template: What evidence leads you to believe that has been caused by <current-student-hypothesis>? Some of the fillers needed to instantiate some of the templates are a function of specifics of a call (for example, the name of the customer, or the specific hypothesis that a student has announced). Other fillers are drawn from the domain model. Still others are drawn from another source of tutoring knowledge, which is the system's case-base of video clips.
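The template mechanism can be sketched in a few lines. This is a hedged illustration rather than a description of Casper's code: the slot names `customer` and `symptom` are placeholders invented here (only `current-student-hypothesis` appears in the text), and the `possessive` and `instantiate` helpers are hypothetical.

```python
# Illustrative sketch only; slot names other than the student hypothesis
# are assumptions, as are both helper functions.

TEMPLATE = (
    "What evidence leads you to believe that {customer} {symptom} "
    "has been caused by {current_student_hypothesis}?"
)


def possessive(name):
    """Form a simple English possessive for a customer's name."""
    return name + "'" if name.endswith("s") else name + "'s"


def instantiate(template, call_fillers, domain_fillers):
    """Splice call-specific and domain-model fillers into a general template."""
    fillers = {**domain_fillers, **call_fillers}
    return template.format(**fillers)


if __name__ == "__main__":
    question = instantiate(
        TEMPLATE,
        call_fillers={
            "customer": possessive("Ms. Hughes"),
            "current_student_hypothesis": "work on the service pipe",
        },
        domain_fillers={"symptom": "milky coloured water"},
    )
    print(question)
```

Separating the fillers by source (the call versus the domain model) follows the distinction drawn in the paragraph above; a third source, the case-base of video clips, is left out of the sketch.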
Delivering tutoring through a simulated character

Up to this point we have been maintaining a strict separation between two different kinds of interaction that a student has with the system: interaction with a character that is part of the simulation, and interaction with a tutor that watches over the student's interaction with the simulation but is not itself part of that simulation. This separation between these levels of interaction has a kind of elegance to it, and it often serves the student well also, since the existence of an external tutor that looks and feels different from the characters in the simulated world keeps the student from becoming confused about when the simulation is running, and when there is a "time out" for tutoring. However, there are times when it is both helpful and realistic to blur the distinction between the tutoring function and the practice environment, and we therefore offer a few words on that topic here.

One problem with the external tutor is that it is rather heavy-handed. In some situations we are likely to appreciate having the equivalent of someone looking over our shoulder who will interrupt us from time to time to offer help, but in others we will not. One approach to delivering tutoring without interrupting the simulation is to have feedback and assistance delivered by one of the characters within the simulation. This is only appropriate when it can be done realistically: when the environment being simulated contains characters who might realistically provide such feedback in real life. We use this approach extensively in the S2 Trainer. The commander and other senior officers often critique an intelligence officer's work during formal briefings as well as informal interactions. Therefore, it seemed appropriate to use that form of tutoring in our simulation. The effect can be more subtle than explicit tutoring (which we also use in the S2 Trainer).
Useful, accurate feedback can be mixed in with small talk or even with incorrect information: even one's commanding officer is not always right. An important advantage of tutoring delivered by a simulated character is that it is delivered as part of the flow of the simulation rather than as an interruption. If a
simulation can captivate the student's interest (as it often can), then the student will be very focused on the feedback that the simulated characters give. On the other hand, feedback from a simulated character does not always have the impact of human tutors, captured on video, recounting experiences they have had in the real world. Therefore, we do not believe that this form of tutoring should typically be used to the exclusion of the others we have discussed, but when it is both practical and realistic for a particular form of tutoring to be delivered by a simulated character rather than an external tutor, this is often a good element to throw into the tutoring mix.

FUTURE DIRECTIONS

The preceding sections have contained no discussion of how to communicate the students' non-verbal behavior because nobody has built a social simulation that does much with such input. This is unfortunate, since it means that social skills that involve such non-verbal communication are currently beyond the reach of this form of computer-based education. The technology for sophisticated output (video, rendered graphics) has vastly outstripped that for input. Some researchers in virtual reality, computer vision, and advanced interfaces are beginning to experiment with technologies to broaden the range of possible inputs to include gesture, facial expressions, and body language. When such modalities become possible channels of input, their use will become an important consideration for the creation of social simulations. Still, the same considerations of fidelity vs. engineering will remain. If we incorporate students' gestures into our systems' inputs, the simulated characters will have to have accurate responses to those gestures. These are long-term research problems. Our near-term research agenda revolves around the creation of specialized authoring tools by which simulations of the types discussed above can be more easily and quickly generated.
We are working toward a set of tools that will be sufficiently flexible that designers will be able to choose among a range of fidelity tradeoffs without doing any custom programming.

CONCLUSION

In developing simulations that allow students to learn social skills by practicing on a computer, we have explored a space of different interface choices as shown in Table 2. Each case has involved tradeoffs between fidelity, the "feel" of the interface, and engineering concerns, including scale up, flexibility, and maintainability. In some cases, complete fidelity is beyond the state of the art, such as the understanding of free natural language speech input. In other cases, such as the presentation of non-verbal behavior of simulated characters, a very high level of fidelity is technically feasible through the use of digital video, although the need for the simulation to maintain runtime flexibility often requires some sacrifice in fidelity. The challenge is to ensure that the fundamental requirements of pedagogically effective communication are met given the inevitable compromises that must be made. If cognitive technology is to be the study of how tools affect cognition, then educational tools are likely to be among its most important beneficiaries, because the whole purpose of educational tools is to create cognitive change. Learning environments involving interaction between a student and a set of simulated characters
represent an important challenge for cognitive technology because they are among the most complex educational tools that we are likely to build.

Table 2 - Different interface options explored in Casper, Yello and other educational social simulations.

Description                           Options explored
1. Students' verbal behavior          Action constructor; text input
2. Characters' verbal behavior        Audio; text
3. Characters' non-verbal behavior    Meters; still photos; tone of voice in audio
4. Tutorial interactions              "Socratic" dialog; storytelling; tutoring from simulated characters
Likewise, the theory of cognitive technology will contribute greatly to education if it can help developers build effective learning environments. Computer systems will be effective at teaching only if they match, in fundamental ways, how people learn. If people learn best by doing, learning social skills via educational interactive story systems must involve allowing students to interact with realistic simulated characters. We have found that creating social simulations that are effective at creating cognitive change requires providing accurate, reasonably high-fidelity means of both listening to and responding to those characters. The fundamental requirements for how a student talks to the simulated characters are that the interface be expressive (that is, allowing students to say what they intend to say) and generative (that is, allowing students to generate their own means of saying what they intend to say). These fundamental requirements we have described as accuracy criteria; other requirements (such as an interface that is fast enough and similar in modality) we have described as fidelity criteria. Meeting the accuracy criteria provides just the crucial realism to make learning effective; meeting the fidelity criteria can additionally keep students from noticing they are in a simulated world, and hence prevent communicative breakdown. The fundamental requirements for simulated characters' communication to the student are that their responses be both wide ranging and compelling. There are many things to learn in social domains. The characters must be able to react in many different situations in order for social knowledge and skills to be acquired by students. Further, their responses need to be compelling - that is, the characters must present realistic challenges for the exercise of students' social skills. It is not enough for characters to be visually engaging, for example; there must be a point to their particular responses.
Fidelity issues, such as faithfully mimicking how real people express emotion, also come into play in preventing breakdown and retaining the interest of the student. However, the most important feature is depth: simulated characters need a wide range of realistic responses in order to come off as characters, not caricatures.
Interaction with well-designed sets of simulated characters can facilitate a great deal of learning, but simulated characters alone have some limitations, which the tutoring modules described above are designed to overcome. One example of those limitations that we have discussed is that engineering considerations often impose restrictions on the fidelity of a simulated character's output. For instance, body language and tone of voice are lost when the simulation uses only static pictures and text. When this loss of fidelity obscures an important lesson, explicit comments from a tutor can make the point that would otherwise be lost. This approach can be especially effective when the comments take the form of a video clip of an expert retelling a story about a similar situation in which the cues that the simulation does not convey were present. The expert can provide high-fidelity demonstrations of what the cues look or sound like, and can explain how to deal with them in context. Other limitations that a tutor can help mitigate are not a function of specific technological limitations, but rather are inherent in any learning-by-doing approach. When students do not know what to do, simply allowing them to try and fail is not an efficient learning method unless they can get help understanding their mistakes. Assigning blame for a failure is difficult, so students can become frustrated when things go wrong if they are not helped to figure out what went wrong and why. Similarly, when the student has a success at a complex task, it is often difficult to repeat that success without help determining which decisions contributed to that success. The tutoring modules we have discussed can all help with this valuable sort of credit and blame assignment. There is great synergy in combining an accurate, but relatively low-fidelity simulation with high-fidelity tutoring.
The simulation gets students involved and focuses their attention, forces them to make important choices, and allows them to make meaningful mistakes. The tutor fills in the gaps in the simulation, adding fidelity just where it is needed, making implicit principles explicit, and helping students to understand the causes of their failures so that they can really learn from their mistakes.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 11

E-MAIL AND INTIMACY

Richard W. Janney
University of Cologne and University of Frankfurt, Germany
[email protected]
"Our electronic extensions of ourselves ... create problems of human involvement ... for which there is no precedent." Marshall McLuhan (1964:103-104)
INTRODUCTION

One wonders whether Marshall McLuhan could have imagined the speed with which his 'global village' would shrink, in less than three decades, through the development of computer networks like INTERNET, to the size of a desktop - and then, only shortly thereafter, with the advent of the laptop computer, to the size of a briefcase. McLuhan's analyses of the effects of electronic communication on perception and social organization greatly influenced discussions of the mass media in North America during the sixties and seventies. Today, they have been largely superseded by later developments. Although McLuhan did not foresee the development of electronic mail, he dreamed of technologies capable of someday achieving what he called "world-wide integration" and "world-wide consensus" (1964: 106). E-mail is perhaps the closest thing we presently have to a technology capable in principle of achieving those goals. At least it provides almost unlimited possibilities for global interaction. The question addressed in this paper is what kind of interaction this is, and, above all, what is the quality of this interaction. In a very general sense, notions of integration and consensus are characterized by convergence: by a metaphorical 'joining' of interests or attitudes; by a movement of partners metaphorically 'closer' to each other; by a sense of merging common bonds of some type (cf. Malinowski, 1923). Thus viewed, convergence is a relational state, and is not necessarily strictly ideational. In order to achieve interpersonal convergence, we have to be able to relate to each other emotively (cf. Caffi and Janney, 1994). We have to be able to 'read' and respond to each other's displays of affect; we have to be attuned to subtle relational messages communicated iconically by how we interact with each other: by choices of words, tones of voice, gazes, gestures, facial expressions, and
so forth (cf. Arndt and Janney, 1991). Successful emotive communication is a prerequisite for interpersonal convergence (cf. Watzlawick, Beavin and Jackson, 1967). In McLuhan's terminology, e-mail could be described as a hybrid form of communication in which the so-called 'hot' medium of linear print is translated into the so-called 'cool' medium of the video image. The text appears as lines of ASCII characters on a PC monitor, or as a computer printout. The senders and receivers are connected interactively, and their messages move through the network at the speed of signals in the central nervous system. When information moves this fast, McLuhan argued (1964: 103), perceptual processes change, and old patterns of psychic and social adjustment begin to break down, often with unexpected results. This paper is about emotive communication in e-mail, and about some, possibly unique, emerging patterns of interaction and interpersonal adjustment in modern e-mail society.

E-MAIL INTIMACY

One of the unexpected results of e-mail has been the phenomenon of 'e-mail intimacy'. Most veteran e-mail users have experienced it: the paradoxical sense of immediacy or nearness sometimes experienced with strangers in this medium. There is a remarkable contrast between the cool, disembodied impression made by the e-mail medium itself, and the heated involvement of some of the interaction that it seems to produce. For not yet fully understood reasons, e-mail seems to encourage displays of affect that would be unusual between strangers in normal written correspondence, telephone conversation, or face-to-face interaction. In these cases, it is almost as if, in spite of the coolness of the medium and the vast distances that it bridges, the partners are 'close': easy to get in touch with, easy to exchange information with, easy to be friendly with, and, if problems arise, easy to be unfriendly with.
The sense of the partner's eminent 'interpersonal availability' in e-mail is quite different, I think, from feelings of intimacy in other types of human interaction; it has a curiously abstract, virtual quality that is perhaps unique to the e-mail communication process. To give an example, during the Gulf War, linguists around the world received e-mail reports from a colleague in Israel, describing the day-to-day life of his family, as they sat confined by circumstances to their home, fearful of rockets and gas attacks, hoping for the end of the war. This moving, personal, electronic document - a private diary shared publicly with hundreds (or thousands) of e-mail users - brought the intimate reality of the writer's family's attempts to cope with the uncertainties of the war into vivid focus for the rest of the INTERNET world. The diary was originally circulated among a small circle of friends. Soon, however, due to the large number of forwarded copies, it became an e-mail happening of global dimensions, with different functions for different groups of readers. For the writer and his friends, it no doubt continued to be a means of keeping in touch and maintaining emotional bonds during the trying times. For many strangers, however, it was news: timely information about the psychological, social, and political situation in Israel. And for still others, reading the diary from installment to installment was like eavesdropping on the family through the PC monitor, or like watching a gripping true-life drama on television. With a sort of voyeuristic curiosity, one turned on the PC wondering, "how are they doing today?"
As one of the many secondary receivers of this dramatic document, I read it with increasing empathy for the writer and his family, and with a certain growing sense of contradiction. Who were the private details meant for? The writer was clearly sharing his feelings - but with whom? As only one of his own diary's many 'senders' (counting the 'forwarders'), his status in this event was not entirely clear to me from the standpoint of the communication processes involved. And what was my status as a coincidental 'receiver' of the diary? Where were the emotive bonds, in the Malinowskian sense, between myself and the writer? He seemed somehow both near and hopelessly far away at the same time. I imagined that I was receiving copies of the diary from someone who had received copies from someone who had received copies from someone who had received copies from someone who had received copies, and so on, in a sort of infinite regression. The event, as a whole, in fact, consisted of so many spontaneous 'sendings' and 'receivings' that it seemed quite impossible to imagine who might possibly have been intending to share what type(s) of intimacy with whom, why, at any particular time in the process. What remained stable was the document itself, which appeared regularly in my e-mailbox with its paradoxical double identity as an intimate, personal account of the writer's experience on the one hand, and as a simple piece of public information on the other: it had become 'private network public property'. Its receivers, whether friends, concerned strangers, or peeping Toms, were free to treat it as they wished: free to read it, ignore it, file it, forward it, comment on it, take sides with it, attack it, etc. It occurred to me that, as a global network event, the diary had real senders and receivers in only a very abstract sense.
It was simply something 'there' in the network, with a life and dynamism of its own, which could be picked out and inspected if one wanted - or not, as the case might be. E-mail abounds in messages competing for the attention of receivers, like characters in search of authors, which only come alive if they are chosen for reception by the receiver. As often happens in the INTERNET, after a few installments of the diary, the online spectators began commenting on it. Soon, their comments led to disagreements, and the disagreements flared into a little network flame war in its own right. In this bloodless war, the combatants provoked and insulted each other with great abandon, precision, and aggressivity; and at times, the attacks were quite personal. The term for this in network jargon is 'flaming'; it is the darker side of e-mail intimacy, where people, merely by virtue of what I have called their 'interpersonal availability', become potential targets for whatever hostilities others in the network might feel like venting on them. It is almost as if, upon entering the INTERNET argumentative arena, e-mail users pass an invisible barrier beyond which almost anything goes. In e-mail, where there are few constraints on primal pathologies, exhibitionism, voyeurism, narcissism, hostility, and so forth can be commonplace features of the communication process.

E-MAIL AND AFFECT

Stories like the above seem to contradict the notion that communication in computer networks is somehow 'cold' or emotionally underdeveloped. On the other hand, the emotive expressive deficits of e-mail are well-known. Like other forms of writing, it lacks a voice and a body. The emotive expressive possibilities of ASCII characters are more restricted than even those of other typewritten characters, and of course infinitely more restricted than those of handwritten characters. Basically, from a
graphological standpoint, in the ASCII code, one can only talk (in small letters) or SHOUT (in capitals). There is the problem of the failing emotive middle-register. The ASCII code provides only very limited ways of suggesting variations in voice quality, intonation, stress, speech rhythm, speech rate, etc.; and there are no replacements for the various facial expressions, gazes, postures, gestures, and so forth that help us interpret the emotive significance of utterances in everyday speech (cf. Arndt and Janney, 1987). The use of 'smileys' in e-mail is sometimes regarded as an attempt to overcome the emotive deficits of the ASCII code, but it is a poor attempt, and more like a form of play. Another problem with e-mail, from an emotive point of view, is its relative lack of replacements for the ancient rituals of writing, mailing, receiving, and reading paper letters. Paper letters are complex ritual events (cf. Violi, 1984; Caffi, 1986; Flusser, 1989). A good letter is handwritten in a place that is personal for the writer. The paper, while being written on, shares space in this place with other things that are personal to the writer, partaking, in a sense, of their intimacy. The handwriting, quite independently of the words, iconically represents the writer's feelings while thinking and putting down the thoughts on paper. The handwriting can suggest a state of mind, a state of health, a state of a relationship, a momentary personal situation, or a whole personal history. Before the letter is sent, it is traditionally put in an envelope, and the envelope flap is licked by the sender, symbolically sealing and ending the act of communication. All these (and many other) small, intimate, ritual acts are sent out, as pieces of the writer's 'self', on an uncertain journey through the world's postal system, to be reconstructed and re-experienced by the receiver on receipt. 
This is one reason why reading other people's mail can quite literally be an invasion of their personal privacy. There is also something ritualistically significant about receiving a sealed envelope, containing an unknown message, which has been passed from hand to hand, and has traveled across a great distance, perhaps taking a long time. A letter that appears unexpectedly in one's mailbox is potentially many things: a surprise, a shock, a promise of potential change. It is a sort of undeciphered gesture from a remote 'there' and 'then' that has somehow found its way into one's own immediate personal 'here' and 'now'. It can bear good news or bad news. It can change one's life. An unopened letter is a silent invitation to embark on an unknown fate. Letters that one has been waiting to receive for a long time are especially significant. Waiting for them and wondering why they have not arrived yet are important parts of the ritual, as are weighing them in one's hand once they arrive, feeling the texture of the envelope, scrutinizing the stamps, postmarks, and smudges, opening them, seeing how the paper was folded, and reading them. In e-mail, sending is accomplished by giving the mail program a 'send' command, and receiving is accomplished by giving it a 'read' command. Is it possible that e-mail intimacy is, in part, an attempt to overcome the emotive deficits and the ritual constraints of the e-mail medium? This, at least, would be an explanation in keeping with Malinowski's (1923) notion that the two underlying functions of human interaction are to communicate facts (I have caught six fish) and to maintain relationships (Would you like one?). Malinowski called the first function 'communication' and the second 'communion', the first being a broadly ideational function, and the second broadly relational. He believed that both are necessary for
social organization, and he claimed that it is a mistake to imagine that the main (or even most important) function of human interaction is to communicate factual information. Following Malinowski's reasoning, might we hence imagine that e-mail intimacy has something to do with people's need to maintain relational bonds - or with their need for interpersonal warmth - in the cold, impersonal world of the cybernetic systems through which they interact? E-mail intimacy would then be explained as a form of compensation, where the partners 'try harder' than usual to communicate emotively, like people who lean closer to each other in a noisy room, shouting, gesticulating, grimacing, touching each other, etc., to compensate for 'noise on the line'. E-mail partners would be seen as intensifying their attempts to maintain bonds of union by acting in exaggeratedly 'warm' ways to overcome the restrictions of the 'cool' video communication channel. But this would suggest that the e-mail user could be regarded as a sort of unlucky projectionist in Plato's cave, who is forced to compensate for the emotive deficits of the shadows projected on the wall by moving the figures about more actively in front of the fire. Is this all there is to e-mail intimacy?

THE SENDER-RECEIVER RELATIONSHIP
The idea that e-mail intimacy is strictly a reaction to deficient code and channel properties of the e-mail system is not very satisfying. First, it seems to entail imagining that e-mail users are somehow forced (as if against their will) to be intimate with each other, which seems counterintuitive. Second, it ignores the fact that people can indeed express their feelings in e-mail, and they can do this very well if they want to. Although the communication of affect requires some effort on a computer keyboard, it is by no means impossible, and, as Detlef Borchers (1995: 74) has recently said, in e-mail, the effect of affect is dangerously spontaneous. No facial expression, tone of voice, or physical gesture reduces its impact on the partner. I think if we want to explain what e-mail intimacy is really about, we have to move beyond considerations of the ASCII code and the video mode, and start asking questions about patterns of sender-receiver interaction in e-mail itself, and about patterns of interaction between e-mail users and the system. Here, I think, are the keys to understanding e-mail intimacy. If we look at what people really do as users of e-mail, we can see that there is a basic lack of clarity in the sender-receiver relationship in the e-mail communication process. It is a product of the complexity, size, and flexibility of the network - and, above all, an effect of its incredible speed. The ease with which copies can be forwarded to third parties in the system tends to destabilize our notions of who our partners are. Partly as a result of this, our own roles as senders and receivers in the e-mail process also become fuzzy. It is as if the relational zero-point of communication, or the 'I-you' core of the interaction, is sometimes not fully clear in e-mail.

The Sender Role
The notion of being a sender in traditional communication models generally implies some correlative notion of being the originator of the message, and of directing it toward a particular receiver or receivers over a channel. Hence, e-mail messages are assumed to be encoded into the ASCII code, sent via the e-mail network to receivers, and decoded at the receiving end into something like a mirror image of themselves. In
real e-mail practice, however, this type of sending (in which (1) 'I' send 'my' message to 'you') is only one possibility - and sometimes not even the most important one. Others, for example, include: (2) 'I' send 'my' message to several of 'you' simultaneously; (3) 'I' send 'my' message to 'you', and copies to 'them'; (4) 'I' send 'my' message to 'you', and copies to 'them', with comments to 'them' from 'me'; (5) 'I' send copies of 'their' message to 'me' to 'you'; (6) 'I' send copies of 'my' copies of 'their' messages to each other to 'you', with comments to 'you' from 'me'; (7) 'I' use the reply function to send 'your' message to 'me' back to 'you' with 'my' comments to 'you', simultaneously sending 'your' message to 'me' with 'my' comments to 'you' back to myself. The point here is not that such things cannot be done by regular post or by telefax, but rather that in e-mail they can be done absolutely effortlessly and instantaneously with a simple push of a button, and they are infinitely combinable. Classical notions of senders, codes, channels, and receivers have been criticized increasingly in pragmatics in recent years (cf. Bickhard and Campbell, 1992; Mey, 1993). What has not yet been discussed, however, is that even the legitimacy of the notion of the sender-receiver dyad as the primary unit of e-mail interaction seems somehow questionable at times in the e-mail communication process. The 'I-you' relationship is permanently complicated in e-mail by the potential presence of additional 'he's', 'she's', or 'they's' who can get access to messages and forward copies. As any message sent via e-mail can potentially be forwarded by the receiver to other receivers, and by these to yet others, the sender of an e-mail message can never know exactly how many receivers will receive it. Hence, in the sender role, one can never know exactly who, in this extended sense, one is writing to: the notion of the receiving partner becomes vague.
Since, according to enunciation theory (cf. Benveniste, 1971; Rosenbaum, 1994), the enunciating 'I' can hardly be conceptualized without reference to the addressed 'you', the vagueness of the addressee is a permanent underlying problem in e-mail communication. Therefore, in pursuit of explanations of e-mail intimacy, we ought to ask ourselves what this diffuse notion of the 'you'-receiver in e-mail means for the 'I'-sender, and how it influences the sender's behavior. One would think that not knowing exactly who could receive one's messages might lead to a certain insecurity, or at least to a certain caution in matters of e-mail intimacy. Paradoxically, however, it does not seem to do this. The potential vagueness of the other, on the contrary, seems to encourage some e-mail users to indulge in curious forms of exhibitionism.

The Receiver Role
If the concept of the receiver is potentially vague from the sender's standpoint in e-mail, the concept of the sender is often no less vague from the receiver's standpoint. First, strictly speaking, we do not literally receive e-mail messages ourselves; our e-mailboxes do this. With the help of our PC's, we then go metaphorically 'into' our mailboxes, selecting the messages that we want to receive from the list of messages in the mailbox when we enter the system. There is hence an interesting sense in which, as receivers, we actually select our senders rather than being selected by them, as traditional communication models would have it. But we are greatly disadvantaged in distinguishing between categories of senders (friends, colleagues, professional interest groups, junk mailers, anonymous third parties, and so forth) by the paucity of information in the mail display on a PC monitor. Unlike in regular mail, where senders can be categorized relatively effortlessly just by
glancing at the sizes, shapes, and colors of envelopes, and by looking at stamps, postal marks, address formats, and so forth, in e-mail, assigning senders to categories involves reading and interpreting a good deal of digitalized information on the monitor screen (e.g., the e-mail address, message header, size of the file, etc.). It is sometimes necessary first to read the opening of a message in order to assign its sender to a particular category. Moreover, from a new mail display on a PC monitor, it is not always immediately evident whether messages are originals, copies, or replies to previous messages, and it is not always clear whether they are intended for oneself in particular, or for a wider audience. Together, these characteristics of the e-mail display system tend to blur our sense of which messages might be worth selecting for reception and which might not be. THE
E-mail users naturally communicate not only with each other when sending and receiving e-mail messages, but also, of necessity, with the e-mail system itself, via their PC's. Interaction between the user and the system, I think, offers further clues to explaining e-mail intimacy. As users of e-mail, we tend constantly to vacillate between two rather different role-alignments with respect to the e-mail system: either (1) we tend to selflessly serve the system, unreflectingly carrying out our information processing duties as metaphorical extensions of the network - as 'willing nodes', so to speak - or (2) we manipulate and exploit the possibilities of the network, regarding the system as a sort of metaphorical extension of ourselves and our interests. In the first role - that of the node in the network - the user is analogous, in a way, to a neuron in a central nervous system, or to an ant in an unimaginably huge, active anthill, swarming with actors performing individually senseless activities that organize themselves into socially useful forms. There are colleagues, for example, who come out of their offices in the morning, after processing the night's e-mail, proud of having already worked for two hours (I am indebted to Jacob Mey for this observation). In this role, they are not only users but also servants of the e-mail network. In the second role - where the network is regarded more as an extension of the self - the user is somewhat like a post-paleolithic nomad, roaming about a self-constructed cybernetic environment, who hunts for information and gathers partners much as our stone-age ancestors once roamed about the earth hunting and gathering food. There are colleagues, for example, who are impassioned players of INTERNET games, subscribers to lists, appreciators of flame wars, forwarders of entertaining messages, senders of Christmas greetings with pictures drawn in ASCII characters, and so forth.
In this role, they are not only users but also explorers and manipulators of the network, and their work takes on some of the characteristics of play.
The contrast between these two basically different styles of interacting with the e-mail system, I would like to suggest, carries over into our relationships with our partners. In the latter role, in particular, we are sometimes tempted, when playing with our possibilities for manipulating and exploiting the system, to regard our partners indirectly as extensions of our own personal interests or desires. This, I think, might be the underlying attitude in many instances of e-mail intimacy: the assumption that the partner is a sort of extension of the self - someone called into being, in a sense, by the decision to communicate with him or her.
The final clue to unraveling the mystery of e-mail intimacy may well be in the relationship between the user and the PC monitor. Following a line of reasoning used by Eco (1984), PC monitors can be compared with mirrors. Both produce virtual images, or images that tend to be perceived as appearing somehow 'inside' or 'behind' the glass, although the projecting surface per se has no 'inside'. Mirrors and PC monitors are threshold phenomena. The monitor is in a very literal sense an interface - a type of third face between the e-mail sender and receiver - where the users' respective egos, projected on the screen as the messages that they send each other or select from each other for reception, become social egos of a special cybernetic type. Ever since Lacan's (1953) discussion of the mirror stage in child development, it has been known that experience with mirrors involves imagination and projection. The child at first mistakes the image in the mirror for reality, then recognizes that it is only an image, and finally realizes that it is his or her own image. The time when the reflected image is recognized as the 'self' is an important time in the child's social development, marking the birth of its so-called 'social ego'. It is the first step toward imagining different possible projected social selves, and the first step toward imagining others' thoughts, feelings, and desires. Ironically, adult e-mail users seem to go through somewhat similar stages in their experience with e-mail messages on PC monitors. They begin by imagining e-mail messages as 'real' documents that have somehow literally traveled through space and time by satellite from some place on earth into their PC terminals; then they recognize that e-mail messages are short-lived, virtual things in the dynamic life of the network; and finally they realize that the only reason why messages appear on their PC screens at all is because they themselves decide to make this happen.
When an e-mail message on a PC monitor is recognized by the receiver as something that he or she has 'constructed' (by deciding to receive it), this marks the emergence of what we might call the 'social cybernetic ego'. This stage corresponds roughly with the time when users begin regarding the e-mail network as an extension of themselves, stop thinking much about secondary receivers of their own messages, and stop wondering about the 'real' identities of senders of messages that they have decided to receive. When this stage is reached, it becomes easy to imagine partners as virtual partners, and to imagine e-mail interaction as a kind of virtual interaction taking place metaphorically 'inside' the computer. In imagining this, the user, rather like Alice in Lewis Carroll's Through the Looking-Glass, crosses over the threshold into the monitor, to interact with the virtual partners on the other side. Once Alice jumps into the looking-glass room in Lewis Carroll's novel, her first words are, "Oh, what fun it'll be, when they see me through the glass in here, and can't get at me!" I think this is a rather good explanation of many instances of unexpected e-mail intimacy: e-mail intimacy arises, perhaps, out of the simple illusion that once one has metaphorically stepped 'into the monitor', one is invulnerable. I think that if we regard e-mail intimacy as a type of virtual intimacy - as a type of intimacy with a projected cybernetic extension of the user's 'partner-interests', as opposed to a 'real' partner - we are coming somewhat closer to the true nature of the phenomenon. Being intimate with a virtual partner involves no risks: it is like being
intimate with oneself, or with a figure in a video game. There is little to lose by insulting a virtual partner: first, there is probably a good chance that you will never actually meet your living Doppelganger face-to-face; second, as long as the partner remains virtual, the issue of his or her 'personhood' is not relevant; and third, in any case, a virtual partner's reactions can be ignored at will, the way an uninteresting television show can be ignored. Especially the more aggressive forms of e-mail intimacy, like flaming, may be products of the feeling that the e-mail system, and all of the attackable virtual partners metaphorically 'inside' it, are extensions of the attacker's self.

CONCLUSION

But where does this leave us, relationally speaking: with a looking-glass e-mail communication system capable of providing almost unlimited opportunities for intimate virtual emotive interaction with phantoms? Or, even worse, only with ourselves? Thirty years ago, McLuhan (1964: 102-103) said that when information moves at the speed of the central nervous system, we are confronted with the obsolescence of all earlier forms of psychic and social adjustment. Our experience with e-mail sometimes seems to confirm this. McLuhan was deeply ambivalent about the social and psychological implications of electronic communication, but his work was always characterized by a strong hope - which we see re-emerging now, after three decades, in today's cognitive technology movement - that ways would eventually be found to steer electronic technology in socially unifying and humanly satisfying directions. He reasoned that "when we have achieved ... world-wide fragmentation, it is not unnatural to think about ... world-wide integration," and he dreamed of the possibility of some day achieving a balance between technology and experience that would, in his words, "raise our communal lives to the level of a world-wide consensus" (1964: 106).
I think that the emotive problems discussed above are not problems of the e-mail system, but problems of its users. A virtual world does not necessarily have to be an aggressive world, and in the long run, intimacy, however virtual, can hardly help but be at least as integrative and conducive to global consensus as our other relational alternatives. Perhaps the problem is only that e-mail users are still preoccupied with playing with the e-mail system, and with playing with each other with the system. When they have finished playing, it will be time to sit down and start talking seriously about intimacy.

POSTSCRIPT

Somewhere between the beginning of this paper and the end, I received an air-mail letter, partly typed and partly hand-written, from my friend Yuri Kite at the Canadian Academy in Kobe, Japan, thanking me for my concern during the days following the earthquake that devastated that beautiful city at 5:47 A.M. on Tuesday, January 17th, 1995. She had finally given the pendulum of the antique clock on the wall of her office a nudge to restart it. It was February 6th, the water and gas had just been turned back on, and half of the students had returned. Most of the letter was a printout of e-mail messages that had been sent from the Academy during the days following the earthquake, and the remaining last half-page was the handwritten message.
The world had indeed seemed very small during the days after the earthquake in Kobe, and the e-mail immediacy had been very high. As I opened Yuri's letter (it was naturally in a sealed envelope, with stamps, postmarks, smudges, and the rest of the ritually significant relational icons), the thought arose that beyond the pathological fascination of some e-mail users with their possibilities for shaping virtual realities in their PC monitors, quite another global e-mail connection exists, which is very direct, human, non-egocentric, and intimate. It is the connection between friends. And I suddenly regretted not having realized the full strength of this connection as I had read the earlier diary reports of my Israeli colleague during the Gulf War. His status in that particular global e-mail event had been absolutely clear; mine too. He had been talking to his friends, and the rest of us had been listening in. One must really almost apologize for such an invasion of intimacy. The real truth of e-mail intimacy, I believe, lies somewhere between the poles of the self, the system, and the other. In e-mail, we can categorize our partners at all times in terms of any of these: they can be creations of our choices, ghosts in the monitor, or simply our friends. And it is possible that a certain helpless cynicism results from having such alternatives. On the other hand, it is also possible that, if we learn to get along with such possibilities, we do not have to be victims of them. Getting ready to do a last proofreading of this paper before sending it (too late, sorry) to the Editors in Hong Kong, I looked again at Yuri's handwriting from Kobe, and what I was trying to say in this postscript became clear to me. There were still telephone party-lines in the country in Montana when I was young. Two rings meant the family in the next valley, three rings meant our neighbors, and four rings meant us. And everybody heard and knew everything. 
The rest was only empathy, sympathy, understanding, and old-fashioned Ferris County discretion. And now, with the advent of e-mail, the party-line intimacy is only a little bigger. We've got to get used to it.

REFERENCES
Arndt, Horst, and Richard W. Janney, 1987. InterGrammar: Toward a unified model of verbal, prosodic, and kinesic choices in speech. Berlin/New York/Amsterdam: Mouton de Gruyter.
Arndt, Horst, and Richard W. Janney, 1991. Verbal, prosodic, and kinesic emotive contrasts in speech. Journal of Pragmatics 15: 521-549.
Benveniste, Emile, 1971. Problems in general linguistics. Coral Gables, FL: University of Miami Press.
Bickhard, Mark H., and Robert L. Campbell, 1992. Some foundational questions concerning language studies. Journal of Pragmatics 17: 401-433.
Borchers, Detlef, 1995. Redeschlacht ohne Pardon. Die Zeit 3: 74.
Caffi, Claudia, 1986. Writing letters. In: Jørgen Dines Johansen & Harly Sonne, eds., Pragmatics and linguistics. Festschrift for Jacob L. Mey, 49-57. Odense: Odense University Press.
Caffi, Claudia, and Richard W. Janney, 1994. Toward a pragmatics of emotive communication. Journal of Pragmatics 22: 325-373.
Eco, Umberto, 1984. Semiotics and the philosophy of language. London: Macmillan.
Flusser, Vilém, 1989. Die Schrift. Göttingen: Immatrix.
Lacan, Jacques, 1953. Le séminaire de J. Lacan. Paris: Seuil.
Malinowski, Bronislaw, 1923. The problem of meaning in primitive languages. In: C.K. Ogden and I.A. Richards, The meaning of meaning, 296-336. London: Routledge & Kegan Paul.
McLuhan, Marshall, 1964. Understanding media: The extensions of man. New York: McGraw-Hill.
Mey, Jacob L., 1993. Pragmatics: An Introduction. Oxford: Blackwell.
Rosenbaum, Bent, 1994. Passion and enunciation. Aarhus: Center for Semiotic Research (mimeographed).
Violi, Patrizia, 1983. Letters as written interaction. In: Valentina D'Urso and Paolo Leonardi, eds., Discourse analysis and natural rhetorics, 213-219. Padova: Clueb.
Watzlawick, Paul, Janet Helmick Beavin, and Don D. Jackson, 1967. Pragmatics of human communication: A study of interactional patterns, pathologies, and paradoxes. New York: Norton.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 12

COMMUNICATION IMPEDANCE: TOUCHSTONE FOR COGNITIVE TECHNOLOGY

Robert G. Eisenhardt
SENSCI Corporation
Alexandria, VA 22314, USA

David C. Littman*
Advanced Intelligent Technologies, Ltd.
Burke, VA 22015-4737, USA
dlittman@cs.gmu.edu
* Corresponding author.

INTRODUCTION: MOTIVATION AND GOALS

One of the key goals of the theoretical and engineering disciplines of Cognitive Technology must be to make it possible to create, efficiently and reliably, complex systems of humans and machines who work effectively together toward common goals. Unfortunately, the mere existence of the need to create a discipline of Cognitive Technology suggests--rightly, we believe--that humans and machines just do not naturally work together well. For all our personal, social, and political difficulties, the same cannot be said for humans: Humans work together in extraordinary ways and are absolutely marvelous at overcoming what would otherwise be fatal blows to joint problem solving. In essence, people are very good at detecting potential hazards to problem solving and correcting them before they make it impossible to achieve the goal toward which the problem solving was directed.

For complex human-machine systems, especially in life-critical applications, it is absolutely necessary that: 1) humans be able to communicate effectively with each other, 2) machines be able to communicate effectively with other machines and, 3) humans and machines be able to communicate effectively with each other. If misunderstandings occur in any of these communication channels, the potential for disaster can be high. There are many tragic examples of such failures in communication. Consider:

• About 10 years ago, a DC-10 taking off from Chicago's O'Hare airport lost its left engine shortly after lift-off. The plane crashed with loss of all passengers and crew. Investigation of the accident revealed that the crew eventually deduced the nature of the trouble and had begun to take corrective action, but too late. The investigation also showed that had the crew (or the system) recognized the problem in time, the plane, passengers and crew could have been saved. All necessary data were available but the crew knew only that their model of the world was no longer valid. None of the available data were used to generate information describing the plane's change in state. Technology is available today to prevent that from happening, but it is not in place in any aircraft.

• A similar incident occurred in 1987 in Detroit when the crew of a Boeing 727 took off with the flaps up. All on board were killed. Again, ample data were available to prevent the tragedy, but it was precipitated because not just the pilot's, but ALL crew members' internal models of the plane had the flaps down, and all warnings to the contrary were ignored. The point is made that while it is the crew's responsibility to control the plane, the plane could have been equipped with intelligent systems that would have shut the plane down when they recognized that the crew was unaware of the deteriorating situation.

• A third case involved the Air Force Demonstration Team during a practice session a few years ago. The group was performing a split 'S' maneuver beginning at a low altitude. Unfortunately the lead pilot miscalculated and flew his plane into the ground, followed almost immediately by his three wingmen in a tight fixed formation. This was an easily preventable accident in which none of the pilots had a correct mental picture of reality even though sufficient data were available in the plane to correct it.
• A final example is that of the USS Vincennes, where misinterpretation of sensor data combined with inadequate protocols caused a command to be issued resulting in the destruction of a commercial airliner.

While these events constitute significant tragedies, we hasten to point out that, in some respects, it is surprising that there are not more of them, considering the complexity of tasks humans are called on to perform daily. Our explanation for the relatively small number of catastrophic events, given the huge number of opportunities for them, is quite simple: One of the most important aspects of communication between and among humans and between humans and their environment in critical circumstances is the way they are able to identify and repair potential communication failures. Indeed, we believe that the next generation of complex human-machine systems will have to be able to employ many of the strategies humans use to identify and repair communication failures. Moreover, the need for such capabilities will only increase as human-machine systems become more complex and expose different, and perhaps new, forms of communication failure. This implies that design tools used to develop the next generation of complex human-machine systems must be able to 1) foresee potentials for communication failure and, 2) recommend design alternatives to prevent such failures.

Recently there has been a strong drive to develop engineering technologies for robust, complex, problem solving systems comprised of humans and machines¹. Such technologies are based on design methods and heuristics that, while effective to a degree, do not constitute a scalable and replicable design technology for constructing multi-agent systems able to avoid, or detect and repair, communication failures before accidents result. Current cognitive technologies are not detailed enough to support simulation of potential designs with the goal of detecting where, when, and why communication failures might occur, and how they might be prevented by altering the system design. This is so because, in large part, current system design tools do not reason about the effects of the different kinds of knowledge representations that humans use when engaging in complex problem solving. This, in turn, is because we do not yet clearly understand the types of knowledge structures humans bring to bear in complex, rapid and cooperative problem solving situations to the same degree we understand the types of knowledge structures employed in solving, for example, story problems involving time and motion. In short, we do not yet have a sufficient theoretical base to create robust engineering methods based in Cognitive Technology.

¹ For instance, the Air Force, Navy, ARPA, and NIST have significant initiatives in the arena of intelligent tools to support design of complex systems.

In studies of individual problem solving--particularly in the area of acquisition of expertise--much research has focused on how knowledge representations used during early phases of individual skill acquisition become recoded, or "compiled", into rule-like procedural representations (cf. work by Anderson, Kieras, Lesgold, Mitchell, Rouse, etc.). Mental models for spatial mapping and structure problems have also been investigated with the intent of discovering how humans make inferences about spatial relations and, especially, how they incorporate new constraints on spatial relations into existing mental models. However, little work has addressed the role of these different types of knowledge structures in group problem solving.
Even less work has focused on their role in group problem solving tasks requiring different members to use different types of knowledge representations and communicate to other group members. We believe the absence of research on this topic has left complex systems designers without design tools necessary to develop systems capable of preventing or correcting for communication impedance. Our ultimate goal in this enterprise is, therefore, to develop a design technology consisting of 1) a theory of the types of communication failures that affect agents in complex problem solving, 2) the knowledge that humans do--and machines could and should--bring to bear in problem solving situations prone to communication failure and, 3) design tools to support rapid design, evaluation, and construction of robust, complex, intelligent human-machine systems that can: a) prevent communication failures or, b) identify communication failures and repair them before they become tragic. In this paper, we sketch the outlines of what we think a Theory of Communication Impedance might look like, how we intend to build it, and why we think it is a critical component of a discipline of Cognitive Technology.

TYPES OF COMMUNICATION IMPEDANCE

We have defined several categories of sources of communication impedance, each of which we believe poses significant challenges to the discipline of Cognitive Technology, if it is ever to evolve to the point where it serves as the basis for a robust
engineering methodology. In this section, we briefly describe our first pass at a typology of sources of communication impedance (CI).

Impedance Associated with Mental Models. In the area of mental models, we have identified the following five fundamental causes of communication impedance:

• Dysmodal Mental Model--This term, coined for this paper, refers to mental models that suffer from structural defects. One structural defect we call false closure. In false closure, adjacent steps in an inference chain "make sense" locally. However, their effects on the overall inference chain are, unfortunately, destructive. For example, a novice program manager may conclude that it is possible to deliver a large software job on time, but with some additional expense required to put "a few more people" on the job to "speed up" progress. This makes sense because if five people can do the job in a month, surely 10 people can do the same job in two weeks. Unfortunately, this appeal to the infamous Mythical Man-Month, popularized by Fred Brooks, makes sense only locally. As Brooks shows, in the overall context of developing and executing a software development plan, it simply does not have the effects one might expect from the naive intuition that "If a little is good, more is better". More extreme examples come from schizophrenics in crisis whose speech makes reasonable sense in adjacent topic shifts but very little when the entire path of topic shifts is considered.

• Mismatched Mental Model Contents--(A) Two or more communicators have mental models of the same objects and/or processes, but there are differences in the contents or structure of the mental models that can lead to communication impedance and, possibly, failure. (B) One individual has two (or more) related models wherein any or all of them may be incomplete, erroneous, and of different levels of refinement. When some action is required to use some of these models in combination, the combined model is usually also incomplete, erroneous, or incorrect, leading to improper decisions and subsequent action. This class of impedance is usually the root cause of human errors associated with interpreting one's environment.

• Mismatched Inference Rules for Mental Models--Two or more communicators may have identical mental models but different rules for operating on them.

• Mismatched Referents--Two or more communicators may have different referents for the same symbol although semantic meanings of the symbol are the same.

• Mismatched Semantics--Two or more communicators may have different meanings for the same symbol.

Impedance Associated with Communication Acts. In the area of communication acts, we have identified the following three types of impedance sources:

• Framing--A communicator may require different framing (from others) for a specific communication to permit her/him to refer appropriately to the same mental model as the originator.

• Conflicting Commands or Data--Errors can arise both within a medium, such as (near) simultaneous conflicting commands, e.g., "turn left" vs. "turn right", and between media, such as the relative motion, as perceived by the observer, of trains in a station or jets on a carrier deck.

• Recipient's Attention--The recipient may be preoccupied with other tasks of which the communicator is unaware.
Impedance Associated with Contextual Factors. In addition to mental models and communication acts, several other factors can affect communication deleteriously: noise; stress--typically caused by task demands; physical state such as fatigue; and emotions, such as frustration or fear. Of these categories we hypothesize that model dissimilarities between (among) communicators and between communicators and the environment are the major source of communication impedance and will serve as our major focus of attention throughout the study.
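The false-closure example given earlier under dysmodal mental models (five people need a month, so ten people should need two weeks) can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the `overhead_per_channel` value are assumptions introduced here, not figures taken from Brooks or from this chapter. It contrasts the naive estimate with one in which every pair of people on the job costs a small, fixed amount of coordination effort each month.

```python
def naive_duration(person_months: float, staff: int) -> float:
    """Naive ('false closure') estimate: effort divides evenly among staff."""
    return person_months / staff


def channels(staff: int) -> int:
    """Number of pairwise communication channels among `staff` people."""
    return staff * (staff - 1) // 2


def duration_with_overhead(person_months: float, staff: int,
                           overhead_per_channel: float = 0.1) -> float:
    """Months needed when each channel consumes effort every month.

    Each month the team supplies `staff` person-months of effort, of which
    `overhead_per_channel * channels(staff)` goes to coordination.
    The overhead value is a hypothetical illustration, not a measurement.
    """
    productive = staff - overhead_per_channel * channels(staff)
    if productive <= 0:
        return float("inf")  # all effort is consumed by coordination
    return person_months / productive


if __name__ == "__main__":
    job = 5.0  # person-months: "five people can do the job in a month"
    for staff in (5, 10, 20):
        print(staff,
              round(naive_duration(job, staff), 2),
              round(duration_with_overhead(job, staff), 2))
```

Under these assumed numbers, doubling the staff no longer halves the schedule, and beyond some team size the schedule lengthens. Each step of the manager's inference is locally sensible; the defect appears only when the whole chain is evaluated.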
[Figure 1. Impact of Model Dissimilarity on Communication. Recoverable labels: Communication Time-Bandwidth Limit; Allocation to Substantive Communication; Model Similarity.]

Because of its importance as a primary source, communication impedance associated with mental models is explored in more detail. Figure 1 represents the relationship between mental model dissimilarity and communication bandwidth. The term "communication bandwidth" is used loosely for lack of a good definition of model bandwidth, a subject of our research. Note that communication is divided into two categories competing for available bandwidth. These are substantive communications, which convey conditions associated with the model, and meta-communications, which convey data related to the communication process itself. In the case of mental model dissimilarity, meta-communications are intended to bring the communicators' models into sufficient alignment to permit meaningful substantive communications to proceed. Note also in Figure 1 that at some point mental model dissimilarity is so great (D, E) it would appear impossible to communicate since channel bandwidth is insufficient to support the necessary meta-communication. When this occurs, substantial model building through 'off-line' meta-communication is required before meaningful dialogue
R. G. Eisenhardt and D. C. Littman
can occur. A famous, absolutely paradigmatic, example of this condition is illustrated by the comedy routine of Abbott and Costello, "Who's on First", where one side believes they are engaged in substantive communication while the other side believes they are engaged in a mixture of substantive and meta-communication.
Sometimes, communication tends to follow the closed cycle shown in Figure 1, wherein some model disparity between communicators exists at point "A" in a dialogue. As the dialogue progresses, the models held by each communicator turn out to be farther apart in actuality, but the disparity is unknown to the parties. The dialogue will tend to continue until, at some point (B), one of the parties recognizes the model discrepancy and moves the dialogue to the meta level (C). The dialogue stays there in an attempt to bring the models into sync and, if successful, resumes at point "A." This method of correcting the problem depends on many capabilities, but foremost is an ability to understand the current situation, to determine the interpretations of the situation held by all communicators, and to know whether the situation and/or the interpretations are changing and why. This is one of the essential aspects of the concepts of situational awareness and assessment.
An example of this process is the game of Bridge. Here, partners have access to data (i.e., visual) with which to construct models of the world, but initially they have no knowledge of two critical pieces: their opponents' and their partner's hands. The bidding process is designed to provide sufficient data for each of the four players to construct models of the other hands. But these models are at best flawed, because the bidding signals and the number of available bids are insufficient to construct fully correct models. To help overcome this problem, bidders set up elaborate signals or conventions for transmission of data between them. These signals serve two purposes.
First, they convey more data per bid than the normal bidding conventions convey and, second, they are intended to confuse opponents. This process of setting up conventions is a form of meta-communication between the partners. Sometimes a player will get a new partner and simply presume that the partner follows the same conventions as he or she does. It is only after some substantive bidding communication that one of them may realize they are using different conventions or models of bidding. Thus, the substantive communication has produced nothing but disinformation in the minds of all players until the mismatch is recognized and corrected through off-line or real-time meta-communication.
This process works at the human level because the communicators are both intelligent and able to modify their mental models on the fly. In man-machine cooperation the problems are different but just as real and serious. In a machine with fixed assets, a copier for instance, the machine's models of the world and of the user are limited to those built in by the designer; any coping strategies the machine possesses for off-nominal users must also be built in by the designer. If either of those models is incorrect for a specific user, or the user's model of the copier is not correct (naive, incomplete, or wrong), the communication between man and machine will probably be difficult, an experience many of us have had.
As the development of enabling technologies for situated computing progresses, it is inevitable that machines will have sensors and effectors they can use to sample the information required to detect and accommodate unanticipated events. This will have to be the case for most military systems. But do we have the knowledge and tools to create a solution that will permit us to build systems with these attributes? Not yet.
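The closed cycle through points A, B and C, and the competition between substantive and meta-communication for a fixed bandwidth, can be made concrete with a toy model. The sketch below is illustrative only and is not taken from the chapter: the integer "gap" measure of model dissimilarity, the linear cost of meta-communication, and the function names `allocate_bandwidth` and `run_dialogue` are all assumptions introduced here.

```python
def allocate_bandwidth(gap, limit=10, meta_cost_per_unit=2):
    """Split a fixed channel budget between meta- and substantive communication.

    Assumption (not from the chapter): the meta-communication needed grows
    linearly with the model gap.
    """
    meta_needed = gap * meta_cost_per_unit
    if meta_needed > limit:
        # Dissimilarity too great: the channel cannot carry the required
        # meta-communication, so off-line model building is needed first.
        return {"meta": limit, "substantive": 0, "offline_needed": True}
    return {
        "meta": meta_needed,
        "substantive": limit - meta_needed,
        "offline_needed": False,
    }


def run_dialogue(initial_gap, drift, detect_threshold, repair_rate, turns):
    """Trace the mode of each turn in the A -> B -> C -> A cycle."""
    gap = initial_gap
    mode = "substantive"              # point A: substantive talk, hidden gap
    trace = []
    for _ in range(turns):
        trace.append(mode)
        if mode == "substantive":
            gap += drift              # models diverge, unknown to the parties
            if gap >= detect_threshold:
                mode = "meta"         # point B detected; move to meta level C
        else:
            gap = max(0, gap - repair_rate)   # meta-communication repairs models
            if gap <= initial_gap:
                mode = "substantive"  # models back in sync; resume at A
    return trace


if __name__ == "__main__":
    print(allocate_bandwidth(3))
    print(allocate_bandwidth(6))
    print(run_dialogue(initial_gap=1, drift=1, detect_threshold=3,
                       repair_rate=2, turns=6))
```

With these assumed parameters the dialogue alternates between two substantive turns and one repair turn. Raising `detect_threshold` models parties who take longer to notice the discrepancy, and so exchange more disinformation before repair begins.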
Communication Impedance
A CUT AT A SOLUTION
We believe that successful approaches to gaining wide acceptance for Cognitive Technology will have to be based in a combination of empirical studies and intelligent design environments for systems designers. Briefly, we are following a two-phase strategy in our efforts to develop an engineering methodology based on the Theory of Communication Impedance:
Phase I
Study and build a knowledge base about how people detect (or why they fail to detect) and correct communication errors resulting from sources of impedance and, from this, develop a Theory of Communication Impedance.
Phase II
Design, construct, and test tools, including workbenches, to assist designers of intelligent machines to create the abilities to identify and overcome errors caused by communication impedance.
We plan to gather data about mental models by developing test programs that emphasize the importance of models in communication. We intend also to include studies of the human's ability to: 1) correctly perceive models of the environment and 2) correctly identify machines' implicit and explicit models of tasks, both very important issues in military and civilian computing infrastructures. Our development of a Theory of Communication Impedance will continue to codify relationships that enable outcome prediction for a given set of circumstances (e.g., a specific task domain, such as Air Traffic Control). In particular, our early priorities are:
1. sensitivity of the probability and types of miscommunication to model dissimilarities, mismatched inference rules, mismatched referents, and mismatched semantics, for different model structures,
2. sensitivity of communication effectiveness to communication acts,
3. sensitivity of communication effectiveness to contextual factors.
As part of this activity, we are identifying categories of mental models and determining how and when use of these models is appropriate, as well as when inappropriate use can lead to communication impedance. We intend to identify classes of severity of impedance incidents and relate them to how humans identify and repair them. For example, one member of an air traffic control team may have an incorrect mental model of the position of an aircraft that a colleague is about to hand off. This may lead the receiving controller to incorrectly issue flight directives to another aircraft in preparation for accepting the hand-off craft. If one of the other controllers, or the pilot, does not detect the error, an accident may occur. As we indicated before, our ultimate goal is to develop an instantiation of Cognitive Technology that will assist designers of intelligent machines to create the abilities to identify and overcome errors caused by communication impedance.
For example, the Navy, in its conceptualization of the Surface Combatant for the 21st century, will likely need to perform studies of the tradeoffs incurred by partitioning functionality to hardware, software, and humans. A robust Theory of Communication Impedance would materially contribute to this activity.
A rough view of the kind of intelligent design environment we propose to construct, the Communication Impedance Work Bench (CIWB), is shown in Figure 2. As illustrated, the CIWB consists of several types of knowledge components. It contains knowledge bases for encoding data about types of communication impedance, types of knowledge structures used by humans in problem solving tasks, and task types instantiated for a specific domain, such as Air Traffic Control. The CIWB allows the human system designer to generate a system design along with a specification of the tasks to be performed, and the knowledge structures and problem solving strategies used to perform the tasks. The CIWB is then used to perform a qualitative simulation of the proposed design with the specific tasks and knowledge structures, and it identifies potential sources of communication impedance as the simulation unfolds. The design search space heuristics are used to home in on a system design that minimizes communication impedance, either by eliminating its sources or by providing mechanisms for handling it when it arises.
To see that results of the proposed effort are prepared for and made available to as many people as possible in the human-machine interface business, we propose, ultimately, to develop the CIWB to assist designers of man-machine systems to identify potential sources of communication impedance for their intended applications and to be able to replicate their effects on various communication tasks. Given an understanding of particular types of problems, they may select preconfigured diagnostic strategies for the impedances, specify or select corrective strategies, and then simulate effects of the strategies on proposed system designs. Parameterized
[Figure 2. General Form of the Communication Impedance Work Bench. Component labels recoverable from the figure: Instantiated System Design; Domain Knowledge; Design Instantiation Rules; Types of Communication Impedance; Design Search Space Control Rules.]
functional code modules for each of the diagnostic and corrective strategies would be included in the CIWB. These modules then could be selected and included in working,
delivered systems. In essence, we intend to build part of the infrastructure that may enable significant aspects of Cognitive Technology Engineering.
CONCLUDING REMARK
The discipline of Cognitive Technology will need system development tools to make it more than a theoretical curiosity or a book of design heuristics, such as those developed by many government agencies as guidelines for form and content of Graphical User Interfaces. We suggest that our work on developing a Theory of Communication Impedance may provide a useful forcing function for the creation of a practical Cognitive Technology.
REFERENCES
Littman, David C., B. Gadget and D. Antic, 1994. The dot-loop architecture: A virtual reality based system for aircraft design, operation, and training. Proceedings of International Conference on Aviation Systems. Anaheim, CA. September 1994.
Littman, David C., 1991. Seamless knowledge-based design environments. Proceedings of AAAI91 Workshop on Automated Software Design. USC/Information Sciences Institute Technical Report #RS-91-287. Marina del Rey, Calif., 1991.
Littman, David C., 1989. Constructing expert systems as building mental models Or Toward a cognitive ontology for expert systems. In: K. Morik, ed., Knowledge Representation and Knowledge Organisation in Machine Learning, 88-106. Berlin: Springer-Verlag.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Department of Computer Science City University of Hong Kong [email protected]
ABSTRACT
The direct interaction of computers with humans is of relevance to cognitive technology. However, the indirect influence on human cognitive development, through technology changing social structures and social institutions, can be as great or even greater. Education is one such area. Humans develop by interacting with their environment and by reflecting on the nature of that interaction. In its broadest sense education is continuous and is the acquisition of knowledge, wisdom and skills. It has become an industry because of the needs of society to efficiently certify, ration and control the distribution of human knowledge and skills. The acquisition of skills, knowledge and wisdom does not require the industrial structure of the current education industry, and these structures may even inhibit an individual's development. It is argued that new technology permits the reorganisation of tertiary institutions. The close interlinking of the delivery of education with the certification of individuals is no longer mandated by economics, and changes are possible in the physical location and methods involved in the delivery of education. This paper explores some possible changes in the education industry and their likely effects on individuals.
THE ROLE OF TERTIARY EDUCATION
The tertiary education industry is an expensive but vital component of all industrial societies. Like all industries it is subject to change in the face of new technologies and in particular in the economics of the industry. Here I explore a likely scenario derived from the economic and technological imperatives driving education systems evolution. It is not claimed that the scenario will be the only structure but it is claimed that it is likely that much tertiary education will evolve in these directions over the next two to three decades. If new technologies have significant cost advantages in providing certain services then it is almost inevitable that these changes will happen. It is claimed that many of the services tertiary institutions provide can be provided in more economic and more effective ways without compromising the quality of the education and training of the students.
K. Cox
While we hope that education is broad and enriching, the major reason why society tolerates and supports tertiary education is to support the economic activities of society. The major tasks of universities in the past (Hernes, 1993) have been to transmit learning and knowledge. Students in tertiary education still spend most of their time preparing for future careers and work. Except for the fortunate few, education is not an end in itself but is done so that students can better perform tasks within the broader society. This goal is important, and the increasing demands of society for more accountability, which institutions throughout the world are experiencing, will ensure that tertiary institutions continue in this role and continue mainly to educate students for work in industry in its broadest terms. However, the great research universities, particularly of Germany and the USA, have added to the fundamental role of transmitting knowledge the generation of knowledge and its application to society. Universities are principally ranked by their research, not by the transmission of learning and knowledge. Evidence for this is in the way league tables of universities in the UK, Australia and the USA are compiled and scored. At the level of individual staff members there are strong correlations between promotion and research output (Over, 1993) and negative correlations between perceived teaching commitment and promotion. Research now and in the future will continue to drive much of the development of tertiary institutions. Opposed to this trend are the increasing demand for tertiary education to become more accountable to students and society and the need to improve the quality of the transmission of knowledge. This paper will not discuss the conflict and reciprocity between these roles but will concentrate on the transmission of knowledge and the marketing and selling of that service.
THE STRUCTURE OF TERTIARY INSTITUTIONS
Given that an important role of tertiary institutions is to educate students so that they can perform professional work, and given that institutions will be increasingly judged by their abilities to achieve this task, we need to examine how this can best be done. As part of this role tertiary institutions certify that students are competent at certain activities, have received an education and have benefited from it. How to achieve this is a major preoccupation of most institutions. Here I examine trends and indicators of how these activities are likely to change due to economics and new technologies. First we must examine the way education is currently structured and delivered. In almost all institutions there are courses of study. These courses have specific occupation targets. There are courses for nurses, courses for programmers, courses for engineers and courses for teachers. Courses are broken into modules for instruction. Modules cover some part of the body of knowledge required for the overall course of instruction. Modules themselves can vary but they typically have a standard length of time and are almost all identical in structure. A module consists of a body of knowledge packaged together with higher level skills required to apply that knowledge. Thus students are meant to not only know things but they are meant to be able to apply the knowledge and to see its relevance in a broader context. Module descriptions are standardised and consist of statements of aims, objectives, syllabuses and evaluation methods. A course of study puts together
Technology Tertiary Education
modules of knowledge and techniques to form a coherent whole which prepares students for some future job, whether it be as a researcher, dentist, teacher, manager or technical analyst. Students study these modules and a certification is given that they have achieved the objectives of the modules. Typically, modules are delivered in a way that is a close analogy to a factory with batch processing. A pertinent analogy is the production of beer in an old style brewery. Here a batch of beer is brewed in large vats and allowed to ferment for a given period of time. When the brew is finished, the whole batch is bottled. In our analogy each final bottle of beer is equivalent to a student completing a module. The analogy operates at the level of the structure of the course and the progression of students through a course. In structure there are striking similarities. All students get the same module at the same time. Students are handled in batches. This is quite different from an assembly line or on-demand manufacturing. Batch processing in manufacturing is an old technique and is normally only used when the economics and technology do not allow more continuous and flexible processing. It is contended that this basic model of courses consisting of modules of knowledge will remain. What will change is the way knowledge is presented to students and the transition from batch processing of students towards a more flexible structure (Darby, 1994). The current education systems are characterised by the need for efficient assessment of students through examinations, by the close coupling of assessment with instruction and by the delivery of knowledge through lectures and tutorials. The current system is a cost effective educational approach as it allows economies of scale and processing. Most institutions know that their courses have to have many students. Most institutions know that they have to give examination assessment at the same time.
Most institutions know that they have to divide knowledge into fixed time length modules so that all knowledge is compartmentalised into fixed sized chunks. Courses with few students and courses with different modes of assessment are rare and expensive to operate. The main reasons for this restrictive and constrained system are economic and technological, not educational.
ANOTHER POSSIBLE STRUCTURE FOR A DELIVERY OF MODULES
Using the manufacturing analogy, another model for the structuring of education is as a continuous process with just-in-time delivery of instruction (Perelman, 1993). It must be stressed again that this analogy is an analogy for structure, not process. We should not take the analogy too far and imply that all students are standardised identical components. It is the structure and organisation of their courses that are standardised. What this paper is suggesting is that technology may be able to make this process more flexible and attuned to both the students and the subject matter. In a just-in-time system a student would do a module at any convenient time and at their own rate. A course would still consist of a set of modules. These modules do not have to be of fixed size or fixed format. Students are not required to do modules in lockstep. From the point of view of the student there are obvious advantages. The students can fit the course more to their own needs and abilities. Students who need more time could take more time. Students who wished to work part-time could work part-time. From the point of view of the institution this could also work well. There would no longer be surges of activities taking place at short intervals. Examination processing,
student enrolments, demands for lecture space, demands for library resources and demands for computing resources could be better controlled and managed. Students would enter the system continuously and the number of students admitted would be matched to the resources available. Courses and modules could change continuously with a more rapid adjustment to problems of quality and relevance. Why do we continue to use the old batch processing model? Why does almost every tertiary institution in the world have groups of students moving as one through the system? The reasons are economic. It is too expensive to put on lectures on demand for an individual student. It is too expensive to set up an examination for a single student when we can service thousands with little more effort. The information systems in our institutions are not capable of handling the new organisation. The existing structures, both physical and organisational, are set up to handle batches and cannot cope with continuous education. It is the contention here that the flexible manufacturing model is now technically and economically possible for tertiary education. Moreover, it is contended that the imperatives of economics and the market for this structure are such that, given a free market, this will become the dominant structure in the future. I now examine indicators of this approach, consider objections to the approach, discuss possible organisational implications and finally outline the possible cognitive effects on students and staff.
CURRENT INDICATORS FOR FLEXIBLE EDUCATION
There is an increasing interest in the distance education model as a way of providing education for large numbers of people. The English Open University has 50,000 students; Australia has just started an equivalent university structure; the Open Learning Institute in Hong Kong has over 10,000 active students. China is said to have 2,000,000 distance education students.
As well as these formal courses, a major growth industry is the provision of training courses for professionals and in-house training in organisations. Such training courses are sometimes done on a just-in-time basis. Most degrees by thesis are conducted flexibly, with students entering the programs at different times. The distance education model is, however, only a step along the way to flexible education. It does not rely on the lecture/tutorial model, but most of these systems are still structurally organised with batches of students entering in year groups, and with examinations and assessment happening at fixed periods. Distance education is still fundamentally a batch system; the method of delivering instruction is the main innovation. Even here most delivery is still a translation of the old lecture format to paper or video. Technology is not yet used in the best way but there are signs that this is changing (Laurillard, 1994). It is difficult to generalise on costs and to compare different institutions, but distance education does seem to have a significant cost advantage. Muta and Saito (1994) show that the direct costs of the University of the Air of Japan are about half those of an equivalent conventional campus-based program and that the economics improve as the system becomes larger. The Open Learning Institute of Hong Kong is similarly less expensive than any of the Universities or Vocational Training Institutes in Hong Kong. Another trend is the franchising of education services (Yorke, 1993). Here education institutions franchise their teaching methods, their name and their practices
and sell education away from their campuses. This form of trade in education services seems to be a growing phenomenon. Society is increasingly demanding accountability of education. A major way that this is being realised is the demand for performance based assessment or criterion referenced assessment (Glaser, 1963). The demands in England and the USA for national testing of the educational attainments of school children are another striking example. This form of accountability is an attempt by society to know that education teaches students to do things which they could not do before, such as reading, writing and arithmetic. This demand for criterion based assessment is different from the norm based assessment which is the current practice in almost all universities and which is made easy by the batching of students. These trends (cheaper delivery of education via distance education techniques, the idea of franchising education services, the demands of society for accountability, criterion based assessment and the increasing debate between the roles of research and teaching) could lead to the development of Universities with a different structure made possible by computer and communications technologies.
THE NEW STRUCTURE FOR A COURSE OF STUDY
For expository purposes let us imagine a possible course structure based upon these ideas. In this structure students are enrolled continuously in courses and modules. A student can start a course or a module at any time. Students will study in a variety of ways but most information will be delivered in a "distance education" mode where progression is based on work previously done and assessment is continuous and criterion referenced. Students take a periodic controlled assessment when they are satisfied with their progress and when they have finished the required material in a module. A course structured this way is similar to current courses and so it is likely to be accepted by staff and students.
Students will find it attractive because of the flexibility. Employers will find it attractive because the assessment is criterion referenced. Administrators will like the regularity of the process and the evening out of peaks of demand on facilities. Staff would like the system because it gives them great flexibility in how they deliver instruction and there is the possibility for more individual contact with students.
THE NEW UNIVERSITY
One of the reasons why this model of a course is almost inevitable is that it gives a great competitive advantage to the well known prestigious universities. The new university structure will consist of distinct parts. There will be the undergraduate and industry instructional components and there will be the research and graduate research component. Already this structure is evident in many schools with distinct graduate schools. The graduate research schools will remain much the same. Their purpose is to gain prestige and status for the University and so assist the selling of the undergraduate instruction. The major structural difference will be the franchising of the instructional courses and the removal of the need for batching of students.
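The claim that removing the batching of students evens out peaks of demand on facilities can be illustrated with a small calculation. The sketch below is a toy model, not data from this paper: it assumes every student takes exactly `module_weeks` weeks and sits a controlled assessment in the final week, and the function names and intake figures are hypothetical.

```python
def exam_load(start_weeks, module_weeks, horizon_weeks):
    """Count the assessments sat in each week of the planning horizon."""
    load = [0] * horizon_weeks
    for start in start_weeks:
        exam_week = start + module_weeks - 1   # assessed in the final week
        if exam_week < horizon_weeks:
            load[exam_week] += 1
    return load


def batch_starts(intake_per_term, module_weeks, horizon_weeks):
    """Batch model: a whole cohort starts together at the top of each term."""
    return [term_start
            for term_start in range(0, horizon_weeks, module_weeks)
            for _ in range(intake_per_term)]


def continuous_starts(intake_per_term, module_weeks, horizon_weeks):
    """Continuous model: the same intake is spread evenly over every week."""
    per_week = intake_per_term // module_weeks
    return [week
            for week in range(horizon_weeks)
            for _ in range(per_week)]


if __name__ == "__main__":
    intake, weeks, horizon = 120, 12, 24   # illustrative numbers only
    batch = exam_load(batch_starts(intake, weeks, horizon), weeks, horizon)
    flow = exam_load(continuous_starts(intake, weeks, horizon), weeks, horizon)
    print("peak weekly assessments, batch:     ", max(batch))
    print("peak weekly assessments, continuous:", max(flow))
```

Under these assumptions the batch model concentrates a whole cohort's assessments in a single week, while continuous enrolment spreads the same intake across the term. Allowing students to progress at their own rate, as the proposed structure does, would spread the load further.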
Once the system of instruction is well organised and established there is no reason why it needs to be physically restricted to the local environment. In the same way that the most successful marketing phenomenon of the 80's and 90's has been franchising so the same is likely in education. A second tier university in England is more likely to thrive and prosper running and promoting MIT courses than it is in running and promoting its own courses. It can attract better students from a wider community and it can almost certainly provide the courses at lower cost. The franchising universities' role is to assure quality. This can be done through a rigorous controlled examination regime. The benefits to the franchising university are the large sums of money that will be generated to be used to support the prestige enhancing research and also to develop and construct better instructional modules. The models for marketing and delivery are already with us. It only requires one of the major universities to start the process and others will follow. If this happens then it will occur quickly because of the economic and marketing imperatives. There will be no need for new buildings, no need for massive investment in infrastructure; it only requires reorganisation. The education world of the early part of the next century will then become similar to other industries. It will be dominated by multi-national universities with good brand names.
OBJECTIONS TO THE APPROACH
There is a major political problem with this vision. Education is a sensitive cultural issue and this is one of the reasons why education is heavily subsidised by governments. There are strong cultural reasons against the Coca-Colonisation of universities but there are ways around the brand name and perception problems. Although the brand names may be modified, the underlying structure is still likely, with appropriate relabelling of courses and local variations added to satisfy the politics. Staff in institutions will almost certainly oppose the changes. Strong unions will attempt to preserve the status quo. Some countries will resist more than others but, because the system is more efficient in resources, staff can share in the benefits through increased salaries and benefits. Opposition attenuates when material benefits are given to opponents. The educational objections are similar to those leveled against distance education, particularly when the education is provided across country boundaries (Woodhouse and Croft, 1993). However, local franchising, with good electronic communications between staff and students, may counter many of these objections. If student to student contact is desirable then there is no reason why students doing the same course cannot be organised in small groups. Even though the model supposes a continuous flow of students through the system, there will still be several doing the same work at the same time and these students can be organised in groups. If it is desirable to have face to face meetings rather than electronic video conferencing then appropriate residential courses can be offered.
IMPLICATIONS FOR STUDENTS
One of the greatest benefits for students from this approach will be the relegation of norm-based assessment to relative insignificance (Winter, 1993). Individuals will be
judged on their performance rather than their performance relative to others. Judging performance relative to others can only occur in a batch situation, or when material remains static and courses are not based on mastery principles. There will still be the opportunity for excellence to be demonstrated, but it will no longer be necessary to use student comparisons and ranking as the basis for grades. Although this aspect has not been addressed explicitly in this paper, technology does offer the chance to get away from the teacher 'telling' and towards the student 'experiencing' model of education (Laurillard, 1993). The structural model envisaged here requires the development of modules of this nature, and the education systems will not work if the material simply transplants the classroom lecturing mode of delivery. The technology allows students a choice of materials and a choice in the way they receive instruction, and they will be able to choose an appropriate method for their style of learning. This form of learning is truly student based, with students learning by doing and by interacting with the learning environment presented via the computer. There are many educational advantages to this process. Students get much more immediate feedback; the simulated study environment can be made much closer to reality; students interact and react instead of passively attempting to absorb knowledge. It is difficult and expensive to construct good modules with these desirable characteristics, but as the market for modules expands it becomes worthwhile to finance modules. On the negative side there are cognitive disadvantages. As the simulations become closer to reality, the student is liable to believe that the simulation is the reality. There is little concern that a student will mistake a lecture for reality. In the real situation students will make the necessary transformations to put the lessons into practice.
If a simulation becomes too close to reality, then making the necessary transformations from simulation to reality may become more difficult. When there are large cognitive distances between situations, we can reasonably hypothesise that there are few problems in recognition, but as the cognitive distances shrink, recognising reality becomes more difficult.
An important part of the student experience is the interaction with other students. The current campus-based education systems mean that students are thrown together and serendipity rules. Groups and interactions arise naturally. The same phenomena can occur, perhaps more intensely, in periodic gatherings of students. Electronic meetings and connections will also occur and are an additional avenue for communication. The electronic infrastructure that affords the delivery of instruction also provides a communication channel to both instructors and other students, and it is inevitable that this will be used. Technology can add another channel of communication for people and free them from the tyrannies of distance and time.
On the negative side, people can perhaps have too much computer-based interaction. There is a finite amount of time available to each individual. If this time is increasingly occupied with some of the new forms of interaction, there will be less time for other forms such as conversation. If we spend all our time answering our e-mail, then we have less time available to talk to our friends and to discuss issues. If people spend increasing amounts of time interacting with the world through computer screens, then it must change the way they think. As the number of hours spent in front of screens increases, the screen becomes our perception of what is real and "reality" becomes
K. Cox
secondary and somehow less "real" because it is less common. Already students are spending ten or more hours per week in front of computer screens (Cox, 1994). When the changes in education suggested here occur, we can expect screen interaction, along with TV watching, to occupy the major part of each day, with unknown and possibly unpredictable side effects.
Work and education are likely to become intermixed. Education now occurs as an activity separate from work. This is primarily a structural problem. The new model allows work and education to coexist more easily, and there will be an increase in the proportion of students who study and work at the same time. The structural impediments that make this difficult and inconvenient are removed.
Today's tertiary educated have been socialised and formed through the shared experience of campus life. This life is different from both school and a normal working environment and has an important influence on one's world view. If this stage is removed from many people's lives, what effect will it have? Again the implications are unknown, except that we know they will be significant and will change the way people view the world.
IMPLICATIONS FOR STAFF
It is likely that university staff will become even more polarised into researchers and those who deliver education. The delivery of education will remain the main source of income for institutions, and demand will be related to prestige and brand names. Because of this, research will be encouraged to add to the prestige of the institution, as will direct links with industry. Links with industry not only give prestige and funds but also provide another source of students for education. Teaching by researchers will concentrate on the post-graduate level. This is likely to remain similar to the current system, with close interaction between research students and staff. Research schools can also gain funds by producing teaching material from their research to be included in the module offerings.
The development of educational material will become a specialised area requiring many of the skills now expected of movie and software producers. Modules will be developed in the same way that movies are now produced, with teams of specialists coming together to create a product.
With the virtual disappearance of most lectures and formal tutorials, the teachers who interact with students will require considerable social and communication skills in dealing with individual students. The performance qualities that characterise many good lecturers will be less valuable than social and helping skills. Because less time is spent on lecture preparation and delivery, more time is available for staff to interact with students. It is likely, though, that the prestige and subject knowledge of these staff will decrease, and staff directly involved in student interaction are likely to receive lower salaries. This will then permit more interaction.
Quality assurance and the testing of students will become a major activity. Large numbers of staff will be involved in arranging continuous examination and testing of students. This will also become a specialised area.
A worry of staff is that as the delivery of education becomes more effective, and as technology replaces people for some activities, the need and demand for teachers will diminish and teaching jobs will disappear. Fortunately, economic history suggests the opposite (Economist, 1995). As the effectiveness of teaching improves so the demand
for the product will increase. More people, more tasks and more training and education are the likely result. Particular jobs may change, but the net effect is that the total number of jobs increases in the sector in which efficiencies occur. While the number of telephone operators has dropped, the total number of people in the telecommunications industry has increased dramatically. Predictions in the USA (Bureau of Labor Statistics) say that the number of teachers will increase over the next few years. As delivery of information becomes more efficient, the gains in efficiency are likely to be balanced by more researchers, more developers of instruction and more people to interact with students. The lessons of history also suggest that those who try to stop the inevitable are the ones who suffer the most. Governments that try to regulate industries to save jobs tend to lose jobs and not create new ones.
IMPLICATIONS FOR ORGANISATIONS
Universities will become profit-making organisations and be listed on the stock markets of the world. The successful ones will be those that deliver quality education. Because brand identification is crucial to success, universities will ensure that quality of education is maintained. Research for research's sake will flourish, because it is in the interests of universities to maintain their image so that they can sell their education services.
Boutique universities will still exist, and centres of particular excellence will continue to thrive. Individuals and groups of talented people will still be able to produce and market excellent courses, but there will be a tendency for large organisations to dominate the market. The lesson from the movie industry is that most of the products come from the large organisations, but small groups tend to innovate and create new and exciting products.
In fact, the incentive for a small group to develop and create good courses will increase, because the rewards for success will become greater. The software industry is another example where individuals and small companies become stars and important players.
There is a danger of drab uniformity in delivery of instruction. The solution to this problem is to remove artificial barriers to competition and innovation, such as government regulation. Countries that fear cultural imperialism can counter it by fostering and supporting their own industries through appropriate incentives and by making sure that other countries do not impose restrictive barriers to the introduction of their own products. Giving people the freedom to choose and allowing variety to flourish is the best guarantee of innovation.
CONCLUSION
Computing and communication technologies allow instruction to be delivered to students in different ways. This can be made to give a high quality of student experience and to facilitate learning. That this is possible does not mean that it is inevitable. However, the technology does fit with other social and economic trends. This combination of economic imperatives with the ability to organise along the lines of other successful industrial entities means that the institutions of tertiary education are likely to evolve in these directions. The twenty-first century will see little difference between the internal structure of a large vehicle production company and that of a large university. Courses of instruction will be developed
and marketed throughout the world. Tertiary education will employ even more people as the value of its products becomes apparent. More education will be available to more people, and the demand for the product will continue to increase. There will be an increase in machine-mediated interaction in education, with unknown and unpredictable effects.
REFERENCES
Darby, Jonathon, 1994. A vision of higher education in the Year 2000. Proceedings Apitite 94: 15-18.
Laurillard, Diana, 1994. Multimedia and the changing experience of the learner. Proceedings Apitite 94: 19-24.
Cox, Kevin, 1994. Computers in tertiary education. Proceedings Apitite 94: 939-944.
Laurillard, Diana, 1993. Rethinking university teaching: A framework for the effective use of educational technology. London: Routledge.
Perelman, Lewis J., 1993. School's out: Hyperlearning, the new technology, and the end of education. New York: Avon.
Woodhouse, David and Alma Craft, 1993. Exporting distance education: An importer's view. Higher Education Management 5(3): 333-337.
Muta, Hiromitsu and Takahiro Saito, 1994. Comprehensive cost analysis of the University of the Air of Japan. Higher Education 28: 325-353.
Hernes, Gudmund, 1993. Images of institutions of higher education. Higher Education Management 5(3): 265-271.
Yorke, Mantz, 1993. Quality assurance for higher education franchising. Higher Education 26: 167-182.
Over, Ray, 1993. Correlates of career advancement in Australian universities. Higher Education 26: 313-329.
Glaser, Robert, 1963. Instructional technology and the measurement of learning outcomes: Some questions. American Psychologist 18: 519-521.
Economist, 1995. Technology and unemployment. February 11th 1995: 19-21.
Winter, Richard, 1993. Education or grading? Arguments for a non-subdivided honours degree. Studies in Higher Education 18(3): 363-377.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Department of Computer Science City University of Hong Kong [email protected]
ABSTRACT
The use of TDDs (Telecommunication Devices for the Deaf) is common in Western societies. This has not been the case in Asian countries using the Han script (Chinese characters), due mainly to the difficulty of mapping large character sets onto a reasonably sized device. Among the hearing impaired, the lack of suitable devices and methods also results in a lack of access to valuable services, such as emergency phone calls, which people with normal hearing take for granted. In this paper, we introduce our work on a Chinese-language TDD system. Among various possible input techniques, we selected one based on the stroke-sequence method, because it is simple to learn and converges quickly on the intended character. Since mapping sequences of strokes to characters can be a complicated procedure, a microprocessor-based auxiliary device is being developed. A test is also being set up so that the communication behavior of the deaf can be evaluated and fed back into the design of the system. It is assumed that the results, when more generally applied, will significantly benefit the deaf communities in Chinese-speaking countries.
INTRODUCTION
In every society, about one person in a thousand can be expected to have a hearing loss severe enough to be classified as legally deaf. Due to the nature of their handicap, the hearing impaired are isolated from the community at large: they are able to move 'normally' among the hearing community, but as soon as verbal communication is required, their handicap is noticed. The hearing impaired in the United States and Europe have had, for a number of years, a TDD (Telecommunication Device for the Deaf) based on the Latin alphabet. In contrast, the hearing impaired of Hong Kong (being mostly Chinese) are at present an isolated subculture that cannot interface with the community at large.
While the use of the TDD in many developed countries of the West has permitted the hearing impaired community to take advantage of services that the hearing community have
O.L. Clubb and C.H. Lee
taken for granted, the Chinese speaking community (both in Hong Kong and in the PRC, as well as in other places) does not have the tools and infrastructure that would allow the hearing impaired access to most telecommunication services, such as airline reservations and emergency numbers, or a means to communicate with friends and family.
The hearing impaired of the United States and Europe have been able to make use of old teletype technology for communicating over standard telephone lines. The device has been streamlined over the years but still uses the Baudot code of the teletypes. Since the introduction of this technology, a large service infrastructure has been established for the hearing impaired in the West. It was quite easy for the Western hearing impaired to pick up the typing skills needed to use the teletype devices in a two-way interactive mode.
In Hong Kong and some other Asian countries, the hearing impaired have not had the advantage of a TDD. Consequently, in large parts of Asia, the hearing impaired are isolated into a subculture with their sign language. This situation could be due to many reasons, among them cultural and legal ones. Equal rights legislation is now starting to be passed in places like Hong Kong; further legislation will have the objective of providing equal opportunity for the hearing impaired. As a result, community organizations will have to investigate how such infrastructure services can be provided. An example is the emergency number service "999", which was not available to the hearing impaired in Hong Kong until around 1994, and at present is available only through a fax facility. This means that the hearing impaired still do not have an interactive connection to the emergency service. A character-based TDD would be required in order to enable the authorities to set up an interactive telecommunications service for the Chinese hearing impaired communities of Hong Kong and perhaps the PRC as well.
A project is under way at City University of Hong Kong to develop a Chinese character-based TDD. The idea of the City University's TDD project originated during IT Week 1991, part of which dealt with 'IT [Information Technology] for the Deaf'. Subsequently, the TDD project caught the attention of Hong Kong Telecom, who donated HK$200,000 to develop a prototype to run on an IBM PC. A PC-based prototype has been developed and demonstrated. We will soon start trials by placing seven to ten prototypes in the homes of Hong Kong hearing impaired families.
SIGN LANGUAGE IN THE CHINESE COMMUNITIES
The sign language used in the hearing impaired communities of Hong Kong and other (including Western) territories and countries does not map directly onto the language as spoken or written by the hearing communities. Sign languages are like spoken languages in that they evolve into many varieties; consequently, there is no international sign language.
Since sign languages do not map directly onto the spoken and written language, the writing systems of the hearing community (including those using Chinese characters) are second languages for the hearing impaired. This makes literacy more difficult for the hearing impaired to attain. The sign language users of Hong Kong have borrowed some Chinese characters from the written language into their sign language. They 'sign' these borrowed characters by drawing them in the air. However, this technique has limited application,
Chinese TDD
and there still is no direct mapping from such 'signed' characters onto the grammatical structure (Fok and Bellugi, 1986:224).
THE CHINESE LANGUAGE
The Chinese language has many dialects. However, in the PRC and Taiwan there is one official spoken language, 'Putonghua' (called 'Mandarin' in Taiwan, and also in Singapore, where it is one of the four official languages). In Hong Kong, the majority of the people speak a Chinese dialect called Cantonese. Written Chinese was standardized in the Qin Dynasty, about 210 BC, but can be traced back some 3,000 years (Guo and Zhang, 1985:26). The Chinese writing system was based on the language of the major ethnic group, the Han. Recently, this writing system has branched into two major varieties: the 'conventional Chinese character set', used by Taiwan and Hong Kong (and promulgated as the official standard in Singapore), and the 'simplified Chinese character set', used by the PRC.
Approximately 1,500 years ago, some neighboring peoples, such as the Japanese and Koreans, started borrowing the Chinese characters for writing their own languages. The Chinese writing system is ideographic, using a combination of pictograms, ideograms, and phonograms (Tang and Clubb, 1992:1-7); it uses one character per syllable. This works well with Chinese, since the morpheme is normally at the syllable level. In Japanese, however, it often takes several syllables to express the full meaning of a word, and for this reason the Chinese writing system does not map well onto the language. Since the Chinese system did not match the structure of their languages, both the Koreans and the Japanese devised supplementary syllabic alphabets, called 'hangul' in Korea and 'kana' in Japan. In both countries, however, the Chinese characters were retained, with occasional simplifications (which do not normally match the modern, simplified characters introduced since 1955 in the PRC).
While the Koreans have given up most of their Chinese characters (especially in the North), the Japanese 'kanji' are still widely used, and form the basic requirement for written communication; in 1954, a subset of 2,000 so-called 'tooyoo kanji' (literally: 'kanji in normal use') was isolated as the minimum set to be mastered by everybody upon leaving High School. For any normal purpose, such as reading newspapers and novels, however, knowledge of a minimum of 5,000 kanji is required (O'Neill, 1982:15-16).
Around 100 AD, Xu Shen wrote a famous dictionary known as the 'Shuowen Jiezi'. In his dictionary, Xu Shen stated that there were six writing principles in forming Chinese characters:
1. Pictographic characters (pictures: sun, eye, mountain, etc.);
2. Symbolic characters (an abstract idea: one, above, below, etc.);
3. Ideographic characters (a character formed by combining several symbols: e.g. 'woman' and 'child' together give the meaning 'goodness', a 'woman' under a 'roof' means 'peace', etc.);
4. Ideophonetic characters (one portion of a word is a symbolic character or 'radical', giving it a general meaning; its phonetic counterpart indicates the pronunciation);
5. Transfigurative or extended characters (the character meaning 'music' is used to denote 'pleasure');
6. Borrowed usage (a few characters with no connected meaning, such as the numerals above the number 3).
The majority of Chinese characters (about 95%) are formed on the basis of the first four principles.
One of the major drawbacks of the Chinese writing system is the number of characters. For example, the Chinese 'Great Dictionary' of 1915 AD listed 49,905 characters. However, it is claimed that if one knows 100 characters, one can recognize 40% of the content of a general article (but it is not said which 40%). To recognize 90% of the content, it takes at least 1,000 characters (but 'recognize' is not the same as 'understand') (Chen and Jin, 1984:13-14).
CHINESE COMPUTING INPUT SYSTEMS
The authors have developed two different types of TDD prototypes based on a personal computer. The first type uses a modem for data transmission and the E-Ten input method. The second type uses dial tones for data transmission and the Jie-jing input method. (Both of the above are keyboard input methods.) The reason the project has concentrated on the keyboard as input device is that we are in a "keyboard culture" of computer input methods; since many of the hearing impaired cannot speak, even future technologies such as voice recognition would not be useful.
The two input methodologies were chosen because a keyboard containing all the Chinese characters would be too bulky. Even so, we have a few options. There is the coding method based on a Pinyin transcription, using an alphabetic keyboard. However, this works only if the person using the system knows the Putonghua phonetic value of the character.
Another approach uses the stroke order method. Chinese characters are written with strokes, following a prescribed order. Stroke input methods use between eight and ten primary strokes to generate characters.
This method is among those that the authors believe are worthy of further investigation in connection with developing a Chinese TDD system.
For the City University project, we adopted a stroke order input system, the Jie-jing method, which was developed in Australia. The system uses statistical concepts to speed up input. When the first stroke of a character is entered, the five most frequently used characters that start with that stroke are displayed. The user may then select one of the displayed characters; alternatively, if the desired character is not offered, the next stroke of the character is entered, and so on, until the desired character is displayed. When the character has been fully determined and entered, the first stroke of the next character is entered. Using linguistic techniques, the system tries to present the next possible character based on the meaning of the previously selected character.
Even if the user knows only a few of the strokes making up a character, under the Jie-jing system he or she has a reasonable chance of generating the desired character. This situation is ideal for the hearing impaired, since most of them are not as skilled at writing as the population at large. It is felt that the aid provided by the Jie-jing system makes it an attractive option.
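The candidate-narrowing behaviour described above can be illustrated with a minimal Python sketch. It is not the actual implementation of the input method: the stroke codes, the tiny lexicon and its frequency values are assumptions invented for illustration, and the names `LEXICON`, `candidates` and `strokes_needed` are hypothetical. The sketch covers only the frequency-ranked lookup by stroke prefix, not the meaning-based prediction of the next character.

```python
# Hypothetical one-letter stroke codes (not taken from the paper):
# h = horizontal, s = vertical, p = left-falling, n = right-falling.
# Each entry maps a character to (stroke sequence, relative frequency).
# The frequencies are made-up illustrative values.
LEXICON = {
    "一": ("h", 900),
    "十": ("hs", 400),
    "大": ("hpn", 700),
    "天": ("hhpn", 500),
    "木": ("hspn", 300),
    "人": ("pn", 800),
    "八": ("pn", 200),
}


def candidates(prefix, lexicon=LEXICON, limit=5):
    """Return up to `limit` characters whose stroke sequence starts with
    `prefix`, most frequent first."""
    matches = [
        (char, freq)
        for char, (strokes, freq) in lexicon.items()
        if strokes.startswith(prefix)
    ]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [char for char, _ in matches[:limit]]


def strokes_needed(target, lexicon=LEXICON, limit=5):
    """Count how many strokes must be entered before `target` appears
    among the displayed candidates; None if it never appears."""
    strokes, _ = lexicon[target]
    for count in range(1, len(strokes) + 1):
        if target in candidates(strokes[:count], lexicon, limit):
            return count
    return None


if __name__ == "__main__":
    print(candidates("h"))
    print(strokes_needed("木", limit=2))
```

In this toy lexicon, with only two candidates displayed at a time, 木 is offered after two of its four strokes, which is the kind of saving the frequency ordering is meant to provide.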
One of our major objectives is to keep the system simple to use. The reasoning is that using the device should require a minimum amount of training. To improve ease of use, the primary stroke symbols are directly represented on the key pad.
Another objective is to use as many standard components as possible. We believe that the necessary technology exists, and that the only problem is how to package it. The device could use the 12 keys on a touch-tone telephone for inputting the message; the use of tones would eliminate the need for a modem, thus making the device simpler as well as less expensive. Also, since the device needs only to receive and transmit characters from a similar device, and would not have to interact with any other machine, it could be kept relatively simple. The authors want to keep the cost of the final production model under US$250, in order to keep the device affordable for the target market.
As to the hardware components needed to build a Chinese TDD, the present prototype uses a processor (80286 or similar) to run the Jie-jing Chinese inputting program. In addition, about 1 megabyte of ROM would be required to store tables, bit maps for characters, and programs. The device would need about 64K of RAM for local variables used by the programs. A tone generator and a tone decoder are used for dealing with the touch-tone telephone system. Finally, an LCD screen would be needed that could display at least 25 characters in a message area, in addition to one line with sufficient space to display 20 characters in the character generation area.
TESTING STRATEGY
Since the Chinese character-based writing system does not map directly onto the deaf's signing system, some testing has to be done in order to find out what specific problems the hearing impaired encounter using the TDD device.
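As an aside on the touch-tone signalling proposed in the hardware description above, the following minimal Python sketch shows how stroke keys could be turned into the tone pairs that a tone generator would emit. The row and column frequencies are the standard DTMF values for a 12-key telephone pad; the assignment of strokes to keys (`STROKE_KEYS`) and the function names are assumptions for illustration only, since the text does not give the actual key layout or transmission protocol.

```python
# Standard DTMF (touch-tone) frequencies in Hz for a 12-key pad:
# each key sounds one row tone and one column tone together.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477)
KEYPAD = ("123", "456", "789", "*0#")

# Hypothetical mapping of primary strokes to keys (an assumption,
# not the layout used by the prototype).
STROKE_KEYS = {"h": "1", "s": "2", "p": "3", "n": "4", "z": "5"}


def dtmf_pair(key):
    """Return the (row, column) tone frequencies for one keypad key."""
    if len(key) != 1:
        raise ValueError("expected a single key, got %r" % (key,))
    for row_index, row in enumerate(KEYPAD):
        col_index = row.find(key)
        if col_index != -1:
            return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError("not a key on the 12-key pad: %r" % (key,))


def encode_strokes(strokes):
    """Translate a stroke sequence into the tone pairs to transmit."""
    return [dtmf_pair(STROKE_KEYS[stroke]) for stroke in strokes]


if __name__ == "__main__":
    print(encode_strokes("hs"))
```

Because every key is identified by a pair of audible tones, the receiving device only needs a tone decoder to recover the key sequence, which is why no modem is required in this design.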
The stroke-based input method was designed with the objective of a minimal training requirement, and indeed, tests have established (although not with the deaf) that for some office workers with ordinary computing knowledge, inputting simple characters required no training at all.
The testing is divided into two phases. In the first phase, the device is placed in four or five homes of hearing impaired adults who regularly communicate with each other. This phase will look at the software interface and general problems with the system, and will continue for about two to three months. The adults will evaluate the device and give feedback on improvements needed.
The second phase is to have the device placed in the homes of hearing impaired secondary school students so that they can use the TDD on a casual basis. For testing purposes, there needs to be a control group and a test group. The two groups should be randomly picked from a pool of students who are at approximately the same level of ability in using written Chinese characters. The devices will be left in the homes of the test subjects for one school term. At the end of the term, the students' grades in Chinese will be evaluated to see if there has been any improvement. The test group's grades as a whole will be compared to the control group's grades.
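The second-phase design above (random assignment to two groups, then a comparison of grades at the end of term) can be sketched in a few lines of Python. This is an illustration only: the student identifiers, the grade dictionaries and the function names `assign_groups` and `mean_gain` are hypothetical, and a real study would also need a significance test and a sample large enough to support one.

```python
import random
from statistics import mean


def assign_groups(student_ids, seed=0):
    """Randomly split a pool of students into a test group and a
    control group of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_gain(before, after, group):
    """Average change in grade for the students in `group`."""
    return mean(after[student] - before[student] for student in group)


if __name__ == "__main__":
    # Hypothetical pool and grades, invented for this example.
    pool = ["s1", "s2", "s3", "s4", "s5", "s6"]
    before = {"s1": 60, "s2": 62, "s3": 58, "s4": 61, "s5": 59, "s6": 63}
    after = {"s1": 66, "s2": 63, "s3": 64, "s4": 62, "s5": 65, "s6": 64}

    test_group, control_group = assign_groups(pool)
    print("test group gain:", mean_gain(before, after, test_group))
    print("control group gain:", mean_gain(before, after, control_group))
```

Comparing gains rather than raw end-of-term grades is a design choice of this sketch: it reduces the effect of any difference in starting ability that survives the random split.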
The test subjects' conversations will be saved for analysis. (For ethical reasons, the users will have to be informed that all conversations may be analyzed, and their permission should be requested.) The analysis will first look for common mistakes in character construction. Second, grammatical errors will be identified. Since there is no direct mapping of characters to signing, patterns of grammatical mistakes may be expected to occur. If such patterns can be established, then teachers of Chinese, by looking for these patterns, can give remedial help to hearing impaired students.
The authors feel that if the hearing impaired users have to communicate using the TDD, they will become more literate as a side effect. The more the test subjects are motivated to communicate with others by telecommunication, the more they will be motivated to use the TDD. The more they use the TDD, the more they will practice using the written language; and the more practice they have with the written language, the better they will become at using it.
SERVICES PROVIDED BY A CHINESE CHARACTER TDD
At present, the hearing impaired people of the Chinese speaking communities, when making or answering telephone calls, must rely on other people to interpret for them by using sign language. For a hearing impaired person, using the telephone involves having a person with "normal hearing" present to act as an interpreter. This results in a loss of privacy for the hearing impaired person; in addition, such a service may be difficult to arrange, particularly if two hearing impaired people wish to communicate via a telephone line. Our Chinese character TDD will directly address this problem. Also, many other services that people in the hearing community take for granted, such as the use of emergency numbers for police and fire services, could become available to the hearing impaired. Other, less critical services, such as making an airline reservation, could now also be within reach of the Chinese hearing impaired community.
POSSIBLE ADDITIONAL FEATURES
Some further thoughts about the final product follow. Both real-time interactive exchange and "e-mail" style messaging are to be supported. For a TDD, the real-time communication functions are the primary goal. However, given that supporting Chinese input already requires fairly extensive computing resources, we can make the messaging facility available at small incremental cost. Furthermore, other telephone features, such as conference 'calls', are being investigated.
CONCLUSION
The components needed to build a Chinese TDD are there; they only need to be integrated. The hearing impaired of the Chinese character-based writing communities deserve to have access to the same services that are available to the hearing communities in the West and elsewhere. By providing the Chinese hearing impaired with a means of access to communication via telephone lines, a Chinese character TDD would be of significant human service.
And not only to the deaf: In September of 1986, of Hong Kong's 5.5 million people, 8,000 were registered with the Central Registry of the Disabled as being deaf (Kwan, 1986:470). For every
deaf person, there is at least one person with normal hearing who may need a TDD device to communicate with that deaf person. Furthermore, people who have suffered a severe hearing loss may also find the TDD an easier way to communicate.
Technology is not an issue with our project. The technology is available in the market-place; the problems are those of economy and packaging. In our project, we found that the main challenges were in the areas of input methods, interactive operation design, and gathering information by TDD. These findings will help us to design further features which are of service to the deaf communities.
REFERENCES
Chen, Zhengwu, and Jin Lianfu, eds., 1984. Chinese Information Processing Systems. Beijing: Chinese Computer User Alliance et al. (In Chinese).
Fok, Y.Y., and Ursula Bellugi, 1986. Towards Better Communications, Cooperation and Coordination. In: Proceedings of the 1st Asian-Pacific Regional Conference on Deafness. Hong Kong.
Guo, Pinxin, and Zhang Songzhi, eds., 1985. Chinese Information Processing Techniques. Beijing: Nation Defense Industry Publishers. (In Chinese).
Kwan, E. W., 1986. Toward Better Communications, Cooperation and Coordination. In: Proceedings of the 1st Asian-Pacific Regional Conference on Deafness. Hong Kong.
O'Neill, P.G., 1982. Essential Kanji. 2,000 Basic Japanese Characters Systematically Arranged for Learning and Reference. New York & Tokyo: Weatherhill.
Tang, M.W., and O.L. Clubb, 1992. Chinese Computing, History and Trends. Hong Kong: Tamarind Books.
Chapter 15
Department of Philosophy Hong Kong University laurence@hkucc.hku.hk
[Comic strip: Calvin and Hobbes, by Bill Watterson]
(c) Universal Press Syndicate
The cognitively impaired (let this cover sensory, emotional and other kinds of mental impairment too) ought to be one of the primary objects of research in Cognitive Technology. If one wishes to learn about the relation of the human brain to the mental characteristics of a person, or to the environment with which a person interacts, one method would be to remove parts of a subject's brain and observe which functions were obliterated or impaired as a result. There are, however, obvious ethical limits to this kind of investigation. Another method is to observe subjects who already have
L. Goldstein
some deficit and to compare these with non-impaired subjects [1]. For example, in one famous case dating from 1848, an accident at work caused a large metal stake to fly through the head of Phineas Gage, a young roadworker, yet, despite heavy loss of brain matter, this individual appeared to suffer no loss of cognitive abilities and suffered only a personality change: he became more moody. Encephalics, with only 10% of normal brain volume, and that squeezed around the periphery of the skull, often are normal or close to normal in all aspects of cognition. Stroke victims frequently suffer specific language deficits, such as anomia: all aspects of normal speech are retained, apart from the ability to remember names. The study of aphasics leads to a better understanding of the structure of language, and that, in turn, to new techniques for second-language acquisition. Any such process falls under the description 'cognitive technology' if it involves the construction of devices which play an essential role in manipulating subjects' cognitive economy (their mental make-up), either in the course of research experimentation or as the outcome of such research.
Now, in a process of this sort, the ideal order of events would be as follows: gain an understanding of the nature of a cognitive impairment, then build devices to help remedy the deficit by, as it were, stretching one part of the mind/brain to compensate for what is missing in another; next, investigate the possibility of applying the mind-stretching technique to the non-impaired. By a process of mutual adjustment, a more symbiotic relationship between mind and the environment would be achieved: the artefactual environment could be made more accessible to minds, and minds could be stretched to engage more profitably, either in terms of efficiency or ecstasy, with the surrounding world [2]. Unfortunately, life is not that simple.
First, it may be no easy task to specify, except in the broadest terms, the nature of the particular cognitive impairment - it is one thing to describe brain damage, quite another to describe mental deficits. Second, in most cases of such impairment only the most tentative, provisional hypotheses can be put forward. So, third, at this stage, one is not generally in the position of being able to build a device to remedy the deficit; rather, the function of building the device is primarily to test the hypothesis. In the light of test results, new hypotheses get formulated, and new devices have to be built in order to test them. This last phase may then cycle several times. The present paper describes a device, in fact, two devices, which were designed to help blind students learn a simple fragment of elementary logic: the theory of syllogisms which originated over two thousand years ago with the work of Aristotle. The devices have been built and used, but have not so far been extensively tested. It will be clear why the project is still at this early stage: in Hong Kong it is difficult to find a reasonably large cohort of blind people who need to learn syllogistic logic! The next stage of the operation would require building multiple copies of the equipment
1 Thus, Oliver Sacks (Sacks, 1995) writes: 'Total colour blindness caused by brain damage, so-called cerebral achromatopsia ... has intrigued neurologists because, like all neural dissolutions and destructions, it can reveal to us the mechanisms of neural construction--specifically, here, how the brain "sees" (or makes) colour.'
2 I am not taking a stand on the question of whether the mind is an entity separate from the brain or whether the two are identical or, indeed, on any other option in the theory of mind. The use of both terms is merely in recognition of the fact that there is at present no intertranslation between psychological and neurophysiological discourse.
Teaching Syllogistic to the Blind
and performing tests in, for example, a large number of universities and community colleges in North America. But for purposes of the present paper, the outcome of this large-scale experiment is not so important. For the project, even at its present state, throws up a host of very deep questions, many not local to itself but integral to cognitive technology research in general.

It will be necessary to give a sketch of what we are trying to teach by means of these devices. The theory of the syllogism is simple, but beautiful. There is a useful comparison that can be made between it and Euclidean geometry. Recall that there are some geometrical truths that seem so obvious as hardly to require proof; for example, that vertically opposite angles are equal. The proof of this theorem is indeed short and dead simple. But other theorems of geometry, such as Menelaus' Theorem, are extraordinarily surprising. Yet, by a series of simple steps, Menelaus' Theorem can be derived from the elementary Euclidean postulates. Now, Aristotle had the idea, which we now take for granted but which, when you think about it, is utterly sensational, namely that the discussions that people have, specifically, the arguments that they put forward, can be taxonomized and studied in a scientific, mathematically precise way. There are simple arguments that we can see, straight away, to be valid, such as

All ducks are German citizens
All German citizens are prime numbers
Therefore
All ducks are prime numbers.
But there are other arguments which, though not, on the surface, much more complicated, are not at all easy to assess for validity. Here's an example:

No cows are French citizens
Some French citizens are moons of Jupiter
Therefore
Some moons of Jupiter are not cows.
Arguments like these which have just two premises, three noun phrases each occurring twice, and where each sentence has to be of one of only four possible types are known as syllogisms. Syllogisms comprise only an insignificantly tiny fraction of the arguments that occur in ordinary conversations. Psychologists have run experiments to find the average time it takes normal subjects to produce a verdict ('valid' or 'invalid') on the different possible kinds of syllogism, and, as you can imagine, the time needed for the type of argument exemplified by our second example is far greater than that needed for those exemplified by our first.
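The definition just given (two premises, three noun phrases each occurring twice, four permitted sentence types) can be restated as a small well-formedness check. What follows is only an illustrative sketch in Python: the names `Sentence`, `KINDS` and `is_syllogism`, and the labels for the four sentence types, are hypothetical and come from no program mentioned in this paper. The check concerns the shape of an argument only, not its validity.

```python
from collections import Counter
from typing import NamedTuple

# The four permitted sentence types (the labels are illustrative):
#   "all"      -> All S are P
#   "no"       -> No S are P
#   "some"     -> Some S are P
#   "some_not" -> Some S are not P
KINDS = {"all", "no", "some", "some_not"}


class Sentence(NamedTuple):
    kind: str
    subject: str
    predicate: str


def is_syllogism(premise1: Sentence, premise2: Sentence, conclusion: Sentence) -> bool:
    """Check the shape of an argument, not its validity."""
    sentences = (premise1, premise2, conclusion)
    # Each sentence must be of a permitted type and relate two different noun phrases.
    if any(s.kind not in KINDS or s.subject == s.predicate for s in sentences):
        return False
    counts = Counter(term for s in sentences for term in (s.subject, s.predicate))
    # Exactly three noun phrases, each occurring exactly twice.
    return len(counts) == 3 and all(n == 2 for n in counts.values())


ducks = is_syllogism(
    Sentence("all", "ducks", "German citizens"),
    Sentence("all", "German citizens", "prime numbers"),
    Sentence("all", "ducks", "prime numbers"),
)
cows = is_syllogism(
    Sentence("no", "cows", "French citizens"),
    Sentence("some", "French citizens", "moons of Jupiter"),
    Sentence("some_not", "moons of Jupiter", "cows"),
)
print(ducks, cows)
```

Both sample arguments pass the check, whereas an argument that mentions four different noun phrases would be rejected.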
The form (or structure) of the first argument is

All X are Y
All Y are Z
Therefore
All X are Z
And the form of the second argument is

No X are Y
Some Y are Z
Therefore
Some Z are not X
Aristotle showed that, on the basis of our knowledge of the validity of simple argument-forms, such as the first, it is possible, by a series of simple steps, to derive a correct assessment of the validity or invalidity of difficult argument-forms such as the second 3, just as, in Euclid, we build upon the simple theorems in order to prove the complicated ones. This is a powerful technique. With the complicated syllogistic arguments, people differ quite a bit in their assessments of validity. But with Aristotle's work, we have a rigorous procedure for determining with mathematical objectivity which arguments are valid and which are not. It is clear that someone who can prove Menelaus' Theorem has a richer understanding of the theorem than does someone who just understands a statement of the result and accepts it as true. Good teaching imparts deep understanding.

Now, it is known that, of the 256 possible types of syllogistic form, only 19 argument forms are valid. So, if we wished to teach someone to recognize valid syllogisms (so that, in practical life, he could detect and reject the invalid syllogistic arguments he encountered) all we should have to do is to make him learn the 19 valid forms. However, this would not be to furnish that person with a rich understanding of syllogistic.

In the 19th century, the English mathematician John Venn invented a technique for testing the validity of syllogisms which is both extremely simple to employ and which imparts (or, at least, seems to impart) rich understanding. His is a graphical technique. The basic idea is that the sentences occurring in syllogistic arguments be paraphrased as sentences expressing relations between classes of things, and then those relations can be depicted graphically, using circles to represent classes. For example, consider the sentence

No positrons have negative charge.

This can be paraphrased as

The intersection of the class of positrons and the class of things that have negative charge has no members - it's empty.

Now represent the intersection of classes as the intersection of circles, and indicate emptiness by shading the relevant section of the diagram. The result is this
3 See (Lear, 1980).
[Figure 1: two intersecting circles; the right hand circle is labelled 'things that have negative charge']

The left hand circle represents the class of positrons; the right hand circle the class of things that have negative charge, but this diagram can be used to represent any sentence which has the form

No X are Y

Consider next a sentence of the form

Some X are not Y
The paraphrase for this is The part o f the class o f X which contains no members o f Y contains some m e m b e r s - it's non-empty
This is represented graphically as follows
Figure 2
Note that non-emptiness (i.e., the presence of some members) is denoted by a little bar. It should be obvious how to give a graphical representation of the sentence 'Some X are Y'. We are now in a position to draw a Venn diagram for the second of our sample arguments, the one which had the form

No X are Y
Some Y are Z
Therefore
Some Z are not X

Draw 3 intersecting circles and represent the two premises of the argument, as follows
[Figure 3]

Now, how would the conclusion be represented on this diagram? Obviously, by drawing a bar lying within the 'Z' circle but outside the 'X' circle. But look at the diagram: in representing the premises, we've already done just that: there is already a bar lying within the 'Z' circle but outside the 'X' circle; the conclusion is already contained in the premises and that's the criterion for a deductive argument being valid.

What about invalid arguments? First, I'll show how to represent graphically a sentence of the form

All X are Y

The paraphrase for this is

The class of things which are X but not Y is empty
in other words

There are no things that are X but not Y
So the representation looks like this
[Figure 4]

You can now represent the first of our sample arguments and demonstrate graphically that it is indeed valid. But consider the argument

All sulphates are alkalines
No sulphates combine with ammonia
Therefore
No alkalines combine with ammonia
To depict the conclusion of this argument, one would need to shade the whole lens formed by the intersection of the Y (alkalines) circle with the Z (things which combine with ammonia) circle. No such picture is obtained when the premises are represented in the diagram (figure 5); therefore the argument is invalid.

In my own Department, we use a microcomputer program called JOHN to teach syllogistic logic, and JOHN, as its name implies, employs the Venn-diagrammatic technique 4. Of course, this program is of no use to blind students. So how does one teach blind students to test the validity of syllogisms? Well, one method would be as follows. The premises of any syllogism can be written as expressions in Boolean algebra. So, for example, a sentence of the form: No X are Y
4 This is part of a software package (Goldstein and Moore, 1991). A software package called Hyperproof, developed by Jon Barwise and John Etchemendy, teaches a much broader segment of first-order logic by graphical methods. Information on this is available on the World Wide Web. The URL is
[Figure 5: one circle is labelled 'things that combine with ammonia']

is written as the equation

XY = 0

Since there is an effective (i.e., a purely mechanical) method for determining the validity of syllogisms, it would be very easy to write a program such that, when the blind student typed in the Boolean expressions for the premises and conclusion of a syllogism, the computer would always return, in audible form, the correct verdict on whether that syllogism was valid or not. Hence the blind student would be able, perfectly easily, to test syllogistic arguments for validity. The problem is, of course, that by using such a method, the blind student gains no rich understanding of syllogisms; he just (blindly!) gets the answer right every time.

The challenge, then, is to design a device for blind students that will enable them to get as deep an understanding of syllogisms as sighted students using the Venn technique can obtain. This may not be a simple task for, as Calvin reminds us (see cartoon at the opening of this paper), the different modalities are not comparable in terms of either sensitivity or dimensionality.

Both Tim Moore and I faced this challenge, but our solutions were very different. He made a 'tactile' version of Venn's diagrams which has intersecting hexagons instead of intersecting circles. The student locates his position on the device by feeling the different serrations on the raised rims of each of the three intersecting hexagons, and represents premises by means of plastic fillers (the equivalent of shading on the Venn diagram) and a metal piece which plays the rôle of Venn's bar. The idea behind Moore's device Venntouch is to create equivalents in tactile 'space' to the visible relations embodied in Venn's diagrams.
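To make concrete the kind of purely mechanical validity test mentioned above, here is a minimal sketch in Python. It is not the program JOHN, nor any program described in this paper; the function names, the sentence labels and the tuple format are assumptions made for illustration, and the verdict is simply returned rather than spoken. The sketch tries every way of declaring the seven areas of a three-circle diagram inhabited or empty, and calls an argument valid when no such arrangement makes the premises true and the conclusion false.

```python
from itertools import combinations

TERMS = ("X", "Y", "Z")

# The seven areas of a three-circle diagram, each named by the circles it lies inside.
REGIONS = [frozenset(c) for size in (1, 2, 3) for c in combinations(TERMS, size)]


def holds(sentence, inhabited):
    """Is the sentence true when exactly the areas in `inhabited` have members?"""
    kind, a, b = sentence
    a_and_b = any(a in region and b in region for region in inhabited)
    a_not_b = any(a in region and b not in region for region in inhabited)
    if kind == "all":        # All a are b:       a(not b) = 0
        return not a_not_b
    if kind == "no":         # No a are b:        ab = 0
        return not a_and_b
    if kind == "some":       # Some a are b:      ab is not 0
        return a_and_b
    if kind == "some_not":   # Some a are not b:  a(not b) is not 0
        return a_not_b
    raise ValueError(f"unknown sentence type: {kind}")


def is_valid(premises, conclusion):
    """Valid means: no arrangement makes every premise true and the conclusion false."""
    for size in range(len(REGIONS) + 1):
        for inhabited in combinations(REGIONS, size):
            if all(holds(p, inhabited) for p in premises) and not holds(conclusion, inhabited):
                return False  # counterexample found
    return True


first = is_valid([("all", "X", "Y"), ("all", "Y", "Z")], ("all", "X", "Z"))
second = is_valid([("no", "X", "Y"), ("some", "Y", "Z")], ("some_not", "Z", "X"))
sulphates = is_valid([("all", "X", "Y"), ("no", "X", "Z")], ("no", "Y", "Z"))
print(first, second, sulphates)
```

On the three arguments discussed in this paper the sketch agrees with the diagrams: the two sample forms come out valid and the sulphates argument invalid. As the text says, a student who merely reads off such verdicts learns nothing about why they are correct.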
The device called Sylloid, invented by me, makes no such attempt to preserve an isomorphism across sense modalities. Instead it is inspired by the recognition that the Venn diagram can be divided into seven significant areas, as shown
[Figure 6]

Sylloid consists of seven tetrahedra (pyramids) fitted to seven of the faces of a solid regular octahedral core, the 'spare' face being anchored to a heavy base. The user locates his position on the device by means of some metal brailled buttons.
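The arithmetic behind this design can be checked in a few lines. The sketch below, in Python, is illustrative only: the names are hypothetical, and nothing here records which tetrahedron of the actual device corresponds to which area, since that assignment is not given in the text. It lists the seven areas of a three-circle diagram, confirms that an eight-faced core leaves one face spare, and picks out the areas that a sentence of the form 'No X are Y' or 'All X are Y' declares empty (the ones Venn would shade).

```python
from itertools import combinations

CIRCLES = ("X", "Y", "Z")

# Every non-empty combination of circles picks out one area of the diagram.
AREAS = [frozenset(c) for size in (1, 2, 3) for c in combinations(CIRCLES, size)]

OCTAHEDRON_FACES = 8  # a regular octahedron has eight triangular faces


def areas_to_shade(kind, a, b):
    """Areas that a universal sentence declares empty (shaded in Venn's notation)."""
    if kind == "no":     # No a are b: everything inside both circles
        return {area for area in AREAS if a in area and b in area}
    if kind == "all":    # All a are b: everything inside a but outside b
        return {area for area in AREAS if a in area and b not in area}
    raise ValueError(f"not a universal sentence type: {kind}")


print(len(AREAS))                      # number of significant areas
print(OCTAHEDRON_FACES - len(AREAS))   # faces left over for the base
print(sorted("".join(sorted(area)) for area in areas_to_shade("no", "X", "Y")))
```

Each universal sentence empties exactly two of the seven areas, which is why a premise of this kind never calls for more than two pieces to be moved.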
Premises are represented by pulling the relevant tetrahedra away from the core (the Venn equivalent of shading) and by slapping a magnetic hinge across faces of the tetrahedra to represent non-emptiness. There are certain poor features of the design of Sylloid. For example, when trying to pull a tetrahedron away from the core, it is hard to get a purchase on a vertex, especially if one's hands are sweaty. Second, when a tetrahedron is successfully pulled out, it can fall to the ground, creating problems for a user who cannot see where it has landed. Venntouch too has its practical problems. The plastic 'filler' pieces are not easy for a blind student to orientate when they have to be inserted into a given area: fitting pieces to places is fairly easy for a sighted person, because we can see simultaneously the shapes of the gap to be filled and of the piece for filling it. For the blind person, a feat of tactile memory is required. These are practical problems, and there is no substitute for building and testing out devices, for it would be a miracle if all such difficulties could be anticipated and avoided.

However, the more interesting problems are theoretical, and it is now that the questions for future research start piling up. For a start, what is the purpose of teaching someone the rules of syllogism or the Venn-diagrammatic technique for testing validity? One answer would be that it makes the learner a better reasoner, more sensitive to the argumentative mistakes of others, more circumspect in his own reasoning. Yet it is by no means clear that a diagrammatic technique can be 'transferred' inside the head so that, in normal reasoning situations, the master of Venn can reason well, unaided by diagram or device. For comparison: you can teach a man with one leg to walk with a crutch, but that will not mean that he can walk well if you take away the crutch.
This question of learning transfer from artificial to real situations, well known of course to psychologists, is bound to loom large in cognitive technology. One response to the above problem might be that, in the case of Venn diagrams, one can move the 'crutch' to the mind. In other words, after a bit of practice, the learner does not need to draw diagrams on paper, he can just create such diagrams in his visual imagination. It is true; he can. But reasoning by introspecting diagrams that one has introspectively constructed is not the normal way in which an efficient reasoner operates. Phenomenologically, we know this to be so: rarely do visual images invade the mind when we are engaged in deductive reasoning. And it certainly could not be the case that such images are necessary for reasoning, for otherwise blind people would not be rational. Fast reasoning would be inhibited, not enhanced, if a cumbersome procedure of evoking images had to be invoked. It is often simply assumed by psychologists and linguists that any kind of thinking or perception involves the manipulation of inner representations, and that it is the presence of such entities that makes perception possible. On this account, what would be needed of a device for the blind that did the work of Venn Diagrams is a piece of equipment that produces an inner tactual representation playing the same, or roughly the same role as the visual representation, or sense-datum does for the sighted person. But whether there are such inner representations and, if there are, what they are, has been the subject of heated philosophical debate for centuries. There was, for example, a famous discussion of the issue between Malebranche and Arnauld, the latter denying
the existence of these 'inner objects' 5. One problem, of course, is to understand how these objects perform the function they are alleged to perform. Are these inner objects perceived by 'inner eyes'? If this is so, then the project of producing an inner tactual representation (in the sense of an inner representational object) for a blind person is doomed to failure 6. A more cautious approach to the problem of how exterior objects are mapped into the interior and thence processed is taken by Keith Stenning and Jon Oberlander who argue that graphical representations are more computationally tractable than arguments stated verbally and hence that, in psychological reality, it would be more efficient if, where possible, reasoning were conducted graphically. They write: 'Since we observe that the external circles are conducive to reasoning, we speculate that it is because this external aid maps onto internal structures and processes perspicuously. And so the algorithm hints as to what these processes might be. We do not however believe, in the style of some imagery researchers, that what is implemented within is isomorphic to the full detail of the external aids. We rather look for minimal internal implementations...' 7 Yet their minimal implementation (for which they have produced a PDP simulation) is still recognizably graphical and their leading idea is that the spatial analogy (sets as containers) reduces the 'problem space' thereby simplifying syllogistic reasoning to the point where, literally, it may be taken in at a glance. Clearly, then, this method is not available to the blind. Yet blind students, using Sylloid and Venntouch, have become perfectly competent at solving syllogisms. Some intriguing possibilities now suggest themselves. First, that testing for validity at a touch may be quite a different process, but one (almost) as efficient as testing at a glance.
One would need to compare solving speeds for blind and sighted subjects and, at this stage, such experiments would be wholly unreliable, for we are quite uncertain about the extent to which design defects in Sylloid and Venntouch slow the touch testing. (It would be necessary also to separate the congenitally blind from those who had lost their sight; the tactile perception of the latter group might be contaminated by visual strategies.) Second, we have pointed out that the 'proof of the pudding' of any device used for teaching the testing of syllogisms comes when the device is thrown away and the user attempts to bring his skill to bear on real-life reasoning. Now, it is obvious that the use
5 For discussion of these issues and of the Malebranche-Arnauld controversy, see (Ishiguro, 1994) and (Watson, 1994). The philosophical problems involved in the theory of perception are fearsomely difficult, and cannot be broached in this paper. For a useful introduction to recent discussion, I recommend the editor's introduction to (Crane, 1992). For a defence of the view that blind people can form perceptual representations of space, see Gareth Evans' paper 'Molyneux's Question' in (Evans, 1985).
6 It has, however, been argued with some persuasiveness that one can demonstrate a contrast between visual and tactual spatial perception (e.g., that there is no tactual counterpart to the visual field) without relying on the theory of private, inner objects. See (Martin, 1992).
7 See (Stenning and Oberlander, 1994) which favours Euler Circles over the Venn Diagrams that constitute the mental models of (Johnson-Laird, 1983). There are several standard logic texts that discuss both Euler Circles and Venn Diagrams, for example, (Stebbing, 1966). William and Martha Kneale, in (Kneale and Kneale, 1962), p.337, point out that geometrical illustrations of logical relations similar to Euler's had been used by Leibniz (1646-1716), and Clark Glymour claims that drawing circles to test syllogisms for validity was a device developed during the Renaissance (Glymour, 1992).
of calculators does nothing to enhance a user's mental arithmetic: it is pitiful to see checkout clerks at supermarkets rendered helpless when the computer goes down. Likewise, the system of cueing speech by means of hand gestures, intended to improve the speech of deaf children, had the effect of depreciating the children's speech because they came to use the gestures as a substitute for forming the correct sounds 8. Do diagrams and devices help furnish the user with useable reasoning skills, or do they have the opposite effect of diminishing our natural reasoning abilities? The really interesting question would be whether in real-life argumentation, blind people who had learned from Sylloid, Venntouch or some similar device performed better than the sighted who had learned through Venn or Euler (and better than the untutored). If this proved to be so, then sound pedagogy would demand that we teach sighted people with devices designed originally for the blind. Is this a realistic possibility: that the learning achieved through the methods designed for the blind is superior to what sighted people achieve?

This brings me to my third point: Yes. In discourse about argumentation, the containment metaphor is pervasive. As we have seen, set membership is construed as objects within a container. We say (following Kant) 'in all judgments ... either the predicate B belongs to the subject A, as something which is (covertly) contained in this concept A; or B lies outside the concept A...' 9. We say that a deductive argument is valid if the premises contain the conclusion, and this metaphor is made vivid in Venn diagrams where we use circles as depictions of containers and test for validity by examining whether those containers have been filled (the circles filled in) in such a way that the depiction of the premises contains the depiction of the conclusion. George Lakoff has argued that meaning is a product of metaphor; in this case that sets are containers 10. But this may be an entirely false and misleading conception of sets. Some sets contain themselves as members - e.g., the set of sets that contain more than five members. But nothing can physically contain itself. Further, if we think of a set as a container, we are likely to find difficulty in accepting the non-existence of the Russell set (the set containing all and only non-self-membered sets) because we envisage that set as a bucket into which we can unproblematically shovel members, such as the set of pigs, the set of sets of students studying logic, the set of prime numbers etc. Yet we know, from elementary logic, that the Russell set does not exist. So using containment as a 'model' for set membership carries a danger: that of reading features of the model back into the thing being modelled. It is likely, then, that the containment metaphor, as incorporated in Venn diagrams, is not the best model for explaining the theory of sets, nor for understanding what makes for syllogistic validity and invalidity. The operation of Sylloid doesn't trade on the containment analogy, and the structure it incorporates seems even more minimal than that of Stenning and Oberlander. But to say that in this device we have the heart, the essence, of syllogistic reasoning would be vastly premature.
8 See (Cornett, 1967). I'm grateful to Gill Clezy for information about this syndrome.
9 (Kant, 1781) A6/B10. In 'Two Dogmas of Empiricism' (Quine, 1953), Quine criticizes Kant's account on the grounds that 'it appeals to a notion of containment which is left at a metaphorical level' (p.21).
10 (Lakoff, 1987: 458).
Cornett, R. O., 1967. Cued Speech. American Annals of the Deaf 112: 3-13.
Crane, Tim, ed., 1992. The Contents of Experience. Cambridge: Cambridge University Press.
Evans, Gareth, 1985. Collected Papers. Oxford: Oxford University Press.
Glymour, Clark, 1992. Thinking Things Through. Cambridge, MA: MIT Press.
Goldstein, Laurence, and Tim Moore, 1991. Logic Tutor: a suite of programs and manuals. Hong Kong: Logical Products (HK) Ltd.
Ishiguro, Hide, 1994. On Representations. European Journal of Philosophy 2: 109-124.
Johnson-Laird, Philip, 1983. Mental Models. Cambridge, MA: Harvard University Press.
Kant, Immanuel, 1781. Critique of Pure Reason. English edition, 1929, trans. N. Kemp Smith. London: Macmillan.
Kneale, William, and Martha Kneale, 1962. The Development of Logic. Oxford: Clarendon Press.
Lakoff, George, 1987. Women, Fire and Dangerous Things. Chicago: The University of Chicago Press.
Lear, Jonathan, 1980. Aristotle and Logical Theory. Cambridge: Cambridge University Press.
Martin, Michael, 1992. Sight and Touch. In: Tim Crane, ed., The Contents of Experience, 196-215.
Quine, Willard, 1953. From a Logical Point of View. Cambridge, MA: Harvard University Press.
Sacks, Oliver, 1995. An Anthropologist on Mars. New York: Picador.
Stebbing, Susan, 1966. A Modern Elementary Logic. London: Methuen.
Stenning, Keith, and Jon Oberlander, 1994. Spatial inclusion and set membership: a case study of analogy at work. In: K. Holyoak and J. Barnden, eds., Advances in Connectionist and Neural Computation Theory, Vol. 2. Hillsdale: Lawrence Erlbaum.
Watson, Richard, 1994. Having Ideas. American Philosophical Quarterly 31: 185-198.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 16

USING MICROCOMPUTER TECHNOLOGY TO PROMOTE STUDENTS' "HIGHER-ORDER" READING

Che Kan Leong*
Department for the Education of Exceptional Children
University of Saskatchewan, Canada
leong@sask.usask.ca
ABSTRACT

This report is in two interrelated parts. Part I discusses the theoretical underpinning of computer-mediated reading and text-to-speech computer systems for the enhancement of reading. Part II reports on a series of three studies using the sophisticated DECtalk text-to-speech computer system with students, in a move towards the goal of "computer-based medium for thinking and communication." Study 1, with two experiments, showed a high level of intelligibility of DECtalk speech in children. There was some evidence of the efficacy of combined on-line reading and DECtalk auding. Study 2 used a pre- and post-test training design and examined the comprehension of reading 12 expository prose passages in a group of 67 grades 6, 7 and 8 below average and above average readers; the group was further divided into subgroups for on-line reading only, and for on-line reading plus DECtalk auding. There was an overall training effect, but the efficacy of DECtalk plus on-line reading with explanation of difficult words was verified for only two of the passages. Study 3 further tested the contribution of grade (age), reading level (above and below average) and response mode in on-line reading plus DECtalk auding in 192 grades 4, 5 and 6 students. The students in each grade were randomly assigned to one of 4 experimental conditions: (1) on-line reading plus DECtalk auding, (2) on-line reading plus DECtalk auding plus explanation of difficult words in both modes, (3) on-line reading plus DECtalk auding plus explanation of difficult words in both modes plus metacognitive activities, and (4) on-line reading plus DECtalk auding of the "simplified" passages. Reading comprehension was assessed in two ways: (1) verbal answers to inferential questions, and (2) verbal summaries of the passages. Analyses of variance and covariance showed significant differences in grade, reading level, and response mode in favor of inferencing over summaries, but not with respect to the experimental conditions. However, reports and observations showed high motivation of learning among the students. The three studies taken as a whole are discussed within the framework of knowledge acquisition and social construction of knowledge.

* The studies reported were assisted in part by the Social Sciences and Humanities Research Council of Canada with SSHRCC research grant No. 410-89-0128. I am grateful to SSHRCC for this assistance. I thank S. Lock, M. Leung and L. Wang for the different phases of DECtalk computer programming; M. Baker, J. Lappa, M. Mackay, G. Martens, L. Proctor, L. Reineke and K. Sarich for their work in different schools over a period of time; and the students and teachers in these schools for their participation in the various phases of the DECtalk project. Aspects of this paper were presented at the Invitational International Symposium on Exploration and Advancement of Technology for Persons with Learning Disabilities in Missillac, France in July, 1993 and at a seminar for the Cognitive Technology Group, City University of Hong Kong, Hong Kong in November, 1993. Recent post-experimentation discussions with Jerome Elkind, Ingvar Lundberg and Marshall Raskind have given me further ideas to explore this line of work for research and instruction. Any shortcomings are necessarily my own.

INTRODUCTION

The importance of conceptual and theoretical bases for computer-mediated reading is argued forcefully in a review by Reinking and Bridwell-Bowles (1991). Other researchers suggest that the computer also reorganizes and redefines cognitive functions, not simply amplifying them (see Webb & Shavelson, 1985, for details). These kinds of theoretical and empirical studies have been expanded to include a wider range of technology of computers, videodiscs, and teleconferencing to construct or reconstruct learning in information-rich realistic contexts (see Cognition & Technology Group in Vanderbilt, 1992; Lehrer, 1992; Nix & Spiro, 1990, for details). There is evidence of actual benefits and further potentials in using computer technology to assist learning in students. However, computer technology alone cannot guarantee "optimal adjustment" to individual learners because of the complexity of human behavior. To be effective, computer technology needs to involve human tutors in the process, and significant adult-child interaction and appropriate instructional procedures are necessary (Hativa & Lesgold, 1991; Margalit, 1990). Computer environments should be "socialized" so that learners respond to them as if they were empathetic tutors (Turkle, 1984).
Ideally, such sophisticated and empathetic tutoring systems know when or how to diagnose learning "bugs", when and where to intervene, and how to provide motivational as well as cognitive support generally (Lepper & Chabay, 1988; Lepper & Gurtner, 1989).
Scope of this report
The Computer as Cognitive Support in Reading Difficulties

My research into computer-mediated reading with grade school children with mild and severe reading disabilities is in the spirit of adaptive education first enunciated by Robert Glaser (1977). I aim at linking psychological and technological knowledge with educational practice to assist learning and to narrow individual differences in reading. Specifically, the concept of adaptive education derives from the psychological principle of compensation (Leong, 1993; Lundberg & Leong, 1986). Within the context of the multi-level and multi-component approach to reading and its disorders, compensation is conceptualized as emphasizing or enhancing one component of reading (for example, the phonological or morphological aspects of words, sentential comprehension) more than another component, in order to ameliorate decrements or deficiencies in the components. Compensation can be provided by parents, teachers or clinicians to enrich the learning situation; it can take the form of adaptive or improved
Using Microcomputer Technology
task properties; and it needs cognitive support systems generally, including the use of technology (Backman, 1985). These forms of compensation are not mutually exclusive; they reinforce one another interactively and their principles apply to the reading process, in accordance with the interactive-compensatory model of Stanovich (1980).

THEORETICAL BASIS OF THE PROJECT

Automaticity Principle and Immediacy of Feedback

The series of studies using the (DECtalk) text-to-speech computer system attempts to provide on-line and computerized speech support to readers, as and when they need such help. This is accomplished by harnessing the interactive nature of the microcomputer and its capability for storing lexical and discourse materials for immediate or delayed retrieval. These capabilities readily lend themselves to studies of the speedy and accurate processing (automaticity principle) of words, and of units larger than words, that goes into enhancing reading comprehension. Drawing on works by Lesgold (1983), Perfetti (1985), Stanovich (1986) and others, the postulate of our studies is that reading comprehension, as reflected in answers to inferential questions, text recall, and summarization, relates to local processing levels of efficient and high-quality access of words, and of overall auding (listening to text) abilities. Individual differences in text comprehension can be traced by the efficiency with which children remember words just read, activate their naming codes, analyze their morphological relationship, and integrate the successive units (words, phrases, clauses), as they come along, into propositional form for interpretation.
Furthermore, poor or low ability readers require more time to access words, but their processing is facilitated by context just as much as, if not more than, in the case of good readers; good or high ability readers can also be affected by context when their lexical access is slowed down (Perfetti, 1985; Stanovich, 1980; Stanovich & West, 1981; Stanovich, West, & Feeman, 1981; West, Stanovich, Feeman, & Cunningham, 1983). "Contextual Enrichment" Two
Computer Approaches
The principle of "contextual enrichment" has been used by George Miller and his colleagues (Gildea, Miller, & Wurtenberg, 1990; Miller & Gildea, 1987) to promote word knowledge through the use of interactive videodiscs; and by Leong and colleagues (Leong, 1992b, 1992c, 1992d; Leong & Mackay, 1993; Lock & Leong, 1989) in enhancing reading comprehension by putting to use the text-to-speech (DECtalk) computer system. There is, however, a basic difference between the Miller and Leong approaches. Miller and colleagues used sentence contexts in narratives and offered both visual and linguistic enrichment from videodisc technology to help young children learn words and their meaning. Leong's experiments in situ provided more precise explanations of, or substitutions for, difficult words and phrases on-line and via DECtalk speech, used as "instruments" (Stahl, 1991) for the enhancement of reading comprehension. The purpose here is not to compare these almost opposite approaches: they may be used for different purposes and may attain different goals. Rather, I will attempt to
Che Kan Leong
show what can be achieved with modest instrumentation, without the advantage of multi-faceted technology for multitasking. My rationale is to provide word knowledge both on-line and with immediate speech support, in order to enhance reading comprehension. While context may constrain word meaning, facilitation may not apply with great precision, and certainly not for low frequency words (McKeown, 1985; Perfetti, 1992; Schatz & Baldwin, 1986). Word knowledge may pose problems for poor readers, especially for those with below average decoding and segmentation abilities (Anderson & Davison, 1988). While unfamiliar words can impede reading comprehension, instructed words improve the recall of propositions (Omanson, Beck, McKeown & Perfetti, 1984), and instruction needs to be sustained and systematic. The better results of systematic instruction derive from a combination of explicit reference to meaning, learning through examples, learning through verbal contexts, and learning through the analysis of the hierarchical and relational aspects of derivation, inflection, and compounding of words (Beck, Perfetti, & McKeown, 1982; Jenkins & Dixon, 1983). In word learning, emphasis should be on multiple cues and multiple exposure of words to build up their phonological, morphological, syntactic, and semantic networks (Leong, 1992a; McKeown & Curtis, 1987; Stahl, 1991; Sternberg & Powell, 1983). EXPERIMENTAL STUDIES
Rationale of Computer-Mediated Reading There are good reasons for using the computer to assist reading, just as there are conceptual and methodological issues in how best to do this (Reinking & Bridwell-Bowles, 1991). One main reason is that reading is a real-time language activity using all types of available linguistic information (Bierwisch, 1983). Lexical access and semantic encoding can be facilitated with an on-line approach using the microcomputer, interfaced with the text-to-speech (DECtalk) computer system, so as to provide immediate on-line reading and high-quality synthetic speech support and feedback of segments of words and of discourse. The on-line approach also takes into account that text comprehension is incremental and cumulative. The comprehension process incorporates the buffering of information from different linguistic units; the retrieving of old information; the purging of redundant information from working memory; and the integrating of new and old information from successive segments for propositional encoding (Jarvella, 1979). The computer interfaced with synthetic speech can be used effectively to guide readers to achieve a smooth integration of different discourse segments for accurate and automatic processing. Indeed, one dilemma in computer-mediated reading instruction is to maintain a balance between "basic-but-dull" word decoding and "complex-but-engaging" text comprehending, and to make both tasks interesting and effective (Perfetti, 1983). The other reason is that for those students diagnosed as "backward", or garden-variety poor readers in accordance with the "Simple View" of reading of Gough and Tunmer (1986), reading disorders could result from difficulties in decoding, in comprehending, or in both. One of the claims of the Simple View "may well be ... that skilled decoding combined with skilled listening must produce literacy" (Gough & Tunmer, 1986: 9). These are testable hypotheses. Some evidence of decoding and
listening comprehension interacting on reading comprehension is provided by Hoover and Gough (1990) in a longitudinal study of grade school children. Their results (based on a series of regression analyses) show that the linear combination of decoding and listening comprehension accounted for a substantial proportion of the variation of reading comprehension, with enhancement from the multiplicative effect of decoding and listening comprehension; furthermore it is shown that both components are needed. In the domain of computer assisted reading proper (without speech accompaniment), a report on grade 6 subjects shows that providing them with vocabulary learning on a computer screen could increase their reading comprehension (Reinking & Rickman, 1990). Further, poor readers were facilitated in their reading comprehension when the computer incorporated comprehension monitoring (Reinking, 1988). However, results from a recent study of computer-mediated reading comprehension are less sanguine, and emphasize the need to buttress computer reading with metacognitive activities, as well as the importance of working memory (Swanson & Trahan, 1992). These differing results will need to be further validated.
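The contrast between a purely additive account and the multiplicative reading of the Simple View can be illustrated with a small sketch. The Simple View is often summarized as R = D x L, with decoding (D) and listening comprehension (L) each scaled from 0 to 1. The function names and the equal-weight additive comparison below are illustrative assumptions and are not taken from Hoover and Gough's regression analyses.

```python
def simple_view(decoding: float, listening: float) -> float:
    """Product form of the Simple View: R = D x L, both scaled 0..1."""
    for score in (decoding, listening):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be scaled to the range 0..1")
    return decoding * listening


def additive_view(decoding: float, listening: float) -> float:
    """Hypothetical equal-weight additive comparison (an assumption)."""
    return (decoding + listening) / 2.0


if __name__ == "__main__":
    # A reader with good listening comprehension but no decoding skill:
    # the product predicts no reading comprehension, the sum does not.
    print(simple_view(0.0, 0.9), additive_view(0.0, 0.9))
```

Under the product form, a deficit in either component limits reading comprehension, which is one way to read the statement that both components are needed.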
Text-to-Speech Conversion and Systems The emphasis placed on accurate and automatic access of word meaning and pronunciation as an aid in reading comprehension would make it appear that text-to-speech (TTS) computer systems could be put to use for providing bisensory feedback to assist readers. In general, computer speech production ranges, in a cascading manner, from stored samples of human speech in digitized form, with varying quality of speech, to sophisticated text-to-speech computer systems which incorporate "deeper" linguistic knowledge. Technical discussion of speech conversion can be found in several volumes (e.g., Allen, Hunnicutt & Klatt, 1987; Klatt, 1987; Witten, 1982). Overview of the DECtalk System Text-to-speech systems such as MITalk and DECtalk are generally defined as computer devices that analyze, synthesize, and convert "plain" or unrestricted text into fluent and high quality speech. These high-fidelity synthesized text-to-speech systems can analyze plain text, without having recourse to phonetic and prosodic markers for all linguistic information; they can synthesize the analyzed information to produce the output acoustic and articulatory waveforms in the form of fluent and highly intelligible speech (Leong, 1992c). The DECtalk system, together with its variant, the Swedish Infovox multilingual system, is the mainstay of the various research reports in a special issue on reading and spelling using text-to-speech computer technology (Leong, 1992b). The DECtalk device has a large vocabulary and makes use of analysis-by-synthesis principles to extract the underlying phonemic, morphemic, and syntactic representations of unrestricted text to produce synthetic utterances. 
The access to deeper linguistic knowledge in DECtalk, its range of speaking rates from 120 to 250 words per minute (wpm) (a recent DECtalk PC card ranges from 120 to 550 wpm), and its 7 built-in voices (a recent DECtalk PC card has 9 voices), with "Perfect Paul" being the preferred mode, offer considerable possibilities for the project on hand. In the course of our investigation, a
DECtalk program library of Turbo Pascal routines was compiled by Lock and Leong (1989) to facilitate the tailoring of the hardware. Before they start using the DECtalk system with children, especially those with reading and spelling disorders, researchers need to ask and answer several pertinent questions. One question concerns the degree of intelligibility of DECtalk speech; another is whether or not the bisensory output of on-line text simultaneously with DECtalk speech is optimal for reading and text comprehending. There are further issues: technical ones such as hardware and software configurations; conceptual and methodological ones such as the nature of "useable" text on-line, output of discourse materials, and other aspects. Some of these issues are discussed in subsequent sections; fuller discussions are provided in Leong (1992b, 1992c, 1992d). Study 1
Intelligibility of DECtalk Intelligibility and other aspects of text-to-speech systems with adults have been investigated by Pisoni and his colleagues in Indiana (Greene, Logan, & Pisoni, 1986; Greenspan, Nusbaum, & Pisoni, 1988; Ralston, Pisoni, Lively, Greene, & Mullennix, 1991); the quality of DECtalk synthetic speech is rated very highly by these researchers. For children, Olson, Wise and their colleagues in Colorado have shown that disabled readers did not differ from college students in the recognition accuracy of words spoken by DECtalk ("Perfect Paul" voice), and this accuracy rate of 94.5% differed only slightly from these children's perception of the same words spoken in natural speech (98.4%) (Olson, Foltz, & Wise, 1986). Furthermore, as shown in a long-term remedial reading program (Wise, Olson, Anstett, Andrews, Terjak, Schneider, Kostuch, & Kriho, 1989), DECtalk speech, being highly intelligible, is far superior to digitized speech. Subjects Expanding on the pioneering work of the Colorado group and drawing on the Lock and Leong (1989) DECtalk program library, M. Mackay and myself have similarly found, in two experiments with grade 6 students, a high level of intelligibility for DECtalk ("Perfect Paul" voice), as compared with human speech (Leong & Mackay, 1993). Our total sample consisted of 66 twelve-year-old subjects with no known hearing problems, who were randomly divided into two subgroups of 33, one for the DECtalk (DEC), and one for the human speech (Voice) modes of output of the same words and sentences. These 66 students were further divided (on the basis of the Canadian Tests of Basic Skills (King, 1982)) into three subgroups ("below average" (BA), "average" (AV), and "above average" (AA)) of 9, 12 and 12 students respectively for each listening condition. All subjects were given practice in listening to a 170-word sample passage "Shaggy Bear Tale" adapted from the DECtalk manual (Digital Equipment Corporation, 1984). 
Task and Procedure Experiment 1 adapted Durrell and Catterson's (1980) Listening Vocabulary subtest to assess the intelligibility of the DECtalk system as compared with human speech. In
this refined task with 60 words varying from two to three syllables and represented by 15 semantic categories, students in each mode of presentation were asked to listen to each of the 60 words outputted at random. They were required to match accurately and rapidly, by pressing a computer key, each lexical item with the corresponding superordinate or coordinate semantic category, shown (with three alternatives) in both pictorial and verbal forms. An example is the word FLOWER, going with the category of PLANTS (picture of a potted plant); CHEERFUL, with the category of HAPPY (smiling face).
Results The results show for DECtalk an overall matching accuracy of 88% and a mean response time of 2101 msec. For human speech, the overall matching accuracy was 96% and the mean response latency was 827 msec. There was no speed-accuracy trade-off. A 2 (presentation mode) by 3 (reading level) ANOVA shows the expected significant main effect in favor of human speech (F (1, 60) = 36.01, p < .0001). The more refined latency measures in a similar analysis accentuated the difference in favor of human speech (F (1, 60) = 172.87, p < .0001). There is also a difference according to reading level (F (2, 60) = 5.93, p < .01) in response time measurements, the below average subgroup being significantly slower than the other subgroups. The overall trend of greater accuracy coupled with faster response latency is in the expected direction and may be observed in the scattergram in Figure 1.
[Scattergram: Intelligibility Exp. 1; axis label: Accuracy Z scores]
Figure 1 - DECtalk intelligibility Experiment 1 (Leong & Mackay, 1993) scattergram with response accuracy Z scores plotted against latency Z scores.
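A scattergram of this kind rests on standardizing each subject's accuracy and latency before plotting them against each other. The sketch below shows one way to compute such Z scores. The use of the population standard deviation and the sample values are assumptions for illustration only; they are not data from Leong and Mackay (1993).

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize constant values")
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    # Hypothetical accuracy (proportion correct) and latency (msec) values.
    accuracy = [0.95, 0.88, 0.70, 0.92]
    latency = [850, 1900, 2600, 1100]
    for acc_z, lat_z in zip(z_scores(accuracy), z_scores(latency)):
        print(round(acc_z, 2), round(lat_z, 2))
```

With Z scores on both axes, subjects who are more accurate and faster fall toward positive accuracy and negative latency values, which is the trend described for Figure 1.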
Subjects, Task and Procedure for Experiment 2 The sample for Experiment 2 consisted of the same 66 grade 6 students as in Experiment 1 with 33 randomly assigned to the DEC and the other 33 to the natural speech presentation modes. The students were asked to make accurate and rapid semantic classifications of the truth (T) or falsity (F) of 30 three-word sentences (15 true and 15 false), and another 30 six-word sentences (15 true and 15 false), output at random either by DECtalk or human speech. Examples of sentences for T/F decisions were: "Dogs can fly" (False); "New Year's day comes in January" (True); "Bees can sting" (True); and "An apple can never be red" (False). This sentence classification task was essentially the same as Manous, Pisoni, Dedina, and Nusbaum's (1985), which has been shown to be sensitive in studying language comprehension.
Results Results for the true/false (T/F) sentence classifications show an overall accuracy of 88% for DECtalk output and 93% for human speech. A 2 (presentation mode) by 3 (reading level) by 2 (sentence length) ANOVA of the correct "true" latency responses, with the last factor repeated, shows a significant main effect for presentation mode only (F (1, 60) = 34.42, p < .0001), in favor of human speech. The ANOVA results for the correct "false" latency responses are in the same direction (F (1, 60) = 60.47, p < .0001).
Summary Discussion of DECtalk Intelligibility The two experiments, one using isolated words and the other short sentences with 66 children, as summarized above for Experiment 1 (Leong & Mackay, 1993) show that DECtalk speech compares well with human speech in intelligibility. The results confirm those found for adults by Pisoni and his colleagues (Greene, et al., 1986; Greenspan, et al., 1988), and for children by Olson and Wise (Olson et al., 1986). Given the expected higher performance of human speech in language perception and comprehension, DECtalk is highly intelligible and is far superior to digitized speech for research and instruction.
Study 2 DECtalk Training in Reading Prose Passages Given the intelligible nature and the motivational aspect of synthetic speech, the efficacy of on-line reading and simultaneous DECtalk auding of the same language materials was further tested with a different group of students reading twelve expository prose passages and answering open-ended inferential questions. A pre- and post-test experimental design was used, with intervening training conditions of on-line reading plus on-line explanation of difficult words, or on-line plus DECtalk output and explanation of the same words in both modes. Full details of Study 2 are given in Leong (1992c, 1992d).
Subjects A total of 67 students in 3 grades took part in the study, with 32 students in grade 6, 27 in grade 7 and 8 in grade 8. Their mean chronological age was 149.71 months with a standard deviation of 7.74 months. Their mean scaled general ability score on
the Matrix E subtest of the British Ability Scales (BAS) (Elliott, 1983) was 106.64, with a standard deviation of 12.64. On the basis of the scaled aggregate Vocabulary and Reading Comprehension subtest scores of the Canadian Test of Basic Skills (King, 1982), the students in each grade were divided up into "above average" (AA) or "below average" (BA) readers. The median splits yielded these subgroups: 15 AA and 17 BA readers for grade 6, 11 AA and 16 BA readers for grade 7, and 3 AA and 5 BA readers for grade 8. A few subjects were subsequently lost because of the necessarily protracted experimentation and this explains the slight variations in the number of subjects in the different analyses.
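A median split of the kind described above can be sketched as follows. The rule that scores equal to the median are placed in the "below average" group is an assumption made for illustration (the source does not state how ties were handled), and the student identifiers and scores are hypothetical.

```python
from statistics import median


def median_split(scores):
    """Label each student AA (above the median) or BA (at or below it).

    `scores` maps a student identifier to an aggregate test score.
    Placing ties in BA is an illustrative assumption.
    """
    cut = median(scores.values())
    return {sid: ("AA" if s > cut else "BA") for sid, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical aggregate Vocabulary plus Reading Comprehension scores.
    grade6 = {"s01": 40, "s02": 55, "s03": 55, "s04": 70}
    print(median_split(grade6))
```

Because ties fall on one side of the cut, the two subgroups need not be of equal size, as in the unequal AA and BA counts reported for each grade.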
Stimulus Materials Twelve expository prose passages of about 200 words each were adapted from various reading materials, including Young Canadian Readers, Young Children's Encyclopedia, and The Reader's Digest Reading Skill Builder Series. The revised prose passages, now all in one genre, were further reviewed by teachers of language arts and were estimated to be on the average at the grade 7 level of "comprehensibility", according to the computerized Writer's Work Bench (WWB) developed by the Bell Laboratories (Cherry, 1982). As an example, the passage Habits starts with "Habits may be a help or a hindrance", and presents the pros and cons of habitual activities. Another example is a passage entitled Acid Rain which begins with the main idea of "Experts acknowledge that much remains to be learned about acid rain. One question which remains unanswered is: Where, exactly, are the sources of acid-causing pollutants? ..." The short essay then leads the readers to explore whether or not industrial burning of fuels is such a specific source of pollution. Yet another passage entitled "City versus Country Living" compares and contrasts living in a city such as Toronto with living on a farm. The generation of the passages with output of words at 200 milliseconds per string made use of the DECtalk program library by Lock and Leong (1989). The 200-word length for each passage kept it to one screen scroll and the use of regular mixed print resembled off-line reading. Each passage was decomposed into word groups or pausal units corresponding approximately to underlying propositions. Excerpts from the passage on city and country living are shown below with the word group units in brackets (for illustrative purposes only, and not actually shown) to indicate pauses: [City versus Country Living] [Perhaps because I was born] [and brought up in the country,] [I have always wanted] [to live near the heart of a great city.] 
[Even now I look forward to the day] [when I can live again in Toronto.] [I would like to taste again] [the diversity of the city.] [I am not alone in Toronto] [even when I am alone.] [The city fortifies my mind.] ... In the generation of text, each phrasal or clausal unit before a pause was shown one at a time, appearing on the computer screen only when the space bar was depressed by the reader. In this way, the text as a whole would progressively appear on-line and would remain on the computer screen for the duration of the entire training and testing session. This self-paced approach would allow readers time to read and to go back and forth in comprehending the discourse materials.
Almost simultaneously with the on-line text, the DECtalk speech of the same word groups was generated at 200 wpm with pauses of 160 ms after commas and 640 ms between sentences. All twelve prose passages on the menu could be randomized and the computer program would provide for the choice of DECtalk auding only, on-line reading only, or on-line reading plus DECtalk speech. This third option (on-line-plus-DECtalk reading) was used in the post-test phase. The program would also record for each subject the inspection time for each phrasal or clausal group for further reading time analyses.
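The segmentation into bracketed pausal units and the stated speech parameters (200 wpm, 160 ms after commas, 640 ms between sentences) can be illustrated with a short sketch. It is written in Python rather than the Turbo Pascal of the original program library, and the function names and the simple words-per-minute arithmetic are assumptions: how DECtalk actually times its speech is not specified in the text.

```python
import re

WPM = 200                 # speaking rate given in the text
COMMA_PAUSE_MS = 160      # pause after commas given in the text
SENTENCE_PAUSE_MS = 640   # pause between sentences given in the text


def pausal_units(marked_text):
    """Return the word groups enclosed in square brackets, in order."""
    return [u.strip() for u in re.findall(r"\[([^\]]*)\]", marked_text)]


def approx_speech_ms(unit):
    """Rough speech duration of one unit (an illustrative estimate)."""
    ms = len(unit.split()) * 60_000 / WPM   # 300 ms per word at 200 wpm
    ms += COMMA_PAUSE_MS * unit.count(",")
    if unit.rstrip().endswith((".", "?", "!")):
        ms += SENTENCE_PAUSE_MS
    return ms


if __name__ == "__main__":
    excerpt = ("[Perhaps because I was born] [and brought up in the country,] "
               "[I have always wanted] [to live near the heart of a great city.]")
    for unit in pausal_units(excerpt):
        print(f"{approx_speech_ms(unit):7.0f} ms  {unit}")
```

In a self-paced presentation, each unit would be displayed on a key press and the time until the next key press stored as that unit's inspection time; that interactive loop is omitted here.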
Procedure- Pre-test, Training and Post-test In the pre-test phase of the study, the students were instructed individually to read rapidly and accurately on-line, and to aud simultaneously from DECtalk each prose passage once so that they would understand it. They were told that difficult words and sentence constructions would be explained in the training phase of the study. At the end of the simultaneous on-line reading and auding of each passage, they were asked to give short verbal answers to open-ended inferential questions aimed at ensuring their reading with understanding. These protocols were scored blind and independently on a five-point scale by the author and another judge according to their richness of ideas and propositional density. To the question "What does 'the heart of a great city' mean?" from the Toronto passage, 'rich' answers include: "lots of activities going on, people bustling about, going to work", as opposed to 'meagre' answers of the kind: "A big city", or "Downtown." The mean interrater reliability over the twelve prose passages was .92. The aggregate scores from all the answers for all the questions averaged over the two judges provided an estimate, for each student and for each passage, of the pre-training reading comprehension level. Training Procedure After the completion of the pre-testing in the form of a pre-reading of all the 12 prose passages, the students in each grade and at each reading level (above or below average) were randomly assigned to one of the two training conditions: On-line reading only or on-line reading plus DECtalk auding. The aim was to provide immediate and more precise word knowledge to promote reading comprehension. 
Approximately half of the students at each reading level were tested under each training condition; the same 12 prose passages generated at random were used for training and post-testing. During the training phase, where each student read each passage once, pre-selected difficult words (with their derivations and base forms) or groups of words were highlighted on the computer screen for mandatory explanation. These highlighted words were explained at the press of any key, either in on-line mode, or in on-line plus DECtalk speech mode, as the full passage gradually unfolded. Some examples from the segment on city and country living excerpted above are as follows: brought up [bring up] means grew up, reared; look forward to means welcome; diversity [diverse, diversification] means variety, a great many things; fortifies [fortify, fort, fortification] means makes strong, enriches... It should be emphasized that for technical reasons, all the targeted or highlighted words were pre-selected. While this pre-selection did not offer the flexibility of self-selection by the students, it ensured that the same words were targeted for explanation.
Furthermore, the explanations in the two modes were limited to the local context (within the immediate surrounding phrase, clause or sentence) and global context (within the text as a whole) of the particular passage and would also need refinement. Nevertheless, the DECtalk computer-mediated reading seemed to work well and was sufficiently motivating for the students. At the post-test phase following the training session for each of the 12 passages, the students were asked, both on-line and via DECtalk speech, the same inference questions; their transcribed verbatim answers were scored blind by the same two judges, according to the same criteria as in the pre-testing. The students' answers provided estimates of their performance after vocabulary training either on-line or on-line plus DECtalk speech.
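The scoring by two judges can be illustrated with a small sketch that averages the two sets of ratings and computes an agreement coefficient. The use of the Pearson correlation as the interrater index is an assumption (the source reports the reliability value without naming the statistic), and the ratings shown are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def averaged_scores(judge1, judge2):
    """Comprehension estimate per answer: the mean of the two ratings."""
    return [(a + b) / 2 for a, b in zip(judge1, judge2)]


if __name__ == "__main__":
    # Hypothetical five-point ratings of six answers by two judges.
    judge1 = [5, 4, 2, 1, 3, 4]
    judge2 = [5, 3, 2, 1, 3, 5]
    print(round(pearson_r(judge1, judge2), 2))
    print(averaged_scores(judge1, judge2))
```

Applying the same averaging to the pre-test and post-test ratings gives comparable comprehension estimates for each student and passage.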
Results The pre-test and post-test reading comprehension scores for the 12 passages were first analyzed with multivariate analyses of variance for the total of 59 grades 6 and 7 students dichotomized as above and below average readers. The MANOVA results with Wilks' lambdas show significant differences for reading level (F (1, 55) = 15.60, p = .000), and for training effects (collapsing across the two modes) (F (1, 55) = 165.07, p = .000). There were significant interaction effects for reading level x passage (F (11, 45) = 3.263, p = .002), for grade x passage (F (11, 45) = 3.153, p = .003), and for passage variation (F (11, 45) = 196.45, p = .000). The overall significant difference between post-testing and pre-testing across the two modes must therefore be interpreted with caution. While grade and reading level accentuated the differences, the "readability" of the passages probably contributed more to the individual variations. Following the MANOVA results, univariate analyses of variance with 3 (grade) x 2 (reading level) and 2 (pre- and post-training) as factors were carried out for each prose passage. Of the 12 passages, 10 showed highly significant training effects when on-line and on-line plus DECtalk modes were collapsed. Only No. 8 Raising Children and No. 9 Acid Rain did not show any training effect. However, when the modes of training were compared at the post-testing phase in a 3 (grade) x 2 (reading level) x 2 (training mode) ANOVA, only two passages (No. 3 City versus Country Living and No. 11 What Science has done to our Food) emerged with significant results in favor of the on-line plus DECtalk speech. For the Toronto passage (No. 3), the ANOVA results yielded F (1, 55) = 5.60, p = .022. The main effects for grade and reading level were also significant (F (2, 55) = 8.41, p = .001 and F (1, 55) = 9.90, p = .003 respectively). There were no interaction effects. 
Furthermore, when the pre-test comprehension score was used as the covariate to make adjustments for initial differences, similar results were obtained. All the main effects were significant: for mode of training (F (1, 54) = 5.78, p = .020), for grade (F (2, 54) = 8.20, p = .001), and for reading level (F (1, 54) = 6.49, p = .014). There was also a grade x training mode interaction effect (F (2, 54) = 3.89, p = .026) in the ANCOVA analysis. For the science passage (No. 11), the on-line plus DECtalk mode as compared with the on-line presentation was significant (F (1, 54) = 5.20, p = .027). There was also a grade x training mode interaction effect (F (2, 54) = 3.20, p = .048). When adjustment was made using the pre-testing as covariate, the ANCOVA results upheld this main effect (F (1, 53) = 5.79, p = .020); there was no interaction effect. Scrutiny of the results for both passages suggests that it was the above average
subgroups that gained the most from the DECtalk and on-line combined presentation modes and that the small number of grade 8 students might have accentuated the grade by training mode effect.
Discussion Taken as a whole, Study 2 suggests that given immediate training in word knowledge within local and global contexts, the small number of grades 6, 7 and 8 students showed improvement in their reading comprehension in 10 of the 12 prose passages, as evinced by their verbal answers to inferencing questions. Caution must be exercised when interpreting this gain as a result of the individualized computer training. Factors such as regular class teaching and literacy exposure at home also contributed to the growth in reading comprehension and may have confounded the present results. Also, the efficacy of the DECtalk training was not unequivocal, as indicated by the finding of significant differences in the combined on-line plus DECtalk training over the on-line reading alone for only two of the twelve expository passages. These training results are less straightforward than those of computer-mediated reading, as reviewed by Reinking and Bridwell-Bowles (1991). The many reasons that might explain the marginal facilitation of DECtalk speech interfaced with on-line reading include: the nature of the stimulus materials, the technique of their on-line generation, the role of working memory and prior knowledge, and the need for metacognitive activities prior to reading. Some of the salient conceptual and methodological issues in computer-mediated reading are discussed by Leong (1992c, 1992d; 1992d with particular reference to text-to-speech computer systems). There is some evidence that metacognition may contribute to improved computer-mediated reading in poor readers (Reinking & Schreiner, 1985), but there is also evidence that working memory, more than metacognition, could play a role (Swanson & Trahan, 1992). Study 3 was an attempt to take some of these factors into account in DECtalk training for the enhancement of comprehension. Study 3
Further DECtalk Training in Reading with Expanded Conditions Taking into account the above findings and observations, Study 3 made use of a similar logic and DECtalk computer programming (Lock & Leong, 1989), in an expanded investigation with added experimental conditions, including a discussion component. The focus of this computer-mediated reading plus DECtalk speech reading experiment was less on comparing the two modes of presentation than on the effectiveness of the on-line reading plus DECtalk auding conditions and their interaction with expository language materials in grades 4, 5 and 6 readers. Subjects The sample consisted of 64 grade 4 students, 68 grade 5 students and 60 grade 6 students from two schools for a total of 192 subjects. The mean chronological ages in months with standard deviations were respectively: Grade 4 (M = 119.81, SD = 5.10), grade 5 (M = 130.28, SD = 4.55), and grade 6 (M = 143.77, SD = 4.79). The mean scaled general ability scores on the Matrix E subtest of the British Ability Scales (BAS)
(Elliott, 1983) for the 192 students were respectively: grade 4 (M = 99.41, SD = 11.33), grade 5 (M = 108.81, SD = 13.00), and grade 6 (M = 108.60, SD = 13.46). The general ability was significantly different overall (F (2, 189) = 11.61, p = .0000); multiple comparisons show that the differences were in grade 4 vs. grades 5 and 6, but not between grades 5 and 6.
Tasks - Prose Passages and their Comprehensibility Four of the more discriminating expository passages (Habits, City versus Country Living, What Science has done to Our Food, and Aging) from Study 2 were further modified to ensure greater cohesion of discourse (Halliday and Hasan, 1976) and more explicit idea units expressed as arguments and relations (Kintsch, 1974; Kintsch & Keenan, 1973). These four revised passages averaged 160 words (13 sentences) in length, and were estimated to be at the mid- or upper grade 6 reading level as to comprehensibility, according to the computerized Writer's Work Bench (Cherry, 1982). The same four prose passages were further rewritten to yield "simpler" passages (for reasons explained later, in the section on experimental procedure). These "easier" or simpler (Condition S) passages averaged 180 words in 14 sentences with a reading level of low grade 6 (Cherry, 1982). In assessing the comprehensibility of prose passages and in making them "simple", it should be recalled that current critical psycholinguistic studies of language processing emphasize many sources of language complexity and eschew a single metric of complexity (see Davison & Green, 1988, for research reports). Furthermore, various kinds of lexical, phrasal, clausal and sentential linguistic information relate to one another within local and global contexts. This intricate relationship provides the linguistic features of a text, and these in turn interact with the readers' characteristics, such as: knowledge of phonological and morphological processing of words and competence in parsing syntactic structures; prior knowledge of materials read; processing strategies; and other factors. The interactions between text features and reader characteristics are important determinants of comprehensibility (Anderson & Davison, 1988). 
The passage "simplification" attempted to incorporate operationally the psycholinguistic notions discussed by Davison and Green (1988), and evolved around two principles. One principle is that of the instrumentalist view of substituting better known or more frequently used words for less known or low frequency words to improve reading comprehension (see Stahl, 1991). For example, the original Habits passage contained these segments: "Habits may be a help or a hindrance... Habits are more than just conveniences or inconveniences..." The more difficult words "hindrance", "conveniences" and "inconveniences" were replaced and the sentences rewritten as: "Habits may help or may get in the way... Habits do more than just make us feel comfortable or uncomfortable..." The second principle emphasizes collocation or relative meaning of words in particular situations by transforming figurative language to "plain" language. In the City versus Country Living passage the sentences: "I would like to taste again the diversity of the city... The city fortifies my mind" were transformed into: "I would like to be able to do again a great many things in the city... The city makes me feel strong." It should be emphasized that the passage transformation attempted to maintain a sentence structure that was similar to that of the original (unsimplified) passages, with
similar segmentation of syntactically defined phrases or clauses (pausal units), similar event-chains in the exposition and similar pros and cons in the arguments. This process should yield equivalent passages except for the level of word knowledge, so as to better test the contribution of such knowledge to prose comprehension.
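The first principle, substituting better-known words for low-frequency ones, presupposes some way of identifying the candidate words. The sketch below flags such words against a frequency table. The table `ASSUMED_FREQUENCY`, its values and the threshold are hypothetical placeholders and are not taken from the study, in which the sentences were rewritten by hand.

```python
import re

# Hypothetical per-million word frequencies, for illustration only.
# A real application would draw on a published word-frequency norm.
ASSUMED_FREQUENCY = {
    "habits": 40, "may": 900, "be": 5000, "a": 20000, "help": 600,
    "or": 4000, "hindrance": 2,
}

def flag_rare_words(text, threshold=10):
    """Return the words whose assumed frequency falls below the threshold.

    Words missing from the table are treated as unknown and are not flagged.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if ASSUMED_FREQUENCY.get(w, threshold) < threshold]

print(flag_rare_words("Habits may be a help or a hindrance"))
```

A list of this kind only nominates words for replacement; choosing the substitute and preserving the sentence structure remain matters of editorial judgment, as in the passages quoted above.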
Related Tasks
Several tasks that have been shown in the literature to affect reading comprehension were also given to the 192 students, either in small groups or individually.
1. Vocabulary. On the Vocabulary subtest of the Canadian Test of Basic Skills (CTBS) (King, 1982) the means and standard deviations for grades 4, 5 and 6 were respectively: 26.84 (9.11), 29.69 (4.26) and 29.10 (7.18); overall, there was no significant difference among the three grades (F(2, 189) = 2.92, p = .056). On the basis of the CTBS, the students in each grade were divided into "above average" (AA) and "below average" (BA) readers. The splits yielded these subgroups: 41 AA and 23 BA readers in grade 4, 34 AA and 34 BA readers in grade 5, and 34 AA and 26 BA readers in grade 6.
2. Word Reading. On the Wide Range Achievement Test Revised (WRAT-R) (Jastak and Wilkinson, 1984) for reading, the means and standard deviations for grades 4, 5, and 6 were respectively: 100.86 (13.86), 104.69 (8.42), and 105.80 (12.36). There was a marginal overall difference (F(2, 189) = 3.10, p = .048), but multiple comparisons found no pairwise differences between the grades.
3. Metacognition. The Index of Reading Awareness (IRA) by the Michigan group of Scott Paris and his colleagues (Jacobs and Paris, 1987) was used to assess the students' metacognition about reading. The IRA consists of 20 items with graded responses of 0, 1 and 2; it measures the following aspects of metacognition: "Evaluation", "Planning", "Regulation", and "Conditional Knowledge". For the students in grades 4, 5 and 6, the means and standard deviations on the IRA were respectively: 25.42 (3.76), 29.63 (3.32), and 29.03 (3.21). There was an overall difference (F(2, 189) = 28.34, p < .0001), with the differences accentuated in grade 4 as compared with grades 5 and 6.
4. Working Memory Span. Working memory was assessed on a modified Working Memory Span Test by Swanson (Swanson, 1992; Swanson & Trahan, 1992), which in turn was based on that of Daneman and Carpenter (1980). In this task, subjects were asked to listen to sets of randomly arranged two, three, four, or five unrelated declarative sentences.
After the oral presentation of each doublet, triplet, quartet or quintet set of sentences, subjects were asked comprehension questions, one for each sentence set (the probe not being related to the last sentence); they were then required to record the very last word of each of the sentences in that set. Working memory as measured by this task has been shown to play an important role in integrating successive ideas or propositions in textual materials, and the task is more predictive of reading comprehension than the traditional digit span tasks (Daneman, 1991; Daneman & Carpenter, 1980; Torgesen, Kistner, & Morgan, 1987). The mean percentages (and standard deviations) of correct answers on this task for the grades 4, 5, and 6 students were: 64.39 (16.05), 76.00 (13.84) and 74.92 (15.80). These differences were significant (F(2, 189) = 11.43, p < .001). Pairwise multiple comparisons showed that the differences lay between grades 4 and 5 and between grades 4 and 6.
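The structure of a single span trial can be illustrated as follows. This is a sketch under stated assumptions only: the sentences are invented, and the scoring rule (the proportion of sentence-final words reproduced, in any order) is an assumption, since the exact scoring procedure is not specified here.

```python
def final_word_recall(sentences, recalled_words):
    """Proportion of sentence-final words that the subject reproduced.

    Assumed rule: order of recall is ignored and matching is case-insensitive.
    """
    targets = [s.rstrip(".!?").split()[-1].lower() for s in sentences]
    recalled = {w.lower() for w in recalled_words}
    hits = sum(1 for t in targets if t in recalled)
    return hits / len(targets)

# A hypothetical triplet set; these sentences are not from the actual test.
triplet = [
    "The dog chased the ball.",
    "We ate lunch at noon.",
    "Rain fell on the roof.",
]
score = final_word_recall(triplet, ["ball", "roof"])
print(round(score, 2))
```

The answer to the comprehension question for each set would be recorded separately; how the two components were combined into the percentages reported above is not stated in the text.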
To summarize the students' performance on the above tasks as regards possible effects on their reading comprehension: there was no overall difference among the three grades in vocabulary and word knowledge. There were significant differences in metacognition (as measured by the Paris Index of Reading Awareness) and in working memory span. Pairwise comparisons further showed that these differences were accentuated between grades 4 and 5 and between grades 4 and 6.
Procedure
For the main experimentation, the 192 grades 4, 5 and 6 students were seen individually by the present author and three trained assistants every school day over a period of four months in the second half of the school year. In two sessions, several weeks apart, the children read the expository language materials (generated in a similar manner as in Study 2) on-line, while simultaneously listening to DECtalk speech. The on-line reading and simultaneous DECtalk auding was carried out under four experimental conditions: (1) On-line reading and DECtalk auding of unsimplified passages with no explanation of difficult words (Condition OD); (2) On-line reading and DECtalk auding of unsimplified passages plus explanation of difficult words in both modes (Condition ODE); (3) On-line reading and DECtalk auding of unsimplified passages plus explanation of difficult words in both modes plus short discussion based on pre-assigned questions (metacognitive activities) pertaining to each passage prior to reading and auding (Condition ODEM); and (4) On-line reading and DECtalk auding of simplified passages with no explanation of words (Condition S). The 64, 68 and 60 students in grades 4, 5 and 6 were randomly assigned to each of the above experimental conditions as follows: For grade 4: OD (13), ODE (16), ODEM (16) and S (19); for grade 5: OD (16), ODE (17), ODEM (18) and S (17); and for grade 6: OD (15), ODE (15), ODEM (15) and S (15).
Prior to the experiment, the students were told to read and aud the language materials accurately and rapidly so as to understand them. They were then asked to: (a) answer verbally inferential questions, and (b) summarize verbally each passage in their own words. Each inference question was asked on-line and via DECtalk with the appropriate passage remaining on-screen in full view of the subject. The verbal answers were typed on-line by the experimenter for storage and later analysis.
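The allocation of students to conditions described above can be sketched as a shuffle followed by dealing into cells of fixed size. The cell sizes are those reported in the text; the shuffling procedure, the seed and the function name `assign_conditions` are assumptions, since the actual randomization method is not described.

```python
import random

# Condition sizes per grade as reported in the text.
CELL_SIZES = {
    4: {"OD": 13, "ODE": 16, "ODEM": 16, "S": 19},
    5: {"OD": 16, "ODE": 17, "ODEM": 18, "S": 17},
    6: {"OD": 15, "ODE": 15, "ODEM": 15, "S": 15},
}

def assign_conditions(student_ids, sizes, seed=0):
    """Shuffle the students and deal them into conditions of fixed size."""
    student_ids = list(student_ids)
    if len(student_ids) != sum(sizes.values()):
        raise ValueError("cell sizes must add up to the number of students")
    # A seeded generator keeps the (hypothetical) allocation reproducible.
    random.Random(seed).shuffle(student_ids)
    assignment, start = {}, 0
    for condition, size in sizes.items():
        assignment[condition] = student_ids[start:start + size]
        start += size
    return assignment

# Check that the reported cell sizes add up to the grade totals.
grade_totals = {grade: sum(sizes.values()) for grade, sizes in CELL_SIZES.items()}
print(grade_totals)

# Hypothetical student identifiers 0..63 for grade 4.
grade4 = assign_conditions(range(64), CELL_SIZES[4])
print({condition: len(ids) for condition, ids in grade4.items()})
```

The check on `grade_totals` confirms that the reported cell sizes are consistent with the 64, 68 and 60 students in the three grades.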
For the summarization, the subjects could further review the passages on-line before giving their summaries verbally. At that point, the screen would scroll over, and the summaries typed by the experimenter onto the hard disk would be displayed on-line for further review by the subjects. The hypotheses to be tested were that, among students of different ages, comprehension would show main effects of grade, of passage comprehensibility, and of experimental condition, with ODEM > ODE > OD/S. The expectation was that, other things being equal, the incorporation of explanation of difficult words combined with prior discussion should enhance reading comprehension, as assessed by inferential questions and summarization. The ODE condition in turn should facilitate reading comprehension more than reading on-line and DECtalk auding (OD) without
explanation of difficult words; finally, the OD condition might be similar to the simplified reading and auding condition (S).
Results
As in Study 2, the answers to inferential questions and the summaries were scored blind by two judges according to the criteria of richness of ideational or propositional units in relation to the discourse materials. Verbosity, repetition and intrusion of extraneous ideas were penalized; while amplification and analysis of rhetorical issues were seen as "knowledge transforming" (Bereiter & Scardamalia, 1987) and given bonus points. The interrater agreement was .93. There were two sets of comprehension measures, scored according to the above criteria for each passage: graded answers to inferential questions (Q scores) with a maximum score of 20, and summary protocols (relative to propositional units; S scores) with a maximum score of 40. The mean scores and standard deviations for the 192 grades 4, 5 and 6 below average and above average readers, for reading on-line and auding from DECtalk the four prose passages under the four experimental conditions, are shown in Figures 2 and 3 for the two response modes.
A 3 (grade) by 2 (reading level) by 4 (experimental condition) by 2 (response mode: answers (Q)/summaries (S), with these scores converted to percentages correct) ANCOVA with the last factor repeated was carried out. The covariates were general ability (BAS), metacognition (IRA), and working memory span (WKM). There were highly significant ANCOVA differences for grade (F(2, 165) = 7.31, p = .001), for reading level (F(1, 165) = 11.75, p = .001), and for response mode in favor of answers to inferencing questions (F(1, 168) = 599.21, p < .001), but not for the different experimental conditions (F(3, 165) = .14, p = .93). Similar patterns were found when the analyses were carried out separately for the inference (Q) measures and for the summarization (S) measures.
Discussion
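As a check on the ANCOVA reported in the Results, the error degrees of freedom can be derived from the design itself. The sketch below assumes a full factorial between-subjects model (every grade by reading level by condition cell estimated) in which each covariate costs one between-subjects degree of freedom, and a two-level repeated factor whose error term is not adjusted for the covariates. These are assumptions about the analysis, not details given in the text.

```python
def ancova_error_df(n_subjects, between_levels, n_covariates):
    """Return (between-subjects error df, within-subjects error df).

    Assumes a full factorial between-subjects design and a single
    two-level repeated factor.
    """
    cells = 1
    for levels in between_levels:
        cells *= levels
    between_error = n_subjects - cells - n_covariates
    within_error = n_subjects - cells
    return between_error, within_error

# 192 students; grade (3) x reading level (2) x condition (4); 3 covariates.
between_df, within_df = ancova_error_df(192, [3, 2, 4], 3)
print(between_df, within_df)
```

Under these assumptions the design yields 165 error degrees of freedom for the between-subjects effects and 168 for the response mode effect, which agrees with the values reported for the F ratios.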
The results of the main analyses outlined above confirmed the expected difference between the performance of younger and older students; this difference was shown mainly between grades 4 and 5 and grades 4 and 6, and not so much between grades 5 and 6. It is likely that the ten- to twelve-year age period signals a change in the way students read textual materials, at least on the computer, and answer inferential questions. As to the differences in the passages read, these could reflect a number of factors acting on, and interacting with, one another. Among these are the subtle linguistic and conceptual differences presented by the different prose passages, even though they were judged to be at similar levels of comprehensibility. Related to these differences is the nature of the comprehension tasks, namely answering inferential questions and summarizing. From the protocols and from discussions with both the students and the teachers, it became clear that paraphrasing and summarization did not receive much attention in the course of instruction as compared with composition. Inspection of the scores and protocols of the students confirms the lack of discrimination of the summarization endeavour both quantitatively, as shown by the small standard deviations in relation to the means, and qualitatively, as demonstrated by the "knowledge telling" strategy (Bereiter & Scardamalia, 1987) used by most of the
[Figure 2 - Study 3 means and standard deviations of total scores (maximum 20) of open-ended answers to inferencing questions for Grade 4 (n = 64), Grade 5 (n = 68), and Grade 6 (n = 60) readers by reading level (dark bars for below average & hatched bars for above average), and by 4 on-line reading and DECtalk auding experimental conditions. Bar chart not reproduced; panels: Grade 4, Grade 5, Grade 6; vertical axis: aggregate scores.]
[Figure 3 - Study 3 means and standard deviations of total scores (maximum 40) of summaries of prose passages for Grade 4 (n = 64), Grade 5 (n = 68), and Grade 6 (n = 60) readers by reading level (dark bars for below average & hatched bars for above average), and by 4 on-line reading and DECtalk auding experimental conditions. Bar chart not reproduced; panels: Grade 4, Grade 5, Grade 6; vertical axis: aggregate scores.]
students in simply providing truncated versions of the passages. Only a small number of students attempted an elaborate representation of the arguments or used the "knowledge transforming" strategy (Bereiter and Scardamalia, 1987). Future work on computer-mediated reading will need to take into account more of the linguistic variables, the characteristics of readers and their processing strategies. What Study 3 with its relatively large sample size has also shown is that computer-mediated reading, with or without DECtalk, may not be superior to off-line reading. While the bisensory presentation of language materials could be beneficial to some students, it could also consume more real time and could engender side effects. Similar observations, from a different perspective, of computer-mediated reading with students with learning disabilities have also been made by Swanson and Trahan (1992). Further, Study 3 did not find benefits from metacognitive activities in the form of short discussions prior to reading and DECtalk auding. Whether this was due to the nature of the discussion (student interacting with the computer), or the added time required (and perhaps other factors as well) was not clear. Also, a higher performance on the Index of Reading Awareness did not appear to contribute to reading and auding comprehension. These results seem to be in line with those of Swanson and Trahan (1992); however, they are at some variance with those obtained by Salomon, Globerson and Guterman (1989) on the "metacognitive-like guidance" provided by their computerized "Reading Partner" for grade 7 readers. It is likely that Salomon et al.'s emphasis on intellectual partnership with the computer through modelling, activation of specific reading principles, and repeated presentation of externalized, metacognition-like questions carried these activities to a deeper level than the ones attempted in Study 4 here.
Moreover, while Swanson and Trahan stress the importance of working memory span, the evidence was inadequate here, perhaps because of the relatively short passages read. No clear-cut advantages for the different experimental conditions from Study 3 were found for the 192 students, working individually on the DECtalk computer system under rather rigorous laboratory conditions; this points to the complexity of computerized reading comprehension training. Given the need for DECtalk computer programming, the "unnaturalness" of reading on-line, and the complexity of reading and listening comprehension, the present author would agree with the conclusion of Swanson and Trahan (1992) that the pros and cons of computer-mediated reading would have to be evaluated seriously. This need for careful appraisal does not imply that computer-mediated reading could not be helpful. What researchers should do is to specify the conditions under which the approach works well, singling out the kinds of students, reading materials and other variables that are most amenable to this technology (see Reinking & Bridwell-Bowles, 1991). It could well be that computer-mediated reading with DECtalk speech support works best in promoting phonological and morphological knowledge of words through segmental analysis of different word parts and sublexical units such as onsets and rimes (see the research reports in Leong, 1992b; Wise, 1992). This segmental analysis-by-synthesis could also be combined with automaticity training (Jones, Torgesen, & Sexton, 1987). The much larger and richer component of reading comprehension incorporates word knowledge, parsing of segments of discourse, merging old and new information in working memory, and so on; the integration of all
these aspects may make the processes involved in this component not readily amenable to clear-cut quantitative analyses.
GENERAL DISCUSSION
Promises and Issues of DECtalk Text-to-Speech Computer System
Promises
The warning above notwithstanding, the DECtalk text-to-speech computer system is a sophisticated and useful device for auding, or listening to text, with the purpose of gaining knowledge. The system has been used with advantage by the Colorado group of Richard Olson and Barbara Wise in their pioneering long-term research and remediation studies to further word recognition (phonological coding) and spelling; by Rod Barron in Guelph, Ontario, in promoting "proto-literacy", or print-sound relationship, in young children; by Åke Olofsson in Umeå, Sweden, using the variant multilingual Infovox system to help students in decoding and morphologically analyzing text; in addition, there are my own modest efforts (see Leong, 1992b). Furthermore, DECtalk synthesized speech can now be integrated with a scanner, e.g. the Intelligent Character Recognition (ICR) software, in recent advances such as BookWise and the Reading AdvantEdge by Xerox Imaging Systems. These integrated systems can electronically scan printed pages, "recognize" characters, and convert them into synthetic speech for both reading and auding. Students can request the pronunciations and meanings of words, reread passages and generally get help in reading and spelling. A recent report on the BookWise system, incorporating DECtalk as an adjunct to regular instruction, indicates "positive remediation benefits" for a small number of middle-year students in a school for dyslexics; the tests were predicated on the Slingerland (1981) multi-sensory remediation approach (Elkind, Cohen, & Murray, 1993). Even so, since the "gains" in reading comprehension varied, they would need to be verified because of possible confounding effects from regular instruction. (For instance, unexpected gains in reading speed and increased attention span such as reported by Elkind et al. might have contributed to the overall benefit.)
Such serendipitous results are further reminders of the role of the motivational and transactional factors in computer-mediated instruction that we discussed earlier (see also Lepper & Chabay, 1988; Margalit, 1990).
Issues
Several conceptual and methodological issues in the generation of text materials need continued attention; these include rapid serial visual presentation (RSVP) (Young, 1984), the moving window (TEXTWINDOW) (Jarvella, Lundberg, & Bromley, 1989), as well as other techniques. (Some of these salient issues are discussed by Leong, 1992c.) As to moving from text to speech, there are exciting developments in hypertexts, or nonlinear texts with associated networks, used to promote literacy development in the broad sense and to organize ill-structured ideas in a coherent framework for knowledge exploration (see Barrett, 1988; Conklin, 1987; Swartz & Russell, 1989, for details). An extension of hypertexts is the concept and the integrative system of hypermedia, in which texts, graphics, speech, and interactive computer programs can be interfaced in a multi-dimensional space to provide multiple
representations of knowledge. The hypermedia multiple exploration of knowledge forms the basis for the cognitive flexibility theory of Spiro and Jehng (1990) in literacy comprehension; it also provides the scaffolding that helps inexpert learners move toward expert learning (see Lehrer, 1992).
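Of the text presentation techniques named at the start of this section, rapid serial visual presentation is the simplest to illustrate: words are shown one at a time, at a fixed rate, in the same screen position. The sketch below computes only the presentation schedule. The rate of 300 words per minute and the function name `rsvp_schedule` are illustrative assumptions and are not parameters taken from Young (1984).

```python
def rsvp_schedule(text, words_per_minute=300):
    """Return (onset in ms, word) pairs for one-word-at-a-time presentation."""
    interval_ms = 60000 / words_per_minute
    return [(round(i * interval_ms), word) for i, word in enumerate(text.split())]

# A sentence from the simplified City versus Country Living passage.
schedule = rsvp_schedule("The city makes me feel strong.")
for onset_ms, word in schedule:
    print(onset_ms, word)
```

A moving-window display would differ mainly in leaving the full line of text on screen and revealing a limited span of it at each step, with the pace under the reader's control.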
Advanced Knowledge Acquisition
The general concept of using computer technology as a scaffolding for inexpert learners can be traced back to Vygotsky's notions (1934/1986, 1978) of the "zone of proximal development" and of the social reconstruction of knowledge. Both these notions were the basis of the computerized Reading Partner project of Salomon et al. (1989) discussed earlier; the latter further emphasized computers as tools "to think with" (see Lehrer, 1992; Webb & Shavelson, 1985, for more details). More provocative proposals for multiple modes of learning, and for the social reconstruction of knowledge come from the MIT Media Laboratory on Epistemology and Learning (Harel & Papert, 1991). If the MIT group's proposal of constructionism for communal or social construction of knowledge is seen as too radical and less applicable to education, the sophisticated knowledge media system known as CSILE (or Computer-Supported Intentional Learning Environments) developed by the Scardamalia and Bereiter team in Toronto is directly applicable to classroom instruction (Scardamalia, Bereiter, Brett, Burtis, Calhoun, & Smith-Lea, 1992; Scardamalia, Bereiter, McLean, Swallow, & Woodruff, 1989). CSILE supports learning by building a collection of knowledge bases as a database in the form of texts and graphics and stores the thoughts, ideas, problems and goals constructed by students to be shared by all. The emphasis is on the active production and use of knowledge and on activities by students, with the media as the intelligent tutoring system, capable of representing knowledge in different ways. Furthermore, students actively participate in learning, share their contributions and in doing so, move to higher levels of learning and control of learning. In this ambitious educational project, computer technology is used to distribute knowledge and to maximize the contributions from both individual learners as partners and teachers as experts.
The supportive CSILE computer environment goes considerably beyond the present author's modest 'micro'-project in that it recognizes the situated nature of learning and the importance of social interaction and the social construction of knowledge (Collins & Brown, 1988). Furthermore, the use of supportive educational technology in distributing and constructing knowledge maximizes active learning on the part of individual learners as well as the use of teacher expertise. Further support for the notion of a community of learners is found in the work of Campione, Brown and Jay (1992), who use computer technology in a cooperative atmosphere to encourage students to develop skills of plausible reasoning in an integrated curriculum.
REFERENCES
Allen, J., M. S. Hunnicutt, and D. Klatt, 1987. From text to speech: The MITalk system. New York: Cambridge University Press.
Anderson, R. C., and A. Davison, 1988. Conceptual and empirical bases of readability formulas. In: A. Davison and G. M. Green, eds., Linguistic complexity and text comprehension: Readability issues reconsidered, 23-53. Hillsdale, NJ: Lawrence Erlbaum.
Backman, L., 1985. Compensation and recoding: A framework for aging and memory research. Scandinavian Journal of Psychology 26: 193-207.
Barrett, E., ed., 1988. Text, conText, and HyperText: Writing with and for the Computer. Cambridge, MA: MIT Press.
Beck, I. L., C. A. Perfetti, and M. G. McKeown, 1982. Effects of long-term vocabulary instruction on lexical access and reading comprehension. Journal of Educational Psychology 74: 506-521.
Bereiter, C., and M. Scardamalia, 1987. The psychology of written composition. Hillsdale, NJ: Lawrence Erlbaum.
Bierwisch, M., 1983. How on-line is language processing. In: G. B. Flores d'Arcais and R. J. Jarvella, eds., The process of language understanding, 113-168. New York: John Wiley.
Campione, J. C., A. Brown, and M. Jay, 1992. Computers in a community of learners. In: E. De Corte, M. C. Linn, H. Mandl and L. Verschaffel, eds., Computer-based learning environments and problem solving, 163-188. New York: Springer-Verlag.
Cherry, L. L., 1982. Writing tools. IEEE Transactions on Communication COM-30: 100-105.
Cognition and Technology Group at Vanderbilt, 1992. The Jasper Series as an example of anchored instruction: Theory, program description, and assessment data. Educational Psychologist 27: 291-315.
Collins, A., and J. S. Brown, 1988. The computer as a tool for learning through reflection. In: H. Mandl and A. Lesgold, eds., Learning issues for intelligent tutoring systems, 1-18. New York: Springer-Verlag.
Conklin, J., 1987. Hypertext: An introduction and survey. IEEE Computer 20: 17-41.
Daneman, M., 1991. Individual differences in reading skills. In: R. Barr, M. L. Kamil, P. B. Mosenthal, and P. D. Pearson, eds., Handbook of reading research 2, 512-538. New York: Longman.
Daneman, M., and P. A. Carpenter, 1980. Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior 19: 450-466.
Davison, A., and G. M. Green, eds., 1988. Linguistic complexity and text comprehension: Readability issues reconsidered. Hillsdale, NJ: Lawrence Erlbaum.
Digital Equipment Corporation, 1984. DECtalk DTC01 Owner's Manual (2nd ed.). Maynard, MA: Author.
Durrell, D. D., and J. H. Catterson, 1980. Durrell analysis of reading difficulties (3rd ed.). New York: Harcourt Brace Jovanovich.
Elkind, J., K. Cohen, and C. Murray, 1993. Using computer-based readers to improve reading comprehension of students with dyslexia. Annals of Dyslexia 43: 238-259.
Elliott, C. D., 1983. The British Ability Scales. Windsor, UK: NFER-Nelson.
Gildea, P. M., G. A. Miller, and C. L. Wurtenberg, 1990. Contextual enrichment by videodisc. In: D. Nix and R. Spiro, eds., Cognition, education, and multimedia: Exploring ideas in high technology, 1-29. Hillsdale, NJ: Lawrence Erlbaum.
Glaser, R., 1977. Adaptive education: Individual diversity and learning. New York: Holt, Rinehart and Winston.
Gough, P. B., and W. E. Tunmer, 1986. Decoding, reading, and reading disability. Remedial and Special Education 7: 6-10.
Greene, B. G., J. S. Logan, and D. B. Pisoni, 1986. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems. Behavior Research Methods, Instruments and Computers 18: 100-107.
Greenspan, S. L., H. C. Nusbaum, and D. B. Pisoni, 1988. Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Learning, Memory, and Cognition 14: 421-433.
Halliday, M. A. K., and R. Hasan, 1976. Cohesion in English. London: Longman.
Harel, I., and S. Papert, eds., 1991. Constructionism. Norwood, NJ: Ablex.
Hativa, N., and A. Lesgold, 1991. The computer as a tutor - can it adapt to the individual learner? Instructional Science 20: 49-78.
Hoover, W. A., and P. B. Gough, 1990. The simple view of reading. Reading and Writing: An Interdisciplinary Journal 2: 127-160.
Jacobs, J. E., and S. G. Paris, 1987. Children's metacognition about reading: Issues in definition, measurement, and instruction. Educational Psychologist 22: 255-278.
Jarvella, R. J., 1979. Immediate memory and discourse processing. In: G. H. Bower, ed., The psychology of learning and motivation: Advances in research and theory 13: 379-421. New York: Academic Press.
Jarvella, R. J., I. Lundberg, and H. J. Bromley, 1989. How immediate is language understanding? Investigating reading in real time. Reading and Writing: An Interdisciplinary Journal 1: 103-122.
Jastak, S. and G. S. Wilkinson, 1984. The Wide Range Achievement Test - Revised: Administration manual. Wilmington, DE: Jastak Associates.
Jenkins, J. R. and R. Dixon, 1983. Vocabulary learning. Contemporary Educational Psychology 8: 237-260.
Jones, K. M., J. K. Torgesen and M. A. Sexton, 1987. Using computer guided practice to increase decoding fluency in learning disabled children: A study using the Hint and Hunt I program. Journal of Learning Disabilities 20: 122-128.
King, E. M., ed., 1982. Canadian tests of basic skills: Multilevel edition 9-12/Forms 5 and 6. Toronto: Nelson.
Kintsch, W., 1974. The representation of meaning in memory. Hillsdale, NJ: Lawrence Erlbaum.
Kintsch, W. and J. M. Keenan, 1973. Reading rate and retention as a function of the number of propositions in the base structure of sentences. Cognitive Psychology 5: 257-274.
Klatt, D. H., 1987. Review of text-to-speech conversion for English. The Journal of the Acoustical Society of America 82: 738-793.
Lehrer, R., ed., 1992. New directions in technology-mediated learning [Special feature]. Educational Psychologist 27: 287-404.
Leong, C. K., 1992a. Cognitive componential modelling of reading in ten- to twelve-year-old readers. Reading and Writing: An Interdisciplinary Journal 4: 307-326.
Leong, C. K., ed., 1992b. Reading and spelling with text-to-speech computer systems [Special issue]. Reading and Writing: An Interdisciplinary Journal 4/2: 95-229.
Leong, C. K., 1992c. Introduction: Text-to-speech, text, and hypertext: Reading and spelling with the computer. Reading and Writing: An Interdisciplinary Journal 4: 95-105.
Leong, C. K., 1992d. Enhancing reading comprehension with text-to-speech (DECtalk) computer system. Reading and Writing: An Interdisciplinary Journal 4: 205-217.
Leong, C. K., 1993. Towards developing a framework for diagnosing reading disorders. In: R. M. Joshi and C. K. Leong, eds., Reading disabilities: Diagnosis and component processes, 85-131. Dordrecht: Kluwer Academic Publishers.
Leong, C. K. and M. Mackay, 1993, May. Listening to synthesized speech and reading on-line. Paper presented at the Annual Conference of the Canadian Psychological Association, Montreal, Canada.
Lepper, M. R. and R. W. Chabay, 1988. Socializing the intelligent tutor: Bringing empathy to computer tutors. In: H. Mandl and A. Lesgold, eds., Learning issues for intelligent tutoring systems, 242-257. New York: Springer-Verlag.
Lepper, M. R. and J.-L. Gurtner, 1989. Children and computers: Approaching the twenty-first century. American Psychologist 44: 170-178.
Lesgold, A. M., 1983. A rationale for computer-based reading instruction. In: A. C. Wilkinson, ed., Classroom computers and cognitive science, 167-181. New York: Academic Press.
Lock, S. and C. K. Leong, 1989. Program library for DECtalk text-to-speech system. Behavior Research Methods, Instruments, and Computers 21: 394-400.
Lundberg, I. and C. K. Leong, 1986. Compensation in reading disabilities. In: E. Hjelmquist and L.-G. Nilsson, eds., Communication and handicaps: Aspects of psychological compensation and technical aids, 171-190. Amsterdam: North-Holland.
Manous, L. M., D. B. Pisoni, M. J. Dedina and H. C. Nusbaum, 1985. Comprehension of natural and synthetic speech using a sentence verification task. Research on Speech Perception Progress Report No. 11. Bloomington, IN: University of Indiana Speech Research Laboratory.
Margalit, M., 1990. Effective technology integration for disabled children: The family perspective. New York: Springer-Verlag.
McKeown, M. G., 1985. The acquisition of word meaning from context by children of high and low ability. Reading Research Quarterly 20: 482-496.
McKeown, M. G. and M. E. Curtis, eds., 1987. The nature of vocabulary acquisition. Hillsdale, NJ: Lawrence Erlbaum.
Miller, G. A. and P. M. Gildea, 1987. How children learn words. Scientific American 257/3: 94-99.
Nix, D. and R. Spiro, eds., 1990. Cognition, education, and multimedia: Exploring ideas in high technology. Hillsdale, NJ: Lawrence Erlbaum.
Olson, R. K., G. Foltz and B. Wise, 1986. Reading instruction and remediation with the aid of computer speech. Behavior Research Methods, Instruments, and Computers 18: 93-99.
Omanson, R. C., I. L. Beck, M. G. McKeown and C. A. Perfetti, 1984. Comprehension of texts with unfamiliar versus recently taught words: Assessment of alternative models. Journal of Educational Psychology 76: 1253-1268.
Perfetti, C. A., 1983. Reading, vocabulary, and writing: Implications for computer-based instruction. In: A. C. Wilkinson, ed., Classroom computers and cognitive science, 145-163. New York: Academic Press.
Perfetti, C. A., 1985. Reading ability. New York: Oxford University Press.
Che Kan Leong
Perfetti, C. A., 1992. The representation problem in reading acquisition. In: P. B. Gough, L. C. Ehri and R. Treiman, eds., Reading acquisition, 145-174. HiUsdale, NJ: Lawrence Erlbaum. Ralston, J. V., D. B. Pisoni, S. E. Lively, B. G. Greene and J. W. MuUennix, 1991. Comprehension of synthetic speech produced by rule: Word monitoring and sentence-by-sentence listening times. Human Factors 33:471-491. Reinking, D., 1988. Computer-mediated text and comprehension differences: The role of reading time, reading preference, and estimation of learning. Reading Research Quarterly 13: 485-499. Reinking, D. and L. Bridwell-Bowles, 1991. Computers in reading and writing. In: R. Barr, M. L. Kamil, P. Mosenthal and P.D. Pearson, eds., Handbook of reading research 2, 310-340. New York: Longman. Reinking, D. and S. S. Rickman, 1990. The effects of computer-mediated texts on the vocabulary learning and comprehension of intermediate-grade readers. Journal of Reading Behavior 22:395-411. Reinking, D. and R. Schreiner, 1985. The effects of computer-mediated text on measures of reading comprehension and reading behavior. Reading Research Quarterly 20: 536-552. Salomon, G., T. Globerson and E. Guterman, 1989. The computer as a zone of proximal development: Internalizing reading-related metacognitions from a reading partner. Journal of Educational Psychology 81: 620-627. Scardamalia, M., C. Bereiter, C. Brett, P. J. Burtis, T. Calhoun and N. Smith-Lea, 1992. Educational applications of a networked communal database. Interactive Learning Environments 2:45-71. Scardamalia, M., C. Bereiter, R. McLean, J. Swallow and E. Woodruff, 1989. Computer-supported intentional learning environments. Journal of Educational Computing Research 5:51-68. Schatz, E. K. and R. S. Baldwin, 1986. Context clues are unreliable predictors of word meanings. Reading Research Quarterly 21: 439-453. Slingerland, B., 1981. A multi-sensory Approach to language arts for specific language disability. 
Cambridge, MA: Educators Publishing Service. Spiro, R. J. and J.-C. Jehng, 1990. Cognitive flexibility and hypertext: Theory and technology for the nonlinear and multidimensional traversal of complex subject matter. In: D. Nix and R. Spiro, eds., Cognition, education and multimedia: Exploring ideas in high technology, 163-205. Hillsdale, NJ: Lawrence Erlbaum. Stahl, S. A., 1991. Beyond the instrumental hypothesis: Some relationships between word meanings and comprehension. In: P. J. Schwanenflugel, ed., The psychology of word meanings, 157-186. Hillsdale, NJ: Lawrence Erlbaum. Stanovich, K. E., 1980. Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Ouarterly 16: 32-71. Stanovich, K. E., 1986. Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Ouarterly 21: 360-407. Stanovich, K. E. and R. F. West, 1981. The effect of sentence context on ongoing word recognition: Tests of two-process theory. Journal of Experimental Psychology: Human Perception and Performance 7: 658-672.
Using Microcomputer Technology
Stanovich, K. E., R. F. West, and D. J. Feeman, 1981. A longitudinal study of sentence context effects in second-grade children: Tests of an interactive-compensatory model. Journal of Experimental Child Psychology 32:185-199. Sternberg, R. J. and J. S. Powell, 1983. Comprehending verbal comprehension. American Psychologist 38: 878-893. Swanson, H. L., 1992. Generality and modifiability of working memory among skilled and less skilled readers. Journal of Educational Psychology 84: 473-488. Swanson, H. L. and M. F. Trahan, 1992. Learning disabled readers' comprehension of computer mediated text: The influence of working memory, metacognition and attribution. Learning Disabilities Research and Practice 7: 74-86. Swartz, M. L. and D. M. Russell, 1989. FL-IDE: hypertext for structuring a conceptual design for computer-assisted language learning. Instructional Science 18: 5-26. Torgesen, J. K., J. A. Kistner and S. Morgan, 1987. Component processes in working memory. In: J. Borkowski and J. D. Day, eds., Memory and cognition in special children, 49-86. Norwood, NJ: Ablex. Turkle, S., 1984. The second self: Computers and the human spirit. New York: Simon and Schuster. Vygotsky, L. S., 1978. Mind in society: The development of higher psychological processes (M. Cole, V. John-Steiner, S. Scribner, and E. Souberman, eds.). Cambridge, MA: Harvard University Press. Vygotsky, L. S., 1986. Thought and language ( rev. ed.). (A. Kozulin, ed.). Cambridge, MA: MIT Press. (Original work published 1934). Webb, N. M. and R. J. Shavelson, eds., 1985. Computers and education [Special Issue]. Educational Psychologist 20:163-241. West, R. F., K. E. Stanovich, D. J. Feeman and A. E. Cunningham, 1983. The effect of sentence context on word recognition in second- and sixth-grade children: Reading Research Ouarterly 19: 6-15. Wise, B. W., 1992. Whole words and decoding for short-term learning: Comparisons on a "talking-computer" system. Journal of Experimental Child Psychology 54:147167. 
Wise, B., R. Olson, M. Anstett, L. Andrews, M. Terjak, V. Schneider, J. Kostuch and L. Kriho, 1989. Implementing a long-term computerized remedial reading program with synthetic speech feedback: Hardware, sottware, and real-world issues. Behavior Research Methods, Instruments. and Computers 21: 173-180. Witten, I. H., 1982. Principles of Computer speech. New York: Academic Press. Young, S.R., 1984. RSVP: A task, reading aid and research tool. Behavior Research Methods, Instruments and Computers 16: 121-124a.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 17
ISSUES IN THE DEVELOPMENT OF HUMAN-COMPUTER MIXED-INITIATIVE PLANNING
Mark H. Burstein BBN Systems and Technologies, a division of Bolt Beranek and Newman Inc., USA [email protected]
Drew V. McDermott Department of Computer Science Yale University, USA [email protected]
"A mixed-initiative system is one in which both humans and machines can make contributions to a problem solution, often without being asked explicitly." Jaime Carbonell, Sr.
ABSTRACT
Mixed-initiative planning systems are systems in which humans and machines collaborate in the development and management of plans. The "initiative" in such systems is shared in that each can contribute to the formulation, development, management, refinement, analysis and repair of the plans developed "without being asked explicitly". Intuitively, the goal is to develop a style of interaction where both men and computers can further the state of an ongoing planning activity through contributions that include the many activities that surround the actual construction of plans. In this paper, we discuss some of the research areas that are likely to be important in transitioning prototype AI planning systems to a new role as collaborator in a mixed human/machine collective planning process. This paper is, in large part, the result of a series of discussions that took place in late 1993 and early 1994, among a group of AI researchers working on planning, supported by the ARPA/Rome Laboratory sponsored initiative in military planning and scheduling.
INTRODUCTION
The overall objective of research on mixed-initiative planning (MIP) is to explore productive syntheses of the complementary strengths of both humans and machines to
M.H. Burstein and D.V. McDermott
build effective plans more quickly, and with greater reliability. Human users need better, more intelligent, more active problem solving support than the current generation of plan authoring tools can provide, and AI planning systems need human support in such areas as problem definition, information interpretation, and spatial/perceptive reasoning if they are to be useful in real-world applications. Through a series of discussions of both the electronic and face-to-face variety, a team of researchers in AI-based planning came to some substantial agreement on a set of issues that will need to be addressed in the development of mixed-initiative planning systems¹. This chapter documents some of the conclusions reached in those discussions. We were motivated to consider these questions by our work in the domain of military planning and scheduling. However, we believe that much of the discussion applies equally well elsewhere: any system of people and software engaged in planning activities will be faced with much the same set of issues. We will define military planning for the purposes of this discussion as the organization of resources to carry out a military or humanitarian objective. For example, in planning for the evacuation of civilians from a region that is in turmoil, or for the movement of people and materials to and from an area so as to provide relief after a natural disaster, planning activities include the identification of appropriate resources for carrying out the objectives, including the transportation of those resources to the area in which they will be used in advance of their planned time of use. A major part of military planning is the identification and planning for the movement of the resources to and from the region in question. Our larger interest in mixed-initiative planning systems grows out of some observations of the strengths and weaknesses of both human and automated planning systems as they have been used (or considered for use) in the past.
Humans are still better at formulating the planning tasks, collecting and circumscribing the relevant information, supplying estimates for uncertain factors, and various forms of visual or spatial reasoning that can be critical for many planning tasks. Machines are better at systematic searches of the spaces of possible plans for well-defined tasks, and in solving problems governed by large numbers of interacting constraints. Machines are also better at managing and communicating large amounts of data. In addition to the potential for synergistic improvements in current planning processes by combining the strengths of these different kinds of planners, we must also recognize the currently burgeoning roles of electronic collaboration and electronic data access on tasks of all kinds. As network technology has matured and become widespread, the notion that work on a shared task could be physically distributed has become a reality. Electronic conferencing and workflow tools are starting to reach the marketplace. On-line access to huge amounts of information via wide-area networks must be taken as a given. Multi-agent distributed AI systems of many kinds are being
¹ The face-to-face meetings took place in December, 1993 and January, 1994 at Yale University and BBN. Participants in these discussions included: James Allen (Rochester U.), Marie Bienkowski (SRI), Mark Burstein (BBN), Steve Cross (ARPA), David Day (MITRE), Gary Edwards (ISX), Nort Fowler (Rome Lab.), Matt Ginsberg (U. of Oregon), Jim Hendler (U. of Md.), Leslie Kaelbling (Brown U.), John Lemmer (Rome Lab.), Drew McDermott (Yale U.), Stephen Smith (CMU), Austin Tate (U. of Edinburgh), Craig Wier (ARPA), and David Wilkins (SRI). Other contributors included: Larry Birnbaum (Northwestern U.), Katia Sycara (CMU), and Dan Weld (U. of Washington).
Human-Computer Planning
explored, though much of the work is still in its early stages. We need to develop a clear vision of the role of these technologies in true mixed-initiative systems as well. From a pragmatic standpoint, we took the major research question to be the following: How can human(s) and machine(s) best share information about and control of plan development? That is, how do we get positive synergy from interactions between human planners and automated planning and support software such that:
Each works in areas where they perform best;
"Agents" (using the term loosely to refer to both humans and software systems) are able to use appropriate (often concise or abstract) representations for communication of plans, constraints, assumptions and analyses to communicate with other agents that have different areas of expertise or functionality, and different kinds of communications skills; and
"Agents" have means of acquiring and transferring authority for planning related tasks.
The remainder of this chapter elaborates on these questions and posits some directions to pursue in efforts to find answers to them. We begin by taking apart current notions of AI planning techniques to examine where they will need to change, perhaps radically, in order to fit into the world of collaborative problem solving. We then discuss some ideas about the near-term, focusing on ways that current generation AI-based planning systems might be adapted to support a more mixed-initiative style of interaction.
PLANNING AS A COLLABORATIVE ACTIVITY
As one moves from the current-day perspective historically taken by many AI planning researchers, who took it as their objective to develop "stand-alone" planning systems, to a model where issues of communication and collaboration are more central, a number of assumptions underlying research in this area must be questioned. We began with a model of planning activity in which we assumed that many "agents", of both the human and software variety, are actively cooperating, and from that perspective looked for aspects of current planning system theory and practice that would have to change. The AI conception of planning is largely dominated by the notion of a single-threaded search through a space of possible (partial) plans for satisfactory solutions (Wilkins, 1988). This "classical" view is typically implemented in terms of some form of goal-refinement, back-tracking search algorithm. When the search for a plan is to be coordinated among many "agents", some of which are human, this model must be seriously questioned. Present-day, autonomous planning algorithms assume that plans are to be developed by systematic exploration of alternative plan refinements under programmatic control. The objective of these systems is usually to find a satisfactory plan, rather than an optimal one, although this distinction does not matter greatly here (yet).
Human planners, on the other hand, do not search systematically in this fashion, but rather may jump around in the space of possible plans, perhaps based on preliminary
analyses of what is "hard" about the problem, or more simply on pre-existing knowledge of particular solution models. If the plan is for an important objective, they will typically explore several approaches to some limited depth before choosing a path to pursue to a completely detailed plan. Therefore, in a mixed-initiative approach to search during planning and problem solving we must expect that the human planners involved (hereafter "the users") may wish to dictate where and how much to search, while at other times, automated planning "agents" may be given free rein to search problem spaces under their own control. Some redundancy in these collaborative efforts must be viewed as good, even essential, as the search techniques of these different kinds of agents are likely to be very different, and to the extent their results are comparable, each can serve to support and correct the other.
A second theme of this discussion is the need to support a variety of kinds of dialogue during planning. In an informal study of collaborative planning reported in (Allen, 1994), dialogues were collected of pairs of people working together to solve a planning problem in an artificial environment consisting of trains that could travel between cities, and needs for those conveyances. In each case, one of the subjects played the role of the "system" and the other the role of the "manager". The two could not see each other, and did not know each other. The only shared information they had was an initial map of the TRAINS world. Each interaction between the players was categorized according to its general purpose. Table 1 summarizes the relative frequency of these interactions, by category. The kinds of interactions typically supported by current-day planning systems comprised less than 25% of the total.
While this data is merely suggestive, it strongly implies that effective collaboration in plan development must address as a central issue the question of how to manage and support the variety of kinds of dialog that were seen here as necessary to this kind of collaborative problem solving.
Evaluating & comparing options          25%
Suggesting courses of action            23%
Clarifying and establishing state       13.5%
Discussing problem solving strategy     10%
Summarizing courses of action
Identifying problems and alternatives
Table 1: Frequency of interactions by type
Given the disparate styles of problem solving found in people and machines, and the need for coordination during collaborative planning among these very different kinds of agents, each of the following areas of research related to AI-based planning and collaborative problem solving must be addressed to develop software capable of supporting a mixed-initiative model for planning:
Plan-Space Search Control Management addresses the question of how to coordinate various kinds of agents' exploration of potential solutions to a planning problem.
Representations of and Sharing of Plans by different kinds of agents for different (but related) purposes, and for communications among the collaborating agents.
Plan Revision Management is the problem of coordinating revisions to "the plan", especially if it is being revised during execution.
Planning and Reasoning under Uncertainty is an ongoing research area within the AI community. For planning, it primarily concerns the anticipation of future situations, the enumeration of their possible outcomes, and estimation of the likelihood of those outcomes. It is an area that has not been adequately addressed in most AI planning systems, and is one of the reasons for their lack of acceptance by user communities. It is hoped that this issue will benefit from a more synergistic man/machine approach.
Learning from past planning attempts, from their results when executed, and from one's collaborators: While machine learning techniques have had some limited successes, it seems clear that there is far to go, and that this research area will have an important role to play in mixed-initiative environments generally. Agents are not "born" team-players. They must adapt to roles on a team based on the strengths and weaknesses of their teammates.
Inter-agent Communications and Coordination is not about planning per se, but is important here in recognition of the different ways human and machine team members might need to interact during planning. Issues related to this topic will be raised in each of the other topics as well.
When viewed from the point of view of a collaboration between people and software systems, we expect that planning systems, taken as the union of all these participants, must be supported by tools that enable a variety of kinds of communications (graphical, language-based, audible) to take place. These communications are most certainly not just about the plans themselves, as the experiment cited above suggests. We take each of these areas in turn:
Search Control Management
Control Dialogues to Establish Collaboration Patterns. There needs to be some amount of ongoing "dialogue" between the human and machine planning agents about how search will be organized, divided and conducted. It is most likely that the human planners will need to maintain control of the setting of major objectives, and cede control of some of the more mundane aspects of planning to the machine. This dialogue may not be, indeed is not likely to be, done with true natural language, but by a variety of styles of interaction with a graphical user interface. Assuming for the moment that a human "user" maintains control during these dialogues and is characterizing planning sub-tasks for his or her machine and human associates, then he or she must describe how to do search through a space of possible plans, and express to the other agents how their search is to be bounded. This dominant user must have the means to express search constraints of many kinds, including such things as how to decompose the planning into partitionable subtasks, what assumptions to make in subtask planning about available resources, what assumptions to make about the world in which the plan will execute, and more general
controls such as whether to be optimistic or pessimistic about the utilization of resources, and the "cooperativeness of the world" in which the plan will be executed. One style of cooperation that might be selected by a user would have that user retaining control of and directing search at the higher, more abstract levels of plan development, while ceding to automated planning agents responsibility for pointing out critical areas to address carefully (such as potential resource shortages), or charging them to explore in detail specific issues or plans for known subtasks. Further communications between these various automated agents and the human planner(s) would involve summarizing the results of plan analyses and the presentation of potential options for doing specific subtasks. On the other hand, it seems clear that this is just one of a range of collaboration models, and that different users will wish to vary the form of search control and collaborate more or less closely in plan development, depending on how "cut and dried" the problem to be solved is, and their understanding and faith in the capabilities of their electronic collaborators.
Variable speed and resolution response. In the end, we want our collaborative planning tools to produce detailed plans. But at earlier stages of the process, we want them to assist in generating cruder plans quickly, so that preliminary analyses can be performed. In essence, collaboration at the early stages of plan development may be a time of consideration of and dialog about abstract alternatives, leading to the more precise formulation of objectives. This characterization of the preliminary phase of planning suggests that planning problems need to be viewed as solvable at different levels of resolution (or "abstraction"), which, in effect, means having multiple different representations of what "plans" or "solutions" are.
Decoupling and recombining plans.
The user should have the ability to isolate sets of subgoals that are only loosely coupled to the rest of the overall set of goals, in the sense that plans for those subgoals can be developed in parallel and later be combined. The techniques required to identify such goal sets and combine the plans developed into an overall solution are still open research areas. Context registration. There will need to be means for constantly conveying where in the problem solving the team of humans and software agents is currently working, and who is performing what tasks. We will refer to this maintenance of a shared problem solving context as context registration. When a software agent completes a task, the human-computer interface, itself a kind of coordinating and mediating "agent", must be able to give the user a succinct, coherent picture of the current "state of play" in the planning process, and a summary of the conclusions reached, which can be interrogated for more detail. Users may also need to convey preferences for levels of "communications volume" or communications bandwidth in their dialogues with the other planning agents. Plans are often too large to view in a single picture or short text. Many different perspectives and styles of visualization are useful at different times, and these typically rely on abstractions and approximations that help to convey the gist of the plan or subplans under discussion. Dialogues must be supported concerning the amount and presentation style of information to be conveyed (e.g., to the user) at different stages of planning. These dialogues may tend to occur at landmark points in the search for an effective plan, when the locus of planning activity is about to shift to a different level or perspective. Good techniques for graphical and other forms of summarization will be
critical, and all communications techniques for conveying perspectives on the plans under consideration must readily support requests for elaborations and explanations as needed. Intent recognition. Oftentimes, users will not explicitly convey all of the constraints they know to apply in the context of a planning problem. This is a problem for automated, "autonomous" planning tools. When people collaborate to solve planning problems, it is assumed that each participating planner will understand the context in which the plan will be executed well enough to make new planning assumptions as they elaborate their part of the plan, and seek out any additional information needed to make their plans effective. They will also identify problems with their own assumptions and those of the agent who tasked them as they proceed in order to "shore up", if possible, or reject, if necessary, possible plans that they might have otherwise produced. For example, if they cannot make a workable plan with the resources assumed available they will communicate this back to the agent that tasked them. This shared "world context" extends also to shared knowledge of the many standard components of plans they might use to achieve particular goals. Consider, for example, a request made by one agent that another develop a plan for a conference in another city. The agent tasked could infer from what was stated (city, dates, attendees) not only that subplans would be needed for reserving the meeting space, and alerting the participants, but also that plane reservations would be needed to transport the participants if the city was distant, and that hotel reservations would be needed if the meeting was longer than would permit traveling on the same day. The agent developing this plan might discover, on researching the problem, that the hotels available were not close to the meeting space, and so need to generate plans to reserve cars for local travel as well. 
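One way to picture how a tasked agent might fill in such implicit details is a small set of default rules applied to the stated request. The Python sketch below is purely illustrative: the `Request` fields, the numeric thresholds, and the rule set are assumptions made for this example, and are not taken from any system discussed in this chapter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """What the tasked agent knows about the conference (hypothetical fields)."""
    city: str
    days: int                   # stated length of the meeting
    attendees: int
    distance_km: float          # how far participants must travel
    hotel_to_venue_km: float    # discovered while researching the problem


def elaborate(req: Request) -> List[str]:
    """Infer subplans that the request implies but does not state."""
    subplans = ["reserve meeting space", "notify participants"]
    if req.distance_km > 300:          # assumed threshold for a "distant" city
        subplans.append("book flights")
    if req.days > 1:                   # too long for same-day travel
        subplans.append("book hotel rooms")
        if req.hotel_to_venue_km > 2:  # assumed limit for walking to the venue
            subplans.append("reserve rental cars")
    return subplans


# A two-day meeting in a distant city whose hotels are far from the venue.
print(elaborate(Request("Chicago", days=2, attendees=30,
                        distance_km=1200, hotel_to_venue_km=8)))
```

Anything that such default rules cannot derive, for instance a need for particular equipment at the meeting, would still have to be stated explicitly by the requesting agent.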
The agent who requested the plan might, even though he or she did not ask for a rental car, be explicit in requesting that particular kinds of sound and projection equipment be made available at the meeting, because that was a piece of information not inferable from the general task. Clearly constraints can be left implicit in this kind of communication of an abstract plan for elaboration by another agent. Indeed it is almost necessarily the case that details will be left out, if the communication is to be succinct enough to make it worth delegating the planning task. Mixed-initiative planning systems must be capable of using prior knowledge of the particular domain of planning (and the preferences of the agents specifying the abstract plan) to fill in such details. Task planning agents must be able to make reasonable assumptions about the environment that the plan will be carried out in, so that the specification and communication of a planning subtask is not overburdened with a large volume of "common-sense" details. In addition, the requesting agent must know what details cannot be left implicit if the resulting plan is to be satisfactory. Each must have at least a partial model of what knowledge the other has and what assumptions must be made explicit in their communications during collaboration. It will be important, if such collaborations are to be successful, that automated systems do not unnecessarily impede plan development by requiring too many details specified in advance, or ask too many questions when the information can be inferred. On the other hand, it is equally important that these systems be able to ask refinement questions when important details are omitted. This is one of the many kinds of
dialogue that must be supported. We also discuss some of the problems raised by this issue in a later section of this chapter devoted to initiative. Plan analysis. The computer must provide the user with a set of tools for analyzing fragments of plans, and comparing versions of plans and plan fragments generated under different assumptions. These tools should include plan displayers that highlight different information, statistical packages for analyses of uncertain outcomes, and sensitivity analyzers that check whether actions might take place under conditions leading to higher than normal failure rates. Means must be developed for describing, requesting, and/or attaining automatically the information required for use of these tools.
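As an illustration of the kind of sensitivity analyzer mentioned above, the following Python sketch compares each step's failure rate under the conditions expected at execution time with its nominal rate, and flags steps that exceed the nominal rate by a chosen factor. The step names, the rates, and the `factor` threshold are invented for the example. The estimate of overall plan success also assumes, simplistically, that step failures are independent.

```python
from typing import Dict, List, Tuple

# step -> (nominal failure rate, failure rate under expected conditions).
# All numbers are invented for illustration.
STEPS: Dict[str, Tuple[float, float]] = {
    "load_cargo": (0.01, 0.01),
    "fly_to_region": (0.02, 0.10),   # e.g. bad weather is forecast
    "unload_cargo": (0.01, 0.02),
}


def flag_sensitive(steps: Dict[str, Tuple[float, float]],
                   factor: float = 3.0) -> List[str]:
    """Steps whose expected failure rate is at least `factor` times nominal."""
    return [name for name, (nominal, expected) in steps.items()
            if expected >= factor * nominal]


def success_probability(steps: Dict[str, Tuple[float, float]],
                        expected: bool = True) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    p = 1.0
    for nominal, expected_rate in steps.values():
        p *= 1.0 - (expected_rate if expected else nominal)
    return p


print(flag_sensitive(STEPS))
print(success_probability(STEPS, expected=False), success_probability(STEPS))
```

Comparing the nominal and expected success estimates side by side is one simple way to compare versions of a plan generated under different assumptions.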
Representations of plans and plan-related information sharing
For collaboration over plans to work, we assume that there must be shared representations of those plans, and means of extracting and reformulating those representations into forms convenient to the various collaborators. This does not necessarily mean that there is a single place where the full representation of a plan is stored, but that collaborators can get efficient access to pieces of the plan, as needed. Visualizations of plan representations must be intelligible to human users, and extractions/reformulations of planning constraints and other plan-related information must be possible to provide information for specific automated planning and/or analysis processes. In reality, this is perhaps the biggest barrier to collaboration. Dialogues about the plan under development must be framed in terms of consideration of alternatives and their justifications, almost in the style of an argument (Allen and Ferguson, 1994). Most planning done by people is at best reduced to raw text and graphics, rather than encoded in electronic forms amenable to manipulation by computer systems. Another role of interfaces to planning systems must be to make it as convenient as possible to maintain plans in electronic forms, rather than more exclusively human readable forms.
Shared representations. It is generally assumed that if the planning process is distributed, there must be a representation of "the plan" that is shared among the collaborators. It should support a variety of visualizations, abstractions and translations into more specialized forms for specific purposes.
Abstractions. It will be necessary to represent plans at different levels of detail. Even after a plan has been elaborated, the user must be able to see a "low-resolution" version highlighting particular aspects of the plan.
Visualizations.
If the user is to have a chance of understanding the current state of a plan that is only partially specified, then there must be many ways for users to view and edit any part of the plan, as well as its justifications and ramifications. For example, it should be possible to display the state of affairs expected at any point in a schedule. It should be possible to run a "movie" that shows possible unfoldings of the plan over time. Visualizations should support a variety of perspectives and "filters" on such views, highlighting such things as resource utilization, workload, transportation of materials, etc. Uncertainty. The user must not be misled into thinking that nominal plan values are certain. The system must help to disabuse him or her of such illusions. Uncertain information is likely to be handled in several different ways in representations of plans.
Where possible, sources of uncertainty should be recorded, along with planning decisions dependent on those uncertainties, so that plan revision can be done more automatically. The quality of information and of its source should be available wherever it is likely to be suspect. In addition, tools such as decision theoretic models that explicitly reason with probabilistic information should be supported, where applicable. These tools require more detailed estimates of probabilities than just discrete alternatives. Versioning, author tracking, change authority. As part of the support for interactive, collaborative plan development there will be a need for better mechanisms for maintaining versions of partially developed plans, both so that collaborators can explore options in parallel without global commitments, and so that plans can be compared, contrasted, combined, and, in general, referred to without confusion. Information associated with different versions that will be important in the collaborative dialogue includes information about authorship, who has authority to change particular aspects of that plan version, what views of the plan are most useful, etc.
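The versioning, authorship, and change-authority requirements just described might be captured by a structure along the following lines. This is a minimal Python sketch under stated assumptions: the class names `PlanVersion` and `PlanStore`, the idea of keying authority by "plan aspect", and the agent names in the usage example are all hypothetical, not a description of an existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class PlanVersion:
    """A snapshot of a (partial) plan; all fields are illustrative."""
    version_id: int
    parent_id: Optional[int]      # version this one was derived from
    author: str
    content: Dict[str, str]       # plan aspect -> current decision


@dataclass
class PlanStore:
    # Which agents have authority to change which aspect of the plan.
    authority: Dict[str, Set[str]]
    versions: Dict[int, PlanVersion] = field(default_factory=dict)

    def commit(self, author: str, parent_id: Optional[int],
               changes: Dict[str, str]) -> PlanVersion:
        """Record a new version derived from `parent_id`, checking authority."""
        for aspect in changes:
            if author not in self.authority.get(aspect, set()):
                raise PermissionError(f"{author} may not change {aspect!r}")
        base = dict(self.versions[parent_id].content) if parent_id is not None else {}
        base.update(changes)
        version = PlanVersion(len(self.versions) + 1, parent_id, author, base)
        self.versions[version.version_id] = version
        return version

    def diff(self, a: int, b: int) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Aspects on which two versions differ, as (value in a, value in b)."""
        ca, cb = self.versions[a].content, self.versions[b].content
        return {k: (ca.get(k), cb.get(k))
                for k in sorted(set(ca) | set(cb)) if ca.get(k) != cb.get(k)}


# Two collaborators explore options in parallel from the same parent version.
store = PlanStore(authority={"transport": {"alice", "scheduler"}, "lodging": {"bob"}})
v1 = store.commit("alice", None, {"transport": "airlift"})
v2 = store.commit("scheduler", v1.version_id, {"transport": "sealift"})
v3 = store.commit("bob", v1.version_id, {"lodging": "tents"})
print(store.diff(v2.version_id, v3.version_id))
```

Because each version records its parent and author, alternative branches can be compared or referred to without confusion, and no branch forces a global commitment on the others.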
Plan Revision Management
There is a serious sense in which one is never planning "from scratch", and in which planning is never completed. Planning should be viewed as a continuous, ongoing process involving exploration of alternatives, refinement, diagnosis, repair and recombination, in the face of constantly changing information. Even before execution has begun, human planners are constantly striving to improve the quality of the information used for planning, and that is as likely to cause replanning to occur as runtime contingencies.
Maintaining continuity between plan versions. As execution time grows imminent, there is a need to alter the patterns of plan change, and one's preferences among alternatives, toward those that maintain continuity, or minimize execution-time replanning at lower levels of detail. Activities involving advance preparation should not be changed once they have begun, unless those changes are consistent with the preparations made. Activities in progress incur even greater costs if changed in an incompatible way. Future planning systems need to be able to deal with this range of continuity-maintenance constraints due to the potentially varying need to minimize disruption of ongoing activities.
Run-time replanning. True execution-time replanning raises another set of issues beyond continuity. As the time available for planning and the age of one's information about the current state of the world diminish, one's own team's activities in executing "the plan" must be considered as part of the process. Once execution starts, parts of the plan become historical and what matters is the relationship of the outcome of those parts of the plan to the remainder of the unexecuted plan. Indeed, during execution, predicting a future state of affairs and its impact on the remainder of the plan may be based on observing unexpected changes occurring in the present.
Coordinating multi-agent planning tasks.
The whole situation is complicated by the fact that multiple agents may be attempting to modify an ongoing plan at essentially the same time, and that different agents may have responsibility for, indeed be the only ones capable of, revising particular portions of the plan. There are a number of issues to be addressed here: coordination of plan update authority, either through a central
M.H. Burstein and D.V. McDermott
manager, or through distributed authority management; information update notification, to ensure that the proper agents are made aware of the information that may lead them to revise the portions of the plan they control; information and plan consistency management, which is needed in the face of the potential acquisition of inconsistent information and the possibility of contradictory plan changes being made by different agents simply because some take longer to complete their work and update the plan; and resource coordination, such as between different plans that may be executed at the same time.
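A minimal sketch of two of these issues, update authority and update notification, is given below. It assumes the simplest possible arrangement, a single central manager with one owning agent per plan part; the class `SharedPlan` and all agent and part names are hypothetical.

```python
from collections import defaultdict


class SharedPlan:
    """Toy central manager: one owner per plan part, plus change notifications."""

    def __init__(self):
        self.parts = {}                    # part name -> current value
        self.owner = {}                    # part name -> agent with change authority
        self.watchers = defaultdict(set)   # part name -> agents to be notified
        self.inbox = defaultdict(list)     # agent -> pending notifications

    def add_part(self, part, value, owner):
        self.parts[part] = value
        self.owner[part] = owner

    def watch(self, agent, part):
        self.watchers[part].add(agent)

    def update(self, agent, part, value):
        # Only the agent holding authority over a part may revise it.
        if self.owner[part] != agent:
            raise PermissionError(f"{agent} may not change {part}")
        self.parts[part] = value
        # Tell every other interested agent, so it can revise what it controls.
        for watcher in sorted(self.watchers[part] - {agent}):
            self.inbox[watcher].append((part, value))


plan = SharedPlan()
plan.add_part("air_routes", "via Hub A", owner="router")
plan.add_part("timetable", "draft 1", owner="scheduler")
plan.watch("scheduler", "air_routes")
plan.watch("user", "air_routes")
plan.update("router", "air_routes", "via Hub B")
print(plan.inbox["scheduler"])
```

Consistency management and resource coordination across concurrently executing plans are deliberately left out; they would need considerably more machinery than this sketch suggests.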
Planning under Uncertainty. Planning, perhaps more than other kinds of reasoning, is fundamentally based on uncertain information. There is uncertainty in the timing of availability of resources, uncertainty in one's own information sources, uncertainty in the actions of other agents operating in the environment, and uncertainty in one's ability to estimate the outcome and time required to complete planned actions. Each of these sources of uncertainty can sometimes be modeled as discrete alternatives, as is currently done in AI planning systems, and at other times might be described probabilistically. The key point is that when planning is to be done collaboratively, and the goal is not necessarily to get down to small atomic actions that can be executed by single agents in the world, managing uncertainty needs to be done more explicitly. We also observe that:
• People can't deal with too many (slightly varying) alternative plans or scenarios. One can't overload a user with a million alternative scenarios whose probabilities, if known, would sum to 1. Identifying and analyzing qualitatively distinct plans should be stressed.
• There are a variety of current tools designed to help humans analyze (often implicit) uncertainty in their plan representations. We anticipate continued frequent appeals to sensitivity analyses that reveal how the projected effectiveness of plans changes with changes in key resources.
• In many environments, the emphasis should be on finding robust plans, as opposed to ones that will be optimal if no assumptions are violated. The system should point out which resources are most likely to be under stress (e.g., waypoints in a transportation plan that are projected to operate at capacity).
• Getting users to assign probabilities to events is hard. Getting users to provide (even qualitative estimates of) probabilities for every uncertain fact is nearly impossible.
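The kind of sensitivity analysis mentioned above can be sketched with a deliberately crude model. The assumptions here are ours, not drawn from any particular tool: a transportation plan is reduced to a chain of waypoints, its effectiveness is the daily tonnage that gets through, and each waypoint's capacity is cut in turn by a fixed percentage. The function names and figures are hypothetical.

```python
def delivered(capacities, demand):
    """Tons per day that pass through a chain of waypoints (toy model)."""
    return min(demand, min(capacities.values()))


def sensitivity(capacities, demand, cut_percent=20):
    """Loss in delivered tonnage when each waypoint's capacity is cut in turn."""
    base = delivered(capacities, demand)
    loss = {}
    for waypoint, capacity in capacities.items():
        reduced = dict(capacities)
        reduced[waypoint] = capacity * (100 - cut_percent) // 100
        loss[waypoint] = base - delivered(reduced, demand)
    return loss


capacities = {"airfield": 100, "port": 250, "railhead": 400}
loss = sensitivity(capacities, demand=100)
# Waypoints where a modest shortfall already degrades the plan.
stressed = [waypoint for waypoint, drop in loss.items() if drop > 0]
print(loss, stressed)
```

In this example only the airfield, which is projected to operate at capacity, is reported as being under stress; the plan is insensitive to moderate losses at the other two waypoints.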
Learning
Teams don't start out working well together; they must "grow into it" by learning the most useful ways of contributing, and the times not to contribute (too much). Since people are already fairly adaptive (within limits), the issue is one of finding opportunities for the automated systems to do useful learning to make them better team members. Some near-term objectives here would be learning of:
User preferences: If the user repeatedly asks for a particular type of statistical analysis, visualization, node expansion, constraint handling preference (conservative or liberal), or problem decoupling, the system could begin to anticipate such requests and
automatically do them or inquire if they should be done. A recent example of this kind of learning is (Sycara and Miyashita, 1994), for an adaptive case-based scheduling system.
Prior plans and their effects: Users may want to generate new plans by modifying old ones, in whole or in part. Case-based reasoning techniques are a potentially easy way to get plan-level learning into MIP systems. The system could help by indexing and retrieving stored plans as similar goals are stated for new problems, and by recording failures and the conditions that led to them, so that they can be brought to the attention of users if similar plans are constructed.
General and domain-specific planning knowledge or heuristics: If the automated planning components of a mixed-initiative system are to keep pace with change, or improve on their initial capabilities as provided by the system designers, there will be a continuing need to develop and refine the heuristics for the automated planning tasks that the system provides. It is desirable that at least some of this knowledge updating and maintenance come about as a result of interactions with the human users of the system. This may motivate some additional (possibly off-line) clarification dialogues so that the system can learn from user directives about such things as searching through the planning space, operator preferences under different conditions, etc.
Inter-agent communications and coordination
Given the highly distributed nature of planning in large organizations, it is going to be important that mixed-initiative planning systems of the future be open systems where multiple humans and multiple machines are collaborating in an open architecture. While this adds a number of complications to the study of mixed-initiative systems, many of the issues need to be addressed equally well in distributed AI systems research, in improved technologies for electronic collaboration between humans, or in distributed systems support generally. Nonetheless, there are a number of issues that are unique to person-machine collaboration, and to large-scale distributed planning systems that involve both human and machine agents. As a first pass, it seems useful, until we see more artificially intelligent software agents running around, to break down the issues along the lines of whether the agents communicating with each other are human or machine. Many of the issues related to electronic collaboration between people are now being addressed in distributed groupware and workflow systems that are available commercially. The issues that relate to collaboration between software agents are largely being addressed by the Distributed AI community and the AI knowledge sharing research community. However, the issue of knowledge sharing is ubiquitous, and there are some specific things to be said about this with respect to planning systems:
Distributed information management will need to be coordinated among many disparate knowledge and information sources, with varying amounts of sophistication, varying capabilities for query processing, and with varying levels of accuracy and timeliness in the data provided. Where to store shared data, what kinds of transactional mechanisms are required, how to make access fast enough, how to make sure information is disseminated in a timely fashion to those who need it, and how to control access to it are all ongoing concerns.
Maintenance of and timely access to shared plans is a related but more specific issue: it seems inevitable, given near term hardware technologies and the large amounts
of information that is required for large-scale, distributed planning, that plan-related data will be passed around between computers, reformulated for use by different agents, and cached in those new forms. This potentially makes the maintenance of plan version consistency and information access support every agent's problem.
The central issue for mixed-initiative planning systems is communication between humans and software systems. Echoing some of our earlier comments, we see the important research areas here as:
• Dialogue-based task management for interactively controlling search, communications bandwidth, asynchronous interruption management, and delegation.
• Context registration using many kinds of clarification dialogues, summarization, elaboration and explanation techniques.
• Flexible, interactive visualizations of plans and support information from different perspectives as a means of conveying information to users, and providing graphical contexts for communications to the machine.
• Information acquisition and management, which often dominates the planning process, also means the transformation of information into usable electronic forms, so that it can be related to electronic versions of the plans under development. There are a number of potential opportunities for greater machine "initiative" to provide assistance in this area, in the world of the Internet, as well as many potential stumbling blocks related to the interpretation of text and graphics into representational forms.
A key point here is that there must be increased effort put into ensuring that useful representations of plans are captured in usable electronic forms, where the uses are by both humans and software systems. Constraints that are left implicit in the heads of users or in raw text cannot be part of the cooperatively agreed upon plans that are developed, and this means lost opportunities for automation, especially in replanning.
INITIATIVE IS IN THE EYE OF THE BEHOLDER
If initiative is viewed as acting to achieve shared goals, without being asked explicitly, it seems fair to ask what it means to ask a computer program to do something. A program will be "triggered" under certain conditions; the end user may classify some of these conditions as "having been explicitly asked," but for the person who wrote the code the distinction is not terribly significant. After a human user has used such a program for a while, he or she may well come to expect the computer's actions as predictable responses to his or her own. For example, users of spreadsheets do not perceive the programs' calculations (which result from changes in data values in cells) as being the result of system "initiative". In the same degenerate sense, programs that complain when you enter invalid responses to their prompts are not seen as engaging in "clarification dialogues". From this perspective, we should be careful with the use of the term "initiative," and focus on revising our models of automated systems' interactions with users, with the goal of improving their utility in collaborative endeavors. By focusing too much on initiative for initiative's sake, we risk burdening ourselves with the impractical task of producing modules that impress us with their intelligence.
Taking that point of view, it seems clear that the problem is not initiative, but mixing. Various agents will be involved in working on a problem. Let's imagine that some of them are humans and some are computer programs. Each one gets triggered under certain circumstances, and must make a modification to an evolving plan. For example:
• A scheduling program might be run to produce a timetable for activities and shipments.
• A route-planning program might fill in the details of shipment routes.
• An inventory planning program might project the availability of raw materials.
• A person might prioritize the major activities involved, such as selecting which items on sales orders to manufacture first.
• A probabilistic plan checker might look for places where the plan is likely to fail, and introduce "risk-reduction strategies".
We might start out assuming that there is a user who has final authority to accept the plans produced by the system. (There may be grades of users, with different authority.) We could further assume that this user has a reasonably accurate mental model of the capabilities of the machine. For example, he or she might know that a transportation planning program can fill in routes and produce schedules and timetables, but would make a mistake if not told that some critical vehicle was in the shop for repairs. Given these other assumptions, it also seems reasonable to assume that there will always be a way to issue explicit commands to trigger automatic capabilities. Hopefully, there will be many circumstances where issuing such explicit directives will not be necessary, because the user will find it convenient to let the system "just do it" without being told.
Characterizing the circumstances under which the phrase "just do it" might apply is certainly one way of increasing the "mixing" of the activity performed by the agents in this shared environment, and reducing the burden on the user of directly stating every step required to complete his or her task. This simple-minded notion of initiative is easily sought, and is usually not AI, but rather is the province of all engineers and programmers who seek to automate support functions for users. If, however, the context shared by the user(s) and the system(s) is sufficiently rich, in terms of their models of the task to be performed and the inputs and outputs of the planning process, then better, more sophisticated context-specific "triggers" can be implemented. This seems straightforward, if unglamorous. It also seems clear that one will want it to be easy to retract anything done automatically, if the user decides it isn't right, and, better yet, to be able to fix whatever was done, without completely redoing it, to take advantage of whatever was done correctly. This style of interaction is fairly easy to produce programmatically if the size of the task performed by the system is small, or the user acknowledges that there is a standard process to get an acceptable answer, and the system has an encoding of that procedure. It may be equally OK (to the particular user) if the system has a fallible method of doing it, and the user doesn't care how it gets done, either because it is a quick and dirty hypothetical plan, or because it can be fixed later. Unfortunately, as we stated at the outset, there are several related problems with this model, as it might be applied to present-day AI-based planning and scheduling tools:
• The user is not typically "in control" of current-day automated planning processes, except at the very beginning and the very end, and the product is typically a plan or schedule of substantial size and complexity.
• Questions that these systems ask of users tend to be at the system's discretion, without much attempt at staying "in synch" with the user's way of approaching and solving the problem (or building the plan).
• "Backing up" in a search space is not usually an option that the user has, or, if it is an option, its consequences are not easily understood.
These problems might be manageable if all such modifications satisfied the top-down refinement constraint, which can be characterized as the requirement that plan modifications always take a plan from a more abstract to a more instantiated state, or return from an instantiated state to a previous abstract state. If a set of planning modules obeys this constraint, then mixing initiative becomes fairly simple. Each participant sees an abstract plan, to which it can add information. No one ever changes information once it is added, except to return to a previous state, discarding all subsequent changes (and noting that they should not be tried again) [2]. Unfortunately, it is safe to assume that this constraint will never hold in practice, because users generally require the authority and capacity to change any part of the plan they can see. If humans are to be in the loop at all, they just won't tolerate a system that puts them in a straitjacket. This is partially because users are often better at saying what they don't like about a plan than what they do like. It's also partially because it is at present difficult to imagine maintaining the knowledge bases of the AI planning systems such that they have proper representations of every constraint that the human planner is operating under in a new planning situation.
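To make the top-down refinement constraint concrete, the following sketch models a plan as an append-only list of refinements with a single permitted form of retraction. It is a simplification under our own assumptions: the class `RefinementPlan` is hypothetical, refinements are plain strings, and a discarded refinement is barred globally rather than only in the context where it failed.

```python
class RefinementPlan:
    """Plan that can only be refined further, or rolled back to an earlier state."""

    def __init__(self):
        self.steps = []        # (agent, detail) pairs in the order they were added
        self.rejected = set()  # details discarded by a retreat; not to be retried

    def refine(self, agent, detail):
        """Add information to the plan and return a checkpoint for later retreat."""
        if detail in self.rejected:
            raise ValueError(f"{detail!r} was already tried and discarded")
        self.steps.append((agent, detail))
        return len(self.steps)

    def retreat(self, checkpoint):
        """Return to an earlier state, discarding every later refinement."""
        discarded = self.steps[checkpoint:]
        self.rejected.update(detail for _, detail in discarded)
        del self.steps[checkpoint:]


plan = RefinementPlan()
start = plan.refine("user", "ship by sea")
plan.refine("scheduler", "depart Tuesday")
plan.refine("router", "via port A")
plan.retreat(start)  # keeps only the user's decision
print(plan.steps, sorted(plan.rejected))
```

Note what the interface lacks: there is no operation for changing an arbitrary earlier entry in place, which is exactly the freedom that human planners insist on.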
If the planning system doesn't have a full representation of the problem, then "arbitrary" modifications of the solutions produced may be necessary, even when a solution is "right" as far as the system can tell. An issue, then, for a mixed-initiative planner is to help the user characterize what "don't like it" means, or what to do about it in as useful a way as possible. Responses to a user stating "I don't like X" might include any of:
• Just drop X (and perhaps any goals it supported);
• Reduce the priority of the goals and constraints that caused the introduction of X and try to replan with those altered assumptions;
• Explain why it is critical that X is there and ask if the user still wants to remove it;
• Ask for a reason for removing that element, so as to be able to incorporate the criticism as a rational constraint with a basis.
Hence, a key problem in building mixed-initiative planning systems, especially using as components many of the kinds of problem solvers around today, most of which were originally designed to operate autonomously, is the problem of getting those modules to correctly determine which criticisms by other agents are "boundary conditions", and which are best treated merely as preferences.
[2] Even when this simple model works, some styles of planning search may be more suitable than others. For example, it is probably better if the subsequent changes were ones that were related to the change that is discarded (i.e., justification-based backtracking is more reasonable here than chronological backtracking).
Consider a few examples of where this issue comes up, some of which are discussed in (Smith and Lassila, 1994):
• The user looks at a transportation schedule produced by a program, and notices that a certain high priority package is being shipped by rail rather than by air. The user edits the schedule (graphically or by altering text fields) so that the package now flies into the nearest airport to its destination. The scheduler is rerun. It must treat the new shipment plan for that package as a constraint on what it produces.
• The user looks at a plan generated by the program, and notices that an airplane is flying almost completely empty, and switches its passengers to another flight to save the trip. It turns out that the automated planner had inserted the flight so that the plane would be available for another flight from the destination later that day. Later, the planner is rerun, and has to schedule yet another plane to go pick up the passengers of that later flight. (If the planner is not rerun, the schedule will no longer accurately reflect the plan.) The user may not notice this, but if he or she does notice, he or she will want to know what's going on. It would have been helpful if the system had been able to tell the user as soon as the edit was proposed that the edit is impossible without making other changes.
• Some modules produce only plan assessments, without making any other changes. It might be desirable for the user to edit these assessments, when he or she thinks that the automated system is overlooking something. After further plan revision, the user will have to re-edit the assessment, because the automated system has no way of knowing how much of the revisionist assessment should be preserved. The user might then fall into a rhythm of blindly upgrading a plan even when it has been revised to a point where it really is bad.
One can imagine lots of examples of this sort.
They may not have much in common in detail; that is, there may not be a general theory of intent recognition that will let the person "explain" to the computer what to preserve about the edits made in each case. We should at least hope that the user will be able to receive enough information to validate his or her own actions. A more modest research agenda, then, might be one based on the idea of anticipating types of edits to plan structures. One basic principle is that anything a person can see on the screen ought to be editable. Another basic principle is that what's on the screen ought to be what people find natural to think about. At a very early stage in the process, we need to track possible edits of the plan display. There are two sources of information about this. One is just the raw syntactic possibility of edits. Whatever can be displayed (location, time, ordering, etc.) is in principle changeable. Another source of information is what changes the human planners currently make. For each possible change to a plan, we ask: what boundary condition might this change imply? If there is a choice of boundary conditions, a dialogue should take place in which the choices are described to the user in his or her own terms and he or she is asked to clarify which is meant. Any tool that works on a plan must be prepared to obey notes about such boundary conditions.
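One way to picture "notes about boundary conditions" is as annotations that travel with the plan and that every tool consults before producing a result. The sketch below is a toy under our own assumptions: the `EditNote` record, the two note kinds, the cost table and the budget rule for preferences are all hypothetical illustrations, not a description of an existing scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditNote:
    """A user edit, annotated with how strongly it must be respected."""
    item: str       # plan element the user edited
    attribute: str  # which displayed attribute was changed
    value: str
    kind: str       # "boundary" (must hold) or "preference" (may be overridden)


COST = {"rail": 1, "air": 5}  # illustrative cost per package


def schedule(packages, notes, budget):
    """Toy scheduler: cheapest mode by default, subject to the user's notes."""
    cheapest = min(COST, key=COST.get)
    plan = {package: cheapest for package in packages}
    # Boundary conditions are applied unconditionally.
    for note in notes:
        if note.kind == "boundary" and note.attribute == "mode":
            plan[note.item] = note.value
    # Preferences are honored only while total cost stays within budget.
    for note in notes:
        if note.kind == "preference" and note.attribute == "mode":
            trial = dict(plan)
            trial[note.item] = note.value
            if sum(COST[mode] for mode in trial.values()) <= budget:
                plan = trial
    return plan


notes = [
    EditNote("medical", "mode", "air", kind="boundary"),
    EditNote("mail", "mode", "air", kind="preference"),
]
result = schedule(["medical", "spares", "mail"], notes, budget=8)
print(result)
```

When the scheduler is rerun, the high priority package stays on its flight because the edit was recorded as a boundary condition, while the preference is dropped once it would exceed the budget. Deciding which kind of note a given edit implies is the part that requires the clarification dialogue described above.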
Non-proscriptive Forms of Initiative are the Most Easily Accepted
It should be a goal of systems built for person-machine mixed-initiative planning tasks to be as helpful as possible to the human user without being counter-productive in the sense of creating more work than they save. One way to achieve this is to do as many as possible of the "little things" that make effective use of the computer's capabilities and resources in ways that do not automatically modify the plans under development. There are some tasks that are naturally non-destructive in this sense, and others that can be made so by putting their products in the form of "suggestions" that can be considered and either accepted or rejected. Whether these things are or are not due to "initiative" on the part of the system is likely to be based on the conditions under which they occur. That they are valid, useful contributions made in a timely fashion is all users should really care about. That they support more effective distribution of the labor is most important. Some things in the naturally non-interfering category are:
• Initiation of information retrieval requests to other agents; these can be triggered by such things as the "named" phases of a planning process, and so require minimal information about the specific planning problem to be triggered. One can think of this kind of activity as analogous to a nurse asking you to fill out a medical questionnaire before being seen by the doctor. In an on-line system, it might mean automatically sending out email requests to other users or "off-line" databases to collect information before beginning a phase of planning.
• Automatic "highlighting" of information requirements for which the user is the primary source. Examples include flagging of unexpanded goals which need to be worked on to complete a phase of planning, assumptions under which a plan or plan element will remain valid or achieve its mission, etc.
Many kinds of visual cues may be used to identify pending or incomplete tasks, new information which must be taken into account, etc.
• Promoting the reuse of experience by automatic retrieval of relevant prior plans or fragments (cases) that are similar to the ones under development, either to provide additional "cut and paste" opportunities, or to provide examples of potential problems that might come up when executing the plan under consideration.
• Automatically sending inputs to plan evaluation tools, including constraint checkers, resource analyzers, and simulations. On completion of such evaluations, the user should be notified only of important issues that were uncovered, and even then perhaps only by a notice that the results are available to be considered whenever the user is ready to look at them, unless the problems are of sufficient priority.
Some things can be most useful if done on a "not to interfere" basis. The issue with this kind of activity is the potential for harm, either in generating work for agents that is subsequently thrown away, or in causing too many interruptions of the user's activities. Beyond endeavoring to find the most appropriate triggering conditions for these activities, all of the usual ways of hedging by getting confirmation that the action is desirable should be considered. Also, means of canceling these behaviors individually and globally (in the sense that a user never wants that feature) should be provided. This
means support for issuing retraction messages for automatically triggered activity requests (which could also be triggered if the goal they served is abandoned). Some example activities in this category are:
• Eager elaboration of trivial subgoals when no alternative choices need to be considered.
• Limited-depth subgoal search to find a few qualitatively different feasible elaborations to present to the user as options.
• Automatic notifications to collaborators that their participation (based on plan content) will be required for some planning task, especially when some advance preparation is required on their part, or scheduling of their time is an issue.
• Notification that planning assumptions have changed and the planner must consider redoing part of his or her plan. This could be either an execution-time issue (the world changed), or a planning-time one (the head planner changed a critical decision, or new information arrived).
• Reconciliation of local and more global objectives, especially where different human agents are responsible for these different perspectives. This is really a specialized form of plan consistency checking.
Task Decomposition Models Provide Useful Triggers for System Initiative
As the prior discussion suggests, when several agents collaborate in a mixed-initiative fashion on a planning task, each agent must be able to recognize when it can perform useful work in service of the overall task objectives, and when and how to convey the results of its work to the other cooperating agents. For the team as a whole to succeed, each agent that initiates activity must share compatible models (though perhaps implicit ones) of the tasks to be performed, in addition to knowledge of the specific goals to be achieved by the plans they produce. If we are looking for mixed-initiative systems to do more than perform functions when a command is issued by the user, then we need to find more sophisticated "triggering" criteria, which can be based on any or all of: the user's inputs, the currently represented state of the plan under construction, background data and other potentially "live" information about the world in which the plan is to be executed. This situation is simplified if there are some quickly recognizable cues as to where in the process of plan construction the agent who is "in charge" (typically the user) is focused, so as to avoid the impossible chore of scanning and interpreting all of the available data to discover what tasks are likely to be appropriate ones at any given time. There are now working examples of "plan editing tools" that break complex planning activities down into recognizable phases with different displays appropriate to each phase of the task. These displays provide a crude form of "context registration", in addition to providing specialized task support. In planning tasks where such explicit models of the structure of the task are appropriate, such phase shifts can serve as triggering cues for planning support activities by other agents.
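The use of phase shifts as triggering cues can be sketched as a small registry that maps named planning phases to support activities. Everything here is a hypothetical illustration: the class `PhaseTriggers`, the phase names and the two support functions are invented for the example, and the activities merely return descriptions of what they would do, so that nothing modifies the plan itself.

```python
from collections import defaultdict


class PhaseTriggers:
    """Maps named planning phases to support activities run on entering them."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.phase = None

    def on_enter(self, phase, handler):
        self.handlers[phase].append(handler)

    def enter(self, phase, plan):
        """Record the phase shift and collect suggestions from each handler."""
        self.phase = phase
        return [handler(plan) for handler in self.handlers[phase]]


def request_inventory(plan):
    # Stands in for sending an information request to another agent.
    return f"request stock levels for {len(plan['goals'])} goals"


def flag_open_goals(plan):
    # Stands in for highlighting goals the user still has to expand.
    open_goals = [g for g in plan["goals"] if g not in plan["expanded"]]
    return f"highlight unexpanded goals: {open_goals}"


triggers = PhaseTriggers()
triggers.on_enter("transport scheduling", request_inventory)
triggers.on_enter("transport scheduling", flag_open_goals)

plan = {"goals": ["move unit A", "move unit B"], "expanded": ["move unit A"]}
suggestions = triggers.enter("transport scheduling", plan)
print(suggestions)
```

Because the trigger is the phase name rather than the full plan content, the support agents need not scan and interpret all of the available data to decide when to act.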
People engaged in complex tasks break them down into stages or phases so that they can organize the work into activities with specific methods of achievement, limit the depth of their own problem solving, simplify the applicability conditions for the subactivities involved, delegate subtasks to others, and in order to learn how and when
to perform subtasks in the first place. Newell's (1981) "problem spaces" is a formalization of this notion. As different phases of planning may or may not use the same problem solving styles, the representations of the problem and the vocabulary used in solutions may change as one changes phases. As a result, it may be a lot easier to say when one is "done" with a phase of planning (e.g., that all subgoals have been expanded into a set of activities defined in advance to represent a consistent level of description in the domain; or that one has validated the plan at that level of description by some set of methods such as simulations and constraint consistency checks) than it is to say that one has found the best plan in some absolute sense. For example, the different kinds of planning and evaluation activities that go into the overall military planning process have very different information requirements. They use completely different software tools and have different ways of interpreting and applying results. For example, transportation schedulers and simulators serving similar functions require classes of inputs in terms of consumers (e.g., things to move) and resources or producers (e.g., ways to move them). Information about the resources must be of consistent kinds, and matched to the consumers. The consumers in the transportation scheduling phase, the people and equipment to be moved, are the resources in the employment planning phase. This relationship between phases of planning makes transportation schedulers or simulators useful as resource constraint checkers during employment planning, since the overall plan will not work if one cannot marshal the needed resources. However, when using a transportation scheduling tool as a resource analysis tool, different criteria, in terms of detail and accuracy, are used to gather inputs to the program.
Far less detail is needed to apply a scheduler in this second role, which means that it can be used earlier in the planning process, using approximate inputs. A very useful form of system initiative would be to automatically assist in developing inputs to a scheduler so that such preliminary analyses can take place in a timely fashion. Also, since use of a scheduler in this formative phase of the planning process is for such a different purpose, the interpretation of the result is also radically different. An acceptable result might be as simple as "doesn't use any more resources than allowed", or an indication of a critical resource shortage, instead of a detailed schedule. The system must know this context in order to present the appropriate conclusions to the users or other software agents. Another reason for working with explicit models of planning tasks is that it may be critical for effective, adaptive learning by the system and by the user. Since effective collaboration means learning to act in a coordinated fashion, users need to have cues to guide their expectations of each other's capabilities in different contexts, and learning takes place more effectively if the context of the activity being learned or refined is a localized task.
THE ROAD AHEAD
We have tried in this chapter to "raise the ante" for research and development of AI-based planning systems. The history of planning in AI comes very much from the tradition of robotics; that is, providing autonomous entities with a capability to move and act in the world. But the worlds in which these planners worked tended not to change much, not to "fight back" at all, and certainly not to collaborate actively. This world
view has colored much of the last 30 years of planning research. Our goal was to "open up the box" and reconsider these assumptions in the light of recently raised opportunities for virtually global electronic collaboration, and the new emphasis on cognitive technologies for interactions between people and machines. We hope and expect that many of the research areas we touched upon will become heavily discussed issues in the next few years.
REFERENCES
Allen, James F., and George Ferguson, 1994. Arguing About Plans: Plan Representation and Reasoning for Mixed-Initiative Planning. In: Mark Burstein, ed., Proceedings of the 1994 ARPA-Rome Laboratory Planning Initiative Workshop, 123-132. San Mateo, CA: Morgan Kaufmann.
Allen, James F., 1994. A Perspective on Mixed-Initiative Planning. In: Mark Burstein, ed., Proceedings of the 1994 ARPA-Rome Laboratory Planning Initiative Workshop, 486-495. San Mateo, CA: Morgan Kaufmann.
Smith, S. F., and O. Lassila, 1994. Toward the Development of Flexible Mixed-Initiative Scheduling Tools. In: Mark Burstein, ed., Proceedings of the 1994 ARPA-Rome Laboratory Planning Initiative Workshop, 145-154. San Mateo, CA: Morgan Kaufmann.
Sycara, K., and K. Miyashita, 1994. Evaluation and Improvement of Schedules According to Interactively Acquired User-defined Criteria. In: Mark Burstein, ed., Proceedings of the 1994 ARPA-Rome Laboratory Planning Initiative Workshop, 155-164. San Mateo, CA: Morgan Kaufmann.
Newell, Allen, 1990. Unified Theories of Cognition. Cambridge, MA: Harvard University Press.
Wilkins, D. E., 1988. Practical Planning: Extending the Classical AI Planning Paradigm. San Mateo, CA: Morgan Kaufmann.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 18
COMMITTEES OF DECISION TREES
David Heath, Simon Kasif and Steven Salzberg*
Department of Computer Science
The Johns Hopkins University, USA
[email protected], edu
ABSTRACT
Many intelligent systems are designed to sift through a mass of evidence and arrive at a decision. Certain pieces of evidence may be given more weight than others, and this may affect the final decision significantly. When more than one intelligent agent is available to make a decision, we can form a committee of experts. By combining the different opinions of these experts, the committee approach can sometimes outperform any individual expert. In this paper, we show how to exploit randomized learning algorithms in order to develop committees of experts. By using the majority vote of these experts to make decisions, we are able to improve the performance of the original learning algorithm. More precisely, we have developed a randomized decision tree induction algorithm, which generates different decision trees every time it is run. Each tree represents a different expert decision-maker. We combine these trees using a majority voting scheme in order to overcome small errors that appear in individual trees. We have tested our idea with several real data sets, and found that accuracy consistently improved when compared to the decision made by a single expert. We have developed some analytical results that explain why this effect occurs. Our experiments also show that the majority voting technique outperforms at least some alternative strategies for exploiting randomization.
INTRODUCTION
Decision trees have been used successfully for many different decision making and classification tasks. A number of standard techniques have been developed in the machine learning community, most notably Quinlan's C4.5 algorithm (1993) and Breiman et al.'s CART (Classification and Regression Trees) algorithm (1984). Since the introduction of these algorithms, numerous variations and improvements have been put forward, including new pruning strategies (e.g., Quinlan, 1987) and incremental
The authors wish to thank David Aha for providing comments and relevant references. This research was supported in part by the Air Force Office of Scientific Research under Grant AFOSR-89-0151, and by the National Science Foundation under Grants IRI-9116843 and IRI-9223591.
versions of the algorithms (Utgoff, 1989). Many of these refinements have been designed to produce better decision trees; i.e., trees that were either more accurate classifiers, or smaller trees, or both. The main goal of our research is to produce classifiers that provide the most accurate model possible for a set of data. To achieve our goal, we have combined a standard method for classification (decision trees) with two other ideas. The first idea is randomization, which in this context allows us to generate many different trees for the same task. The second idea is majority voting, which has been used with other learning methods (e.g., by k-nearest neighbor algorithms) to perform classification and diagnosis. Here we use a majority vote of k decision trees to classify examples.
RANDOMIZATION IN LEARNING ALGORITHMS
In previous work (Heath, 1992), we introduced a system for simulated annealing of decision trees (SADT). In that work, we explored the generation of decision trees comprised of tests that are linear inequalities over the attributes. We call these "oblique" decision trees, because the tests at each node are simply hyperplanes at an oblique angle to the axes of the attribute space. This is a generalization of standard decision tree techniques, in which each node of a tree is a test of a single attribute; i.e., a hyperplane that is parallel to one of the axes in attribute space. We showed that when generating oblique trees, finding even a single test that minimizes some goodness criterion is an NP-hard problem. We therefore turned to the optimization technique of simulated annealing to find good tests, which should generate good (i.e., small and accurate) trees. Using simulated annealing in our learning algorithm introduces an element of randomness. Each time our SADT program is run, it generates different trees.
This led us to explore methods of using this randomization to advantage by generating many trees and using additional criteria to choose the best tree. Our argument is that picking a good tree out of the many solutions produced by a randomized algorithm may be preferable to using an algorithm, even a very clever one, that only produces one solution. In this paper, we explore another way of using randomization to advantage. As before, we use a single training set to generate a set of classifiers. Instead of choosing one representative tree, we attempt to combine the knowledge represented in each tree into a new, more accurate, classifier. We regard each classifier as a separate "expert" for the domain at hand, and the collection of classifiers as a committee. Although the committee members are not entirely independent (because they were generated by the same algorithm on the same training data), they are not identical either. Therefore a combination of classifiers might be able to outperform any individual. Specifically, we take a set of classifiers and combine their classifications by taking the plurality. In binary classification problems, this reduces to taking the majority. For example, if we have five trees, and three classify an example as A, while the other two classify it as B, we predict the example belongs to class A. When this technique is applied to decision trees, we call the resulting algorithm k-DT, in the spirit of k-NN, the k-nearest neighbor algorithm.
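The plurality rule just described can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function names `plurality_vote` and `k_dt_classify` are hypothetical, each tree is assumed to be a callable that maps an example to a class label, and ties are broken arbitrarily by first appearance.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label predicted most often by the committee.

    Assumption: ties are broken by first appearance in `predictions`,
    which is an arbitrary choice made for this sketch.
    """
    counts = Counter(predictions)
    label, _ = counts.most_common(1)[0]
    return label


def k_dt_classify(trees, example):
    """Classify `example` with every tree and combine by plurality.

    `trees` is assumed to be a list of callables (hypothetical stand-ins
    for trained decision trees), each returning a class label.
    """
    return plurality_vote([tree(example) for tree in trees])


# Five "trees": three say A, two say B, so the committee predicts A.
votes = ["A", "B", "A", "B", "A"]
print(plurality_vote(votes))
```

With two classes and an odd number of trees the plurality is always a strict majority, which is the case analysed in the rest of the chapter.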
The advantage of majority voting
The premise behind this idea is that any one tree may not capture the target concept completely accurately, but will approximate it with some error. This error differs from tree to tree. By using several trees and taking the majority, we hope to overcome this type of error. Consider, for example, a test example x with probability p(x) of being correctly classified by a random two-category SADT tree. If we take the majority vote of k trees, the probability that x is correctly classified is
$$\mathrm{maj}(k,x) = \sum_{j > k/2} \binom{k}{j}\, p(x)^{j}\, (1 - p(x))^{k-j}$$
In this equation, j represents the number of trees that correctly classify example x. We require that it be more than half of the k trees, thus the restrictions on the sum. $p(x)^{j}$ represents the probability of j trees getting the example correct; $(1 - p(x))^{k-j}$ is the probability that the remaining trees get it wrong. The term k choose j simply counts the number of possible ways k trees could divide into two sets of trees, one of size j. Figure 1 shows how maj(k,x) varies with p(x) when different numbers of trees are used for the majority. Note that for example x, taking the majority vote increases the probability of getting a correct classification if p(x) > 0.5, but decreases it if p(x) < 0.5. Let X1 be the set of examples in the test set for which p(x) < 0.5, and X2 be those for which p(x) > 0.5. If x ∈ X1, it is to our advantage to use the classifiers directly. If, on the other hand, x ∈ X2, taking the majority will increase the probability that we will classify x correctly. For any given test set, there will likely be points in both cases. Obviously we cannot tell, given a particular example, whether it belongs to X1 or X2 unless we know its classification. However, it is our experience that the benefit we get by increasing the likelihood of a correct classification for those examples in X2 outweighs the loss in accuracy we get on the examples in X1.
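The behaviour of maj(k,x) is easy to check numerically. The sketch below evaluates the sum directly; the function name `maj` follows the text, but taking the probability p as a plain number (rather than an example x) is a simplification for illustration. Like the formula itself, it treats the k trees as independent, which the text notes is only approximately true.

```python
from math import comb


def maj(k, p):
    """Probability that more than half of k trees are correct.

    Assumes each tree is independently correct with probability p,
    as in the binomial formula in the text.
    """
    return sum(
        comb(k, j) * p**j * (1 - p) ** (k - j)
        for j in range(k // 2 + 1, k + 1)  # all j with j > k/2
    )


# Voting helps when p > 0.5 and hurts when p < 0.5.
for p in (0.45, 0.8):
    print(p, [round(maj(k, p), 3) for k in (1, 3, 9, 49)])
```

Running this for a few values of p shows the pattern described above: the curves for larger k rise more steeply around p = 0.5.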
Figure 1. Majority classification probability versus individual classification probability. [Plot not reproduced; horizontal axis: individual probability; legible legend entries: 1 tree, 9 trees, 49 trees.]
Intuitively, it would seem that simply increasing the number of classifiers on the committee (in a majority voting scheme) should continually increase the expected accuracy of the decisions. The next example illustrates why this intuition is wrong, and how, in fact, the ideal size of the committee will vary depending on the problem. The critical factor is how many examples in the domain at hand are difficult to classify: if there are many such examples, then very small committees will be preferable. An implication of this is that choosing the appropriate value for k may be a difficult problem.
We have already seen that for some examples (those with less than 50% probability of being correctly classified by the average tree), using a majority vote will lower the chances of a correct classification, and the more trees used, the lower the resulting accuracy will be. On the other hand, increasing the number of trees involved in the vote will increase the accuracy on those points likely to be classified correctly by the average tree. Normally, we would expect many domains to have a mixture of these two types of examples, some difficult to classify and some easy. When we try using a majority voting scheme on a mixture of these two types, we will get a mixed result.
Consider two examples, e1 and e2. If we generate many trees, on average e1 is classified correctly 45% of the time, and e2 is classified correctly 80% of the time. (One can also think of e1 as a set of examples with the same probability of correct classification.) As shown in Figure 2, if we use a majority voting scheme, then e1 will rarely be classified correctly, but e2 will almost always be classified correctly. Figure 2 also shows the combined expected accuracy for the set {e1, e2}. If we generate a series of trees and use each one to classify the two examples, we expect their average accuracy to be 62.5%. If we use majority voting, we expect the accuracy to increase up to about 68% for nine trees. However, if we use more than nine trees, the expected accuracy goes down, eventually converging to 50%. Thus for this simple example, the optimal committee size is nine.
Figure 2. Effects of majority voting on mixed data sets. [Plot not reproduced; horizontal axis: number of trees; curves: 80% accuracy, 45% accuracy, combined.]
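The two-example calculation can be reproduced with a short script. This is a sketch under the same independence assumption as the majority formula; `combined_accuracy` is a hypothetical helper name, and only odd committee sizes are tried so that votes cannot tie.

```python
from math import comb


def maj(k, p):
    """Probability that more than half of k independent trees are correct."""
    return sum(
        comb(k, j) * p**j * (1 - p) ** (k - j)
        for j in range(k // 2 + 1, k + 1)
    )


def combined_accuracy(k, probs):
    """Expected accuracy of a k-tree majority vote, averaged over examples.

    `probs` holds, for each example, the probability that a single tree
    classifies it correctly (0.45 and 0.80 in the text's example).
    """
    return sum(maj(k, p) for p in probs) / len(probs)


probs = [0.45, 0.80]
odd_sizes = range(1, 32, 2)  # odd k only, so a vote can never tie
best_k = max(odd_sizes, key=lambda k: combined_accuracy(k, probs))
print("best committee size:", best_k)
print("accuracy at best size:", round(combined_accuracy(best_k, probs), 3))
```

With a single tree the combined accuracy is the plain average of 0.45 and 0.80; as k grows the first term decays toward 0 and the second approaches 1, which is why the combined curve eventually settles at one half.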
For a set of examples X, where p(x) is the probability of example x being correctly classified by an average tree, it is easy to show that the average accuracy without voting is
$$\frac{1}{|X|} \sum_{x \in X} p(x)$$
while the accuracy when an infinite number of trees are used in a majority computation is
$$\frac{|\{x \in X : p(x) > 0.5\}|}{|X|}$$
that is, the fraction of the examples which are more than likely classified correctly by the average tree. Between these two extremes, the overall accuracy may have dips and peaks. In this paper, we experiment with majority voting using different numbers of trees. We use these experiments to empirically choose a value for k which seems to work well in practice.
RELATED WORK
k-DT is one of several different strategies for combining multiple classifiers. There are two common approaches to this problem. The first approach can be thought of as multi-level learning: a set of classifiers is trained, and their outputs are fed to another learning system, which learns an appropriate weighting scheme to apply to those outputs, in the hopes of creating a more accurate classifier. Depending on the implementation, the two levels can be trained separately or simultaneously. Wolpert's (1992) stacked generalization technique and the hybrid technique developed by Zhang et al. (1992) are examples of separately trained systems. An example of a simultaneously trained system is Jacobs et al. (1991), in which the second learning level learns how to assign training examples to the different components of the first level.
k-DT takes another approach. Only the first level is trained; the second level is a simple, easily understood, fixed strategy. We have used majority voting in this study, but other fixed strategies could also be used. Another system that takes this approach is the cluster back propagation network of Lincoln et al. (1990).
THE SADT ALGORITHM
Although the majority voting technique could be applied to any randomized classifier scheme, k-DT was first conceived as a natural enhancement of our SADT algorithm. Accordingly, all of our experiments have been conducted on the SADT algorithm. To aid in the understanding of k-DT, we explain the workings of our SADT algorithm here. The basic outline of the SADT algorithm is the same as that of most other decision tree algorithms.
That is, we find a hyperplane to partition the training set and recursively run the partitioning algorithm on the two subsets that result. Here we describe how SADT searches for a good hyperplane.
In our implementation, d-dimensional hyperplanes are stored in the form
$$H(x) = h_{d+1} + \sum_{i=1}^{d} h_i x_i,$$
where $H = \{h_1, h_2, \ldots, h_{d+1}\}$ is the hyperplane, $x = (x_1, x_2, \ldots, x_d)$ is a point, and $h_{d+1}$ represents the constant term. For example, in the plane the hyperplane is a line and is represented in the familiar $ax + by + c = 0$ form.
Classification is done recursively. To classify an example, compare it to the current hyperplane (initially this is the root node). If an example p is at a non-leaf node labeled H(x), then we follow the left child if H(p) > 0; otherwise we descend to the right child.
The first step in our algorithm is to generate an initial hyperplane. This initial hyperplane is always the same and is not tailored to the training set. We simply wanted to choose some hyperplane that was not parallel to any of the axes, so we used the hyperplane passing through the points where $x_i = 1$ and all other $x_j = 0$, for each dimension i. In particular, the initial hyperplane may be written in the above form as $h_i = 1$ for $1 \le i \le d$ and $h_{d+1} = -1$, since H(x) = 0 for each of these points. Thus in 3-D, we choose the hyperplane which passes through (1,0,0), (0,1,0), and (0,0,1). Many other choices for the initial hyperplane would be equally good. Once the annealing begins, the hyperplane is immediately moved to a new position, so the location of the initial split is not important.
Next, the hyperplane is repeatedly perturbed. If we denote the current hyperplane by $H = \{h_1, h_2, \ldots, h_{d+1}\}$, then the algorithm picks one of the $h_i$'s randomly and adds to it a uniformly chosen random variable in the range (-0.5, 0.5). Using our goodness measure (described below), we compute the energy of the new hyperplane and the change in energy $\Delta E$.
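The representation, descent rule, initial hyperplane and perturbation step described so far can be sketched as follows. This is an illustration only, not the SADT source: the `Node` class and the function names are hypothetical, a hyperplane is stored as a list of d+1 coefficients with the constant term last, and `random.uniform` stands in for the uniform draw on (-0.5, 0.5).

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


def evaluate(h: Sequence[float], x: Sequence[float]) -> float:
    """H(x) = h_{d+1} + sum_i h_i * x_i, with the constant term stored last."""
    d = len(x)
    return h[d] + sum(h[i] * x[i] for i in range(d))


def initial_hyperplane(d: int) -> list:
    """Hyperplane through the d unit points: h_i = 1 for all i, constant -1."""
    return [1.0] * d + [-1.0]


def perturb(h: Sequence[float], rng=random) -> list:
    """Copy h and add a uniform random offset to one randomly chosen coefficient."""
    new_h = list(h)
    i = rng.randrange(len(new_h))
    new_h[i] += rng.uniform(-0.5, 0.5)
    return new_h


@dataclass
class Node:
    """Hypothetical tree node: internal nodes hold a hyperplane, leaves a label."""
    hyperplane: Optional[Sequence[float]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None


def classify(node: Node, p: Sequence[float]) -> Optional[str]:
    """Descend left when H(p) > 0, right otherwise, until a leaf is reached."""
    while node.hyperplane is not None:
        node = node.left if evaluate(node.hyperplane, p) > 0 else node.right
    return node.label


h = initial_hyperplane(3)
root = Node(hyperplane=h, left=Node(label="A"), right=Node(label="B"))
print(evaluate(h, (1, 0, 0)), classify(root, (1, 1, 1)), classify(root, (0, 0, 0)))
```

The loop in `classify` is the iterative form of the recursive descent in the text; the two are equivalent for a tree of this shape.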
If $\Delta E$ is negative, then the energy has decreased and the new hyperplane becomes the current split. Otherwise, the energy has increased (or stayed the same) and the new hyperplane becomes the current split with probability $e^{-\Delta E/T}$, where T is the temperature of the system. The system starts out with a high temperature that is reduced slightly with each move. Note that when the change in energy is small relative to the temperature, the probability of accepting the new hyperplane is close to one, but that as the temperature becomes small, the probability of moving to a worse state approaches zero.
In order to decide when to stop perturbing the split, we keep track of the split that generated the lowest energy seen so far at the current node. If this minimum energy does not change for a large number of iterations (we used numbers between 3000 and 100,000 iterations in our experiments), then we stop making perturbations and use the split that generated the lowest energy. The recursive splitting continues until each node is pure; i.e., each leaf node contains only points of one category.
Goodness Criteria
SADT can work with any goodness criterion, and we have experimented with several. For detailed discussions of these measures, see Heath (1992) or Murthy et al. (1994). In this paper, we experiment with three of these criteria: information gain (Quinlan, 1986) and our own Max-Minority (MM) and Sum-Minority (SM) measures. We define MM and SM as follows.
Consider a set of examples X, belonging to 2 classes, u and v. A hyperplane divides the set into two subsets X1 and X2. For each subset, we find the class that appears least often. We say that these are the minority categories. If X1 has few examples in its
minority category C1, then it is relatively pure. We prefer splits that are pure; i.e., splits that generate small minorities. Let the number of examples in class u (class v) in X1 be $u_1$ ($v_1$) and the number of examples in class u (class v) in X2 be $u_2$ ($v_2$). To force SADT to generate a relatively pure split, we define the SM error measure to be $\min(u_1, v_1) + \min(u_2, v_2)$, and the MM error measure to be $\max(\min(u_1, v_1), \min(u_2, v_2))$.
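The two error measures, together with the annealing acceptance rule described earlier, can be written down compactly. The sketch below is not the authors' code: the function names are hypothetical, the counts $u_1, v_1, u_2, v_2$ are passed in directly, and the acceptance probability for a non-improving move is taken to be exp(-ΔE/T), which is the standard simulated annealing form and the reading assumed here.

```python
import math
import random


def sum_minority(u1, v1, u2, v2):
    """SM: total size of the minority class on each side of the split."""
    return min(u1, v1) + min(u2, v2)


def max_minority(u1, v1, u2, v2):
    """MM: size of the larger of the two minorities."""
    return max(min(u1, v1), min(u2, v2))


def accept(delta_e, temperature, rng=random):
    """Decide whether a perturbed hyperplane replaces the current split.

    Improvements (delta_e < 0) are always accepted; otherwise the move is
    accepted with probability exp(-delta_e / temperature).
    """
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


# A split with 10 u / 2 v on one side and 1 u / 7 v on the other.
print(sum_minority(10, 2, 1, 7), max_minority(10, 2, 1, 7))
```

Either measure can serve as the energy: a perfectly pure split scores zero under both, and SM penalises impurity on both sides while MM only looks at the worse side.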
Classifying irises
For our first experiment, we ran k-DT on Fisher's iris data, a well known dataset that has been the subject of numerous other machine learning studies (see Holte, 1993 for a recent summary). The data consists of 150 examples, 50 each of three different types of irises: setosa, versicolor, and virginica. Each example is described by numeric measurements of width and length of the petals and sepals. We performed 35 ten-fold cross validation trials using SADT. In an x-fold cross validation trial, one divides the data into x approximately equal subsets and performs x experiments. For each subset s, we train the learning system on the union of the remaining x-1 sets and test on set s. The results are averaged over these x runs. Our results on the iris data are shown in Table 1.
Table 1. Iris results for k-DT. [Table body not recoverable. Columns: Goodness Criterion; Average Error Rate (%); Error Rate with 11 Trees; Reduction in Error; Best Accuracy (Error Rate, Number of Trees).]
Shown in the table is the accuracy obtained when, for each training- and test-set pair, we take the majority vote of 11 trees when classifying the test set. Note that the accuracy when using the majority voting scheme is consistently higher than when using single SADT trees. Also shown in Table 1, in the last two columns, are results from the single best tree of the 35 different trials. Weiss and Kapouleas (1989) obtained accuracies on this data of 96.7%, 96.0%, and 95.3% with backpropagation, nearest neighbor, and CART, respectively. Their results were generated with leave-one-out trials, i.e., 150-fold cross validation.
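The x-fold cross validation procedure used throughout these experiments can be sketched as below. This is a generic illustration rather than the experimental harness used in the chapter: `cross_validation_folds`, `cross_validate` and the `train_and_test` callback are hypothetical names, and the shuffling seed is an arbitrary choice.

```python
import random


def cross_validation_folds(n, x, seed=0):
    """Split indices 0..n-1 into x approximately equal, disjoint subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Taking every x-th index gives folds whose sizes differ by at most one.
    return [indices[i::x] for i in range(x)]


def cross_validate(examples, x, train_and_test, seed=0):
    """Run one x-fold trial and return the score averaged over the x runs.

    `train_and_test(train, test)` is a caller-supplied function (an
    assumption of this sketch) that trains on `train` and returns the
    accuracy measured on `test`.
    """
    scores = []
    for fold in cross_validation_folds(len(examples), x, seed):
        held_out = set(fold)
        train = [e for i, e in enumerate(examples) if i not in held_out]
        test = [examples[i] for i in fold]
        scores.append(train_and_test(train, test))
    return sum(scores) / x


folds = cross_validation_folds(150, 10)
print([len(f) for f in folds])
```

Leave-one-out, as used by Weiss and Kapouleas, is the special case where x equals the number of examples; repeating the trial with different seeds gives the multiple trials that are averaged in the tables.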
Choosing a value for k
How did we choose k=11 for our k-DT trees? Intuitively, it may seem that the more trees used in the voting process, the higher will be the combined accuracy. However, if an example is somehow 'difficult' to classify, then voting will only make it less likely that the example is classified correctly by the committee of trees.
Figure 3 is a plot of average classification accuracy on the iris data set, as the number of trees in the voting process is varied. Note that there is a big jump in accuracy even when only three trees are used. The max-minority and information gain measures peak fairly early and begin to drop off, whereas the sum-minority measure is still increasing in accuracy at thirty-five trees.
Figure 3. Iris classification accuracy versus number of trees. [Plot not reproduced; horizontal axis: number of trees; legible legend entries include max-minority and sum-minority.]
We have compromised by using eleven trees, which appears to work well in practice. Table 1 shows the average classification accuracy when using eleven trees for voting. Also shown is the classification accuracy for the optimal choice of k. (The optimal choice in the table is limited by the number of cross validation trials we have run, since we only had that many trees to work with.) The choice of 11 trees worked well for the iris dataset. The accuracy obtained with this number of trees was at least as good as any other number of trees we tried for two of the goodness measures and still quite good for the third.
At this point, it is worth considering whether these results are to be expected. For each example x in the iris data set, we computed the percentage p(x) of times it was correctly classified in our tests. Figure 4 shows, for a given percentage p, the fraction of the examples for which p(x) = p. (Note that the figure is an average over all three goodness criteria.) This gives us a rough estimate of the probability of the average tree classifying that example correctly. First, note that a vast majority of the examples are always or nearly always classified correctly. Approximately 4.4% of the examples are predicted correctly less than half of the time. These are the examples that we would expect to be classified incorrectly if we were to take a majority vote over a large number of trees. We note that this percentage is close to the error rate obtained with k-DT. The suggestion is that our majority voting scheme is obtaining very close to the maximal accuracy possible for this data.
Figure 4. Percentage of iris examples achieving a given accuracy. [Plot not reproduced; horizontal axis: probability of correct classification.]
Applying k-DT to cancer diagnosis
For our second experiment, we chose a dataset that has been the subject of experiments that classified the data using oblique hyperplanes (Bennett and Mangasarian, 1992). This dataset contains 470 examples of patients with breast cancer, and the diagnostic task is to determine whether the cancer is benign or malignant. The input data comprised nine numeric attributes, hence our decision trees used oblique hyperplanes in 9-D.
Mangasarian's method uses linear programming to find pairs of hyperplanes that partition the data. The algorithm finds one pair of parallel hyperplanes at a time, and each pair can be oriented at any angle with respect to all other pairs. The resulting model is a set of oblique hyperplanes, similar in spirit though very different in structure from an oblique decision tree. Because Mangasarian received the data as they were collected in a clinical setting, their experimental design was very simple. They trained their algorithm on the initial set of 369 examples. Of the 369 patients, 201 (54.5%) had no malignancy and the remainder had confirmed malignancies. On the next 70 patients to enter the clinic, they used their algorithm for diagnosis, and found that it correctly diagnosed 68 patients. We used 68/70 = 0.97 as a rough estimate of the accuracy of Mangasarian et al.'s method. They then re-trained their algorithm using the 70 new patients, and reported that it correctly classified all of the next 31 patients to enter the clinic. Mangasarian reported that his program's output was being used in an actual clinical setting. Using the same dataset with a more uniform experimental design, Salzberg (1991) reported that the EACH program, a nearest hyperrectangle classifier, obtained 95% classification accuracy, and 1-nearest-neighbor had 94% accuracy. The results of our tests on this data are shown in Table 2. The average values are the average of 36 ten-fold cross validation trials. Once again, the accuracy obtained by using an 11-tree committee of classifiers is consistently higher than that of the average tree. In this example, the SM goodness criterion did quite a bit better on average than
the other two, but it benefitted less from the use of the majority technique. It is possible that by taking the majority, we are able to overcome weaknesses in the other two criteria that are not as significant with SM.
Table 2. Breast cancer malignancy diagnosis with k-DT. [Table body not recoverable. Columns: Goodness Criterion; Average Error Rate (%); Error Rate with 11 Trees; Reduction in Error; Best Accuracy (Error Rate, Number of Trees).]
We also see that using eleven trees is a good choice for this dataset as well. Only for the max-minority energy measure was there a noticeable difference in accuracy between the optimal choice for the number of trees and our choice of eleven.
Identifying stars and galaxies
In order to study the performance of k-DT on larger datasets, we ran several experiments using astronomical image data collected with the University of Minnesota Plate Scanner. This dataset contains several thousand astronomical objects, all of which are classified as either stars or galaxies. Odewahn et al. (1992) used this dataset to train perceptrons and backpropagation networks to differentiate between stars and galaxies. We did not have access to the exact training and test set partitions used by Odewahn et al., so we used a cross validation technique to estimate classification accuracy. The Odewahn et al. study used a single training/test set partition. Although our results may not be exactly comparable to theirs, we include them to show that both learning methods produce similar accuracies. Our results were generated by averaging 19 ten-fold cross validation trials. The astronomy dataset consists of 4164 examples. Each example has fourteen real-valued attributes and a label of either 'star' or 'galaxy'. Approximately 35% of the examples are galaxies.
Table 3. Star/galaxy classification with k-DT. [Table body not recoverable. Columns: Goodness Criterion; Average Error Rate (%); Error Rate with 11 Trees; Reduction in Error; Best Accuracy (Error Rate, Number of Trees).]
Classification results are shown in Table 3. Odewahn et al. (1992) obtained accuracies of 99.7% using backpropagation and 99.4% with a perceptron. It appears, however, that their results were generated with a single trial on a single partition into test and training sets. In fact, we obtained a ten-fold cross validated accuracy of 99.1% using a perceptron. (We had individual runs in which we obtained even higher (99.8%) accuracy, but our average results are a better estimate of the true accuracy.) Using a majority classifier increased classification accuracy for this data set, as in the other studies. For the max-minority goodness criterion, we were able to reduce the error rate by almost 60%. Using eleven trees for the majority classification was a good choice for this dataset. The results for eleven trees were at least as good as for any other number of trees (up to fifteen, the number of cross validation trials we ran).
Comparison with other methods
In Heath (1992), we explored several techniques for taking advantage of randomization in learning algorithms. Our focus in that work was on techniques that generate many trees, and use some additional criteria to select the best tree, which we then measure on the testing set. In this section, we compare those techniques to the majority classification technique.
Table 4. Error rates of k-DT compared to other methods. [Table body not recoverable; legible header fragments: Goodness, Dataset, Test Set, Iris, Malignancy, SM.]
One of our criteria for choosing the best tree was to choose the smallest trees. The intuition behind this technique is that smaller trees may be more concise descriptions of the problem domain, less sensitive to noise in the training data, and have a lower chance of being generated through overtraining. In addition, smaller trees are easier to understand by domain experts, and therefore more likely to be adopted. For each of the ten pairs of training and testing sets in a ten-fold cross validation, we generated several SADT trees, and then chose the smallest. We then averaged the accuracy and size of
the ten chosen trees. If, for given training and testing sets, there was more than one smallest tree, we averaged them before averaging them with the other nine. In another experiment (Heath, 1992) we split the training set 70/30 and trained only using 70% of the training set. The other 30% was used as a second test set. We used it to test the tree and assign it a figure of merit. We ran this several times, choosing different 70/30 splits each time and choosing the trees with the highest figures of merit. We then tested those trees on the real test set. In Table 4, we compare k-DT with these two approaches. All three techniques gave some improvement in accuracy, although the method of choosing trees by size was not very consistent. In some cases, small trees were actually worse than average trees. k-DT always performed better than using a separate test set to judge trees. It nearly always performed better than picking the smallest trees. The only exception to this was for two goodness criteria used on the iris data.
CONCLUSION
We have explored the idea of taking advantage of the randomization inherent in some recent machine learning techniques, by generating a committee of classifiers and combining them with a majority voting scheme. We first observed that committees can in some situations produce consistently better classifiers than a single system operating alone. However, the optimal size of the committee depends on the domain, and must be determined empirically. We have experimented with this technique on SADT, a randomized oblique decision tree algorithm. Our results show that committees containing a relatively small number (10-15) of SADT decision trees consistently perform better than average SADT trees, which in turn perform better than standard axis-parallel trees (Heath, 1992; Murthy et al., 1994). The consistency and degree of improvement are better than those of other techniques we have considered for increasing accuracy through randomization.
This work is still in its early stages; we have not tried to apply the majority technique to other types of randomized learning algorithms. However, this is a clear opportunity for future experiments. We would also like to explore combining this technique with other ideas for improving classifiers. For example, we would like to try the majority technique on trees which are smaller than average, to see if we can get any further improvements in accuracy. We also plan to explore constructing committees with a more diverse group of classifiers, including not only decision trees but also memory-based classifiers, statistical classifiers, and other methods that may be appropriate.
REFERENCES
Bennett, Kristin, and Olvi Mangasarian, 1992. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software 1: 23-34.
Breiman, Leo, Jerome Friedman, R. Olshen, and C. Stone, 1984. Classification and Regression Trees. Belmont, Massachusetts: Wadsworth International Group.
Heath, David, 1992. A Geometric Framework for Machine Learning. Ph.D. thesis, Johns Hopkins University, Baltimore, Maryland.
Holte, Robert, 1993. Very simple classification rules perform well on most commonly used datasets. Machine Learning 11: 63-90.
Jacobs, R., M. Jordan, S. Nowlan, and G. Hinton, 1991. Adaptive mixtures of local experts. Neural Computation 3: 79-87.
Lincoln, W., and J. Skrzypek, 1990. Synergy of clustering multiple back propagation networks. In David S. Touretzky, ed., Advances in Neural Information Processing Systems 2, 650-657. San Mateo, California: Morgan Kaufmann.
Murthy, Sreerama, Simon Kasif, and Steven Salzberg, 1994. A system for induction of oblique decision trees. Journal of Artificial Intelligence Research 2: 1-33.
Odewahn, S.C., E.B. Stockwell, R.L. Pennington, R.M. Humphreys, and W.A. Zumach, 1992. Automated star-galaxy discrimination with neural networks. Astronomical Journal 103(1): 318-331.
Quinlan, J. Ross, 1986. Induction of decision trees. Machine Learning 1: 81-106.
Quinlan, J. Ross, 1987. Generating production rules from decision trees. Proceedings of Tenth International Joint Conference on Artificial Intelligence, 304-307. San Mateo, California: Morgan Kaufmann.
Quinlan, J. Ross, 1993. C4.5: Programs for Machine Learning. San Mateo, California: Morgan Kaufmann.
Salzberg, Steven, 1991. A nearest hyperrectangle learning method. Machine Learning 6: 251-276.
Weiss, Sholom, and I. Kapouleas, 1989. An empirical comparison of pattern recognition, neural nets, and machine learning classification methods. Proceedings of the International Joint Conference of Artificial Intelligence, 781-787. San Mateo, California: Morgan Kaufmann.
Wolpert, David, 1992. Stacked generalization. Neural Networks 5: 241-259.
Zhang, Xiru, Jill Mesirov, and David Waltz, 1992. A hybrid system for protein secondary structure prediction. Journal of Molecular Biology 225: 1049-1063.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 19
A LEARNING ENVIRONMENT TO TEACH PLANNING SKILLS
Roger C. Schank & Sandor Szegő
The Institute for the Learning Sciences*
Northwestern University, USA
INTRODUCTION
People differ from each other. They are interested in different activities, they know different things, they learn at different rates, and so forth. Everyone knows this simple fact, yet today's schools pretend it is not the case. Schools force students to learn the same things, to learn them at the same pace (or risk not learning them at all), and to learn them the same way. There is something horribly wrong with this picture. It should come as no surprise, then, that after twelve years of schooling most students are turned off by the idea of learning, and they remember very little of what they were supposedly taught. Most of what they do know after high school is information and skills they picked up outside the classroom.
Educational theorists (e.g., Dewey, 1916) recognized the problem a very long time ago. But their arguments for change remained just arguments, and schools continued to do their business with little or no progress over the last eight decades. The main reason for this resistance to change can be found in the economic constraints on the modern schooling system. Before the current school system was invented, one-on-one tutoring and apprenticeship were the main forms of education. But to make education accessible to the masses, these paradigms had to be abandoned, and the modern school system with its 30:1 student-teacher ratio was invented.
Newly emerging technologies can finally provide the substrate for real progress in education. One of our missions at the Institute for the Learning Sciences (ILS) is to create learning environments which allow students to learn what they want, when they want it. The challenge for educators in this re-emerging one-on-one tutoring paradigm is to understand the real needs of society and to design learning environments that prepare students for those needs. In this paper we describe a simulated learning-by-doing environment to teach planning skills.
In the first section we describe what we mean by planning (or problem solving), and we contrast our idea of problem solving with traditional ones. The next section outlines the skills we associate with planning so defined. Then we introduce a class of learning environments, Goal Based Scenarios, that can teach the skills identified in the previous section. Finally, we briefly describe the system we developed at ILS.
* The Institute for the Learning Sciences was established in 1989 with the support of Andersen Consulting. The Institute receives additional support from Ameritech and North West Water, Institute Partners.
PROBLEM SOLVING IN THE REAL WORLD
Real-life vs. classical problem solving
There are a number of tutoring systems that teach problem-solving skills, for example, tutoring systems for programming and geometry (Anderson & Reiser, 1985; Anderson, Boyle & Yost, 1985; Reiser, et al., 1991). These systems assume that problem solving is a two-stage process: first, planning the solution to the problem, where the output is a list of actions, and second, executing the plan. Furthermore, they assume that the second stage, execution, is so simple that it need not be taught. Hence, traditional tutoring systems focus on teaching the first stage only. However, real-life problem solving, such as managing a farm, is not a two-stage process and cannot be grouped with traditional problem-solving models, because the two differ in the following basic features.
The current state and the goal state. In classical problem solving domains the problem statement very often describes both the current state and the goal state, and the current situation is frequently described by stating the relevant features only. Let us look at a fairly typical problem taken from geometry: "Design a triangle if two sides and the enclosed angle are given." Every piece of information in this statement refers to a component that the problem solver needs to use in the solution. Real-life goals, on the other hand, are rarely, if ever, well defined. In fact, it is up to the problem-solver to identify the goal or goals he wants to achieve, or even to realize that there is a problem to solve. For instance, a farmer is not told that the rainy season is approaching, which might cut the harvest short.
He has to select the important features of the situation and then determine whether or not the current situation is a problem. Should the weather forecast be consulted to determine how much labor to hire for harvest? How about the weather records? Is the cost of the extra labor something to worry about?
Goal importance. In classical problem solving all goals are equally important. If a problem requests the construction of a triangle with sides a and b and angle c, the problem solver cannot simply say: "I don't know how to construct such a triangle, but I do know how to construct a triangle with sides a and b'. That should be good enough." But this type of answer is very often the only one we can give in real-life problem situations, because we often pursue multiple goals at once, only to realize later that some of these goals are impossible, while others are in conflict with each other. Since our goals are not all equally important (i.e., there is a value assigned to real-life goals), we might redefine or abandon less crucial goals during the solution process to achieve more important ones.
Effects of actions. The operators in mathematics, computer programming and other traditional problem solving domains have clearly determined applicability conditions and effects. To give an example from geometry again, two distinct points always determine a line. Unlike mathematics and programming, the world we live in is a very complex and dynamic system; therefore, apart from the simplest situations, the mechanics of the actions we take can never be completely understood. For instance,
fertilization depends on so many variables (e.g., the amount and distribution of moisture in the soil, the consistency of the soil, and what other chemicals are present) that no one can predict with certainty what its effects will be.
Single vs. multiple agents. In classical problem solving domains the problem solver is the only agent who can change the state of the "world"; he need not worry about interference from other, unpredictable agents. In geometry, the lengths of line segments do not change randomly once they are drawn, but in the real world it is hardly ever the case that a situation is completely under our control. A farmer needs to worry about factors such as how the market values different crops, or what the weather will be like in the near and not so near future, factors that are beyond his control.
Correct vs. incorrect plans. A solution to a classical problem is a sequence of actions that transforms the initial state into the goal state. Since the initial and goal states as well as the operators are well defined, it is possible to label a plan as "correct" or "incorrect" without executing it. However, the notion of correctness in real life is problematic. If we measure the correctness of a plan by its outcome, then solutions to real-life problems can be labeled "correct" or "incorrect". But we cannot assess the success of a plan by simply examining the sequence of actions it contains. This assessment can only be done after the plan is carried out.
This measure of correctness is problematic also because it is very simplistic. It does not take into account that goals might be redefined during execution, even though we all know that real-life problem solving is very often about making compromises. Nor does it acknowledge that the same plan might work (i.e., be classified as correct) under one set of circumstances and fail under others. Thus, assigning "correct" or "incorrect" labels to plans makes little, if any, sense.
One consequence of these differences between classical and real-life problem-solving is that planning and execution cannot be treated as separate issues. First, the planning stage cannot produce a simple list of actions that can easily be carried out at execution time. Rather, the output of planning is a mixture of guidelines, concrete actions, and actions contingent on conditions not known at planning time. For example, a farmer can "plan" to plant wheat, fertilize and, if enough funds are still available, hire some labor to kill weeds, etc. Not only are these actions incompletely specified, but there are no guarantees that they will be carried out at all; unforeseen events might introduce new actions to carry out instead of the planned ones. Second, execution is clearly a much more involved process than classical approaches assume. As goals get redefined during execution, the existing plan must be modified, and actions are further specified.
In summary, the neat plan-then-execute model of problem-solving breaks down when applied to realistic situations and problems, because planning and execution are very often interleaved. In fact, a large part of being a good planner involves knowing when to plan, how much to plan, and how to determine when plans need to change and goals need to be revised (Pea, 1982). The ramifications of these observations for learning environments will be made clear in later sections of this paper.
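To make the idea of a plan that contains contingent actions more concrete, the farmer example above can be sketched as follows. This is only a minimal illustration under stated assumptions: the Step class, the execute function, the "funds" state variable and the threshold of 500 are all hypothetical and are not part of the system described in this paper. The sketch covers only the point that some steps can be decided no earlier than execution time; it does not model goal revision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, float]


@dataclass
class Step:
    name: str
    # Condition checked at execution time; None means the step is unconditional.
    condition: Optional[Callable[[State], bool]] = None


def execute(plan: List[Step], state: State) -> List[str]:
    """Carry out the plan, deciding each contingent step against the
    state of the world as it is when the step comes up."""
    carried_out = []
    for step in plan:
        if step.condition is None or step.condition(state):
            carried_out.append(step.name)
    return carried_out


# Hypothetical plan: the last step depends on funds that are
# unknown at planning time (the threshold 500 is an assumption).
plan = [
    Step("plant wheat"),
    Step("fertilize"),
    Step("hire labor to kill weeds", condition=lambda s: s["funds"] >= 500.0),
]
```

Running the same plan in two different worlds, for example execute(plan, {"funds": 800.0}) and execute(plan, {"funds": 100.0}), yields different action sequences, which is why such a plan cannot be labeled "correct" or "incorrect" before it is carried out.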
REAL-LIFE PLANNING SKILLS
Solving problems in realistic situations requires a huge amount of knowledge on the part of the problem-solver. The question is: What type of knowledge does the problem-solver need in order to be successful?
Knowledge taught in traditional tutoring systems
Tutoring systems for classical problem-solving attempt to teach two types of knowledge:
• Operators. In order to be a good problem-solver in a domain, one needs to know when certain operations can be applied and what their effects are. While it is very important to know the mechanics of the domain, this knowledge cannot be applied in other domains.
• General weak-methods. Hill-climbing and backward-chaining are just two of the strategies that these systems tend to teach. While these strategies are general enough to be applied in other domains, applying them is very hard precisely because they are so general.
Knowledge needed for real-life problem-solving
Factual domain knowledge and weak-methods form only a very small subset of the types of knowledge we use in real-life planning and problem-solving. First of all, knowledge about the applicability and effects of actions cannot be packaged in well defined operators, due to the complexity and uncertainty of the world. Moreover, very specific domain operators would not allow for any meaningful transfer of planning skills. So, in addition to knowledge about operators and weak-methods, a good planner needs to possess the following types of knowledge as well.
Cases
Good problem-solvers know much more than the "operators" of the domain. For example, a good farmer knows a lot of cases about farming. He might remember the year when high yield expectations early in the year were nullified by a very short harvest season. The short harvest season case can then be used to decide when to start the harvest this year. Of course, this means that the farmer needs to know which case is applicable in a given situation. In general, cases can be used when neat, explicit theories of the world and of actions do not exist. In the complex world of our everyday actions, they are our best bet to guide us in our decisions. A good problem-solver has a large library of cases and knows when to apply them in the reasoning process (the indexing problem).
Typical problem situations
Related to cases, but possibly more general, is the knowledge of typical problem situations. While cases are usually very domain specific, these problem situations are (or can be) described in more general terms. A good problem-solver can apply his knowledge of problems in one domain to novel problems in another domain. To do so, problems need to be characterized in somewhat domain-independent terms. For example, a farmer might know that long dry spells occasionally occur in the region, and that they are especially dangerous for wheat growing.
Therefore, dependence on wheat as the only crop can result in devastating outcomes for the farming operation. This knowledge can be cast in more abstract terms. The farmer might know that relying on a crop that can have very low yield under circumstances beyond his control can be devastating. The more abstractly he can construe the current problem, the more knowledge he can use to solve it.
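A small worked example may help to show why relying on a single vulnerable crop is risky. The sketch below is purely illustrative: the profit figures, the probability of a dry year, and the choice of soybean as a second crop are assumptions made here for the sake of the arithmetic, not data from the Farmer GBS. It compares a wheat-only farm with a half-wheat, half-soybean farm on two measures, expected profit and worst-case profit.

```python
from typing import Dict, Tuple

# Hypothetical profit per unit of land in a good year and in a dry year.
GOOD_YEAR = {"wheat": 100.0, "soybean": 60.0}
DRY_YEAR = {"wheat": -80.0, "soybean": 20.0}
P_DRY = 0.2  # assumed probability of a long dry spell


def profit(shares: Dict[str, float], per_crop: Dict[str, float]) -> float:
    """Total profit for a planting plan given per-crop profits."""
    return sum(share * per_crop[crop] for crop, share in shares.items())


def summarize(shares: Dict[str, float]) -> Tuple[float, float]:
    """Return (expected profit, worst-case profit) for a planting plan."""
    good = profit(shares, GOOD_YEAR)
    dry = profit(shares, DRY_YEAR)
    expected = (1.0 - P_DRY) * good + P_DRY * dry
    return expected, min(good, dry)


wheat_only = {"wheat": 1.0}
mixed = {"wheat": 0.5, "soybean": 0.5}
```

Under these assumed numbers the wheat-only plan has the higher expected profit (about 64 against about 58) but a far worse worst case (a loss of 80 against a loss of 30). The same trade-off between average return and exposure to a factor beyond one's control can be stated without mentioning crops at all, which is what makes the problem description transferable.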
Strategies to deal with problems
Of course, knowing classes of problems would be useless if we did not know how to deal with them. So, a good problem-solver must have a large collection of general planning strategies and links to the problem types they address. To use the example above, a farmer might know that planting soybean and wheat, while less profitable on average, might prevent the possibly disastrous outcome of planting wheat only.
HOW CAN WE TEACH GOOD PLANNING SKILLS?
In Engines for Education (Schank & Cleary, 1994) we argued that the best way to learn any skill is by doing it. However, sometimes it is too expensive, dangerous or simply impractical to put novices in real situations where these skills can be learned. (Flying an airplane is clearly one of these activities.) As an alternative to real situations, we can create realistic simulations of the environment. The best example of this approach is the flight simulator. It recreates the look and feel of the cockpit to the last detail, and TV screens in the windows of the cockpit provide scenery as you would see it from the simulated position and orientation of the aircraft.
Though realistic, simulators alone do not make good learning environments. While they allow learners to get actively engaged in the task they are supposed to learn, they do not automatically satisfy other requirements of good learning environments. Besides providing a realistic task environment, good learning environments must:
• Provide a goal that is inherently interesting for the learner to pursue;
• Enable learners to fail in the activity. Failures prompt them to ask questions, which then leads to learning (Schank, 1982); and
• Provide coaching when the learner needs it. Very often, failures are good opportunities to impart some knowledge the learner needs. At other times, coaching can guide the learning process to avoid floundering, or direct the learner towards more interesting issues in the domain.
Goal Based Scenarios (GBSs), developed at the Institute for the Learning Sciences, are such learning environments. The underlying principle of GBSs is that in order to teach some set of skills the student should be allowed to play the role of a person who would need those skills. This way (1) students can practice the skills they need (i.e., they can learn by doing), (2) they can practice them in a realistic situation, and (3) the activity is meaningful and motivating. For example, in order to teach some skills associated with genetics (e.g., determining the likelihood of having an offspring with a given gene) the student should play the role of a person who needs this skill (e.g., a genetics counselor). While the alternative of making students solve textbook problems in genetics allows them to practice the required skill, it falls short of motivating the activity and it does not provide the rich context that enables good indexing of the learned skill. That is, a GBS teaches not only how to do something, but also the circumstances in which the skill is useful. Teaching a skill in a realistic environment results in richly indexed memory structures. This, in turn, means that these memory structures will be recalled and used exactly when they are needed. Together with the realistic role, students need a clear and interesting mission that identifies the goals they should achieve. The mission of a GBS needs to fulfill two major requirements. First, it should be clear to the student when the objectives of the
mission are achieved. Vague goals do not suggest ways to achieve them, which can cause a student to flounder and to give up prematurely. Second, the mission should make clear that the target skills are applicable outside the scope of the specific GBS.
Specific requirements for a planning GBS
To identify the particular requirements of a planning GBS we need to address two issues. First, we need to determine a coherent set of activities where planning skills are required. Second, we need to identify the coaching methods the system has to support.
Activities
First of all, good planners can create and execute plans in many related domains and under many different circumstances. That is, someone who is a good farmer, but not necessarily a good planner, might be able to run a particular wheat farm very well. Yet he might not be able to run a farm in a different part of the country, or he might be unable to farm any crop other than wheat. On the other hand, a good farmer who is also a good planner can farm in different parts of the country and can farm different crops. Thus, the first requirement of the planning GBS is that the student should be able to practice and learn planning skills in multiple domains. How different or similar should these domains be? There is no simple answer to this question. It is clear that the domains or circumstances should not be arbitrarily dissimilar. A good farmer is not necessarily good at making verbal arguments about farming. While both activities require good planning skills, the domains are so different that the shared planning knowledge must be highly abstracted and far removed from the actual context of the activity.¹ And, as we pointed out earlier, the context of the activity has a lot to do with why learning by doing works in the first place. So, to answer the question, it is not the domains but the structure of the activity that is important to ensure that useful planning skills can be learned.
For example, farming and trucking might be vastly different domains, but the activities of a farmer and of the owner of a trucking company have a lot in common. They both have to decide where to start their operation, what types of tools to buy, etc. As a result, they both have to ask very similar questions to make these decisions: Can they afford to operate a big farm/company? Will the tools they are about to buy have enough capacity to serve the needs of the farm/company? In summary, a planning GBS should enable the learner to perform the same activity under varying circumstances and in different domains.
Coaching
The coaching component of the planning GBS has to present cases, provide help in detecting a problem, support reflection on the problem-solving process, and make connections between the different domains and circumstances. We will discuss each of these functions in more detail.
Case Presentation. We discussed earlier that good problem-solvers have a large library of well indexed cases to use when faced with a new problem, so a tutoring system should present relevant and important cases when the learner needs them.
¹ That there is sharable abstract knowledge is beyond doubt. For example, "Don't put all your eggs in one basket" makes perfect sense both in farming and in verbal arguments about farming.
Presenting cases involves two separate issues: When should the system present a case, and which case should it present? People learn a lot by failing; consequently, cases should be presented when some expectations of the learner are violated.² The case-presenter should be able to track the learner's expectations and intervene when they are violated. But the learner's expectations are idiosyncratic, so tracking them is not possible in general. In spite of this handicap, some methods of detecting expectation failures can be employed. For example, the system can explicitly ask for the learner's expectation about some issue, or it can identify the goals of the learner and detect when they fail. Since people expect to achieve their goals, a goal failure is a very good indicator of an expectation failure. Once an opportunity to intervene is detected, the system needs to decide which case to present. Cases can be used to achieve different effects in the learner, so identifying the most appropriate case must involve the teaching goals of the system. Depending on the actual situation, a case might be used to show that the learner could not have done anything better. In essence, these cases convey the "that's life" idea to the learner and encourage him to look beyond the current failure. A novice farmer, for instance, might think that he could have done something to prevent his wheat from falling to a disease. Presenting a story where an expert farmer discusses the same or a similar situation would help the learner understand that there are certain problems that cannot be solved. Another type of case can show how to deal with a problem. For example, a different story might show a farmer who saved some of his wheat by using a lesser-known fungicide. This can prompt the learner to learn more about pesticides and search for one that can cure the disease. A case can also be used to show the scope of the problem.
A story where instead of wheat, corn fell to the same type of disease, might convey the idea that the current problem is not specific to wheat, but it is common to some set of crops. Finally, a case can be used to help the learner identify the cause of the problem (i.e. credit or blame assignment). A story might talk about a farmer who uses a certain type of pesticide at the time of planting to prevent the disease. This case can help learning at two levels. First, the student can learn that using the given technique can prevent the disease. Second, the more general idea of prevention vs. treatment can be conveyed through this example.
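The four uses of cases described above suggest that a case library needs to be indexed by more than the problem alone. As a rough sketch, and only under our own assumptions, a case could be retrieved by the pair (problem, teaching goal). The Case class, the select_case function, and the four goal labels below are hypothetical names introduced for this illustration; they do not describe the actual data structures of the system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Case:
    title: str
    problem: str        # abstract problem the story is indexed under
    teaching_goal: str  # the effect the story is meant to have on the learner


# Hypothetical labels for the four uses of cases discussed in the text.
UNAVOIDABLE, REMEDY, SCOPE, CAUSE = "unavoidable", "remedy", "scope", "cause"

library: List[Case] = [
    Case("Expert farmer on losing wheat to disease", "crop disease", UNAVOIDABLE),
    Case("Wheat saved with a lesser-known fungicide", "crop disease", REMEDY),
    Case("Corn falls to the same disease", "crop disease", SCOPE),
    Case("Pesticide applied at planting time", "crop disease", CAUSE),
]


def select_case(cases: List[Case], problem: str,
                teaching_goal: str) -> Optional[Case]:
    """Return the first case matching both the failed goal's problem
    and the tutor's current teaching goal, or None if there is none."""
    for case in cases:
        if case.problem == problem and case.teaching_goal == teaching_goal:
            return case
    return None
```

In this sketch the trigger for calling select_case would be a detected goal failure, and the teaching goal would be chosen by the tutor; how either of these is determined is left open here.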
Problem detection. In domains where we are very knowledgeable, problems are fairly easy to detect, but if we have less experience in a domain, we might not have clear expectations about the world. A novice farmer, for example, might set his expectations too low or high with respect to wheat yield. Good problem solvers can use very good strategies to identify problems even in domains that are less familiar to them. To check their expectations they might look at other agents who perform the same or similar activity, or they might study the history of the domain to see how others performed, and adjust their expectations accordingly. A good tutoring system should encourage and scaffold the use of these and other problem-detection strategies. Reflection & articulation. In order to "separate" strictly domain and context bound knowledge from more general knowledge, the system should enable learners to articulate their problems and to reflect on what they have done.
² Another possible use of cases is to show some surprising outcome that the learner might not have expected. While this use has its merits, we are not going to discuss it in this paper.
Connecting domains. To facilitate the emergence of more general planning knowledge, coaching needs to connect the mistakes and situations the learner has encountered in different domains. During this process the system should provide a common vocabulary for describing the problems and situations in these domains; furthermore, it should encourage the use of this vocabulary during reflection and articulation.
Based on the principles laid out above we have started to develop a goal based scenario to teach planning skills. The current version implements a farming scenario where students need to run a farm, but the architecture supports other production scenarios as well. Below we describe what production scenarios are, how the Farmer GBS is implemented, and how the system realizes the coaching principles outlined above.
Production scenarios
People often carry out tasks to produce something. The task might be the production of some service (e.g., trucking, mail delivery) or of materials (e.g., food products, crops, other material goods). These tasks have many features in common; they all require some location(s) and tools to produce the product. Decisions about where to locate the operation and what tools to use are made fairly rarely. Thus, tools and the location are typically called the "fixed inputs" of production. In farming, fixed inputs are the farm itself, tractors and combines, irrigation systems, etc. In trucking, fixed inputs are the main garage, the trucks, gas stations, etc. Fixed inputs are necessary but not sufficient to produce a product; we also need the components from which it is made, and the energy to make it. The components and energy together are called the variable inputs of the production system, because the easiest way to vary the amount produced is to change one of the variable inputs. In farming, variable inputs are fertilizers, irrigation (water and energy), seeds, etc. In trucking, they are labor, gasoline, etc. The task of production is a cyclical execution of the following steps: determine production quantity, determine problems with production, identify causes, determine possible actions, select and execute actions. The set of actions in a production scenario is fairly limited: change variable input(s), acquire new tools, repair tools, and maintain tools. Production tasks are interesting for our purposes because they are carried out in a very complex environment. That is, how much gets produced is only partially determined by the levels of variable inputs and fixed inputs. Environmental factors can influence production beyond the producer's control. For example, too much or too little rain in the crucial period of crop growth can adversely affect the final yield. Similarly, changing road conditions and road regulations can affect a trucking company's production.
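The production cycle described above can be illustrated with a toy model. The following sketch is not the simulator used in the Farmer GBS; the Operation class, the yield formula, the numeric constants, and the function names are all assumptions made here to show how fixed inputs, variable inputs and an environmental factor could interact in one pass through the cycle (determine quantity, detect a problem, identify a cause, select an action).

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Operation:
    fixed_inputs: Dict[str, int]       # rarely changed, e.g. tractors
    variable_inputs: Dict[str, float]  # easily changed, e.g. seed


def capacity(op: Operation) -> float:
    """Assumed upper bound on output imposed by the tools (fixed inputs)."""
    return 100.0 * op.fixed_inputs.get("tractors", 0)


def produce(op: Operation, rainfall: float) -> float:
    """Toy yield: driven by seed, capped by tool capacity, scaled by weather.
    A rainfall of 1.0 stands for ideal conditions."""
    potential = min(capacity(op), 10.0 * op.variable_inputs.get("seed", 0.0))
    weather_factor = max(0.0, 1.0 - abs(rainfall - 1.0))
    return potential * weather_factor


def production_step(op: Operation, rainfall: float,
                    target: float) -> Tuple[float, str]:
    """One pass through the cycle: returns (quantity, suggested action)."""
    quantity = produce(op, rainfall)
    if quantity >= target:
        return quantity, "none"
    if abs(rainfall - 1.0) > 0.25:
        return quantity, "weather"  # cause lies outside the producer's control
    if 10.0 * op.variable_inputs.get("seed", 0.0) >= capacity(op):
        return quantity, "acquire new tools"
    return quantity, "change variable input"
```

For instance, with one tractor and five units of seed in ideal weather, production_step(Operation({"tractors": 1}, {"seed": 5.0}), 1.0, 80.0) reports a shortfall whose remedy is to change a variable input, whereas the same farm in a dry season is told that the weather is the cause.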
In summary, production of some product in a complex world has all the features of real-life problem situations and therefore it is a good candidate for teaching real-life planning skills.
The Farmer GBS: an example
The mission and the cover story. In the Farmer GBS, the student is given the role of a farmer. He is told that his farm has been in his family's possession for centuries.
Now, a newly emerging oil company wants to buy the land to use it for storing waste products. Unless he can turn the farm into a very profitable one in the coming 5 years, the local bank will repossess the land and sell it to the company. He is also told that he will get some help from Bob, the foreman of his farm. After the introduction, the foreman greets the student (George in this example) and recommends that he plant wheat on the farm because wheat is usually the most productive crop on this farm. Figure 1 shows the screen as the foreman introduces himself. At this point the student can do one of the following: ask for information about the foreman's suggestion, follow the suggestion, or ignore it and determine tasks from scratch. Since the student has little knowledge about farming, he decides to follow the foreman's advice. He then decides to advance the simulator by three months to check on his farm's progress. After three months the student's farm does not look too promising (Fig. 2). The foreman greets the student again and informs him that some disease has attacked the wheat (Fig. 3). Bob suggests using a particular type of pesticide, but after further inquiry about the pesticide (its effects and cost), the student decides not to use it because it would not be cost effective. Eventually harvest time arrives, and the student is informed that the yield is so low that he has no profit after the first year. At this point some tutorial advice becomes available, so the "Ask Expert" button is highlighted. After the student clicks on the "Ask Expert" button, the strategy expert summarizes the problem (i.e., the wheat yield was low because of a disease) and suggests a possible strategy to use in the future to avoid this problem. Figure 4 shows the initial assessment of the problem, as well as some questions the strategy expert is ready to answer.
The foreman
The role of the foreman is three-fold. First, he motivates activities. A novice in the given domain (farming) might not be able to determine which actions are important to focus on. The foreman can identify tasks to consider, and the student can decide to leave them as they are, tweak them or completely ignore them. The second role of the foreman is to help the student focus his attention on important aspects of the situation. Finally, the foreman can help the student to detect problems with the production. He can either directly tell the student the problem (as we saw in the above example), or suggest activities that can help determine if there is a problem.
Other farmers
The role of the simulated farmers in this system is also three-fold. First, they provide good standards for the student to measure his progress against. For example, if the student sees that other farmers' crops are growing really fast while his are not, he might start looking for a problem. Another role of the other farmers is to show alternative ways of performing the task. This not only helps the student to learn more techniques in the domain, but can also help him form generalizations. Finally, other farmers can provide extra motivation by posing as competitors.
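The idea of using other farmers as a standard can be expressed as a very simple check. The sketch below is our own illustration, not the mechanism of the Farmer GBS: the function name lagging_behind, the use of the median, and the tolerance of 0.8 are assumptions chosen for the example.

```python
from statistics import median
from typing import Sequence


def lagging_behind(own_growth: float,
                   peer_growth: Sequence[float],
                   tolerance: float = 0.8) -> bool:
    """Flag a possible problem when one's own crop growth falls well
    below the median growth observed on comparable farms."""
    if not peer_growth:
        return False  # no peers to compare with, so no evidence either way
    return own_growth < tolerance * median(peer_growth)
```

A flag raised by such a check is only a prompt to start looking for a problem; it says nothing about what the cause is.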
The simulator
In the real world, events happen outside our control, so if we want to teach someone to deal with a given situation, we might have to wait a very long time before
Figure 1. The student's first view of the farm. In the foreground we can see the foreman as he introduces himself and suggests a task. In the background we see the student's (George's) farm and the farms of two simulated farmers (Beth and Bill).
A Learning Environment
Figure 2. The farming scene after three months. The student's (George's) farm shows signs of some problems.
Figure 3. The foreman directs the student's attention to the problem at hand.
Figure 4. The system provides an explanation of the failure and offers some suggestions.

Coaching in the Farmer GBS
The above example highlights the different types of coaching the system supports.
that situation occurs. Simulated worlds are much better in this respect. If we decide that we want someone to experience a bad situation, we can play "God" and produce that bad situation. This way the student can be exposed to more interesting situations in much less time than would be possible in real life. This is exactly what major airlines use flight simulators for. However, playing the role of "God" is a double-edged sword. Imagine what would happen if someone who knows nothing about airplanes tried to fly an airplane in a simulator. If she were constantly put in dangerous situations, she might come away from the simulator knowing nothing about airplanes, and with the conviction that anyone who flies must be crazy. And who would blame her? The lesson of all this is that playing the role of "God" might be useful, but this technique should be used with care.
Experts The system contains three simulated experts to help the student in the domain and to provide general problem-solving assistance. The agriculture expert provides information on different farming techniques, tools and other agricultural resources (different fertilizers and pesticides). The economics expert helps the learner determine the monetary costs and possible benefits of his actions. The strategy expert helps the learner solve the problem at hand (see the example above).
Task guidance The simulated environment and the task are complex enough that at certain times the student might not know what to do next. To reduce frustration and floundering, the system provides task guidance. The student can always press the "Now What?" button, which provides some specific goals to pursue. Moreover, the interface contains context-sensitive task buttons as well. These buttons enable the student to perform tasks that make sense in the current context. Hence, if the student is unsure of what to do in a situation, a look at the buttons can provide the needed assistance.

CONCLUSIONS AND FUTURE WORK
This paper described a framework for developing learning environments for teaching planning skills. We attempted to show how multimedia can be used to implement pedagogically and cognitively sound intelligent tutoring systems. The combination of different media (graphics, video, text) not only enables the development of more realistic simulations, but it also provides opportunities for delivering knowledge more effectively when the learner needs it. Currently we are considering continuing and expanding this work on two fronts. First, simulators for scenarios other than production should be developed. While production is a frequently occurring activity, it by no means covers the entire spectrum of activities where planning skills are needed. For example, navigation in an uncertain world obviously requires real-life planning skills, which are not covered in the production scenario. Second, our most important goal is to create an authoring tool that can instantiate production scenarios in many different domains. The most important motivation for
this is that there are far too many student interests and domains, and creating a planning GBS for each from scratch would make little sense. We can do better than that. Moreover, tools can enable educators to develop new learning environments with relative ease and without any additional programming.

REFERENCES
Anderson, John R., and B. J. Reiser, 1985. The LISP tutor. Byte 10: 159-175.
Anderson, John R., C. F. Boyle and G. Yost, 1985. The Geometry Tutor. Proceedings of IJCAI-85, 1-7. Los Angeles, CA: IJCAI.
Dewey, John, 1916. Democracy and Education: An Introduction to the Philosophy of Education. New York: Macmillan.
Pea, R. D., 1982. What is planning development the development of? In: D. L. Forbes and M. T. Greenberg, eds., Children's planning strategies, 5-27. San Francisco, CA: Jossey-Bass.
Reiser, B. J., D. Y. Kimberg, M. C. Lovett and M. Ranney, 1991. Knowledge representation and explanation in GIL, an intelligent tutor for programming. In: J. Larkin and R. Chabay, eds., Intelligent Tutoring Systems and Computer Assisted Instruction: Shared Issues and Complementary Approaches, 111-149. Hillsdale, NJ: Erlbaum.
Schank, Roger C., 1982. Dynamic Memory: A Theory of Learning and Reminding in Computers and People. Cambridge, England: Cambridge University Press.
Schank, Roger C., and C. Cleary, 1994. Engines for Education. The Institute for the Learning Sciences, Northwestern University. In print.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 20
COGNITIVE TECHNOLOGY AND DIFFERENTIAL TOPOLOGY: THE IMPORTANCE OF SHAPE FEATURES
Tosiyasu L. Kunii*
The University of Aizu
Aizu-Wakamatsu City, 965-80 Japan
[email protected]
ABSTRACT
Facing the critical moment of entering into the era of information superhighways, we have to be well prepared to control the flood of information by cognizing the human need, as well as to watch out for the danger of being washed off the shore of human leadership into the ocean of information. The key to success is establishing a cognitive technology that will let the knowledge, either inside computers or accessible through information superhighways, match the level of human cognition through abstraction. As a case study, the most dominant information for human beings, namely visual information, is selected to illustrate the application of essential cognitive technology for abstracting the key features of visual information, in particular of shape information, such as singularities and, more generally, the use of differential topology. Through concrete examples worked out over a couple of decades, I will show, in the case of visual cognition, how the most effective technologies are also the most abstract. The examples include the cognition of the features of geographical terrain surfaces for efficient planning in backpacking, of expert techniques in martial arts (shortening the learning time from 3 years to 30 minutes), and of an effective guide-map generation method to present multiple views of the place of guidance, based on manifolds instead of on a single view.

INTRODUCTION
Imagine how much we can broaden our mental world by realizing the types of computer systems built to match human cognition, or better, to enhance human cognitive activities. Any job or product we're working on to get delivered is the result of a lot of mental work: cognizing what is required, determining what design and
* Without the heartfelt encouragement of Professor Barbara Gorayska, this paper could not have been written. The research is partially supported by the MOVE Consortium, the Fukushima Prefecture Foundation for Promotion of Science, the Aizu Area Foundation for Promotion of Science and Education, and also by the Top-down Education Projects of the University of Aizu.
T.L. Kunii
implementation will be satisfactory, and deciding what will be necessary to prevent the output from becoming outdated. So far, most computer systems have been either job-oriented or product-oriented. As long as they deliver jobs or products properly, they are considered satisfactory. Human cognition has never been the master of the scene. Only recently have such aspects as visual human interfaces been considered; but still, human cognition has been subordinate to computer performance. There has been intensive research on cognition from the beginning of human history in philosophy, psychology and science. Only recently has cognition been studied from a technical point of view. The reason is simple. The only machines which have cognitive capability are computers equipped with processors to handle cognitive information, and memory to store and retrieve cognitive processes as algorithms and data. Furthermore, there has been some confusion about the distinction between cognition and recognition. The basic nature of this distinction is very similar to that between search and research, and goes as follows: Upon cognition, if there is a mechanism to memorize what has been cognized, the memorized cognitive results can be exploited for improved cognition. Such improved cognition, reusing the memorized cognitive results, is conceptually identified as recognition. For example, after cognizing some signs, we start to develop in our own (or computer) memory an additional ability to classify the cognized signs into groups based on some distinct features, such as the types of singularities, shared within each group. As stated earlier, there has been an over-emphasis on the roles of computers in cognitive technology, and even confusion about 'for whom and for what purpose cognitive technology exists'. Taking visual cognition as a case, let us look at what kind of cognitive technology can help human creative processes.
Usually, a creation is triggered by the discovery of items which cannot be explained or satisfied by what already exists. In the case of products, such a discovery is often called an invention. In general, discovery is done effectively through observing numerous cases and comparing them efficiently with whatever is known. Classifying the cases into types by finding the common features of the cases is a good practice; it is called abstraction. The higher the abstraction, the more effective our discovery. Through concrete examples worked out over a couple of decades, I will show that in the case of visual cognition, the most effective technologies are the most abstract ones (such as those based on differential topology). The following is only a small listing of the extensive research projects we have completed in order to test the validity of differential topological cognitive technologies:
1. cognition of the features of geographical terrain surfaces for efficient planning in backpacking;
2. cognition of expert techniques in martial arts, which shortened the learning time from 3 years to 30 minutes, a gain in efficiency on the order of 50 thousand times; and
3. cognition of an effective guide-map generation method to present multiple views of the place of guidance, based on manifolds instead of on a single view (which is usually perspective-based).
What was found was simple, but turned out to be very effective. In the era of the 'information superhighway', with its ever-extending universe of visual information, differential topology (in particular, the set of 'singularity signs' defined there; see below) helps us human beings to navigate through the system; also, it indexes critical information, allowing human cognition to take the lead in computerized cognitive technology.
Differential Topology WHY A TECHNOLOGY FEATURES?
Watching the recent development of computer systems, and of multimedia systems in particular, we notice that their human interfaces usually have been considered, conceptualized, designed, and realized (mainly or even exclusively) on the basis of increased machine performance and the related efficiency in product development, while neglecting the totality of human multimedia functions. (On multimedia functions, see also Kirkeby and Malmborg's contribution in the present volume.) For example, let us look at how human beings come equipped with the integrated multimedia functions, named the five types of basic senses: visual, audio, touch, smell, and taste. Brain scientists say that human beings, in contrast to other animals, dedicate more brain cells, by an order of magnitude, to the visual than to the other senses. That means that, when we integrate multimedia into computers, human beings function better if the other media are centered around the visual medium. The visual medium includes pictures, signs, symbols, numbers, and characters. All of these have shapes which are varied and not necessarily in focus: they may be fuzzy or diffuse. Abstracting features from shapes is essential for human cognition of shapes, and hence for their categorization into classes, leading first to the establishment, and then to the cognition and representation of the notions associated with the classes as signs. The classes, the notions, and the signs are not necessarily monolithic; actually, in most cases they form a hierarchy.

WHY DIFFERENTIAL TOPOLOGY?
Differential topology is a branch of abstract mathematics which handles differential properties of topological spaces. For example, let us look at a mountainous geographical terrain we want to go through. When we cross a mountain range, we usually try to find a pass to save our energy. For the best view, we usually go to a peak. For fun, we row boats on a lake, which is usually a water-covered pit.
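The three terrain notions just introduced can be made concrete on a sampled height grid. The following is only a minimal sketch, assuming a rectangular grid of heights stored as a list of rows; the function names `classify`, `critical_points` and `sample` are hypothetical, and the neighbour-ring test is a common discrete heuristic rather than the method used in the projects cited in this chapter. Flat regions (ties in height) are not handled.

```python
# Offsets of the 8 neighbours, listed in circular order around a cell.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def classify(height, i, j):
    """Label interior cell (i, j) of a height grid as peak, pit, pass or regular."""
    centre = height[i][j]
    ring = [height[i + di][j + dj] for di, dj in RING]
    if all(h < centre for h in ring):
        return "peak"      # highest point locally, in all directions
    if all(h > centre for h in ring):
        return "pit"       # lowest point locally, in all directions
    # Walk once around the ring and count switches between higher and not higher.
    higher = [h > centre for h in ring]
    changes = sum(higher[k] != higher[(k + 1) % 8] for k in range(8))
    # A pass goes up in two opposite directions and down in the other two,
    # which shows up as at least four switches around the ring.
    return "pass" if changes >= 4 else "regular"


def critical_points(height):
    """Return {(i, j): label} for every non-regular interior cell."""
    found = {}
    for i in range(1, len(height) - 1):
        for j in range(1, len(height[0]) - 1):
            label = classify(height, i, j)
            if label != "regular":
                found[(i, j)] = label
    return found


def sample(f, n=2):
    """Sample f(x, y) on the integer grid [-n, n] x [-n, n]; rows are y, columns are x."""
    return [[f(x, y) for x in range(-n, n + 1)] for y in range(-n, n + 1)]


# Illustrative surfaces (assumptions, not data from the chapter).
hill = sample(lambda x, y: -(x * x + y * y))
saddle = sample(lambda x, y: x * x - y * y)
print(critical_points(hill), classify(saddle, 2, 2))
```

The point of the sketch is that a whole grid of heights is reduced to a handful of labelled cells, which is the kind of abstraction the text argues for.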
The notions of passes, peaks, and pits are the results of abstracting the shapes of a wide variety of mountainous terrain. A peak is the result of the cognition of a class of shapes in the terrain, comprising the highest points locally, in all directions, for us to climb. Similarly, a pit is the class of the lowest points locally, in all directions, for us to go down. A pass is the class of points that are locally the lowest along a mountain range (while being locally the highest along the path that crosses it), and hence we usually cross the range at a pass, to literally ease the job of 'passing' the mountain range. These peaks, pits and passes form a super-class named 'critical points' in the class hierarchy. The motivation for this is very simple: Peaks, pits and passes all have a point where their first-order derivative¹ is zero; hence, these points were given a common name. If we view the hierarchy from the top, the critical points form a class, and the peaks, pits and passes form subclasses. Suppose we assume the terrain is smooth and the critical points are nondegenerate². Then, the peaks, pits, and passes give us the complete description of the way the shape of the terrain behaves. In other words, the features that shape the mountainous terrain are completely represented by the critical points. This knowledge has a remarkable impact in establishing such computerized cognitive technology as will match the cognitive capability
¹ The first-order derivative, often also called the first-order differential coefficient, of the surface of a given shape at a point is the steepness of the surface at the point.
² Here, nondegeneracy means the isolation of given critical points from the other critical points.
of human beings. The knowledge is a consequence of a lemma originally proved by an American mathematician, Marston Morse, in a book published in 1932 (Morse, 1932); the proof was simplified by John Willard Milnor in his lecture notes published in 1963 (Milnor, 1963, 1969). The lemma, generally called the Morse lemma, is very useful for cognitive technology. The lemma says: Around a nondegenerate critical point, the shape of the surface is locally equivalent to a quadratic polynomial, namely, a sum of products of two variables (after a suitable change of local coordinates, a sum of squares with signs), where each variable represents the front-to-back location, the right-to-left location, or the height of the point. The theory surrounding the Morse lemma is called Morse theory in differential topology. Unfortunately, Morse theory has not been exploited as frequently as it should have been, nor is it well known outside the domain of mathematics, particularly in computer science, where it is indispensable for developing a human-driven cognitive technology. The major reason for this neglect is to be found in the delay incurred in developing an appropriate curriculum of computer science that would include the human aspects of cognitive technology. It is almost like gathering piles of thick books without providing indexes for human beings to access the necessary portions of the contents of these voluminous works. Although Morse theory is a key part of the endeavor to model human cognitive behavior, it has a few inherent drawbacks. For example, when the mountainous geographical terrain has a plateau, Morse theory is no longer applicable, because the top of the plateau is, by definition, flat and hence equivalent to having an innumerable number of degenerate peaks. A similar situation occurs when the terrain includes ridges and ravines. Another problem is that Morse theory depends on the direction of the height function being predetermined; when the direction of the function changes, Morse theory loses its validity.
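For readers who prefer symbols, the Morse lemma stated above can be written out in its standard textbook form. The notation is an assumption introduced here for illustration and does not appear in the original text: f is the height function, p a nondegenerate critical point, (u, v) local coordinates vanishing at p, and λ the index, i.e. the number of negative squares.

```latex
% Two-variable case: the surface is given as a height z = f(x, y),
% and (u, v) are suitably chosen local coordinates with u(p) = v(p) = 0.
\[
  f = f(p) + u^{2} + v^{2} \quad \text{(pit)}, \qquad
  f = f(p) + u^{2} - v^{2} \quad \text{(pass)}, \qquad
  f = f(p) - u^{2} - v^{2} \quad \text{(peak)}.
\]
% General case in n variables, with index \lambda:
\[
  f = f(p) - x_{1}^{2} - \dots - x_{\lambda}^{2}
           + x_{\lambda+1}^{2} + \dots + x_{n}^{2}.
\]
```

The three two-variable forms correspond to indices 0, 1 and 2. Note that the normal form is tied to the chosen height function f, so it changes whenever the direction of the height function does.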
The situation in which the direction of the height function changes normally does not occur in geographical terrain. However, when we try to apply Morse theory to other cases, such as the fashion design of apparel or the medical diagnosis of our internal organs (e.g. the large intestine), the directions of the surface tangents vary continuously, and so does the direction of the height function, making Morse theory inapplicable.

CONCEPTUALIZING EXAMPLE
Ideally, a desirable cognitive technology would have the quality of speaking to close friends, who share the same levels of cognition in the areas of our specialty; doing this, we would mobilize all our cognitive processes and our entire discourse capacity. Let us take an example. We are all familiar with wrinkles. In apparel fashion design, cloth made of natural fibers, such as linen and cotton, has been widely used for the longest time; these fabrics are prone to exhibit delicate wrinkles after having been worn for a while, because the warp and the weft are strained collectively. We now know of wrinkles found in completely different types of environment. One such environment is cosmic space. Specifically, wrinkles were found in space by George Smoot and his group at the University of California, Berkeley, in 1992 (Smoot and Davidson, 1993). They used a satellite named COBE to verify certain essential information on the formation of the universe. The wrinkles discovered in space were lying stretched out over a large area, not unlike the Great Wall of China. This discovery is considered one of the major scientific findings of our century. The universe, then, possibly is like a sheet of cloth, woven by space and time as its warp and weft. The wrinkles on this sheet of cloth came about as the universe was formed, just as wrinkles in clothing are created, viz. through the interaction of warp and weft, here: through the interaction of space and time,
along with their subsequent collective motions, at the beginning of the universe's formation. Another completely different type of environment is the human body and the organs inside it. Stomach cancers and tumors are often due to mental stress; they strain the stomach collectively and wrinkle its wall. Such wrinkles are usually discovered by visual inspection of the inside surface of the stomach, using an optical fiber-based gastroscope. A place which exhibits an altogether different, new kind of wrinkles is the market of financial trading. Looking at the international stock market, we see that there have been a fair number of crises. One of the largest and best known was that of Black Monday on October 19, 1987 and the following Terrible Tuesday, which resulted in an almost complete meltdown of the international stock market. Even today, the market has not yet fully recovered from that blow. According to the financial analysts commenting on that event, it was triggered by the collective behavior of some rather simplistic pieces of stock-trading software inside the networked, dedicated computers, after the traders had left for the day, leaving the machines to themselves in Wall Street. If we carefully analyze this event further, we see that it was a matter of wrinkles on the surface of the internationally spread-out financial trading houses, whose terminals are currently handling over 90 % of the world-wide financial trade electronically; of this amount, around 40 % is handled through the Reuters financial terminals (Kurtzman, 1993). Concluding this story, it seems fair to say that almost all economic booms and recessions are in reality wrinkles in the world economy. Having thus been presented as occurring in widely diverse areas, such wrinkles may all look unrelated at first glance.
One method of establishing a human-driven cognitive technology is to let people and computers share some high-level common knowledge which, by definition, is based on abstraction, as we have seen above. As to wrinkles, they are commonly abstracted as 'signs of singularity' carrying more information than do critical points, and including critical lines, where critical points are degenerate. Singularity signs are also direction invariant, such as in the case of ridges, ravines and their combinations (Kunii et al., 1994; Anoshkina et al., 1995). Hence, the concept of wrinkles ranges over a wider area of more general applicability than does Morse theory. Wrinkles are also related to other elementary singularity signs such as a fold, a cusp and their combinations, such as a cross that is a combination of two folds, and a p + +c singularity that is a combination of a cusp and a fold (Arnold et al., 1985). The theory of singularity signs is called 'singularity theory'. Another closely related area of research is that of 'catastrophe theory' (Thom, 1989; Thom, 1990). All these theories belong to modern differential topology, which is among the fastest growing research areas in theoretical mathematics, with an incredibly wide range of applicability.

WHY MANIFOLDS
Many ancient and medieval paintings were executed from multiple viewpoints, describing objects so as to illustrate their different points of interest. After the Renaissance, when the perspective view became dominant as an exact, and hence scientific, way of drawing, multiple viewpoint pictures faded out, surviving only in limited domains such as area guide maps, physicians' diagnostic drawing, and in some schools of art, such as cubism. In our human memory, we remember scenes of our native surroundings as perceived from various viewpoints. When we try to understand how machines are configured, we draw them as seen from different sides. In human visual cognition, multiple viewpoint pictures are natural and there is no reason to reject them as less important. We designed some research as a step toward the science of multiple viewpoint pictures. A case
study was carried out to prove the hypothesis that there is a way to model multiple viewpoint pictures "exactly" - "exact" in the sense that we can define them without any ambiguity and hence generate them automatically. A popular instance of area guide maps, a mountain guide map, was studied (Kunii and Takahashi, 1993a; Takahashi and Kunii, 1994). In such a map, mountain peaks, mountain passes, and mountain pits (the latter usually filled up with natural water in the form of lakes) are all abstracted so as to characterize the geographical undulations. To represent such land features clearly, a mountain guide map has to be drawn from multiple viewpoints. For example, let us consider the difference between an ordinary perspective picture and a mountain guide map. When we view the scenery from the foot of a mountain in perspective, the lake will be partially hidden by the surrounding mountains, while the mountain skyline will be seen clearly. A good lake view is usually obtained from the top of the highest mountain. Pasting these views together as charts, we create a space called a manifold. To normalize the overlapping areas of the charts, a method named the 'partition of unity' is used. It is the generic form of many piecewise approximations of smooth curves and surfaces known, for example, as the 'spline' and the 'NURB (Non-Uniform Rational B-Spline)' approximations. A spline originally was a long, narrow, thin strip of wood or metal used by draftsmen for flexibly laying out sweeping curves. Later, it was turned into a mathematical function to simulate its physical behavior in approximating free-form shapes in a piecewise manner. A spline, or its improved modern version NURB (Farin, 1994), is a free curve or surface which consists of smoothly connected piecewise polynomial functions, approximating a smooth shape of an object out of a rough polygonal representation of the object shape.
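The partition of unity mentioned above can be illustrated in one dimension. The sketch below is only an illustration under stated assumptions: the two 'charts' are hypothetical local views defined on overlapping parameter intervals, and the names `bump`, `partition_of_unity` and `blend` are introduced here, not taken from the cited guide-map work. Each chart gets a smooth weight that vanishes outside its interval, and the weights are normalized so that they sum to 1 wherever at least one chart is defined.

```python
import math


def bump(t, lo, hi):
    """Smooth weight that is positive inside (lo, hi) and zero outside."""
    if t <= lo or t >= hi:
        return 0.0
    s = (t - lo) / (hi - lo)              # rescale t to the open interval (0, 1)
    return math.exp(-1.0 / (s * (1.0 - s)))


def partition_of_unity(t, charts):
    """Normalise the bump weights of all charts so that they sum to 1 at t."""
    raw = [bump(t, lo, hi) for lo, hi, _ in charts]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("t is not covered by any chart")
    return [w / total for w in raw]


def blend(t, charts):
    """Paste the local views together, weighting each view by its share at t."""
    weights = partition_of_unity(t, charts)
    return sum(w * view(t) for w, (_, _, view) in zip(weights, charts))


# Two hypothetical local views that overlap for 1 < t < 2.
charts = [
    (0.0, 2.0, lambda t: 1.0 + 0.5 * t),     # stands in for the view from the foot
    (1.0, 3.0, lambda t: 2.5 - 0.25 * t),    # stands in for the view from the top
]

for t in (0.5, 1.5, 2.5):
    print(t, partition_of_unity(t, charts), blend(t, charts))
```

Outside the overlap a single chart carries all the weight, so the pasted result agrees with that view exactly; inside the overlap the result moves smoothly from one view to the other.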
Since a spline approximation and its variations are piecewise, they are convenient for interactive design, because a local shape change made while refining the shape design does not affect the rest of the shape. From the point of view of a human-driven cognitive technology, understanding the partition of unity gives a far better understanding of the meaning of localized surface approximations than remembering a multitude of piecewise realized approximation methods.

WHY CW-COMPLEXES?
There exist a number of fundamental problems for which nobody seems to have found a space in which to model them properly. This means that no human cognitive technology has been developed for these fundamental problems; hence, no way has been found for us to realize the models on the computer, nor have we been able to obtain assistance from the fast-growing computer technology in our work in major and socially important problem areas, typically comprising such classes of problems as:
1. Automated factory design problems;
2. Interactive complex shape design problems;
3. Problems in dentistry design.
I expect to be able to model all of these problems properly in spaces which are CW-complexes³. The design problems in these areas all share a common property, which characterizes the modeling space for each of them: in each area, we are dealing, within the individual design, with one particular component that is glued together with the other, regular components in a space of different complexity and precision. Usually, complexity here means 'dimensionality'; one question is how dimensionality relates to precision. Leaving that question aside (actually, it could be regarded as another theme of research), let us go back to the cases listed above.
In example 1, the general layout of automated machines operates with a degree of precision of several centimeters, while the individual automated machines in our case need to produce components having a precision of a few microns. In example 2, the same is true in car design: whereas overall car design may have a flexibility of a few centimeters, the final curve design requires a precision of the order of a few microns. In example 3, the human jaw structure can absorb differences of a few millimeters; in contrast, the final human tooth surface design requires a precision of 20 microns. It seems that problems 1, 2, and 3 can all be modeled by CW-complexes; however, manifolds will not do, because of the occurrence of singularities at the cell boundaries.

³ CW (closure finite weak topology) complexes are intuitively defined as spaces consisting of cells with different complexity glued together (Fomenko, 1987).

WHY HOMOTOPY
Homotopy is defined as follows: Given a parameter t ranging over a closed interval, normalized for convenience to run from 0 to 1, a continuous mapping H from one function f to another function g is a 'homotopy' if H is f when t is 0 and H is g when t is 1; f and g are then called 'homotopic'. There are numerous textbooks on homotopy theory (see, for example, Sieradski, 1992). For people working in the many areas of application that involve the reconstruction of three-dimensional (3D) images from a sequence of equiheight functions, such as sequences of CT (computed tomography) images and topographic maps (with the equiheight lines clearly drawn in), I have a very important warning. The warning is simply: "Please don't over-use triangulation. Try to use homotopy theory instead." The reason behind this warning is clear. If you triangulate, you forever renounce information on both singularity and differentiability: once it is given up, there is no way for you to get this information back. No matter how well your surface approximation works after triangulation, it can only give you false shapes. From a cognitive technology point of view, triangulation is a disaster. Still, it has been used for hundreds of years, and unfortunately people seem unable to get out of the habit. Homotopy theory is no magic. It simply tells you what a continuous transformation is, and helps you locate the places where deformations can be applied. Suppose we generate a surface in between two equiheight lines of an object. For homotopy theory to apply properly, the process of surface generation can be considered as a process of continuous deformation of one equiheight line into another. It can easily be proven that spline approximation, loft surface generation, and even triangulation itself are special cases of homotopic deformation.
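The definition above can be tried out on two equiheight lines. The following is a minimal sketch using the simplest possible choice, the straight-line homotopy H(s, t) = (1 - t) f(s) + t g(s); this particular H, the two contour shapes and all function names are assumptions made for illustration, not the surface-generation method of any cited work. The parameter s runs along each closed contour, and t is the deformation parameter from the definition.

```python
import math


def homotopy(f, g):
    """Straight-line homotopy H(s, t) = (1 - t) * f(s) + t * g(s) between two curves."""
    def H(s, t):
        (fx, fy), (gx, gy) = f(s), g(s)
        return ((1 - t) * fx + t * gx, (1 - t) * fy + t * gy)
    return H


def lower_contour(s):
    """Hypothetical equiheight line at the lower level: a circle of radius 2."""
    a = 2 * math.pi * s
    return (2.0 * math.cos(a), 2.0 * math.sin(a))


def upper_contour(s):
    """Hypothetical equiheight line at the upper level: a smaller ellipse."""
    a = 2 * math.pi * s
    return (1.0 * math.cos(a), 0.5 * math.sin(a))


def surface_point(H, s, t, z0=0.0, z1=1.0):
    """Point of the generated surface: the contour H(., t) placed at its interpolated height."""
    x, y = H(s, t)
    return (x, y, (1 - t) * z0 + t * z1)


H = homotopy(lower_contour, upper_contour)

# Sweep t from 0 to 1 to obtain intermediate contours between the two levels.
for t in (0.0, 0.5, 1.0):
    print(t, surface_point(H, 0.0, t))
```

By construction H reproduces the lower contour at t = 0 and the upper contour at t = 1, and every intermediate t gives a contour of the surface between them. Other choices of H deform the contours differently while satisfying the same two boundary conditions.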
Triangulation, however, is dispreferred because of the guaranteed loss of singularity and differentiability, as I explained above.

COGNITIVE TECHNOLOGY
Sports competitions are known to be typical cases of extreme exercising of the human body. Their analysis and understanding require the type of cognitive technology which can clarify how complex human body configurations work. Intuitively, I felt it must be possible (as subsequently confirmed by the results of decades of research) to turn this problem into a cognitive one: that of the mapping from the human body configuration-space into the human body work-space; hereby the cognitive problem becomes that of the cognition of the singularity of this mapping function. This approach was applied to a martial art, Shourinji, and by way of a test, one expert technique which normally requires three years of intensive training was turned into a thirty minutes' fairly easy exercise (Kunii et al., 1993b). Below, I briefly describe what we cognized and also how.
The martial art competitions were carried out on a floor where 5 TV cameras were continuously recording, at the rate of 30 frames per second, the changes of the body configurations and the movements of the competitors from 5 sides, namely from the right, left, front and back sides, and from above. After turning the 5 frames of video pictures from the 5 sides at each 1/30 second interval into 5 frames of digital images inside computers, we constructed the human body configurations as connected body segments, using their locations and angles to computationally cognize the configuration-space, and their work to computationally cognize the work-space. Then, the mapping function from the configuration-space to the work-space was derived, and we discovered that, whenever the work of the winner in the competition was an expert technique, the mapping function of the defeated competitor became singular. By recognizing this, we cognized the meaning of what are generally called 'expert techniques'; 'singular' refers to the particular types of human body configuration which prevent smooth body movements. Thus, in the competition, the winner succeeded in pushing the defeated competitor into a posture where the latter's body configuration did not allow any further smooth continuous movement of the body. After this cognition, we could teach the resultant body configuration and movement as an expert technique in 30 minutes instead of wasting 3 years of uncognized lessons, with around 50 thousand times more efficiency.
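What it means for a mapping from configuration-space to work-space to become singular can be seen in a toy example. The sketch below is not the body model of (Kunii et al., 1993b); it assumes a planar arm with two segments (hypothetical lengths l1 and l2, in meters) whose configuration is the pair of joint angles and whose work-space point is the hand position. The mapping is singular where the determinant of its Jacobian vanishes, which for this arm happens when the elbow is fully stretched or fully folded.

```python
import math


def hand_position(theta1, theta2, l1=0.3, l2=0.25):
    """Map joint angles (configuration-space) to the hand position (work-space)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def jacobian(theta1, theta2, l1=0.3, l2=0.25):
    """Partial derivatives of the hand position with respect to the two joint angles."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def determinant(theta1, theta2, l1=0.3, l2=0.25):
    """Jacobian determinant; analytically it equals l1 * l2 * sin(theta2)."""
    (a, b), (c, d) = jacobian(theta1, theta2, l1, l2)
    return a * d - b * c


def is_singular(theta1, theta2, tol=1e-9):
    """The mapping is singular where the Jacobian determinant vanishes."""
    return abs(determinant(theta1, theta2)) < tol


print(is_singular(0.7, 0.0))            # elbow fully stretched
print(is_singular(0.7, math.pi / 2))    # elbow bent at a right angle
```

At a singular configuration the hand can no longer be moved smoothly in every direction of the work-space by small changes of the joint angles, which is the toy counterpart of a posture that does not allow any further smooth movement.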
CONCLUSIONS
The overall aim of what I have described above has been to achieve a relatively simple goal: that of establishing a human-driven cognitive technology in the face of the overwhelming barrage of visual information. Certainly, with the oncoming 21st century of worldwide information superhighways, the multimedia networks will bring you at every moment very large amounts of information, centered around visual input which goes well beyond the processing power of your computer and yourself. Visual information will be displayed on your screens at every moment and almost non-stop. While in the case of textual information, indexing is fairly straightforward, for the purpose of indexing visual information, as I have explained, we need singularity signs. The advanced semiotics that is involved in the study of signs has been developed in a multimedia domain which gives it a broader scope than what is practiced currently in computer science (Fischer-Lichte, 1992; Nattiez, 1990); it will represent the future development of cognitive technology.
REFERENCES
Anoshkina, Elena V., Alexander G. Belyaev, Runhe Huang and Tosiyasu L. Kunii, 1995. Ridges and Ravines on a Surface and Related Geometry of Skeletons, Caustics and Wavefronts. In: Computer Graphics: Developments in Virtual Environments, Proceedings of CGI'95, June 26-30, 1995, Leeds, UK, 311-326. London: Academic Press.
Arnold, Vladimir Igorevich, S. M. Gusein-Zade and A. N. Varchenko, 1985. Singularities of Differentiable Maps. Boston, Mass. and Basel: Birkhäuser.
Farin, Gerald E., 1994. NURB Curves and Surfaces: from projective geometry to practical use. Wellesley, Mass.: A K Peters, Ltd.
Fischer-Lichte, Erika, 1992. The Semiotics of Theater. Bloomington, Ind.: Indiana University Press.
Fomenko, Anatoly T., 1987. Differential Geometry and Topology. New York: Plenum Publishing.
Differential Topology
Kunii, Tosiyasu L., Alexander G. Belyaev, Elena V. Anoshkina, Shigeo Takahashi, Runhe Huang, and Oleg G. Okunev, 1994. Hierarchic Shape Description via Singularity and Multiscaling. Proc. Eighteenth Annual International Computer Software & Applications Conference (COMPSAC 94), 242-251. Los Alamitos, Calif.: IEEE Computer Society Press.
Kunii, Tosiyasu L., and Shigeo Takahashi, 1993a. Area Guide Map Modeling by Manifolds and CW-Complexes. In: Bianca Falcidieno and Tosiyasu L. Kunii, eds., Modeling in Computer Graphics (Proc. IFIP TC5/WG5.10 Second Working Conference on Modeling in Computer Graphics), 5-20. Berlin: Springer Verlag.
Kunii, Tosiyasu L., Yukinobu Tsuchida, Yasuhiro Arai, Hiroshi Matsuda, Masahiro Shirahama and Shinya Miura, 1993b. A Model of Hands and Arms based on Manifold Mappings. In: Nadia Magnenat Thalmann and Daniel Thalmann, eds., Communicating with Virtual Worlds (Proc. CG International '93), 381-398. Berlin: Springer Verlag.
Kurtzman, Joel, 1993. Death of Money. New York: Little, Brown and Company.
Milnor, John Willard, 1963, 1969 (with corrections). Morse Theory. Princeton, N.J.: University Press.
Morse, Marston, 1932. The Calculus of Variations in the Large. Providence, R.I.: The American Mathematical Society.
Nattiez, Jean-Jacques, 1990. Music and Discourse: Toward a Semiology of Music. Princeton, N.J.: Paperbacks.
Shinagawa, Yoshihisa, Yannick L. Kergosien and Tosiyasu L. Kunii, 1991. Surface Coding based on Morse Theory. IEEE Computer Graphics and Applications, Vol. 11, No. 5: 66-78. Los Alamitos, Calif.: IEEE Computer Society Press.
Sieradski, Allan J., 1992. An Introduction to Topology and Homotopy. Boston, Mass.: PWS-Kent Publishing Company.
Smoot, George, and Keay Davidson, 1993. Wrinkles in Time. New York: William A. Morrow and Company, Inc.
Takahashi, Shigeo, and Tosiyasu L. Kunii, 1994. Manifold-Based Multiple-Viewpoint CAD: a Case Study of Mountain Guide-Map Generation. Computer-Aided Design, Vol. 26, No. 8: 622-631. London: Butterworth-Heinemann.
Thom, René, 1989. Structural Stability and Morphogenesis: An Outline of a General Theory of Models. New York: Addison-Wesley.
Thom, René, 1990. Semio Physics: a Sketch. New York: Addison-Wesley.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 21
HYPERTEXT AND READING COGNITION
Alec McHoul and Phil Roe
School of Humanities, Murdoch University, Murdoch, Western Australia 6150
mchoul@murdoch.edu.au
WHAT IS HYPERTEXT?
'Hypertext' - a term which is sometimes extended to include hypermedia in general - refers to software capabilities which allow readers supposedly non-linear forms of access to information via personal computers and terminals. A typical hypertext document would open with a top-level menu or home page which might include conventional texts, audio recordings, still pictures and/or video samples: indeed information of any kind which can be stored digitally. On selecting highlighted or coloured words or phrases, or specially boxed graphic frames, a hypertext reader is led to a further screen containing more words and images which explain or expand the initially chosen item: and so on, potentially indefinitely. Each verbal or graphic point can be thought of as a node in a grid of nodes, such that the path traversed in any particular session of reading will be open to the interests discovered by the reader as she or he passes through the grid. Hypertext documents can be distributed on disk or CD, or else posted on mainframes and accessed through file transfer protocol (FTP) routines or network software such as Mosaic.
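The description of nodes and reading paths can be made concrete with a small sketch. It is only an illustration under stated assumptions: the node names, link labels and the `read_session` function are hypothetical and do not describe any particular hypertext system. Each node lists the links its author has provided, and a reading session is the sequence of nodes reached by following the reader's selections.

```python
# Hypothetical hypertext document: every node maps the link labels shown on
# its screen to the node that each label leads to.
LINKS = {
    "home": {"language": "hawaiian_text", "art": "art_magazine"},
    "hawaiian_text": {"phrase": "audio_clip", "back": "home"},
    "art_magazine": {"sculpture": "rotating_movie", "back": "home"},
    "audio_clip": {"back": "hawaiian_text"},
    "rotating_movie": {"back": "art_magazine"},
}

def read_session(start, choices):
    """Return the path of nodes visited when the reader follows `choices`.

    Raises KeyError if a chosen label is not a link offered by the current node.
    """
    path = [start]
    for label in choices:
        offered = LINKS[path[-1]]
        if label not in offered:
            raise KeyError(f"node {path[-1]!r} offers no link {label!r}")
        path.append(offered[label])
    return path

# Two readers starting from the same home page traverse different paths.
first_reader = read_session("home", ["language", "phrase"])
second_reader = read_session("home", ["art", "sculpture", "back"])
```

In this sketch the reader's choices determine which path is taken, while the set of available paths is fixed in advance by the table of links.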
Here are some simple examples of hypermedia: "You are reading a text on the Hawaiian language. You select a Hawaiian phrase, then hear the phrase as spoken in the native tongue. You are a law student studying the California Revised Statutes. By selecting a passage, you find precedents from a 1920 Supreme Court ruling stored at Cornell. Cross-referenced hyperlinks allow you to view any one of 520 related cases with audio annotations. Looking at a company's floor plan, you are able to select an office by touching a room. The employee's name and picture appears with a list of their current projects. You are a scientist doing work on the cooling of steel springs. By selecting text in a research paper, you are able to view a computer-generated movie of a cooling spring. By selecting a button you are able to receive a program which will perform thermodynamic calculations. A student reading a digital version of an art magazine can select a work to print or display in full. Rotating movies of sculptures can be viewed. By interactively controlling the movie, the student can zoom in to see more detail."¹
In principle, this form of information transfer should mean not only that hypertext information can flow freely to any reader whatsoever but also that, once accessed, any given document can be inspected according to the supposedly free choices of the reader. This double openness - of access to texts and of addressing their contents - has led some communications theorists to think of hypertext as revolutionary, as redistributing 'power' away from text producers and towards readers. In this paper we want to argue against such claims - principally (and perhaps ironically) because they are based on a very narrow conception of reading practices. If reading itself were (and always had been) such a narrowly-conceived social and cognitive practice, there might be some substance to the celebratory and optimistic claims of these hypertext analysts. If not, their celebrations and their optimism may be premature.
MISREADING
It is pertinent in the first place to expose the theoretical substructure and the assumptions which underlie the above claims for hypertext. Central to the claims of these communications theorists is their understanding of the hypertext object itself, their reading of certain critical theorists (especially Barthes and Derrida), and the assumption of a self-evident difference between hypertext and traditional print text. Claims of convergences between reader and writer, and between hypertext and contemporary critical theory, are based on a praxis of misreading. The critical issues are effaced by this misreading and by a use of (critical) language that a quotation from Derrida seems adequately to describe: it "betrays a loose vocabulary, the temptation of a cheap seduction, the passive yielding to fashion" (Derrida, 1976: 6). Definitions of hypertext are continually elaborated against a particular and rigid notion of print text. The definitions accorded to the text are also presumed to be the determinants of reading practices. Delany and Landow, for example, elaborate their definition of hypertext against a notion of the traditional text, which they define according to three attributes: "that the text was linear, bounded and fixed". Their definition of hypertext is then able to become: "the use of the computer to transcend the linear, bounded and fixed qualities of the traditional written text" (Delany and Landow, 1991: 3). Their extended explanation proceeds negatively, contrasting hypertext with the static form of the book: accordingly, hypertext can apparently be composed and read non-sequentially as a variable structure comprising blocks of text connected by electronic links.
1 These examples of hypertext/media usage are provided in an introductory document Guide to Cyberspace 6.1: What is Hypertext and Hypermedia? This guide can be found on the World Wide Web at the following address:
Landow makes frequent reference to the fluidity and instability of hypertext as opposed to the fixity of print-based text. This is premised on hypertext's electronic status, the fact that it is potentially able to be amended and added to by the reader, and so forth (Landow, 1992). What he calls the convergence of reader and writer tends to efface a significant conflation that slips by without critical comment as to what this move constitutes and what is at stake in the conflation. The simple juxtaposition of the physical and fixed structure of the book (bound by its materiality) versus the electronic fluidity of hypertext does violence to the notion of textuality, collapsing distinct categories. Landow frequently relies on Barthes to elaborate his notion of text, but the distinction between work and text is not so simply elided. Barthes states that: "The work can be held in the hand, the text is held in language, only exists in the movement of a discourse...; or again, the Text is experienced only in an activity of production." (Barthes, 1977: 157) In the electronic paradigm, it is the notion of the work that makes no sense. In both electronic and print forms, the text remains in language, existing in the movement of discourse and experienced in the activity of production (reading/writing). The claim by Landow and others of a convergence between contemporary critical theory and technology, specifically hypertext, is a misreading of both hypertext (as a critical object) and contemporary critical theory. The convergence of terms to which Landow points between these areas is simple appropriation - the theoretical connections have not been established in any systematic way. It is bizarre and superficial to claim that an important theoretical and practical convergence has taken place simply because a number of terms ('link', 'web', 'network', 'interwoven') happen to be used in both hypertext discourse and in Derridean theory.
Derrida's work on writing, for example, concerns writing in general - a general condition of undecidability preceding all particular signs, texts and communications - and so hypertext as a form of writing must be implicated just as much as other forms of writing. No 'special relationship' between Derridean conceptions of writing and hypertext has been established despite the claims. That could only happen if one - wrongly - thought of Derrida not as a philosopher interested in writing's general preconditions (which he is), but as a prophet of semantic anarchy and the reader's liberation movement - which he most certainly is not (Nealon, 1992; Lucy, 1995). And Derrida notwithstanding, any claimed relationship between hypertext and 'reader-power' must be problematic, especially given the highly conventional and organised structuration of hypertext. The crucial issues of textuality and textual politics that are paramount to this discussion constantly slip away.
CELEBRATING HYPERTEXT
Let us begin with a caution. Hypertext enthusiasts exhibit a certain religious fervour linked with the political panacea of democratisation; they imagine a freely available node-web within which the liberation of the reader is the celebration of the mass(es). However, as we noted above, hypertext is very conventionally structured in terms of both access and address. Turning firstly, then, to questions of access: this is bound to be limited. Hypertext readers are a very select group simply by virtue of the equipment required to access hypertext documents. The minimal equipment needed is a reasonably powerful PC, connection software to a local mainframe and means of
access to that mainframe: access in terms of both hardware - such as a modem or ethernet connection - and institutional rights which usually come with membership of, say, a university community. (And hence it is not surprising that the hypothetical users in the World Wide Web examples in the first section of this paper have such institutional affiliations.) On top of this, potential readers will need to be skilled in file-transfer routines and in hypertext manipulations themselves. This presupposes at least some minimal form of training, institutional or otherwise. Then we have to consider what types of texts can be delivered in hypertext and who controls this. Hypertext authoring programs such as Authorware Professional, Macromedia Director and Toolbook do not come cheaply. They require even more powerful machines than those required merely to read. They require institutional sanctions which allow writers to 'post' their texts on mainframes. Or else they require industrial links to CD-ROM manufacturers for distribution. This effectively limits hypertext genres either to pro-institutional texts (so that many of the first forms available via WWW were in effect advertisements for universities and museums) or to texts which might be perceived as having a market (games, encyclopaedias, movie guides, and so on). In this sense, hypertext technologies appear, in terms of their affordances for free composition and distribution, much more limited than conventional book technologies. In hyper-space, there is no equivalent of the spirit duplicator. Now moving away from the question of sheer access and towards questions of address: a central claim among pro-hypertext enthusiasts is that hypertext is 'readerly', as opposed to 'writerly', this distinction being based very loosely on that of Roland Barthes.
Hence Landow writes: "From the vantage point of the current changes in information technology, Barthes's distinction between readerly and writerly texts appears to be essentially a distinction between text based on print technology and electronic hypertext, for hypertext fulfills [to quote Barthes (1974: 4)] 'the goal of literary work (or literature as work) [which] is to make the reader no longer a consumer, but a producer of the text. Our literature is characterised by the pitiless divorce which the literary institution maintains between the producer of the text and its user, between its owner and its customer, between its author and its reader. This reader is thereby plunged into a kind of idleness - he is intransitive; he is, in short, serious: instead of functioning himself, instead of gaining access to the magic of the signifier, to the pleasure of writing, he is left with no more than the poor freedom to either accept or reject the text: reading is nothing more than a referendum. Opposite the writerly text, then, is its countervalue, its negative, reactive value: what can be read but not written: the readerly. We can call any readerly text a classic text'." (Landow, 1992: 5-6) There is no doubt, in this reading of Barthes, a terrible category mistake. While Landow wants to make a complete separation between types of text such that the 'writerly' type is conflated with print (and hence closure) and the 'readerly' type with hypertext (and hence openness), Barthes himself is more equivocal. For Barthes, the writerly text denies the reader the pleasure of writing, to be sure. But this is precisely what forces the reader into a readerly position, into the space of "what can be read but not written". He consciously tropes on Nietzsche's idea of a slave ethics in introducing
the readerly itself: it arises from a denial of entry into writing; it is a "negative, reactive value". It conforms to the writerly, gives itself over to it, plays its game. It too works with the rule of "what can be read but not written". And that is precisely why it is "classic". Where Landow finds an idealist space of liberation, Barthes only marks the side of the slave who is dependent on the master. In Landow's American liberalism, the oppressive simply has to be named and overcome by a word of negation - 'readerly'. In Barthes, the apparent opposite always depends on what it opposes, plays its game, and finds ways of operating within the same rules. The readerly and the writerly are two prongs of a single forked instrument - an instrument which may be writing in general and, if so, it will always contain possibilities of violence, one way or the other. So there is, in this pro-hypertext position, an initial and foundational category mistake which reads a relational binary as an absolute binary, forgetting the dependence that binaries must always bring with them. Then, having made this move, Landow can begin his celebration in earnest, even in his dreams of his imaginary readers' reactions to his own writing:
"Although you cannot change my text, you can write a response and then link it to my document. You thus have read the readerly text in two ways not possible with a book: You have chosen your reading path - and since you, like all readers, will choose individualised paths, the hypertext version of this book might take a very different form in your reading, perhaps suggesting the values of alternative routes." (Landow, 1992: 7)
Now, in a sense, the silly joke is out in the open: the master-writer's conditions are always the conditions that allow the slave-reader to be free: "you cannot change", "you have chosen", "you, like all readers, will choose" ... and so on. This is precisely what Barthes meant by the 'readerly', and the meta-joke is that Landow could not read him slavishly enough. Overcoming the problems of a 'given' text, whether a book or an electronic node-web, is not a simple problem of negation or of imagined negation. And yet the supposed revolution of hypertext and its terrible 'readerliness' are premised, precisely, on such a simple overcoming: the readerly seems non-violent, but "you ... will choose". The reason for this convenient reading of Barthes becomes clear when we see how this move is then able to link to one of the long-held objectives of hypertext practice. This objective has been articulated from early on in hypertext's history. Yankelovich's influential and often-cited paper "Reading and Writing the Electronic Book" (first published in 1985) makes this clear:
"Ideally, authors and readers should have the same set of integrated tools that allow them to browse through other material during the document preparation process and to add annotations and original links as they progress through an information web. In effect, the boundary between author and reader should largely disappear." (Yankelovich, 1991: 64)
McKnight et al. (1989), in "The Authoring of Hypertext Documents", note that most writings on hypertext have focussed on reading, on what is presented to the reader, and generally on reader-based research strategies. Authoring becomes something which is always oriented to reading (a very narrow and specific notion of reading), so that many hypertext systems in actual use blur the distinction between author and reader, particularly in cases "where the 'reader' will add links to the document, customise and annotate it, thus making the distinction between the author and reader less clear" (1989: 140). Yet McKnight et al. also attempt to re-establish the place of the author, and do so by pitting the hypertext author against the author of the conventional book. The crucial point they make here is against the grain of hypertext enthusiasm whose short history has always privileged the reader, and the readerly. They say: "once we have it in our hands, the whole of a book is accessible to us as readers. However, sat in front of an electronic read-only hypertext document we are at the mercy of the author since we will only be able to activate the links which the author has provided" (McKnight et al., 1989: 140).²
This argument is at odds with the celebration of hypertext as constituting the vanguard of the readers' liberation movement; for it conceives of reading practices as essentially determined by the structure of the text, implying a traditional relationship between author and reader, mediated by intentionality. With these assumptions about reading, it becomes possible to construe the provision of links in a document as choices for a hypertext reader which don't otherwise exist. McKnight et al. in fact conceive of the links in a document as a constraint on the reader in that such links specify a structured, organised and thus limited number of options. So while our own critical position towards unduly celebrating hypertext receives some backing from McKnight et al., it's also true that we part company from them when they construe pre-hypertextual readings (indeed any readings) in terms of a very narrow communications model involving authors' intentions set in place specifically to impart limited information to readers who thereby become victims of the text. What this position misses - along with the celebrationist position - is that quite 'ordinary' (including pre-hypertextual and hypertextual) forms of reading cognition can be quite fluid, artful, nodal and so on: there is nothing special about this and this is why there is nothing special about hypertext. Along with the celebrationists, McKnight et al. seem to think that what is called 'reading' can only be one thing: a single practice with a set of fixed and identifiable criteria. For us 'reading' has always taken a number of highly diverse forms, some of which just happen to be used in electronic formations.
CELEBRATING HYPERTEXT TOO
Returning to the celebrationist position, then: from its obviously spurious claims about an apparently new 'readerliness' comes a further claim which shifts it into the broader field of communications history: "The strangeness, the newness, and the difference of hypertext permits us, however transiently and however ineffectively, to de-center many of our culture's assumptions about reading, writing, authorship, and creativity." (Landow, 1992: 203) The impetus is no doubt Ongian (see Ong, 1982). Ong, we may remember - if we have long memories - claimed that oral communication was the most authentic and human, that writing technologies all but destroyed that complete presence which the exchange of talk permitted and reflected, and that, eventually, a post-'writerly' (in Landow's sense) communications technology would come to restore us to our authenticity. Ong mentions the telephone as an example: a means of exchange which restores the voice, the natural memory, and the presence of one soul to another. But isn't Landow's reading of hypertext an ultimate version of that; a system which "offers the reader and writer the same environment" (Landow, 1992: 7)? And isn't that shared environment precisely one of pure presence? So if "our culture's assumptions about reading, writing, authorship, and creativity" have anything wrong with them, it's that they don't permit an equal exchange, "the same environment", the auditorium (which is the space of the voice). Hypertext is then supposed to redress this balance, to make all persons equal because they become equal participants in a form of writing which (we hear) totally maps on to conversational exchange. Hypertext is 'readerly' and liberating because it restores the truly human voice (marked by the instantaneous exchange of positions, the dialectic) via an electronic medium.³
2 See also Whalley (1993) who provides a similar argument regarding the 'nonlinearity' of hypertext, and against the notion of the strict linearity of conventional text.
Elsewhere, Landow (1991) asserts that since hypermedia changes both the way texts exist and the way we read them, it requires a new rhetorics and stylistics. Beginning from what he calls the defining characteristic of hypermedia (blocks of text connected by electronic links which emphasise multiple connections), he notes that the potential of hypermedia cannot be realised simply by linking, and that there must also be a range of techniques suited to hypermedia - stylistic devices and rhetorical conventions. What initially seems promising, however, is just as quickly returned to an informational economy serviced by these 'new' rhetorics and stylistics. The necessity for these techniques, he says, is that they "will enable the reader to process the information presented by this new technology" (Landow, 1991: 81).
Rather than engaging with a reconceptualisation of reading, writing, texts and meanings, it remains a matter of the more efficient distribution and dissemination of quite traditional information.
John Slatin's (1991) discussion of hypertext also takes up questions of rhetoric. While adopting the initial assumption that hypertext is very different from traditional forms of text, he attributes this difference to a function of the technology that makes hypertext. The characteristics of this technology, he says, are "various, vast and minute simultaneously - making hypertext a new medium for thought and expression, the first verbal medium ... to emerge from the computer revolution!". A new medium, he says, "involves both a new practice and a new rhetoric, a new body of theory" (Slatin, 1991: 153). The first requirement he suggests for a rhetoric of hypertext is that it must take the computer actively into account as a medium for composition and thought, and not simply as a presentational device or as an extension of the typewriter. Although he too contrasts hypertext with traditional (print) text, he does not collapse all reading into a single model. Instead he focusses on the assumptions which each kind of text makes about what readers do and the ways in which assumptions about reading affect an author's understanding of composition. His project is concerned with finding ways of talking about documents that have multiple points of entry and exit, and multiple pathways between these points. What this approach begins to open on to is not only an
3 A colleague fond of hypermedia exchanges objected to our idea that he was setting up a virtual conference: "How can it be virtual if I can see him and he can see me in real time, face to face?"
exploration of the possibilities of the medium (through, for example, questions of interactive reading, interactive writing, and co-authorship), but also the language of the medium ('ways of talking'). Slatin's central argument is that rhetoric is typically indifferent to the physical processes of textual production. He notes that the maturity and stability of print technologies have made them invisible as technology, while such transparency is not yet available in terms of computing technologies. In Slatin's argument, hypertext and hypermedia are still, and are likely to remain, immature and unstable as technologies, and so a rhetoric of hypertext cannot afford to disregard its technological substrate. For this reason, theory and practice in hypertext have, potentially at least, an interesting co-existence and mutual interdependence. What becomes apparent in the way hypertext practice is organised (because of its orientation to this narrow kind of informational reading), despite the claims, is that it is still based on conventional structures of writing and linearity (albeit with a more clearly defined, and also more clearly limited, multilinearity). The metaphorics of hypertext (and hypertextualism) are illustrative here. Shneiderman and Kearsley (1989: 6), for example, have a section on "hierarchies" in which the predominant metaphors are those of the "tree" (roots, branches and leaves) and of the "parent-child" (defined as superordinate and subordinate concepts). The definitions and descriptions they provide for these terminologies function as instructions for reading which organise reading cognition in terms of a series of metaphors connected to several of what are now fairly conventional discourses. These metaphors - browsing, indexing, searching, maps, filters, tours, navigation, etc. - constitute a conventional conceptual reading apparatus.
While the implied function of this apparatus can be read as a bridge or transition between 'old' and 'new' modes of reading practice (enabled by the rigid definition of print text and the reader's relation to it), it appears more as the overlaying of conventional reading practice on new technology. The technology may be new, but the approach to it and the relations to it are wholly conventional. Hypertext has already been colonised by conventional reading practices - how could it not be since, in a sense, it is thoroughly conventional - and the colonisers don't seem to have noticed. What exists as 'theory' about hypertext at this time does not acknowledge the roots of hypertext practice and is seduced by the hype around the vastness of the information potential of the medium. This seduction seems to function around the spatial metaphorics of its reading practice and their relations with the discourses on hyperspace and cyberspace, generated through a confluence of science fiction (cyberpunk in particular), the economics of information, and the technologies of computer science. The narratives that mark out these spatial trajectories bear a remarkable resemblance to colonial narratives of discovery and exploration - where 'virtual' space (following geographic and then 'outer' space) has become 'the final frontier.' These narratives, it must be remembered, have their roots in pre-existing models of writing, textuality and technological practices - Neuromancer, after all, was written on a typewriter. Tracing the movements of the narratives of these discourses may tell us more about the structures of reading cognition at work in hypertext than simple reduction to notions of the 'readerly' and the 'efficient' processing of information. "We're in an information economy. They teach you that at school. What they don't tell you is that it is impossible to move, to live, to operate at
any level without leaving traces, bits, seemingly meaningless fragments of personal information. Fragments that can be retrieved, amplified." (Gibson, 1988: 30) And, of course, fragments that can be read in terms of a number of different discourses. Let us not forget here the cautions raised in the notion of the 'electronic panopticon' - not in the sense of the Orwellian 'Big Brother', but where, as Provenzo (1992: 187) cautions, "[u]sing Foucault's terminology, the literate individual increasingly becomes an 'object of information, never a subject in communication'". Nor should we forget links to military discourses - not only in terms of technological development, but also for plans and futures. Shneiderman in fact proposes what he calls his 'Star Wars' plan for American education with his vision of the beneficent patriarch. We are also enthusiastic about computing technology in education but we wonder about this educational philosophy: "I propose a bold national Strategic Education Initiative (SEI) ... patterned on the concept of the Strategic Defence Initiative (SDI) or the Strategic Computing Initiative (SCI) .... Mine is also a Star Wars Plan but it is linked to the image of Luke Skywalker's wise and gentle teacher Obi-Wan Kenobi (played by Alec Guinness) rather than to the terrifying Darth Vader. Instead of 1,000 space-based battle stations, I propose at least 10,000,000 school-based edu-stations, enough to have one for every five students, plus appropriate teacher training and software." (Shneiderman, 1992: 14-15) Returning to our earlier question: the pro-hypertext position claims its object to be revolutionary by virtue of the supposedly non-linear way in which reading cognition takes place in such electronic environments. Hypertext, then, as the ultimate "nonlinear organisation of information" (Shneiderman and Kearsley, 1989: 158), appears to signal an historic shift: the end of the book, the end of linear writing and reading.
In our experience, there is no doubt that hypertext documents do have some unique aspects: they speed up the rate of information retrieval and they do allow certain kinds of access to proceed at a pace which would previously have been thought impossible, or would have required massive and painstaking archival research. To take an example from a pre-hypertext database first: using the CD-ROM version of the OED, as opposed to its print version, allows a reader to find, say, all the words that have come into English from Russian since 1855 - more or less instantly. The same process could, in principle, be carried out on the print version, but this would necessitate a sequential inspection and selection of each entry in the 13 volumes. But, speed apart, nothing has effectively changed in terms of the process of reading cognition. It's merely that the very hard work of meticulous inspection has been taken over by a disc's scanning head linked to a software instruction. The scanning head proceeds in a precisely linear or syntagmatic fashion, allowing the reader access to a specified field of data which, once generated, appears to have a non-linear or paradigmatic character to it. Because of the speed of computer processing, it appears as if the paradigmatic interest of the reader simply 'leaps' into the foreground. But this neglects the machine-reading component which is, in fact, more fully linear and syntagmatic than any human processing capacity. The same goes for hypertext documents. The reader's paradigmatic interest is displayed in the unique path which she or he takes through a potentially infinite number
A. McHoul and P. Roe
of such paths in an information web. But each path, as the computer links from node to node, is a purely linear movement. Then, once retrieved, the image, sound or screenprint may or may not be inspected linearly. However it is inspected, the means of its inspection, at this point, will be precisely as it would be under any quite ordinary conditions of reading. Outside the hypertext environment, print can be inspected either sequentially or, say, globally: such as when one looks at a page for its typographical characteristics. Outside the hypertext environment, still images are routinely inspected in non-linear fashion: in fact it's very hard to know what a linear reading of a photograph could be like - except that we know that computer scanners can divide photographs into pixels and proceed to reproduce them in a left-to-right, top-to-bottom form. Again, it's the computer technology which is more linear than the human and quotidian method of inspection. Outside the hypertext environment, films and videos can be viewed in 'real' time, sequentially from frame 1 to frame n - but simple VCR equipment also allows them to be looked at in freeze frame, in reverse, shot by shot, scene by scene and so on. Quite simply then, there is a very broad variety of processes, both inside and outside the hypertext environment, which can be called 'readings'. The celebration of the supposedly new 'readerly', 'exchange-based', and 'non-linear' forms of reading which hypertext permits may, then, be premature. Moreover, it may be based on (in order to be opposed to) a far too narrow conception of what 'ordinary' reading is. Let us turn to this problem.
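The claim that the machine's own retrieval is strictly sequential can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the `Entry` record, its field names and the three placeholder entries are hypothetical and reproduce neither the OED's data nor its software. The sketch shows only that a query of the 'from Russian since 1855' kind amounts to visiting every entry in order and keeping the matches.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Entry:
    # Hypothetical field names, not the schema of any real dictionary.
    headword: str
    source_language: str
    first_recorded: int


def linear_scan(entries: Iterable[Entry], language: str, since: int) -> List[str]:
    """Visit every entry in sequence and keep the headwords that match."""
    matches = []
    for entry in entries:  # strictly one entry after another
        if entry.source_language == language and entry.first_recorded >= since:
            matches.append(entry.headword)
    return matches


# Placeholder entries, invented for the example.
sample = [
    Entry("word-a", "Russian", 1860),
    Entry("word-b", "French", 1900),
    Entry("word-c", "Russian", 1790),
]
print(linear_scan(sample, "Russian", 1855))
```

The list that comes back looks like a ready-made 'paradigmatic' set, but it is produced by a loop that is more rigidly linear than any human reader.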
JUST READING 4
Hypertextualism, in its opposition to the 'writerly', the 'monologic' and the 'linear', appears to think that, prior to the advent of hypertext, reading was a single process, something like the scanning of a printed book from the first to the last word, with information passing into cognition in a sequence dictated by an author, allowing no space of intervention (no 'turn at talk', as it were) to the reader. Having read, on this picture of things, the reader simply 'has in mind' precisely what an author 'put there' and in the order that the author 'put it there'. There are numerous objections to this picture. A fairly simple one is that empirical analyses of reading have shown that readers do not simply add information bits to information bits in linear sequence. Rather, using what Garfinkel calls "the documentary method of interpretation", a very practical and ordinary form of the hermeneutic circle, readers build up a "gestalt contexture", a pattern of overall meaning which can modify - or else, be modified by - subsequent text items (words, sentences, paragraphs, and so on) (Garfinkel, 1967; McHoul, 1982). So here we have an objection to the particular picture of 'ordinary' reading held by hypertextualism. A second, and more serious, objection is that we can find no grounds at all for thinking that reading is a singular process of any kind, no matter what that process might be imagined to consist of. Between sections 156 and 171 of the Philosophical Investigations, Wittgenstein (1968: para 167) rejects the idea that reading is a "particular process" - especially the quite popular idea that this process is a purely mental one. He asks how it could be that "one particular process takes place" when we read. We might read a sentence in print and then read it in Morse code, to give his own (multimedia) example. In such a case, is the cognitive process the same?
4 This section is based on a chapter called "Reading Practices" from McHoul (in press).
We expect that most of us will think not. But Wittgenstein is not dogmatic about this. He wants to know why we come to think of the process as a particular one, as singular. And the tentative answer he gives is that we are perhaps fooled by the uniformity involved in "the experience of reading a page of print". He continues:
"the mere look of a printed line is itself extremely characteristic - it presents, that is, a quite special appearance, the letters all roughly the same size, akin in shape too, and always recurring; most of the words constantly repeated and enormously familiar to us, like well-known faces." (Wittgenstein, 1968: para 167)
But the uniformity of a page of print, and the repetition effect we get in scanning it - for all that they point to a surface definiteness and specifiability - do not mean that reading - even in this highly 'linear' case of scanning a printed page itself - is a particular process. Instead, a brief inspection throws up a whole range of differences and distinctions regarding what the concept of reading might cover. Staten (1986: 84ff) speculates that one candidate for the essence of reading might be to specify it as being the derivation of repetitions from an original. And this, again, is one of the directions in which computer metaphors of reading have tended to take us - such that it is how computers work that becomes the model for 'ordinary' (non-electronic) readings and not vice versa. But then we also have to ask: what is to count as deriving? The problem simply shifts on to another terrain. Perhaps, Staten goes on, we should always refer to the 'systematic' derivation of, for example, sounds from marks. But we all know that it is possible to derive the wrong sounds. If someone does that: are they reading? Again, we could say that the essence of reading was the presence of a certain kind of inner experience, rather than a derivation. But we may, and do, have this experience while we are asleep or affected by drugs. Are we to say that, then, we are reading? Instead of looking for a definite and singular characteristic of reading, Wittgenstein suggests that we look upon reading as an "assemblage of characteristics". Moreover, according to Staten, these characteristics will:
"in each separate case of reading ... be variously reconstituted, and in these different reassemblings there will always be the infection of characteristics of what does not correspond to what we want to think of as really, essentially, reading .... It is as though these characteristics had dual membership in two mutually exclusive sets." (Staten, 1986: 85) To summarise: firstly, we cannot prespecify the characteristics which go to make up reading. Secondly, if we could, we would always find them in new and varied combinations, in any actual case of reading, regardless of whether the activity takes place inside or outside electronic environments. Thirdly, we will always find, in amongst them, characteristics which we should not want to associate with reading as such but which are crucial to that actual case. Reading is like soup or slime. We should not want to specify its essence according to any neat digital calculus: not that it has no soul as such - rather it has a multiplicity of souls and "any one of them could at some stage take over and guide the sequence in its own direction" (Staten, 1986: 103). It is because of, not despite, their pleomorphism that we recognise cases of reading. Reading, then, is a classic instance of what Wittgenstein calls a family resemblance phenomenon. It is not a single or particular cognitive process - rather it is a family of
such processes, and a family whose members do not depend on the particular macro-technologies (books, computers, teacups, night skies, and so on) which happen to deliver texts. Instead, after Wittgenstein, we could think of the manifold forms that reading can take as technologies in their own right - many of which can be transferred between macro-technological sites. For example, the ways in which Landow and others describe the 'revolutionary' forms of reading involved in hypertext scanning appear to us to be extremely close to the ways in which readers use reference works such as encyclopaedias. Hardly anyone (except perhaps a proofreader) would read such texts from start to finish. Instead a particular set of interests will lead a reader to an index, then to the selection of an item in print, then (perhaps) to a graphic, or to a cross-referenced item, back to the index, to a different source text and so on. Each item can be thought of as a node, if need be; and (again, if need be) the encyclopaedia and the internal and external texts to which it leads can be thought of as a web of such nodes. There is nothing new in this. It is a perfectly ordinary procedure and one which is but a minor member of the vast family of possible forms of reading cognition. The fact that it has currently cropped up in a particular electronic macro-technology is cause for neither celebration nor despair. Reading remains a complex family of activities, language games, or technologies. It always already had no single defining characteristic such that hypertext could be different from that characteristic. And it remains like this whether or not - today - we are referring to printed or electronic means of delivery (macro-technologies). Everyday life continues pretty much as it always has: perhaps a little faster, that's all.

REFERENCES
Barthes, Roland, 1974. S/Z. Trans. R. Miller. New York: Hill and Wang.
Barthes, Roland, 1977. From work to text. In: Image-music-text, 155-164. Trans. S. Heath. New York: Hill and Wang.
Delany, Paul, and George P. Landow, 1991. Introduction. In: P. Delany and G.P. Landow, eds., Hypermedia and literary studies, 3-50. Cambridge: MIT Press.
Derrida, Jacques, 1976. Of grammatology. Trans. G.C. Spivak. Baltimore: The Johns Hopkins University Press.
Garfinkel, Harold, 1967. Studies in ethnomethodology. Englewood Cliffs: Prentice-Hall.
Gibson, William, 1988. Johnny mnemonic. In: Burning Chrome, 14-36. London: Grafton Books.
Landow, George P., 1991. The rhetoric of hypermedia: Some rules for authors. In: P. Delany and G.P. Landow, eds., Hypermedia and literary studies, 81-104. Cambridge: MIT Press.
Landow, George P., 1992. Hypertext: The convergence of contemporary critical theory and technology. Baltimore: The Johns Hopkins University Press.
Lucy, Niall, 1995. Debating Derrida. Melbourne: Melbourne University Press.
McHoul, Alec, 1982. Telling how texts talk: Essays on reading and ethnomethodology. London: Routledge & Kegan Paul.
McHoul, Alec, in press. Semiotic investigations: Towards an effective semiotics. Lincoln: University of Nebraska Press.
McKnight, Cliff, John Richardson and Andrew Dillon, 1989. The authoring of hypertext documents. In: R. McAleese, ed., Hypertext: Theory into practice, 138-147. Oxford: Intellect Books.
Nealon, Jeffery T., 1992. The discipline of deconstruction. PMLA 107: 1266-1279.
Ong, Walter, 1982. Orality and literacy: The technologizing of the word. London: Methuen.
Provenzo, E., 1992. The electronic panopticon: Censorship, control, and indoctrination in a post-typographic culture. In: M. Tuman, ed., Literacy online: The promise (and peril) of reading and writing with computers, 167-187. Pittsburgh: University of Pittsburgh Press.
Shneiderman, Ben, 1992. Education by engagement and construction: A strategic education initiative for a multimedia renewal of American education. In: E. Barrett, ed., Sociomedia: Multimedia, hypermedia, and the social construction of knowledge, 13-26. Cambridge: MIT Press.
Shneiderman, Ben, and Greg Kearsley, 1989. Hypertext hands-on: An introduction to a new way of organising and accessing information. Reading: Addison-Wesley.
Slatin, John, 1991. Reading hypertext: Order and coherence in a new medium. In: P. Delany and G.P. Landow, eds., Hypermedia and literary studies, 153-169. Cambridge: MIT Press.
Staten, Henry, 1986. Wittgenstein and Derrida. Lincoln: University of Nebraska Press.
Whalley, Peter, 1993. An alternative rhetoric for hypertext. In: C. McKnight, A. Dillon and J. Richardson, eds., Hypertext: A psychological perspective, 7-18. Chichester: Ellis Horwood Limited.
Wittgenstein, Ludwig, 1968. Philosophical investigations. Trans. G.E.M. Anscombe. Oxford: Blackwell.
Yankelovich, Nicole, 1991. Reading and writing the electronic book. In: P. Delany and G.P. Landow, eds., Hypermedia and literary studies, 53-80. Cambridge: MIT Press.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 22
VERBAL AND NON-VERBAL BEHAVIORS IN FACE TO FACE AND TV CONFERENCES
Hiroshi Tamura & Sooja Choi*
Kyoto Institute of Technology, Department of Information Technology, Japan
tamura@hisol.dj.kit.
It has often been emphasized that facial or non-verbal cues are important in daily communication, and that the lack of visual cues in telephone talk makes speech communication unnatural or less satisfactory. One straightforward technology that provides the missing cues is the video-phone or TV conference, which transmits moving images and speech at the same time. Assuming visual cues to be significant in human communication, people expected image transmission to be adopted widely and quickly in business and social communication. Acceptance of image communication among public users, however, has been very slow. The primary obstacles to its wide use were the high transmission costs and the lack of comfort in using the systems. Thanks to advances in electronics and digital transmission technology, the cost of image communication (when a 64 kbps transmission rate is chosen) is nowadays close to that of speech communication. However, the cost of line transmission was not the only reason why TV conferencing and video telephony failed to be accepted by users, many of whom found it difficult to start talking to, and to get responses from, their audiences. Communication barriers seemed to be hidden behind high technology. A comprehensive explanation for the existence of such barriers was given by the researchers who participated in the development of these techniques. They were convinced that the mismatch in eye line-up among participants at the different sites, and the time delay of the talkers' images on the display, were the main reasons behind the psychological barriers. Some technical solutions focused on decreasing the transmission delay, by implementing a broadband digital network (1.5 Mbps) and by improving the data compression and decompression software. Also, some physical solutions have been proposed to minimize the eye
Sooja Choi, a co-author, has been granted partial support for the study reported in this paper from the Nakayama Foundation for Human Science in the 1995 fiscal year. The authors heartily appreciate this aid.
H. Tamura and S. Choi
line mismatch. But the mental stress in media conferences is not solely due to technical and hardware constraints.

SUBJECTIVE EXPERIENCES
It is not only in media conferences that people feel stressed. In meetings, they sometimes feel stressed because they are seated face to face, sometimes because they are seated apart so that they cannot communicate by sight. This is a psychological phenomenon, not to be explained by technical reasons. We have done various experiments, using TV conference systems and videophones, and have collected a great number of behavioral observations. While these results reflect part of the truth, they are often hard to prove generally.
Figure 1. Face to face meeting
Out of sight, out of mind
Figure 2. TV conference with one person out of sight

Fig. 1 shows a face to face meeting. Even in such meetings, people are not always watching each other. Nevertheless, if somebody talks to another person, the latter will
Verbal and Non-Verbal Behaviours
respond properly and everybody will recognize the event as happening. Now consider the case of TV conferencing, as illustrated in figure 2. Suppose there are three people (A, B, C) on one and the same site of a TV conference, but only two (A, B) of them are appearing on the screen. Even though the third person (C) can participate in verbal discussions, participants on the other site as well as person (C) him/herself will feel that C is being ignored in the discussion. Talk action by a person who is out of sight is not properly acknowledged by the participants on the other site. While many participants share this feeling, it is hard to substantiate it experimentally. As soon as we organize an experiment to test the disregard of the out of sight person in the talk action, such phenomena disappear. Thus, a proper experimental model of conferencing and a method of analysis are needed.

TALKING HEAD IMAGE IN SPEECH UNDERSTANDING
Many papers have discussed the role of video in communication (Steve, 1993). Ostberg (1992) confirmed the positive effect of presenting a talking head video in a noisy environment in the case of students learning English as a foreign language. We have developed a method for examining the effect of the talking head image in an environment where two speech events are presented simultaneously. The differentiation of multiple speech events, using visual cues, has been discussed under the name of the "cocktail party effect" (Cherry, 1953). The lateral differences in speech recognition were intensively studied in neurology (Kimura, 1961); selected overviews are given by Springer (1985).
Our method is to evaluate the effect of the presentation of images in the presence of multiple speech input. First, over a hundred words are pronounced clearly; the visual image and spoken sound are recorded on a video disc. Each word is checked as to whether it can be perfectly understood when the words are presented one after another in sequence. The spoken sounds are then stored in computer memory so that any pair of spoken words can be reproduced simultaneously. The experimental task for the subjects is to listen to the two words presented simultaneously and to write down what they could recognize. Two methods of speech presentation are examined, as illustrated in figure 3. The first is the dichotic method, in which two spoken words are presented, one from the left and the other from the right earphone. The second is the mixed method, in which the two speech sounds are mixed electronically and the mixed sound is presented from both earphones.
The task is labor intensive. By subjective estimate, recognizing a single clearly pronounced word is so easy that nobody feels stressed performing the task. If the subject has to recognize the words under considerable noise, however, the level of difficulty of the task may increase.
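The difference between the two presentation methods can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' apparatus: each spoken word is represented as a plain list of samples, the function names are hypothetical, and the gain of 0.5 applied to the mixture is an assumption made only to keep the sum within range.

```python
from typing import List, Tuple

Signal = List[float]
Stereo = Tuple[Signal, Signal]  # (left earphone, right earphone)


def _pad(a: Signal, b: Signal) -> Tuple[Signal, Signal]:
    """Pad the shorter signal with silence so both have the same length."""
    n = max(len(a), len(b))
    return a + [0.0] * (n - len(a)), b + [0.0] * (n - len(b))


def dichotic(word_a: Signal, word_b: Signal) -> Stereo:
    """One word goes to the left earphone, the other to the right."""
    left, right = _pad(word_a, word_b)
    return left, right


def mixed(word_a: Signal, word_b: Signal, gain: float = 0.5) -> Stereo:
    """Sum the two words and feed the same mixture to both earphones.

    The gain value is an assumption, not taken from the experiment.
    """
    a, b = _pad(word_a, word_b)
    mixture = [gain * (x + y) for x, y in zip(a, b)]
    return mixture, list(mixture)
```

In the dichotic case the two channels differ, so the ear of arrival is itself a cue; in the mixed case both channels are identical and no lateral cue remains.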
If we subjectively define the difficulty of a speech recognition task in the absence of noise as level 1, then the same task in the presence of noise whose magnitude is comparable to that of the signal is of level 2. The task of recognizing speech presented dichotically is relatively difficult and is estimated at level 4. A possible explanation for this difference in task levels goes as follows.
Figure 3. Dichotic and mixed presentations for plural speech recognition

Reception of a word by one ear is disturbed by noise of a comparable magnitude from the other ear, which sets the difficulty of recognizing speech from one ear at task level 2. When the ears have to perform two tasks simultaneously, the level is doubled. When the dichotic method is used, the words from the right ear are understood better than the ones from the left. The overall rate of correct recognition is about 63% for novices; having gained some experience, a number of subjects give over 80% correct answers. Recognizing the speech events by the mixed method is still higher on the difficulty scale; it is estimated at task level 6. In the mixed method of presentation, there are no lateral cues for differentiating the two spoken words. For novice subjects, the recognition rate is about 30%. If the auditory cues for differentiating two spoken words are less easily available, the subjects will have to make more intensive use of other cues, such as visual ones. Thus, the effect of image presentation will be more apparent in such intensive listening tasks. In the case of TV conferencing, various speech events are mixed within a site. Thus, the spatial cues are not fully available. Such a mixed presentation simulates the listening conditions in TV conferencing. The purpose of this experiment is to estimate the effect of image presentation. A talking head image was used. The video image of the talker is recorded, together with the speech sound, on a video disc. One talking head image is presented together with the two speech events. One word out of the two presented at the same time is accompanied by the talking head image. For short, this word is called the 'word with image'; the other, unaccompanied by a talking head image, is called the 'word without image'. The rate of correct recognition for the word with image is higher than that for the word without image (Tamura & Chen 1993).
The difference, 12%, is statistically significant overall. The effect of image presentation is different for different consonants. For example, the effect is prominent for the labial consonants (19%). Also, some cultural and sex
differences account for different effects in presenting a talking head image (Tamura and Chen, 1994). Thus, we are able to confirm the positive effect of image presentation in speech recognition. We are now trying to find a research method that will provide us with reproducible results on TV conferencing.

COMMUNICATION PROCESSES
The purpose of a conference is communication among participants. In a limited sense, communication used to be defined as the transmission of messages. Various technical tools have been introduced to support clear and exact message transmission: the microphone, the loudspeaker, the slide projector, the overhead projector, and the copying machine. Communication, however, is not a mere transfer of messages. Communication is the process of mutual understanding and reaching consensus. In order to promote mutual understanding, people have to be aware of the barriers to mutual understanding, while maintaining their sense of intimacy. There are various barriers to communication, such as distance, time, language, status and knowledge. In order to further communication, it is necessary to find out where the barriers are located. Communication is the process of finding and removing such barriers. The processes of communication may include finding the person one wants to talk to, specifying the problems, catching the opportunity to talk, preparing the proper expressions, following up on one's commitments, and so on (Tamura, 1990). In evaluating media communication, not only message transfer, but all the different
Narrow sense
  communication = message transmission
Broader sense
  communication = mutual understanding
  communication technology = to find out the barriers of communication and to remove them
Processes of communication
  search for whom to contact
  refine problems
  catch opportunity
  proper expression
  transfer or redirect
  following up
Figure 4. Concept of communication
communication processes should be examined, in order to see whether they are properly supported. It is important to differentiate between the various communication processes, because the introduction of technical tools like TV conferencing will affect each process differently. Most of the technical tools for communication support have supported message transmission. The introduction of TV or moving images into a conference as a means of transferring visual messages will have a certain effect on communication, for example in a conference where the color, shape and motion of a new model car are to be discussed. The problem is not to emphasize the need for TV-supported conferencing whenever visual materials are essential in the presentation and discussion. The problem is how useful it is in those televised conferences where mainly talking head images are transferred through the video channel. It is an interesting fact that there actually existed a telephone service to support conferencing, sometimes called 'the chat line'. Mostly, this service was (over)used by young people for trivial purposes of communication. But others have experimented with using the 'chat line' to coordinate international conferences and for the preparation of ISO drafts. Such speech-only 'lowbrow' conferencing is usable not only in trivial contexts, but also for purposes of higher order, in 'highbrow' conferencing. While this service has not been recognized as one of the respected media of social communication, the question remains how 'lowbrow' and 'highbrow' communication will each be influenced by the introduction of a visual channel.

CONFERENCE MODEL
In experimental studies of conferencing, various models are used to compare human conference behavior in various media environments. A model should be designed in such a way that every process of communication may be included, and examined as to whether it is properly supported.
At the same time, the model should be easily understood by the participants: no special knowledge or interest should be required. Since the main purpose of the experiments is to show the effect of the media on conferencing, the individual talks at the conferences should not be too long, and as many participants as possible should take part in discussions in the shortest possible period of time.
Material Exchange
A simple conference model is derived on the basis of a material exchange scenario. Suppose a total of 6 participants, A, B, ..., F, are located on two sites, S1 and S2. Each participant is given a storage list of various materials, each corresponding to a row in Table 1. Storage status may be either rich (positive), balance (zero) or poor (negative). If the storage status is poor for one particular material, the participants have to find some others who are rich, and ask for an offer. Participants are not informed of others' storage status at the beginning, but must get to understand who is poor and who is rich by listening to the requests and offers from the other participants. When some rich participant makes a poor participant an offer, the rest of the poor participants try to catch the chance and get further offers before the rich participant's storage runs out.
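The scenario can be sketched as a small data model. This is a minimal illustration under stated assumptions rather than the authors' experimental protocol: the storage values are hypothetical, the function names are invented, and the offer policy (hand over what was asked for, limited by the giver's surplus and the receiver's shortage) is only one of the behaviors a rich participant might choose.

```python
from typing import Dict


def status(amount: int) -> str:
    """Classify a storage value as rich, balance or poor."""
    if amount > 0:
        return "rich"
    if amount < 0:
        return "poor"
    return "balance"


def offer(storage: Dict[str, int], giver: str, receiver: str, requested: int) -> int:
    """Move up to `requested` units from giver to receiver; return the amount moved.

    Assumed policy: the transfer is capped by the giver's surplus and by the
    receiver's shortage, so nobody is pushed from rich to poor by giving.
    """
    available = max(storage[giver], 0)
    shortage = max(-storage[receiver], 0)
    amount = min(requested, available, shortage)
    storage[giver] -= amount
    storage[receiver] += amount
    return amount


# Hypothetical storage for one material; A-C on site S1, D-F on site S2.
storage = {"A": 20, "B": -10, "C": 0, "D": -20, "E": 10, "F": -10}
```

Because an offer only moves units between participants, the sum over all participants stays constant. With the hypothetical values above that sum is -10, so at least one participant must remain poor however the offers go.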
session #
site individual A Sl B C
20 10 -20
-10 30 30
-10 20 -20
total storage
0 10 10
-30 -10 -10
-10 -10 -20
-30 -10 10
-20 10 20
10 20 -10
0 30 -10
0 20 LT, MT LT, MT
30 LT
-30 MT
-20 MT
Table 1. Session Table

The total storage is the sum of the individual storage. A session may end fast, viz., when the total storage is positive; conversely it may last for a long time when the total storage is negative. The total storage is not known to the participants. Talk within a site is named local talk (denoted by LT); talk between sites is called media talk (denoted by MT in Fig. 5). Based on the allocation of storage, the type of talk which dominates a particular session may be hypothesized to occur as shown in the bottom row of Table 1. Talk not directed to a specific person, but addressing all participants, is called public talk (denoted by PT). Public talk does not request others to respond specifically, but tries to manipulate the atmosphere. In the case of ISDN TV conferencing, a call for talk is sometimes not acknowledged by the specified participants; this may be at least partially due to technical reasons. Such a neglected call for talk is indicated by NT.
The conference rules can be set as follows. The rich could offer the requested amount, if storage is sufficient; alternatively, they might offer an arbitrary amount in response to a request, or they might even refuse to make a deal altogether. A session starts by somebody nominating some material for exchange. When the positive storage is less than the negative storage, the poor will try to find someone with excess positive storage; this may go on for a long time. When every participant has become rich, or when the remaining poor have given up requesting offers, the session is closed. Then the players go on to the next session to discuss the exchange of another material.

Face-to-face and Cable TV
Typical behavioral characteristics of the face-to-face and CATV conferences are illustrated in Fig. 5. In the face-to-face conference, participants are seated at two tables facing each other in one room. Local talk and media talk are observed equally often in the face-to-face conference. In the case of local talk, participants normally turn their faces to see each other. One typical behavior in the face-to-face conference is multiple media talk, that is, more than two pairs of participants talking at the same time. This is suggestive of the channel capacity of natural space communication.
Conference experiments were done using cable TV networks. Cable TV transfers the talking head images and the speech sound from one site to another without delay. It is almost equivalent to TV conferencing, except that the transmission delay of the image is short. In the CATV conference, media talk and local talk are normally observed. Local talk is more frequent at the beginning of a session. Media talk and local talk sometimes happen simultaneously, but media talk happens one at a time. Some differences between face-to-face and CATV conferences are observed in the speech pattern of the talk. In the face-to-face conference, an utterance is often not a complete sentence but a fragmentary word; a meaningful sentence is built up over several exchanges. In the CATV conference, participants used more formal language.
[Figure 5 legend, signs used to show auditory and visual media and speech actions: face to face; speech; speech with echo canceler; image; delayed image; speech synchronized to image with echo canceler; PT, public talk; neglected talk]
Figure 5. Behavioral characteristics of face-to-face and CATV conference

ISDN TV Conference
ISDN TV conference systems bidirectionally transmit speech and images from site to site, but with some delay in image transfer. An ISDN 64kbps image was used in order to evaluate the effect of the time delay more explicitly (the most frequently used transmission speed is also 64kbps). The speech can be synchronized with the image or be transferred immediately (desynchronized). In these conference systems, a special technology is normally installed to cancel the echo effect. When speech transferred from one site is reproduced on the other site, part of the reproduced sound may be picked up by the microphone on the other site, together with the original speech of the
participants on that site. The echo canceler is designed to specifically suppress the acoustic components reproduced from the speaker.
[Figure 6. Behavioral characteristics of ISDN-TV conference. Left panel: visual delay. Right panel: visual and auditory delay.]
In actual use, however, if one person at one site is talking loudly, the speech from the other site is totally suppressed, so that it is easy for a person who has started to talk to continue talking, whereas those who want to break into the ongoing talk have their speech suppressed. The behavioral characteristics of ISDN-TV conferencing are shown in Fig. 6. The system shown on the left is the desynchronized system, in which speech from the other site comes through immediately, while images arrive some hundred milliseconds later. This setting shows both media talk (MT) and local talk (LT), while calls for talk across the media are sometimes neglected (NT), even though some participants engage in vivid activity in order to catch the attention of the people at the other site. Because image and sound are desynchronized, many such calls are disregarded, which is inconvenient for the participants. Especially when speech arrives earlier than pictures, the participants tend to talk without watching the talking head images. It is worth mentioning that in the natural environment, due to the transmission delay of acoustic waves in the air, auditory events always arrive with some delay compared to visual events. The human perceptual system is therefore adapted to auditory events being preceded by visual events. In multi-media environments, however, visual and auditory events reach the perceptual system in the opposite order to that experienced in the natural environment. The mental effects of this reversed relation have not been studied thoroughly so far. The right-hand part of Fig. 6 shows ISDN-TV conferencing when both speech and images are delayed in transmission. Due to this delay, the interval between one piece of talk and the next is longer than in the other setups.
Also, due to the echo canceler effect, efforts to break into an ongoing speech event are often unsuccessful. And since it is hard to begin talking with one of the participants at the opposite site, participants tend to start talking locally. Thus, LT increases. In general, when the number of participants increases, participants prefer LT to MT. The conference is
no longer a discussion among all the participants, but is more like a negotiation between two parties located at different sites. If a debate starts within one site, the participants at the other site become listeners to, but also outsiders to, that debate. It requires special skills on the part of the chairman to conduct a debate in which all the participants take part.
Emotional Activity
One important aspect of conferencing is to enhance intimacy among the participants. It is therefore important to establish a measure of the emotional activity going on at conferences (Choi and Tamura, 1995). The different dialogue types observed in face-to-face and CATV conferencing could be examples of such a measure. In the model conference, the measure introduced to evaluate emotional activity was the degree of smiling occurring during the talk. Smiles cannot be measured objectively, so smiling was rated subjectively by the experimenter into four categories: no smile, passive smile, active smile and laughter. No smile is the case when a participant is serious; a passive smile is a smile that follows the smile of others; an active smile is a smile initiated by oneself; and laughter is the case when smiles or laughs occur which seem unstoppable and are shared by many participants. These measurements were then applied to face-to-face and CATV conferences. The results showed that participants are more emotionally active in face-to-face conferencing. Emotional activity may be enhanced or suppressed by modifying the conference rules. In conferences in which deals among participants are permitted, emotional activity is enhanced.
Nonvocal behaviors
Depending on the scenario and the conference rules, visual cues in communication are used in different ways. In cases where participants have to look into printed data to answer requests, they do not look at the display screen even in TV conferencing. In many cases, model conferences can be conducted without looking at visual displays. In such cases, visual cues are not essential to communication. The participants are not aware of how they use visual cues in such conferences. Thus, for some well-defined types of conferences, we could say that visual cues are used implicitly. Next, we examine those conferences in which the visual cues are used actively by the participants.
A simple modification was introduced in the conference model described above. Previously, the order of the materials to be discussed for exchange was determined in advance. Under the modified rules, only the state of storage of the various materials was given to each participant, and the order of discussion was left for the participants to determine. Thus a simple group decision procedure was introduced into the conference model. Most participants would like to have those materials discussed first for which their storage is negative. They use toss-ups, Jan ken (a Japanese-style hand sign match), and facial expressions of request and offer. Where there are many conflicting interests, a show of hands is adopted as the decision procedure. Introducing group decisions activates the entire conference process. Some participants raise their hands to get the floor; some start talking louder in order to win priority for a particular material. Toss-ups are not suitable for TV conferences because the image definition on the video screen is not sufficient. Jan ken is good, especially for CATV conferencing,
because it makes for a vivid atmosphere of action. However, in ISDN-TV conferencing, because of the delay in image transmission, the hand signs presented at one site are seen a second later at the other. Thus, the participants feel that the others are not showing their hand signs at the same time as they do themselves, and in this way participants may lose mutual confidence in their actions. As to a show of hands, this procedure is used quite often. But, since it takes time to wait until everybody's hands are up, participants sometimes try to force the decision on which materials should be discussed first by using a loud voice.
TV CONFERENCE BY MULTIPLE SITES
More than two sites can be linked to organize TV conferencing in an ISDN network.
[Figure 7. Configurations of Multi-Site TV Conference: a) structure with the central control (first priority, second priority); b) structure by loop interactions.]
Various network configurations for multi-site TV conferencing are possible. In the configuration of Fig. 7a, each site sends its own speech and images to the conference service unit located in the network station; the unit determines which terminal's signal is to be transferred where. A simple logic for determining this delivery is based on the magnitude of the speech sound. The site whose speech sound has been loudest in the recent past gets the first priority. The site with the second loudest speech sound gets the second priority. The image of the first-priority site is sent to the other sites. The image of the second-priority site is transferred to the first-priority site. Images of sites with third or lower priority are totally neglected in this configuration. Thus, once a site has the first priority, it is easy for that site to keep talking. By contrast, it is hard to begin talking from a site of lower priority. Furthermore, a show of hands is not applicable as a decision procedure in this configuration, as the participants at the lowest-priority sites have to wait to get the floor until the participants at a site of higher priority stop talking entirely. Consequently, this type of conferencing may be used for one-to-many message
transmission, but not for a real discussion among all the participants, since it may take some time before everyone agrees to give the floor to a particular person. Where there is a chairman, no support is provided for his coordinating the floor. An alternative configuration of multi-site TV conferencing is shown in Fig. 7b. The network connections constitute a loop. Each terminal receives image data from the previous terminal, writes its own data into the area of the screen assigned to it, and sends the image and speech data on to the next terminal. In this configuration, the image screen is divided beforehand into, for example, 4 parts, so each subscreen is small. This configuration may be used for talk between four people in different locations.
FLEXIBLE CONNECTIONS OF NETWORK
The need for visual information varies in the course of a discussion. Present-day technologies are capable of providing various services, but not always in a timely fashion. For example, the talking head image is sometimes useful for grasping the general atmosphere. When discussions go into details, a still picture of high definition is especially relevant. When specific parts of the picture are pointed at by the participants, it is essential that the pointer moves quickly. When participants are consulting written documents, the talking head image is not useful. When motion of the image is essential, the image transmission speed should be increased. The actual network service does not fit these requirements for flexibility. The network functions are mainly constrained by channel capacity, which determines the transmission speed of the data. When a connection is opened, the user has to specify the channel capacity for the connection. The specified capacity is then set aside for the connection, regardless of actual traffic needs.
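The cost of reserving a fixed capacity can be made concrete with a small calculation. The following is a minimal sketch, not a model of any actual ISDN tariff or service: the function name `reservation_summary`, the phase labels and all durations and rates are hypothetical, chosen only to mirror the kinds of phases listed above. It assumes that the capacity fixed when the connection is opened must cover the most demanding phase.

```python
from typing import List, Tuple

# (label, duration in minutes, capacity needed in kbit/s); all values illustrative
Phase = Tuple[str, float, float]


def reservation_summary(phases: List[Phase]) -> Tuple[float, float, float]:
    """Return (reserved, average_needed, utilisation) for one session.

    Assumes the capacity is fixed when the connection is opened, so the
    peak requirement has to be reserved for the whole session.
    """
    total_minutes = sum(minutes for _, minutes, _ in phases)
    reserved = max(rate for _, _, rate in phases)
    average_needed = sum(minutes * rate for _, minutes, rate in phases) / total_minutes
    return reserved, average_needed, average_needed / reserved


# Hypothetical session: the phases follow the text, the numbers do not come from it.
session: List[Phase] = [
    ("talking-head image", 10, 64.0),
    ("consulting written documents", 30, 16.0),
    ("motion-critical image", 5, 384.0),
]

reserved, average_needed, utilisation = reservation_summary(session)
print(f"reserved {reserved:.0f} kbit/s, average need {average_needed:.1f} kbit/s, "
      f"utilisation {utilisation:.0%}")
```

With these made-up figures, most of the reserved capacity sits idle for most of the session, which is the mismatch between reserved capacity and actual traffic needs that this section describes.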
If users want smooth data transmission at peak traffic, they have to declare their exclusive use of a large channel capacity, which increases the cost to the users. The network service should be capable of changing channel capacity during communication without interrupting the connection. At present, in order to change the channel capacity, the user has to stop talking, break the connection and start all over again from the dialing process. By the same constraint, when a user wants to change the conference configuration, or change from telephone mode to still picture mode, the connections have to be set up again. Users cannot be expected to understand such troublesome requirements for network handling. Current developments emphasize the importance of seamless communication. If such seamless communication is only available when a very large channel capacity is exclusively used, this technology will never be accepted for wider use. The first step towards truly seamless communication is a smooth change of channel capacity by the users.
CONCLUDING REMARKS
Bidirectional image exchange, although expected to further a more natural communication, still has various imperfections to deal with, such as transmission delay, an inflexible network structure, and various difficulties hindering participants from taking part freely in an ongoing debate. There is at present, however, technology
available for dealing with these problems, provided it is properly and knowledgeably applied. The service providers should develop a more flexible service style, while keeping the cost reasonable. Conferencing is a metaphor. People assume that there is free and unconstrained communication in conferencing. But actual conferences are more formal than ideally conceived, and communication is constrained in various ways. Speech acts are often unidirectional; participants process only the matters specified for their own tasks; decisions depend on shared responsibilities. Furthermore, participants are accustomed to using a verbal mode of expression for official communication. Even in the model conference which we experimentally organized, people tended to act as formal participants as much as possible, and didn't show much nonverbal or nonvocal action. When people are allowed a choice of media or communication channels, they prefer the use of verbal expressions. When the verbal channel is busy or there are conflicting signals, people start using behavioral modes of expression. However, since it takes time to obtain agreement from all the participants in verbal negotiation, a visual decision process is introduced, like a show of hands. While official conferences are formal and apt to use correct speech, private meetings of arbitrary groups or daily family talks tend to use nonvocal expressions. One potential use of TV conferencing is in the non-business area. Nowadays, members of a family more often live separately in different places. The question is whether TV phone or conferencing might be used satisfactorily for family communication in such conditions. In particular, the female participants in our experiments felt that they could spend their time pleasantly, talking freely to each other through a TV conferencing setup.
REFERENCES
Argyle, M., 1969. Social Interactions. London: Methuen.
Chen, Y., H. Tamura, Y. Shibuya, and A. Ito, 1994. Analysis of Presenting Talking Head Video by Method of Recognizing Plural Speech Words. Transaction of Institute of Electronic, Information and Communication Engineers (Japan) J77-DII: 1484-1491.
Cherry, E.C., 1953. Some Experiments upon the Recognition of Speech, with One and Two Ears. Journal of Acoustic Society America 25: 975-979.
Choi, S., and H. Tamura, 1995. Behavior Analysis of TV Conferences by Introduction of the Conference Model. Progress in Human Interface (in print).
Mey, J., and H. Tamura, 1992. Barriers of Communication in a Computer Age. AI & Society 6: 62-77.
Ostberg, O., and Y. Horie, 1992. Contribution of Visual Images to Speech Intelligibility. Proceedings SPIE, Human Vision Processing, and Digital Display III 1666: 522-533.
Shibuya, Y., and H. Tamura, 1993. Use of Bi-directional Image Exchange in Facilitating Precontact Communication. Advances in Human Factors/Ergonomics 19B: 943-948.
Springer, S. P., and G. Deutsch, 1981. Left Brain, Right Brain. San Francisco: Freeman & Co.
Steve, W., and S. Robert, 1993. Turning away from Talking Head: The use of Video-as-Data in Neuro Surgery. INTERCHI '93: 327-334. ACM.
Tamura, Hiroshi, 1990. Invitation to the Human Interface. Journal of Institute of Television Engineers of Japan 44: 961-966.
Tamura, Hiroshi, 1991. Human Interface in Manufacturing. Human Interface 7: 639-644.
Tamura, Hiroshi, Y. Chen and Y. Shibuya, 1993. Effect of Image Presentation to the Cognition of Plural Speech. Advances in Human Factors/Ergonomics 19B: 62-67.
Tamura, Hiroshi, 1994. Human Information Technology. System Control and Information 38: 245-251.
Tamura, Hiroshi, S. Choi, K. Kamada, and Y. Shibuya, 1994. Representation of Mental Model of Media Users and the Application to TV Conference. Progress in Human Interface 3: 31-38.
Chapter 23
J.A.A. Sillince
Management School, University of Sheffield, UK [email protected]
ABSTRACT
The advantages of electronic argumentation as it exists in current technology emphasise argumentation as a rational process of concept development and communication which needs to be systematically managed. These are important advantages which will push such technology forward into an increasing number of future applications. However, people like to use subtlety and tricks and to avoid explicitness in order to win arguments, and current technology does not allow this. If this problem is ignored, then either technology-generated dialogue will seem unnatural and over-explicit and information content will seem guarded, or else people will avoid using the technology. Either the tricks, implicitness and subtlety have to be programmed or, if this is too difficult, we must invite ongoing participatory design by users.
INTRODUCTION
Would electronic argumentation narrow or widen how you express yourself? If you were able to electronically create, manipulate and exchange issues, positions, arguments, questions, options, criteria, assumptions, decisions, problems, and design objects, would this constrain or widen your opportunities for being effective within your organization? Such electronic environments are becoming available (Conklin and Begeman, 1988; Conklin and Burgess Yakemovic, 1991; Lee, 1990; Ramesh and Dhar, 1992), and so this question is opportune.
COMPUTER MODELS OF ARGUMENTATION
The usefulness of rhetoric for the computational modelling of argumentation has been suggested by Sillince and Minors (1992) and Sillince (1994). Rhetorical argumentation comprises arguments such as reciprocity (return a favour or a hurt), deterrence (dissuade an action by a threat), fairness (equal treatment for equal cases), consequences (do X because X has good consequences), commitment (keep going in this direction because too much effort has been invested to change now), and so on. It uses the Toulmin (1958) structure of datum (initial evidence), warrant (an inference rule linking evidence to claim) and claim. Some examples of warrants are given in Figure 1.
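The datum, warrant and claim structure described above can be sketched as a small data type. This is only an illustration under stated assumptions: the class names `Warrant` and `Argument` and the `render` method are hypothetical, the enumeration covers only the five rhetorical warrants named in the paragraph above, and the sample sentence paraphrases the reciprocity example of Figure 1.

```python
from dataclasses import dataclass
from enum import Enum


class Warrant(Enum):
    """A few rhetorical warrants, glossed as in the text (not a complete list)."""
    RECIPROCITY = "return a favour or a hurt"
    DETERRENCE = "dissuade an action by a threat"
    FAIRNESS = "equal treatment for equal cases"
    CONSEQUENCES = "do X because X has good consequences"
    COMMITMENT = "keep going because too much effort has been invested to change now"


@dataclass(frozen=True)
class Argument:
    """Toulmin-style triple: initial evidence, inference rule, conclusion."""
    datum: str
    warrant: Warrant
    claim: str

    def render(self) -> str:
        # Spell out which warrant licenses the step from datum to claim.
        return f"{self.datum}; so, by {self.warrant.name.lower()}, {self.claim}"


lawn = Argument(
    datum="I mowed his lawn last week",
    warrant=Warrant.RECIPROCITY,
    claim="he should mow my lawn",
)
print(lawn.render())
```

Keeping the warrant as an explicit field is what separates this representation from a bare premise and conclusion pair: the same datum could support a different claim under a different warrant.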
incompatibility: X contradicts Y and X is true so Y is false.
transitivity: X implies Y and Y implies Z so therefore X implies Z.
inclusion of part in whole: Knowing the housing area they came from I knew what sort of people they were.
inclusion of whole in part: His behaviour lets the whole side down.
deduction: If X is true then Y is true and X is true so therefore Y is true.
fairness: If men and women do the same work, then they should get the same pay.
promise: I should take Kathryn to the theatre because I promised.
reciprocity: He should mow my lawn because I mowed his lawn last week.
commitment: We should not give up now when we have sunk so much effort into the project.
deterrence: If the UN is a strong policeman then regional conflicts will be deterred from happening.
precedent: What happened last year is relevant to what we should do now.
authority: Dr. Johnson had a low opinion of patriotism and so should we.
sacrifice: The book was so important that he gave up his holiday to write it.
dissociation: Mr X is not a real animal lover if he treats the pony like that.
hierarchy: The women and children were first into the boats.
without limit: The company cannot continue to lose money like this.
example: Jesus washed the prostitute's feet. Similarly we should have compassion.
[label missing]: The two sides in Northern Ireland are like fighting cocks. They should be kept apart.
probability: In most similar situations X has been true so probably X is true.
variability: The boxer was heavier than his opponent which gave an advantage.
[label missing]: The window broke just after the brick was seen to be thrown. So the brick broke the window.
quantitative difference: The missile agreement only requires some small concessions of a few more warheads.
qualitative difference: The breakdown in negotiations is caused by a fundamental difference in outlook on how to verify agreements.
X is an end: Our goal is a reduction in traffic accidents.
consequences: If there is no GATT agreement then a trade war might occur.
X is a means: More barriers in residential areas would reduce traffic accidents.
X is good for Y: Reducing traffic accidents is good for us all.
X is caused by Y: The weak pound is caused by poor economic performance.
minimise loss: Cut your losses and take your money out while there's still time.
maximise gain: Choosing the highest profits maximises gain.
Person X is bad: President X uses torture of political prisoners.
[label missing]: Mr. X was sane and so was responsible when he did the murder.
X is necessary means: Some down payment is required besides a loan.
X is costly: A Channel bridge would be expensive.
Agent: The burglar wore a red jacket.
[label missing]: It is normal to sleep at night, so sleeping at night is good.
Economy of means: The window broke, and a suspicious man, are parts of burglary theory, and evidence showing that the man broke the window strengthens the theory.
No alternative: You will have to take a taxi from the train because there is no bus.
Is implies ought: X is true therefore X should be true.
Necessary means: You will only get wet if it rains hard, but this is drizzly weather, so you don't need an umbrella.
[label missing]: We are discussing the abortion issue - the state of education is neither here nor there.
[label missing]: The X Party has supported the struggle of blacks in South Africa.
Figure 1. List of a sample of rhetorical warrants, with examples.
Argumentation arises from its context in two ways. Firstly, there are political turns, which develop from problems or conflicts and which are solved or resolved. Such turns take place continuously in a series of (sometimes) unconnected events. Secondly, any project represents a coming to a decision, or an intellectual movement from vague to precise idea forming. This movement can be represented as the evolution of an argument graph, whose nodes are premises or conclusions, and whose links are warrants which support or attack inferences from premises to conclusions. Such a graph will evolve from an early form, where there are many gaps in justification (i.e. many links missing between nodes) to a late form, where a main conclusion is attacked or supported by a large number of links or chains of links. Several argumentation-based models have been proposed for supporting group discussion and design. Examples include the graphical issue-based information system or gIBIS (Conklin and Burgess Yakemovic, 1991; CMSI, 1992). Recent extensions beyond argumentation have been proposed (Buckingham Shum and Hammond, 1994;
Ramesh and Dhar, 1992). The model discussed here attempts to include many of the features of these extensions within a basic argumentation model. According to this model, there are six types of warrant which enable a conclusion to be inferred from a premise. A claim-warrant enables a claim to be substantiated from a datum (Toulmin, 1958) using rhetorical rules such as reciprocity, fairness, or deterrence, or quasi-logical rules such as deduction and induction. A solution-warrant enables a solution to be identified from a problem. A resolution-warrant enables a resolution to be identified from a conflict. A goal-warrant enables a goal to be identified using a means. An answer-warrant enables a question to be answered. A theory-warrant enables a hypothesis to be tested.
conclusion = claim | solution | resolution | answer
premise = fact | conclusion | datum | claim | problem | solution | conflict | resolution | question | answer
warrant = claim-warrant | solution-warrant | resolution-warrant | answer-warrant | theory-warrant
datum = fact | premise
fact = 'database lookup' | premise | conclusion
problem = 'goal-fact difference'
conflict = 'goal-goal difference' | 'means-means difference' | 'agent-agent difference'
question = 'frame slot empty'
means = fact
claim = (datum, claim-warrant) | premise
claim-warrant = deduction | induction | fairness | reciprocity | commitment | minimise-cost | maximise-gain | without-limit | X is necessary means | X is costly | agent | consequences | responsibility | X is a means | X good for Y | hierarchy | authority | precedent | stages | promise | qualitative difference | quantitative difference | ad hominem | dissociation | probability | variation | comparison | X caused Y | categorisation | inclusion of part in whole | inclusion of whole in part | analogy | example | sacrifice
solution = (problem, solution-warrant) | premise
solution-warrant = 'means-goal link'
resolution = (conflict, resolution-warrant) | premise
resolution-warrant = 'goal-goal link' | 'agent-agent link'
answer = (question, answer-warrant) | premise
answer-warrant = 'frame slot filled'
goal = fact
goal-warrant = 'means-goal link'
theory = premise | conclusion | null
theory-warrant = premise | hypothesis | null | 'fact-hypothesis difference'
Figure 2. Premise, warrant and conclusion.
Because these warrants can be chained together (the end of one forming the start of the next one) it is possible for any claim, solution, resolution or answer to become the premise of the next, thus enabling the building of plans (Marshall, 1989). For example, from step 3 to step 7 in Figure 3 there is a type-change from Claim to Problem.
'The sky is red this morning'
'That means bad weather'
'You must take an umbrella'
Figure 3(a). Example dialogue.
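The chaining of warrants, including the type-change from Claim to Problem, can be sketched in a few lines. This is a minimal illustration and not the implementation of the model: the names `Step`, `apply_warrant` and `chain` and the three lookup tables are hypothetical, and the tables hold only the entries needed for the umbrella dialogue, worded as in Figure 3(b).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Step:
    kind: str    # "premise", "claim", "problem" or "solution"
    text: str
    origin: str  # where the step came from


# Hypothetical database holding only what the umbrella dialogue needs.
CLAIM_WARRANTS: Dict[str, str] = {"The morning sky is red": "Today will be bad weather"}
SOLUTION_WARRANTS: Dict[str, str] = {"Today will be bad weather": "Take an umbrella"}
GOALS_THREATENED_BY: Dict[str, str] = {"Today will be bad weather": "Avoid getting wet"}


def apply_warrant(premise: Step, warrants: Dict[str, str], kind: str) -> Optional[Step]:
    """Infer a conclusion of the given kind, or None if no warrant matches."""
    conclusion = warrants.get(premise.text)
    if conclusion is None:
        return None
    return Step(kind, conclusion, "program generated")


def chain(user_input: str) -> List[Step]:
    """Chain a claim-warrant and a solution-warrant, starting from user input."""
    steps = [Step("premise", user_input, "user input")]
    claim = apply_warrant(steps[-1], CLAIM_WARRANTS, "claim")
    if claim is None:
        return steps
    steps.append(claim)
    # Type change: a claim that conflicts with a stored goal is re-read as a problem.
    if claim.text in GOALS_THREATENED_BY:
        problem = Step("problem", claim.text, "fact-goal difference")
        steps.append(problem)
        solution = apply_warrant(problem, SOLUTION_WARRANTS, "solution")
        if solution is not None:
            steps.append(solution)
    return steps


for step in chain("The morning sky is red"):
    print(f"{step.kind}: {step.text} ({step.origin})")
```

The conclusion of the first warrant is reused, unchanged in wording but changed in type, as the premise of the second; that reuse is what allows a chain of warrants to build up a plan.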
1. Premise: (From user input) 'The morning sky is red'.
2. Claim-warrant: (From database) 'If morning sky red then bad weather'.
3. Claim: (Program generated) 'Today will be bad weather'.
4. Goal: (From database) 'Avoid getting wet'.
5. Fact: (A claim can be a fact) 'Today will be bad weather'.
6. Fact: (From database) 'People get wet in bad weather'.
7. Problem: (Fact-goal difference) 'Today will be bad weather'.
8. Solution-warrant: (From database) 'If the weather is bad then take an umbrella'.
9. Solution: (Program generated) 'Take an umbrella'.
Figure 3(b). Model applied to the example dialogue.
Premises and conclusions form nodes, and the five types of warrants form links, in an Argument Graph (Figure 4), which represents the state of a discussion at any moment between several people. The graph is recursive, because any node may itself contain a graph (Sillince and Minors, 1992). An intelligent argumentation support tool enables people to argue within such an environment. It enables people to express themselves in terms of fairness, reciprocity, deterrence, precedence, generalisation, authority, probability, commitment, sacrifice, deduction, and many other logical or rhetorical ways. It structures and displays the arguments as evidence, warrants, and claims, and could even calculate an estimate of argument strength based on previously agreed user-determined criteria. The resulting structure is a screen-displayed graph of supporting and attacking claims. Several partial, experimental systems exist, although practical, complete and implemented systems do not. Their claimed benefits are illuminating (see Figure 5 for information about CM/1, a commercially available system which uses some aspects of argumentation for discussion and policy exploration). They emphasise argumentation as a rational process of concept development and communication which needs to be systematically managed. The message of this paper will be that despite the importance of such claimed benefits (whether or not they exist need not concern us here) there are other aspects of argumentation which we ignore at our peril. Technology which fails to
support these other aspects will not be used, or will result in stilted behaviour which is poor in emotional, political and social content (Smithin and Eden, 1986).
Graph = SetOfEdges × SetOfVertices
SetOfEdges = Edge*
SetOfVertices = Vertex*
Edge = Relation × Vertex × Vertex × Attributes
Vertex = Node × Attributes
Attributes = Opinion × Opinion × Timestamp
Opinion = DegreeOfBelief × DegreeOfImportance × Hypothesis
Hypothesis = Yes | No
DegreeOfBelief, DegreeOfImportance = CardinalNumber
Relation = Attacks | Supports
Figure 4. Argument representation language (Sillince and Minors, 1992).
Argumentation support forces one to define one's own position on the value of individual conflict. If one thinks that technological support for conflict should be reduced as much as possible, there is the danger that such conflict will be pursued in other contexts (face to face, on the telephone) or in other ways (in the courts, as violence). We share the view of Easterbrook (1993) that suppressing conflict leads to frustration and misunderstanding. These comments can be generalised to the individual emotional, political and social responses to technological support, and how that support serves to help or hinder these dimensions of argumentation behaviour. One of the characteristics of argumentation is that it is self-referential. Users need to be able to question and continuously redesign any support system. Although it is tempting to let users decide the rules, if too much has to be resolved by consensus, then the discussants may waste time arguing about the ground rules. Nevertheless, members of organisations thrive on such ambiguous situations, which provide a respite from 'work', enabling social, emotional and symbolic communication to flourish, providing a means of complaining and thus of letting off steam, enabling indulgence in gossip and intrigue, and putting ordinary organisation members at centre stage. One of the advantages of electronic discussion is that group membership can fluctuate according to relevant knowledge (group boundaries become more fuzzy). Indeed, some have argued for the importance of social, non-task related interactions, and tools are being built specially to support them (e.g. Root, 1988). So time spent by
the users on the ground rules (a kind of ongoing, participatory design) may have beneficial effects. Some of the issues of argumentation support can be subsumed within matters relating to groups and will not be explored further here. The questions of group size and composition (protagonists, judges, audience, witnesses) and organisational roles of group members, for example, are relevant because they raise issues of group dynamics, exclusion of minorities, scapegoating, bias to polarisation, unstable coalitions, tactical voting and alliances, and 'groupthink'. Similarly there are issues of floor taking: time-rationed or unrationed turn-taking, warranted (relevant) interruption and reversion, or deference to organisational status. Knowledge is relevant as evidence and backing in argumentation: it can be withheld or declared secret, exaggerated, exchanged, bartered, coerced, and attributed ownership (personal or institutional). However, lack of space prevents further discussion of these matters.
Decreased time in meetings
Acceleration of teamwork
Reduction of reinventing the wheel
Higher quality decisions
Equal valuation of individual contributions
Circumvention of hierarchy-based decisions
Exposure of hidden assumptions
Exposure of rhetoric and hand-waving
Fewer restarts
Access to previously developed solutions
Reduction of interpersonal confrontation and non-pertinent interactions
Easy topic reorganisation and grouping using unique hypertext mechanisms
Direct access to interrelated documents and other artifacts
Powerful retrieval of archival structures using both visual navigation and search technology
Organisation of all information leading to decisions
Documentation of decision processes
Promotion of learning through a living archive of the decisions
Information sharing
Transfer of knowledge
Figure 5. Claimed benefits of an argumentation-based issue-based information tool (CMSI, 1992).
Some argumentation behaviour is defined by common agreement or by centrally defined rules, with proscriptions and threatened sanctions. When the discussion is
mediated using computer technology, new kinds of problems arise. There is a tradeoff between implementation of such control strategies and overhead borne by users, in terms of information overload, failure to remember, coordination problems, and production blocking (Nunamaker et al., 1991), and extra steps required of the user (Buckingham Shum and Hammond, 1994) who may not be the person who benefits from the extra work involved (Grudin, 1988). We consider some aspects of what the user might want to do, and how the technology might help or hinder this, in terms of some hypotheses.
H1. Information is more open and honest when senders can contextualise and ambiguate it.
Senders want their contributions not to be taken out of context and only
to be used in conformance with the sender's wishes. A criticism to the boss may have to be couched in softer language than a criticism of a subordinate. If it were not possible to criticise tactfully then other media (face to face or telephone, or via an intermediary) might be used instead. A choice of bandwidths gives users greater control. Telephone contact avoids eye contact. Asynchronous interaction may be preferred over synchronous interaction when social relations are poor (Markus, 1992). The adding of tactful or polite prefaces or disclaimers to messages may not be totally effective when sensitive information which had an identifiable sender (such as a criticism, or information which formalises the knowledge of someone and thus makes them more dispensable) can be cut and pasted out of context and sent to other than the original receiver. Therefore propositions (like the decisions they justify) should ideally be situated by the use of disclaimers and qualifiers - users should be able to choose the degree of 'fuzziness', ambiguity, uncertainty or softening. In argumentation, contextual information would include - claim by whom, what a claim points to, what it attacks or supports. It is the absence of this information that Suchman (1994) takes such objection to, when she criticises technology for forcing explicit user selection of illocutionary force, propositional content, and temporal relationships to other speech acts. We would argue that such explicitness and directness needs to be able to be softened or blurred. People may wish to be tactful, or to avoid commitment, or to delay position-taking, or to seem not to be exerting pressure, or to seem not to have an opinion on a topic. In many cases people do not wish to make their intentions k n o w n in argumentation, having to divulge one's intentions would be to weaken one's own position because in many cases arguments are more persuasive when their intention is hidden. 
Constantly having to make intentions explicit would decrease organisation members' autonomy (Suchman, 1994). The cost of avoiding this is the time-consuming task of enabling users to choose when intentions should be revealed. Another aspect of disclaiming ('only my opinion', 'strictly off the record', 'correct me if I'm wrong') is that there is an expectation that others will not hold the sender to account for the information. The technological implication (if designers wish not to frighten senders from making their information open and honest) is that the organisation's memory should show qualities of propriety, filtering information using such disclaimers prior to retrieval.
H2. Technology which is biased towards preemptive closure will cause resentment and will be identified with organisation members who own deadlines. People often use formalities, formulas, or other means of hedging in order to delay the determination of a message's performative effect as long as possible. Or they often have the opposite intention - to reach a precise meaning as quickly as possible. For
example, a research student may wish her supervisor to commit himself that her thesis is ready to be examined, whereas the supervisor may prefer to wait. Technology which is insensitive to this tension may be biased one way (towards pre-emptive closure for example). This aspect needs to be subject to user redesign and negotiation.
H3. Anonymity makes deception easier. Because the facts about a case are not usually completely known, much has to be taken on trust from information offered by users. It is therefore theoretically possible for users to deliberately offer false information. It may be that deceiving is an important part of human behaviour with beneficial effects, such as spurring human vigilance, rewarding undogmatic viewpoints, keeping some element of sport in interaction, or helping the creative construction of social reality. Usually there is some degree of double standards wherein deceitful behaviour is considered disreputable yet understandable. Deceiving may be a way of coping with de-individuation (loss of identity leading to an antisocial, uninhibited, deregulated state), or it may be a response to de-individuation which reinforces the sense of isolation even further. Certainly, a user who has little to do (low role status), and is anonymous, may experience de-individuation and may be more than usually tempted to behave deceitfully. If this is so, then relentless monitoring and checking of users' statements may be less appropriate than reducing the anonymity of participation.
H4. Inability to change or widen topic or position leads to cognitive inertia. Cognitive inertia occurs when discussion stays on the same track without any change because members avoid contributing unrelated comments. The opposite problem is that uncontrolled change or widening leads to superficial discussion and dysfunctional cycling. One device might be that the user who abandons a topic pays some sort of price (for example, if it is an argument, then the topic is represented by a main claim, which the abandoner 'loses'). This has the advantage that the more important it is to win a particular, central claim, the more reluctant people will be to leave that topic. But in an asynchronous meeting under those conditions people might not be willing to change topic. Another device would be to be able to defer issues that are currently taking too much time.
H5. Technology devotes a higher proportion of time to definitions. Debates are often about what 'the rules' should be or what the facts are. Technology enables users to change rules and facts more easily, and so a higher proportion of time will go on such matters. Devices might be needed to move debate forward, by generalising until an acceptably vague definition is found, or if compromise is impossible, by deferring the definition until later.
H6. Technology devotes a higher proportion of time to setting priorities. Argumentation is considered strong according to factors such as its appropriateness, its degree of balance between constructiveness and destructiveness, relevance, simplicity, emphasis on ends rather than means, upbeatness, consistency, and match with the audience's expectations (with regard to change and rationality). The weighting of these criteria depends on the social group and the occasion. Circumstances may change very quickly. Technology makes these processes explicit and this need for the user to be explicit may intrude into concentrating on the task.
H7. Technology which ignores emotion in argumentation causes unnatural dialogue. It is the emotional appeal of argumentation that is the most difficult to program, so that these influences would need to be expressed by other means such as allowing users to judge argument strength themselves. Ignoring the problem would
lead to very stilted argumentation - users would react by using 'logical' or respectable argumentation as a public expression of their private, emotionally influenced positions (such behaviour happens anyway but may be accentuated by insensitive programming).
H8. Technology for dealing with diversionary tricks slows things down. Asynchronous discussion is easier to sabotage in this way (how do you identify the topic of an asynchronous discussion?) in the sense that by the time a person has finished, he may have succeeded in changing the topic of discussion. This can lead to dysfunctional cycling back to the same topic or to omission of vital topics. In synchronous discussion, participants are more likely to notice diversionary tricks, but then there is the problem of attention blocking (new comments not being generated because members constantly have to listen to others), requiring rules which control things such as maximum input time and turn taking. Diversionary behaviour in synchronous argumentation means that time-consuming structuring devices are unfortunately needed (at least, when challenges take place), such as requiring users to say what a comment relates to (a relevance condition), or what a claim is justified by (an evidence condition), or what a comment leads to (an outcome condition).
H9. Anonymous argumentation is more helpful in later than in earlier stages of group decision making. Unlike in conventional debates, electronic argumentation enables users to be anonymous. Anonymity of action and status can cause resentment and reduce group cohesion (Tatar et al., 1991). Some group members (dominant personalities, and higher ranking organisation members) have less to gain from anonymity than others. Tuckman (1965) suggested that decision making groups evolve through four stages: (i) development of group norms, (ii) conflict, (iii) development of group cohesion, and (iv) functional performing, where interaction focusses on task and goal accomplishment. There may be some stages (i), (iii) and (iv) in group decision making where being identifiable is important for creating norms, conflict resolution and the forming of group cohesion (Kraemer and Pinsonneault, 1990). Anonymous interaction may be most appropriate in stage (ii) for removing inhibitions about expressing conflict.
H10. Meta-comments clarify argumentation at debate time and during later review. Participants often feel the need to explain why they are claiming something, or why they have challenged someone else's claim. An example of the use of meta-level communication tools is Trigg et al. (1986).
H11. The wider the communication bandwidth the easier it is to establish common ground. Establishing common ground enables participants to decide what it is they disagree about. Although high bandwidth does not automatically guarantee effective communication of ideas (Heath and Luff, 1992), purely text-based message passing media have been found to be inferior to media with co-presence, visibility and audibility (Easterbrook et al., 1993).
H12. Technology which requires logical rather than rhetorical argumentation will result in unnatural dialogue. Many individual tricks (e.g. straw man, criticising an extension of the opponent's argument) depend upon suggestion, similarity and analogy, or (e.g. bluffing) upon a sketchy representation of meaning, or (e.g. making the opponent's argument seem extreme) upon cultural norms. Their subtlety might mean that either the technology accommodated them but in a time consuming, rigid (Greenberg, 1991) or cumbersome manner (so that the user avoided them), or that they are avoided altogether by the technology. The danger is that if such behaviours
are not enabled by technology, then people may shun them and opt for the behaviours which technology does make possible - the more explicit, highly structured behaviours. One commercial system (CMSI, 1992) claims to 'expose rhetoric and "hand waving"'. Such exposure may force users to concentrate on explicit knowledge and 'acceptable' communication methods.
H13. Technology which enables anonymous argumentation in small groups minimises the effect of users' evaluation apprehension. If evaluation apprehension (the fear of negative evaluation causing members to withhold ideas and comments) is perceivable and can be identified with a named individual, then that individual's arguments will be less persuasive. Protagonists lose face with themselves and their audience when put on the defensive. There is a cultural norm that defensiveness reduces plausibility. So the effect of being put on the defensive or lacking confidence can be reduced by anonymity, and by small size of audience.
H14. Bluffing is related to 'sketchiness' of argumentation. A speaker makes a 'sketchy plan' (Scholtens, 1991) of his case, missing bits out, and leaves it up to his audience to challenge him on whether he has anything to fill the missing bits with. Most argumentation (and conversation) proceeds on the basis of only partially completed scripts, with listeners drawing inferences from cues and experience as to what has been missed out. In argumentation, gaps are credited to the author unless he is challenged.
H15. Ability to change speed is related to 'sketchiness' of argumentation. Technology which enabled the use of sketchy scripts or plans would provide the ability for argumentation to speed up or slow down as the user considered appropriate.
H16. Where communication bandwidth provided by technology is narrow and communicative power is low there is a risk of a downward spiral to greater conflict. If the receiver is not provided with sufficiently rich cues she may make a misjudgement of the degree of conflict present. The issue is complicated by the fact that technology often increases group polarisation (Easterbrook et al., 1993). Argumentation has several layers, ranging from the upper 'polite' and cooperative layers where the intention is to see the other's point of view or to establish a truth rather than to win (the argumentational habitat of philosophy and science), to the lowest conflictual layer where the intention is to win at any cost (as enemies dragged to the peace table). Argumentation has many goals - winning or influencing beliefs, discovering priorities, illuminating issues, identifying positions, engineering alliances, ventilating conflicts, and meeting deadlines, and these vary according to the level of conflict or cooperation present. It is possible to surprise an opponent by moving down a level (using a stratagem outlawed at a higher level) although this involves sacrificing a degree of trust and becomes a precedent for the opponent to revise judgements of appropriate action.
H17. Outlawing fallacies may leave contestants with no justification for their arguments. A universal claim based upon a small amount of evidence or argument ad hominem (attacking a person rather than a policy) are examples of fallacious argumentation. These can probably be identified by technology. But should they? Often very little evidence exists for making a decision, so that flimsy reasoning is all that one has to go on. Standards (of evidence, or of logic) should not be inappropriately stringent. 'Logical' reasoning is only one of the dimensions of high quality discussion - over-emphasis on this in the early stages of decision making may inhibit the rapid
generation of alternatives. Also standards might be in danger of becoming culturally or organisationally biased - for example, outlawing ad hominem arguments is biased in favour of those with organisational prestige and thus with most to lose by personal attacks.
CONCLUSION
This discussion has been of a very limited interaction between human behaviour and technological support - systems to support argumentation. The complexity of problems in computer supported cooperative work (CSCW) suggests a narrow focus is sensible - indeed, there is some discussion currently of the merits of narrowly focussed CSCW research (Spurr et al., 1994). Many of the behaviours discussed above suggest that in order for an argumentation support system to function adequately there would either be a need for a highly complex and structured interface, or alternatively, that less emphasis should be placed on programming for such a rich variety of behaviours and considerable weight should be given to continuous self-design by users. There are dangers in both approaches, but an advantage of narrow system definition is that these dangers are more easily spotted.
REFERENCES
Buckingham Shum, Simon, and Nick Hammond, 1994. Argumentation-based design rationale: what use at what cost? International Journal of Human-Computer Studies 40: 603-652.
Conklin, Jeff, and Michael L. Begeman, 1988. gIBIS: a hypertext tool for exploratory policy discussion. Transactions on Office Information Systems 6 (4): 303-331.
Conklin, Jeff, and K. C. Burgess Yakemovic, 1991. A process-oriented approach to design rationale. Human-Computer Interaction 6 (3 & 4): 357-391.
CMSI, 1992. CM/1 Product description. Corporate Memory Systems Inc., 8920 Business Park Drive, Austin, TX 78759, USA.
Easterbrook, Steve, 1993. CSCW: cooperation or conflict. Berlin: Springer-Verlag.
Easterbrook, Steve M., Eevi E. Beck, James S. Goodlet, Lydia Plowman, Mike Sharples, and Charles C. Wood, 1993. A survey of empirical studies of conflict. In: Steve Easterbrook, CSCW: cooperation or conflict, 1-68. Berlin: Springer-Verlag.
Greenberg, Saul, 1991. Computer-supported cooperative work and groupware: an introduction to the special issues. International Journal of Man-Machine Studies 34: 133-141.
Grudin, Jonathan, 1988. Why CSCW applications fail: problems in the design and evaluation of organizational interfaces. In: Lucy Suchman, ed., Proceedings of the Conference on Computer Supported Cooperative Work (CSCW-88), 85-93. New York: ACM.
Heath, Christian, and Paul Luff, 1992. Media space and communicative asymmetries: preliminary observations of video-mediated interaction. Human-Computer Interaction 7: 315-346.
Kraemer, Kenneth L., and Alain Pinsonneault, 1990. Technology and groups: assessment of the empirical research. In: J. Galegher, R. E. Kraut and C. Egido, eds., Intellectual teamwork, 375-405. Hillsdale, N.J.: Lawrence Erlbaum.
Lee, Jintae, 1990. SIBYL: a tool for managing group decision rationale. Proceedings of the Conference on Computer Supported Cooperative Work (CSCW 90). New York: ACM.
Markus, M. Lynne, 1992. Asynchronous technologies in small face-to-face groups. Information Technology & People 6 (1): 29-48.
Marshall, Catherine C., 1989. Representing the structure of a legal argument. Proceedings of the 2nd International Conference on AI and Law, 121-127. New York.
Nunamaker, Jay F., Alan R. Dennis, Joseph S. Valacich, Douglas R. Vogel, and Joey F. George, 1991. Electronic meeting systems to support group work. Communications of the ACM 34 (7): 40-61.
Ramesh, Balasubramaniam, and Vasant Dhar, 1992. Supporting systems development by capturing deliberations during requirements engineering. IEEE Transactions on Software Engineering 18 (6): 498-510.
Root, Robert W., 1988. Design of a multi-media vehicle for social browsing. In: Lucy Suchman, ed., Proceedings of the Conference on Computer Supported Cooperative Work (CSCW-88), 25-38. New York: ACM.
Scholtens, Anneke, 1991. Planning in ordinary conversation. Journal of Pragmatics 16: 31-58.
Sillince, John A. A., and Bob H. Minors, 1992. Argumentation, self-consistency and multi-dimensional argument strength. Communication and Cognition 25 (4): 325-338.
Sillince, John A. A., 1994. Multi agent conflict resolution: a computational framework for an intelligent argumentation program. Knowledge-Based Systems 7 (2): 75-90.
Smithin, Tim, and Colin Eden, 1986. Computer decision support for senior managers: encouraging exploration. International Journal of Man-Machine Studies 25: 139-152.
Spurr, Kathy, Paul Layzell, Leslie Jennison, and Neil Richards, eds., 1994. Computer support for cooperative work. Chichester: Wiley.
Suchman, Lucy, 1994. Do categories have politics?: the language / action perspective reconsidered. Computer Supported Cooperative Work (CSCW) 2: 177-190.
Tatar, Deborah G., Gregg Foster, and Daniel G. Bobrow, 1991. Design for conversation: lessons from Cognoter. International Journal of Man-Machine Studies 34 (2): 185-210.
Toulmin, Stephen E., 1958. The uses of argument. Cambridge: Cambridge University Press.
Trigg, Randall H., Lucy Suchman, and Frank Halasz, 1986. Supporting collaboration in NoteCards. In: D. Peterson, ed., Proceedings of the Conference on Computer Supported Cooperative Work (CSCW-86), Austin, TX, 1-10. New York: ACM.
Tuckman, Benjamin W., 1965. Development sequence in small groups. Psychological Bulletin 64: 384-399.
Chapter 24
SHARED UNDERSTANDING OF FACIAL APPEARANCE - WHO ARE THE EXPERTS?
Tony Roberts
Department of Psychology, University of Southampton, UK
[email protected]
"A few thousand years ago people of the Fertile Crescent invented the technology of capturing words on flat surfaces using abstract symbols: literacy. The technology of literacy when first invented, and for thousands of years afterwards, was expensive, tightly controlled, precious. Today it effortlessly, unobtrusively, surrounds us. Look around now: how many objects and surfaces do you see with words on them? Computers in the workplace can be as effortless, and ubiquitous, as that." (Weiser, 1993).
"... with the development of decision support systems, and in particular the appearance of 'expert systems' concern has been growing about the potential for catastrophic errors created by these systems and, worse, the potential for catastrophes whose causes cannot be established." (Fox 1990).
The inevitability of developments so keenly anticipated in the first quote is taken for granted by almost all of us. Nor are computers restricted to the workplace, as their presence extends into nearly every other aspect of our lives. The second quote reminds us that, in some situations, our reliance on automation can have its costs; not only in aircraft crashes and exploding nuclear reactors, but also in the hours of down-time thumb-twiddling that add up to millions in lost productivity. In this chapter, I wish to explore a slightly different aspect of what is commonly termed the 'impact' of computers on our lives, that is, the way in which implications of expertise can instil some degree of unjustified faith, and the consequences of this in terms of performance in certain situations. There are strong social pressures upon us to believe that experts, by definition, know better, or at least more about a particular domain than we do. Most of the time this is true, making the term 'expert system' a compelling one to use.
The knowledge base that goes to make up an expert system results from our ability to be explicit about the elements in a certain domain, and make clear the contingencies and relationships between them. In this way, a working knowledge of a given domain, e.g., chest pain, can be built in to a computer system which can support the decisions made by a physician dealing with a patient. From our point of view, this doubly reassuring
combination of a white coat and a hi-tech looking computer is, in most cases, likely to take some of the worry out of putting 'our life in their hands'. Leaving the notion of expertise aside for a moment, there is unquestioned merit in the fact that computer systems enable us to store and access massive amounts of information. We can search databases containing text, pictures, audio and video, either by browsing or by specifying criteria by which the search space may be narrowed. For example, we may have in mind a painting which contains a vase and some large yellow flowers, and wish to search for it in a database. By explicitly specifying the content of the picture in this way, we might quickly arrive at the identity of our target as "Sunflowers" by Van Gogh, together with other paintings that are similar in content. The crux of this process is in making explicit the similarities between our target and our mental representation of it, so that the search space may be reduced to a more manageable size, making identification more likely. A database of human faces would primarily be used in this latter way, i.e., to allow people to communicate something to others about the facial appearance of a person they have seen. In the case of a witness to a crime, it is often considered appropriate to allow the witness to search a database of known criminals' faces. If such a search fails, for whatever reason, other techniques may be used to construct a likeness of the witnessed face that others may use to identify the person in question. Either way, Face Recall Systems (FRS's) work at the level of individual features of the face, often employing sophisticated computerised graphical tools for blending these component features into a coherent whole face. 
Laugherty & Fowler (1980) showed that in some circumstances the results of such a procedure are little better than chance, and that interaction with expert sketch artists can be significantly more effective. There are a multitude of reasons why this should be the case. Not least is the finding of Sergent (1984) that we perceive the features of the face interdependently rather than independently, which implies that it is not easy to deal with 'similarity' of faces and their features in the manner imposed by some FRS's. A further consideration is the tacit constraints introduced by the use of computers in tasks involving perception of faces. What I wish to explore is the possibility that the artificial nature of this process is compounded by the assumptions of expertise associated with the use of computers as the primary medium for storing, manipulating and displaying facial images. In short, the use of FRS's does not automatically reflect the astonishing ease with which we can recall and recognise thousands of faces in more natural settings. The role of the context is a crucial factor in everyday situations (Memon & Bruce, 1983), yet it has been neglected in the use of FRS's. What is consciously controlled and what is automated in our processing of the information present in images of the human face have crucial implications for our ability to communicate about them. Moreover, both are subject to interference from explicit and implicit aspects of the way the task is set. Here I describe a simple experiment that explores this issue. The experiment examines the relative importance of different facial features in a task where two individuals must arrive at the identity of a target face in a verbal 'question and answer' type setting in which the supposed role of a computer is varied. Before doing so, we need to consider briefly which aspects of performance might vary as a function of this 'interference'.
Current models of the face recognition process, e.g., Bruce & Young (1986), present a logically-structured account of the information processing involved in
recognising a face. While such models serve as useful accounts of what is common to us all, there may be important individual differences and important variations resulting from other influences, and I wish to suggest that these differences reside in the relative importance of different facial features. Haig (1986) acknowledged that different faces may have different salient features. Moreover, Ellis, Shepherd & Davies (1979) showed that individual differences in familiarity with the faces can affect the salience of certain features. Taken together, these findings suggest that features vary in salience in more ways than are constrained simply by anatomy. In effect, there may be other enduring individual differences in the relative importance of different facial features.
If we accept the possibility of some kind of enduring individual differences, it is perhaps worth considering why this might be a problem. The consensus that grass is green does not preclude the possibility that seeing green for one person may be a very different sensory experience from seeing green for another. Pragmatically this is of little concern, since we can all identify green, e.g., at traffic lights, just as we can all identify the faces of our friends. The real problem arises when we rely on judgements of similarity of faces, as we do with Identikit and Photofit systems. We frequently encounter situations where one will say "...isn't he just the spitting image of Bill Clinton?", and another will be unable to grasp the likeness. This has its implications for the principles followed by FRS's, since a consensus of similarity is often all we have to go on. Some anecdotal clues to the issue of what kind of enduring individual differences there might be can also be found in such conversations. The reasons for similarities, or differences, are often given featurewise: "... no - Bill has a bigger nose ..." etc., while for the first observer the nose is clearly good enough. Featurewise comments are common, and can indeed be the first words heard by the new-born infant who "...has her father's eyes". Perhaps then, at least as far as what we can communicate verbally, it is the relative importance of different facial features that varies between individuals. Some people may attribute similarity between two faces to the eyes more than the nose, others vice versa.
Hence, this experiment addresses the question of what we can communicate about the features responsible for facial appearance by using an array of pictures of unfamiliar faces in a task analogous to the guessing game 'Twenty Questions'. Participants may ask the experimenter questions about a set of faces in order to discover the identity of a target face in an array of many other faces. On the assumption that they are motivated to succeed, it is expected that records of their questions may be analysed to highlight important aspects or dichotomies in the perception of facial appearance, at least those which can be articulated. An important assumption here is that there are some natural categories of faces, e.g., male/female that are in common use at the linguistic level. Once these have been exhausted in the present procedure, around which features will participants be able to articulate other, less frequently verbalised categories? Pilot work has suggested that, in this initially rather crude sounding procedure, there is a direct but subtle relationship between the 'power' of a question and the degree of shared meaning a participant can assume, e.g., I know that you know what a male face is, but I could not be sure that your idea of an honest face is the same as
mine. The more I can assume that you will know what I mean, the more faces I can eliminate with a single question. This is the way to play the game. The second aim is to explore the possibility that the relative salience of different facial features is as unstable in experimental situations (and thus in forensic ones) as it appears to be in everyday situations. This suggestion has arisen out of a number of conversations with Police artists, from which a disquieting theme emerged about the use of computer technology in face recall tasks. The following quote sums up the issue nicely:
"Sometimes they'll come in here with a clear picture [of the criminal] and when they see all this [computer] gear they go blank ... almost as if we can use it to ... just pull it [the face] out of them by magic. They don't seem to realise that no matter how clever this stuff is, we can only go on what they can tell us." (Police artist, personal communication, June, 1995)
This was not an isolated comment, hence I wish to tentatively explore the implications of having a computer control some aspects of the way we communicate about faces. To this end, the participants' perception of the role of the computer was manipulated between groups by varying the way the computer was described in the instructions given to participants. One group (CONTROL) were told that the target face had been selected at random by the experimenter, the other (EXPERT) that the face had been selected for the experimenter by the computer. This latter group were also told that "the research was part of the development of an expert system for face recognition". All other aspects of the task were held constant. Forty students taken from the University of Southampton took part in the experiment; twenty per group. From each participant a record of each feature-related question was taken for comparison of the two groups. The faces used were all monochrome, head-and-shoulders portraits taken in full-face view. The faces shown below illustrate the kind of views used:
[Figure: sample face images illustrating the views used.]
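The relationship between the 'power' of a question and the number of faces it eliminates can be made concrete with a small sketch. The Python below is an illustration only, not the procedure or software used in the experiment: the Face record, its three attributes, the six-face array and the functions ask and question_power are all hypothetical. It also assumes that questioner and answerer attach exactly the same meaning to every attribute, which is the condition under which a single question eliminates the most faces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    """A hypothetical face record; the attributes are illustrative only."""
    label: str
    sex: str
    glasses: bool
    hair: str


def ask(candidates, target, attribute, value):
    """Put the question "is the target's <attribute> equal to <value>?".

    Returns the faces that are still possible once the answer is known.
    """
    answer = getattr(target, attribute) == value
    return [f for f in candidates if (getattr(f, attribute) == value) == answer]


def question_power(candidates, target, attribute, value):
    """Number of faces a single question eliminates from the current array."""
    return len(candidates) - len(ask(candidates, target, attribute, value))


# Invented array of six faces; face D plays the role of the target.
ARRAY = [
    Face("A", "male", False, "dark"),
    Face("B", "male", True, "dark"),
    Face("C", "female", False, "fair"),
    Face("D", "female", False, "dark"),
    Face("E", "male", False, "fair"),
    Face("F", "female", True, "fair"),
]
TARGET = ARRAY[3]

remaining = ARRAY
for attribute, value in [("sex", "male"), ("glasses", True), ("hair", "dark")]:
    eliminated = question_power(remaining, TARGET, attribute, value)
    remaining = ask(remaining, TARGET, attribute, value)
    print(f"{attribute} == {value!r}? eliminated {eliminated}, {len(remaining)} left")
```

In this invented array the global question about sex removes half of the faces at once, while the later, more local questions remove one face each. A category with less securely shared meaning (an 'honest' face, say) could not be modelled this way, since the answerer's labelling might not match the questioner's.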
An array of twenty faces, selected at random from a pool of seventy, was presented, with each face appearing in a randomly-determined position in the array. Prior to the experiment, the participants had been given a detailed description of the nature of the experiment; its aims and its procedure. They were then seated in an experimental cubicle and shown examples of the faces to be used as stimuli. The experimenter explained that one of the persons depicted in the array had been selected, either by the computer or by the experimenter, as the "target" to be identified by the participants' questions, and that the faces of those people who were eliminated could be covered if so desired. Participants were encouraged to find the target face as quickly as
possible, though no time limit was imposed. The task took each subject around 10 minutes. A record of their questions was kept by the experimenter for subsequent analysis. The great diversity of questions asked reflects the great diversity of ways in which human faces can differ from each other. Participants were able to reduce the set size dramatically with initial global questions about, say, the sex of the target face. This frequently left faces which could be distinguished between on the basis of local trivial features such as skin blemishes or spectacle frames. This was expected because of the open nature of the procedure, i.e., the fact that participants were not guided in any way to ask questions of one kind or another. Because of this diversity, and because of the fact that some of the questions were of little interest for the purposes of this study (despite their effectiveness from the participants' point of view) what is reported here is a greatly simplified account of the questions asked. For each group, the total number of questions asked, and the number of feature-related questions per condition are shown in the table below.
Taking each question as an independent occurrence, an overall Chi Square test was performed on the raw frequency of the five overall most frequently mentioned features in each of the experimental conditions. This was to establish whether any of these five features were mentioned to a significantly greater extent than any others. This showed that there was no significant association of either condition with questions about any particular feature (Chi-square = 5.9, DF. = 8, p>0.1). Nevertheless, it should be noted that in the first condition the distribution of questions relating to the eyes, nose and mouth is broadly consistent with the findings of other researchers, e.g., Roberts & Bruce (1988). The eyes and nose appear to be relatively more useful regions for forming categorical distinctions within a given set of faces. However, in the second 'expert system' condition, where participants were told that the computer had selected the target face for the experimenter, there seems to be no preference whatsoever for one feature or another. Considering the total number of questions asked, a one-way ANOVA showed that significantly more questions were asked by the 'expert' group (F = 22, DF. = 1,38, p<0.01). To consider why this might be so, I resort to qualitative evidence. Participants clearly believed that the computer was doing, or was supposed to be doing, more than simply displaying the initial twenty faces. In reality, it selected twenty out of seventy
faces, at random, then selected one of those faces, again, randomly. Beyond this, the computer merely displayed the faces. What seems to have been the effect of augmenting the supposed role of the computer in the proceedings, even in this trivial way, was that questions of a categorical nature, i.e., those which might eliminate more than a single face from the target set, were used less frequently. This suggests that in such a task the only perceived virtue of visually-derived categories rests with the extent to which a consensus with others can be achieved through verbal means. The introduction of a computer into this delicately-balanced interaction appears to disrupt it. To speculate further, about either the mechanisms underlying this disruption or the extent to which these results might generalise to other situations, without further empirical evidence would be unwise. However, it was clear that many of the participants in the second condition behaved as though they were taking some kind of test in which the use of parsimonious and verbally-mediated categories was displaced by questions that had a high chance of eliminating one face at a time, e.g., "Does he have a spot above his right eye?". In short, the latter participants behaved in a significantly less discerning way because of the implied presence of something 'expert' [1]. The issue appears to be one of who or what the participants were negotiating with in this task. Considering the extent to which we expect people and computers to work together, this deserves further exploration. More research is needed if we are to understand the tacit signals given by the involvement of a third party, namely an 'expert' computer, in tasks involving face perception, or any other judgements which are primarily automatic and/or non-verbal.
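To make the contrast between categorical questions and one-face-at-a-time questions concrete, the following minimal sketch counts the worst-case number of yes/no questions needed to isolate one target among twenty faces under two idealised strategies. It is an illustration only and not part of the study: the function names, and the assumption that a categorical question splits the remaining faces roughly in half, are introduced here for the example.

```python
import math


def questions_halving(n_faces: int) -> int:
    """Worst-case question count when every categorical question
    splits the remaining faces into two (near-)equal halves.
    This even split is an idealising assumption, not a study result."""
    count = 0
    remaining = n_faces
    while remaining > 1:
        # Worst case: the target sits in the larger of the two halves.
        remaining = math.ceil(remaining / 2)
        count += 1
    return count


def questions_one_at_a_time(n_faces: int) -> int:
    """Worst-case question count when each question can rule out
    only a single face (e.g. a question about one person's blemish).
    The last remaining face needs no question of its own."""
    return max(n_faces - 1, 0)


if __name__ == "__main__":
    n = 20  # size of the array used in the experiment
    print("categorical (halving):", questions_halving(n))
    print("one face at a time:   ", questions_one_at_a_time(n))
```

Under these assumptions the gap between the two strategies grows quickly with the size of the array, which is one way of seeing why a shift away from categorical questions would be expected to raise the total number of questions asked.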
Forcing participants to proceed by verbally encoding what is essentially visual appears to introduce a diversity of complications that are worthy of note [2], not only to those who wish to improve elicitation techniques in forensic situations, but to others involved in the automation of tasks that previously have relied upon shared meaning. Perhaps these participants felt they had less recourse to categorical terms for discriminating faces, as these are shared solely between people. The further insinuation of an active role for computer technology in such situations appears to introduce a greater abstraction from the natural pragmatic aspects of the task, and thus a greater deviation from the relative salience of different facial features for discriminating between faces, and ultimately, for identifying them.
REFERENCES
Bruce, Vicky, and Andrew Young, 1986. Understanding face recognition. British Journal of Psychology 77: 305-327.
Fox, J., 1990. Automating assistance for safety critical decisions. Phil. Trans. R. Soc. Lond. B 327: 555-567.
Gorayska, Barbara, and Kevin Cox, 1992. Expert Systems as extensions of the human mind: A user oriented, holistic approach to the design of multiple reasoning system environments and interfaces. AI & Society 6: 245-262.
Gorayska, Barbara, and Kevin Cox, 1994. Hidden meanings in computer applications. Presented at the 13th World Congress IFIP'94, Hamburg, 28 August - 9 September, 1994. Not in print.
Gorayska, Barbara, and Jonathon Marsh, 1994. On mutual knowledge, meta puzzles, and wise men. AISB [Artificial Intelligence and Simulated Behaviour] Quarterly 88: 5-7.
Haig, N. D., 1984. The effect of feature displacement on face recognition. Perception 13: 505-512.
Laughery, K. R., and R. H. Fowler, 1980. Sketch artist and Identikit procedures for recalling faces. Journal of Applied Psychology 65: 307-316.
Memon, A., and Vicky Bruce, 1983. The effects of encoding strategy and context change on face recognition. Human Learning 2: 313-326.
Roberts, Anthony D., and Vicky Bruce, 1988. Feature saliency in judging the sex and familiarity of faces. Perception 17 (4): 475-481.
Sergent, J., 1984. An investigation into component and configural processes underlying face recognition. British Journal of Psychology 75: 221-242.
Weiser, M., 1991. The Computer for the Twenty-First Century. Scientific American, September 1991: 94-104.
[1] The pragmatic impact of labels such as 'expert system', as well as other instances of messages hidden in computer applications, on the end user performance was anticipated, but not explored empirically, by Gorayska and Cox 1992 & 1994.
[2] For a discussion of the usefulness of 'mutual knowledge' in communication, in particular of the role of verbal exchanges of information already available in cognitive environments from sources other than language, see, for example, Gorayska and Marsh 1994.
Chapter 25
INTERACTIVE COGNITION: EXPLORING THE POTENTIAL OF ELECTRONIC QUOTE/COMMENTING
Stevan Harnad
Department of Psychology
University of Southampton, UK
harnad@ecs,
LANGUAGE, TECHNOLOGY, AND INTERACTIVE COMMUNICATION
Human cognition is not an island unto itself. As a species, we are not Leibnizian Monads independently engaging in clear, Cartesian thinking. Our minds interact. That's surely why our species has language. And this interactivity probably constrains both what and how we think. Although Wittgenstein's (1953) argument that there could be no "private language"--because language is based on rule-following and rules are shared social conventions--is probably overstated and refutable, for present purposes it is valid enough: language is the main medium of interaction of our species and it is fundamentally interactive, dialogical. It did not evolve to leave us lost in solipsistic thought. In terms of the time we spend doing it, conversing probably exceeds all other forms of human interaction, including feeding, fighting, playing, mating, and the "grooming" that some have argued it has evolved to replace (Dunbar 1993). The origins of language have been the subject of much speculation (Harnad et al. 1976), but perhaps a few things can be said about it with some confidence. Language began hundreds of thousands of years ago, and whether it started as gesture and then moved to speech (Steklis and Harnad 1976), or went straight into speech, the kind of adaptation it was, was undeniably an interactive one: speech accordingly has a characteristic real-time dyadic tempo. There is certainly some variation in the rates at which people speak, the optimal speaking rates they understand, their attention spans, their memories, etc., but, to a close enough approximation, the timing parameters of a contemporary TV chat show are probably representative of our species since very near the advent of language. There are consequences of this: speak too fast or too slow, and I won't be able to understand you.
A subtler consequence (having to do with the memory and attention-span factor) is that if you speak for too long, I'll have trouble understanding too, and
I'll only be able to respond to what you said near the beginning of your speech, or near the end, or to some selected portions that caught my attention in the middle. Chances are that the adaptive value of language in the original environment in which it evolved [1] derived from relatively rapid exchanges of relatively short strings of information, again more like a conversation or a chat show amongst a few interlocutors than a long lecture by one orator to a throng--that came too, but it came later. Although the adaptive scenarios people have proposed are without exception mere speculations, they are all variants on the idea that the utility of language must have been connected with its use in hunting, tribal defense, tool-making, and/or training others (especially the young) in these or other essential hominid survival skills. With the exception of pedagogy (which was probably a later development), it's hard to imagine these uses of language as consisting of long monologues: relatively short interactive comment and response were probably the order of the day, performed at about the speeds we perform them today. But even if primal conversations were one-sided rather than interactive, a rate-limiting factor was how fast we could speak and understand, and how big a chunk we could remember long enough for it to have any useful effect. So it is likely that because of real-time constraints on articulatory rate, the speed of thought co-evolved with the speed of speech; their rates converged on roughly the same order of magnitude (though one hopes we thought a bit faster than we spoke) and were in phase.
And that's still the way things stand now, biologically speaking, for, after the advent of language, the rest of the developments in the linguistic arena were technological (and cultural) rather than biological--feats of "cognitive technology", if you like: We invented the new medium of writing, so words, and the thoughts they conveyed, could then be transmitted beyond the reach of any individual human's voice, ears or memory. (The oral tradition had done this in part, but imperfectly, and only through the mediation of a vocal internuncio.) Then we invented the medium of printing, so words and thoughts could be transmitted beyond the reach of any individual human's pen or paper. Writing and print were not only ingenious ways of preserving and distributing thoughts, but they also freed us from many of the immediate constraints of memory and attention span, because written words could be read and re-read, allowing messages to be longer and more complex than anything one could hope to convey orally. They could also be written and rewritten, allowing messages to be more careful and disciplined than anything the rapid pace of spontaneous conversation could ever generate. But this lapidary property of the written word was purchased at a price: the interactivity of speech was gone, or at least it was slowed down to a pace that hardly seemed worthy of the word "interactive" at all---considering the speed, commensurate with speech, of which human thought had already proved itself capable in the oral era. So literality did, in a sense, make us more like monads conducting monologues. 
[1] In modern evolutionary biology this has come to be called the "Environment of Evolutionary Adaptedness" or EEA, to distinguish it from the present-day environment, in which an early adaptation may no longer be playing its original functional role, or could even have become maladaptive.
To be sure, we were writing letters to one another, and replying, sometimes on the same day, but it was rather like what had formerly been a jig, danced together, turned instead into a sarabande, danced in lugubrious alternation, or, to pick a more cognitive example, a
long-distance chess game in which the players make only one or two moves a day, and spend the rest of the time waiting to learn their opponent's response: there was something profoundly out-of-phase about it, or rather about the thoughts behind it, which in real-time dialogue would have interdigitated instead of proceeding in fits and starts. The chess analogy is instructive, because, unlike a conversation, a chess game often involves long periods of motionless thought, and being rushed makes one play less well. Yet the game is interactive, and if it were to be played with limitless time between moves, it would no longer be the same game (and would perhaps no longer draw on the same cognitive capacities). Slow motion tennis would be even more obviously a different game. These analogies are imperfect, but the point I do want to make is that in written dialogue as well as in slow-motion chess, apart from the extra time one is happy to take in order to reflect more, there is a great deal of dead time too, in which one's thoughts are idling, waiting for the other shoe to drop. Nor is the limitless reflection time an unmixed blessing in itself: necessity is the mother of invention. What would become of the spontaneous wit of a brilliant salon conversationalist if each item of repartee could be put on limitless hold for pondering before transmission? Would dancing ability (in an age when dancing was still interactive) or tennis prowess be the same if each move and each shot could be preceded by hours of deliberation? Again, chess, being cognitive, is the most instructive case: in principle, given infinite time, every possible move could be tried in advance, and hence the optimal one could be picked. But trying every possibility is not usually the way cognition goes, and certainly not what we regard as "creative" cognition. 
The cognitive "moves" we regard as brilliant are not the ones that are a result of mechanically going through all the possibilities, but the ones that somehow find a pattern latent in all the dreary combinatorics, a pattern that swiftly and directly generates a solution to a problem that looked hard until the pattern was discovered. The human mind occasionally discovers such patterns. It's impossible to quantify this--to say how occasionally this happens, or how improbable and consequential the patterns really are. Sometimes we discover things through long, noninteractive reflection, to be sure. But, given the evolutionary history and temporal parameters of language and thought, it is probably safe to say that it is in real-time cognitive interactions between minds that the resourcefulness of human cognition is most firmly engaged. This paper is by no means intended as a polemic for a return to an oral culture, however! The power gained from the discipline of slowing thought down to the pace of writing, and preserving it verbatim, answerable for its validity not merely to the persuasive force of one orator on one occasion, but the endless scrutiny of peers and posterity, was probably almost as revolutionary a technological and cultural advance for human thought as the advent of language itself had been. But let us not forget that in exchange for those virtues of the lapidary medium (writing) we sacrificed some of the virtues of the labile one (speaking), particularly the possibility of minds interacting at the speed of thought. Could one have the best of both worlds?
Not in the medium of speech, which vanishes as it is uttered. Recording it is no help, because what one really needs is playback and editing capacity, for both one's own utterances and one's interlocutor's. And by whatever cognitive technological means one might secure this--whether the playback/editing is in the phonological medium or the graphemic one--the need to do two things rather than one (i.e., not just to listen and speak to one's interlocutor, but to monitor and modify the record of what has been said by both parties) rules out what might have seemed to be the ideal solution, namely, real-time interaction by writing. It's not just the slow speed of writing that is the problem; even if a speech recognizer could generate error-free graphemes as fast as we could talk, and even if we could read these as fast as we can hear, this would still leave us back where we were with spontaneous conversation: we would merely have an instant transcript, but no more opportunity for reflection. It is in part for these reasons that--except for quick, urgent messages--most people find the real-time Unix "talk" facility so unsatisfying. It's not just the frustration of watching someone else's slow typing; one also feels that if this was just a chat, we could just as well have talked by phone, and if careful reading and serious reflection were called for, off-line email would have been better. So is email just a somewhat accelerated form of ordinary mail (which, as I said, even in the past sometimes had same-day turnaround)? And are we irretrievably severed by the written medium from the interactivity of the real-time dialogue for which our minds are biologically adapted? I think not. Although fast paperless mail was what email may originally have been intended for, it has turned out to have some unexpected consequences, opening up some revolutionary possibilities. First, let us not under-rate the speed factor.
In principle, for a message of just about any length, it can reach my interlocutor the instant I complete it. Second, it can at the same instant be branched to multiple interlocutors (in principle, to everyone). These two factors of speed and scale are without precedent, but they are still noninteractive ones, insofar as the speed of thought is concerned. But that is not all. Instantaneous and flexible text-capturing, quoting and commenting capabilities allow a form of highly focused and selective off-line interaction with the text that does engage the real-time speed of thought, and engages it interactively, yet in the lapidary medium, and with precisely the playback/editing facilities that were missing in real-time dialogue. Recall the memory and attention-span constraint on the length of a particular utterance in oral dialogue: run on too long, and your interlocutor will lose continuity, forget, and misinterpret. One is tempted to say (to a long-winded interlocutor): why don't you just write me a letter? Long-winded conversations, if they do not turn into one-sided monologues (which are a fortiori noninteractive), are more likely to be divergent duo-monologues, each interlocutor in turn launching off from some point in the primacy/recency memory curve for the foregoing peroration they have just endured impatiently waiting their turn, rather than convergent dialogues, which require each interlocutor to be relatively brief and to the point--so it can be ensured that it is the same point they are both addressing. Well, in the quote/comment capability that email has made possible, a long-winded passage can be given full attention (if it deserves it), it is preserved verbatim, free of memory constraints, to be re-read as often as one wishes, and, most important of all, it
can be selectively edited down to the specific points one decides to address in replying, and the reply can then be focused on those passages, quoting them so as to provide the full requisite context. One's own reply, too, has the benefit of the playback and editing capability, and can be written and rewritten till one feels one has gotten it right. Moreover (and it is for this reason that I have dubbed this form of interaction "skywriting"; Harnad 1990), in engaging in this form of real-time cognitive interactivity with electronic text, one can also keep in mind that it is not only one's interlocutor who will see one's quotations and comments, but all the others to whom the message was branched. This is like the benefit of a trial by jury without the real-time pressure and stage fright of oral testimony; or like a public debate conducted in writing; or like a symposium and discussion likewise conducted in writing; but writing in a new key: at electronic speed and scale, and with the powerful playback/editing capability just described. I realise that these resources are quite familiar to all of you, but sometimes a thing has to be "made strange," as Schopenhauer termed it, and looked at as if for the first time, if one is to see its true properties and potential, particularly if it is something relatively new that has become a familiar commonplace too quickly, as email has done. The text-capturing quote/comment tools and conventions that have rapidly developed in the past decade were not the work of cognitive technologists, experimenting with and optimising emerging interactive resources. They were simply co-invented out of expediency by emailers. I'm not sure where the ">" convention for setting off quoted passages started, but it quickly becomes unworkable with multiple levels of quoted text, even when supplemented by preceding the ">" with each interlocutor's initials. It was probably born on Usenet and carried over to Unix mailers, or vice versa.
It is certainly not a successful piece of cognitive technology, yet it is an absolutely remarkable capability that deserves to be closely analysed and developed, because it is the means by which the best of both worlds--labile speech and lapidary writing--can be realised. But before closing with a description of the kinds of studies that we will be doing on text-capturing at Southampton University in the next few years, I would like to motivate them with two anecdotes from my own experience that I suspect will resonate with experiences many of you have had too. The first anecdote concerns my own first exposure to "skywriting". In 1980, there had appeared in Behavioral and Brain Sciences (BBS), the journal I edit, an extremely controversial critique of Artificial Intelligence (AI) by the philosopher John Searle called the "Chinese Room Argument." Most people, including me, thought the Argument was wrong; after umpiring several years of critical commentary on it in BBS, I was writing my own critique of it in the mid '80s when it was drawn to my attention that on ""--the Usenet chat group devoted to discussions of AI--a discussion of Searle's Argument had been going on for several years. I tuned in to see what had been said, and perhaps add my own critical voice to the throng, but quickly discovered two things: first, the critics were mostly not cognitive scientists but computer programmers and students. Second, all of their counterarguments to Searle were wrong. Now I myself considered Searle's Argument to be wrong at that time, but before posting my own critique I wanted to dispel the clouds of invalid arguments that were in the air, so I took them on, one by one (though often they were just variants of the same wrong reasoning or assumptions or conclusions), and I consciously did so as if my contributions were formal commentaries in a learned journal (even if the postings
I was commenting on had not been that scrupulous), except that I adopted the Usenet quote/comment convention. The results were quite remarkable. The archive of my own discussion quickly reached book length. I spent countless hours on the Net every day, taking on all comers, patiently replying to different variants of the same bad arguments a different way each time, so it would not bore but inform the silent majority I assumed were following all this. (I still have no idea how many there were; though Usenet's Arbitron statistics estimate the total readership of each group, one does not know how many of them--and who--are following a particular discussion "thread".) And though it no doubt had obsessive-compulsive features, I don't regret the time I spent at it at all. Necessity is indeed the Mother of Invention, and in the course of that Skywriting Tournament, defending Searle against invalid criticism, I came up with some positive ideas of my own, including what the real problem underlying Searle's critique of AI was (the "Symbol Grounding Problem"--its name was first coined in the threads of the Searle discussion, and it has since become one of my more important papers; Harnad 1990a) as well as a potential solution (Harnad 1992). It is unlikely that I would have come up with those ideas otherwise. The essential features were the real-time interactivity, the quote/comment capability, and the long series of determined interlocutors. It has since occurred to me that the exercise might have been even more fruitful if my adversaries had not just been students and programmers, but the best thinkers in the field--and that impelled me to start a refereed electronic journal and to become a polemicist for electronic serial publication, but that is another story. What's relevant here is that, even with that less than optimal demography, the interaction proved to be such a powerful idea-generator for me.
Nor was there any doubt in my mind that the quote/comment feature was at the heart of it. Which brings me to my second anecdote, mercifully shorter than the first. It was around that time that I noticed that I--a compulsive reprint/preprint collector since the 70's--completely lost my taste for paper texts. If someone sent me a paper reprint, I would email them to ask if they didn't perchance have an email version. Why? Not because I find the current generation of VDU's any more appetising to look at than you do, but because of the quote/comment capability. I had become addicted to it as a way of interacting with text, irrespective of whether it came from a "live" posting or a "dead" text: either way it was alive for me. The technique was the same. Read it on-screen, save a back-up full copy, then start selectively deleting the irrelevant or uncontested passages, leaving only the skeleton of what I wanted to address in my "reply". But who was this reply for? Well, in some cases I could think of it as being for the author, but usually that was not enough for inspiration. So I set up some discussion groups involving multiple minds, all interested in the topic under discussion. (A population of Skyreaders is essential to the inspirational power of Skywriting.) And as with the symbol grounding discussion, often, not always, but often enough, the interaction would generate the germ of my next published article. I don't want to overstate the case, since it is based on anecdotal experience--though I'll bet that many of my readers have had similar experiences, of ideas being generated through the electronic interaction with text--so let me close by describing a series of studies that will try to examine systematically the effects of the unique form of cognitive interaction that electronic skywriting--and the quote/comment feature in particular--has made possible.
EXPERIMENTAL STUDIES OF TEXT CAPTURING
Electronic networks have made two new forms of interactivity possible: one is between people and people (electronic mail, individual and multiple, Harnad 1991; Lea 1992; Sproull & Kiesler 1991) and the other is between people and texts (electronic texts and hypertexts, Ginsparg 1994; Romiszowski 1990; McKnight et al. 1991, 1993). These new interactive resources are already becoming widespread, but some of the unprecedented capabilities they have spawned (Harnad 1990, 1992; Berge & Collins 1994) now need to be systematically analysed and optimised from a cognitive technological standpoint (Hall 1994; Hutchings et al. 1993; Benbasat & Todd 1993; Diederich et al. 1992) so the findings can be applied to the acquisition, generation, and sharing of information in research, education and commerce. What needs to be analysed experimentally is text-capturing, which I consider to be one of the most powerful of these new interactive capabilities, in order to evaluate and develop, through cognitive technology, its potential contribution to knowledge acquisition and generation. Here are some of the kinds of studies that need to be done; some are underway at the Cognitive Sciences Centre at Southampton University.
Person-to-person interactions
One-to-one Electronic Mail (Email) Interactions.
The power and value of email in the transmission of messages between people is well known (D'Sousa 1992; Golden et al. 1992; Sproull & Kiesler 1991; Carley & Wendt 1991). What seems to have gone unnoticed is the subtle but radical change that has evolved in the form of that communication, as discussed above. In an electronic message, it is possible to capture the text of the message to which one is responding and to re-quote it by responding in the form of point-for-point quotation and commentary. In speech, this is clearly impossible, as one forgets what one has heard too quickly to be able to requote it verbatim and then comment on it. As a consequence, verbal conversations may wander off the topic and both parties may feel that they have not been fully understood or accurately represented. In writing on paper, on the other hand, it takes much too long to retype what one's interlocutor has said, so again communication may wander from the point and discussion can be to varying degrees at cross-purposes. Adding to this divergent tendency in written interaction is the long turn-around time between written messages. Email makes written interactions much more focused and convergent, both by (1) scaling up their speed to something that is potentially almost as fast as an oral conversation, yet without forfeiting the discipline and permanence of writing (Harnad 1990; Bolter 1991; Horowitz & Samuels 1987), and (2) by making it possible, through quotation and requotation, for the interaction to converge ever more intensively on what is actually being said, rather than allowing the interlocutors to diverge in their own respective directions (Sein & Robey 1991).
This novel quote/comment mode of interaction (henceforth Q/C) has not yet been objectively studied or systematically compared with standard modes of communication, oral and written, yet there are indications that it has opened the door to remarkable new possibilities, both in the acquisition of knowledge and in the development of ideas (Harnad 1991; Heseltine 1994; Holden et al. 1993).
The first objective of our research on interactive cognition is accordingly to do a systematic comparison between electronic communication with and without the Q/C capability.
Study 1
Participants will be one-hundred-and-forty undergraduate students, but Study 1 will be conducted outside the context of their normal studies and their major disciplines. Eighty students will be given group-based instruction by a professional instructor on a self-contained piece of nontechnical subject matter. Twenty of them (all group assignment will be random) will then take a brief essay and multiple-choice examination to test for mastery of the subject matter. The other sixty students will each be assigned one of sixty (non-instructed) student "tutees" to whom each must teach the subject matter. Of the sixty tutors, twenty will teach it orally, within a specified period of time (to be chosen on the basis of pilot studies), including time for questions and answers. Then both the tutors and the tutees will take the test. Forty remaining tutors will teach the subject by email. (All messages will be archived for analysis.) In the email group, twenty pairs will have no Q/C capability. Instructions, comments, questions and answers will be exchanged directly, as in ordinary written correspondence. After a specified number of iterations (to be chosen on the basis of pilot studies) both tutors and tutees take the test. The remaining twenty pairs will be given the Q/C capability and will be instructed that all messages must take the form of comments on the fully requoted text of each preceding message. New text (in the form of questions, answers or comments) must be appended after each paragraph or part of a paragraph, after whichever passage the new text addresses. Some parts may be left uncommented, but the text in its entirety must be quoted. In requoting requotations, they will have the option of deleting prior requotations and requoting only the new passages from the immediately preceding iteration (but they will be able to retain several levels of embedded requotations if they wish).
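The Q/C rule just described (requote the preceding message, append each comment directly after the passage it addresses, and optionally drop older quotation levels) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the software used in the study: the function names `quote_depth`, `prune_old_quotes`, `split_paragraphs` and `compose_reply` are hypothetical, a plain ">" prefix per quoted line is assumed, and comments are attached only to whole paragraphs rather than to parts of a paragraph.

```python
from typing import Dict, List


def quote_depth(line: str) -> int:
    """Count leading '>' markers, so '> > text' has depth 2."""
    depth = 0
    rest = line.lstrip()
    while rest.startswith(">"):
        depth += 1
        rest = rest[1:].lstrip()
    return depth


def prune_old_quotes(lines: List[str], keep_depth: int = 0) -> List[str]:
    """Keep only lines quoted at most keep_depth levels deep.
    keep_depth=0 keeps just the new text of the preceding message."""
    return [ln for ln in lines if quote_depth(ln) <= keep_depth]


def split_paragraphs(lines: List[str]) -> List[List[str]]:
    """Group consecutive non-blank lines into paragraphs."""
    paragraphs: List[List[str]] = []
    current: List[str] = []
    for ln in lines:
        if ln.strip():
            current.append(ln)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


def compose_reply(message: str, comments: Dict[int, str],
                  keep_depth: int = 0) -> str:
    """Requote `message` and insert comments[i] after paragraph i
    (paragraphs are numbered after pruning older quotation levels)."""
    kept = prune_old_quotes(message.splitlines(), keep_depth)
    out: List[str] = []
    for i, para in enumerate(split_paragraphs(kept)):
        out.extend("> " + ln for ln in para)  # requote the passage
        if i in comments:
            out.extend(["", comments[i]])     # comment follows its passage
        out.append("")
    return "\n".join(out).rstrip("\n") + "\n"


if __name__ == "__main__":
    first = "First point.\n\nSecond point."
    reply = compose_reply(first, {0: "I agree.", 1: "Why?"})
    print(reply)
    # Next iteration, retaining one level of earlier quotation.
    print(compose_reply(reply, {1: "Good."}, keep_depth=1))
```

Raising `keep_depth` retains more levels of embedded requotation, which makes it easy to see how quickly nested ">" prefixes accumulate over successive iterations.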
The same number of iterations will be allowed as in the non-quoting email group; then both tutors and tutees will take the test. Test performance will be compared for all seven groups. The prediction is superior performance for the quoting tutees over the nonquoting tutees. Higher scores are also predicted for tutors in general over those who take the test after professional instruction only. The oral vs. email comparison will be of interest, although there will be no strict way to equate oral instruction time with number of written iterations. In the data analysis, tutors' scores can be used as a covariate in the analysis of their respective tutees' scores, to reduce the effect of differences in level of tutors' comprehension of the subject matter. It is expected that Q/C will result in superior performance, but it is unlikely that the task of students teaching other students newly acquired subject matter will be the central one for the Q/C capability. A series of further variables with which text-capturing is likely to interact will be tested as well. Each study is designed to be informative in its own right, and not only if the results go in one direction. Study 1 is a comparison of increasing degrees of interactivity in knowledge acquisition, from receiving passive oral instruction, to actively giving oral instruction, to giving and
receiving written instruction, to full Q/C interactivity. Apart from the performance measures, the archived exchanges will be analysed to determine how Q/C capability influences the content of the interaction. Measures of relevance (Sperber & Wilson 1986) will be used to quantify the degree of divergence/convergence under the different experimental conditions. Beginning with the pilot experiments for this study, the techniques used for Q/C and the format in which the quotes and comments appear will be adapted to the task and to the ease and preference of the users (Cavalier 1992; Benbasat & Todd 1993; Hjalmarsson et al. 1989). There is no evidence that the most prevalent convention that has evolved on the Internet (a left-adjusted ">" preceding every line of direct quotation, usually inserted automatically by the mail program, and then modifiable only by invoking a full-text mail editor) was selected because it was the optimal one, or even a good one. It seems to have emerged and spread spontaneously. Yet the optimal design for capturing and commenting on text is an important cognitive technology question. After each study, the participants will be interviewed for their recommendations about what features would have made the quoting/commenting easier and more useful to them. Each successive study, apart from its immediate experimental objective, will incorporate provisional improvements in Q/C display and control, but testing these formally must await the normative results of the basic studies on the cognitive contribution of Q/C in knowledge acquisition, because the variance of the design factors is not likely to be of the same order of magnitude as that of the task and the Q/C variables.
Study 2
A variant of the design of Study 1 will be implemented in the debating context, as opposed to the instructional context, because the Q/C feature is likely to be especially powerful in the development of arguments and counterarguments. In a pilot study, two groups of students will be prepared with arguments and evidence for defending a thesis (12 students) and an antithesis (12 students), respectively. Three pairs will then engage in an oral debate for a specified time period; their performance will then be rated by a panel of 6 experienced judges from the University Debating Society. The other nine pairs will do a specified number of iterations of written debate. Half of them will have Q/C capability in their interactions, half not. There will be three subgroups of 3 pairs each: (1) Nonquoters (N) vs. Nonquoters, (2) Quoters (Q) vs. Quoters, and (3) Qs vs. Ns. As in Study 1, there will be an oral (O) debate control group. The performance of all the written groups will be scored by the same six judges, who will not be informed of the hypotheses being tested, but only that we would like them to rank order all 24 debaters, based on how well they defended their respective theses. Participants will be recruited from the Debating Society of The University of Southampton. The numbers suggested are provisional, as it is not clear whether 24 individuals are too many to be rank-ordered. Pilot studies may show that a better design would be to have only one pair in each category, for a total of 4 pairs and eight individuals to rate. This could then be repeated
for three or more groups, with the same judges. The number of written iterations will also have to be calibrated on the basis of piloting. The prediction is that quoters will receive higher ratings than nonquoters overall, particularly when pitted against nonquoters. It will be of interest (though no prediction is made) to know how written (Q, N) vs. oral (O) debaters are ranked overall. Participants themselves will be asked to indicate, before knowing the verdict of the judges, whether they felt they prevailed or lost. (A future variant within-subject design could use the same debaters, with different topics, performing in all five roles: O vs. O, Q vs. Q, N vs. N, Q vs. N and N vs. Q. Then judges could rate the same individual's performance, and each of the individuals could rate his own performance, under each of the conditions.) The two experiments described so far, on the contribution of Q/C in the instructional and the debating context, begin to create a paradigm for evaluating objectively what this author and others are already informally trying out in the context of actual instruction, between real tutors and tutees (e.g., Hutchings et al. 1993; Brailsford & Davies 1995; Mayes & Neilson 1994; Rogers 1989; Poling 1994; Grigas 1994; Bailey & Cotlar 1994). Untested practical implementations have the disadvantage of not containing any objective basis for comparison, hence no way of validating the impression that something has been improved upon. As mentioned earlier, the experimental context also provides a test-bed for working to improve the cognitive technology of the features used. The experimental studies, and especially the pilot studies to calibrate the experimental parameters, will also be used to try out different methods for Q/C display, in search of the most effective one. Mouse-based methods, with automatic reformatting, indentation, type-face change, and quoter identification will be investigated.
The Media Research Group at Southampton University under Professor Wendy Hall will be contributing its expertise in these and other design aspects of the Project.
Many-to-Many Email ("Skywriting") Interactions. One-on-one interactions are only the first manifestation of the power of email interaction and Q/C. Reciprocal email ("skywriting," Harnad 1990a), in which all contributors receive everything that is sent, represents a truly revolutionary new form of communication (Harnad 1991; Harnad et al. 1976) whose potential still remains completely unexplored from a cognitive technological standpoint, although it has already been informally and practically explored on the Internet in the form of thousands of discussion lists on Listserv and Newsgroups on Usenet. Here too, Q/C is in wide use, but its contribution has not yet been objectively investigated; nor has the optimal format for the Q/C and display been analysed. Study 3 follows lines similar to the two Studies of 1-1 email, except that the discussion will be multiple and hence the degree of interactivity will be greater.
Study 3
In a design similar to that of Study 1, eight modular topics will be taught to a class of 60 students under four conditions:
(1) Two topics will be taught by ordinary oral instruction followed by ad libitum oral discussion in a separate tutorial session (10 students in each tutorial group).
(2) Two topics will be taught by ordinary oral instruction; then each student will be required to contribute a specified number of oral questions, answers, or comments during the tutorial session.
(3) Two topics will be taught only via email, the written instruction being transmitted to an alias list that reaches all the students in the course and is then immediately archived and made available on the World Wide Web in Hypermail (Belew & Rentzepis 1990) format for re-reading at will. Multiple email questions from students, and replies by both instructor and students, will also be transmitted by email to all, and archived on the Web as Hypermail. Students will be required to post a specified number of questions, answers, or comments by email, but without Q/C capability.
(4) Two of the topics will only be taught via email sent to the class alias list and archived as Hypermail. Again, the students will be required to post a specified number of questions, answers, or comments by email, but with the Q/C format mandatory, as in Study 1.
The measure of performance will be quizzes for each module, as well as the students' subjective ratings and comments on the four conditions. The order of the conditions will not be random in this study, but 1-4 successively three times. Here too, format optimisation questions arise, because many different potential quoters must now be uniquely and clearly identified in their respective quotes; scheduling problems also arise, because contributions can get out of phase, and someone might be quoting and responding to an earlier iteration while someone else is already responding to a later one. There are many variables in this new form of interaction that need to be analysed from the cognitive engineering standpoint. One potential solution is to archive all iterations consecutively in a Hypermail archive, so that all participants can readily review the prior iterations of the interaction.
Hypermail as currently implemented is not optimised for any of this, but adaptations will be developed and tested in the context of this systematic analysis, again in collaboration with the Media Research Group.
Person-to-Text Interactions
The hybrid use of Skywriting and Hypermail Archives on the World Wide Web probably represents the most advanced version of multiple person-to-person interaction currently available, but the interactive power of Q/C does not stop with "live" email, 1-1 or multiple, for there is still the vast body of "inert" texts (i.e., papers, articles and books) that can also be read (and written) using this new form of interactivity. These texts consist, potentially, of the entire corpus of written literature; for now, however, we are restricted to that portion of it that is already electronically available (and hence quotable). It is in part for this reason that we will be creating a cognitive science archive and developing and monitoring more powerful ways of interacting with it. A natural next step in the experimental analysis of this form of interactivity is accordingly to test the comprehension of static texts (as opposed to live email messages), with and without Q/C.
Study 4
In a design similar to Study 3, twelve primary texts (journal articles in cognitive psychology of approximately equal length) will be assigned as reading to a class of 60 students under six conditions:
(1) Two texts will be assigned for reading on paper and then allotted ad libitum oral discussion in a separate tutorial session (10 students in each tutorial group).
(2) Two texts will be assigned for reading on paper; then each student will be required to contribute a specified number of oral questions, answers, or comments during the tutorial session.
(3) Two texts will be assigned for reading in electronic form on the Web and then allotted ad libitum oral discussion during the tutorial session.
(4) Two texts will be assigned for reading in electronic form and then each student will be required to contribute a specified number of oral questions, answers, or comments during the tutorial session.
(5) Two texts will be transmitted via email to the class alias list and archived as Hypermail. Multiple email questions from students, and replies by both instructor and students, will also be transmitted by email to all, and archived on the Web as Hypermail. Students will be required to post a specified number of questions, answers, or comments by email, without Q/C capability.
(6) Two texts will be transmitted and archived as above. Students will be required to post a specified number of questions, answers, or comments by email, but with the Q/C format mandatory.
The measure of performance will be quizzes for each module, as well as the students' subjective ratings and comments on the six conditions. The order of the conditions will not be random in this study, but 1-6 successively two times. One last variable to explore will be interactive means of evaluation in place of conventional quizzes or essays. Texts will be made available electronically with instructions to reduce them to their essentials and annotate them.
This form of electronic analysis of the text is so new that the only way to evaluate it initially is to compare it with itself: to see how well the rank ordering of performance in this task compares with the rank ordering using conventional forms of evaluation. If it proves to correlate well with already validated measures of mastery, it might be considered as a form of evaluation in its own right.
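One way the comparison of rank orderings might be quantified is with a rank correlation. The following is a minimal sketch that assumes Spearman's coefficient as the measure (the text does not name a specific statistic); the function names and the student scores are invented purely for illustration.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores from 1 (lowest) upward; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run over all scores tied with the one at `start`.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined when all scores are tied")
    return cov / (var_x * var_y) ** 0.5


# Invented scores for five students: the annotation task vs. a conventional quiz.
annotation_scores = [3, 1, 4, 2, 5]
quiz_scores = [2, 1, 5, 3, 4]
agreement = spearman(annotation_scores, quiz_scores)
```

A coefficient near 1 would indicate that the electronic annotation task orders students much as the validated measure does; a value near 0 would indicate that the two forms of evaluation capture different things.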
Summary. The results of the experiments on the cognitive role of electronic text-capturing should begin to validate, quantify and optimise the very new interactive capabilities that have emerged recently. They will pave the way for applying Q/C in education, research, and industrial knowledge acquisition.
ELECTRONIC ARCHIVES
In 1990, a theoretical high-energy physicist called Paul Ginsparg (1994) contacted 100 colleagues with whom he had been exchanging paper preprints regularly and proposed exchanging them electronically instead. He set up a server in his Lab at Los
Alamos, and all their preprints could be deposited there automatically, and then picked up automatically, by any of them. From this small high-energy theory preprint depository, Ginsparg's archive has grown in four years into one that now contains more than half the current physics literature world-wide. It receives 350 new articles per week and is accessed 45,000 times per day by the world's 25,000 physicists, who no longer rely on paper journals for their information, but on this archive (which is duplicated in many "mirror" copies all over the world to distribute the load and ensure that everyone can always access everything promptly). All of this is available world-wide, 24 hours a day, to everyone on the Internet for free. Dr. Ginsparg's archiving facilities are supported by an NSF grant that covers the cost of his server, software, and system support. The system is completely automated, requiring minimal human intervention. Yet there are safeguards against tampering with other people's texts or frivolous uses of the archive. The archive does not contain only the unrefereed preprints; authors substitute the refereed version (destined for a paper journal) as soon as it is available. No copyright issues have been raised by publishers, nor is it expected that they will be raised. The archive has completely revolutionised the communication and publication of scientific information in the world physics community. The implications are still being sorted out, in negotiations with learned societies and publishers in these disciplines. What is clear is that an unprecedented and invaluable new service has been offered to the world physics community, and one that they have without hesitation come to rely on extremely heavily. It is quite clear that the next step is to emulate Ginsparg's pioneering efforts in other disciplines.
Ginsparg originally intended only to cover his own field of high energy theory, but the project very soon mushroomed to encompass the entire natural "module" of physics specialties. It makes much more sense for search and access purposes that these interrelated literatures be stored in a common comprehensive archive rather than in many different ones. So the expansion to include all of physics had a logic of its own. The Cognitive Sciences (including large portions, conceivably all, of Computer Science and Engineering, Psychology, Neuroscience, Behavioral Biology, Linguistics and Philosophy) provide another natural focus for a comprehensive electronic archive. The physics community had the advantage of already being (1) a preprint culture, accustomed to distributing and relying upon one another's work in (paper) preprint form prior to publication. They were also (2) a TeX culture, accustomed to preparing documents electronically in a standardised format for coding technical symbols. The cognitive sciences have the advantage of being (1) intrinsically involved in the electronic medium (computer science created it and cognitive engineering is now analysing and optimising it) and (2) apart from the cognitive science subfields that likewise use TeX for technical symbols, a large portion of the literature is ASCII (plain text), making it a much smaller step to transmit a copy of a text a cognitive scientist already has in his own word-processor to the central electronic preprint archive. The Ginsparg archive can automatically accept, store, and make available for screen display texts in the many standard word-processing packages in use today.
The Southampton Cognitive Sciences Eprint Archive
At Southampton we will accordingly implement in the Cognitive Sciences what Paul Ginsparg has created in Physics. As allies and resources in this undertaking we will have (1) the expertise of the Editor and Editorial Office of Behavioral and Brain Sciences (BBS) Journal, one of the leading paper journals in the cognitive sciences, (2) the Editor and Editorial Offices of BBS's electronic counterpart, Psycoloquy, the first refereed electronic journal in Psychology, (3) the Cognitive Sciences Centre at the University of Southampton, (4) the Multimedia Laboratory at the University of Southampton, with many of the most advanced resources and skills in the electronic display, research and retrieval of scholarly and scientific documents, and (5) the help and guidance of Paul Ginsparg himself, who has agreed to make his system available to us. We, in turn, have the opportunity to enhance the capabilities of the electronic archive itself, and future ones modeled on it, with features such as Microcosm, developed at Southampton by Prof. Wendy Hall (Hall et al. 1994). Microcosm is an open hypermedia link service. The project is well-known in the international research community and the software has recently been released as a commercial product. Microcosm allows its users to integrate any disparate pieces of information into a cohesive web of documents, which can then be viewed with the user's everyday applications (word-processors, spreadsheets, databases etc.). The system provides additional functionality in the form of a communicating set of processes that control and relate the individual documents. Two of Microcosm's most powerful features are its ability to overlay links effectively onto any document, irrespective of its type, and the fact that it is constructed from a collection of simple processes working together.
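The idea of overlaying links onto a document without altering the document itself can be illustrated with a minimal sketch. This is not Microcosm's actual data model or API: the `Link` record, its field names, the `resolve_links` function and the sample document are all hypothetical, and the sketch only assumes that links are kept in a separate collection and matched against a document either by its content or by a position within it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Link:
    """One entry in a link collection held apart from the documents."""
    target: str
    phrase: Optional[str] = None    # content-based anchor: matches this text anywhere
    document: Optional[str] = None  # location-based anchor: a named document ...
    offset: Optional[int] = None    # ... and a character offset within it


def resolve_links(doc_name: str, text: str,
                  linkbase: List[Link]) -> List[Tuple[int, int, str]]:
    """Return sorted (offset, length, target) triples for links that apply.

    The document text is only read, never modified, so links can be applied
    to documents the link author cannot write to.
    """
    hits = []
    for link in linkbase:
        if link.phrase is not None:
            # Content-based: every occurrence of the phrase becomes an anchor.
            start = text.find(link.phrase)
            while start != -1:
                hits.append((start, len(link.phrase), link.target))
                start = text.find(link.phrase, start + 1)
        elif link.document == doc_name and link.offset is not None:
            # Location-based: a fixed point in one specific document.
            hits.append((link.offset, 0, link.target))
    return sorted(hits)


# Invented example data.
linkbase = [
    Link(target="glossary#qc", phrase="Q/C"),
    Link(target="notes#intro", document="study1.txt", offset=0),
]
text = "Q/C pairs requote; non-Q/C pairs do not."
anchors = resolve_links("study1.txt", text, linkbase)
```

Because the content-based link matches wherever its phrase occurs, the same link collection applies unchanged to any other document, while the location-based link applies only to the one document it names.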
All links are stored separately from documents in link databases, and link anchors can be content-based as well as location-based. This means that links can be applied to documents over which the owner or author of the links has no write control, such as those commonly found on the Internet or on CD-ROM. Microcosm's flexibility means that new dynamic processes for link creation and/or resolution can be constructed and slotted into the system very easily. The archiving component of the project has many potential benefits (Heseltine 1994; Odlyzko 1995). If it follows the example of the Physics Archive, it will become an extremely widely known and widely used resource for all the disciplines involved. In generalising what has happened with the Physics archive to other disciplines, it will hasten the advent of a global scholarly/scientific archive (Harnad 1995). It will also help to guide the paper publishing industry in the direction where their expertise can be applied in future. As a European project (the Physics Archive having been an American one) it will put the UK in the front ranks of the revolutionary developments in academic publication and communication. And it will make the cognitive science literature available electronically for cognitive engineering analyses and applications of the kind described in this paper. The literature will be attracted to the Archive by a series of Calls and Postings, inviting cognitive scientists to deposit their e-prints in the archive, and stressing the advantages (very high visibility; permanent, instant, global accessibility; text-capturability, for feedback, comments and quoting in other work; no need to pay to circulate preprints or reprints; speed; etc.). The Physics archive grew very rapidly without the need of any prompting, yet we will attempt to facilitate the process even more with regular Calls, as well as by publicising the availability of the cognitive
science literature through the archive. Taxonomic indexes and keyword searching capabilities are provided by the Los Alamos system already, but these too will be enhanced to make them more useful to the world cognitive science community. The objective is also to provide a model that other fields, including commercial ones, can follow.
CONCLUDING REMARKS
In this paper I have suggested that the electronic text-capturing and quote/commenting (Q/C) capability that has emerged in the last two decades has created the possibility of combining the discipline and reflectiveness of writing with the speed and interactiveness of speech in a form of interactive cognition that is sui generis and without precedent. Whether with another interlocutor or with a text, the interaction can now take place at the speed of thought, rather than at the lamentably slow turnaround time that paper communication had dictated. A series of experiments is described that will analyse and quantify the effects of Q/C interaction, both between people and between people and texts. In the interests of developing a corpus in the service of the latter form of interaction, plans for the establishment of a worldwide cognitive science e-print archive modeled on the Los Alamos Physics Preprint archive are also described.
REFERENCES
Bailey, Elaine K., and Morton Cotlar, 1994. Teaching via the internet. Communication Education 43 (2): 184-193.
Belew, R. K., and J. Rentzepis, 1990. HyperMail: treating electronic mail as literature. SIGUCCS Newsletter 20 (4): 26-31.
Benbasat, I., and P. Todd, 1993. An experimental investigation of interface design alternatives: icon vs. text and direct manipulation vs. menus. International Journal of Man-Machine Studies 38 (3): 369-402.
Berge, Z. L., and M. Collins, 1994. Life on the net. EDUCOM Review 29 (2): 11-14.
Blaye, A., and P. Light, 1994. Collaborative problem-solving with HyperCard: The influence of peer interaction on planning and information handling strategies. In: C. O'Malley, ed., Computer Supported Collaborative Learning. Heidelberg: Springer Verlag.
Bolter, J. D., 1991. Writing space: The computer, hypertext, and the history of writing. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Brailsford, T. J., and P. M. C. Davies, 1995. Collaborative Learning On Networks. Proceedings of Conference on Computer Assisted Learning, Queens College, Cambridge 1995, in press.
Carley, K., and K. Wendt, 1991. Electronic mail and scientific communication: A study of the SOAR extended research group. Knowledge-Creation Diffusion Utilisation 12 (4): 406-440.
Cavalier, R. J., 1992. Course processing and the electronic agora: redesigning the classroom. EDUCOM Review 27 (2): 32-7.
D'Souza, P. V., 1992. E-mail's role in the learning process: a case study. Journal of Research on Computing in Education 25 (2): 254-64.
Davis, H. C., S. Knight, and W. Hall, 1994. Light Hypermedia Link Services: A Study of Third Party Application Integration. Proceedings of ECHT'94, ACM Press, 41-50.
Diederich, J., A. Thummel, and E. Bartels, 1992. Recurrent and feedforward networks for human-computer interaction. ECAI 92, 206-7. (Proceedings of 10th European Conference on Artificial Intelligence, Vienna, Austria, 3-7 Aug. 1992.) Chichester, UK: Wiley.
Dunbar, R., 1993. Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences 16: 681-736.
Ginsparg, P., 1994. First Steps Towards Electronic Research Communication. Computers in Physics (August, American Institute of Physics) 8 (4): 390-396. (http://xxx.
Golden, Peggy A., Renee Beauclair, and Lyle Sussman, 1992. Factors affecting electronic mail use. Computers in Human Behavior 8 (4): 297-311.
Grigas, G., 1994. Distance teaching of informatics: motivations, means, and alternatives. Journal of Research on Computing in Education 27 (1): 19-28.
Hall, W., 1994. Ending the tyranny of the button. IEEE Multimedia 1 (1): 60-8.
Hall, W., L. A. Carr, H. C. Davis, and R. J. Hollom, 1994. The Microcosm Link Service and its Application to the World Wide Web. Proceedings of the First International World Wide Web Conference, Geneva, May 1994, 25-34.
Harnad, Stevan, 1979. Creative disagreement. The Sciences 19: 18-20.
Harnad, Stevan, 1984. Commentaries, opinions and the growth of scientific knowledge. American Psychologist 39: 1497-1498.
Harnad, Stevan, 1985. Rational disagreement in peer review. Science, Technology and Human Values 10: 55-62.
Harnad, Stevan, 1986. Policing the Paper Chase. (Review of S. Lock, A difficult balance: Peer review in biomedical publication.) Nature 322: 24-5.
Harnad, Stevan, 1990a. Scholarly Skywriting and the Prepublication Continuum of Scientific Inquiry. Psychological Science 1: 342-343 (reprinted in Current Contents 45: 9-13, November 11 1991).
Harnad, Stevan, 1990b. The Symbol Grounding Problem. Physica D 42: 335-346.
Harnad, Stevan, 1991. Post-Gutenberg Galaxy: The Fourth Revolution in the Means of Production of Knowledge. Public-Access Computer Systems Review 2 (1): 39-53 (also reprinted in PACS Annual Review Volume 2 1992; and in R. D. Mason, ed., Computer Conferencing: The Last Word. Beach Holme Publishers, 1992; and in: M. Strangelove & D. Kovacs: Directory of Electronic Journals, Newsletters, and Academic Discussion Lists, A. Okerson, ed., 2nd edition. Washington, DC, Association of Research Libraries, Office of Scientific & Academic Publishing, 1992; and in Hungarian translation in REPLIKA 1994.)
Harnad, Stevan, 1992. Connecting Object to Symbol in Modeling Cognition. In: A. Clarke and R. Lutz, eds., Connectionism in Context. Berlin: Springer Verlag.
Harnad, Stevan, 1992. Interactive Publication: Extending the American Physical Society's Discipline-Specific Model for Electronic Publishing. Serials Review, Special Issue on Economics Models for Electronic Publishing, pp. 58-61.
Harnad, Stevan, 1994. Computation Is Just Interpretable Symbol Manipulation: Cognition Isn't. Special Issue on "What Is Computation". Minds and Machines 4: 379-390.
Harnad, Stevan, 1995a. Electronic Scholarly Publication: Quo Vadis? Serials Review 21 (1): 70-72. (Reprinted in Managing Information 2 (3) 1995.)
Harnad, Stevan, 1995b. Implementing Peer Review on the Net: Scientific Quality Control in Scholarly Electronic Journals. In: R. Peek and G. Newby, eds., Electronic Publishing Confronts Academia: The Agenda for the Year 2000. Cambridge MA: MIT Press.
Harnad, Stevan, 1995c. The Origin of Words: A Psychophysical Hypothesis. In: W. Durham and B. Velichkovsky, eds., Naturally Human: Origins and Destiny of Language. Muenster: Nodus Pub.
Harnad, Stevan, 1995d. The PostGutenberg Galaxy: How To Get There From Here. Times Higher Education Supplement. Multimedia. P. vi. May 12, 1995.
Harnad, Stevan, 1995e. Universal FTP Archives for Esoteric Science and Scholarship: A Subversive Proposal. In: Ann Okerson and James O'Donnell, eds., Scholarly Journals at the Crossroads; A Subversive Proposal for Electronic Publishing. Washington, DC., Association of Research Libraries, June 1995.
Harnad, Stevan, ed., 1982. Peer commentary on peer review: A case study in scientific quality control. New York: Cambridge University Press.
Harnad, Stevan, ed., 1987. Categorical Perception: The Groundwork of Cognition. New York: Cambridge University Press.
Harnad, Stevan, H. D. Steklis, and J. B. Lancaster, eds., 1976. Origins and Evolution of Language and Speech. Annals of the New York Academy of Sciences 280.
Heseltine, R., 1994. A critical appraisal of the role of global networks in the transformation of higher education. Alexandria 6 (3): 159-71.
Hill, G., R. Wilkins, and W. Hall, 1993. Open and Reconfigurable Hypermedia Systems: A Filter Based Model. Hypermedia 5 (2): 103-118.
Hill, G. J., and W. Hall, 1994. Extending the Microcosm Model to a Distributed Environment. Proceedings of ECHT'94, ACM Press, pp. 32-40.
Hjalmarsson, Ann, Lars Oestreicher, and Yvonne Waern, 1989. Human factors in electronic mail system design. Behaviour & Information Technology 8 (6): 461-474.
Holden, M., and W. Mitchell, 1993. The future of computer-mediated communication in higher education. EDUCOM Review 28 (2): 31-7.
Holden, M. C., and J. F. Wedman, 1993. Future issues of computer-mediated communication: the results of a Delphi study. Educational Technology, Research and Development 41 (4): 5-24.
Horowitz, Rosalind, and S. Samuels, eds., 1987. Comprehending oral and written language. San Diego, CA: Academic Press.
Hutchings, G. A., W. Hall, and C. J. Colbourn, 1993. Patterns of students' interactions with a hypermedia system. Interacting with Computers 5 (3): 295-313.
Kemp, F., 1992. Emphasizing rhetorical effectiveness through computer networks. (Computer Support for Collaborative Learning Workshop, IL, USA, 4-6 Oct. 1991). SIGCUE Outlook 21 (3): 29-31.
Lea, Martin, ed., 1992. Contexts of computer-mediated communication. London, England: Harvester Wheatsheaf.
Light, P., and G. Butterworth, eds., 1992. Context and cognition: Ways of learning and knowing. The developing body and mind. Hillsdale, NJ: Lawrence Erlbaum Associates.
Light, Paul, Karen Littleton, David Messer, and Richard Joiner, 1994. Social and communicative processes in computer-based problem solving. Special Issue: Adult practices and children's learning: Communication and the appropriation of cultural tools. European Journal of Psychology of Education 9 (2): 93-109.
Mayes, J. T., and I. Neilson, 1994. Learning from other people's dialogues: Questions about computer-based answers. Nantes JFIP Conference 1994 (in press).
McCarthy, J. C., and A. F. Monk, 1994. Measuring the quality of computer-mediated communication. Behaviour & Information Technology 13 (5): 311-319.
McKnight, C., A. Dillon, and J. Richardson, eds., 1993. Hypertext: A psychological perspective. Chichester: Ellis Horwood.
Odlyzko, A. M., 1995. Tragic loss or good riddance? The impending demise of traditional scholarly journals. International Journal of Human-Computer Studies (formerly International Journal of Man-Machine Studies), to appear. Condensed version to appear in Notices of the American Mathematical Society, January 1995. (ftp://netlib.att.com/netlib/att/math/odlyzko/tragic.loss.Z)
Poling, D. J., 1994. E-mail as an effective teaching supplement. Educational Technology 34 (5): 53-5.
Rogers, Gil., 1989. Teaching a psychology course by electronic mail. Social Science Computer Review 7 (1): 60-64.
Romiszowski, A. J., 1990. Computer mediated communication and hypertext: the instructional use of two converging technologies. Interactive Learning International 6 (1): 5-29.
Searle, John, 1980. Minds, brains and programs. Behavioral and Brain Sciences 3: 417-457.
Sein, Maung K., and Daniel Robey, 1991. Learning style and the efficacy of computer training methods. Perceptual & Motor Skills 72 (1): 243-248.
Shedletsky, L., 1993. Minding computer-mediated communication: CMC as experiential learning. Educational Technology 33 (12): 5-10.
Sperber, Dan, and Deirdre Wilson, 1986. Relevance: communication and cognition. Oxford: Blackwell.
Sproull, L., and S. Kiesler, 1991. Connections: New ways of working in the networked organization. Cambridge, Mass.: MIT Press.
Steklis, H. D., and Stevan Harnad, 1976. From hand to mouth: Some critical stages in the evolution of language. In: Harnad et al. 1976, 445-455.
Van Dijk, Jan A., 1993. The mental challenge of the new media. Medienpsychologie: Zeitschrift fur Individual- & Massenkommunikation 5 (1): 20-45.
Wittgenstein, L., 1953. Philosophical investigations. New York: Macmillan.
Interactive Cognition: Developing and Analysing Electronic Capabilities
