Advances
in COMPUTERS VOLUME 28
Contributors to This Volume
MUSTAFA A. G. ABUSHAGUR
H. JOHN CAULFIELD
SUBRATA DASGUPTA
M. H. EICH
A. R. HURSON
ABRAHAM KANDEL
MANFRED KOCHEN
L. L. MILLER
MIR MOJTABA MIRSALEHI
S. H. PAKZAD
MORDECHAY SCHNEIDER
B. SHIRAZI
Advances in
COMPUTERS
EDITED BY
MARSHALL C. YOVITS
Purdue School of Science
Indiana University-Purdue University of Indianapolis
Indianapolis, Indiana
VOLUME 28
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers
Boston San Diego New York Berkeley London Sydney Tokyo Toronto
COPYRIGHT © 1989 BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101
United Kingdom Edition published by
ACADEMIC PRESS INC. (LONDON) LTD. 24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 59-15761
ISBN 0-12-012128-X
PRINTED IN THE UNITED STATES OF AMERICA
89 90 91 92   9 8 7 6 5 4 3 2 1
Contents

Contributors
Preface

The Structure of Design Processes
Subrata Dasgupta
1. Introduction
2. The Basic Characteristics of Design
3. Design Paradigms
4. Design as Scientific Discovery
5. Conclusions
References

Fuzzy Sets and Their Applications to Artificial Intelligence
Abraham Kandel and Mordechay Schneider
1. Introduction
2. Fuzzy Sets
3. Fuzziness and Typicality Theory
4. Applications of Fuzzy Set Theory to Expert Systems
5. Conclusion
References

Parallel Architectures for Database Systems
A. R. Hurson, L. L. Miller, S. H. Pakzad, M. H. Eich, and B. Shirazi
1. Introduction
2. Classification of Database Machines
3. Database Machines
4. Conclusion and Future Directions
References

Optical and Optoelectronic Computing
Mir Mojtaba Mirsalehi, Mustafa A. G. Abushagur, and H. John Caulfield
1. Introduction
2. Basic Operations for Optical Computations
3. Elements of Optical Computers
4. Analog Processors
5. Digital Processors
6. Hybrid Processors
7. Conclusion
References

Management Intelligence Systems
Manfred Kochen
1. Introduction
2. On the Nature of Intelligence
3. What is a MINTS: Requirements and Uses
4. Analysis, Design and Maintenance of MINTSs
5. Managerial Issues
6. Conclusion
References

Author Index
Subject Index
Contents of Previous Volumes
Contributors

Numbers in parentheses refer to the pages on which the authors' contributions begin.

Mustafa A. G. Abushagur (153), Electrical and Computer Engineering Department, University of Alabama in Huntsville, Huntsville, Alabama 35899
H. John Caulfield (153), Center for Applied Optics, University of Alabama in Huntsville, Huntsville, Alabama 35899
Subrata Dasgupta (1), The Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, Louisiana 70504-4330
M. H. Eich (107), Department of Computer Science, Southern Methodist University, Dallas, Texas 75275
A. R. Hurson (107), Computer Engineering Program, Department of Electrical Engineering, Pennsylvania State University, University Park, Pennsylvania 16802
Abraham Kandel (69), Computer Science Department and The Institute for Expert Systems and Robotics, Florida State University, Tallahassee, Florida 32306-4019
Manfred Kochen (227), School of Medicine (Mental Health Research Institute) and School of Business Administration (Computer & Information Systems), University of Michigan, Ann Arbor, Michigan 48109
L. L. Miller (107), Department of Computer Science, Iowa State University, Ames, Iowa 50011
Mir Mojtaba Mirsalehi (153), Electrical and Computer Engineering Department, University of Alabama in Huntsville, Huntsville, Alabama 35899
S. H. Pakzad (107), Computer Engineering Program, Department of Electrical Engineering, Pennsylvania State University, University Park, Pennsylvania 16802
Mordechay Schneider (69), Computer Science Department and The Institute for Expert Systems and Robotics, Florida State University, Tallahassee, Florida 32306-4019
B. Shirazi (107), Department of Computer Science, Southern Methodist University, Dallas, Texas 75275
Preface
The publication of Volume 28 of Advances in Computers continues the in-depth presentation of subjects of both current and continuing interest in computer and information science. Contributions have been solicited from well-respected experts in their fields who recognize the importance of writing substantial review and tutorial articles in their areas of expertise. Advances in Computers permits the publication of survey-type articles written from a relatively leisurely perspective; authors are thus able to treat their subjects both in depth and in breadth.

The Advances in Computers series began in 1960 and now continues in its 29th year with Volume 28. During this period, which witnessed great expansion and dynamic change in the computer and information fields, it has played an important role in the development of computers and their applications. The continuation of the series over this lengthy period is a tribute to the reputations and capabilities of the authors who have written for it.

Volume 28 includes chapters on design processes, fuzzy sets as related to artificial intelligence, database systems, optical computing, and intelligence systems for management.

In the first chapter, Dr. Dasgupta states that design is one of the most ubiquitous of human activities. Anyone who devises a course of action to change an existing state of affairs to a preferred one is involved in the act of design. As such, it is of central concern not only in traditional engineering but also in the generation of symbolic or abstract devices such as plans, organizations, and computer programs. He provides a systematic presentation of current understanding of the structure of design processes. Quite independent of the specific design domain, design problems share a common structure, so that it is possible to talk of general theories of design, that is, general, domain-independent, explanatory models of the design process.

Dr. Kandel and Dr. Schneider are concerned with fuzzy sets and their applications, particularly to artificial intelligence and to knowledge engineering. They point out that the theory of fuzzy sets has as one of its aims the development of a methodology for formulating and solving problems that are too complex or too ill-defined to be susceptible to analysis by conventional techniques. They believe the main reason for using fuzzy set theory in artificial intelligence and expert systems is that much of the information that is resident in knowledge-based systems is uncertain in nature.

Since the early 1970s, the complexity of conventional database management systems has gradually increased with the number and size of databases and the number and type of application programs and on-line users. Professor Hurson and his collaborators state that conventional systems using typical software approaches fail to meet the requirements of the various applications, and that since the mid 1970s a great deal of effort has been directed towards the design of special-purpose architectures for efficient handling of large database systems, namely Data Base Machines. The primary goal of their chapter is to examine the impact of current technology on the design of special-purpose database machines.

Professors Mirsalehi, Abushagur, and Caulfield tell us that optical computing emerged from the sciences of holography and coherent optical information processing in this decade and developed as an important discipline only in 1983-1984. The authors discuss the fundamentals of optical computing, the elements of optical computers, and different types of optical processors. Optical computing, they maintain, should not duplicate the architectures that have been used for electronic computers, but rather should utilize techniques that take advantage of the strengths of optics and avoid its weaknesses. One of the most important features of optics is its capability of global interconnections. Therefore, areas such as neural networks, which utilize this feature, are the most promising for optical computing.

According to Dr. Manfred Kochen, a management intelligence system is intended to scan the environment of the organization it serves, making it possible for management to better assess its position, thus enhancing the value of the organization and its services. Simple versions of such systems, he points out, have existed for a long time. The management intelligence systems required by competing organizations in government and business are the best they can obtain, making use of advanced technology, such as artificial intelligence. The purpose of such man-machine systems is to support professional strategists, planners, and researchers, as would a good semiautomated research assistant. The requirements can also provide needed direction to research in artificial intelligence since both are necessary for a management intelligence system to be effective. The purpose of Kochen’s chapter is to emphasize the importance of management intelligence systems and to encourage studies involving them.

I am saddened to learn of the sudden and unexpected loss of my good friend and valued colleague who wrote this chapter on management intelligence systems. Shortly after receipt of the final version of his article, Fred Kochen was suddenly and fatally stricken. He will be missed both as a friend and as a leader in our profession. He and his students have had a major effect on the development of information science, culminating in the innovative and important article in this volume. Fred and I have been close colleagues and friends for many years. I will especially miss his counsel and advice of both a professional and a personal nature.

I am pleased to thank the contributors to this volume. They gave extensively of their time and effort to make this book an important and timely contribution to their profession. Despite the many calls upon their time, they recognized the necessity of writing substantial review and tutorial contributions in their areas of expertise. It required considerable effort on their part, and their cooperation and assistance are greatly appreciated. Because of their efforts, this volume achieves a high level of excellence and should be of great value and substantial interest for many years to come. It has been a pleasant and rewarding experience for me to edit this volume and to work with those authors.

MARSHALL C. YOVITS
The Structure of Design Processes

SUBRATA DASGUPTA
The Center for Advanced Computer Studies
University of Southwestern Louisiana
Lafayette, Louisiana

1. Introduction
2. The Basic Characteristics of Design
2.1 To Design is to Change
2.2 Design Begins with Requirements
2.3 To Design is to Represent
2.4 The Satisficing Nature of Design Processes
2.5 The Evolutionary Nature of Design Processes
2.6 Summary
3. Design Paradigms
3.1 Some Terminological Clarifications
3.2 The Analysis-Synthesis-Evaluation Paradigm
3.3 The Artificial Intelligence Paradigm
3.4 The Algorithmic Approach
3.5 The Formal Design Paradigm
3.6 The Theory of Plausible Designs
4. Design as Scientific Discovery
4.1 The Hypothetico-Deductive (HD) Form of Reasoning
4.2 Kuhnian Paradigms
4.3 Basis for the DSD Hypothesis
5. Conclusions
References

1. Introduction
In its broadest sense, design is one of the most ubiquitous of human activities. As Simon (1981) has pointed out, anyone who devises a course of action to change an existing state of affairs to a preferred one is involved in the act of design. As such, it is of central concern not only in traditional engineering (dealing with such material artifacts as structures, machines, and production plants) but also in the generation of symbolic or abstract devices such as plans, organizations, and computer programs. If we extend the sense of the term “engineering” to encompass the generation of all useful artifacts then, except in the simplest cases, we may safely say that engineering entails design¹.

It is quite natural, then, that design as a human activity should be of interest in its own right, independent of any particular engineering discipline. Design theory is the discipline that takes the design process itself as the object of interest. The aims of design theory are basically twofold:

(a) To construct models of the design process (from logical, methodological, or cognitive perspectives) that further enhance our theoretical understanding of design; by implication, to help establish what the common elements of design are across engineering disciplines.
(b) To allow us to construct more rational methods, tools, and systems to support practical design.

As might be expected, the practitioners of design theory are drawn from a variety of engineering disciplines, including civil and mechanical engineering, architecture, computer science and engineering, management science, and chemical engineering².

Interest in design theory has, in recent years, been particularly sharpened by two computer-related technological advances: computer-aided design (CAD) (Encarnacao and Schlechtendahl, 1983) and knowledge-based (KB) systems (also called expert systems) (Hayes-Roth, Waterman, and Lenat, 1983). CAD systems have traditionally relied on algorithmic approaches, while KB systems are mostly heuristic and rule-based. However, in recent times, a convergence of the two may be observed in the form of knowledge-based CAD systems (Latombe, 1978; Sata and Warman, 1981; Gero, 1985).

The aim of this chapter is to give a systematic presentation of our current understanding of the structure of design processes. We hope to show that, quite independent of the specific design domain, design problems share a common structure and that it makes sense to talk of general theories of design, that is, general, domain-independent, explanatory models of the design process.

In discussing the structure of design processes, our examples and illustrations will be drawn from the domain of computing systems design, a term that embraces all abstraction levels at which one may design nontrivial computational devices. (This includes, for example, logic circuits, computer architectures, computer languages, and computer software.) By doing so we hope, firstly, to show the precise place and relevance of design theory in computer science and engineering; and secondly, to establish the current scope of, and apparent limits to, a rational foundation for computer-aided design and knowledge-based systems.

We also note, however, that many of the ideas presented here apply as much to, and are drawn from, other fields of engineering, and we shall have occasion to refer to the work of design theorists in other disciplines, in particular the field of architecture. Thus, in a broader sense, this paper is also intended as a contribution to the general theory of design.

¹ Simon (1981) coined the term “sciences of the artificial” in 1969 to denote all such disciplines concerned with the production of useful artifacts. While the term itself is attractive (complementing “natural sciences”), it has unfortunately not been absorbed into the general vocabulary of the practitioners of the relevant disciplines, although the term is now 20 years old. Thus, as far as this article is concerned, we shall continue to use the term “engineering.”

² While specific references will be cited in the course of this article, as general references to the discussion of design theory from a number of different perspectives we suggest the following: Dijkstra (1976), Simon (1981), Hubka (1982), Evans, Powell and Talbot (1982), Jaques and Powell (1980), Cross (1984), Jones (1980, 1984), Spillers (1972), Westerberg (1981), Broadbent (1973), Giloi and Shriver (1985), Dasgupta (1984), and Middendorf (1986).
2. The Basic Characteristics of Design
Just what design is (that is, characterizing what one does when one designs and what the end product of this activity is) has been discussed at length in specific contexts such as architecture (Alexander, 1964; Broadbent, 1973; March, 1976; Akin, 1978; Darke, 1979; Lawson, 1980; Rowe, 1987), structural engineering (Rehak, Howard and Sriram, 1985), computer architecture (Dasgupta, 1984; Aguero and Dasgupta, 1987), software (Freeman, 1980a, 1980b; Lehman, 1974, 1984), and digital systems (Zimmerman, 1981; Giloi and Shriver, 1985). More general reflections, independent of any particular engineering discipline, include Jones (1980, 1984), Spillers (1972), Cross (1984), and Simon (1981), while recent interest in automating the design process has resulted in new models of design expressed in the vocabulary of artificial intelligence and computer science (Gero, 1985; Mostow, 1985).

Regardless of the various theories or models advanced by these writers on the structure of the design process, it seems clear that design (the process and its product) exhibits some obvious general characteristics that are either true by definition or can be empirically observed. Any adequate theory of design must take these characteristics into account. In this section we explicitly state these characteristics, provide examples, and identify (where relevant) their obvious implications for theorizing about design.

2.1 To Design is to Change
In an ultimate philosophical sense, the goal of design is to change or extend some aspect of the world. That is, we perceive an imperfection in the present state of affairs in some specific domain and we design to correct or improve this state of affairs. There are two important consequences of this rather obvious observation.

2.1.1 The Engineering-Science “Distinction”
At first, this basic fact about design appears to establish a demarcation between the engineering disciplines (as defined in Section 1) and the natural sciences (such as physics, biology, or geology). The latter disciplines are concerned with understanding the natural universe; the engineering disciplines are concerned with its purposeful, artificial alteration or extension.

Unfortunately, this demarcation of the natural sciences from engineering as a result of the differences in their respective aims has led to the perpetuation of a myth over several decades: that the activities of science and engineering are fundamentally different. This so-called difference has been articulated in a variety of ways, for example (a) that science is concerned with “analysis” and engineering with “synthesis;” (b) that science is “theory-oriented” while engineering is “result-oriented;” or (c) that engineering is “creative, spontaneous, and intuitive” while science is “rational”³.

We suggest that in making such assertions, proponents have confused differences in objectives for methodological differences. As noted above, that the natural sciences and engineering differ in their objectives is undeniable. But this by no means implies that the method of science is distinct from the method of engineering. We shall, in fact, go one step further. In Section 4, we shall propose the stronger hypothesis that the method of design is identical to the method of science.

³ A recent discussion along such lines is Lawson (1980, pp. 30-33). Schon (1983) goes even further by proposing an entirely new epistemology, which he terms “reflection-in-action.” He contrasts this with what he calls the “technical rationality” of science. Cross, Naughton and Walker (1981) have also sought to uncouple design from science by invoking the “argument from objectives,” the fact that the objectives of science and those of design differ. See above for our comments on this argument.

2.1.2 The Question of Values

The second implication of the observation that “to design is to change” is the following: if design is indeed concerned with how things ought to be, then the question of values surfaces in at least two different ways.

Firstly, what constitutes a valid design problem is often determined by the designer’s “value system.” A state of affairs that appears unsatisfactory to one person (and thus is perceived as a design problem) may be quite satisfactory to another.

Example 2.1 For almost three decades, researchers (mainly in universities) have been designing and implementing computer hardware description languages (CHDLs) (Dasgupta, 1982). The “problem” perceived by these investigators was that the quality of hardware design (especially at the higher levels of abstraction) would be greatly improved by using a formal language as a description tool. However, until recently much of the industrial community perceived this as a “nonproblem,” and the developments in CHDL principles were largely ignored. It is only with the recent progress in VLSI technology (Mead and Conway, 1980) and the complexities attending the design of VLSI-based systems that hardware designers are beginning to perceive the potential for CHDLs (Shahdad et al., 1985).

Example 2.2 A very similar situation prevailed in the domain of firmware engineering where, since the early 1970s, much effort has been expended in the design and implementation of high-level microprogramming languages (HLMLs) (Dasgupta and Shriver, 1985; Davidson, 1986). Here, again, the problem that was perceived was the unsatisfactory state of (rather primitive) microprogramming methods and tools, and the intended solution was to move the microprogramming effort to a more abstract and manageable level. The designers and implementers of these experimental HLMLs believed that higher-level tools were intrinsically more desirable than the low-level tools then being used. However, it was only a decade later that the validity of this design problem came to be recognized in the industrial domain, with the result that high-level microprogramming languages are now being commercially designed and implemented (Hopkins, Horton and Arnold, 1985; Sheraga and Gieser, 1983)⁴.

Secondly, even assuming that there is a common recognition of a particular design problem within the design community, there may be wide disagreement as to the nature of the solution approach. The latter may, again, be determined in part by the individual designer’s system of values.

Example 2.3 One group of designers may approach a design problem using criteria such as “simplicity,” “formal elegance,” or “understandability” as the guiding principle, while another group may assign central importance to the issues of cost and performance.

2.2 Design Begins with Requirements
The assertion that to design is to alter some state of affairs into a more preferred one translates in practical terms to the fact that the designer begins with a problem. Thus the starting point for any design activity is a description of the problem given in the form of a specification of requirements. Some of the requirements stipulate the desired functional capabilities of the target system. Others may specify its desired performance and cost characteristics or other constraints that the system is required to satisfy.

⁴ Note that just as values determine valid design problems, they also determine valid research topics in engineering, since much of engineering research is concerned with design methods and issues. This, of course, has profound implications for the role of values in the funding of research (by funding agencies) in engineering or design-related disciplines.

Unfortunately, there are two kinds of problems that are often encountered by the designer in this context. We may call these, respectively, the preciseness issue and the incompleteness issue.

2.2.1 The Preciseness Issue
Clearly, in many design situations a requirement may be naturally stated in a sufficiently precise or formal manner that one can construct definite and applicable procedures for determining whether or not the design meets such requirements. Design problems that can be formulated in such a manner fall within the category of what Simon (1973) termed well-structured problems.

Example 2.4 Consider the design of a program that computes the greatest common divisor (GCD) of two positive integers x and y. This is a “textbook” example of the well-structured problem, since it admits of a precise requirements specification that is naturally captured in mathematical terms. One formulation of the requirements (Hoare, 1987) is

Let Z = GCD(x, y). Then Z is such that:
(i) Z divides x exactly;
(ii) Z divides y exactly;
(iii) Z is the largest of the set of integers satisfying (i) and (ii).

Note that the preciseness of these requirements, even though they are stated in prose, is guaranteed by the fact that such properties as “divides exactly” and “largest of the set of integers” are defined precisely by the laws of arithmetic and set theory.

Unfortunately, many design problems do not admit of requirements that can be so precisely defined. Such problems are ill-structured (Simon, 1973) and exhibit, among others, the characteristic that there may be no definite, objective criterion or procedure for determining whether or when a design meets the requirements.

Example 2.5 In one well-known computer design project (Katevenis, 1985) a major requirement is to the effect that the instruction set must efficiently support frequent, time-consuming operations present in a significant sample of high-level language programs⁵. Such attributes as “efficient support” or “significant sample” are inherently ambiguous and subjective. Thus, even if one does characterize these attributes in more precise terms to aid the design process, such a characterization will be subjective at best and arbitrary or ad hoc at worst.

⁵ This particular formulation of the requirement is due to Aguero (1987).
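Returning to Example 2.4, the three GCD requirements are precise enough that a "definite and applicable procedure" for checking a design against them can be written down directly. The following is a minimal sketch in Python, not taken from the text: the function names are hypothetical, and Euclid's algorithm is used only as one illustrative candidate design.

```python
def divides_exactly(d: int, n: int) -> bool:
    """True if d divides n with no remainder."""
    return n % d == 0


def meets_gcd_requirements(z: int, x: int, y: int) -> bool:
    """Check a candidate result z against requirements (i)-(iii).

    Assumes x and y are positive integers, as in Example 2.4.
    """
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive integers")
    if z <= 0:
        return False
    # The set of integers satisfying (i) and (ii); it always contains 1.
    common = [d for d in range(1, min(x, y) + 1)
              if divides_exactly(d, x) and divides_exactly(d, y)]
    return (divides_exactly(z, x)      # requirement (i)
            and divides_exactly(z, y)  # requirement (ii)
            and z == max(common))      # requirement (iii)


def gcd_by_euclid(x: int, y: int) -> int:
    """One candidate design (Euclid's algorithm), used here for illustration."""
    while y:
        x, y = y, x % y
    return x
```

A call such as `meets_gcd_requirements(gcd_by_euclid(12, 18), 12, 18)` then tests the candidate design against the specification. No comparable checker can be written for a requirement such as "efficient support" in Example 2.5 without first making subjective choices about what the terms mean.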
Fortunately, in spite of the ill-structuredness of many design problems, they do continue to be solved. Later (in Sections 3.3 and 3.6) we shall offer explanations of how such problems can be rationally addressed.

2.2.2 The Incompleteness Issue
Another issue is that, at the start of a design project, the requirements may be known rather incompletely. Indeed, it may be the case that a design project begins with one or a small number of requirements that collectively represent the basic problem that caused the project to be identified. These are, then, merely the “top level” or most fundamental requirements. To state this differently, problem identification in many design situations may be quite incomplete or poorly understood at the time the design begins.

The reason why design may begin with incomplete requirements is suggested by Simon’s (1976, 1981, 1982) well-known notion of bounded rationality: the fact that designers and their clients, as all human beings, are limited in their capacities to make fully rational choices or decisions, in part because of their limited mental capabilities, and in part by their limited or imperfect knowledge of all the relevant information pertaining to a particular problem or situation. Thus, quite simply, we may not grasp all the requirements that the target system is to satisfy; or we may not realize at the beginning how different requirements interact with one another. We shall later see a more general and wide-ranging consequence of bounded rationality for the design process (Sections 2.4 and 2.5).

But the implications of the incompleteness issue, both for constructing a theory of design and for the development of practical design methods, are in fact quite considerable:

(a) Incompleteness blurs the distinction between “requirements” and “design.” After all, if a particular aspect of the problem (that is, a particular requirement) becomes evident only during the design process itself, then is it really a part of the set of requirements or is it a component of the design?⁶
(b) Regardless of how we respond to (a), it becomes obvious that in a design problem of any reasonable complexity an inherent component of the design process is, in fact, the development, expansion, or identification of the set of requirements.

⁶ We shall return to this issue when discussing the theory of plausible designs (Section 3.6).

Example 2.6 The design of the hardware description language S*M (Dasgupta, 1985; Dasgupta, Wilsey and Heinanen, 1985) began with the need for a language that could be used to provide descriptions of microarchitectures for retargetable firmware development systems. Based on this fundamental consideration, an initial specification of the requirements was established. Among these requirements were the following:

(i) The language must support both operational and functional modes of hardware description (Dasgupta, 1982). That is, the language user should be able to describe hardware modules in terms of externally observable behavior or function only (expressed, say, in axiomatic form in terms of precondition/postcondition pairs), or in terms of an ordered set of steps or operations that the module goes through when it is activated.
(ii) At the micro-architecture level, computers are (usually) clocked systems. Thus, the language must provide constructs to specify the fact that each module (described functionally or operationally) is controlled by one or more clocks.
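To make requirements (i) and (ii) concrete, the sketch below records a functional-mode description as a precondition/postcondition pair together with the name of a controlling clock. It is a Python stand-in written for illustration only: it is not S*M syntax, and every name in it (`FunctionalModule`, `incrementer`, the clock `"clk"`, the register `r`) is a hypothetical assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]  # register name -> value (illustrative model of module state)


@dataclass
class FunctionalModule:
    """A module described only by its externally observable behavior."""
    name: str
    clock: str                                     # requirement (ii): controlling clock
    precondition: Callable[[State], bool]          # must hold on activation
    postcondition: Callable[[State, State], bool]  # relates state before and after

    def check(self, before: State, after: State) -> bool:
        """True if an observed transition satisfies the pre/postcondition pair."""
        if not self.precondition(before):
            raise ValueError(f"{self.name}: precondition does not hold")
        return self.postcondition(before, after)


# Hypothetical example: an 8-bit incrementer controlled by a clock named "clk".
incrementer = FunctionalModule(
    name="incrementer",
    clock="clk",
    precondition=lambda s: 0 <= s["r"] < 255,
    postcondition=lambda old, new: new["r"] == old["r"] + 1,
)
```

Note that this record names a clock but says nothing about the point in the clock's cycle at which the postcondition is expected to hold; the two requirements, taken as stated, leave that open.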
However, in the course of designing S*M one of the consequences of these two requirements soon became evident. If the functional behavior of a hardware component is specified in terms of a set of precondition/postcondition pairs, and if the temporal behavior of the module is controlled by a specific clock, then at which stage of the clock’s action should the postcondition be assumed to hold? Thus, a new requirement for the language had to be decided during the design process itself in response to this question. Alternatively, of course, one could view the answer to this question as a design decision, that is, as a feature of the design itself.

2.3 To Design is to Represent
The ultimate result of a design activity is an explicit description or representation of the target artifact in some artificial symbolic language. This representation serves as a blueprint (or set of instructions) for implementing the system. Designing, then, not only separates conceptualization from making; it also involves externalizing the concept in the form of an explicit representation. The question may be raised as to why an explicit representation of the artifact is at all necessary. For, as Jones (1980) has pointed out, the skilled craftsman of old did not make the distinction between conceptualization and making. The conceptualization was in the making. The separation of the two came about when the artifact became too large or complex for the craftsman’s cognitive capacity. It then became necessary first to capture the concept in an external symbolic language (usually, a drawing), that is, to “design” the artifact, and then to make the artifact based on the design. In Jones’s terms, the artifact passed from the craft stage to the stage of design-by-drawing⁷.

The precise nature of the symbolic representation will, of course, be determined by the nature of the artifact concerned. Traditionally and conventionally, the output of the design activity is the production of a structural form, that is, a description of the components of the artifact and their static interrelationship. This is, however, to take a rather myopic view of the function of a design representation. For a representation is not only to serve as a blueprint or an encoded set of instructions to guide implementation. It must also serve as a medium for the analysis and criticism of the design and as a basis for exploring and experimenting with alternate decisions. Thus the language of representation, as well as the representation itself, must satisfy the following constraint: it should be possible to recover or deduce many important characteristics of the artifact from the representation, including, for example, such characteristics as

• structural form
• function or behavior
• performance
• cost
• reliability
• aesthetics.
Not all these characteristics may be relevant to a particular design activity, nor need they be explicitly stated in the representation. Indeed, the key to a good design representation language is its ability to produce representations in which one or two of the artifact’s characteristics are stated explicitly while other characteristics are deduced according to some set of rules. Needless to say, this is easier said than done.
Example 2.7 Figure 1 shows a part of a 16-bit computer’s data path, which may be the outcome of some particular computer design project. This is a representation (in pictorial form) of structural form. It shows the principal components (the registers In1, In2, Out and the functional unit A), and their interconnections. Note, though, that one can infer neither the functional behavior of this fragment of the data path, nor its performance characteristics from the diagram alone. As a representation, this diagram is of very limited value (as, in fact, is the case with all block diagrams).

FIG. 1. Part of a data path (the arithmetic-logic unit).

A more complete representation of this “design” would necessitate, in addition or as an alternative to the structural description of Fig. 1, a description that allows us to understand its function and performance. Figure 2 is thus a specification of the arithmetic-logic unit (ALU) in a hardware description language (HDL) where the interconnections are now implicit (but deducible) and the functional and performance characteristics are explicit. This particular representation (using the HDL S*M (Dasgupta, Wilsey, and Heinanen, 1986)) defines, first, a set of stores (In1, In2, etc.) and then the characteristics of a 100-nanosecond, 3-phase clock. It then specifies the behavior of the ALU module by stating its input and output ports, the fact that it is active in the second (50-nanosecond) phase of the clock, and then characterizing the operation or function of the functional unit.

⁷ In a similar vein, Alexander (1964), in the context of buildings, writes of the unselfconscious process (of creating buildings) which has, roughly, the following characteristics: there is little thought about the design as such; there may be general “remedies” for particular types of failures but no general principles of design; there is practically no division of labor and, thus, specialization is rare; there are no external or written means of communicating ideas; and “design” decisions are made according to tradition or custom rather than the builder’s new ideas.
    store In1, In2, Out : seq[15..0] of bit;
          Ctl : seq[3..0] of bit;

    clock Clk dur 100ns
      subclock
        ph1 : dur 25ns
        ph2 : dur 50ns
        ph3 : dur 25ns
    end clock

    module A
      inport In1, In2, Ctl
      outport Out
      guard {true, Clk.ph2}
      effect
        case Ctl is
          when 2#0010# => new Out = In1 and In2
          when 2#0100# => new Out = In1 or In2
          when 2#0011# => new Out = In1 + In2
        end case
    end module
FIG. 2. Description of the data fragment in a hardware description language.
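The behavior that Fig. 2 makes explicit can also be mimicked in an ordinary programming language. The following Python sketch is an illustration only and does not claim to reproduce S*M semantics. The function name `alu_effect`, the treatment of addition as wrapping modulo 2^16, and the rule that Out keeps its old value outside phase ph2 or for an unlisted control code are all assumptions made here.

```python
# Illustrative model of module A in Fig. 2 (not an S*M interpreter).
WORD_MASK = 0xFFFF  # In1, In2, Out are 16-bit stores: seq[15..0] of bit

# Sub-clock durations of Clk in nanoseconds, as declared in Fig. 2.
CLK_PHASES = {"ph1": 25, "ph2": 50, "ph3": 25}


def alu_effect(in1, in2, ctl, out, phase):
    """Return the new value of Out given the current stores and clock phase."""
    if phase != "ph2":
        # Guard: module A is active only in the second phase of Clk.
        return out
    if ctl == 0b0010:
        return in1 & in2
    if ctl == 0b0100:
        return in1 | in2
    if ctl == 0b0011:
        # Assumption: the carry out of bit 15 is discarded.
        return (in1 + in2) & WORD_MASK
    # Assumption: any other control code leaves Out unchanged.
    return out
```

Even in this small model, the questions of justification remain unanswered: nothing in the code says why the unit is active in ph2 or why the phases have these durations.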
Consider, however, the usefulness of Fig. 2 when someone is reviewing the design and begins to ask “why” questions, that is, questions of justification. For instance, the reviewer may want to know why the functional unit is designed to be active in phase 2 of the clock Clk; or why Clk has the characteristics that it has. As a representation, Fig. 2 is hopelessly inadequate in providing information of this sort, information that is crucial when one views a representation as a medium of experimentation and exploration. For such purposes, a design representation should include a documentation of the justification of design decisions and of the cause-effect relationships between design decisions.
Example 2.8 In the design paradigm called the theory of plausible designs (Aguero and Dasgupta, 1987; see Section 3.6 below), one of the principal issues addressed is that of the plausibility of a design, that is, to demonstrate through the design representation itself the plausibility (or believability) of a design as a whole or of its specific components. For this purpose, the design of a target system is represented in a number of ways. Firstly, the design can be described in a formal, machine-executable language (e.g., a programming or hardware description language). Such a representation is most appropriate as a blueprint for implementation or for the purpose of experimentation using the description as an input to a simulator.
    S1:
      C: R(I) = The interconnection network I is reliable.
      A: Formal description of I in S*M
      R: $FD(I) \land FT(I)$
         where: FD(I) = I is fault-diagnosable
                FT(I) = I is fault-tolerant
      V: formal and empirical methods
      P: validated

FIG. 3. A plausibility statement.
In addition, each characteristic or feature appearing in a design is explicitly and completely defined by means of a construct called a plausibility statement which describes the (extent of the) plausibility of the feature and the nature of the evidence used to demonstrate or justify this plausibility. Plausibility statements thus serve to record the questions of justification alluded to earlier. Figure 3 is one such plausibility statement. Here, the property of interest is R(I), the reliability of the interconnection network I. This is a “second order” property taking as an argument a “first order” property, or feature, I. This statement documents the fact that the property R(I), as a feature of the design, will have a high degree of plausibility (more strictly speaking, will be in “plausibility state” validated) if, using a combination of formal and empirical methods, it can be shown there is evidence for “FD(I) and FT(I)” to be true. This statement thus documents why we may have confidence in R(I) as a component of the design.

The third type of representation is used to show the interdependencies between the plausibilities of properties. For instance, the plausibility of R(I) depends, in turn, on the plausibilities of FD(I) and FT(I). This state of affairs is described explicitly by a plausibility dependency graph (Fig. 4). In general, a design at any stage can be represented by such a graph.

Example 2.9 Liskov and Guttag (1986) discuss an approach to program development in which a design is represented using a combination of an abstract (but formal) specification notation which expresses the functional characteristics of the target system, dependency diagrams which show dependencies between program modules, and descriptions of performance (or efficiency) constraints to be met by the system. Examples of efficiency constraints are:
    worst case time = $O(\text{length}(n) \times \text{length}(n))$,
    temporary space used $\le \text{length}(n)$,

where n is a character string. Figure 5 shows a module dependency diagram. This indicates that procedure A is dependent on (or uses) procedure B and abstract data type D in the sense that A calls B and uses one or more of the operations of D. B is also dependent on D.

FIG. 4. Plausibility dependency graph.

FIG. 5. Module dependency diagram.
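The dependencies described for Fig. 5 can be recorded as a small directed graph. The Python sketch below is illustrative only; the dictionary `USES` and the helper `build_order` are names introduced here and are not part of Liskov and Guttag's notation. It derives one order in which the modules could be implemented so that each module comes after everything it uses.

```python
# Module dependency diagram of Fig. 5: A uses B and D; B uses D.
USES = {"A": ["B", "D"], "B": ["D"], "D": []}


def build_order(uses):
    """Return modules ordered so that each appears after all modules it uses."""
    order, done, in_progress = [], set(), set()

    def visit(module):
        if module in done:
            return
        if module in in_progress:
            raise ValueError("cyclic dependency involving " + module)
        in_progress.add(module)
        for used in uses[module]:
            visit(used)
        in_progress.discard(module)
        done.add(module)
        order.append(module)

    for module in uses:
        visit(module)
    return order


print(build_order(USES))  # D first, then B, then A
```

Such a graph makes the structural dependencies explicit, but, like the block diagram of Fig. 1, it says nothing by itself about function or performance; those are carried by the specification and the efficiency constraints.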
2.4 The Satisficing Nature of Design Processes
Consider the computer architect who, given a particular set of (initial) requirements, is to design an exo-architecture that meets these requirements⁸. The principal components of the exo-architecture are the following:

(a) Storage organization: the types and organization of programmable storage.
(b) Data types: definition and characterization of the data types and their representation in storage.
(c) Addressing modes: the different methods of specifying the addresses of storage objects.
(d) Instruction set: specification of the syntax and semantics of the instructions.
(e) Instruction formats: representation and encoding of instructions in storage.
(f) Exception conditions: specifications of faults, traps, and interrupts, and their consequences.

The problem in designing exo-architectures is that the components interact with one another in the sense that a set of decisions concerning one component will influence the design of, or design choices for, some other components. For instance,

(i) The choice of a storage organization (including the word length and the unit of addressability) is influenced by the specific data types and their representations. The converse is also true.
(ii) The design of instruction formats will be directly affected by the design of the instructions, data types, and addressing modes.
(iii) There is an obvious close relationship between data types and the instruction set: the range and composition of the latter will depend on the composition of the former, and vice versa.

Furthermore, these relationships do not follow, or cannot be captured by, a neat mathematical equation such that one can quantitatively predict the effect of variation of one component on another. The computer architect, in other words, can rarely (if at all) make optimum decisions in the course of design.

⁸ A computer’s exo-architecture is the total structure and behavior of the computer as seen by the assembly language programmer, operating systems designer or compiler writer (Dasgupta, 1984, 1988a, 1988b). Other terms used to refer to this abstraction level are “instruction set processor architecture” (Siewiorek, Bell, and Newell, 1982) and the “conventional machine level” (Tanenbaum, 1984).
The aforementioned issue is typical of most design problems. In general, the process of design involves making decisions that are characteristically of the form

1. Given a number of interacting objectives or goals, how should these goals be prioritized? That is, how should one order decisions concerning a number of interacting components when the nature of the interactions is known only qualitatively or imprecisely?
2. Given a number of alternative choices (say, for a component of a system) that are equivalent in terms of function, performance, and/or cost, which of these choices should be made?

As noted in Section 2.2.2 there are limits or bounds to the rationality that the designer can bring to a design problem. This is largely due to the complexity of the design problem and the designer’s imperfect knowledge of the long-term consequences of design decisions. The notion of bounded rationality led Simon (1981) to suggest that for design problems of any reasonable complexity, one must be content with “good” rather than “best” solutions. That is, the designer sets some criterion of satisfactoriness for the design problem and if the design meets the criterion, the design problem is considered (even if only temporarily) to be solved. In Simon’s terms, design problem solving (and many other types of problem solving) are satisficing procedures in that they produce satisfactory rather than optimal solutions.
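The difference between satisficing and optimizing can be shown in a few lines. The Python sketch below is a minimal illustration under stated assumptions: the function names, the numeric `score`, and the `threshold` standing in for the criterion of satisfactoriness are all hypothetical, and real design alternatives are rarely reducible to a single number.

```python
def satisfice(candidates, score, threshold):
    """Return the first candidate whose score meets the criterion, else None."""
    for candidate in candidates:
        if score(candidate) >= threshold:
            return candidate
    return None


def optimize(candidates, score):
    """Return the best candidate; every alternative must be examined."""
    return max(candidates, key=score)


# Hypothetical alternatives, each paired with an assumed figure of merit.
designs = [("d1", 0.55), ("d2", 0.72), ("d3", 0.91)]
merit = lambda design: design[1]

good_enough = satisfice(designs, merit, threshold=0.7)
best = optimize(designs, merit)
```

Here `satisfice` stops at d2 without ever looking at d3, whereas `optimize` must evaluate the whole list. The satisficing designer accepts d2 as a solution, if only temporarily.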
2.4.1 The Exponential Nature of Well-Structured Design Problems
It may be remarked that the use of computer-aided design (CAD) or design automation alleviates to some extent the problem of bounded rationality since such tools facilitate the use of systematic or algorithmic means for solving design problems. However, even if the design problem is sufficiently well-structured as to lend itself to algorithmic procedures, there are inherent limits to the practical attainment of optimal solutions. For, as is well known, most of the interesting optimization problems encountered in design require algorithms that are known to require exponential computation time (or space). That is, the time (or space) required to arrive at an optimal solution is $O(k^n)$ where k is a constant and n a parameter characterizing the size of the design problem⁹. The very high computational cost of arriving at an optimal solution (especially for large values of n) would be sufficient to discourage the designer (even with the help of automated design tools) from seeking solutions that are guaranteed to be optimal.

⁹ More exactly, such problems are said to be NP-hard or NP-complete (Horowitz and Sahni, 1978).

FIG. 6. Structure of an automatic microcode generator. [Flow: microprogram written in a high-level microprogramming language L; compiler phase 1; “vertical” or “sequential” microcode; compiler phase 2 (optimizer/compactor); “horizontal” microcode for micromachine H.]
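The contrast between an exponential search for the optimum and a cheap heuristic can be made concrete with a toy packing problem. The sketch below is an illustration chosen here, not an algorithm taken from the literature cited in this section: items of given sizes must be placed into as few groups as possible without exceeding a capacity, and the function names and data are hypothetical. The exhaustive version examines up to groups^n assignments for each candidate number of groups; the first-fit version makes one pass per item.

```python
from itertools import product


def feasible(assignment, sizes, capacity, groups):
    """Check that no group's total size exceeds the capacity."""
    loads = [0] * groups
    for size, group in zip(sizes, assignment):
        loads[group] += size
    return all(load <= capacity for load in loads)


def exhaustive_min_groups(sizes, capacity):
    """Smallest feasible number of groups, found by trying every assignment."""
    n = len(sizes)
    if n == 0:
        return 0
    for groups in range(1, n + 1):
        # groups**n candidate assignments for this group count.
        for assignment in product(range(groups), repeat=n):
            if feasible(assignment, sizes, capacity, groups):
                return groups
    raise ValueError("some item exceeds the capacity")


def first_fit_groups(sizes, capacity):
    """Greedy heuristic: place each item in the first group that has room."""
    loads = []
    for size in sizes:
        if size > capacity:
            raise ValueError("some item exceeds the capacity")
        for i, load in enumerate(loads):
            if load + size <= capacity:
                loads[i] += size
                break
        else:
            loads.append(size)
    return len(loads)


sizes = [4, 3, 3, 2, 2, 2]
optimal = exhaustive_min_groups(sizes, capacity=8)
heuristic = first_fit_groups(sizes, capacity=8)
```

For this input the exhaustive search finds that two groups suffice, while first-fit settles for three. The heuristic result is satisfactory in the sense that it is feasible, but it is not guaranteed to be optimal.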
Example 2.10 One of the most widely studied problems in microprogrammed computer design is that of the automated generation of horizontal microcode from a high-level abstract description of the microprogram (Fig. 6) (Dasgupta and Shriver, 1985; Dasgupta, 1988b, Chapter 5; Mueller and Varghese, 1985). Here, the basic problem is, given a sequence of micro-operations (“vertical” microcode)

    $S = m_1 m_2 \cdots m_n$,

possibly produced as the intermediate output of a microcode compiler, to generate a sequence of horizontal micro-instructions

    $H = I_1 I_2 \cdots I_k$,

where each $I_j$ in H encodes or represents a subset of the micro-operations in S such that

(a) each $m_i$ in S appears in exactly one $I_j$ in H;
(b) the data dependencies between the operations in S are preserved in H¹⁰;
(c) there are no conflicts in the usage of functional units between the micro-operations within the micro-instructions; and
(d) the number of micro-instructions in H is minimized.

This is a well-structured design automation problem for which an optimal solution (satisfying (d)) demands exponential time. Thus various algorithms have been developed that satisfy conditions (a), (b) and (c) but do not guarantee that (d) is met. All these algorithms run in polynomial time (that is, are of the complexity $O(n^k)$ where n is the size of the original microprogram S and k is a low integer constant) but are satisficing algorithms.

2.5 The Evolutionary Nature of Design Processes
To summarize, the main implication of bounded rationality is that the ramifications of a design decision, or a particular chain of such decisions, cannot always be comprehended at the time the decisions are taken. Decisions may also have wholly unintended side effects. Furthermore, since requirements may be imprecise, incomplete or both (see Section 2.2), the designer may not be able to demonstrate exactly that a particular design meets the requirements, or may have to generate requirements as part of the design process itself. Under such circumstances, a design process can be usefully viewed as an evolutionary process and the design itself, at any stage of its development (including the stage at which it is held to be “complete”), as a tentative solution to the problem originally posed; that is, the design is the evolutionary offspring from an earlier design form, and is likely to evolve further in the future.

¹⁰ Let $m_i$, $m_j$ be in S such that $m_i$ precedes $m_j$ in S. Then possible data dependencies between $m_i$ and $m_j$ are (i) $m_i$ writes into a register/store which is read by $m_j$; (ii) $m_i$ reads a register/store which is written into by $m_j$; (iii) $m_i$, $m_j$ both write into a common register/store. In the literature on pipelined architectures, these are also referred to as hazards and specifically as read-after-write, write-after-read, and write-after-write hazards respectively (Kogge, 1981; Dasgupta, 1988b).

Our use of the term “evolution” in this context is deliberate. Thus it is necessary to establish exactly in what sense the design process is considered to be evolutionary. In biology, “evolution” refers to the unfolding and changing of organisms across generations through natural means. At the risk of oversimplification, the hallmarks of biological (Darwinian) evolution can be concisely stated as follows (Ruse, 1986; Bendall, 1983; Maynard Smith, 1975):

(a) Within any population of organisms of a given species, there is considerable variation among individuals largely brought about by genetic factors.
(b) In a given environment, the likelihood of an individual surviving to adulthood and reproducing successfully will be influenced by the particular genetic characteristics or traits of that individual. If it does survive sufficiently to reproduce, then the offspring will inherit the traits and increase the frequency of occurrence of these traits in the population. On the other hand, if the traits are such that the individual organism is unable to reach maturity and reproduce, then the traits will be less likely to perpetuate in the population. This process is termed natural selection.
(c) By this process organisms are constantly tested against the environment, and those that are genetically endowed to survive and reproduce successfully may be said to be “fit” relative to the environment. If the environment changes then some forms of organisms within the population may become fit relative to that environment while others may die out. Thus, organisms appear to constantly adapt to their surroundings.

Clearly, in the context of design, neither the concepts of variation nor natural selection make any sense. What are relevant, however, are the concepts of testing and adaptation in terms of the following features:

(i) At any stage of the design process, a design is a conjectural solution to the design problem. Since the adequacy (or satisfactoriness) of a design is determined solely with respect to the requirements prevailing at that stage, the design must be critically tested against the available requirements.
(ii) If the design is found to meet the requirements, there is a fit between design and requirements. The former is adapted to the latter.
(iii) In testing the design against the requirements, there may be found to be a misfit between the two. The causes of misfit may be many. It may be that the design is incomplete relative to the requirements, or that it is incorrect; the design may be in such a form that it cannot be shown whether or not the design satisfies the requirements, in which case the former may have to be redefined or specified in a different form; or it may be that the requirements are given in such a form that it cannot be shown whether or not the design satisfies the requirements, in which case the latter may have to be reformulated or new requirements may have to be generated.
Whatever be the cause of the misfit, the design (and the requirements) may have to be modified to eliminate or reduce the misfit, thus producing a new (design, requirements) pair.
(iv) The design process may be brought to a halt when the design is found to be adapted to the requirements. However, the design process never formally ends since for fixed requirements one can always improve a design to increase the degree of adaptation (i.e., attempt a “more” satisfactory design); or the requirements may change, in which case a misfit between design and requirements is regenerated.

Thus, the design process may be depicted according to Fig. 7. The dashed lines indicate the influence of one component on another. For instance, requirements influence design; or the nature of the test is influenced by both the design and the requirements. As Fig. 7 shows, design is a continuous process involving a cycle beginning with a (design, requirements) pair (in which the design component may be the “null” design) and ending with a (design, requirements) pair. The cycles are characterized not only by the fact that the (design, requirements) pair changes; the character of the misfit may differ from one cycle to the next, and this results in differences in the nature of the misfit elimination in successive cycles. This process may come to a halt when adaptation has been achieved or it may extend, accompanied by intermittent halts indicating temporary adaptations, over the entire life of the target system.
FIG. 7. Design as an evolutionary process. [Diagram stages: misfit identification, misfit elimination.]
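The cycle of Fig. 7 can be sketched as a loop over (design, requirements) pairs. The Python code below is a hedged illustration: `evolve`, `find_misfit`, and `eliminate_misfit` are names introduced here, designs and requirements are reduced to sets of feature labels, and the labels themselves (loosely echoing Example 2.6) are hypothetical. The point it tries to capture is that both components of the pair may change in a cycle and that the misfit is not known before it is identified.

```python
def evolve(design, requirements, find_misfit, eliminate_misfit, max_cycles=100):
    """Run design cycles until a fit is found or max_cycles is exhausted.

    find_misfit(design, requirements) returns None when there is a fit,
    otherwise a description of the misfit.
    eliminate_misfit(design, requirements, misfit) returns a new
    (design, requirements) pair; either component may have changed.
    """
    history = [(design, requirements)]
    for _ in range(max_cycles):
        misfit = find_misfit(design, requirements)
        if misfit is None:
            break  # adaptation achieved, possibly only temporarily
        design, requirements = eliminate_misfit(design, requirements, misfit)
        history.append((design, requirements))
    return design, requirements, history


def find_misfit(design, requirements):
    """Toy test: a misfit is any required feature the design lacks."""
    missing = sorted(requirements - design)
    return missing[0] if missing else None


def eliminate_misfit(design, requirements, misfit):
    """Toy repair: add the feature; one repair exposes a new requirement."""
    new_requirements = set(requirements)
    if misfit == "clocked":
        # Hypothetical: resolving this misfit reveals a further requirement.
        new_requirements.add("postcondition-timing")
    return design | {misfit}, new_requirements


final_design, final_reqs, history = evolve(
    set(), {"clocked", "functional"}, find_misfit, eliminate_misfit
)
```

Starting from the null design, this run takes three repair cycles, and the requirements set grows from two entries to three along the way. The halt at the end is only a halt of this call; changing the requirements and calling `evolve` again would regenerate a misfit.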
2.5.1 On the Distinction between Iteration and Evolution
A commonly held notion in the design literature is that design is an iterative process where an initial “crude” design is transformed by a successive sequence of cyclic steps into acceptable form (see, e.g., Encarnacao and Schlechtendahl, 1983, Section 3.1, Chapter 3; Dixon, 1986; Mostow, 1985; Rehak, Howard, and Sriram, 1985; Lawson, 1980, Chapter 3, for different versions of this idea). However, in the ordinary sense of the phrase, “to iterate” is “to repeat.” This meaning is carried over more or less intact into computer science, where iteration is depicted by some variants of the schema

    while cond do PROCESS
    repeat PROCESS until cond
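For readers who prefer a concrete rendering, the two schemata can be written as Python functions. This is only a sketch: Python has no repeat...until statement, so the second form is emulated with an explicit exit test, and the parameters `cond`, `process`, and `state` are names chosen here.

```python
def while_form(cond, process, state):
    """while cond do PROCESS: test first, continue while cond holds."""
    while cond(state):
        state = process(state)
    return state


def repeat_form(cond, process, state):
    """repeat PROCESS until cond: run at least once, stop when cond holds."""
    while True:
        state = process(state)
        if cond(state):
            return state
```

In both functions the predicate `cond` is supplied once, before the loop starts, and is the same object in every cycle.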
The important point to note here is that

(i) The same condition “cond” is tested in every cycle of the iterative process, indicating not only that a specific type of misfit is expected, but also that the nature of the misfit is known beforehand.
(ii) Once the terminating condition is satisfied (i.e., “cond” is not true in the while form or is true in the repeat form) the iteration does actually terminate.

However, in the case of the design process consider the following facts:

(a) The nature of the cycle itself, specifically the nature of the misfit identification and the misfit elimination stages, may differ from one cycle to the next.
(b) The nature of the misfit may be quite unknown in each cycle prior to its actual identification.
(c) Both design and requirements may change from one cycle to the next.
(d) The fit between some aspect of a design and some aspect of the requirements (a case of adaptation being achieved) may be disrupted at a later stage of the design process, producing a new misfit.

Clearly, the notion of iteration (at least in its computational sense) is hopelessly inadequate to describe such a process. Thus, we suggest that the assertion “design is an evolutionary process” is technically a more accurate and insightful description than “design is an iterative process.”

2.5.2 Evidence of Evolution in Design
That the design process is evolutionary in nature (in the sense described above) is an empirical proposition for which evidence must be provided.
Furthermore, this evidence must be wide ranging and compelling. Unfortunately, limitations of space confine us to only a few examples in this paper, and we must leave it to others to provide further evidence either corroborating or refuting the proposition. We strongly believe, however, that the following examples are sufficiently compelling as to serve as strong corroboration of the proposition.

One of the most general and influential principles of program design is that of stepwise refinement, first proposed as an explicit doctrine in the early 1970s by Dijkstra (1972) and Wirth (1971). This principle can be stated concisely, though informally, as follows:

1. If the problem is so simple that its solution can be obviously expressed in a few lines of a programming language then the problem is solved.
2. Otherwise, decompose the problem into well-specified subproblems such that it can be shown that if each subproblem is solved correctly, and these are composed together in a specified manner, then the original problem will be solved correctly.
3. For each subproblem return to step 1.

A more formal version of stepwise refinement is based on the idea of developing a proof of correctness and the program together, and allowing the proof of correctness to guide stepwise refinement and program development. This version of stepwise refinement, which we may call the design-while-verify approach (or simply DWV), has been widely studied in the programming domain (Mills, 1975; Dijkstra, 1976; Alagic and Arbib, 1978; Gries, 1981; Hoare, 1987) and has also been applied in the domains of computer architecture (Damm and Dohmen, 1987; Dasgupta, 1984), firmware development (Dasgupta and Wagner, 1984; Damm et al., 1986) and hardware circuits (Gopalakrishnan, Smith and Srivas, 1985; Gopalakrishnan, Srivas and Smith, 1987). We shall demonstrate how DWV conforms to the evolutionary model of design with a trivial problem. The triviality of the example serves two purposes. Firstly, it provides a concise, well-structured problem admitting a concise solution and is thus appropriate as an example in this article. Secondly, it also serves to show how even the most trivial and well defined of programming problems naturally exhibits evolutionary characteristics.
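The control structure of the three informal steps of stepwise refinement can be pictured as a recursive skeleton. The Python sketch below is an analogy only: stepwise refinement is an activity carried out by a designer on program texts, whereas the code runs the same pattern mechanically on data. The function `refine` and its parameters are names introduced here, and sorting is simply a convenient instance in which "simple" means a list of at most one element.

```python
def refine(problem, is_simple, solve_directly, decompose, compose):
    """Skeleton of steps 1 to 3 of stepwise refinement."""
    if is_simple(problem):                       # step 1
        return solve_directly(problem)
    subproblems = decompose(problem)             # step 2
    solutions = [
        refine(p, is_simple, solve_directly, decompose, compose)
        for p in subproblems                     # step 3
    ]
    return compose(solutions)


def merge(sorted_parts):
    """Compose two sorted lists into one sorted list."""
    left, right = sorted_parts
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def sort_by_refinement(values):
    """Instance of the skeleton: sort by splitting, solving, and merging."""
    return refine(
        list(values),
        is_simple=lambda xs: len(xs) <= 1,
        solve_directly=lambda xs: xs,
        decompose=lambda xs: [xs[: len(xs) // 2], xs[len(xs) // 2:]],
        compose=merge,
    )
```

The correctness argument has the shape required by step 2: if each half is sorted correctly and `merge` is correct, the whole list is sorted correctly.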
Example 2.11 The problem is to develop a program (call it MAX) (i) in Pascal, and (ii) which, given two nonnegative integers A and B, sets a variable Z to the larger of the two values. Here (i) is an implementation requirement, $R_i$, while (ii) is a functional requirement, $R_f$. The total requirement is $R = \langle R_i, R_f \rangle$.
Using DWV, we may solve this problem as follows.

(a) In DWV, the program design problem is initially formulated in terms of a pair of assertions expressed in predicate calculus, called the precondition (PRE) and postcondition (POST) respectively. The former is a predicate which is satisfied whenever the program begins execution, while the latter states a predicate which must be true when (and if) the program terminates. The relationship between PRE, POST and the program P is notationally expressed as

    {PRE} P {POST}

which states that if PRE is true when P begins execution then POST will be true when P terminates. For the specific problem at hand, this formula becomes

    (D1) $\{\text{PRE}: A \ge 0 \land B \ge 0\}$
         MAX: “place the maximum of A and B in Z”
         $\{\text{POST}: Z \ge A \land Z \ge B \land (Z = A \lor Z = B)\}$

(D1) thus constitutes the first version of the design. It is, however, quite conjectural since we have not yet shown that (D1) satisfies the requirements.

(b) In applying DWV, one attempts to prove the correctness of (D1). Now suppose MAX, i.e., the string “place the maximum of A and B in Z”, is a very high-level machine-executable statement in some programming language. In that case, a proof of correctness of (D1) can be attempted and, if successful, the design (D1) would certainly satisfy the requirement $R_f$, but not the implementation requirement $R_i$, since the program in this case is not expressed in Pascal. Thus, the design (D1) fails to meet the given requirements $R = \langle R_i, R_f \rangle$. The source of the misfit is that the program MAX is not stated in Pascal.

(c) Let us assume for convenience that the requirement R is unchangeable. Thus, (D1) must be modified so as to continue to satisfy $R_f$ and, in addition, meet $R_i$. In DWV one thus attempts to modify the program part of the design with components (i) that come closer to the goal of expressing the program in Pascal such that (ii) the resulting design remains functionally correct. This may be attempted by replacing MAX with the assignment statement

    MAX′: Z := max(A, B)

where max is a function that returns the larger of its two arguments. The resulting design is

    (D2) $\{\text{PRE}: A \ge 0 \land B \ge 0\}$
         MAX′: Z := max(A, B)
         $\{\text{POST}: Z \ge A \land Z \ge B \land (Z = A \lor Z = B)\}$
(d) Once more it is found that there is a misfit between the current design (D2) and the implementation requirement $R_i$; the function max is not available in the particular implementation of Pascal assumed here. Thus further modification of the design is required.

(e) In the next transformation, MAX′ is replaced by

    MAX″: if A >= B then Z := A else Z := B

thus producing the design

    (D3) $\{\text{PRE}: A \ge 0 \land B \ge 0\}$
         MAX″: if A >= B then Z := A else Z := B
         $\{\text{POST}: Z \ge A \land Z \ge B \land (Z = A \lor Z = B)\}$
In testing (D3) against the requirements, it is seen that $R_i$ is indeed satisfied; furthermore, using the proof rule for the if...then...else statement and the axiom of assignment as these are defined for Pascal (Hoare and Wirth, 1973), (D3) can be proved correct. Thus, the design also satisfies $R_f$, and it may be terminated at this stage. The development of this (quite trivial) program can be schematically described by Fig. 8.
FIG. 8. Evolution of the MAX program development. [Sequence: design D1, misfit with requirements R; design D2, misfit with R; design D3, fit between D3 and R.]
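The final design (D3) can also be exercised mechanically. The Python sketch below transcribes PRE, POST, and MAX″ and checks the triple over a small range of inputs. It is a test, not a proof: it complements but does not replace the proof by the if...then...else rule and the axiom of assignment. The function names and the range limit are choices made here.

```python
def pre(a, b):
    """PRE: A >= 0 and B >= 0."""
    return a >= 0 and b >= 0


def post(a, b, z):
    """POST: Z >= A and Z >= B and (Z = A or Z = B)."""
    return z >= a and z >= b and (z == a or z == b)


def max_d3(a, b):
    """MAX'': if A >= B then Z := A else Z := B."""
    if a >= b:
        z = a
    else:
        z = b
    return z


def counterexamples(program, limit=20):
    """Inputs in [0, limit) that satisfy PRE but whose result violates POST."""
    return [
        (a, b)
        for a in range(limit)
        for b in range(limit)
        if pre(a, b) and not post(a, b, program(a, b))
    ]
```

Calling `counterexamples(max_d3)` returns an empty list, whereas a deliberately wrong program such as `lambda a, b: a` is exposed by inputs like (0, 1). In the vocabulary of this section, a non-empty list is an identified misfit with respect to the functional requirement only; the implementation requirement (that the program be in Pascal) cannot be tested this way.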
In Section 2.2.2 we remarked that an often encountered feature of the design process is that requirements are poorly understood at the early stages of the design and thus the development, expansion or identification of the requirements become integral parts of the design process. This characteristic is explicitly reflected in the evolutionary view depicted in Fig. 7. The following example, based on Aguero’s (1987) dissertation, shows how evolution may work in the very first stages of a design process during which “design” almost entirely involves the development and refinement of the requirements set.
Example 2.12 The objective is the design of an exo-architecture (see footnote 8) such that

($R_{0.1}$) The instruction set (“inst-set”) is efficient.
($R_{0.2}$) The instruction set supports frequent, time-consuming operations from a significant sample of high-level language programs (“HLL-sample”).
An exo-architecture, it should be recalled from Section 2.4, has several components, notably the instruction set, data types, addressing modes, storage organization, and instruction formats. Let us denote the exo-architecture to be designed by X. The initial design $D_0$ is a description of X that says nothing about the components of X except that X as a whole satisfies a particular set of predicates. These predicates are defined as follows.

(a) “Eff(inst-set)” is a predicate that is true if the instruction set is efficient.
(b) “Oper-supported(inst-set, HLL-sample)” is a set of operations such that op is a member of this set if and only if (i) op occurs in at least one member of the HLL-sample; and (ii) op can be synthesized from (or generated in terms of) the members of inst-set.
(c) “Sig(HLL-sample)” is a predicate that is true if HLL-sample is significant, that is, is taken from a wide variety of applications.
(d) “Freq(op)” is a predicate that is true if op is an operation that occurs frequently in HLL-sample.
(e) “Time-cons(op)” is a predicate that is true if op is time consuming.

The initial design $D_0$ of X does not describe the components of X. Rather, it describes X solely in terms of the following property being satisfied:

    ($P_0$) $\text{Eff}(\text{inst-set}) \land \text{Sig}(\text{HLL-sample}) \land [\forall\, op \in \text{Oper-supported}(\text{inst-set}, \text{HLL-sample}) : \text{Freq}(op) \land \text{Time-cons}(op)]$
That is, D0 is the statement

X such that P0 is true.

Clearly, if P0 is true then D0 would automatically satisfy R0 and the design would be complete. Unfortunately, in attempting to test whether D0 satisfies R0, we find that there is no evidence that P0 is true; hence D0 is a mere conjecture at this stage of the design process. The source of the misfit between D0 and R0 is, simply, that we have no way of determining whether
Q(i) Eff(inst-set) is true or not, since we neither have inst-set nor know what it means for inst-set to be “efficient.”
Q(ii) Sig(HLL-sample) is true or not, since we do not know how to interpret the phrase “a wide variety of applications.”
Q(iii) Oper-supported(inst-set, HLL-sample) is true or not, since we do not have inst-set.
Q(iv) Freq(op) is true or not, since op is not identifiable and we do not know what it means for an operation to be “frequently occurring” in HLL-sample.
Q(v) Time-cons(op) is true or not, since op is not identifiable and we do not know what it means for an operation to be “time consuming.”
Thus, to eliminate the misfit between D0 and R0, these questions must all be resolved. Consider the resolution of the first of the above questions, Q(i). We see that this can be factored into two components:
Q(i)(a) Defining the predicate “Eff” more precisely.
Q(i)(b) Constructing an instruction set “inst-set” such that we can determine whether “Eff(inst-set)” is true or not.
Note that the predicate “Eff” is what we desire of the instruction set; hence its precise definition ((a) above) is nothing but the need to modify the original requirements R0! For this purpose let
(i) “HLL-benchmarks” denote a set of predetermined benchmark programs.
(ii) “Code(p, inst-set)” denote the object code generated when a program p written in a high-level language is compiled into instructions from inst-set.
(iii) LOWSIZE and LOWTIME denote some specific integers.
We also define
(iv) Size(Code(p, inst-set)) as the size (in bytes) of Code(p, inst-set).
(v) Exec-time(Code(p, inst-set)) as the (simulated) execution time of Code(p, inst-set).
Using these definitions, the original requirement R0.1 is replaced by the new requirement

$$\forall\, p \in \text{HLL-benchmarks} : \text{Size}(\text{Code}(p, \text{inst-set})) \le \text{LOWSIZE} \land \text{Exec-time}(\text{Code}(p, \text{inst-set})) \le \text{LOWTIME} \tag{R1.1}$$

We have thus resolved Q(i)(a) by modifying the original requirement R0.1. Similarly, we can resolve Q(ii), Q(iii), Q(iv) and Q(v) in part by defining the respective predicates more precisely. This will enable us to replace R0.2 with a new set of more precise requirements, R1.2, thereby completely replacing R0 with R1. This will complete one cycle of design evolution in which we have not made much progress in the “design” itself, but have evolved the requirements (Fig. 9).
As a final example of the evidence of design evolution, we refer to the extensive and systematic macrostudies of long-term program evolution conducted over a decade by Lehman and Belady (Lehman, 1980a, 1980b; Lehman and Belady, 1985; Belady and Lehman, 1976, 1979).
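Returning briefly to Example 2.12, the point of replacing R0.1 by R1.1 is that the new requirement can actually be tested. The following is a minimal sketch of such a test, not part of Aguero's work: the class name, the function names, the benchmark names and all numbers are hypothetical, and the size and time figures stand in for measurements of Code(p, inst-set) that a real study would obtain from a compiler and a simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompiledProgram:
    """Hypothetical stand-in for Code(p, inst-set) with its measured properties."""
    name: str
    size_bytes: int  # Size(Code(p, inst-set))
    exec_time: int   # Exec-time(Code(p, inst-set)), in simulated cycles


def satisfies_r11(benchmarks, low_size, low_time):
    """R1.1: every benchmark meets both the size bound and the time bound."""
    return all(
        c.size_bytes <= low_size and c.exec_time <= low_time
        for c in benchmarks
    )


def misfits(benchmarks, low_size, low_time):
    """Names of the benchmarks that violate R1.1 (the detected misfit)."""
    return [
        c.name for c in benchmarks
        if c.size_bytes > low_size or c.exec_time > low_time
    ]


# Invented measurements for two benchmark programs (assumptions, not data).
benchmarks = [
    CompiledProgram("sort", size_bytes=1800, exec_time=42000),
    CompiledProgram("matmul", size_bytes=2600, exec_time=39000),
]
LOWSIZE, LOWTIME = 2048, 50000

ok = satisfies_r11(benchmarks, LOWSIZE, LOWTIME)
failing = misfits(benchmarks, LOWSIZE, LOWTIME)
```

With these invented figures the second benchmark exceeds LOWSIZE, so the check fails and the list of misfits names that program. In the evolutionary view, this is the information that would drive the next change to either the instruction set or the requirement.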
[Fig. 9. Diagram residue; recoverable labels: D0 (Initial); Misfit Detected; Replace R0.1 by R1.1; Replace R0.2 by R1.2; R1 (New).]

Example 2.13 As the empirical basis of their work, Lehman and Belady studied the changes in size and complexity of several large programs across a succession of versions or releases. Specific examples of such programs (Belady and Lehman, 1979) were
(i) The IBM OS/360 operating system, consisting (at its final release) of approximately 3.5 million statements and over 6000 modules, and involving 21 releases over 6 years.
(ii) The IBM DOS/360 operating system, consisting of approximately 900,000 statements and over 2000 modules, and involving 27 releases over 6 years.
(iii) A banking system consisting of about 45,000 statements and involving 10 releases over a period of 3 years.
In studying the pattern of changes across successive program releases, several global measures of size and complexity were used. These included
(a) the actual number of source statements in the program
(b) the number of modules comprising the program
(c) the average number of instructions per module
(d) the number of modules that required change between successive releases.
The main results of these quantitative studies are summarized by Lehman and Belady in their three qualitative laws of program evolution dynamics (Lehman, 1974; Belady and Lehman, 1976)¹¹:
I. Law of Continuing Change. A system that is used undergoes continuous change until it becomes more economical to replace it by a new or restructured system.
II. Law of Increasing Entropy (Unstructuredness). The entropy of a system increases with time unless specific work is executed to maintain or reduce it.
III. Law of Statistically Smooth Growth. Growth trend measures of global system attributes may appear stochastic locally in time or space but are self-regulating and statistically smooth.

¹¹ Later (Lehman, 1980b) two more laws were added. These are not, however, significant to our discussion.

2.5.3 Ontogenic and Phylogenic Evolutions

Note that both Examples 2.11 and 2.12 refer to evolutionary processes that are highly localized in time. Evolution takes place from the time when a problem is given to the designer to the time when the design is passed on to the manufacturer or implementer. During this period the design, as it were, unfolds from the initial form to the desired form. Borrowing a term from biology, we have previously (Dasgupta, 1988a, Chapter 3) referred to this as ontogenic design evolution¹².
In contrast, Example 2.13 describes a form of evolution occurring over much longer time spans and representing the course of change of an implemented design. This form of evolution usually reflects changes in one or more of the various parameters that had dictated the original design: for example, a change in technology, the emergence of new modes of manufacture, changes in the operating environment of the system in question, or the discovery on the part of the user of different purposes for the system than those previously assumed. Such parameter changes produce new problems, that is, new requirements. Again borrowing a biological term, we refer to this type of evolution as phylogenic design evolution (Dasgupta, 1988a, Chapter 3)¹³.
Regardless of whether ontogenic or phylogenic evolution takes place, the means of evolution is identical: the use of critical tests, the identification of misfit, and the elimination of misfit (Fig. 7).
¹² Ontogeny: “the life history of an individual, both embryonic and postnatal” (Gould, 1977, p. 483).
¹³ Phylogeny: “the evolutionary history of a lineage conventionally ... depicted as a sequence of successive adult stages” (Gould, 1977, p. 484).

2.6 Summary

To summarize Section 2, we have identified five fundamental characteristics of design:
(a) The act of design originates in the desire to change some particular state of affairs. By implication the design act and its product are value-dependent.
(b) Any design process must begin with some requirements. However, the requirements may initially be neither precise nor complete. Thus the development or elaboration of requirements may be an integral component of the design process.
(c) The output of a design act is an explicit representation of the target system in some symbolic language. This representation not only provides the basis for implementing the system, it is also the medium of analysis and criticism of, and experimentation with, the design.
(d) Design problems are usually complex in the sense that their solutions require many interdependent and intertwined decisions to be made. Consequently, design processes are very often satisficing procedures.
(e) The design process is an evolutionary process in which the design and/or the requirements are continually modified so as to be mutually adapted. When adaptation is achieved, the design process terminates. It may, however, resume if for some reason a state of misfit between the design and the requirements resurfaces.
At this stage the following two questions may be posed:
(i) To what extent or in what manner do actual design methods take into account these fundamental characteristics?
(ii) Does the value-dependent, satisficing and contingent nature of design decision making impose a fundamental barrier on attempts to construct formal or scientific design methods?
We will attempt to address these questions in the rest of this article.

3. Design Paradigms

3.1 Some Terminological Clarifications
A design method is an explicitly prescribed procedure or set of rules which can be followed by the designer in order to produce a design. There are several reasons why design methods are proposed or invented:
(a) Such a method, when followed by an organization, serves as a standard procedure that can be used by all designers within the organization, thereby easing the problems of understanding, checking, criticizing and modifying the design within the organization. Furthermore, the use of an established and familiar design method helps to economize on the cost and time of design.
(b) A design method serves as a filter in the following sense: it represents a selection, from the characteristics and complexities attending design, of a small, specific, and mentally manageable set of concepts that can be used to impose an order on and to guide the process of actual design. All other aspects of the design situation that are not stated explicitly in the design method are effectively suppressed. In other words, a design method is a management tool to cope with the complexities of design.
(c) A design method may also serve to embody a particular model or theory of the design process. That is, the method serves as a concrete and practical embodiment of that particular theory. If the method is successful as a practical design tool, the theory may then be said to have been empirically corroborated.
It is necessary at this stage to make a clear distinction between “method” and “methodology.” They do not mean the same thing. Methodology refers to the study and description of the methods or procedures used in some activity, including the investigation of the broad aims and principles attending such methods. Thus, one talks of the methodology of science; correspondingly, design methodology is the discipline or study of design methods (which happens to be the subject of this article). Regrettably, many writers use “methodology” and “method” synonymously¹⁴.
Our concern in Section 3 is not to describe specific design methods since these are innumerable. Instead, we shall focus on a small set of significant design paradigms¹⁵. A design paradigm (i) may be viewed as a particular philosophical approach to actual design that (ii) serves as a useful abstraction of, or a schema for, a family of similar design methods. In discussing these paradigms we shall also address the first of the questions posed at the end of Section 2, viz., to what extent and in what manner do these paradigms reflect or embody the characteristics of the design process enunciated in Section 2.

3.2 The Analysis-Synthesis-Evaluation Paradigm
The most widely accepted design paradigm takes the general form shown in Fig. 10. We shall refer to this as the ASE paradigm. Given a set of requirements, the design process involves first a stage of analysis (of the requirements), then one or more stages of synthesis, followed by a stage of evaluation¹⁶. As Fig. 10 suggests, on the basis of some decision made within a stage or a problem encountered in that stage, the designer may return to an earlier stage for appropriate modification or clarification. Thus, the ASE paradigm does recognize the evolutionary nature of the design process (see Section 2.5).

[FIG. 10. The ASE paradigm. Recoverable box labels: Identification of Requirements; Analysis of (remainder of label lost); Detailed Synthesis; Evaluation and Testing; To Implementation.]

¹⁴ Or as the Fontana Dictionary of Modern Thought (Bullock and Stallybrass, 1977) wryly notes, “some scientists use the word [methodology] as a more impressive-sounding synonym for method.”
¹⁵ In this article, we shall use the word “paradigm” in two different ways. When we talk of a design paradigm (as in this section) we use the word in its dictionary sense to mean a pattern or archetype exhibiting the characteristics stated in (i) and (ii) above. Later, in Section 4, we shall introduce a technically more distinct sense of the word due to Kuhn (1970) for which we shall reserve the special term Kuhnian paradigm (or K-paradigm for short). The word “paradigm” by itself will be used to mean the design or the Kuhnian variety when the context or reference is quite clear.
¹⁶ The most explicit statement of this paradigm is given in Jones (1963). For other versions see Middendorf (1986) and Rehak, Howard and Sriram (1985).

In very broad terms, the ASE paradigm is not objectionable. It is certainly true that the activities of analysis, synthesis and evaluation are performed during design. If, however, one were to accept this paradigm literally, then several serious problems arise:
(P1) An unstated but unmistakable implication of the paradigm is that requirements are (i) either well-defined at the beginning of the design or (ii) by virtue of the feedback edges, can be obtained by interrupting a later stage and returning to the requirements identification or analysis stages. But as we have already noted in Section 2.2, in many design problems the initial requirements may be vague, sparse or both, and consequently the generation of new requirements is an integral and inseparable part of the design process, not a distinct stage as implied in the ASE paradigm¹⁷.

¹⁷ In their well known paper, Rittel and Webber (1983) summarize this situation with the statement “problem understanding and problem resolution are concomitant to each other”.
(P2) The ASE paradigm strongly suggests, or is based upon, the notion that synthesis emerges from the requirements and only from the requirements. From a logical perspective this is a faithful rendering of the inductive model of the scientific method, which states that

“First make observations and gather the data or facts; then construct a theory or law as a generalization of these facts.”

Translated into design methodological terms, we obtain

“First accumulate all the requirements that the target system is to satisfy; then synthesize the system to meet the requirements”¹⁸.
We have already noted in Section 2.2 and in (P1) above the difficulty of accumulating all the requirements (even for a subsystem) at the start of the design process. We now have the additional problem that the inductive principle implies that the designer’s mind is in an empty state (a tabula rasa) on which requirements impinge like sense impressions and, as a consequence, a design somehow emerges. How this feat of generalization comes about, that is, what the logic of the requirements-to-synthesis step is, has never been explained.
(P3) The ASE paradigm ignores or leaves unstated the central role that the designer’s weltanschauung or “world view” plays in the design process. By this, we refer to the assumptions, facts, values, theories, heuristics and other rules that the designer possesses and which he or she brings to bear when posed a design problem. When faced with a set of requirements (that is, a design problem) the designer may be influenced by his world view in many ways. For instance, he may perceive an analogy between this particular problem and one for which some design solution is already known to exist, thus suggesting the initial design as an initial tentative solution. Or, based on a few key components of the requirements, the designer may be predisposed towards a particular design style for the system under consideration, which then forms the basis of the overall design (Simon, 1975; Dasgupta, 1984). Darke (1979), in the context of building design, called this a “primary generator.” In any case, the designer interprets the requirements according to a particular world view¹⁹.

¹⁸ A recent statement of the “engineering method” almost exactly along these lines is given in Middendorf (1986, p. 3).
¹⁹ It may be noted that this conclusion is similar to and consistent with an aphorism well known to philosophers of science that “all observations are theory laden” (Hanson, 1972; Popper, 1968). We thus see a first connection between the methodology of design and the methodology of science. More will be said about this connection in Section 4.
(P4) Finally, the ASE paradigm says virtually nothing about redesign, that is, modifying an existing design due to the appearance of a new misfit between the design and requirements. The cause of this misfit may be desired changes in functional or performance requirements, the need to exploit new technology, or the pressure to reduce cost (IEEE, 1987). The principal distinction between what are conventionally termed “design” and “redesign” is that the latter begins with some existing design of the system that has already been implemented and is operational (either as a prototype or “in the field”). Thus, the redesign process is constrained by (some part of) a “current” design (meeting a “current” set of requirements) as well as a “new” set of requirements that must be met. Note that the ASE paradigm, taken literally, does not take account of an existing design as an input to the design process.

[Fig. 11. Diagram of the waterfall model referred to in Example 3.14; caption not recoverable. Recoverable box labels: Analysis & (remainder of label lost); Conceptual Design; Detailed Design; Implementation & Unit Test; Integration & System Test; Maintenance.]

[FIG. 12. Long term (phylogenic) evolution of software. Recoverable content: a sequence of boxes over time, each pairing Requirements with Software.]

Example 3.14 In software engineering, the standard (though not the only) model of the software development life cycle is the “waterfall” model (Boehm, 1981; Wiener and Sincovec, 1984; Sommerville, 1985; Ramamoorthy et al., 1987), which in its essence is depicted in Fig. 11. It will be noted that the nature of software development as conceived here conforms quite well with the ASE paradigm and thus suffers most of the shortcomings of the latter. One exception is that the waterfall model recognizes redesign. Note, however, that redesign appears here as a distinct activity called maintenance, as if this were apart from, and possessed characteristics distinct from, the other activities of the life cycle. As we have already noted, design (including the development of software systems) is a continuous evolutionary activity in which there is no real distinction to be made between what are termed in Fig. 11 “design” and “maintenance.” Rather, keeping in mind Lehman and Belady’s studies and the evolutionary discussions above, software development is fundamentally a
process of phylogenic evolution “in the large,” the individual stages of which follow the process of ontogenic evolution. Figure 12 schematizes this situation. Over time, requirements change and the software system evolves (through design) to adapt to the new requirements. Each box thus designates a fit between requirements and the software that is brought about by a process (internal to each box) of short-term or ontogenic evolution.

3.3 The Artificial Intelligence Paradigm

Within the past decade or so, the ASE paradigm has begun to be influenced and modified by ideas emanating from Artificial Intelligence (AI) (Simon, 1975, 1981; Mostow, 1985). Perhaps the most concrete embodiment of the AI perspective is the recent emergence of the so-called expert systems for design (Latombe, 1978; Gero, 1985; Thomas, 1985; Hong, 1986). However, our concern in this section is not such automatic design systems but the design paradigm that the AI perspective entails. We shall call this, then, the AI design paradigm.
From the AI perspective, problems are solved by creating a symbolic representation of the problem, called the problem space, such that it can describe the initial problem state, the goal state, the constraints, and all other states that may be reached or considered in attempting to reach the goal state from the initial state. Transitions from one state to another are effected by applying one of a finite set of operators (that are also contained in the definition of the problem space). The result of applying a sequence of operators is, in effect, to conduct a search for a solution through the problem space (Langley et al., 1987).
The AI paradigm begins by recognizing that design problems are often, as pointed out in Section 2.2, ill-structured problems. By this it is meant (among other things) that there may be no definite, objective criterion for determining whether a proposed solution meets all the requirements (see Example 2.5).
Another characteristic of ill-structured problems is that the problem space itself may not be completely definable because of the virtually unbounded range of state-space variables and operators that have to be considered (Simon, 1973).
Example 3.15 Consider the problem of designing a uniprocessor computer. A complete definition of the problem space would have to consider
(a) All conceivable choices of instruction formats, data types and instruction types. (For example, fixed and variable length instructions, fixed and variable format instructions, the use of expanding opcodes, the choice of frequency-based opcode lengths, etc.)
(b) All conceivable types of components and structures for designing the computer’s internal architecture or endo-architecture (Dasgupta, 1984). (For example, single bus, multiple bus, and distributed interconnection structures, alternative forms of pipelining, various cache memory organizations, different styles for microprogrammed and hardwired control units, etc.)
(c) All possible technological choices. (For example, TTL, ECL, NMOS, CMOS, customized chips, semi-customized techniques, etc.)
Clearly, any attempt to define a comprehensive or complete problem space in this case is doomed to failure.
The AI paradigm for solving such ill-structured problems involves a schema of the type shown in Fig. 13. The designer interacts both with the current design problem space (consisting of goals, constraints, the current state of the design, etc.), and a knowledge base (KB). The latter may be a combination of the designer’s long-term memory, textbooks, a computer data base, etc.
[FIG. 13. Ill-structured (design) problem solving schema (after Simon (1973)). Recoverable labels: Designer; Current Design Space.]
Given the initial goals and constraints (which may be sparse, fuzzy or “high-level”) in the current design space, the designer invokes (from the KB) an initial design. This may be as approximate as a design style (Simon, 1975; Dasgupta, 1984, Chapter 12) or a crude description of the highest-level set of components that the design will have (e.g., in the case of computer design, the initial components may simply be “processor,” “memory” and “processor-memory interface”). Design then proceeds by successively transforming the “current” design state using, at every stage, some feature or component of the current state to invoke an appropriate action (selected from the KB), thereby refining a part of the design.
At the very heart of the AI paradigm is the means used to converge effectively to an acceptable design. As mentioned previously, the design state transformations are effected by operators. In most design domains these operators take the form of heuristics, which are invoked so as to reduce the amount of search through the design problem space. Heuristics may be very specific to the particular design or task domain, or they may be general, i.e., independent of a particular design problem and therefore applicable across a range of domains. Generally speaking, during the initial stage of the design process one may expect to use general heuristics; as the design becomes more detailed and the nature of the decisions more specific, domain-specific heuristics may be expected to be invoked. Alternatively, general and domain-specific heuristics may be used in a symbiotic fashion. The so-called (and unhappily named) expert systems are instances of automatic problem solving (including design problem solving) systems that rely very heavily on domain-specific heuristics (“expert knowledge”).
Example 3.16 A well known instance of a general heuristic is means-ends analysis (Newell and Simon, 1972, p. 416). Given a current problem space and a desired (or goal) state, the difference between the two is determined. A specific operator is then invoked to help remove or reduce this difference. Of course, in a particular design or problem solving domain, some of the operators may themselves be domain-specific.
As a specific example, consider a microcode synthesis system of the type described by Mueller and Varghese (1985) which, given a microprogram represented in an abstract form, automatically produces a functionally equivalent executable form (see also Example 2.10 and Fig. 6)²⁰. At any stage of the synthesis system’s operation, the current state will consist of the executable microcode that has already been synthesized (or “designed”) by the system. It is thus a partial design of the executable microcode. Suppose this current state is such that the execution of the microcode “designed” thus far would leave the micromachine registers with values satisfying (at least) the condition

$$R1 = a \land R2 = b \tag{CS0}$$

Here, R1 and R2 are two micromachine registers and a, b are symbolic constants. For the immediate purpose of the synthesis system, (CS0) can be viewed as the current state. Suppose further that the new goal of the synthesis system is to produce more microcode (i.e., further develop the microprogram “design”) which, starting with (CS0) being satisfied, when executed will leave the machine registers satisfying (at least) the condition

$$R1 = a + b \tag{GS0}$$

For the immediate purpose of the synthesis system, (GS0) can thus be viewed as the goal state. The synthesis system may use means-ends analysis as a general heuristic to guide the search through its knowledge base. According to (GS0), the desired sum “a + b” must be assigned to register R1. Given that in the particular micromachine under consideration the arithmetic-logic unit performs all additions, taking its inputs from two special registers AIL and AIR and leaving its result in a special register AOUT, the system may attempt to reduce the difference between (CS0) and (GS0) by invoking the following rule:

(R) If the sum of two registers x and y is to be placed in a register Reg, then generate the micro-operation sequence

$$\text{AOUT} \Leftarrow \text{AIL} + \text{AIR};\quad \text{Reg} \Leftarrow \text{AOUT}$$

and produce as the new goal

$$\text{AIL} = x \land \text{AIR} = y$$

Applying this operator (with the arguments R1, a, and b) the system produces the micro-operation sequence

$$\text{AOUT} \Leftarrow \text{AIL} + \text{AIR};\quad R1 \Leftarrow \text{AOUT} \tag{MS0}$$

which, when executed with AIL = a, AIR = b, will produce goal state (GS0).

²⁰ Such a microcode generation system is thus an example of a computer-aided or automatic design system.
The resulting new goal state produced is

$$\text{AIL} = a \land \text{AIR} = b \tag{GS1}$$
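The goal-transforming step just described can be sketched in a few lines. This is an illustrative sketch only, not the Mueller-Varghese system: the names SumGoal, apply_rule_r and difference are hypothetical, states are modelled as plain dictionaries from register names to symbolic values, and the "difference" of means-ends analysis is taken, as an assumption, to be the set of goal conditions not yet true in the current state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SumGoal:
    """A goal of the form  target = left + right  (e.g. GS0: R1 = a + b)."""
    target: str
    left: str
    right: str


def apply_rule_r(goal):
    """Rule (R): emit the ALU micro-operations and produce the new goal.

    Returns the micro-operation sequence and the new goal, the latter as a
    mapping from ALU input registers to the values they must hold.
    """
    micro_ops = ["AOUT <- AIL + AIR", f"{goal.target} <- AOUT"]
    new_goal = {"AIL": goal.left, "AIR": goal.right}
    return micro_ops, new_goal


def difference(state, goal):
    """Goal conditions that the current state does not yet satisfy."""
    return {reg: val for reg, val in goal.items() if state.get(reg) != val}


cs0 = {"R1": "a", "R2": "b"}   # current state: R1 = a and R2 = b
gs0 = SumGoal("R1", "a", "b")  # goal state: R1 = a + b

ms0, gs1 = apply_rule_r(gs0)   # micro-operation sequence and new goal
remaining = difference(cs0, gs1)
```

Here the rule does not change the current state at all. It rewrites the goal into one (AIL = a and AIR = b) that lies closer to what simple register transfers from R1 and R2 can achieve, and the remaining difference is what the next operator would have to remove.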
The synthesis system would now attempt to reduce or eliminate the difference between (CS0) and (GS1). In this example, then, the general heuristic used is means-ends analysis. The operator selected to reduce the difference between current and goal states (in this case, by transforming the goal state) is a rule (R) which itself is a heuristic that is quite specific to the task domain, i.e., the domain of microcode generation for a particular micromachine²¹.

²¹ We are not, by the way, implying that the Mueller-Varghese synthesis system actually uses the specific operator (R); however, the general nature of their rules is quite similar (Mueller and Varghese, 1985).

The AI paradigm provides several insights on the design process. The most important are the following:
(a) Design involves (or may involve) a search through some problem space and, as in most search activities, a particular state of design (i.e., a particular state reached within the problem space) may be merely tentative and liable to be rejected at some later point in the search.
(b) The amount of search through a potentially unbounded problem space is reduced to practical levels by the use of general and/or domain-specific heuristics.
(c) The designer’s world view (in the form of a knowledge base) acts as an influence on the design process.
(d) The AI paradigm provides a theory of how the requirements-to-synthesis step (see Section 3.2 and Fig. 10) can be brought about. According to the AI paradigm, the logic of design is a special case of the logic of human problem solving (Newell and Simon, 1972), and involves the application of heuristics. These heuristics not only help to reduce the amount of search (see (b) above); they are the means to transform the initial problem state into the desired problem state.

3.4 The Algorithmic Approach

While the most interesting design problems are largely ill-structured, some of their subproblems or components may, in fact, turn out to be well-structured. This means, essentially, that the requirements and constraints are well-defined, the problem space is bounded and there are definite objective criteria for determining whether a design meets the requirements. Indeed, such problems may be so well-structured that one may conceivably construct algorithms for their solutions. Thus, for a relatively privileged class of design problems it makes sense to talk of an algorithmic design paradigm.
Example 3.17 While it is incontrovertible that the design of an entire computer poses an ill-structured problem, certain of its subproblems are, at least in principle, candidates for algorithmic solutions. A specific instance is the design of an optimal minimally encoded micro-instruction organization for a computer’s microprogram memory (Dasgupta, 1979). A precise specification of this problem can be formulated as follows:
(a) Let M be the set of micro-operations for the computer that is being designed.
(b) Let Ci denote a subset of M such that any pair of micro-operations in Ci cannot be executed in parallel. Call Ci a compatible set.
(c) Let C = {C1, C2, ..., Ck} be a set of such compatible sets such that each micro-operation in M appears in exactly one of the compatible sets. In general, there may be many such sets of compatible sets.
(d) Let |Ci| denote the number of micro-operations in a compatible set Ci. Each Ci can thus be encoded in a single field Fi of the micro-instruction using

$$B_i = \lceil \log_2(|C_i| + 1) \rceil$$

bits. (This takes into account the necessity of being able to encode uniquely each of the micro-operations in Ci and also the encoding of the “no-operation” condition.) The total length of the micro-instruction would be

$$B = \sum_{i=1}^{k} B_i$$

bits. The problem, then, is to determine a set C of compatible sets such that B is a minimum.
Clearly, by definition, the algorithmic design paradigm is also a paradigm for automated design²². Thus, for example, one can construct a program which when executed would “solve” the above design problem. Unfortunately, as previously noted in Section 2.4.1, most of the interesting well-structured design problems require algorithms that are known to consume exponential computational time. The very high cost of solving these problems algorithmically has resulted in more heuristic search-based approaches such as that represented by the AI paradigm.
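To illustrate both the algorithmic formulation of Example 3.17 and why its cost grows so quickly, the following sketch solves the encoding problem by exhaustive search over all partitions of M. It is a naive illustration under stated assumptions, not the method of Dasgupta (1979): the function names are hypothetical, a field holding |Ci| micro-operations plus a no-operation code is assumed to need ceil(log2(|Ci| + 1)) bits, and the input `parallel_pairs` (pairs of micro-operations that can execute in parallel and so may not share a field) is an assumed representation.

```python
from itertools import combinations
from math import ceil, log2


def partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def is_compatible(block, parallel_pairs):
    """True if no two micro-operations in the block can run in parallel."""
    return all(
        frozenset(pair) not in parallel_pairs
        for pair in combinations(block, 2)
    )


def width(partition):
    """Total micro-instruction length B: sum of ceil(log2(|Ci| + 1))."""
    return sum(ceil(log2(len(block) + 1)) for block in partition)


def minimal_encoding(micro_ops, parallel_pairs):
    """Exhaustively find a set of compatible sets that minimizes B."""
    feasible = (
        p for p in partitions(list(micro_ops))
        if all(is_compatible(block, parallel_pairs) for block in p)
    )
    return min(feasible, key=width)


# Hypothetical instance: four micro-operations, of which m1 and m2 can
# execute in parallel and therefore must be placed in different fields.
ops = ["m1", "m2", "m3", "m4"]
parallel = {frozenset({"m1", "m2"})}
best = minimal_encoding(ops, parallel)
best_width = width(best)
```

The number of partitions examined is the Bell number of |M|, which grows faster than exponentially, so a search of this kind is usable only for very small M. That is the practical reason the text gives for falling back on heuristic search.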
²² Note, however, that it is by no means the only instance of such a paradigm. The AI paradigm is also quite specifically developed for automation.
3.5 The Formal Design Paradigm

Two decades ago, seminal papers by Floyd (1967) and Hoare (1969) marked the beginning of a formalist school of programming (Dijkstra, 1976; deBakker, 1980; Gries, 1981). The aims of this school are most succinctly captured by the following propositions recently enunciated by Hoare (1986):

(a) Computers are mathematical machines. That is, their behavior can be defined with mathematical precision, and every detail can be deduced from this definition by the laws of logic.
(b) Programs are mathematical expressions. They describe precisely and in detail the intended (and unintended) behavior of the computer on which they are executed.
(c) A programming language is a mathematical theory. It includes notation, concepts, axioms, and theorems which assist the programmer in both developing a program and proving that the program meets its specification.
(d) Programming is a mathematical activity. Its practice requires careful and systematic application of traditional methods of mathematical understanding and proof techniques.

The reader has, in fact, already been introduced to the formal approach by way of Example 2.11 (Section 2.5.2). According to the formalist, a programmer’s task begins with a pair of assertions stated in a formal language such as the first-order predicate calculus. One of these, the precondition (“PRE”), is a predicate that will always be true when the program P begins execution. This corresponds, then, to the concept of the initial state in the AI paradigm. The other assertion (corresponding to the goal state in the AI paradigm), called the postcondition (“POST”), is a predicate that will always be true when P completes execution. The task of the programmer is to design and develop a program P in a particular programming language L such that the formula

(HF) {PRE} P {POST}

is true. This formula, which may be termed the Hoare formula, states that if PRE is true to begin with, then the execution of P (providing P terminates) results in POST being true. Furthermore (and this is the key to the formalist’s approach), by proposition (b) above, P is a mathematical expression (as are PRE and POST, by definition), and by proposition (c), P is expressed in a language L satisfying mathematical laws. Thus the truth of (HF) can be rigorously and logically proved using the laws of L.
    {PRE0: X > 0 ∧ Y > 0}
    P0: begin
          Z := 0; U := X;
          repeat
            Z := Z + Y; U := U − 1
          until U = 0
        end
    {POST0: Z = X × Y}

FIG. 14. A Hoare formula for program P0.
The idea of programming as a mathematical activity is, for very obvious reasons, strongly appealing; indeed, to such an extent that the influence of the formalist school has reached beyond programming to several other branches of complex systems design, including circuit and logic design (Borrione, 1987), firmware development (Dasgupta and Wagner, 1984; Damm et al., 1986) and certain aspects of computer architecture design (Dasgupta, 1984). In the general context of computer systems, we shall refer to this approach as the formal design (FD) paradigm.
Example 3.18 Suppose we have constructed a program P0 in a Pascal-like language L′ that computes the product of two positive integers by repeated addition. Figure 14 is a Hoare formula (call it HF0) for this program. To prove the correctness of this formula requires treating the language L′ as a mathematical theory (proposition (c) above). Figure 15 shows part of such a mathematical theory in the form of a set of axioms and rules of inference for L′ (Hoare and Wirth, 1973). The fundamental axiom is the axiom of assignment. Let P be an assertion and let P[X/E] denote P with all free occurrences of X in P replaced by E. Then the axiom of assignment (which itself is a Hoare formula) simply states that if P is the postcondition of the assignment X := E (where E is an expression) then its precondition will be P[X/E]. We can, for instance, use this axiom to prove that the formula

{X ≥ 0} X := X + 1 {X ≥ 1}

is true by working backward from the postcondition “X ≥ 1”: replacing X by X + 1 in the postcondition gives X + 1 ≥ 1, that is, X ≥ 0.
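The backward use of the axiom of assignment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: assertions are modeled as Boolean functions on a program state (a dictionary), substitution P[X/E] is modeled by evaluating P in the state obtained after the assignment, and the agreement with X ≥ 0 is checked on a finite range of integers rather than proved. The helper name `precondition_of_assignment` is hypothetical.

```python
def precondition_of_assignment(var, expr, post):
    """Model P[X/E]: the assertion `post` evaluated after `var := expr`."""
    return lambda state: post({**state, var: expr(state)})


# Postcondition P: X >= 1, and the assignment X := X + 1.
post = lambda s: s["X"] >= 1
pre = precondition_of_assignment("X", lambda s: s["X"] + 1, post)

# The derived precondition should coincide with X >= 0.
# This is a finite check over sample values, not a proof.
agrees_with_x_ge_0 = all(pre({"X": x}) == (x >= 0) for x in range(-50, 51))
```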
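Axiom of assignment:

    {P[X/E]} X := E {P}

Rules of consequence:

    (i)   {P} S {R},  R ⊃ Q
          -----------------
          {P} S {Q}

    (ii)  P ⊃ R,  {R} S {Q}
          -----------------
          {P} S {Q}

Rule of composition:

          {P} S1 {Q},  {Q} S2 {R}
          -----------------------
          {P} S1; S2 {R}

Conditional rules:

    (i)   {P ∧ B} S1 {Q},  {P ∧ ¬B} S2 {Q}
          --------------------------------
          {P} if B then S1 else S2 {Q}

    (ii)  {P ∧ B} S {Q},  P ∧ ¬B ⊃ Q
          --------------------------
          {P} if B then S {Q}

Iteration rules:

    (i)   {P ∧ B} S {P}
          ----------------------------
          {P} while B do S {P ∧ ¬B}

    (ii)  {P} S {Q},  Q ∧ ¬B ⊃ P
          ------------------------------
          {P} repeat S until B {Q ∧ B}

FIG. 15. The basic axiom and proof rules for Pascal.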
The remaining constituents of Fig. 15 are rules of inference (also called proof rules) which enable the designer to infer the correctness of one formula from the correctness of other formulas. Rules of inference are of the general form

    H1, H2, ..., Hn
    ---------------
    H

which states that if the premises (or antecedents) H1, H2, ..., Hn are true then the conclusion (or consequence) H is also true. Here H1, H2, ..., Hn are either assertions (in the predicate calculus) or Hoare formulas, while H is a Hoare formula. As an example, consider the Hoare formula (obtained in Example 2.11, Section 2.5.2)

(HF′) {PRE: A ≥ 0 ∧ B ≥ 0}
      MAX: if A ≥ B then Z := A else Z := B
      {POST: Z ≥ A ∧ Z ≥ B ∧ (Z = A ∨ Z = B)}

To prove (HF′) we need to apply the axiom of assignment and the proof rule for the if...then...else statement shown in Fig. 15. More precisely, to prove
(HF′) requires us to show that the formulas

(F1) {PRE ∧ A ≥ B} Z := A {POST}
(F2) {PRE ∧ A < B} Z := B {POST}

are true. Now starting with POST and applying the axiom of assignment backwards to Z := A produces, as a precondition, A ≥ B; that is, it proves the correctness of the formula

{A ≥ B} Z := A {POST}.

Since PRE ∧ A ≥ B implies A ≥ B, by the second rule of consequence (Fig. 15), it follows that (F1) is true. Using a similar argument it can be shown that (F2) is true. Hence, by the proof rule for the if...then...else statement, (HF′) is proved true.

Returning to (HF0), the original Hoare formula of interest, in order to apply the rules of Fig. 15 requires inventing new, appropriate assertions as (intermediate) postconditions of statements appearing inside P0. These assertions must be such that, in conjunction with PRE0 and POST0, they will allow (HF0) to be proved. The result of inserting such appropriate assertions is a proof outline of the form shown in Fig. 16. If we can now prove that
(HF1) {X > 0 ∧ Y > 0}
      Z := 0; U := X
      {(Z + U × Y = X × Y) ∧ (U > 0)}

and

(HF2) {(Z + U × Y = X × Y) ∧ (U > 0)}
      repeat
        Z := Z + Y; U := U − 1
      until U = 0
      {(Z + U × Y = X × Y) ∧ (U > 0)}

then, by the rule of sequential composition (Fig. 15), we can show that HF0 is true. Note that the key to the whole proof in this case is the identification of the assertion

(Z + U × Y = X × Y) ∧ (U > 0).

[FIG. 16. Proof outline for program P0, annotated with the assertion (Z + U × Y = X × Y) ∧ (U > 0) before and within the repeat loop.]
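The role of the intermediate assertion can be made concrete by transcribing P0 into Python and checking the assertions at run time. This is a hedged sketch, not a proof: run-time assertions only test the sampled inputs, the function name is hypothetical, and the check placed after the loop body uses U ≥ 0 (an assumption of this sketch, since U may reach 0 at that point). On exit U = 0, so Z + U × Y = X × Y reduces to POST0.

```python
def multiply_by_repeated_addition(x, y):
    """Python transcription of P0 with its assertions checked at run time."""
    assert x > 0 and y > 0                      # PRE0
    z, u = 0, x
    assert z + u * y == x * y and u > 0         # intermediate assertion before the loop
    while True:                                 # emulates repeat ... until U = 0
        z = z + y
        u = u - 1
        # After the body the equality still holds; U >= 0 is this sketch's assumption.
        assert z + u * y == x * y and u >= 0
        if u == 0:
            break
    assert z == x * y                           # POST0
    return z


# Exercise the assertions on a small grid of positive inputs.
results = {(x, y): multiply_by_repeated_addition(x, y)
           for x in range(1, 8) for y in range(1, 8)}
```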
In general, the most creative and intellectually demanding activity in the formal design paradigm is the construction of such intermediate assertions. The FD paradigm evidently represents the most rigorous and mathematically precise approach to design. However, in light of the characteristics of the design process discussed in Section 2, a number of questions arise. These are discussed below.
3.5.1 Reconciling Formal Design and Design Evolution

It will be noted that the Hoare formulas are nothing but theorems which are proved based on the axioms and proof rules of the language and previously proved Hoare formulas (theorems). Thus, in the FD paradigm, a design is a theorem.
In view of this, how does one reconcile the FD paradigm with the evolutionary nature of the design process as described in Section 2.5? After all, evolution by its very nature is tentative and contingent, while mathematics is precise and certain. As long as one adheres to a picture of the mathematical process as consisting of an inexorable chain of deductions from axioms and theorems to other theorems, there is indeed a conflict between the evolutionary nature of design and the FD paradigm. However, as has been described by DeMillo, Lipton and Perlis (1979), this picture of the mathematical process is itself a myth. Mathematics is a human activity, and the acceptance of mathematical arguments and proofs is fundamentally a social process within the community. Mathematical proofs are discussed, criticized and worried over by members of the community, and as a result of this sort of social process a proof may be accepted or it may be rejected. Indeed a proof may be accepted for a considerable period of time before a flaw is detected, at which point it is rejected or modified. More recently, using the four-color theorem as an example, Scherlis and Scott (1983) have also described how a theorem may be only tentatively accepted until extensive critical analysis and discussion either satisfy mathematicians about the theorem’s correctness or reveal flaws in the proof.

Thus we see no real contradiction between the fundamental evolutionary nature of design processes and the FD paradigm. In Section 2.5.2 (Example 2.11) we illustrated how in a specific instance of the FD paradigm, the design-while-verify approach, the development of a formal design was very much within the evolutionary framework.

3.5.2 Limitations of the FD Paradigm
Given the mathematical foundation of the FD paradigm, it would be highly desirable to adopt it as the paradigm for design. Unfortunately, there are at least two critical limitations of the formal design approach which prevent its acceptability as the dominant paradigm.

Firstly, the FD paradigm ignores or rejects the fact that design may begin with incomplete or imprecise requirements. For instance, a project for designing and building a user-interface (to an operating system, say) could have as a requirement the objective that the interface must have both a “novice usage mode” and an “expert usage mode.” What constitutes such modes or how they are to be characterized may remain quite imprecise or informal for a significant part of the design process. And for this portion of the design process the FD paradigm is simply inapplicable, since the paradigm begins with formally defined specifications. Clearly in designing such a system, other
paradigms become relevant until the requirements are translatable into formal specifications, at which time the FD paradigm can take over.²³

The second important limitation, which is closely connected to the first, is the fact that the FD paradigm, by its very nature, does not admit any form of evidence for supporting the validity of a design other than mathematical proofs of correctness. This is one of the reasons why the FD paradigm is virtually useless when the designer is faced with incomplete or imprecise requirements. However, that is not all; even where the requirements are precise or complete, the designer may be forced to invoke other kinds of evidence both as justification for design decisions and during the critical test/analysis of the design (see Fig. 7). Such evidence may be experimental data, gathered by the designer using simulation or test procedures, or it may be the evidence from prior research and analysis conducted by other researchers or designers and documented in the literature. Mathematical proofs are simply one of a number of kinds of evidence. While research or experimental data may not be as certain as proofs, they are capable of providing a very high level of confidence in the design. Besides, they may be the only kinds of evidence that can be used.

This latter fact, that is, the necessity of invoking non-formal kinds of evidence (for the validation of a design) under certain circumstances, becomes clear when we consider the design of a system for which requirements are precise and complete and yet formal techniques are impossible to apply.

Example 3.19 Consider the design of a cache memory. In this case, “design” involves identifying the key parameters of the cache. These include (Dasgupta, 1988a; Smith, 1982)

(i) The placement policy, which determines how main memory words are mapped onto cache memory words.
(ii) The size and nature of the blocks to be transferred between main and cache memory.
(iii) The replacement policy, which determines which blocks are to be removed from the cache to make room for a new, incoming block.
(iv) The size of the cache.

Now, in designing a cache, a significant requirement is to establish an upper bound on the cache miss ratio, i.e., the proportion of memory requests that cannot be successfully serviced by the cache. The cache designer may, then, examine the extensive experimental and simulation data involving trade-offs between various cache parameters that have been gathered by researchers (and published in the literature as, for instance, in Smith (1982)). Based on this data he or she may identify a specific set of parameters for the cache memory. In such a situation it is virtually impossible for the designer to formally prove that the selection of these parameters implies satisfaction of the miss ratio requirement. The designer can, however, justify the decision by invoking the evidence in the literature or by simulation.

23 This raises the intriguing possibility of the design process being governed by different paradigms at different stages. We do not, however, believe that this is really warranted. See Section 3.6 for further discussions of this matter.
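The kind of simulation evidence mentioned in Example 3.19 can be suggested by a minimal trace-driven sketch. Everything specific here is an assumption made for illustration: the cache is fully associative (one possible placement policy), replacement is least-recently-used, and the address trace is synthetic. The function `miss_ratio` is a hypothetical helper, not a model of any particular machine.

```python
from collections import OrderedDict


def miss_ratio(trace, cache_blocks, block_size):
    """Miss ratio of a fully associative LRU cache on an address trace.

    trace        -- sequence of integer word addresses
    cache_blocks -- cache size, in blocks
    block_size   -- words per block
    """
    if not trace:
        raise ValueError("trace must contain at least one address")
    cache = OrderedDict()          # block number -> None, ordered by recency
    misses = 0
    for address in trace:
        block = address // block_size
        if block in cache:
            cache.move_to_end(block)           # mark as most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)      # evict the least recently used block
    return misses / len(trace)


# Synthetic trace: two passes over words 0..3.
trace = [0, 1, 2, 3, 0, 1, 2, 3]
ratio_large_cache = miss_ratio(trace, cache_blocks=4, block_size=1)
ratio_small_cache = miss_ratio(trace, cache_blocks=2, block_size=1)
ratio_large_block = miss_ratio(trace, cache_blocks=1, block_size=4)
```

Varying `cache_blocks` and `block_size` on the same trace shows how a designer might compare parameter choices against a miss-ratio bound; such a result is experimental evidence, not a proof.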
3.6 The Theory of Plausible Designs
The theory of plausible designs (TPD) is a design paradigm proposed very recently by Aguero and Dasgupta (1987) (see also Aguero, 1987; Dasgupta and Aguero, 1987), which addresses, directly and explicitly, the following issues in, and characteristics of, design:

(i) That requirements may initially be neither precise nor complete, and thus the development or elaboration of requirements is an integral component of the design process (Section 2.2).
(ii) That a design (that is, the product of the design process) is a representation of the target system in some symbolic medium that is appropriate not only as a basis of implementation but also for criticism, analysis, and manipulation (Section 2.3).
(iii) That design is an evolutionary process in which the design and/or requirements are continually modified so as to be mutually adapted (Section 2.5).
(iv) That design processes are very often satisficing procedures (Section 2.4).
(v) That the evidence invoked or sought by the designer in support of a design decision, or to establish the validity of a design feature, may include formal proofs, experimental evidence, empirical data based on observing previous systems, the results of research and analysis, or sometimes, commonsense reasoning (Section 3.5.2).

In addition to the above, the TPD paradigm rests on the following premises:

(a) The design of any complex system proceeds on a stage-by-stage basis. This is depicted in Fig. 17. Here each stage denotes the system at a particular abstraction level, and the arrows between the stages indicate that the general direction of design is from the more abstract to the less abstract.
(b) The system may be said to be completely described at any one of these levels. Equivalently, the description at each stage denotes the design of the system at a particular level of abstraction.²⁴
(c) Given a pair of adjacent levels i, i + 1 (where level i + 1 is more abstract than level i), one may regard the design at level i as a (possibly abstract) implementation of the design at level i + 1. The level i design in turn becomes a specification that is implemented by the design at level i − 1, and so on (Fig. 18). In other words, a design at any particular abstraction level can be viewed as both an implementation (of a higher level) and as a specification (to be implemented at a lower level).
(d) A design is a description or representation in some symbolic medium (such as diagrams, natural languages, mathematical notation, formal description languages). In contrast, a physical implementation is an operational version of the system. Thus we make the distinction between abstract and physical implementations. The former is a representation of the system and is thus a design. The latter is the system.
(e) A design is a description of various functional, structural, and performance characteristics or features of the system that are to appear in, or must be met by, the physically implemented system. In the TPD paradigm, the generic term constraint is used to designate such a characteristic or feature. A design at any particular level of abstraction is, then, a description of an organized, mutually consistent collection of constraints which an implementation must satisfy. Note that the constraints are so-called because they are, so to speak, constraints to be met by the next lower level implementation. Furthermore, the constraints of a higher level design may be formulated in quite fuzzy or imprecise terms. At some later stage of the design process, these same constraints will have been defined in terms of more precise, more primitive constraints.
(f) At the beginning of some design step, one or more constraints will be postulated as requirements or goals. On completion of that (or some subsequent) step, the goal constraint should have become an actual feature of the design, that is, a design fact. The task of the designer is to transform all goal constraints into fact constraints.

[FIG. 17. Stepwise design based on abstraction levels: designs ordered from higher to lower abstraction levels, ending with the design at level 1.]

[FIG. 18. “Design” as “specification” and as “implementation”: the design at each level serves as a specification which is implemented by the design at the next lower level.]

24 Note that such notions as conceptual (or preliminary) design and detailed design (Fig. 10) are subsumed within the more uniform notion of designs at different abstraction levels. Note also that the nature and number of abstraction levels are entirely dependent on the designers and the type of the system being designed.

Example 3.20 The following are typical examples of constraints stated in the form of goals or requirements.

(i) Software transparency to the user. The user should not know that he or
she is programming a fault-tolerant computer with fault-tolerance achieved in hardware.
(ii) The sustained throughput across the Lawrence Livermore benchmarks must be 1.5 gigaflops.

Example 3.21 When a particular computer design which began with (ii) above as a goal constraint has been completed, this same constraint will (one hopes) be present as a fact constraint in the design: The sustained throughput across the Lawrence Livermore benchmarks is 1.5 gigaflops.
The theory of plausible designs is a design paradigm that is driven by the goal of establishing the plausibility of a design at each and every stage of abstraction. More precisely, the TPD paradigm provides a basis for establishing the plausibilities of the individual constraints constituting a design.

3.6.1 Plausibility States
Intuitively, a plausible constraint is a constraint that we believe in because we have evidence that it exhibits certain desirable properties. Furthermore, in the course of design, depending on the evidence at hand, the plausibility of a constraint may change from one design step to another. More precisely, at any stage of design a constraint C may be in one of four plausibility states:

(a) Assumed: there is no evidence against C’s plausibility.
(b) Validated: there is significant evidence in favor of and no evidence against C’s plausibility.
(c) Refuted: there is evidence against C’s plausibility.
(d) Undetermined: it is unknown whether or not there is evidence against C’s plausibility. This is the initial state of every constraint.

The plausibility state of a constraint C is thus a function of the design stage k at which the state is being established and the available evidence Ek at that design stage. The evidence itself may be drawn from other constraints in the “current” design or from the knowledge base available to the designer concerning the particular domain.
3.6.2 Plausibility Statements

At any stage of the design, the constraints are individually packaged in the form of plausibility statements which allow the designer (or anyone else) to reason about the constraints using the rules of reasoning provided in the theory. A plausibility statement is defined as a 5-tuple,

S = (C, A, R, V, P),

where

- C is a specific non-empty set of constraints the plausibility of which is being established.
- A is the knowledge base (including other constraints) which is used to assist in establishing C’s plausibility.
- R denotes the relationships (between constraints) or properties (of constraints) that must be verified in order to establish C’s plausibility.
- V is the means employed for the verification of S (this is explained below).
- P is the plausibility state in which C will be placed upon successful verification of S.
Example 3.22 The following is an example of a plausibility statement for a constraint appearing as part of a processor chip design.

(S2):
  C: K-ary B-bit adder (ADD_KB) that is efficient and VLSI-implementable.
  A: Description of ADD_KB in S*M.
  R: (a) The asymptotic latency of ADD_KB is O(log KB).
     (b) The area complexity of ADD_KB is O(KB log KB).
     (c) The structure of ADD_KB is regular.
  V: Mathematical proof for (a) and (b), and heuristics for (c).
  P: Validated.

Thus the plausibility state of C will be validated (or more intuitively, the claim that an efficient and VLSI-implementable adder ADD_KB has been designed is rendered plausible) if it can be shown, using formal proof techniques, that (a) and (b) in R are true and, using heuristic reasoning, that (c) in R is true. In order to assist in establishing the plausibility of C, the formal description of ADD_KB in the architecture description language S*M (Dasgupta, Wilsey, and Heinanen, 1986) can be used, as stated by the field A.
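As an informal illustration, the 5-tuple and the four plausibility states can be written down as a small data structure. This is a sketch only: the class and method names are hypothetical, constraints and properties are held as plain strings, and leaving C undetermined when verification does not succeed is an assumption of the sketch, since the definition above only says what happens on successful verification.

```python
from dataclasses import dataclass
from enum import Enum


class PlausibilityState(Enum):
    UNDETERMINED = "undetermined"   # initial state of every constraint
    ASSUMED = "assumed"
    VALIDATED = "validated"
    REFUTED = "refuted"


@dataclass
class PlausibilityStatement:
    C: str                  # constraint(s) whose plausibility is being established
    A: list                 # knowledge base used as supporting material
    R: dict                 # label -> property that must be verified
    V: dict                 # label -> means of verification for that property
    P: PlausibilityState    # state assigned to C upon successful verification

    def resulting_state(self, outcomes):
        """Return P if every property in R was verified.

        `outcomes` maps each label in R to True or False; how the outcomes are
        obtained (proof, heuristics, experiment) lies outside this sketch.
        Falling back to UNDETERMINED is an assumption, not part of the theory.
        """
        if all(outcomes.get(label, False) for label in self.R):
            return self.P
        return PlausibilityState.UNDETERMINED


# Statement (S2) of Example 3.22, transcribed as data.
s2 = PlausibilityStatement(
    C="K-ary B-bit adder ADD_KB that is efficient and VLSI-implementable",
    A=["description of ADD_KB in S*M"],
    R={"a": "asymptotic latency is O(log KB)",
       "b": "area complexity is O(KB log KB)",
       "c": "structure is regular"},
    V={"a": "mathematical proof", "b": "mathematical proof", "c": "heuristics"},
    P=PlausibilityState.VALIDATED,
)
state_if_all_verified = s2.resulting_state({"a": True, "b": True, "c": True})
```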
3.6.3 Means of Verification

The means of verification that is used to gather evidence in order to assign a constraint C to a plausibility state may be any combination of the following:

(a) Precise: that is, formal, deductive logic.
(b) Heuristic: that is, approximate techniques involving the use of heuristics.
(c) Experimental: that is, use of controlled experimental methods.

Thus, TPD allows a continuum of verification strategies in order to provide evidence. This ranges from strictly deductive logic through more pragmatic experimental techniques to heuristic (approximate or common sense) reasoning. Note that this is one of the ways in which the TPD paradigm differs from the FD paradigm, which admits only deductive logic as the means of verification.

3.6.4 Structure of the Paradigm
Space limitations do not allow us to describe the formal aspects of TPD in further detail. Such issues as the logic of establishing plausibility states and of deducing one plausibility state from another are developed in Aguero (1987) and Aguero and Dasgupta (1987). Of more immediate interest is the overall structure of the design paradigm. In essence, a design process based on the TPD paradigm has the following characteristics:

(A) A design Dk at any stage k is an organized set of constraints. Each constraint may be stated formally (in a description/design language, mathematical notation or logical formulas) or in the semi-informal language of scientific discourse (i.e., a combination of formal notation and natural language).

(B) The plausibility of Dk at stage k is captured explicitly by a collection of plausibility statements. Since a design cannot contain any constraint that has been shown to be refuted, the plausibility state of each constraint in Dk will be the assumed, validated or the (default) undetermined state.

(C) The constraints in Dk may have dependencies between them in the following sense. Let Si = (Ci, Ai, Ri, Vi, Pi) be the plausibility statement for Ci and let Cj be a constraint appearing in Ri such that showing that Ci is in plausibility state Pi requires showing that Cj is in some plausibility state. In that case Ci is dependent on Cj. Dependencies between constraints within a design Dk can be explicitly depicted by a directed acyclic graph termed a constraint dependency graph (CDG). An example is shown in Fig. 19. The vertices denote constraints, and there is a directed edge from vertex i to vertex j if Ci depends on Cj.

(D) Given a design Dk at some step k of the design process, the design further evolves by virtue of the fact that the designer attempts to (i) change an assumed constraint into a validated constraint, or (ii) place an undetermined constraint in the assumed or validated state. In order to do either, the original constraint C may have to be refined or partitioned into a set of new constraints Ci, Cj, ..., such that the relationship between these constraints can be used to demonstrate the desired plausibility state for C.²⁵
25 TPD includes logical rules of partitioning such that when a constraint C is partitioned into constraints C1, C2, ..., the designer can infer the plausibility of C in terms of the plausibility states of C1, C2, ....
FIG. 19. A constraint dependency graph.
Example 3.23 The very first constraint C1 in Fig. 19, when initially postulated, would be placed in the undetermined state. In order for it to be assigned to the assumed state (say), it is partitioned into three constraints C2, C3, C4; the plausibility statement S1 for C1 would have its P1 field as assumed and would show in its R1 field the relation

C2 ∧ C3 ∧ C4.

If there is no evidence against this relation being true then S1 is verified; that is, C1 is assigned to the assumed state. The original “design” consisting of C1 alone has evolved to a design consisting of C1, C2, C3 and C4 related as shown by the topmost part of Fig. 19.
(E) A design may evolve in other ways also. For instance, in the process of attempting to change the plausibility state of constraint Ci from the assumed to the validated state, it may happen that the available evidence actually causes Ci to be assigned to the refuted state. In that case Ci must be removed from the design. As a result, if some other constraint Cj is dependent on Ci, then its plausibility state will have to be revised in the light of the new state of affairs. In other words, the plausibility states of constraints may well have to be revised on the basis of new evidence at hand.²⁶

Example 3.24 Referring to Fig. 19 again, suppose Cg could not be assigned to the assumed state on the basis of the evidence specified by its plausibility statement Sg. By changing its Pg field from assumed to refuted, however, Sg could be verified; that is, Cg is refuted and is thus removed from the design. As a result, both C2 and C3, being dependent on Cg, would have to be reconsidered and their plausibility states revised. Note that as a result of all this, the CDG (Fig. 19) would itself alter.

26 That is, the logic of plausibility is a type of non-monotonic logic which is widely used in artificial intelligence systems (Turner, 1984, Chapter 5; Charniak and McDermott, 1985, Chapter 6).

3.6.5 Some Remarks on the TPD Paradigm
The most important characteristics of the TPD paradigm are

(a) The unification of “design” and “requirements” under the single notion of “constraints.” As has been noted several times, during the design process, both design and requirements may evolve. This is recognized in TPD by the notion of a set of constraints evolving.

(b) The explicitly evolutionary nature of the paradigm. Furthermore, one can see very clearly what it is that evolves, the direction of evolution, and the mechanism of evolution. What evolves is the set of constraints constituting a design. The direction is towards the attainment of plausibility states that are either assumed or (preferably) validated. And the mechanism is the attempt to verify the plausibility states of existing constraints. This in turn causes the removal of refuted constraints, possible revision of the plausibility states of other existing constraints, and the generation of new constraints. The evolutionary cycle followed in TPD and its correspondence to the general evolutionary structure of the design process (Fig. 7) are depicted in Fig. 20.

(c) The recognition that the evidence that can cause the designer (or anyone else) to acquire confidence in, or reject, a feature in the design can be not only formal proofs, but also experimental tests, the evidence of well documented experience, and the results of research. In this sense, the TPD paradigm as a design paradigm appears to be superior to the FD paradigm. More precisely, the latter is a special case of the former.

[FIG. 20. The evolutionary cycle in TPD: a set of constraints with plausibility statements (design + requirements); verification of plausibility statements based on evidence at hand (critical test); removal of refuted constraints and revision of plausibility states.]
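The mechanism of evolution summarized in (b), together with the constraint dependency graph of Section 3.6.4, can be sketched as follows. The graph is represented as a mapping from each constraint to the set of constraints it depends on. The particular graph, the constraint names beyond C1 to C4, and the choice to reset affected constraints to the undetermined state are all assumptions made for illustration; the theory itself only says that their plausibility states must be revised.

```python
def dependents_of(cdg, target):
    """Constraints that depend, directly or transitively, on `target`."""
    affected, frontier = set(), [target]
    while frontier:
        current = frontier.pop()
        for constraint, depends_on in cdg.items():
            if current in depends_on and constraint not in affected:
                affected.add(constraint)
                frontier.append(constraint)
    return affected


def refute(cdg, states, refuted):
    """Remove a refuted constraint and mark its dependents for revision."""
    to_revise = dependents_of(cdg, refuted)
    new_cdg = {c: deps - {refuted} for c, deps in cdg.items() if c != refuted}
    new_states = {c: s for c, s in states.items() if c != refuted}
    for c in to_revise:
        new_states[c] = "undetermined"   # assumption: revision restarts from the default state
    return new_cdg, new_states


# Hypothetical CDG: C1 depends on C2, C3, C4; C2 and C3 both depend on C5.
cdg = {"C1": {"C2", "C3", "C4"}, "C2": {"C5"}, "C3": {"C5"}, "C4": set(), "C5": set()}
states = {c: "assumed" for c in cdg}
new_cdg, new_states = refute(cdg, states, "C5")
```

After C5 is refuted, the graph itself has changed, and C2, C3 and (through them) C1 are flagged for reconsideration, while C4 is unaffected.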
4. Design as Scientific Discovery
In this article we have attempted to analyze in depth the structure of design processes. In Section 2 some of the fundamental characteristics of design were identified, while several design paradigms were discussed in Section 3. These paradigms have all formed the basis for, or were based upon, actual design methods and are thus firmly grounded in the empirical world of “real” design. In this section we would like to move into a more speculative realm concerning the “deep” structure of design processes and suggest a connection between design and the method of science. More specifically we suggest (and will attempt to outline an argument in support of this proposition) that under certain conditions and assumptions design problem solving is a special instance of scientific discovery.

We shall refer to this proposition as the Design-as-Scientific-Discovery (DSD) hypothesis. The logic and methodology of science deals with many issues, and there are many competing and contending theories of scientific methodology (see, for example, Harre (1985), Losee (1980), Suppe (1977), Nickels (1980), and Laudan (1984) for general surveys and discussions of the various issues). It is thus important to establish precisely which aspects of scientific methodology are relevant to the DSD hypothesis. Our reference model of science uses two key ideas: the theory of hypothetico-deductive reasoning, and Kuhn’s concept of paradigmatic science.

4.1 The Hypothetico-Deductive (HD) Form of Reasoning
The notion that science follows a hyporhetico-deductii~e(H-D) form of reasoning was first articulated in the mid-nineteenth century by Whewell (1847).The modern form adopted here is due to Popper (1965,1968,1972) and his interpreters (see, e.g., Medawar, 1963, 1967, 1969, 1982). The principal features of H-D theory are as follows: (HDa) Given a phenomenon to be explained, the scientist constructs a hyporhesis (or a system of hypotheses). A hypothesis is a conjectural proposition that might be a correct explanation for the problem. There is no
56
SUBRATA DASGUPTA
one method by which the hypothesis may have been arrived at. It may have been constructed using some form of induction from given data, by following a deductive chain of reasoning from some other established theory, by analogical reasoning, by gestalt-like perceptions, or simply by guesswork. In other words, hypothesis formation in a particular case may indeed be rationally explainable. On the other hand it may not. (HDb) The hypothesis must be tested. This is done by treating the hypothesis as an axiom and determining whether or not the deductive consequences of the hypothesis conform to observed or observable reality. (HDc) If the hypothesis stands up to such a test then the scientist’s confidence in the “correctness” of the hypothesis is strengthened. However, a hypothesis is always of a conjectural, tentative nature. One can never prove that a hypothesis is true because of the fallibility of induction as a logical principal of verification: No matter how many deductive consequences of a hypothesis are upheld by observation or experiment-that is, no matter how many confirming instances of a hypothesis we may find-this can never imply the truth of that hypothesis. (HDd) Because of the non-conclusive nature of confirming instances, verification as a logical mode of hypothesis testing should be rejected. Instead, the falsification principle should be used. Thus, while no amount of confirming evidence proves the truth of the hypothesis, a simple piece of counter-evidence suffices to falsifv it. Thus the aim of testing a hypothesis should actually be to criticize it-that is, to attempt to falsify it. If the falsification attempt fails, the scientist’s confidence in the hypothesis is strengthened. The hypothesis is said to be corroborated. But it still remains tentative. (HDe) A hypothesis in science must be so constructed that it can, in principle, be subject to a falsification test. Any hypothesis that does not satisfy this condition is not a scientific proposition. 
(Popper calls such a hypothesis a metaphysical proposition.) Thus, the demarcation criterion between propositions that qualify as scientific hypotheses and other propositions is the falsifiability principle.
(HDf) A critical test may indeed falsify the hypothesis; on the other hand, it may turn out that in order for the hypothesis to be retained some other problem or anomaly must be acknowledged. In either case a new problem will have been generated and the cycle begins once more.
This is the basis for Popper’s (1972, p. 145) scheme (PS):
$P_1 \rightarrow TT \rightarrow EE \rightarrow P_2$,
where $P_1$ is the original problem, $TT$ is the tentative theory (hypothesis), $EE$ (error elimination) is the critical attempt to falsify $TT$, and $P_2$ is the resulting new problem that might arise.
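Popper’s scheme can be read as a loop: a tentative theory is proposed for a problem, a critical test searches for a counter-example, and any counter-example found becomes the new problem that the next tentative theory must address. The following Python sketch is offered only as an illustration of that reading. The names `Theory`, `error_elimination` and `popper_cycle`, and the toy observations, are assumptions introduced here; they do not come from Popper or from this article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# An observation pairs an input with the outcome actually observed.
Observation = Tuple[int, int]


@dataclass
class Theory:
    """A tentative theory (TT): a named rule that predicts an outcome."""
    name: str
    predict: Callable[[int], int]


def error_elimination(theory: Theory,
                      observations: List[Observation]) -> Optional[Observation]:
    """EE: attempt to falsify the theory.

    Returns the first counter-example found, or None if the theory
    survives the test (it is corroborated, but remains tentative).
    """
    for x, observed in observations:
        if theory.predict(x) != observed:
            return (x, observed)
    return None


def popper_cycle(candidates: List[Theory],
                 observations: List[Observation]):
    """Run P1 -> TT -> EE -> P2 over successive tentative theories.

    Each counter-example plays the role of the new problem (P2) that
    motivates trying the next candidate theory.
    """
    history = []
    for theory in candidates:
        counter_example = error_elimination(theory, observations)
        history.append((theory.name, counter_example))
        if counter_example is None:
            break  # corroborated so far; no new problem was generated
    return history


# Hypothetical toy data: the outcomes happen to be squares of the inputs.
observations = [(0, 0), (1, 1), (2, 4), (3, 9)]
candidates = [
    Theory("identity", lambda x: x),
    Theory("doubling", lambda x: 2 * x),
    Theory("squaring", lambda x: x * x),
]

history = popper_cycle(candidates, observations)
for name, counter_example in history:
    print(name, "falsified by" if counter_example else "corroborated", counter_example)
```

In this toy run the first two theories are each falsified by a single observation, while the third survives every test. Surviving does not prove it true: one further observation could still refute it, which is the asymmetry between confirmation and falsification that the H-D scheme relies on.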
4.2 Kuhnian Paradigms
The second key component of our reference model is Kuhn’s well known concept of the scientific paradigm (Kuhn, 1970) which, as noted in footnote 15, we shall specifically refer to as the Kuhnian paradigm (or K-paradigm for short). The basic notion is that in a particular discipline, scientists solve problems within some prevailing shared matrix of knowledge, assumptions, theories and values. This is what constitutes the K-paradigm. It is what is described and presented in the “current” textbooks, is taught to students, and is the basis for a research student’s training and identification of research problems. In Kuhn’s view, most hypotheses proposed by scientists do not, normally, question the prevailing K-paradigm. Rather, under normal circumstances, a hypothesis attempts to explain a phenomenon within the prevailing K-paradigm. Kuhn terms this the practice of normal science.
Example 3.25 Physicists investigated optical phenomena in the eighteenth and nineteenth centuries under the shared assumption that light travels as waves. The wave theory of light was then the prevalent paradigm, established in the seventeenth century as a result of the work of Newton and others. This type of investigation is, in Kuhn’s sense, an instance of normal science.
On rare occasions, however, a problem surfaces that appears to be explainable only by a hypothesis that actually challenges the prevalent paradigm. If this new hypothesis is subsequently corroborated then the science itself undergoes a paradigm shift, a transition from the previously dominant to a new K-paradigm. The hallmark of a scientific revolution is, according to Kuhn, such a paradigm shift.²⁷
4.3 Basis for the DSD Hypothesis
If the H-D theory of scientific reasoning and Kuhn’s concept of paradigms are together accepted as a valid core model of scientific problem solving, it is easy to see why the DSD hypothesis may be plausible. Our argument in support of the DSD hypothesis can be sketched out as follows:
(DSDa) Design begins with some (possibly ill-defined or ambiguously stated) requirements (Section 2.2). In other words, the design begins with a problem.
²⁷ We apologize for the brevity with which Popper’s and Kuhn’s ideas have been presented here. For further appreciation of the richness of these seminal concepts the reader is referred to Popper (1965, 1968, 1972) and Kuhn (1970) as well as the extensive subsequent discussions by Popper, Kuhn, Lakatos and others in Lakatos and Musgrave (1970). For arguments against Kuhn’s theory of paradigms, see Laudan (1977, 1984).
(DSDb) A design is produced as a tentative solution to the posed problem, and takes the form of a symbolic representation (Section 2.3). The mode of representation is constrained by the condition that it must be possible to determine whether or not the system that has been designed solves the original problem (i.e., meets the original requirements). In other words, the design must be so described as to be critically testable. Otherwise it serves very little purpose.
(DSDc) A design finishes as an assembly of components which are themselves symbolic representations. The complexity in design arises from the fact that, though the properties and behavior of the design components may well be understood, a particular design solution may assemble such components in a novel way. Because of bounded rationality (Section 2.4), the resulting interactions of the design components may be neither fully predictable nor completely understood. The designer believes that the assembly will meet the requirements.
(DSDd) From (DSDb) and (DSDc) we see that a design is a hypothesis in exactly the same way that a scientific theory is a hypothesis:
(i) It is a tentative, conjectural proposition that states that if the target system is built according to the design it should meet the stated requirements.
(ii) It may have been forged using a combination of induction, deduction, analogical reasoning or imaginative guesswork.
(iii) The design must be tested by rational arguments relating the design to the K-paradigm which guided its development (see (DSDf) below), by deducing the properties from the design and determining whether these properties meet or violate the requirements, by conducting laboratory-type experiments (involving simulation or prototypes), or by testing it “in the field” under actual operating conditions.
(DSDe) Because of the ill-structuredness of most design problems, the phenomenon of bounded rationality, and the need to critically test the design, the design process is evolutionary (Section 2.5 and Fig. 7). Furthermore, regardless of whether ontogenic or phylogenic evolution takes place, the means of evolution is identical: the use of a critical test to eliminate errors in the design. There is thus a direct correspondence between the evolutionary cycle of Fig. 7 and Popper’s scheme (PS). This correspondence is summarized in Fig. 21.
(DSDf) Finally, designers are governed by Kuhnian-type paradigms. A design problem makes sense to the designer only in the context of a particular knowledge base (KB) consisting of an integrated network of theory, partial designs, design styles, and various heuristics. In a mature design discipline, the KB is not private to a particular designer. It is mostly public, objective knowledge shared across that particular design community. It is, in fact, the prevalent K-paradigm within which the designer makes his or her decisions.²⁸

Natural Sciences          Design
Problem                   Requirements
Hypothesis/Theory         Design
Critical experiment/      Critical test

FIG. 21. Parallels between the H-D scheme for science and the evolutionary scheme for design.

The foregoing observations (DSDa-DSDf), in conjunction with the characteristics of the H-D form of reasoning (HDa-HDf) and the concept of K-paradigms, lead us to formulate the DSD hypothesis, which can be restated in slightly modified form as follows: Within the framework of the Popper-Kuhn hypothetico-deductive model of science, design problem solving is an instance of scientific discovery.

²⁸ We must, however, make two qualifications to this assertion. First, the particular K-paradigm dominating a designer’s perspective may not be as universally shared as paradigms are in the natural sciences. For example, designers in a particular company may have access to certain principles of design, theory, tools and technologies that are “trade secrets,” i.e., private to that organization. Nonetheless, the designers within such communities are governed by a paradigm, albeit a restricted one. More accurately, the K-paradigm governing such a design organization consists of a genuinely public component and an organization-specific component. Second, in Kuhn’s model, a natural science is dominated at any given time by one paradigm. In contrast, a design discipline may contain several coexisting, alternative paradigms at any given time. It depends on how one characterizes Kuhnian paradigms. For example, in structural engineering it may be reasonable to say that the design of steel structures and the design of concrete structures involve separate paradigms, although they obviously would share some components (e.g., the general theory of the strength of materials). In the design of programming languages one could, likewise, make a case for the “imperative” language and “functional” language paradigms. Thus, in an engineering discipline, rather than one paradigm being replaced by another (paradigm shift), a newly emergent paradigm could coexist with more established ones. This corresponds to the fact that design disciplines deal with what might or what ought to be the state of affairs (see Section 2.1), in contrast to the natural sciences, which are concerned with what is or was the state of affairs. Design disciplines admit multiple possible worlds, hence multiple paradigms. Exactly how paradigms are generated in a design discipline and what constitutes the analog to revolutions in the natural sciences are fascinating open research problems in the theory of design.
Several further remarks need to be made about this hypothesis:
(a) It must not be interpreted as a universal proposition in the sense that every design act that may have been performed in the past or that may be performed in the future is an instance of scientific discovery. After all, one can always design in such a way that one can neither confirm nor refute the claim that the design meets the given set of requirements. Rather, the DSD hypothesis should be viewed as a universal proposition in that (i) it applies to any particular design discipline or problem, and (ii) it is to be interpreted as stating that for any particular design discipline, the design process can indeed be formulated such that it satisfies the Popper-Kuhn H-D model of science.
(b) It must be remembered that the DSD hypothesis is a hypothesis. However, like scientific propositions, it emerged from observations, case studies, analogies, and other ideas. Thus, like scientific propositions, the DSD hypothesis is itself an empirical hypothesis which must be critically tested. Clearly, it would be very easy to devise a test case that would corroborate the hypothesis. One would simply select a convenient design problem and show how its solution satisfies the Popper-Kuhn model of science. However, a cornerstone of Popper’s theory is that the more daring the hypothesis and the more stringent the test, the greater the payoff (that is, the higher our confidence in the hypothesis) if the test fails to refute the hypothesis. Thus, a valid critical test for the DSD hypothesis would be to show that it is not falsified by design disciplines that have traditionally been known to be resistant to scientific methodology.
Example 3.26 An appropriate candidate for the critical testing of the DSD hypothesis is the design of a computer language (such as a programming, hardware description or microprogramming language). In spite of the vast accumulation of knowledge about the mathematical theory of syntax and semantics, the practical design of such languages remains largely informal. The general approach, whether done by individuals or design teams, is the ASE paradigm (Section 3.2). Requirements for a language design are established as extensively as possible subject to review and analysis (the A phase); the language is then designed (the S phase); it is then subject to public scrutiny and review (the E phase). Several iterations of these phases may take place.²⁹ However, it is difficult to place any known language design effort explicitly within the framework of the scientific method. While particular features of a language, or the total philosophy of a language design effort, may have been dictated by specific theoretical principles, the idea of a language design as a testable/falsifiable hypothesis appears totally novel. Thus, language design would offer a particularly stringent test for the DSD hypothesis.
²⁹ Probably the most well known recent example of the ASE paradigm at work in this domain is the development of the Ada programming language (DoD, 1981; Wegner, 1980, pp. vii-ix). Another recent example is the development of the VHDL hardware description language (Shahdad et al., 1985). (Note: Ada is a registered trademark of the U.S. Department of Defense.)
(c) Thus, the DSD hypothesis as a general theory of design must be subject to extensive critical scrutiny. Nonetheless, from our previous discussion of the theory of plausible designs (TPD) (Section 3.6), the reader may realize that TPD is itself supportive of the hypothetico-deductive method. In fact, the following correspondences may be noted between TPD and the H-D model:
(i) At some design stage k, the attempt to establish the plausibility of a design (in TPD) is a type of critical test.
(ii) The change of a constraint’s plausibility state from assumed/validated to refuted is a type of error identification.
(iii) Removal of the related constraint and revision of the other affected constraints (in TPD) is a type of error elimination.
(iv) The inability to establish the plausibility states assumed or validated (in TPD) is yet another type of error identification.
(v) The generation of new “lower level” constraints and their incorporation into the design (in TPD) is another type of error elimination.
(vi) The set of constraints produced by (iii) or (v) constitutes a new design (in TPD) and is an instance of a new (design) hypothesis.
Thus, TPD is a design paradigm that is already within the framework of the H-D model of science and can be used as a “working design method” for testing the DSD hypothesis.
5. Conclusions
In this article, we have conducted an analysis of the structure of design processes. While much of what has been discussed applies to design disciplines in general, our particular emphasis has been the design of computer systems: programs, computer architectures, computer languages, and hardware. Perhaps the most interesting point that emerges from this entire discussion is the extraordinary richness of the design act. Design is, above all, a human activity and an intellectual activity of many dimensions. It involves, as we have seen, issues of human values, and it invokes questions concerning human rationality. It relies on our ability to describe concepts in abstract yet precise ways that connect to mathematical formalisms on one hand and cognitive capabilities on the other. Design involves aesthetic issues. And, as we have noted, there is an important evolutionary dimension to design. Indeed, the fact that design is an evolutionary activity is a vital component of what anthropologists call cultural evolution.
In our final section, we speculated that out of all this multifaceted richness one may identify a single general theory: that design as a problem solving activity conforms to one of the most powerful and successful of human activities, namely, scientific discovery. We believe that the DSD hypothesis has such compelling appeal that further investigation of its validity presents an extremely interesting research problem in the theory of design.
While an analysis of the structure of design processes is of intrinsic interest precisely because it is an intellectual activity, a practical motivation that also drives such analyses is the formulation of rational, usable design paradigms. In this article we have discussed five such paradigms (in Section 3). The most recent of these, the theory of plausible designs (TPD), is based quite explicitly on our analysis of the characteristics of design described in Section 2. We have noted the strengths and weaknesses of each of the paradigms. However, we believe that in several important ways, TPD subsumes the other paradigms. We have also noted (in Section 4) how TPD is strongly supportive of the concept of design as a process of scientific discovery.
In the course of this discussion we have made only the briefest of references to the technologies of computer-aided design (CAD) and “expert” design systems. This is because our main goal was to try to reveal the complexity and richness of design and thereby provide a reference framework within which the possibilities and limitations, the strengths and weaknesses of automated design tools can be gauged.
Thus, while a discussion of the latter is beyond the scope of this article, it is pertinent to point out that most design automation systems developed to date are based on either the algorithmic or the AI paradigm. At the time of writing, this author and his collaborators are investigating the development of automated design tools based on the TPD paradigm.
ACKNOWLEDGEMENTS
In developing my ideas on design I have benefited enormously from discussions and correspondence with C. A. R. Hoare and Werner Damm. I am also indebted to Enrique Kortright for his detailed reading of the article and to Sukesh Patel for his help in preparing the manuscript. The contents of this article reflect many hours of discussion on various aspects of design with Philip Wilsey, Ulises Aguero, Sukesh Patel, Al Hooton and Enrique Kortright. My thanks to them all.
REFERENCES
Aguero, U. (1987). A Theory of Plausibility for Computer Architecture Designs. PhD Thesis, The Center for Advanced Computer Studies, Univ. of Southwestern Louisiana, Lafayette, Louisiana.
Aguero, U., and Dasgupta, S. (1987). A Plausibility Driven Approach to Computer Architecture Design. Comm. ACM 30 (11), 922-932.
Akin, O. (1978). How do Architects Design? In Latombe (1978), 65-104.
Alagic, S., and Arbib, M. A. (1978). “The Design of Well-Structured and Correct Programs.” Springer-Verlag, Berlin.
Alexander, C. (1964). “Notes on the Synthesis of Form.” Harvard University Press, Cambridge, Massachusetts.
Belady, L. A., and Lehman, M. M. (1976). A Model of Large Program Development. IBM Sys. J. 15 (3), 225-252. Reprinted in Lehman and Belady (1985), 165-200.
Belady, L. A., and Lehman, M. M. (1979). Characteristics of Large Systems. In “Research Directions in Software Technology” (P. Wegner, ed.), pp. 106-138. MIT Press, Cambridge, Massachusetts.
Bendall, D. G., ed. (1983). “Evolution from Molecules to Men.” Cambridge University Press, Cambridge, England.
Boehm, B. (1981). “Software Engineering Economics” (ch. 4). Prentice Hall, Englewood Cliffs, New Jersey.
Boehm, B. (1984). Software Life Cycle Factors. In “Handbook of Software Engineering” (C. R. Vick and C. V. Ramamoorthy, eds.). Van Nostrand-Rheinhold, New York.
Borrione, D., ed. (1987). “From HDL Descriptions to Guaranteed Correct Circuit Design.” North-Holland, Amsterdam.
Broadbent, G. (1973). “Design in Architecture.” John Wiley, New York.
Bullock, A., and Stallybrass, O., eds. (1977). “The Fontana Dictionary of Modern Thought.” Fontana/Collins, London.
Charniak, E., and McDermott, D. (1985). “Introduction to Artificial Intelligence.” Addison-Wesley, Reading, Massachusetts.
Cross, N., ed. (1984). “Developments in Design Methodology.” John Wiley, New York.
Cross, N., Naughton, J., and Walker, D. (1980). Design Method and Scientific Method. In Jaques and Powell (1980), 18-29.
Damm, W., and Dohman, G. (1987). An Axiomatic Approach to the Specification of Distributed Computer Architecture. Proc. Conf. Parallel Architectures and Lang. Europe (PARLE). Lecture Notes on Computer Science, Springer-Verlag, Berlin.
Damm, W. et al. (1986). The AADL/S* Approach to Firmware Design and Verification. IEEE Software 3 (4), 27-37.
Darke, J. (1979). The Primary Generator and the Design Process. Design Studies 1 (1), 36-44. Reprinted in Cross (1984), 175-188.
Dasgupta, S. (1979). The Organization of Microprogram Stores. ACM Comp. Surveys 11 (1), 39-66.
Dasgupta, S. (1982). Computer Design and Description Languages. In “Advances in Computers,” Vol. 21 (M. C. Yovits, ed.), pp. 91-155. Academic Press, New York.
Dasgupta, S. (1984). “The Design and Description of Computer Architectures.” John Wiley (Wiley-Interscience), New York.
Dasgupta, S. (1985). Hardware Description Languages in Microprogramming Systems. Computer 18 (2), 67-76.
Dasgupta, S. (1988a). “Computer Architecture: A Modern Synthesis, Volume I: Foundations.” John Wiley, New York.
Dasgupta, S. (1988b). “Computer Architecture: A Modern Synthesis, Volume II: Advanced Topics.” John Wiley, New York.
Dasgupta, S., and Aguero, U. (1987). On the Plausibility of Architectural Designs. Proc. 8th Intl. Conf. Comp. Hard. Description Lang. and Applications (CHDL 87) (M. R. Barbacci and C. J. Koomen, eds.), pp. 177-194. North-Holland, Amsterdam.
Dasgupta, S., and Shriver, B. D. (1985). Developments in Firmware Engineering. In “Advances in Computers,” Vol. 24 (M. C. Yovits, ed.), pp. 101-176. Academic Press, New York.
Dasgupta, S., and Wagner, A. (1984). The Use of Hoare Logic in the Verification of Horizontal Microprograms. Int. J. Comp. and Info. Sciences 13 (6), 461-490.
Dasgupta, S., Wilsey, P. A., and Heinanen, J. (1986). Axiomatic Specifications in Firmware Development Systems. IEEE Software 3 (4), 49-58.
Davidson, S. (1986). Progress in High Level Microprogramming. IEEE Software 3 (4), 18-26.
deBakker, J. (1980). “Mathematical Theory of Program Correctness.” Prentice-Hall International, Englewood-Cliffs, N.J.
deMillo, R., Lipton, R. J., and Perlis, A. (1979). Social Processes and Proofs of Theorems and Programs. Comm. ACM 22 (5), 271-280.
Dijkstra, E. W. (1972). Notes on Structured Programming. In O. J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, “Structured Programming.” Academic Press, New York.
Dijkstra, E. W. (1976). “A Discipline of Programming.” Prentice-Hall, Englewood-Cliffs, New Jersey.
Dixon, J. (1986). Artificial Intelligence and Design: A Mechanical Engineering View. Proc. 5th Nat. Conf. Artificial Intelligence (AAAI-86), Vol. 2, AAAI, 872-877.
DoD (1981). “The Programming Language Ada Reference Manual.” Lecture Notes in Computer Science, 106, Springer-Verlag, Berlin.
Encarnacao, J., and Schlechtendahl, E. G. (1983). “Computer Aided Design.” Springer-Verlag, Berlin.
Evans, B., Powell, J., and Talbot, T., eds. (1982). “Changing Design.” John Wiley, New York.
Floyd, R. W. (1967). Assigning Meaning to Programs. “Mathematical Aspects of Computer Science,” XIX. Amer. Math. Soc., Providence, Rhode Island.
Freeman, P. (1980a). The Context of Design. In “Software Design Techniques” (P. Freeman and A. I. Wasserman, eds.), pp. 2-5. IEEE, New York.
Freeman, P. (1980b). The Central Role of Design in Software Engineering: Implications for Research. In “Software Engineering” (H. Freeman and P. M. Lewis, II, eds.), pp. 121-132. Academic Press, New York.
Gero, J. S., ed. (1985). “Knowledge Engineering in Computer Aided Design.” North-Holland, Amsterdam.
Giloi, W. K., and Shriver, B. D., eds. (1985). “Methodologies for Computer Systems Design.” North-Holland, Amsterdam.
Gopalakrishnan, G. C., Smith, D. R., and Srivas, M. K. (1985). An Algebraic Approach to the Specification and Realization of VLSI Designs. Proc. 7th Int. Symp. Comp. Hard. Lang. and Applications (CHDL 85) (C. J. Koomen and T. Moto-oka, eds.), pp. 16-38. North-Holland, Amsterdam.
Gopalakrishnan, G. C., Srivas, M. K., and Smith, D. R. (1987). From Algebraic Specifications to Correct VLSI Circuits. In Borrione (1987), 197-223.
Gould, S. J. (1977). “Ontogeny and Phylogeny.” Belknap Press of the Harvard University Press, Cambridge, Massachusetts.
Gries, D. G. (1981). “The Science of Programming.” Springer-Verlag, Berlin.
Hanson, N. R. (1972). “Patterns of Discovery.” Cambridge University Press, Cambridge, England.
Harre, R. (1985). “The Philosophies of Science: An Introductory Survey.” Oxford University Press, Oxford, England.
Hayes-Roth, F., Waterman, D. A., and Lenat, D. B., eds. (1983). “Building Expert Systems.” Addison-Wesley, Reading, Massachusetts.
Hoare, C. A. R. (1969). An Axiomatic Approach to Computer Programming. Comm. ACM 12 (10), 576-580, 583.
Hoare, C. A. R. (1986). The Mathematics of Programming. Inaug. Lect., University of Oxford. Clarendon Press, Oxford, England.
Hoare, C. A. R. (1987). An Overview of Some Formal Methods for Program Design. Computer 20 (9), 85-91.
Hoare, C. A. R., and Wirth, N. (1973). An Axiomatic Definition of the Programming Language Pascal. Acta Informatica 2, 335-355.
Hong, S. J. (1986). Guest Editor’s Introduction (Special Issue on Expert Systems in Engineering). Computer 19 (7), 12-15.
Hopkins, W. C., Horton, M. J., and Arnold, C. S. (1985). Target-Independent High Level Microprogramming. Proc. 18th Ann. Work. on Microprog., pp. 137-144. IEEE Comp. Soc. Press, Los Angeles.
Horowitz, E., and Sahni, S. (1978). “Fundamentals of Computer Algorithms.” Computer Science Press, Rockville, MD.
Hubka, V. (1982). “Principles of Engineering Design.” Butterworth Scientific, London.
IEEE (1987). (Special Report on Good Design). IEEE Spectrum 24 (5).
Jaques, R., and Powell, J. A., eds. (1980). “Design: Science: Method.” Westbury House, Guildford, England.
Jones, J. C. (1963). A Method of Systematic Design. In “Conference on Design Methods” (J. C. Jones and D. Thornley, eds.), pp. 10-31. Pergamon, Oxford. Reprinted in Cross (1984).
Jones, J. C. (1980). “Design Methods: Seeds of Human Futures” (2nd Edition). John Wiley, New York.
Jones, J. C. (1984). “Essays in Design.” John Wiley, New York.
Katevenis, M. G. H. (1985). “Reduced Instruction Set Computer Architectures for VLSI.” MIT Press, Cambridge, Massachusetts.
Kogge, P. M. (1981). “The Architecture of Pipelined Computers.” McGraw-Hill, New York.
Kuhn, T. S. (1970). “The Structure of Scientific Revolutions” (2nd Edition). Univ. of Chicago Press, Chicago.
Lakatos, I., and Musgrave, A., eds. (1970). “Criticism and the Growth of Knowledge.” Cambridge University Press, Cambridge, England.
Langley, P. et al. (1987). “Scientific Discovery: Computational Explorations of the Creative Processes.” MIT Press, Cambridge, Massachusetts.
Latombe, J. C., ed. (1978). “Artificial Intelligence and Pattern Recognition in Computer Aided Design.” North-Holland, Amsterdam.
Laudan, L. (1977). “Progress and its Problems.” Univ. of Calif. Press, Berkeley.
Laudan, L. (1984). “Science and Values: The Aims of Science and their Role in Scientific Debate.” Univ. of Calif. Press, Berkeley.
Lawson, B. (1980). “How Designers Think: The Design Process Demystified.” Architectural Press, London.
Lehman, M. M. (1974). Programs, Cities and Students: Limits to Growth? Inaug. Lect., Imperial College of Science and Technology, London. Reprinted in “Programming Methodology” (D. Gries, ed.), pp. 42-69. Springer-Verlag, Berlin.
Lehman, M. M. (1980a). Programs, Life Cycles and Laws of Program Evolution. Proc. IEEE 68 (9), 1060-1076. Reprinted in Lehman and Belady (1985), 393-450.
Lehman, M. M. (1980b). On Understanding Laws, Evolution, and Conservation in Large Program Life Cycles. J. Syst. and Software 1 (3), 1980. Reprinted in Lehman and Belady (1985), 375-392.
Lehman, M. M. (1984). Program Evolution. Info. Proc. and M
Losee, J. (1980). “Historical Introduction to the Philosophy of Science.” Oxford University Press, Oxford, UK.
March, L., ed. (1976). “The Architecture of Form.” Cambridge University Press, Cambridge, England.
Maynard Smith, J. (1975). “The Theory of Evolution” (3rd Edition). Penguin Books, Harmondsworth, Middlesex, England.
Mead, C., and Conway, L. (1980). “Introduction to VLSI Systems.” Addison-Wesley, Reading, Massachusetts.
Medawar, P. B. (1963). Hypothesis and Imagination. Times Literary Supplement, Oct. 25. Reprinted in Medawar (1982), 115-135.
Medawar, P. B. (1967). “The Art of the Soluble.” Methuen, London.
Medawar, P. B. (1969). “Induction and Intuition in Scientific Thought.” Amer. Phil. Soc., Philadelphia. Reprinted in Medawar (1982), 73-114.
Medawar, P. B. (1982). “Pluto’s Republic.” Oxford University Press, Oxford, England.
Middendorf, W. H. (1986). “Design of Devices and Systems.” Marcel Dekker, New York.
Mills, H. D. (1975). The New Math of Computer Programming. Comm. ACM 18 (1), 43-48.
Mostow, J. (1985). Towards Better Models of Design Processes. AI Magazine, Spring, 44-57.
Mueller, R. A., and Varghese, J. (1985). Knowledge Based Code Selection in Retargetable Microcode Synthesis. IEEE Design and Test 2 (3), 44-55.
Newell, A., and Simon, H. A. (1972). “Human Problem Solving.” Prentice-Hall, Englewood-Cliffs, New Jersey.
Nickels, T., ed. (1980). “Scientific Discovery, Logic, and Rationality.” Boston Studies in the Philosophy of Science, Vol. 56. D. Reidel, Boston.
Popper, K. R. (1965). “Conjectures and Refutations: The Growth of Scientific Knowledge.” Harper and Row, New York.
Popper, K. R. (1968). “The Logic of Scientific Discovery.” Harper and Row, New York.
Popper, K. R. (1972). “Objective Knowledge: An Evolutionary Approach.” Clarendon Press, Oxford, England.
Ramamoorthy, C. V. et al. (1987). Issues in the Development of Large, Distributed, and Reliable Software. In “Advances in Computers,” Vol. 26 (M. C. Yovits, ed.), pp. 393-443. Academic Press, New York.
Rehak, D. R., Howard, H. C., and Sriram, D. (1985). Architecture of an Integrated Knowledge Based Environment for Structural Engineering Applications. In Gero (1985), 89-117.
Rittel, H. W., and Webber, M. M. (1973). Planning Problems are Wicked Problems. Policy Sciences 4, 155-169. Reprinted in Cross (1984), 135-144.
Rowe, P. G. (1987). “Design Thinking.” MIT Press, Cambridge, Massachusetts.
Ruse, M. (1986). “Taking Darwin Seriously.” Basil Blackwell, Oxford, England.
Sata, J., and Warman, E. A., eds. (1981). “Man-Machine Communication in CAD/CAM.” North-Holland, Amsterdam.
Scherlis, W. L., and Scott, D. S. (1983). First Steps Towards Inferential Programming. Information Processing 83 (Proc. IFIP Congress, R. E. A. Mason, ed.), pp. 199-212. North-Holland, Amsterdam.
Schon, D. A. (1983). “The Reflective Practitioner.” Basic Books, New York.
Shahdad, M. et al. (1985). VHSIC Hardware Description Languages. Computer 18 (2), 94-104.
Sheraga, R. J., and Gieser, J. L. (1983). Experiments in Automatic Microcode Generation. IEEE Trans. Comput. C-32 (6), 557-658.
Siewiorek, D. P., Bell, C. G., and Newell, A. (1982). “Computer Structures: Principles and Examples.” McGraw-Hill, New York.
Simon, H. A. (1973). The Structure of Ill Structured Problems. Artificial Intelligence 4, 181-200. Reprinted in Cross (1984), 145-165.
Simon, H. A. (1975). Style in Design. In “Spatial Synthesis in Computer Aided Building Design” (C. M. Eastman, ed.), John Wiley, New York.
Simon, H. A. (1976). “Administrative Behavior” (3rd Edition). The Free Press, New York.
Simon, H. A. (1981). “Science of the Artificial” (2nd Edition). MIT Press, Cambridge, Massachusetts.
Simon, H. A. (1982). “Models of Bounded Rationality,” Vol. 2. MIT Press, Cambridge, Massachusetts.
Smith, A. J. (1982). Cache Memories. ACM Comp. Surv. 14 (3), 473-529.
Sommerville, I. (1985). “Software Engineering” (2nd Edition), Chapter 1. Addison-Wesley, Reading, Massachusetts.
Spillers, W. R., ed. (1972). “Basic Questions of Design Theory.” North-Holland, Amsterdam.
Suppe, F., ed. (1977). “The Structure of Scientific Theories.” Univ. of Illinois Press, Urbana, Illinois.
Tanenbaum, A. S. (1984). “Structured Computer Organization.” Prentice-Hall, Englewood-Cliffs, New Jersey.
Thomas, D. E. (1985). Artificial Intelligence in Design and Test: Guest Editor’s Introduction. IEEE Design and Test 2 (4), 21.
Turner, R. (1984). “Logics for Artificial Intelligence.” John Wiley, New York.
Wegner, P. (1980). “Programming with Ada: Introduction by Means of Graduated Examples.” Prentice-Hall, Englewood-Cliffs, N.J.
Westerberg, A. W. (1981). Design Research: Both Theory and Strategy. Tech. Rept. DRC-06-24-81, Sept., Design Research Center, Carnegie-Mellon University, Pittsburgh, Pennsylvania.
Whewell, W. (1847). “The Philosophy of the Inductive Sciences” (2nd Edition). John W. Parker, London. 1967 Impression, Frank Cass, London.
Wiener, R., and Sincovec, R. (1984). “Software Engineering with Modula-2 and Ada,” Chapter 1. John Wiley, New York.
Wirth, N. (1971). Program Development by Stepwise Refinement. Comm. ACM 14 (4), 221-227.
Zimmerman, G. (1981). Computer Aided Synthesis of Digital Systems. In “Computer Hard. Description Languages and Applications” (CHDL 81) (M. Breuer and R. Hartenstein, eds.), pp. 331-348. North-Holland, Amsterdam.
Fuzzy Sets and Their Applications to Artificial Intelligence

ABRAHAM KANDEL*
MORDECHAY SCHNEIDER*

Computer Science Department and The Institute for Expert Systems and Robotics
Florida State University
Tallahassee, Florida

1. Introduction
2. Fuzzy Sets
   2.1 The Mathematics of Fuzzy Sets
   2.2 Fuzzy Sets vs. Probability
3. Fuzziness and Typicality Theory
   3.1 The Fuzzy Expected Value
   3.2 The Fuzzy Expected Interval
   3.3 The Weighted Fuzzy Expected Value
4. Applications of Fuzzy Set Theory to Expert Systems
   4.1 FESS - A Reusable Fuzzy Expert System
   4.2 COFESS - Cooperative Fuzzy Expert Systems
   4.3 Example of COFESS
   4.4 Fuzzy Relational Knowledge Base
5. Conclusion
References

1. Introduction
Although there is extensive literature on the theory of fuzzy sets and its applications, it is difficult for one who wishes to acquire basic familiarity with the theory to find recent papers that both provide an introduction and present an up-to-date exposition of some of the main applications of the theory, especially in artificial intelligence and knowledge engineering. In this paper we present some of the basic theories of fuzzy sets and demonstrate that some significant problems in the realm of uncertainty can be dealt with through the use of the theory. We also introduce a unifying point of view to the notion of inexactness, based on the theory of fuzzy sets introduced by Zadeh.

The term "fuzzy" in the sense used here seems to have been first introduced in Zadeh (1962). In that paper Zadeh called for a "mathematics of fuzzy or cloudy quantities which are not describable in terms of probability distributions." This paper was followed in 1965 by a technical exposition of just such a mathematics, now termed the theory of fuzzy sets (Zadeh, 1965). The reasons supporting the representation of inexact concepts by fuzzy sets have been given by Goguen (1967). Perhaps his most convincing argument is a representation theorem which says that any system satisfying certain uncertainty axioms is equivalent to a system of fuzzy sets. Since the axioms are intuitively plausible for the system of all inexact concepts, the theorem allows us to conclude that inexact concepts can be represented by fuzzy sets. The representation theorem is a precise mathematical result in the theory of categories, so that a very precise meaning is given to the concepts "system" and "represented."

Essentially, fuzziness is a type of imprecision that stems from a grouping of elements into classes that do not have sharply defined boundaries. Such classes, called fuzzy sets, arise, for example, whenever we describe ambiguity, vagueness, or ambivalence in mathematical models of empirical phenomena. Since certain aspects of reality always escape such models, the strictly binary (and even ternary) approach to the treatment of physical phenomena is not always adequate to describe systems in the real world, and the attributes of the system variables often emerge from an elusive fuzziness, a readjustment to context, or an effect of human imprecision. In many cases, however, even if the model is precise, computer simulations may require some kind of mathematical formulation to deal with imprecise descriptions.

The theory of fuzzy sets has as one of its aims the development of a methodology for the formulation and solution of problems that are too complex or too ill-defined to be susceptible to analysis by conventional techniques. Because of its unorthodoxy, it has been and will continue to be controversial for some time to come. Eventually, though, the theory of fuzzy sets is likely to be recognized as a natural development in the evolution of scientific thinking, and the skepticism about its usefulness will be viewed, in retrospect, as a manifestation of the human attachment to tradition and resistance to innovation.

In what follows, our attention is focused primarily on defining some of the basic notions within the conceptual framework of fuzzy set theory and exploring some of their elementary implications. The exposition is based on Kandel (1982) and Kandel (1986). The applications to AI in general and expert systems in particular are based on the assertion that fuzziness is an integral part of the decision-making process in these systems. More specifically, there are many different forms in which knowledge may be represented in an expert system. Several different forms have been used very successfully for certain types of applications, all including some sort of imprecision as part of their implementation.

* This research was supported in part by NSF grant IST 8405953 and by the Florida High Technology and Industrial Council grant UPN 85100316.

ADVANCES IN COMPUTERS, VOL. 28. Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-012128-X
2. Fuzzy Sets

2.1 The Mathematics of Fuzzy Sets

2.1.1 Grades of Membership
Conventional set theory was founded by G. Cantor (1845-1918). Set theory has been developed to establish and utilize any possible systematic relations among the items within the same set as well as among members of various sets. A set is defined as any number of definite, well-distinguished objects (elements of the set) grouped together. Within the framework of classical set theory a given object is either a member of the set or else completely excluded from it. For instance, if the universal set $U$ consists of all polygons, then the set of pentagons, denoted by $X$ and defined as

$$X = \{x \mid x \text{ has 5 sides}\},$$

is clearly a subset of $U$. Moreover, given any particular element $x$ in $U$, there are two mutually exclusive possibilities to characterize the relation between $x$ and $X$:

(i) $x \in X \Rightarrow \chi_X(x) = 1$,

or

(ii) $x \notin X \Rightarrow \chi_X(x) = 0$.

Although classical (nonfuzzy, rigid) set theory is mathematically sound, it is, in many real-life cases, not applicable to the reasoning used in artificial intelligence. For instance, consider the universal set $U$ of all people. Consider a subset $A$ of $U$ defined by

$$A = \{x \mid x \text{ is young}\}.$$

Clearly, set $A$ cannot be characterized as a classical set, since the whole notion of "being young" is logically fuzzy. When is a person "young"? Is it at all possible to assign the term "young" any deterministic meaning?

Since classical set theory was first introduced, it has greatly contributed to the development of a wide range of applications. In particular, it has been very successful in predominantly theoretical applications. Methods based on classical set theory have also been used successfully in areas where a high degree of precision, in terms of measurements, was necessary. However, when the conditions have not reflected the reality of the problem, the applications based on classical set theory have been expected to perform less favorably. As the deviation from the ideal conditions for applying classical set theory grows, so does the anticipated deterioration in the reliability of results derived from applications based on classical set theory.

The problems described above led to the development of Fuzzy Set Theory, first introduced by Lotfi Zadeh in 1962. Fuzzy set theory differs from conventional set theory in one crucial concept: it allows each element of a given set to belong to that set to some degree, in contrast to classical set theory, according to which each element either fully belongs to the set or is completely excluded from it. In other words, classical set theory represents a limited case of the more general fuzzy set theory. Elements in a fuzzy set $X$ possess membership values between 0 and 1, whereas in classical set theory their membership values are either 0 or 1. The degree to which an element belongs to a given set is called its Grade of Membership. In mathematical terms, if $X$ is a collection (set) of objects denoted by $x$, then a fuzzy set $F$ in $X$ is a set of ordered pairs,

$$F = \{(x, \chi_F(x)) \mid x \in X\},$$

where $\chi_F(x)$ is the membership function of $x$ in $F$, which maps $x$ to the membership space $M = [0, 1]$.
In other words, a fuzzy set can be formally defined as follows.

Definition 1 Let $U$ be a universe of discourse and $X \subset U$ be a fuzzy subset of $U$. The membership value of $x \in X$ is denoted $\chi_X(x) \in [0, 1]$. The fuzzy subset $X$ is denoted by $\{(x, \chi_X(x)) \mid x \in X\}$.

When the range of the membership space $M$ contains only the two points 0 and 1, the set becomes non-fuzzy and $\chi_F(x)$ becomes identical to the characteristic function of the non-fuzzy (rigid) set. Traditionally, the grade of membership 1 is assigned to those objects that fully and completely belong to $F$, while 0 is assigned to the objects that do not belong to $F$ at all.† Therefore, conventional set theory becomes a limited case of the more general fuzzy set theory, according to which the more an object $x$ belongs to $F$, the closer to 1 is its grade of membership. Since the membership of an element is measured with numbers in the interval $[0, 1]$, it is clear that fuzzy set theory contains classical set theory.

† In a normalized fuzzy set, the degree to which any given element belongs to the set is represented by any value between 0 and 1 (inclusive), with 0 indicating no membership, 1 indicating full membership, and any other number between 0 and 1 indicating an intermediate degree of membership in that set.

The numbers generated by the membership function serve one more purpose: they indicate to what degree an item belongs to the set. Thus, if we compare two elements, one with a grade of membership 0.75 and the other with 0.9, then it is logical to assume that the second element fits the definition of the characteristics of the set better. The utilization of a fuzzy set thus avoids the "all-or-nothing" syndrome of classical set theory.

Definition 2 The membership function $\chi_C(x)$ of the intersection $C = A \cap B$ is defined by

$$\chi_C(x) = \min\{\chi_A(x), \chi_B(x)\}, \quad x \in X.$$

Definition 3 The membership function $\chi_D(x)$ of the union $D = A \cup B$ is defined by

$$\chi_D(x) = \max\{\chi_A(x), \chi_B(x)\}, \quad x \in X.$$

Definition 4 The membership function of the complement of a fuzzy set $A$, $\chi_{\bar{A}}(x)$, is defined by

$$\chi_{\bar{A}}(x) = 1 - \chi_A(x), \quad x \in X.$$

Thus $\chi_A(x)$ is called the membership function or grade of membership of $x$ in $A$, which maps $X$ to the membership space $M$. (When $M$ contains only the two points 0 and 1, $A$ is non-fuzzy and $\chi_A(x)$ is identical to the characteristic function of the non-fuzzy set.) The range of the membership function is a subset of the nonnegative real numbers whose supremum is finite. Elements with a zero degree of membership are normally not listed.

2.1.2 Set Theoretic Operations
Let $A$ and $B$ be fuzzy subsets of $X$. We can now discuss some basic operations performed on fuzzy sets. In the notation used below, $\int_X \chi_A(x)/x$ denotes the fuzzy set that assigns the grade $\chi_A(x)$ to each element $x$ of $X$.

1. Two fuzzy sets, $A$ and $B$, are said to be equal ($A = B$) iff

$$\int_X \chi_A(x)/x = \int_X \chi_B(x)/x,$$

or, for all $x \in X$,

$$\chi_A(x) = \chi_B(x).$$

2. $A$ is contained in $B$ ($A \subseteq B$) iff

$$\int_X \chi_A(x)/x \le \int_X \chi_B(x)/x.$$

3. The union of fuzzy sets $A$ and $B$ is denoted by $A \cup B$ and is defined by

$$A \cup B \triangleq \int_X (\chi_A(x) \vee \chi_B(x))/x,$$

where $\vee$ is the symbol for max.

4. The intersection of $A$ and $B$ is denoted by $A \cap B$ and is defined by

$$A \cap B \triangleq \int_X (\chi_A(x) \wedge \chi_B(x))/x,$$

where $\wedge$ is the symbol for min. A justification of the choice of max and min was given by Bellman and Giertz (1973).

5. The complement of $A$ is denoted by $\bar{A}$ and is defined by

$$\bar{A} \triangleq \int_X (1 - \chi_A(x))/x.$$

Namely, for all $x \in X$,

$$\chi_{\bar{A}}(x) = 1 - \chi_A(x).$$

6. The product of $A$ and $B$ is denoted by $AB$ and is defined by

$$AB \triangleq \int_X \chi_A(x)\,\chi_B(x)/x.$$

Thus $A^{\alpha}$, where $\alpha$ is any positive number, should be interpreted as

$$A^{\alpha} \triangleq \int_X (\chi_A(x))^{\alpha}/x.$$

Similarly, if $\alpha$ is any nonnegative real number such that $\alpha \sup_x \chi_A(x) \le 1$, then

$$\alpha A \triangleq \int_X \alpha\,\chi_A(x)/x.$$

As a special case, the operation of concentration can be defined as

$$\mathrm{CON}(A) \triangleq A^{2},$$

while that of dilation can be expressed by

$$\mathrm{DIL}(A) \triangleq A^{0.5}.$$

7. The bounded sum of $A$ and $B$ is denoted by $A \oplus B$ and is defined by

$$A \oplus B \triangleq \int_X 1 \wedge (\chi_A(x) + \chi_B(x))/x,$$

where $+$ is the arithmetic sum.

8. The bounded difference of $A$ and $B$ is denoted by $A \ominus B$ and is defined by

$$A \ominus B \triangleq \int_X 0 \vee (\chi_A(x) - \chi_B(x))/x,$$

where $-$ is the arithmetic difference.

9. For fuzzy sets of numbers the left-square of $A$ is denoted by ${}^{2}A$ and is defined by

$${}^{2}A \triangleq \int_V \chi_A(x)/x^{2},$$

where $V = \{x^{2} \mid x \in X\}$. More generally,

$${}^{\alpha}A \triangleq \int_V \chi_A(x)/x^{\alpha},$$

where $V = \{x^{\alpha} \mid x \in X\}$.

10. If $A_1, \ldots, A_k$ are fuzzy subsets of $X$, and $w_1, \ldots, w_k$ are nonnegative "weights" adding up to unity, then a convex combination of $A_1, \ldots, A_k$ is a fuzzy set $A$ whose membership function is expressed by

$$\chi_A = w_1 \chi_{A_1} + \cdots + w_k \chi_{A_k} = \sum_{j=1}^{k} w_j \chi_{A_j},$$

where in this case $+$ ($\sum$) denotes the arithmetic sum.

11. If $A_1, \ldots, A_k$ are fuzzy subsets of $X_1, \ldots, X_k$, respectively, the Cartesian product of $A_1, \ldots, A_k$ is denoted by $A_1 \times \cdots \times A_k$ and is defined as a fuzzy subset of $X_1 \times \cdots \times X_k$ whose membership function is expressed by

$$\chi_{A_1 \times \cdots \times A_k}(x_1, \ldots, x_k) = \chi_{A_1}(x_1) \wedge \cdots \wedge \chi_{A_k}(x_k).$$

Equivalently,

$$A_1 \times \cdots \times A_k = \int_{X_1 \times \cdots \times X_k} (\chi_{A_1}(x_1) \wedge \cdots \wedge \chi_{A_k}(x_k))/(x_1, \ldots, x_k).$$
Example 1 We will now illustrate the above discussion with a specific example. Let $X = \{1, 2, 3, 4, 5, 6, 7\}$ and let

A = 0.7/3 + 1/5 + 0.8/6,
B = 0.9/3 + 1/4 + 0.6/6.

Then

A ∪ B = 0.9/3 + 1/4 + 1/5 + 0.8/6,
A ∩ B = 0.7/3 + 0.6/6,
Ā = 1/1 + 1/2 + 0.3/3 + 1/4 + 0.2/6 + 1/7,
AB = 0.63/3 + 0.48/6,
A² = 0.49/3 + 1/5 + 0.64/6,
0.5A = 0.35/3 + 0.5/5 + 0.4/6,
CON(B) = 0.81/3 + 1/4 + 0.36/6,
DIL(B) = 0.95/3 + 1/4 + 0.77/6,
A ⊕ B = 1/3 + 1/4 + 1/5 + 1/6,
B ⊖ A = 0.2/3 + 1/4,
²A = 0.7/9 + 1/25 + 0.8/36,
⁴A = 0.7/81 + 1/625 + 0.8/1296.
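The operations of this subsection are easy to check mechanically. The following is a minimal sketch, not part of the original exposition: it assumes a finite fuzzy set is stored as a Python dict mapping each element to its grade, with zero grades omitted. The helper names (`prune`, `union`, `bounded_sum`, and so on) are hypothetical, and rounding to four decimals is an added convenience to suppress floating-point noise.

```python
# Sketch: a finite fuzzy set is a dict {element: grade}; grade 0 is not stored.
X = range(1, 8)
A = {3: 0.7, 5: 1.0, 6: 0.8}
B = {3: 0.9, 4: 1.0, 6: 0.6}

def prune(F):
    """Round grades (assumption: 4 decimals) and drop elements with grade 0."""
    return {x: round(g, 4) for x, g in F.items() if round(g, 4) > 0}

def union(A, B):
    return prune({x: max(A.get(x, 0), B.get(x, 0)) for x in set(A) | set(B)})

def intersection(A, B):
    return prune({x: min(A.get(x, 0), B.get(x, 0)) for x in set(A) | set(B)})

def complement(A, universe):
    return prune({x: 1 - A.get(x, 0) for x in universe})

def product(A, B):
    return prune({x: A.get(x, 0) * B.get(x, 0) for x in set(A) | set(B)})

def power(A, alpha):
    """A**alpha; alpha = 2 gives CON(A), alpha = 0.5 gives DIL(A)."""
    return prune({x: g ** alpha for x, g in A.items()})

def bounded_sum(A, B):
    return prune({x: min(1, A.get(x, 0) + B.get(x, 0)) for x in set(A) | set(B)})

def bounded_difference(A, B):
    return prune({x: max(0, A.get(x, 0) - B.get(x, 0)) for x in set(A) | set(B)})

print(union(A, B))
print(complement(A, X))
print(power(B, 2))
```

Because the bounded difference clips negative values to 0, any element whose grade in the second operand exceeds its grade in the first simply drops out of the result.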
In many cases it is convenient to express the membership function of a fuzzy subset of the real line in terms of a standard function whose parameters may be adjusted to fit a specific membership function in an approximate fashion. Two such functions, the S-function and the $\pi$-function, are defined by

$$S(u; \alpha, \beta, \gamma) = \begin{cases} 0 & \text{for } u \le \alpha \\ 2\left(\dfrac{u - \alpha}{\gamma - \alpha}\right)^{2} & \text{for } \alpha \le u \le \beta \\ 1 - 2\left(\dfrac{u - \gamma}{\gamma - \alpha}\right)^{2} & \text{for } \beta \le u \le \gamma \\ 1 & \text{for } u \ge \gamma \end{cases}$$

$$\pi(u; \beta, \gamma) = \begin{cases} S(u; \gamma - \beta, \gamma - \beta/2, \gamma) & \text{for } u \le \gamma \\ 1 - S(u; \gamma, \gamma + \beta/2, \gamma + \beta) & \text{for } u \ge \gamma \end{cases}$$

In $S(u; \alpha, \beta, \gamma)$, the parameter $\beta$, $\beta = (\alpha + \gamma)/2$, is the crossover point. In $\pi(u; \beta, \gamma)$, $\beta$ is the bandwidth, that is, the separation between the crossover points of $\pi$, while $\gamma$ is the point at which $\pi$ is unity.

In some cases, the assumption that $\chi_A$ is a mapping from $X$ to $[0, 1]$ may be too restrictive, and it may be desirable to allow $\chi_A$ to take values in a lattice or, more generally, in a Boolean algebra. For most purposes, however, it is sufficient to deal with the first two of the following hierarchy of fuzzy sets.

Definition 5 A fuzzy subset $A$ of $X$ is of Type 1 if its membership function $\chi_A$ is a mapping from $X$ to $[0, 1]$; and $A$ is of Type $K$, $K = 2, 3, \ldots$, if $\chi_A$ is a mapping from $X$ to the set of fuzzy subsets of Type $K - 1$. For simplicity, it will always be understood that $A$ is of Type 1 if it is not specified to be of higher type.

Example 2 Let $U$ be the set of all nonnegative integers and $X$ be a fuzzy subset of $U$ labeled "small integers." Then $X$ is of Type 1 if the grade of membership of a generic element $x$ in $X$ is a number in the interval $[0, 1]$; for example, $X$ can be defined by an explicit formula for $\chi_X(x)$. On the other hand, $X$ is of Type 2 if for each $x$ in $U$, $\chi_X(x)$ is a fuzzy subset of $[0, 1]$ of Type 1; for example, for $x = 10$,

$$\chi_{\text{small integers}}(10) = \text{low},$$

where low is a fuzzy subset of $[0, 1]$ whose membership function is defined by, say,

$$\chi_{\text{low}}(v) = 1 - S(v; 0, 0.25, 0.5), \quad v \in [0, 1],$$

which implies that

$$\text{low} = \int_0^1 (1 - S(v; 0, 0.25, 0.5))/v.$$
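As an illustration of how such standard functions behave, here is a small sketch. It assumes the usual piecewise-quadratic form of Zadeh's S-function and the corresponding $\pi$-function built from it; the function names `s_function`, `pi_function` and `low` are hypothetical and chosen only for this example.

```python
def s_function(u, alpha, beta, gamma):
    """S-function sketch; beta is assumed to equal (alpha + gamma) / 2."""
    if u <= alpha:
        return 0.0
    if u <= beta:
        return 2 * ((u - alpha) / (gamma - alpha)) ** 2
    if u <= gamma:
        return 1 - 2 * ((u - gamma) / (gamma - alpha)) ** 2
    return 1.0

def pi_function(u, beta, gamma):
    """pi-function sketch: bandwidth beta, peak (value 1) at gamma."""
    if u <= gamma:
        return s_function(u, gamma - beta, gamma - beta / 2, gamma)
    return 1 - s_function(u, gamma, gamma + beta / 2, gamma + beta)

def low(v):
    """Membership function of 'low' from Example 2: 1 - S(v; 0, 0.25, 0.5)."""
    return 1 - s_function(v, 0, 0.25, 0.5)

print(low(0.0), low(0.25), low(0.5))
print(pi_function(4, 2, 5), pi_function(5, 2, 5), pi_function(6, 2, 5))
```

The second print evaluates the $\pi$-function at its two crossover points and at its peak, which shows the role of the bandwidth: the crossover points lie $\beta/2$ on either side of $\gamma$.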
In certain cases we may demand that the membership value be greater than some threshold $T \in [0, 1]$. The ordinary set of such elements is the T-cut (also called $\alpha$-cut) $X_T$ of $X$:

$$X_T = \{x \in X \mid \chi_X(x) \ge T\}.$$

The membership function of a fuzzy set $X$ can be expressed in terms of the characteristic functions of its T-cuts according to the formula

$$\chi_X(x) = \sup_{T \in (0, 1]} \min(T, \chi_{X_T}(x)),$$

where

$$\chi_{X_T}(x) = \begin{cases} 1 & \text{if } x \in X_T \\ 0 & \text{otherwise.} \end{cases}$$
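For a finite fuzzy set the T-cut and the recovery of grades from the cuts can be sketched in a few lines. This is an illustrative sketch only: the dict representation and the names `t_cut` and `grade_from_cuts` are assumptions, and the supremum is taken over the finitely many grades that actually occur, which is sufficient in the finite case.

```python
def t_cut(F, T):
    """Ordinary (crisp) set of elements whose grade is at least T."""
    return {x for x, g in F.items() if g >= T}

def grade_from_cuts(F, x):
    """Recover the grade of x as the largest min(T, indicator of x in the T-cut)."""
    levels = sorted(set(F.values()))
    return max(
        (min(T, 1.0 if x in t_cut(F, T) else 0.0) for T in levels),
        default=0.0,
    )

F = {"a": 0.2, "b": 0.7, "c": 1.0}
print(t_cut(F, 0.5))
print(grade_from_cuts(F, "b"))
```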
One of the basic ideas of fuzzy set theory, which provides a general extension of nonfuzzy mathematical concepts to fuzzy environments, is the extension principle. The extension principle, which is essential in fuzzy set theory, indicates that a function $f$ between two sets, namely $f: X_1 \to X_2$, can be extended to fuzzy subsets in the form of

$$f: \chi(X_1) \to \chi(X_2).$$

This is the basic identity that allows the domain of the definition of a mapping or a relation to be extended from points in $X$ to fuzzy subsets of $X$. More specifically, suppose that $f$ is a mapping from $X$ to $Y$ and $A$ is a fuzzy subset of $X$ expressed as

$$A = \chi_1/x_1 + \cdots + \chi_n/x_n.$$

Then the extension principle asserts that

$$f(A) = f(\chi_1/x_1 + \cdots + \chi_n/x_n) = \chi_1/f(x_1) + \cdots + \chi_n/f(x_n).$$

Thus the image of $A$ under $f$ can be deduced from the knowledge of the images of $x_1, \ldots, x_n$ under $f$. This can also be easily extended to n-ary functions.
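A short sketch of the extension principle for finite fuzzy sets follows. The dict representation and the name `extend` are assumptions made for illustration. One further assumption is labeled in the code: when two points share the same image, their grades are combined with max, in line with reading "+" as union.

```python
def extend(f, A):
    """Image of the fuzzy set A (dict element -> grade) under the mapping f."""
    image = {}
    for x, grade in A.items():
        y = f(x)
        # Assumption: grades of points with the same image are combined by max.
        image[y] = max(image.get(y, 0.0), grade)
    return image

A = {3: 0.7, 5: 1.0, 6: 0.8}
print(extend(lambda x: x ** 2, A))                         # squares the elements, keeps the grades
print(extend(lambda x: x * x, {-1: 0.5, 1: 0.8, 2: 1.0}))  # -1 and 1 collide at 1
```

Applied to the squaring map, `extend` yields the left-square operation on fuzzy sets of numbers: the elements are squared while the grades are left unchanged.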
2.2 Fuzzy Sets vs. Probability
In this section we touch on the relation between fuzzy set theory and the theory of probability. The main issues are (i) Is it true, as some claim, that the concept of a fuzzy set is merely a disguised form of subjective probability? (ii) Are there problems that can be solved more effectively by the use of fuzzy-set techniques than by classical probability-based methods? Basically we use Zadeh’s response to these questions (Zadeh, 1980): In essence, the theory of fuzzy sets is aimed at the development of a body of concepts and techniques for dealing with sources of uncertainty or imprecision that are nonstatistical in nature. Propositions such as ‘‘x is a small number” or ‘‘x is a number smaller than 9” convey no information concerning the probability distribution of the values of x. In this sense, the uncertainty associated with the proposition ‘‘x is a small number” is nonstatistical in nature. While some scientific problems fall entirely within the province of probability theory and some entirely within that of fuzzy set theory, in most cases of practical interest both theories must be used in combination to yield realistic solutions to problems in decision analysis under uncertainty. Ordinarily, imprecision and indeterminacy are considered to be statistical, random characteristics and are taken into account by the methods of probability theory. In real situations, a frequent source of imprecision is not only the presence of random variables, but the impossibility, in principle, of operating with exact data as a result of the complexity of the system, or the imprecision of the constraints and objectives. At the same time, classes of objects appear in the problems that do not have clear boundaries; the imprecision of such classes is expressed in the possibility that an element not only belongs or does not belong to a certain class, but that intermediate grades of membership are also possible. 
Intuitively, a similarity is felt between the concepts of fuzziness and probability. The problems in which they are used are similar and even identical. These are problems in which indeterminacy is encountered due to random factors, inexact knowledge, or the theoretical impossibility (or lack of necessity) of obtaining exact solutions. The similarity is also underscored by the fact that the intervals of variation of the membership grade of fuzzy sets and probability coincide. However, between the concepts of fuzziness and probability there are also essential differences. Probability is an objective characteristic; the conclusions of probability theory can, in general, be tested by experience. The membership grade is subjective, although it is natural to assign a lower membership grade to an event that, considered from the aspect of probability, would have a lower probability of occurrence. The fact that the assignment of a membership function of a fuzzy set is "nonstatistical" does not mean that we cannot use probability distribution functions in assigning membership functions. As a matter of fact, a careful examination of the variables of fuzzy sets reveals that they may be classified into two types: statistical and nonstatistical. The variable "magnitude of x" is an example of the former type. However, if one considers the "class of tall men," the "height of a man" can be considered to be a nonstatistical variable.

In the next section we discuss the relation of fuzziness and fuzzy statistics in the form of typicality theory, which is essential to the applicability of fuzzy set theory in artificial intelligence in general, and expert systems in particular.
3. Fuzziness and Typicality Theory

Given a population of people, one can ask "What is the typical age of the group?" or "What is the typical weight of the group?" In general, given a set of elements, we can ask questions about the typicality of the set. In this section we examine three types of typicality measures. The Fuzzy Expected Value (FEV) requires complete information about the set for its computation. The Fuzzy Expected Interval (FEI) represents the typical interval in cases where the information about the set is fuzzy or incomplete. Finally, we examine the Weighted Fuzzy Expected Value (WFEV) and show why in some cases it reflects the typical value of the set better than the FEV.

3.1 The Fuzzy Expected Value
Let $B$ be a Borel field ($\sigma$-algebra) of subsets of the real line $\Omega$. A set function $\mu(\cdot)$ defined on $B$ is called a fuzzy measure if it has the following properties:

1. $\mu(\emptyset) = 0$ ($\emptyset$ is the empty set);
2. $\mu(\Omega) = 1$;
3. If $\alpha, \beta \in B$ with $\alpha \subset \beta$, then $\mu(\alpha) \le \mu(\beta)$;
4. If $\{\alpha_j \mid 1 \le j < \infty\}$ is a monotone sequence, then

$$\lim_{j \to \infty} [\mu(\alpha_j)] = \mu[\lim_{j \to \infty} (\alpha_j)].$$

Let $\chi_A$ be a monotonic $B$-measurable function such that $\chi_A \in [0, 1]$. The Fuzzy Expected Value (FEV) of $\chi_A$ over the set $A$, with respect to the fuzzy measure $\mu(\cdot)$, is defined as

$$\mathrm{FEV}(\chi_A) = \sup_{T \in [0, 1]} \{\min[T, \mu(\xi_T)]\},$$

where $\xi_T = \{x \mid \chi_A(x) \ge T\}$.

Now, $\mu\{x \mid \chi_A(x) \ge T\} = f_A(T)$ is a function of the threshold $T$ which is a fuzzy measure. The actual calculation of $\mathrm{FEV}(\chi_A)$ consists of finding the intersection of the curves $T = f_A(T)$, which will be at a value $T = H$, so that $\mathrm{FEV}(\chi_A) = H \in [0, 1]$.

From the definition of the FEV it can be seen that the result of the calculation must lie within the interval $[0, 1]$. Both quantities are mapped to $[0, 1]$: the thresholds $T$ through the appropriate characteristic function, and the measure $\mu$ through the ratio of the population above a given threshold. The following example illustrates how the FEV can be computed.
Example 3 In order to calculate the fuzzy expected value of a certain distribution, it is necessary to utilize an appropriate characteristic function. Let $X$ be a set of OLD people. Then its characteristic function is given by

$$\chi(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x/100 & \text{if } 0 < x < 100 \\ 1 & \text{if } x \ge 100 \end{cases}$$

and let the distribution of the population be

10 people are 20 years old,
15 people are 23 years old,
15 people are 29 years old,
20 people are 33 years old,
25 people are 39 years old, and
15 people are 44 years old.

What is the typical age of the group described above?

The calculation of the typical age of the group requires mapping the ages of the population, and likewise its distribution, to the interval $[0, 1]$. Following the mappings we apply the definition of the FEV to find the fuzzy expected value.

Step 1. Using the characteristic function we can map the ages of the population to the interval $[0, 1]$. The result follows:

10 people are 20 years old → χ = 0.20,
15 people are 23 years old → χ = 0.23,
15 people are 29 years old → χ = 0.29,
20 people are 33 years old → χ = 0.33,
25 people are 39 years old → χ = 0.39,
15 people are 44 years old → χ = 0.44.

Step 2. Using the information about the distribution of the population we can find how many people are above each threshold. It is easy to see that

100 people are 20 years old or older,
90 people are 23 years old or older,
75 people are 29 years old or older,
60 people are 33 years old or older,
40 people are 39 years old or older, and
15 people are 44 years old or older.

Thus the final result of the mapping is

μ = 1.0, χ = 0.2
μ = 0.9, χ = 0.23
μ = 0.75, χ = 0.29
μ = 0.6, χ = 0.33
μ = 0.4, χ = 0.39
μ = 0.15, χ = 0.44

Step 3. Let $L$ be a list containing the results of $\min(\mu_i, \chi_i)$. Hence

L = {0.2, 0.23, 0.29, 0.33, 0.39, 0.15}.

Step 4. The final step in the evaluation of the fuzzy expected value consists of finding the highest value in $L$. Using $L$ we compute

max(L) = 0.39.

Thus the FEV is 0.39 and the typical age of the group is 39 years old.
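The four steps of Example 3 can be sketched as a short routine. This is an illustrative sketch rather than the authors' implementation: the names `fev` and `chi_old` and the representation of the population as (count, value) pairs are assumptions. The supremum over all thresholds is evaluated only at the grades that occur in the data, which suffices because the population ratio is constant between consecutive grades.

```python
def chi_old(age):
    """Characteristic function of OLD from Example 3: age/100 clipped to [0, 1]."""
    return min(max(age / 100, 0.0), 1.0)

def fev(groups, chi):
    """Fuzzy expected value of a discrete population.

    groups: list of (count, value) pairs; chi maps a value to a grade in [0, 1].
    """
    total = sum(count for count, _ in groups)
    ordered = sorted(groups, key=lambda g: chi(g[1]))
    remaining = total          # people whose grade is >= the current threshold
    best = 0.0
    for count, value in ordered:
        T = chi(value)
        mu = remaining / total
        best = max(best, min(T, mu))
        remaining -= count
    return best

population = [(10, 20), (15, 23), (15, 29), (20, 33), (25, 39), (15, 44)]
print(fev(population, chi_old))  # 0.39, i.e. a typical age of 39
```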
3.2 The Fuzzy Expected Interval

The concept of the fuzzy expected interval was developed in order to cope with incomplete (or fuzzy) information. The FEI must be interrelated with the FEV in such a way that the result of the evaluation will be either the FEV or the FEI, but in both cases it must have a meaningful mapping back to the original numbers. Theorems 1 through 6 are the basis for the development of the theory of the fuzzy expected intervals (Schneider and Kandel, 1987b).

Definition 6 Let $S$ and $R$ be two intervals, $S = \{s_1 \ldots s_n\}$ and $R = \{r_1 \ldots r_m\}$, such that each interval may contain a single element. Then:

1. $\max(S, R) = S$ if for every $s_i \in S$ there is $r_j \in R$ such that $s_i > r_j$.
2. $\min(S, R) = S$ if for every $s_i \in S$ there is $r_j \in R$ such that $s_i < r_j$.

NOTE: Obviously, $S$ or $R$ could be single-element intervals (i.e., $s_1 = s_n$ and $r_1 = r_m$).

Theorem 1. Let $S = \{s_1 \ldots s_n\}$ and $R = \{r_1 \ldots r_m\}$ be two intervals such that $R \cap S = \emptyset$. Then

$$\max(S, R) = \begin{cases} R & \text{if } r_1 > s_n \\ S & \text{if } s_1 > r_m. \end{cases}$$

Theorem 2. Let $S = \{s_1 \ldots s_n\}$ and $R = \{r_1 \ldots r_m\}$ be two intervals such that $R \cap S = \emptyset$. Then

$$\min(S, R) = \begin{cases} R & \text{if } r_m < s_1 \\ S & \text{if } s_n < r_1. \end{cases}$$

Theorem 3. Let $S = \{s_1 \ldots s_n\}$ and $R = \{r_1 \ldots r_m\}$ be two intervals such that $R \cap S \ne \emptyset$, $S \not\subset R$, and $R \not\subset S$. Then

$$\max(S, R) = \begin{cases} R & \text{if } r_m > s_n \\ S & \text{if } s_n > r_m. \end{cases}$$

Theorem 4. Let $S = \{s_1 \ldots s_n\}$ and $R = \{r_1 \ldots r_m\}$ be two intervals such that $R \cap S \ne \emptyset$, $S \not\subset R$, $R \not\subset S$, and $s_n > r_m$. Then

$$\min(S, R) = \begin{cases} R & \text{if } r_1 < s_1 \\ S & \text{if } s_1 < r_1. \end{cases}$$

Definition 7 Let $S = \{s_1 \ldots s_n\}$ and $R = \{r_1 \ldots r_m\}$ be two intervals such that $R \subset S$. Then there is an interval $T$ such that

1. $\max(S, R, T) = T$, if for every $t_i \in T$ there is $s_j \in S$ such that $t_i \ge s_j$ and for every $t_i \in T$ there is $r_k \in R$ such that $t_i \ge r_k$.
2. $\min(S, R, T) = T$, if for every $t_i \in T$ there is $s_j \in S$ such that $t_i \le s_j$ and for every $t_i \in T$ there is $r_k \in R$ such that $t_i \le r_k$.

Theorem 5. Let $S = \{s_1 \ldots s_n\}$ and $R = \{r_1 \ldots r_m\}$ be two intervals such that $R \subseteq S$. Then

$$\max(S, R) = r_1 \ldots s_n.$$

Theorem 6. Let $S = \{s_1 \ldots s_n\}$ and $R = \{r_1 \ldots r_m\}$ be two intervals such that $R \subset S$. Then

$$\min(S, R) = s_1 \ldots r_m.$$

Definition 8 Let $\alpha$ and $\beta$ be intervals. Then we say that $\alpha$ is higher than $\beta$ if the upper bound of $\alpha$ is greater than the upper bound of $\beta$.

Example 4
Let the characteristic function of the variable OLD be

$$\chi(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x/100 & \text{if } 0 < x < 100 \\ 1 & \text{if } 100 \le x \le 120. \end{cases}$$

Let the distribution of the population be

30 people are of the age ranging from 10 to 20,
20 people are of the age ranging from 40 to 50,
30 people are of the age ranging from 50 to 70,
20 people are of the age ranging from 80 to 90.

Using the definition of the FEV and the characteristic function, we compute $\mu$ and $\chi$:

μ = 1.00, χ = 0.1-0.2
μ = 0.70, χ = 0.4-0.5
μ = 0.50, χ = 0.5-0.7
μ = 0.20, χ = 0.8-0.9

Now, the mins of the pairs are

min(1.00, [0.1-0.2]) = [0.1-0.2]
min(0.70, [0.4-0.5]) = [0.4-0.5]
min(0.50, [0.5-0.7]) = 0.5
min(0.20, [0.8-0.9]) = 0.2.

Thus the ordered list of the MINs computed above is

L = {[0.1-0.2], 0.2, [0.4-0.5], 0.5}.

Now, according to the algorithm the maximum of $L$ is

max([0.1-0.2], 0.2, [0.4-0.5], 0.5) = 0.5.

From the example above, it can be seen that the evaluation of typicality values is possible in spite of vague information. However, we assumed complete knowledge about the distribution of the population. Next we extend the theory to handle incomplete information about the distribution of the population.
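The interval computation of Example 4 can be sketched compactly. The sketch below rests on an assumption that should be read as such: min and max of two intervals are taken endpoint by endpoint, which reproduces the results of Example 4 but is a condensed formulation rather than the case-by-case statement of the theorems. Intervals are (low, high) tuples, a single number being an interval with equal endpoints; the names `imin`, `imax` and `fei` are hypothetical.

```python
def imin(S, R):
    """Endpoint-wise minimum of two intervals (assumed formulation)."""
    return (min(S[0], R[0]), min(S[1], R[1]))

def imax(S, R):
    """Endpoint-wise maximum of two intervals (assumed formulation)."""
    return (max(S[0], R[0]), max(S[1], R[1]))

def fei(pairs):
    """Max over the list of min(mu_i, chi_i), where both are intervals."""
    result = None
    for mu, chi in pairs:
        m = imin(mu, chi)
        result = m if result is None else imax(result, m)
    return result

# (mu, chi) pairs of Example 4
example4 = [
    ((1.0, 1.0), (0.1, 0.2)),
    ((0.7, 0.7), (0.4, 0.5)),
    ((0.5, 0.5), (0.5, 0.7)),
    ((0.2, 0.2), (0.8, 0.9)),
]
print(fei(example4))  # (0.5, 0.5): the interval collapses to the single value 0.5
```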
3.2.1 Additions of Fuzzy Intervals

When the information about the distribution of the population is fuzzy, it is necessary to find an interval which will contain the unknown value. UB, the upper bound of $\mu_j$, is given by

$$UB_j = \frac{\sum_{i=j}^{n} \max(p_{i1}, p_{i2})}{\sum_{i=j}^{n} \max(p_{i1}, p_{i2}) + \sum_{i=1}^{j-1} \min(p_{i1}, p_{i2})},$$

where $p_{i1}$ is the lower bound of group $i$ and $p_{i2}$ is the upper bound of group $i$. LB, the lower bound of $\mu_j$, is given by

$$LB_j = \frac{\sum_{i=j}^{n} \min(p_{i1}, p_{i2})}{\sum_{i=j}^{n} \min(p_{i1}, p_{i2}) + \sum_{i=1}^{j-1} \max(p_{i1}, p_{i2})}.$$
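The two bounds can be computed directly from the group sizes. The following sketch assumes that each group is given as a (lower, upper) pair of population counts and that groups are listed in order of increasing membership grade; the function name `mu_bounds` and the 0-based index `j` are choices made for this illustration. The sample data are hypothetical group sizes of 13 to 14, 20 to 25, and exactly 20 people.

```python
def mu_bounds(counts, j):
    """Return (LB, UB) of the population ratio mu for group j (0-based).

    counts: list of (lower, upper) population bounds, one pair per group,
            ordered by increasing membership grade.
    """
    lows = [min(c) for c in counts]
    highs = [max(c) for c in counts]
    # UB: as many people as possible at or above group j, as few as possible below.
    ub_num = sum(highs[j:])
    ub = ub_num / (ub_num + sum(lows[:j]))
    # LB: as few people as possible at or above group j, as many as possible below.
    lb_num = sum(lows[j:])
    lb = lb_num / (lb_num + sum(highs[:j]))
    return lb, ub

counts = [(13, 14), (20, 25), (20, 20)]
for j in range(len(counts)):
    print(j, mu_bounds(counts, j))
```

For the middle group this gives LB = 40/54 and UB = 45/58, roughly 0.74 and 0.77 when truncated to two decimals.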
3.2.2 Building a Mapping Table

The function of the mapping table is to map fuzzy variables into corresponding values. The chosen values are subjective and domain dependent. The construction of the mapping table involves four steps:

1. Finding an appropriate range for the domain (it is logical to assume that the range of the variable OLD is between 0 and 120).
2. Finding the possible values for the variables in the range (in the variable PEOPLE we can have only positive integers).
3. Finding the adjectives that may describe objects in the domain ("almost," "more or less," etc.).
4. Performing the mappings of the adjectives to the corresponding values without violating the boundaries of the range of the domain and its possible values.

Example 5 Assume we have the following information:

almost 20 people
more or less 30 people.

Using the four steps described above we have

1. The range of the number of people is between 0 and $\infty$.
2. The possible values are all positive integers.
3. The adjectives are
   a. almost
   b. more or less
   c. over
   d. much more than
   e. etc.
4. Perform the mapping of the linguistic adjectives into intervals.

Each variable has its own mapping table. The name of the mapping table is the name of the variable associated with it. The mapping table may contain three parts:

1. List of adjectives that may be associated with the variable.
2. The lower bound: the lowest possible value a number associated with the variable and the adjective may accept.
3. The upper bound: the highest possible value a number associated with the variable and the adjective may accept.
Using some subjective judgements we can construct the following mapping table.

TABLE I. EXAMPLE OF MAPPING TABLE for the variable PEOPLE

Adjective          LB          UB
almost             x - 10%     x - 1
more or less       x - 10%     x + 10%
over               x + 1       x + 10%
much more than     2x          ∞
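A mapping table of this kind can be sketched as a small lookup structure. The entries below follow Table I as read from the source (for example, "almost x" runs from x - 10% to x - 1). Two things are assumptions of this sketch and not stated in the text: the names `MAPPING_PEOPLE` and `to_interval`, and the rounding rule (lower bounds rounded down, upper bounds rounded up to whole numbers).

```python
import math

# Each adjective maps to a pair of functions (lower bound, upper bound) of x.
MAPPING_PEOPLE = {
    "almost":         (lambda x: x - 0.10 * x, lambda x: x - 1),
    "more or less":   (lambda x: x - 0.10 * x, lambda x: x + 0.10 * x),
    "over":           (lambda x: x + 1,        lambda x: x + 0.10 * x),
    "much more than": (lambda x: 2 * x,        lambda x: math.inf),
}

def to_interval(adjective, x):
    """Map 'adjective x' to an integer interval (assumed floor/ceil rounding)."""
    lower, upper = MAPPING_PEOPLE[adjective]
    ub = upper(x)
    return math.floor(lower(x)), (ub if ub == math.inf else math.ceil(ub))

print(to_interval("almost", 15))        # (13, 14)
print(to_interval("more or less", 36))  # (32, 40)
```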
86
ABRAHAM KANDEL AND MORDECHAY SCHNEIDER
The next example summarizes the methods developed in this section to find the fuzzy expected interval. It also introduces the use of fuzzy expected intervals in fuzzy expert systems.
Example 6
Suppose the following rule is in the Knowledge Base:
IF the typical age of the group is more or less 36 THEN .... Suppose the user provides the following information: Almost I5 people are between the ages 20 and 25. 20 to 25 people are of age 35. 20 people are between the ages 40 and 50. Will the rule described above fire? To solve this problem we have to 1. Map the ages into corresponding values (using some characteristic function). 2. Interpret the adjective “almost” to some value (using some mapping table). 3. Find the relation between “almost” and the subject. 4. Find the typical age of this group. 5. Interpret the adjective “more or less” to some value (using some mapping table). 6. Find the relation between “more or less” and the subject. 7. Try to match the data (provided by the user) and the premise of the rule. 8. If there is a match then fire the rule.
Using Table I and the characteristic function $\chi(a) = a/100$ for an age $a$ (the mapping implied by the values in Step 1), we can check whether or not the rule above will fire.

Step 1. Find all the $\chi$s using the characteristic function:

Data                $\chi$
almost 15 people    0.2-0.25
20 to 25 people     0.35
20 people           0.4-0.5
Step 2. Use Table I to map adjectives into the corresponding values:

Data               $\chi$
13 to 14 people    0.2-0.25
20 to 25 people    0.35
20 people          0.4-0.5
Step 3. Find the $\mu$s:

$\chi$      $\mu$
0.2-0.25    1
0.35        0.74-0.77
0.4-0.5     0.33-0.37

Step 4. Find the MIN of each pair:

min(1, [0.2-0.25]) = [0.2-0.25]
min([0.74-0.77], 0.35) = 0.35
min([0.33-0.37], [0.4-0.5]) = [0.33-0.37].

Step 5. Form a list L which is the result from step 4:

L = {[0.2-0.25], 0.35, [0.33-0.37]}.

Step 6. Find the MAX of L:

max([0.2-0.25], 0.35) = 0.35
max(0.35, [0.33-0.37]) = [0.35-0.37].
Thus the result of the computation is [0.35-0.37]. Mapping back from $\chi$s to the corresponding ages results in

The expected range of ages of the above population is 35 to 37.

Thus, the typical age of the population provided to the expert system by the user is evaluated. The next step is to check if the rule will fire. Using Table I we can find that “more or less 36” is the range of ages from 32 to 40. Since the range of the population, provided by the user, is within the range of the rule, the premise becomes true and the rule will fire.
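The six steps of Example 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: intervals are represented as (low, high) tuples, the helper names `imin`, `imax` and `fuzzy_expected_interval` are hypothetical, and the MIN and MAX of two intervals are taken bound by bound, which is an assumption that happens to reproduce every intermediate result of the example. The text truncates its figures to two decimals, so the upper bound computed here (about 0.377) appears there as 0.37.

```python
# Intervals are (low, high) tuples; a crisp value v is written (v, v).
def imin(a, b):
    """Bound-by-bound minimum of two intervals (assumed semantics)."""
    return (min(a[0], b[0]), min(a[1], b[1]))

def imax(a, b):
    """Bound-by-bound maximum of two intervals (assumed semantics)."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def fuzzy_expected_interval(groups):
    """groups: list of (count_interval, chi_interval), sorted by increasing chi."""
    result = (0.0, 0.0)
    for j, (_, chi) in enumerate(groups):
        # A = people with chi at least chi_j, B = everybody else.
        a_lo = sum(n[0] for n, _ in groups[j:])
        a_hi = sum(n[1] for n, _ in groups[j:])
        b_lo = sum(n[0] for n, _ in groups[:j])
        b_hi = sum(n[1] for n, _ in groups[:j])
        # mu_j = A / (A + B) grows with A and shrinks with B (Step 3).
        mu = (a_lo / (a_lo + b_hi), a_hi / (a_hi + b_lo))
        # Steps 4 to 6: MIN of each pair, then a running MAX.
        result = imax(result, imin(mu, chi))
    return result

groups = [
    ((13, 14), (0.20, 0.25)),  # almost 15 people, ages 20 to 25
    ((20, 25), (0.35, 0.35)),  # 20 to 25 people, age 35
    ((20, 20), (0.40, 0.50)),  # 20 people, ages 40 to 50
]
low, high = fuzzy_expected_interval(groups)
ages = (100 * low, 100 * high)   # map chi back to ages
rule_range = (32, 40)            # "more or less 36" according to Table I
fires = rule_range[0] <= ages[0] and ages[1] <= rule_range[1]
print(ages, fires)
```

Because the computed age range lies inside the range of the rule premise, `fires` is true, in agreement with the conclusion of the example.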
3.3 The Weighted Fuzzy Expected Value

Our objective is to try to define the “most typical value” of a given membership function $\chi_A$. We shall refer to this value as the “weighted fuzzy expected value” (WFEV) and start by formulating the principles by which we derive our definition.

1. Population effectiveness: Let A be a fuzzy set with populations $n_1$ and $n_2$ and membership values $\chi_1$ and $\chi_2$ respectively. If $n_1 > n_2$, then WFEV should be “much closer” to $\chi_1$ than to $\chi_2$.
2. Distance from the mean value: Let A include the population $n_i$ with membership $\chi_i$. Then the effect of this population on the WFEV should be a decreasing monotonic function of $|\chi_i - \mathrm{WFEV}|$.

Definition 9 Let $\omega(x)$ be a nonnegative monotonically decreasing function defined over the interval [0, 1] and $\lambda$ a real number greater than 1. The solution $s$ of

$$s = \frac{\chi_1\,\omega(|\chi_1 - s|)\,n_1^{\lambda} + \cdots + \chi_m\,\omega(|\chi_m - s|)\,n_m^{\lambda}}{\omega(|\chi_1 - s|)\,n_1^{\lambda} + \cdots + \omega(|\chi_m - s|)\,n_m^{\lambda}}$$

is called the weighted fuzzy expected value of order $\lambda$ with the attached weight function $\omega$, and is denoted by WFEV($\omega$, $\lambda$).

The parameter $\lambda$ measures the population’s density effect on WFEV. We generally take $\omega(x) = e^{-\beta x}$, $\beta > 0$, and the parameters $\lambda$, $\beta$ are found sufficient for determining the “most typical value” of $\chi_A$. However, a procedure that, based on $\{\chi_i, n_i;\ 1 \le i \le m\}$, determines a priori the numerical values of $\lambda$ and $\beta$ would certainly be desirable. In most tests we successfully use $\lambda = 2$, $\beta = 1$.

The equation above, which is a nonlinear equation of the form $x = F(x)$, is solved by a standard iteration method: $x_{n+1} = F(x_n)$, starting for example with $x_0 = \mathrm{FEV}$. Sufficient accuracy is generally achieved after three to four iterations.

The above definition is clearly in agreement with the two previously discussed principles, and the next example demonstrates WFEV’s performance.

Example 7
Let A consist of the following four groups:

i    $\chi_i$    $n_i$
1    0.125       7
2    0.375       19
3    0.625       31
4    0.875       43

Here we have

FEV = 0.625,  MEAN = 0.65,  MEDIAN = 0.625,  WFEV = 0.745.
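The figures of Example 7 can be checked with a short script. The sketch below is an illustration, not the authors' program: it assumes the weight $\omega(x) = e^{-\beta x}$ with $\lambda = 2$ and $\beta = 1$ as suggested in the text, starts the iteration at the FEV, and uses the usual discrete form of the FEV (the largest value of min($\chi_j$, $\mu_j$), where $\mu_j$ is the fraction of the population with membership at least $\chi_j$). The function names and the stopping tolerance are hypothetical choices.

```python
import math

def fev(chis, ns):
    """Fuzzy expected value for memberships `chis` (ascending) with counts `ns`."""
    total = sum(ns)
    return max(min(chi, sum(ns[j:]) / total) for j, chi in enumerate(chis))

def wfev(chis, ns, lam=2.0, beta=1.0, tol=1e-6, max_iter=100):
    """Solve s = F(s) of Definition 9 by the iteration s_next = F(s)."""
    s = fev(chis, ns)  # starting point suggested in the text
    for _ in range(max_iter):
        # omega(|chi_i - s|) * n_i ** lambda for every group
        weights = [math.exp(-beta * abs(chi - s)) * n ** lam
                   for chi, n in zip(chis, ns)]
        s_next = sum(chi * w for chi, w in zip(chis, weights)) / sum(weights)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

chis = [0.125, 0.375, 0.625, 0.875]
ns = [7, 19, 31, 43]
mean = sum(c * n for c, n in zip(chis, ns)) / sum(ns)
print("FEV  =", fev(chis, ns))
print("MEAN =", mean)
print("WFEV =", round(wfev(chis, ns), 3))
```

The first iterates move quickly from 0.625 toward the value quoted in the example, which is in line with the remark that a few iterations are enough.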
The majority of the elements have grade of membership either 0.625 or 0.875. The “most typical value” is expected to fall somewhere in the middle, but to indicate also the existence of the “few” elements with memberships 0.375 and 0.125. WFEV is clearly the best choice.

Next, consider a continuous fuzzy model where the discrete population values $\{n_i,\ 1 \le i \le m\}$ are replaced by a population density function $p(x)$. The weighted fuzzy expected value is defined as the solution $s$ to

$$s = \frac{\int_0^1 x\,\omega(|x - s|)\,p^{\lambda}(x)\,dx}{\int_0^1 \omega(|x - s|)\,p^{\lambda}(x)\,dx}.$$

The mean value is given by

$$\mathrm{MEAN}(\chi) = \frac{\int_0^1 x\,p(x)\,dx}{\int_0^1 p(x)\,dx},$$

while the FEV is the unique solution to the equation

$$y = \frac{\int_y^1 p(t)\,dt}{\int_0^1 p(t)\,dt}.$$
Example 8

Let $p(x) = x$. Here we expect the most typical value to fall within the upper half of the interval [0, 1], which contains 75% of the population. Simple integration yields ($\lambda = 2$, $\beta = 1$):

MEAN = 0.667,  FEV = 0.618,  WFEV = 0.760.
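For readers who prefer to check Example 8 numerically rather than by hand integration, the following sketch evaluates the three continuous definitions for $p(x) = x$. It is an illustration under assumptions: integrals are approximated with a midpoint rule on a hypothetical grid of 2000 cells, the FEV equation is solved by bisection, and the WFEV equation by the same fixed-point iteration as in the discrete case with $\omega(x) = e^{-\beta x}$.

```python
import math

N = 2000  # number of midpoint-rule cells (illustrative choice)

def integrate(f, a=0.0, b=1.0):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / N
    return h * sum(f(a + (k + 0.5) * h) for k in range(N))

def p(x):
    return x  # population density of Example 8

def mean():
    return integrate(lambda x: x * p(x)) / integrate(p)

def fev(tol=1e-9):
    """Bisection for y = (integral of p over [y, 1]) / (integral of p over [0, 1])."""
    total = integrate(p)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid < integrate(p, mid, 1.0) / total:
            lo = mid  # the solution lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

def wfev(lam=2.0, beta=1.0, tol=1e-7, max_iter=100):
    """Fixed-point iteration for the continuous WFEV equation."""
    s = fev()
    for _ in range(max_iter):
        weight = lambda x: math.exp(-beta * abs(x - s)) * p(x) ** lam
        s_next = integrate(lambda x: x * weight(x)) / integrate(weight)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

print(round(mean(), 3), round(fev(), 3), round(wfev(), 3))
```

For this density the FEV equation reduces to $y = 1 - y^2$, whose positive root is $(\sqrt{5} - 1)/2 \approx 0.618$, so the bisection result can also be confirmed in closed form.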
In order to realize that WFEV is a “more typical value” than FEV, let us approximate the continuous model by a discrete model of four groups, concentrated around the following grades of membership:

$\chi_1 = 0.125$,  $\chi_2 = 0.375$,  $\chi_3 = 0.625$,  $\chi_4 = 0.875$.

The populations attached to these values total 100 elements and are given by

$$n_i = 100 \cdot \frac{\int_{x_i}^{x_{i+1}} p(x)\,dx}{\int_0^1 p(x)\,dx},$$

where $x_i = 0.25(i - 1)$, $1 \le i \le 5$. We thus get the four-subset case (Example 7)

i    $\chi_i$    $n_i$
1    0.125       7
2    0.375       19
3    0.625       31
4    0.875       43

for which WFEV performs the best.

4. Applications of Fuzzy Set Theory to Expert Systems
Expert Systems (sometimes also called Knowledge-Based Consultant Systems) are structured representations of data, experience, inferences, and rules that are implicit in the human expert. Expert systems draw conclusions from a store of task-specific knowledge, principally through logical or plausible inference, not by calculation. The objective of an expert system is to help the user choose among a limited set of options, actions, conclusions, or decisions, within a specific context, on the basis of information that is likely to be qualitative rather than quantitative. The creation of an expert system primarily revolves around the task of putting specific domain knowledge into the system. Expert systems support this task by explicitly separating the domain knowledge from the rest of the system into what is commonly called the knowledge base. The everyday usage of an expert system requires access by the end user to the knowledge base. This is accomplished by a software system called the inference engine. It interacts with the user through a user-interface subsystem. Each production rule in an expert system implements an autonomous chunk of expertise that can be developed and modified independently of other rules. When thrown together and fed to the inference engine, the set of rules behaves synergistically, yielding effects that are “greater than the sum of its parts.” Reflecting human expertise, much of the information in the knowledge base of a typical expert system is imprecise, incomplete, or not totally reliable. For this reason, the answer to a question or the advice rendered by an expert system is usually qualified with a “certainty factor” (CF), which gives the user an indication of the degree of confidence that the system has in its conclusion. 
To arrive at the certainty factor, existing expert systems such as MYCIN, for medical diagnosis of infectious blood diseases (Stanford University), PROSPECTOR, for location of mineral deposits (Stanford Research Institute), and others employ what are essentially probability-based methods.
However, since much of the uncertainty in the knowledge base of a typical expert system derives from the fuzziness and incompleteness of data rather than from its randomness, the computed values of the certainty factor are frequently lacking in reliability. By providing a single inferential system for dealing with the fuzziness, incompleteness, and randomness of information in the knowledge base, fuzzy logic furnishes a systematic basis for the computation of certainty factors in the form of fuzzy numbers. The numbers may be expressed as linguistic probabilities or fuzzy quantifiers, for example, “likely,” “very unlikely,” “almost certain,” “most,” “almost all,” and “frequently.” In this perspective, fuzzy set theory is an effective tool in the development of expert systems (Zadeh, 1983a; Gupta et al., 1984).

Although fuzziness is usually viewed as undesirable, the elasticity of fuzzy sets gives them a number of advantages over conventional sets. First, they avoid the rigidity of conventional mathematical reasoning and computer programming. Second, fuzzy sets simplify the task of translating between human reasoning, which is inherently elastic, and the rigid operation of digital computers. In particular, in common-sense reasoning, humans tend to use words rather than numbers to describe how systems behave. Finally, fuzzy sets allow computers to use the type of human knowledge called common sense. Common-sense knowledge exists mainly in the form of statements that are usually, but not always, true. Such a statement can be termed a disposition. A disposition contains an implicit fuzzy quantifier, such as “most,” “almost always,” “usually,” and so on.

In the existing expert systems, uncertainty is dealt with through a combination of predicate logic and probability-based methods. A serious shortcoming of these methods is that they are not capable of coming to grips with the pervasive fuzziness of information in the knowledge base and, as a result, are mostly ad hoc in nature.
An alternative approach to the management of uncertainty is based on the use of fuzzy logic, which is the logic underlying approximate or, equivalently, fuzzy reasoning. A feature of fuzzy logic which is of particular importance to the management of uncertainty in expert systems is that it provides a systematic framework for dealing with fuzzy quantifiers. In this way fuzzy logic subsumes both predicate logic and probability theory and makes it possible to deal with different types of uncertainty with a single conceptual framework.

4.1 Fess: A Reusable Fuzzy Expert System
For an expert system which is intended to be used in different applications, the combination of evidence method must be easily changed. This is accomplished in the Fess system by having the combination method(s) confined to a specific code unit. New methods may be plugged in without
affecting the rest of the system. This feature is one of the advantages provided by the language in which Fess was developed.

Each conclusion in Fess is seen as consisting of two parts: the positive part X is Y and the negative part X is not Y. Our certainty for a conclusion depends upon how much we believe in its component parts. A strong belief in the negative part with a small belief in the positive part of the conclusion will make us certain that the negative part of the conclusion may be considered true to some degree. For each conclusion the system maintains a measure of belief and a measure of disbelief. The measure of disbelief may be described as the measure of belief in the negative part of the conclusion or the negation of the conclusion.

The accrual method of evidence combination is the most generally applicable of the various methods available (Hall and Kandel, 1986). It was chosen for the initial implementation of Fess. The method of calculating the belief and disbelief measures will be detailed in the following. For each conclusion, denoted by $C_i$, there will be some relation(s), denoted by $R_j$, which provide the evidence necessary to decide whether to believe the positive or negative parts of the conclusion or that no decision can be made with the evidence. With each relation $R_j$, there will be an associated certainty factor, denoted by $CF_j(C_i)$, which indicates the strength of the evidence about conclusion $C_i$ provided by the relation. The certainty factors of the relations used on a conclusion $C_i$ accrue to provide us with our measure of belief in the following manner.

Definition 10 The measure of belief in a conclusion $C_i$ is

$$MB(C_i, t) = MB(C_i, t-1) + (1 - MB(C_i, t-1)) \cdot CF_j(C_i),$$

where $t$ is the current number of relations that have been used to determine $C_i$ and $MB(C_i, t-1)$ denotes the belief value before the current relation was used. $MB(C_i, -1) = 0$. For convenience allow $MB(C_i)$ to denote the value for the measure of belief function at the current $t$, the number of relations used to determine the belief in the conclusion. The measure of disbelief is calculated in a corresponding manner shown below.

Definition 11 The measure of disbelief in a conclusion $C_i$ is

$$MD(C_i, t) = MD(C_i, t-1) + (1 - MD(C_i, t-1)) \cdot CF_j(\text{not } C_i),$$

where $t$ is the current number of relations that have been used to determine $C_i$ and $MD(C_i, t-1)$ denotes the disbelief value before the current relation was used. $MD(C_i, -1) = 0$. Again for convenience we allow $MD(C_i)$ to denote the value for the measure of disbelief function at the current $t$, the number of relations used to determine the disbelief in the conclusion. It is important to note that the measure of belief
and measure of disbelief are calculated separately. Therefore it is possible that $MB(C_i) \ne 1 - MD(C_i)$. To obtain relations that act on both the positive and negative part of a conclusion may require a bit more effort by the knowledge engineer and engender a very small number of additional queries of the system user. However, the clarity of reasoning provided by such a system is well worth the tradeoff.

Given that measures of belief and disbelief in a conclusion have been calculated, we must come to an overall certainty for the conclusion. The overall certainty lies in the interval [0, 1]. The value 0.5 indicates neither belief nor disbelief; in the interval (0.5, 1.0] our belief increases as the certainty approaches one, and in the interval [0, 0.5) our disbelief increases as the certainty approaches zero. The overall certainty of a conclusion $C_i$ is calculated as shown in the following definition.

Definition 12 The overall certainty of a conclusion is defined as

$$OC(C_i) = 0.5 - 0.5 \cdot (MD(C_i) - MB(C_i)).$$
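Definitions 10 to 12 translate almost directly into code. The sketch below is a minimal illustration, not the Fess implementation: the function names are hypothetical, both measures start from zero before any relation has been used, and the certainty factors in the usage example are made-up numbers.

```python
def accrue(previous: float, cf: float) -> float:
    """One accrual step of Definitions 10 and 11: previous + (1 - previous) * cf."""
    return previous + (1.0 - previous) * cf

def overall_certainty(belief_cfs, disbelief_cfs):
    """Return (MB, MD, OC) after applying every relation in turn."""
    mb = 0.0
    for cf in belief_cfs:       # CF_j(C_i) of relations supporting C_i
        mb = accrue(mb, cf)
    md = 0.0
    for cf in disbelief_cfs:    # CF_j(not C_i) of relations against C_i
        md = accrue(md, cf)
    oc = 0.5 - 0.5 * (md - mb)  # Definition 12
    return mb, md, oc

# Two relations support the conclusion, one speaks against it (illustrative values).
mb, md, oc = overall_certainty([0.6, 0.5], [0.2])
print(mb, md, oc)
```

With these inputs the belief accrues to 0.6 + 0.4 * 0.5 = 0.8 while the disbelief stays at 0.2, so MB differs from 1 - MD only by coincidence of the numbers chosen, and the overall certainty is 0.5 - 0.5 * (0.2 - 0.8) = 0.8. With no evidence at all the function returns 0.5, the neutral point of the scale.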
The overall certainty defined above has the required characteristics, which were previously discussed. We now have the mechanisms for determining whether to believe or disbelieve a conclusion based on one or more pieces of evidence about it.

Some strategy about the number of relations to use to determine a conclusion and when to stop trying to determine a conclusion is necessary for an expert system implementation using the above combination of evidence scheme. All applicable relations may be applied to determine a conclusion. This is a straightforward scheme, but it is not very efficient. If there are many relations and we use some that give us a strong measure of belief and no measure of disbelief, should we continue processing? Unless we have an unusual application for the expert system, we do not. In fact, to continue will require a great deal of extra processing time, and an extremely slow expert system is not likely to be accepted. This is clearly an area in which different approaches may be suitable for distinct problem areas. Again in Fess the algorithm that governs the determination of a conclusion is confined to a specific section and can be easily changed.

The current algorithm used in Fess to determine a conclusion will be described in the following text. If more than one implication relation that acts on a conclusion exists, we will use a minimum of two relations to determine the conclusion. We alternate in using relations that act to increase our measure of belief and our measure of disbelief regarding the current conclusion. This process continues until we run out of relations that act to increase MB or MD, or we have tried two of each type of relation and the overall certainty has not changed by more than some value delta. If we find the overall certainty has changed by more than the delta, we will continue the process until we run out of relations or until attempts to
increase our MB and MD result in an overall certainty which has not been changed by delta. If there are no more relations providing evidence upon the MB or MD, but we have not used at least two relations or our overall certainty changed by more than delta, we will continue using relations to determine the overall certainty. This continues until two relations have been used, no more exist which act on this conclusion, or the overall certainty is not changed by more than delta.

The algorithm described above, while somewhat complex, provides for efficient search and conclusion determination. It attempts to ensure that both the measure of belief and disbelief are fully determined, so that an erroneous reasoning path is not followed.

4.2 COFESS: Cooperative Fuzzy Expert Systems
COFESS (Schneider and Kandel, 1987b) is an expert system which utilizes fuzzy set theory to recognize patterns. Its decision making is based on the concept of the FEV and the FEI (Schneider and Kandel, 1987a). COFESS utilizes three distinct fuzzy expert systems to recognize an object. The first expert system (COFES1) determines the next step in the recognition process. It employs a knowledge base (called PKB) and an inference engine for its decision making. The result of the evaluation is transferred to the second expert system.

The second fuzzy expert system (COFES2) exploits the concept of fuzzy relations to determine the relations between the features of the object to be recognized. This expert system uses two types of relations (these relations are stored in a knowledge base called RKB):

1. Relations that determine the local area where the examined feature can be found.
2. Relations that determine the physical relations between the examined feature and other features.

The coordinates of the feature’s local area are transferred to COFES1 and through it to the third expert system (COFES3). COFES3 utilizes various algorithms to perform the pattern matching between the digitized picture and the feature as described by strings stored in its knowledge base (denoted SKB). The result of the recognition process is returned to COFES1. COFES1 then transfers the knowledge to COFES2 for evaluation. This time COFES2 uses relations of the second type to determine if the feature that was recognized is really the feature that was expected to be recognized. This is repeated until COFES1 fires a concluding rule, which terminates the pattern recognition process.
The decision making in the recognition scheme involves numerical computations. They are related to the evaluation of the information accumulated during the process, in particular to the recognition’s reliability, which is represented by the FEV.

Consider a given object composed of distinct features to which COFESS is applied. COFES3 uses a simple pattern matching algorithm to recognize any given feature. It matches strings that are pre-defined in its knowledge base to those of the digitized picture. Each string that is recognized is associated with some certainty factor (CF), which is a number between 0 and 1. In order to define the feature’s certainty factor we utilize the concept of the FEV or FEI. Let A be a fuzzy set defined as

A = “perfect matches”

and let N be the number of strings participating in the pattern matching process related to a given feature. Let

$$N = n_1 + n_2 + \cdots + n_m,$$

where $n_i$ is the number of strings that participated in the matching process with the certainty factor $CF_i = \chi_i$. The certainty by which the feature is recognized is defined by

$$CF(\text{feature}) = \mathrm{FEV}(n_1, \chi_1, \ldots, n_m, \chi_m),$$

or

$$CF(\text{feature}) = \mathrm{FEI}(n_1, \chi_1, \ldots, n_m, \chi_m).$$

The CF of the feature together with the recognized coordinates are passed to COFES2. COFES2 utilizes relations of type 2 and the data provided to it by COFES3 to determine whether or not the feature that was recognized by COFES3 is really the feature that was intended to be recognized. This evaluation generates a second CF. The overall CF that determines how well a feature is matched against the knowledge bases is defined as

$$CF = \mathrm{FEV}(CF_{\text{from COFES3}}, CF_{\text{from COFES2}}),$$

or

$$CF = \mathrm{FEI}(CF_{\text{from COFES3}}, CF_{\text{from COFES2}}).$$

The certainty factor with which the object is recognized depends on the structure of the concluding rule and the logical relations between its clauses.
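A small sketch may help to make the two levels of certainty concrete. It assumes the usual discrete form of the FEV for the feature level, and, for the object level, it assumes that a concluding rule whose clauses are joined by "and" takes the minimum of the clause CFs. The text does not spell out that second rule; the minimum is used here only because it agrees with the result reported in the example of Section 4.3. All function names and the string-matching counts are hypothetical.

```python
def fev(counts, chis):
    """Discrete FEV: the largest min(chi_j, mu_j), where mu_j is the fraction
    of the population whose membership is at least chi_j."""
    pairs = sorted(zip(chis, counts))
    total = sum(counts)
    best, remaining = 0.0, total
    for chi, n in pairs:
        best = max(best, min(chi, remaining / total))
        remaining -= n
    return best

def feature_cf(counts, chis):
    """CF(feature) = FEV(n_1, chi_1, ..., n_m, chi_m)."""
    return fev(counts, chis)

def conjunctive_rule_cf(clause_cfs):
    """Assumed combination for an 'and' rule: the weakest clause decides."""
    return min(clause_cfs)

# Hypothetical matching outcome: 2 strings with CF 0.5, 3 with 0.7, 5 with 0.9.
print(feature_cf([2, 3, 5], [0.5, 0.7, 0.9]))
# CFs reported for b1..b5 in the example of Section 4.3.
print(conjunctive_rule_cf([0.83, 0.75, 0.86, 0.86, 0.75]))
```

In the hypothetical feature, 80% of the strings match with certainty at least 0.7 but only 50% reach 0.9, so the FEV settles on 0.7.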
4.3 Example of COFESS

In the following section we show how we can recognize general structures. Let A be a structure that can form five rectangles one next to the other and let B be a structure that can form five rectangles in a cross-like structure. The five rectangles of A will be called a1, a2, a3, a4 and a5 and the rectangles that form structure B will be referred to as b1, b2, b3, b4 and b5. Let the knowledge base of COFES1 (denoted by PKB) be

r 1   if not the a1 of the A is recognized then the a1 of the A is to recognize
r 2   if not the a2 of the A is recognized then the a2 of the A is to recognize
r 3   if not the a3 of the A is recognized then the a3 of the A is to recognize
r 4   if not the a4 of the A is recognized then the a4 of the A is to recognize
r 5   if not the a5 of the A is recognized then the a5 of the A is to recognize
r 6   if not the b1 of the B is recognized then the b1 of the B is to recognize
r 7   if not the b2 of the B is recognized then the b2 of the B is to recognize
r 8   if not the b3 of the B is recognized then the b3 of the B is to recognize
r 9   if not the b4 of the B is recognized then the b4 of the B is to recognize
r 10  if not the b5 of the B is recognized then the b5 of the B is to recognize
r 11  if the a1 of the A is recognized
      and the a2 of the A is recognized
      and the a3 of the A is recognized
      and the a4 of the A is recognized
      and the a5 of the A is recognized
      then the recognition of A is done
r 12  if the b1 of the B is recognized
      and the b2 of the B is recognized
      and the b3 of the B is recognized
      and the b4 of the B is recognized
      and the b5 of the B is recognized
      then the recognition of B is done

Let the knowledge base of COFES2 (denoted by RKB) be

a1 is left to a2
a2 is left to a3
a3 is left to a4
a4 is left to a5
a1 length is more-or-less 1 the length of a2
a2 length is more-or-less 1 the length of a3
a3 length is more-or-less 1 the length of a4
a4 length is more-or-less 1 the length of a5
a1 width is more-or-less 1 the width of a2
a2 width is more-or-less 1 the width of a3
a3 width is more-or-less 1 the width of a4
a4 width is more-or-less 1 the width of a5
b1 is above b2
b1 is above b3
b1 is above b4
b2 is left to b1
b2 is left to b3
b2 is left to b5
b5 is below b2
b5 is below b3
b5 is below b4
b4 is right to b3
b1 length is more-or-less 1 the length of b2
b2 length is more-or-less 1 the length of b3
b3 length is more-or-less 1 the length of b4
b4 length is more-or-less 1 the length of b5
b1 width is more-or-less 1 the width of b2
b2 width is more-or-less 1 the width of b3
b3 width is more-or-less 1 the width of b4
b4 width is more-or-less 1 the width of b5
Now, let the digitized picture be

[Digitized picture: dot-matrix image, not recoverable from the source.]

The recognition process follows:

COFES1: Request to COFES2 to initialize system.
COFES2: Request arrived from COFES1 to initialize the system. The new relations found:
        a1 left to a3
        a2 left to a4
        a3 left to a5
        b1 above b5
        b2 left to b4
        a1 left to a4
        a2 left to a5
        a1 left to a5.
COFES1: Request to COFES3 to initialize the system.
COFES3: Request arrived from COFES1 to initialize the system.
COFES1: Request to COFES2 to find area for a1.
COFES2: Request arrived from COFES1 to find area for a1. The area is found to be 1 1 80 80.
COFES1: Request to COFES3 to recognize a1 in area 1 1 80 80.
COFES3: Recognition of a1 is performed with CF = 0.83. The area found is 4 31 7 36.
COFES1: Request to COFES2 to check area for a1.
COFES2: The CF is found to be 0.83.

The process continues until all elements are recognized. The final result is:

a1 was recognized with CF = 0.83
a2 was recognized with CF = 0.80
b1 was recognized with CF = 0.83
b2 was recognized with CF = 0.75
b3 was recognized with CF = 0.86
b4 was recognized with CF = 0.86
b5 was recognized with CF = 0.75.

Thus only rule 12 can be fired (since other rules that were fired are not concluding rules) and the final conclusion is

CONCLUSION #1 (rule 12): The recognition of B is done with CF = 0.75.

4.4 Fuzzy Relational Knowledge Base
Much of human reasoning deals with imprecise, incomplete, or vague information. Therefore, there is a need for information systems that allow representation and manipulation of imprecise information in order to model human reasoning (Zemankova-Leech and Kandel, 1984). The Fuzzy Relational Knowledge Base (FRKB) model, based on the research in the fields of relational data bases and theories of fuzzy sets and possibility, is designed to satisfy the need for individualization and imprecise information processing. The FRKB model design addresses the following:

1. Representation of imprecise information.
2. Derivation of possibility/certainty measures of acceptance.
3. Linguistic approximations of fuzzy terms in the query language.
4. Development of fuzzy relational operators (IS, AS ... AS, GREATER, ...).
5. Processing of queries with fuzzy connectors and truth quantifiers.
6. Null value handling using the concept of the possibilistic expected value.
7. Modification of the fuzzy term definitions to suit the individual user.
Such a knowledge base, in the form of an FRKB or a modified version thereof, is the basic unit of the “soft” expert system (SES).
A fuzzy relational knowledge base is a collection of fuzzy, time-varying relations which may be characterized by tables or functions and manipulated by recognition (retrieval) algorithms or translation rules. The organization of FRKB can be divided into three parts:
1. Value knowledge base (VKB),
2. Explanatory knowledge base (EKB), and
3. Translation rules.

The VKB is used to store actual data values, whereas the EKB consists of a collection of relations or functions (similarity, proximity, general fuzzy relations, and fuzzy set definitions) that “explain” how to compute the degree of compliance of a given data value with the user’s query. This part of the knowledge base definition can be used to reflect the subjective knowledge profile of a user.

Data Types. The domains in the FRKB can be of the following types:

1. Discrete scalar set (e.g., COLOR = green, yellow, blue).
2. Discrete number sets, finite or infinite (limited by the maximum computer word size and precision).
3. The unit interval [0,1].

The attribute values are

1. Single scalars or numbers.
2. A sequence (list) of scalars or numbers.
3. A possibilistic distribution of scalar or numeric domain values.
4. A real number from the unit interval [0,1] (membership or possibility distribution function value).
5. Null value.
In general, if $A_i$ is an imprecise attribute with a domain $D_i$, then an attribute value can be a possibilistic distribution specified on $D_i$, denoted by $\pi_{A_i}$.

Relations defined in the EKB are used in translation of fuzzy propositions. In essence, they relax the dependence of relational algebra operators on the regular relational operators ($=, \ne, <, >, \ge, \le$). These include

(i) Similarity relation: Let $D_i$ be a scalar domain, $x, y \in D_i$. Then $s(x, y) \in [0,1]$ is a similarity relation with the following properties:

Reflexivity: $s(x, x) = 1$
Symmetry: $s(x, y) = s(y, x)$
Transitivity: $s(x, z) \ge \max_{y \in D_i} \{\min(s(x, y), s(y, z))\}$.

(ii) Proximity relation: Let $D_i$ be a numerical domain and $x, y, z \in D_i$; $p(x, y) \in [0, 1]$ is a proximity relation that is reflexive, symmetric, and with transitivity of the form

$$p(x, z) \ge \max_{y \in D_i} \big(p(x, y) \cdot p(y, z)\big).$$

The generally used form of the proximity relation is

$$p(x, y) = e^{-\beta |x - y|},$$

where $\beta > 0$.
This form assigns equal degrees of proximity to equally distant points. A general fuzzy relation (link) can be defined in either VKB or EKB. A link can be used to express relationships that are not necessarily reflexive and symmetrical, but may obey a specific transitivity, or transitivity improvement. Typical relations that can be represented by a link are friendship or influence among the members of a group. More complex queries in the FRKB system are evaluated by applying rules of

1. Fuzzy modifiers.
2. Fuzzy relational operations.
3. Composition.
4. Qualified propositions.
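The exponential proximity relation given earlier in this section satisfies the three stated properties because $|x - z| \le |x - y| + |y - z|$, so that $e^{-\beta|x - z|} \ge e^{-\beta|x - y|}\, e^{-\beta|y - z|}$. The short sketch below checks this numerically on a handful of sample points; the function name `proximity`, the sample domain and the choice $\beta = 1$ are illustrative assumptions.

```python
import itertools
import math

def proximity(x: float, y: float, beta: float = 1.0) -> float:
    """p(x, y) = exp(-beta * |x - y|), with beta > 0."""
    return math.exp(-beta * abs(x - y))

points = [0.0, 0.5, 1.0, 2.5, 4.0]  # illustrative numerical domain

for x, y, z in itertools.product(points, repeat=3):
    assert proximity(x, x) == 1.0              # reflexivity
    assert proximity(x, y) == proximity(y, x)  # symmetry
    # product-type transitivity, with a small allowance for rounding
    assert proximity(x, z) >= proximity(x, y) * proximity(y, z) - 1e-12

print(proximity(0.0, 1.0), proximity(3.0, 4.0))  # equal distance, equal proximity
```

A larger $\beta$ makes the proximity fall off faster with distance, which is one way a user-specific notion of "close" could be encoded in the EKB.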
The Relational Knowledge Base structure combined with the theory of fuzzy sets and possibility provides a solid theoretical foundation. The query language permits “natural-language-like” expressions that are easily understood by users and can be further developed to incorporate fuzzy inferences or production rules. The Value Knowledge Base appears to be an adequate schema for imprecise data representation. The Explanatory Knowledge Base provides the means of individualization, and may be used to extend the query vocabulary by defining new fuzzy sets in terms of previously defined sets. This feature can become very useful in compounding knowledge, and it can be projected that the underlying structure can be utilized in knowledge extrapolation, or learning. Hence, it can be concluded that the FRKB system can serve as the data base in the implementation of such “soft” expert systems.
5. Conclusion

Fuzzy set theory is not just another buzzword. In this paper we have tried to capture some of the relevant experience with imprecise systems that clearly fit our view of fuzzy structures.
The main reason for using fuzzy set theory in artificial intelligence and expert systems is that much of the information that is resident in knowledge-based systems is uncertain in nature. Uncertainty is a multifaceted concept that relates to the notions of imprecision, randomness, incompleteness, unreliability, vagueness, and fuzziness. One of the important advantages of fuzzy set theory is that, by providing a logical conceptual framework for uncertainty management, it makes it possible to deal uniformly with the representation of any inference in artificial intelligence and expert systems.

One such example is the concept of a reusable expert system which makes use of fuzzy reasoning techniques and a design methodology (Fess). A reusable expert system is one in which no domain knowledge is incorporated in the inference engine. Any necessary domain knowledge is contained solely in the knowledge base(s). Only the incorporation of a new knowledge base is necessary for the system to be applied to a new application area. We have shown to what extent these concepts succeeded and where they stand in relation to the field of expert systems.

While Fess is a reusable expert system, the question of what, if any, types of applications it is best suited for may be considered. All the tools and shells previously built have been found to be best suited to certain problem types. Fess is very useful for all types of classification and diagnosis problems. Its ability to forward and backward chain while providing many alternatives makes it useful for these problems. It should also prove quite useful for planning and forecasting problems. It will be extremely useful for problems in which direct interface with a user is not desired. This is due to the fact that it has a well-developed interface to the file system. It has facilities for interpreting information from files, which may be provided by any mechanism. The information may be the output of other computer programs or data.
These facilities provide for standard test cases to be rerun after a change is made to the knowledge base or for problems to be solved with no user interaction. If the domains in the knowledge base are set up to indicate that our information will come from files, the system will operate effectively without user interaction. This mode of operation has been extensively tested and is viable. This provides potential robotics and image analysis uses. Overall the system should be of use in many different problem domains. The use of fuzzy sets and logic in the system provides a viable uncertainty handling method. This provides the system with the ability to be successfully applied to problems that have uncertainty associated with them. Overall Fess is robust and capable of providing expertise on a diverse set of problems. In a conventional expert system, if the input does not perfectly match the situations described in the rules, deadlock occurs and no action is taken. But if a rule were written to match every possible situation, the system would have to thumb through too many to act very quickly.
FUZZY SETS AND THEIR APPLICATIONS TO ARTIFICIAL INTELLIGENCE
Using fuzzy logic, the expert system accepts vague data, compares it to all the rules in its memory simultaneously and assigns each rule a weight. The highest weights are given to the rules that best match the data. The decision is based on the combined recommendations of these rules. While many applications of fuzzy set theory are still in an early stage of development, it seems probable that in the next decade fuzzy logic will become routinely applied in many areas of artificial intelligence where communication with people or imitation of their thought processes is involved. This may help to bridge the gap between the analogic and flexible thinking of humans and the rigid framework of present computers.
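The weighting scheme described above can be illustrated with a minimal sketch. It is not the mechanism used in Fess; it assumes triangular membership functions, a single numeric input, and a weighted average as the way of combining recommendations. The rule names, fuzzy sets and recommendation values are hypothetical.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical rule base: each rule pairs a fuzzy condition on one input
# with a numeric recommendation. Names and numbers are illustrative only.
RULES = [
    {"name": "low", "fuzzy_set": (0, 10, 20), "recommendation": 0.0},
    {"name": "medium", "fuzzy_set": (10, 20, 30), "recommendation": 50.0},
    {"name": "high", "fuzzy_set": (20, 30, 40), "recommendation": 100.0},
]


def infer(x, rules=RULES):
    """Weigh every rule by how well it matches x, then combine them."""
    # Every rule is evaluated; the weight is the degree of match.
    weights = [triangular(x, *rule["fuzzy_set"]) for rule in rules]
    total = sum(weights)
    if total == 0.0:
        return None, weights  # no rule matches, even partially
    # Assumed combination step: weighted average of the recommendations.
    combined = sum(w * rule["recommendation"]
                   for w, rule in zip(weights, rules)) / total
    return combined, weights


decision, weights = infer(25)
print(decision, weights)  # 75.0 [0.0, 0.5, 0.5]
```

An input of 25 matches no rule perfectly, yet the sketch still produces a decision, because the two partially matching rules share the weight. A crisp rule base would have to either contain a rule for exactly this input or take no action.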
Parallel Architectures for Database Systems

A. R. HURSON
Computer Engineering Program, Department of Electrical Engineering, Pennsylvania State University, University Park, Pennsylvania

L. L. MILLER
Department of Computer Science, Iowa State University, Ames, Iowa

S. H. PAKZAD
Computer Engineering Program, Department of Electrical Engineering, Pennsylvania State University, University Park, Pennsylvania

M. H. EICH and B. SHIRAZI
Department of Computer Science, Southern Methodist University, Dallas, Texas
1. Introduction
2. Classification of Database Machines
    2.1 Rosenthal's Classification (1977)
    2.2 Berra's Classification (1977)
    2.3 Champine's Classification (1978)
    2.4 Hsiao's Classification (1980)
    2.5 Su's Classification (1980)
    2.6 Bray and Freeman's Classification (1979)
    2.7 Song's Classification (1983)
    2.8 Qadah's Classification (1985)
    2.9 Boral and Redfield's Classification (1985)
    2.10 A New Classification
3. Database Machines
    3.1 Low VLSI-Compatible Database Machines
    3.2 Semi VLSI-Compatible Database Machines
    3.3 High VLSI-Compatible Database Machines
    3.4 Related Efforts
4. Conclusion and Future Directions
References

ADVANCES IN COMPUTERS, VOL. 28. Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-012128-X

1. Introduction
Since the early 1970s the complexity of conventional database management systems has gradually increased with (i) the number and size of databases, (ii) the number and type of application programs, and (iii) the number and type of on-line users. For example, among the estimated 20,000 U.S. Government databases, the patent office has a database of 65 billion characters and a query retrieval of one item every 24 seconds (Hsiao, 1983). As another example, the Ballistic Missile Defense System requires a distributed and dynamic database that can maintain and upgrade itself in a complex and rapidly evolving battleground. The system should be able to perform about 60-120 million operations per second to verify, trace, classify and eliminate a threat (Fletcher, 1984). It is projected that by 1995 the Defense Mapping Agency will have a database of $10^{19}$ bits supporting 1000 on-line users needing $10^{14}$ bits each (Mahoney, 1981).

Conventional systems using typical software approaches fail to meet the requirements of the aforementioned applications. A software implementation of direct search on an IBM 370/158 can process approximately 100,000 characters per second. Even if this speed could be increased tenfold, it would still take about 18 hours to search the 65 billion characters in the U.S. patent office's data file system.

To avoid the need for an exhaustive search, most existing software systems are based on an indexing mechanism. An indexed system improves performance by means of a sophisticated software system and additional redundancy of data. Nevertheless, this has created some additional problems: the indexed structures require extra storage for the indices. It has been estimated that in a fully inverted file, the space needed to store the index ranges from 50% to 300% of the size of the database itself (Bird et al., 1977; Haskin, 1978). In addition, the use of a directory creates complexity in the search, update, and delete algorithms.
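The two estimates above, the exhaustive search time and the storage cost of a fully inverted file, can be checked with a short calculation. The sketch below uses only the figures quoted in the text; the function names are introduced here for illustration.

```python
PATENT_DB_CHARS = 65_000_000_000   # 65 billion characters (from the text)
IBM_370_158_RATE = 100_000         # characters searched per second (from the text)


def scan_hours(n_chars, chars_per_second):
    """Hours needed to scan n_chars sequentially at the given rate."""
    return n_chars / chars_per_second / 3600


def index_storage_range(db_chars, low=0.5, high=3.0):
    """Index size bounds for a fully inverted file (50% to 300% of the data)."""
    return db_chars * low, db_chars * high


baseline = scan_hours(PATENT_DB_CHARS, IBM_370_158_RATE)
tenfold = scan_hours(PATENT_DB_CHARS, 10 * IBM_370_158_RATE)
print(round(baseline, 1))  # 180.6 hours at the quoted speed
print(round(tenfold, 1))   # 18.1 hours even with a tenfold speedup

low, high = index_storage_range(PATENT_DB_CHARS)
print(low, high)  # 32500000000.0 195000000000.0 characters of index
```

At the quoted rate the scan takes more than a week, and a tenfold faster processor still needs about 18 hours. Indexing avoids the scan but adds between 32.5 and 195 billion characters of index to a 65 billion character database.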
In fact, indexing merely shifts the processing burden from searching to merging of pointer lists, and so it offers only a partial solution in terms of the efficiency of database operations.

The inefficiency of conventional systems in handling large databases can be attributed to the existing semantic gap, computation gap and size gap. These gaps stem from the fact that (i) conventional systems by their very nature are sequential machines, (ii) a conventional ALU is structured for numeric computations (e.g., the CRAY-1 is able to perform 250 million floating point operations per second, while it cannot handle more than 15 million characters per second), and finally (iii) the memory hierarchy has a passive role in the organization and hence there is a massive amount of data movement between processing elements and storage units. Therefore, there is a great need to design and develop new approaches specialized to the demands of a database environment.

Since the mid 1970s a great deal of effort has been directed towards the design of special-purpose architectures for efficient handling of large database systems, namely Data Base Machines (DBMs). Hsiao (1980) examined the motivation for the move to an architectural approach and Demurjian et al. (1986) looked at the issue of performance evaluation of such machines. Recent advances in technology have forced drastic changes in the architectural aspects of these machines. According to the current state of the art, it is possible to have a complexity of 10' transistors together on a chip, and it is anticipated that by 1990, this complexity will be improved by an order of magnitude. Nevertheless, with the exception of memory organization, the great potential of such a development has not yet been fully exploited. Such a gap between theory and practice is partially due to the lack of suitable architectures and algorithms for hardware implementation. The hardware algorithms should reduce the communication requirements, as well as the computation throughout the chip, by replication of a few basic operations in time or space. This implies simplicity, regularity and modularity in the design of an architecture. Recent database machines proposed in Bonuccelli et al. (1985), Eich (1987), Gajski et al. (1984), Garcia-Molina et al.
(1984), Hurson (1981), Kung and Lehman (1980), Lehman (1986), Oflazer and Ozkarahan (1980), and Song (1980) are attempts to design systems based on the constraints imposed by the technology. These systems are highly regular and simple in nature and hence suitable for VLSI implementation. However, the adaptability and performance of these models for large dynamic databases have to be studied in more detail.

In this paper, our primary goal is to examine the impact of current technology on the design of special-purpose database machines. To accomplish this end, we present a new classification scheme that incorporates the suitability of the design for use with current technology. We initiate the discussion by looking at what makes a good classification scheme and by briefly discussing several of the schemes that have appeared in the literature. Finally, we use our classification scheme to provide a survey of the DBMs that have appeared in the literature. The survey focuses on the adaptability of the designs to current technology. Earlier surveys of DBMs centered around what we will call "historical machines" (Ambarder, 1985; Qadah, 1985; Smith and Smith, 1979). These machines are of historical significance, but in light of current technology seem to be of little value as potential future designs. However, for the sake of completeness they will be briefly discussed. Our primary emphasis in this paper is on designs more suited to current technology. In the next section, we examine previously proposed classification techniques, discuss their weaknesses, and propose a new scheme which we feel is more appropriate for examining novel architectures. Section 3 covers the DBMs based on our proposed classification.
2. Classification of Database Machines
Since the late 1960s, different researchers have attempted to classify various proposed or designed computer architectures (Feng, 1976; Flynn, 1966; Handler, 1982). Flynn (1966) has classified the concurrent space according to the multiplicity of instruction and data streams. According to Flynn's classification, computer architectures fall into four groups: SISD (Single Instruction stream-Single Data stream), SIMD (Single Instruction stream-Multiple Data stream), MISD (Multiple Instruction stream-Single Data stream) and MIMD (Multiple Instruction stream-Multiple Data stream). Unfortunately, Flynn's classification does not address the interactions among the processing modules and the methods by which processing modules in a concurrent system are controlled. As a result one can classify a pipeline computer as an SISD machine, since both instructions and data are provided sequentially.

Handler (1982) has defined the concurrent space as a three-dimensional space t = (k, d, w) in which

k is the number of control units interpreting a program.
d is the number of arithmetic and logic units (ALUs) controlled by a control unit.
w is the word length or number of bits handled in one of the ALUs.
According to this classification a von Neumann machine with serial or parallel ALUs is represented as (1, 1, 1) or (1, 1, M), respectively. To represent pipelining at different levels (e.g., macro pipeline, instruction pipeline and arithmetic pipeline), and to illustrate the diversity, sequentiality and flexibility or adaptability of an organization, the above triplet has been extended by three variables (k', d', w') and three operators (+, *, v) where

k' represents the macro pipeline: the number of control units interpreting tasks of a program, where the data flow through them is sequential.
d' represents the instruction pipeline: the number of functional units managed by one control unit and working on one data stream.
w' represents the arithmetic pipeline: the number of stages.
+ represents diversity: the existence of more than one structure.
* represents sequentiality: for sequentially ordered structures.
v represents flexibility or adaptability: for reconfigurable organizations.
According to this extension to Handler's notation, the CDC 7600 and DAP are represented as

(15 * 1, 1 * 1, 12 * 1) * (1 * 1, 1 * 9, 60 * 1)

and

(1 * 1, 1 * 1, 32 * 1) * [(1 * 1, 128 * 1, 32 * 1) v (1 * 1, 4096 * 1, 1 * 1)],

respectively.
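Both notations discussed above are simple enough to express as data. The following sketch is illustrative only: the function `flynn_class`, the class `Level` and the helper `sequential` are names introduced here, and the sketch covers only the sequentiality operator (*), not the diversity (+) or flexibility (v) operators.

```python
from dataclasses import dataclass


def flynn_class(instruction_streams, data_streams):
    """Flynn's group for the given numbers of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


@dataclass(frozen=True)
class Level:
    """One extended triple (k * k', d * d', w * w')."""
    k: int        # control units interpreting a program
    k_prime: int  # macro pipeline: control units working in sequence
    d: int        # ALUs controlled by one control unit
    d_prime: int  # instruction pipeline: functional units per control unit
    w: int        # word length in bits
    w_prime: int  # arithmetic pipeline: number of stages

    def __str__(self):
        return (f"({self.k} * {self.k_prime}, "
                f"{self.d} * {self.d_prime}, "
                f"{self.w} * {self.w_prime})")


def sequential(*levels):
    """Join sequentially ordered structures with the * operator."""
    return " * ".join(str(level) for level in levels)


print(flynn_class(1, 1))    # SISD
print(flynn_class(1, 64))   # SIMD

# The CDC 7600 descriptor quoted in the text.
cdc_7600 = sequential(Level(15, 1, 1, 1, 12, 1), Level(1, 1, 1, 9, 60, 1))
print(cdc_7600)  # (15 * 1, 1 * 1, 12 * 1) * (1 * 1, 1 * 9, 60 * 1)
```

The Flynn function takes only two counts and therefore cannot tell a pipelined machine from a strictly sequential one, which is the weakness noted above. The extended triple records the pipeline depths explicitly.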
These classifications suffer from the fact that either they do not uniquely identify a specific organization, or they cannot thoroughly determine the interrelationships among different modules in an organization. In general a classification scheme should

(i) Categorize all existing as well as foreseeable computer designs.
(ii) Differentiate essential processing elements.
(iii) Assign an architecture to a unique class.
As one can see, these efforts are directed to (i) generalize and identify the characteristics of different designs, (ii) formulate a systematic mechanism by which different designs can be analyzed and compared against each other, and (iii) define a systematic mechanism to transform the solutions from one design to other designs.

The research on design and development of database machines during the past two decades has resulted in a large body of database machines with different architectures, characteristics, specifications, and target applications. These database machines are interrelated in many ways and share common characteristics. Through a classification and taxonomy of database machines, one can provide grounds for a basic understanding and a unified view of the body of research on database machines. Thus, by knowing the class to which a particular machine belongs, one can make intelligent decisions and deductions with regard to that machine without knowing its specific characteristics. Within this framework several classifications of database machines along with their general characteristics are discussed.

Hsiao (1980) classified databases into two general categories, namely formatted and unformatted structures. Formatted databases are mainly time-variant entities and are subject to extensive alteration as well as search operations. Unformatted databases (bibliographic or full text) are archival in nature and are processed by searching for a pattern or a combination of patterns. As a result, operations on the formatted databases are based on the contents of the attribute values of the records, while in the unformatted databases the patterns are unpredictable combinations of terms and words. The major theme of this paper centers around the formatted databases. Hence, we use the terms database and database machines to refer to the formatted database as the underlying data structure. The literature has proposed several classifications of database machines, which are reviewed in the following subsections.

2.1 Rosenthal's Classification (1977)
This classification categorizes the database machines according to their functions in a computing system into three groups: Smart peripheral systems, large backend, and distributed network data node (Rosenthal, 1977). A smart peripheral system at the low end is an intelligent controller for conventional rotating storage, which mainly moves some of the access method functions from the mainframe to the controller. At the high end it represents a controller capable of handling database operations at the secondary storage level. Such an architecture can be organized to perform database operations on a unit of data (i.e., data on the fly) or on several units of data (i.e., logic per track). This concept is a solution for the data communication problem, which eliminates one aspect of the 90-10% rule. A large backend system is a semi-autonomous special-purpose unit, which is closely coupled to a general-purpose host machine, and has access to the databases. Its function is to provide database management services for the host machine. Higher performance due to overlapping between operations in the backend and frontend machines, reducing the semantic gap due to the hardware implementation of the database primitive operations at the backend module, and reducing data transportation are the major thrusts of this architecture. However, one has to pay a penalty for lower reliability and performance due to the added processing module(s) and inter-communication among them. A distributed network data node is a loosely coupled autonomous processor which is an element in a network consisting of both general- and special-purpose processors. This topology is an extension of the backend approach where a database processor is shared with a group of processors in the network. Such an architecture bears the advantages of the backend design with a better utilization of the database processor. 
However, it enforces a complicated intercommunication protocol among the network nodes and a more complex and sophisticated mechanism to distribute and share the processing power of the database processor among different network elements.
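The data-transportation argument behind the smart peripheral ("data on the fly") organization can be made concrete with a small sketch. It is a software analogy only, under the assumption that records are dictionaries grouped into tracks; the function names and the sample data are hypothetical.

```python
def host_side_search(tracks, predicate):
    """Conventional path: move every record to the host, then filter there."""
    moved = [record for track in tracks for record in track]
    hits = [record for record in moved if predicate(record)]
    return hits, len(moved)  # (qualifying records, records moved to host)


def on_the_fly_search(tracks, predicate):
    """Smart peripheral: filter as each track streams past the controller."""
    hits = [record for track in tracks for record in track if predicate(record)]
    return hits, len(hits)  # only qualifying records leave the storage level


tracks = [
    [{"id": 1, "dept": "A"}, {"id": 2, "dept": "B"}],
    [{"id": 3, "dept": "A"}, {"id": 4, "dept": "C"}],
]


def in_dept_a(record):
    return record["dept"] == "A"


print(host_side_search(tracks, in_dept_a)[1])   # 4 records moved
print(on_the_fly_search(tracks, in_dept_a)[1])  # 2 records moved
```

Both functions return the same qualifying records; they differ only in how many records cross the channel between storage and host, which is the quantity the smart peripheral and backend designs aim to reduce.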
2.2 Berra's Classification (1977)
Berra has classified the research on database machines into backend systems, logic in memory, and large associative processors (Berra, 1977). A backend system in this classification has the same configuration and characteristics as discussed by Rosenthal. A logic-in-memory system essentially matches the "high end" of the smart peripheral system class in the previous classification. It is based on the logic-per-track concept proposed by Slotnick (1970). A large associative processor performs database operations in associative fashion, acting as a backend processor or as a stand-alone unit. High performance, flexibility, and elimination of the name mapping resolution problem (due to the embedded parallelism in the architecture and content addressability of the data) are major advantages of this approach. However, the existing size gap between the size of databases and the capacity of associative memories enforces the use of a partitioned data set mechanism. In addition, high-speed associative processing requires parallel I/O operations and a sophisticated buffering scheme between back-up storage and the associative processor.
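The partitioned data set mechanism mentioned above can be sketched as follows. This is a software analogy under stated assumptions: `capacity` stands for the number of records the associative memory can hold at once, the comparison inside one partition stands in for a single parallel hardware step, and all names are introduced here for illustration.

```python
def associative_search(records, key, value, capacity):
    """Content-addressed search over a data set larger than the memory.

    Returns the matching records and the number of partition loads needed.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    matches, loads = [], 0
    for start in range(0, len(records), capacity):
        # One load of the associative memory from back-up storage.
        partition = records[start:start + capacity]
        loads += 1
        # In hardware all words of the partition are compared at once;
        # the generator below models that single associative step.
        matches.extend(r for r in partition if r.get(key) == value)
    return matches, loads


records = [{"id": i, "color": "red" if i % 3 == 0 else "blue"} for i in range(10)]
found, loads = associative_search(records, "color", "red", capacity=4)
print([r["id"] for r in found], loads)  # [0, 3, 6, 9] 3
```

The search itself needs no index or address computation, but the number of loads grows with the ratio of database size to memory capacity, which is why the text stresses parallel I/O and buffering between back-up storage and the associative processor.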
2.3 Champine's Classification (1978)
According to this classification, database machines are grouped as backend processor for a host, intelligent peripheral control unit, network node, and storage hierarchy (Champine, 1978). The first three groups have the same definitions as the three groups discussed by Rosenthal. Storage hierarchy is based on the old concept of memory hierarchy and virtual memory, which has been proposed to overcome the existing size gap and access gap found in conventional systems. It should be noted that the efficiency of this approach depends heavily on locality of references in the database, which can be achieved at the expense of higher data dependence (i.e., partitioned data set).

2.4 Hsiao's Classification (1980)
Hsiao (1980) has classified database machines into three groups, namely the cellular logic approach, the associative array approach, and the functionally specialized approach. The cellular logic approach falls at the high end of the intelligent controller systems category. In this approach, a cell is the basic building block with some storage capacity and ability to perform database operations on its data set. The associative array approach is the same as the large associative processor discussed before. The functionally specialized approach is a cluster of specialized units, each with considerably different processing speeds and memory capacity requirements. This approach allows the construction of a relatively well balanced computer to overcome the existing bottlenecks by providing each component with the right amount of processing power and memory capacity.

2.5 Su's Classification (1980)
This classification groups database machines into four classes: cellular systems, high-speed associative systems, integrated database machines, and backend computers (Su et al., 1980). The first three are similar to Hsiao's definition and the last one is the same as the backend systems in previous classifications.

These classifications are similar in that for the most part they are based on how the database machines in a computing environment will be used, but they reveal nothing about the characteristics or architecture of such machines. Bray and Freeman (1979) have proposed a two-dimensional database space based on the degree of parallelism and the location where the data is searched.

2.6 Bray and Freeman's Classification (1979)
According to this classification database machines are grouped into five categories: single processor indirect search, single processor direct search, multiple processor direct search, multiple processor indirect search, and multiple processor combined search (Bray and Freeman, 1979). Direct search processing implies the ability to search the database at the secondary storage level, while indirect search represents the fact that data needs to be transferred to an intermediate storage medium before the search can be conducted. Single processor indirect search (SPIS) represents the conventional von Neumann type architecture with no degree of parallelism. Naturally, such an architecture bears all the aforementioned deficiencies in handling large databases. Single processor direct search (SPDS) represents a conventional system enhanced by searching capability at the secondary storage. As a result, only the desired records or their specified parts are sent to the host. Multiple processor direct/indirect search (MPDS/MPIS) machines represent parallel versions of SPDS/SPIS organizations. A multiple processor combined search (MPCS) machine is a combination of the MPDS and MPIS organizations, in which search is performed on the data loaded into intermediate storage, while multiple processing units are assigned to the blocks of intermediate storage. It is clear that this classification is general enough to cover the previous classifications. Table I summarizes the relationships between the classifications.

2.7 Song's Classification (1983)
Song (1983) has defined database machines as computer systems that are enhanced by special-purpose logic for handling database operations. With such a view, he has classified these machines according to three parameters:

(i) The place where hardware logic is applied. This could be either at (close to) secondary storage or at primary memory.
(ii) Allocation of logic to storage unit. This could be static or dynamic. Naturally, a dynamic allocation offers better resource utilization.
(iii) Degree of logic distribution, which defines the number of storage elements associated with each processing unit. This parameter represents the degree of parallelism, and hence it directly affects the performance.

TABLE I
RELATIONSHIPS AMONG DIFFERENT CLASSIFICATIONS OF DATABASE MACHINES

Bray and Freeman (1979) class: SPDS
    Rosenthal (1977): Smart peripheral system
    Berra (1977): -
    Champine (1978): Intelligent peripheral control unit
    Su et al. (1980): -
    Hsiao (1980): -
    Song (1983): Logic at secondary storage with static allocation (sequential operation)
    Qadah (1985): SOSD with relation indexing, on-disk search

Bray and Freeman (1979) class: SPIS
    Rosenthal (1977): Large backend processor
    Berra (1977): Backend system
    Champine (1978): Backend processor
    Su et al. (1980): -
    Hsiao (1980): -
    Song (1983): Logic at primary storage with static allocation (sequential operation)
    Qadah (1985): SOSD with relation or page indexing, off-disk search

Bray and Freeman (1979) class: MPDS
    Rosenthal (1977): Smart peripheral system
    Berra (1977): Logic in memory
    Champine (1978): Intelligent peripheral control unit
    Su et al. (1980): Cellular logic
    Hsiao (1980): Cellular logic
    Song (1983): Logic at secondary storage with static/dynamic allocation (parallel operation)
    Qadah (1985): MOMD/SOMD with relation indexing, on-disk search

Bray and Freeman (1979) class: MPIS
    Rosenthal (1977): Large backend processor
    Berra (1977): Backend system; large associative processor
    Champine (1978): Backend processor
    Su et al. (1980): High-speed associative system/backend system
    Hsiao (1980): Associative array
    Song (1983): Logic at primary memory with static/dynamic allocation (parallel operation)
    Qadah (1985): MOMD/SOMD with relation or page indexing, off-disk search

Bray and Freeman (1979) class: MPCS
    Rosenthal (1977): Distributed network data node
    Berra (1977): -
    Champine (1978): Network node
    Su et al. (1980): Integrated database machine
    Hsiao (1980): Functionally specialized system
    Song (1983): Logic at primary/secondary memory with static/dynamic allocation (medium to high level of parallelism)
    Qadah (1985): MOMD/SOMD with relation or page indexing, on-disk, off-disk, or hybrid search

2.8 Qadah's Classification (1985)
Qadah (1985) has extended Bray and Freeman's database space by a third dimension, namely indexing level. The coordinates of this database space are the indexing level, the query processing place, and the processing multiplicity. The indexing coordinate represents the smallest addressable unit of data. Along this coordinate database machines can be grouped into database indexing level, relation indexing level and page indexing level. The query processing place determines the location where data is searched. This could be off the secondary storage, on the secondary storage, or a hybrid of both. The third coordinate represents the degree of parallelism. Along this coordinate we can group database machines into single operation stream-single data stream (SOSD), single operation stream-multiple data stream (SOMD), and multiple operation stream-multiple data stream (MOMD).

2.9 Boral and Redfield's Classification (1985)
Boral and Redfield (1985) have proposed a classification based on two components: a catalog and an anatomy. The catalog describes and discriminates among the DBMs using several parameters. It can describe a DBM and provide an overview of its working mechanisms. The anatomy systematically describes the architecture and hardware organization of a DBM. Because we view the scheme more as a means of describing the hardware than as a general classification, we have chosen not to attempt to integrate it into the remainder of the discussion.

2.10 A New Classification
During the past 35 years, the transition from vacuum tubes to VLSI has increased the processor speed by four orders of magnitude, and has reduced the logic circuit size and memory cell size by factors of 500 and 6400, respectively (Block and Galage, 1978; Klass, 1984). For example, the first
electronic computer, ENIAC, consisted of 18,000 vacuum tubes and 15,000 relays. The U-shaped computer was 100 feet long and nearly 9 feet high and weighed over 30 tons. ENIAC was able to perform merely 5000 additions or subtractions per second. By comparison, with the strong emergence of VLSI technology, today an entire 32-bit microprocessor or a 1 M-bit RAM memory can be incorporated into a single chip. The reduction in the gate switching delay and wire length on the one hand and the miniaturization of circuits on the other have a direct effect on the clock rate and hence on closing the computation gap. In addition, this trend has increased the sophistication of the supporting software and encouraged the migration of software functions into hardware.
In the late 1960s Moore predicted that component density on a chip would quadruple every three or four years (Mukherjee, 1986). This is partially due to the development of high-resolution lithographic techniques, increases in the size of the silicon wafer, and growth of the accumulated circuits and layout design issues leading to an improved architecture capable of exploiting the technology. However, as improvements in technology approach the limit (the speed of light), Moore’s law is no longer applicable and the emphasis shifts toward advances in system architecture and organization.
Migration of software functions into hardware, combined with advances in the technology’s intrinsic speed, has reduced the computation gap. However, there has always been the need for much more computer performance than is feasible with a simple straightforward design. For example, there currently exist several projects with requirements of 10^9 instructions per second (i.e., 1 ns per instruction) balanced against technologies that are approaching the speed-of-light transmission limitation (i.e., 30 cm/ns).
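The figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names and the rounded constant are assumptions introduced here, not part of the original discussion.

```python
# Illustrative arithmetic only; the names and the rounded constant are assumptions.
SPEED_OF_LIGHT_CM_PER_NS = 30.0  # approximate value quoted in the text (30 cm/ns)


def instruction_time_ns(instructions_per_second: float) -> float:
    """Time available for one instruction, in nanoseconds."""
    return 1e9 / instructions_per_second


def max_signal_path_cm(instructions_per_second: float) -> float:
    """Upper bound on the distance a signal can travel in one instruction time."""
    return SPEED_OF_LIGHT_CM_PER_NS * instruction_time_ns(instructions_per_second)
```

At 10^9 instructions per second a signal can cover at most about 30 cm within one instruction time, which is why wire length becomes a first-order design constraint at such rates.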
To overcome these limitations, computer designers have long been attracted to designs in which the computer hardware simultaneously processes more than one basic operation, i.e., concurrent systems. Within this general category are several well-recognized techniques: parallelism, pipelining, multiprocessing and distributed processing. However, a discouraging aspect of these activities is the lack of extensive published knowledge regarding the rationale for various designs, or comparing the results achieved by various approaches.
The aforementioned classifications of database machines do not address the effect of advances in technology on the architecture, or the adaptability of a specific architecture to the current and foreseeable technology. We believe such a parameter should be used as a coordinate of the database space. Recent developments in technology have influenced the architecture of database machines in two directions:
(i) Reconfiguration and reevaluation of the previously designed database architectures according to the constraints imposed by the technology.
The evolution of RAP (Oflazer and Ozkarahan, 1980; Schuster et al., 1979) demonstrates the validity of this discussion.
(ii) The design of new architectures based on the constraints imposed by the technology (Bonuccelli et al., 1985; Eich, 1987; Gajski et al., 1984; Hurson, 1981; Kung and Lehman, 1980; Song, 1980).
The classification of database machines based on their architectural characteristics and technology allows one to
(i) Group the machines with common architectural characteristics into one class. Thus, a solution for a particular fundamental problem based on a specific machine or technology can be easily extended and applied to the other machines within the class. For example, a practical solution to the chip I/O pin limitation problem can be extended to other VLSI-based database machines;
(ii) Pinpoint the weak and strong points of each machine architecture by comparing it to similar (intra-class) and different (inter-class) machines. Such a study results in a good set of parameters and tools for evaluating existing and future machines. For example, the tree-structured database machines are especially suited for sorting and searching operations, but are not necessarily efficient for performing the join operation. On the other hand, associative-based architectures are proven to be efficient for many database operations, but are prohibitively expensive to develop. Therefore, the tree-structured database machines should be compared and evaluated based on their efficiency in handling the join operation. The associative-based database machines should be evaluated based on how effectively they can handle large files using associative memories. In addition, an architectural classification of database machines allows the designers of new database machines first to match their specifications and hardware requirements with one of the existing classes and then to proceed with their specialized architecture if a perfect match is not found;
(iii) Anticipate the future trends and the new developments in the field of database machine design. We witnessed the trend in the use of associative memory technology in the design of database machines in the 1970s. The impact of current technology on the design of database machines in the 1980s is evident in the attempts to use systems with systolic array characteristics.
For our architectural classification of database machines we propose adding a fourth dimension to Qadah’s classification, creating a database space
of the following four coordinates: technology adaptability, degree of parallelism, query processing place, and indexing level. According to these parameters one can characterize an architecture based on its ability to handle i) the computation gap, ii) the semantic gap, iii) the size gap, iv) data transmission, and v) name mapping resolution.
The indexing level determines the smallest accessible unit of data. This coordinate determines the name mapping resolution and the protocol needed to enforce security and guarantee the system’s integrity. The query processing coordinate illustrates the capability of the design in handling the data transmission problem. It is characterized by the following four classes: searching at the secondary storage, searching close to the secondary storage, indirect search, or a combination of the previous three. The degree of parallelism describes an architecture’s ability to close the computation gap. Along the technology adaptability coordinate, database architectures are characterized as high-, semi- or low-adaptable designs. Due to continued advances in device and storage technology, the adaptability issue is measured with respect to these two parameters.
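The four-coordinate database space can be pictured as a small record type whose fields are the coordinates. The sketch below is a minimal illustration, not part of the original proposal: all class, field and function names are assumptions, and the parallelism values are simply the three groups listed for Qadah's classification.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Adaptability(Enum):          # technology adaptability coordinate
    HIGH = "high"
    SEMI = "semi"
    LOW = "low"


class Parallelism(Enum):           # degree of parallelism (groups named by Qadah)
    SOSD = "single operation stream-single data stream"
    SOMD = "single operation stream-multiple data stream"
    MOMD = "multiple operation stream-multiple data stream"


class QueryPlace(Enum):            # query processing place coordinate
    AT_SECONDARY_STORAGE = "at secondary storage"
    CLOSE_TO_SECONDARY_STORAGE = "close to secondary storage"
    INDIRECT_SEARCH = "indirect search"
    COMBINATION = "combination"


class IndexingLevel(Enum):         # smallest accessible unit of data
    DATABASE = "database"
    RELATION = "relation"
    PAGE = "page"


@dataclass(frozen=True)
class MachineClass:
    """One point in the four-coordinate database space."""
    adaptability: Adaptability
    parallelism: Parallelism
    place: QueryPlace
    indexing: IndexingLevel


def group_by_class(machines):
    """Group machine names (hypothetical here) by their point in the space."""
    groups = defaultdict(list)
    for name, machine_class in machines.items():
        groups[machine_class].append(name)
    return dict(groups)
```

Machines that fall on the same point form one class, which is what lets a solution found for one member be carried over to the others.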
3. Database Machines
The overall structure and organization of several database machines will be examined with respect to the parameters of the new classification. In addition, in our evaluation we will consider the flexibility of a design for different database operations. In other words, we will look at whether an architecture requires different modules for basic database operations or whether one module can be used to carry out different basic operations. It should be noted that in this study we concentrate on the architectures that utilize the relational data model, and hence basic database operations refer to the relational algebra operations.
This section is divided into three subsections, based on the adaptability of database machines to current technology. The first subsection examines some of the “low adaptable” organizations. These machines were generally reported before the widespread application of the current advances in technology. In addition, the literature has not examined their potential for the current technology. The second subsection addresses those machines that have not been designed according to the constraints imposed by the technology. However, some efforts have been made to modify and adapt their designs for current technology. Finally, the last subsection addresses the machines that are designed for current technology, i.e., their topologies bear the constraints imposed by technology.
3.1 Low VLSI-Compatible Database Machines
The machines in this class represent the early DBM architectures. These architectures were primarily concerned with establishing the conceptual feasibility of DBMs. However, we should note that some of these designs are inherently adaptable to the current advances in device technology. The general trend of these designs was primarily based on associative operations. They can be classified as fully parallel or block-oriented associative organizations.

3.1.1 Fully Parallel Associative Architecture
The first special-purpose processor for handling non-numeric operations was designed by Lee and Paull (1963). This bit-parallel word-serial hardware machine is composed of an array of identical cells. Each cell is a small finite-state machine which can communicate with its neighbors. The array is controlled via a set of programming commands broadcast among all the cells by a controller. Each cell includes a set of bistable devices called cell elements. Cell elements are divided into cell state elements and cell symbol elements. Cell symbol elements hold a bit pattern corresponding to a character in the alphabet. There also exists a matching circuit which matches the bit pattern broadcast by the controller against the bit patterns stored in each cell. Data is organized as a single string of symbols divided into substrings of arbitrary lengths by delimiters. The system is suitable for text retrieval operations, but because of the hardware cost of the memory, the memory size could not be large enough for handling databases. In order to overcome the propagation timing problems, Gaines and Lee (1965) redesigned the logic circuitry. The new design was able to perform simultaneous operations of shifting and marking strings, but the cost was too high for practical implementation.
In the early 1970s some new hardware for handling databases using associative processors was designed (Defiore and Berra, 1973; Linde et al., 1973; Moulder, 1973). These systems are able to perform the operations in “bit slice” fashion. That is, parallel operations are performed on one bit of all words at a time. In each approach, a critical assumption has to be made that the entire data file can fit into the associative memory. Comparisons between an associative-processor-based architecture and a similar von Neumann architecture have shown the superiority of associative processors over the von Neumann design with respect to retrieval, update and storage operations (Berra, 1974).
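The “bit slice” style of search described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not a model of any of the cited machines: the function name, the fixed word width and the use of one response bit per word are choices made here, and the per-slice loop over words stands in for what the hardware would do on all words simultaneously.

```python
def bit_slice_search(words, key, width):
    """Return the indices of the words equal to key, examining one bit slice at a time."""
    # One response bit per stored word; every word starts as a candidate.
    response = [True] * len(words)
    for bit in range(width - 1, -1, -1):          # most significant slice first
        key_bit = (key >> bit) & 1
        # In an associative processor this slice is compared in all words at once.
        response = [
            still_matching and ((word >> bit) & 1) == key_bit
            for still_matching, word in zip(response, words)
        ]
    return [index for index, matched in enumerate(response) if matched]
```

The number of steps depends on the word width rather than on the number of stored words, which is the property that makes the approach attractive as long as the whole file fits in the associative memory.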
Defiore and Berra (1973) have shown that associative processors need to use 3 to 15 times less storage compared to a database system using inverted list organization. Moreover, the response time is 10 times faster. The system implemented by Moulder (1973) is a typical design of a database system based on an associative processor. The system is restricted to moderate size databases (6 x 10' bits). This system is composed of a general-purpose
computer (Sigma 5) augmented by a four-array STARAN computer (Rudolph, 1972). The database is subdivided into fixed-size sectors. A sector is read in and searched. Then the system reads in another sector and performs the same search, continuing in this fashion until the entire database is searched. The system designed by Linde et al. (1973) is an integrated associative computer. This system is composed of two associative processing units with 2K, 256-bit elements, which are linked to an IBM 370/145.
It is worthwhile to mention that, besides the above special-purpose designs based on associative processors, there are some general-purpose associative processors. STARAN is an example of such a system (Davis, 1974, 1983; Rudolph, 1972). It is composed of an associative array processor (one to 32 modular associative processors) with an interface (custom interface unit) to the users. It also has a conventionally addressed control memory for program storage and data buffering. Control signals generated by the control logic unit are fed to the processing elements in parallel, and all processing elements execute the instruction simultaneously. The STARAN symbolic assembler language APPLE provides a flexible and convenient assembler for programming without the complex and costly indexing, nested loops, and data manipulation constructions required in conventional systems.

3.1.2 Block-Oriented Database Architecture
In the late 1960s, Slotnick (1970) proposed the “Logic per Track” concept. This idea, which has served as the guideline for a large number of proposed hardware designs, is simply based on assigning a read/write head to each track of a disk. Later on, this concept was enhanced by adding more logic to each read/write head (cell), giving rise to the cellular organization. The general philosophy behind the proposed architectures based on Slotnick’s idea is the selection of tuples at the secondary device level. These cellular organizations are composed of a set of identical cells supported by a general-purpose computer. Each cell acts as a small computer, capable of performing some basic operations on the data residing in its memory. As a result, just a subset of data is selected and transferred to the frontend machine (i.e., the solution to the data transmission problem). Figure 1 shows the general diagram of a cellular organization, in which N processors (P1, P2, ..., PN) are controlled by a controller and to each processor Pi a memory Mi is assigned (1 ≤ i ≤ N).
A Logic Per Track Retrieval System is a fixed-head disk with a logic chip attached directly to each read/write head (Parker, 1971). Each head is allowed to search for a fixed key, use garbage collection on the track, and insert and/or delete a record and/or a key. Since there is no communication path between heads, the maximum length of each record is logically restricted to the size of a
FIG. 1. General structure of a cellular organization.
track. Information on the tracks is divided into three groups: holes, keys, and data records. Moreover, each record should be preceded by a set of mark bits for more complex instructions which require more than one disk rotation.
RAPID (Rotating Associative Processing for Information Dissemination) is an array of identical cells controlled by a control unit (Parhami, 1972). RAPID is based on Slotnick’s idea and Lee’s machine (Lee and Paull, 1963). This combination reduces the cost of memory (compared to Lee’s memory) at the expense of execution time. Information on the tracks is in bit-parallel word-serial format. The strings of data, which are stored on the secondary memory, are read into the cell storage one character at a time, processed and stored back. Since data is handled one character at a time, each character should have some control storage associated with it to store the temporary result, similar to the one proposed in Lee’s design.
CASSS (Context Addressed Segment Sequential Storage) is another non-numeric-oriented system based on a cellular organization, augmented by a fixed-head disk (Healy et al., 1972). Data files are partitioned into segments of equal length, one segment per track. Data is organized in bit-serial word-serial
fashion. Each record is of variable length, and is preceded by a set of control bits for storing the intermediate results. CASSS is also capable of performing non-numeric operations, which include automatic garbage collection.
CASSM (Context Addressed Segment Sequential Memory) is a database machine capable of handling the hierarchical data model, as well as the relational and network data models (Su, 1979). Data is stored in a bit-serial word-serial fashion. One particular feature of CASSM is that each processing element can directly communicate with its adjacent neighbors.
RARES (Rotating Associative RElational Store) uses a very different organization from CASSM and RAP (Lin et al., 1976). Tuples are laid out on the secondary storage device across the tracks, rather than along the tracks (bit-serial byte-parallel fashion). Each set of tracks used to store a relation in this fashion is called a band. The number of tracks in a band is a function of the size of tuples in the relation. In this design a cell is assigned to each band, which can overcome the inefficiency of previously mentioned cellular organizations, at the expense of slower execution time and additional hardware for associating cells with bands of different sizes. Another special feature of this system is the elimination of the mark bits associated with each tuple stored on the secondary storage, since mark bits are implemented in the cells (search modules) rather than on the secondary storage.
According to Qadah’s classification the aforementioned machines can be classified as SOMD architectures. However, the Search Processor for database management systems proposed by Leilich et al. (1977) belongs to the class of MOSD machines. A slightly different design than the logic-per-track DBMs, namely CAFS (Content Addressable File Store) (Babb, 1979), was proposed in the late 1970s.
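The idea shared by the cellular organizations above, selecting tuples at the storage level so that only a subset reaches the frontend machine, can be sketched as follows. The class and method names are hypothetical and the loop over cells is sequential only because this is a software illustration; in the machines described, the cells would search their tracks concurrently.

```python
class Cell:
    """A cell with logic attached to one track (its private memory)."""

    def __init__(self, track):
        self.track = list(track)

    def select(self, predicate):
        # Each cell scans only its own track.
        return [tuple_ for tuple_ in self.track if predicate(tuple_)]


class Controller:
    """Broadcasts a selection predicate to all cells and collects the results."""

    def __init__(self, cells):
        self.cells = list(cells)

    def stored_tuples(self):
        return sum(len(cell.track) for cell in self.cells)

    def query(self, predicate):
        selected = []
        for cell in self.cells:      # conceptually parallel across cells
            selected.extend(cell.select(predicate))
        return selected              # only these tuples go to the frontend
```

Comparing `len(controller.query(...))` with `controller.stored_tuples()` gives a rough feel for how much data transmission is avoided by searching at the secondary storage.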
CAFS utilizes a data filter between the disk storage device and the host computer, which screens the data before sending it to the host. Its hardware reads out up to 12 tracks simultaneously and then multiplexes the data as input to multiple processing elements, each executing a different operation.

3.2 Semi VLSI-Compatible Database Machines

3.2.1 RAP - Relational Associative Processor
The architecture of RAP was proposed in 1975 (Ozkarahan et al., 1975). Since then, however, it has undergone several revisions in order to increase its compatibility with the advances in technology (Oflazer and Ozkarahan, 1980; Schuster et al., 1979). Originally RAP was based on the concept of logic per track proposed by Slotnick (1970). It is a collection of identical and autonomous processors (cells) communicating with a general-purpose frontend machine through a controller (Fig. 2). As the memory rotates under the read/write heads, tuples are read into the buffer, circulated through the cell’s logic, and written back on the track after a delay. As a result, memory tracks are searched in parallel in associative fashion, and hence the execution time is independent of the size of the database and is a function of the rotation time and the number of rotations required to execute a query.

FIG. 2. Organization of RAP architecture (cells, controller, and statistical arithmetic unit).

The development of the rotating memory technology and the complexity of the controller needed to synchronize and coordinate the operations within the cells have shown that such a simple concept is not economically feasible. In addition, it has been shown that this model has very low performance in handling inter-relational operations such as the join (Oflazer, 1983). These deficiencies have driven the evolution of the RAP architecture into RAP-2 and RAP-3, where system dependence on a specific secondary technology has been reduced. In addition, the functionality and flexibility of each cell has been upgraded through the concept of microprogramming. Finally, special algorithms have been devised for efficient handling of “hard” database operations (Oflazer, 1983). However, the compatibility of the system with the recent advances in technology has yet to be studied in detail.

3.2.2 A Database Computer (DBC)
The database computer (Banerjee et al., 1978, 1979) is a specialized backend system capable of managing 10^9 to 10^10 bytes of data represented in different data models. In addition, it has built-in support for security and for different relational operations. DBC uses two forms of parallelism. First, an entire cylinder is processed in parallel. Second, the system performs queries in a pipeline fashion by separate units around two rings (Fig. 3): the structure loop and the data loop. The function of the structure loop is to translate predicates in the query into physical addresses of the minimal addressable units (cylinders). Along the structure loop the keyword transformation unit transforms the request into a series of indices to the structure memory. The structure memory is an inversion table for the database; it produces the logical addresses of the data required to answer a request. The structure memory information processor then uses the output of the structure memory to perform boolean operations on these logical addresses. The index translation unit then translates the logical addresses into physical addresses. These physical addresses are used to access the data residing on the mass memory. Finally, the security filter processor is used to enforce the security specifications.
VLSI compatibility of the join operation has been evaluated and discussed by Hsiao (1983). The major drawback of this system is its performance for large databases. The performance analysis of this system in supporting relational databases shows that a general-purpose machine can perform better
FIG. 3. DBC organization. DBCCP: Database Command and Control Processor; KXU: Keyword Transformation Unit; SM: Structure Memory; SMIP: Structure Memory Information Processor; IXU: Index Translation Unit; MM: Mass Memory; SFP: Security Filter Processor; PES: Program Execution System.
than DBC (Oflazer and Ozkarahan, 1980). Furthermore, since the system relies on the concept of index processing, the same complexity of algorithms and data dependence found in conventional systems will exist in DBC, which could affect the performance.

3.2.3 DIRECT - A Multiprocessor Backend Database Machine
Specific features which distinguish this architecture from the other database machines are based on the fact that i) DIRECT is an MIMD organization, and
ii) it can dynamically allocate resources to the queries. As a result, it can simultaneously support intra-query and inter-query concurrency. Since the late 1970s its architecture has been under study and evaluation (Bitton et al., 1983; Boral et al., 1982; Boral and DeWitt, 1981; DeWitt, 1979; DeWitt and Hawthorn, 1981).
DIRECT is a collection of query processing elements and memory modules (Fig. 4) interconnected through a cross-bar switch network. The network provides access paths between any pair of query processor and memory module. Memory modules are shared disk caches. Allocation and deallocation of memory modules to the database pages is similar to the paging scheme in conventional systems. The backend controller is responsible for determining the number of query processors and memory modules that should be allocated to a query. This is done based on the complexity of the query, the size of the database(s) and the contention in the system. Simultaneous execution of different queries on a database requires a locking protocol which guarantees data consistency. This has been achieved by enforcing the locking scheme at the relation level.
The design offers better load balancing and allows the processors to be shared among the storage modules. Furthermore, expansion is easy and modular. However, this design suffers from the data transmission problem. As a result, a lot of data is transferred from one module to another. This could be reduced through the concept of presearching and selection of the related tuples before pages are allocated to a memory module. In addition, due to the nature of the organization (i.e., MIMD), a more elaborate security mechanism is needed to enforce the security and privacy access control. Finally, the adaptability and suitability of the query processing elements according to the constraints imposed by the current technology need to be studied in depth.
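Locking at the relation level, as mentioned above, can be illustrated with a small lock table keyed by relation name. The text does not describe the actual protocol used in DIRECT, so everything below is an assumption made for illustration: the class name, the reader/writer distinction, and the policy of refusing a request rather than queueing it.

```python
class RelationLockTable:
    """Hypothetical relation-granularity lock table (readers/writer policy assumed)."""

    def __init__(self):
        self._readers = {}   # relation name -> set of query ids holding a read lock
        self._writer = {}    # relation name -> query id holding the write lock

    def acquire(self, query_id, relation, write=False):
        """Try to lock a whole relation; return True if the lock is granted."""
        writer = self._writer.get(relation)
        readers = self._readers.setdefault(relation, set())
        if write:
            if writer is None and not (readers - {query_id}):
                self._writer[relation] = query_id
                return True
            return writer == query_id          # already held by this query
        if writer is None or writer == query_id:
            readers.add(query_id)
            return True
        return False

    def release(self, query_id, relation):
        """Drop every lock the query holds on the relation."""
        self._readers.get(relation, set()).discard(query_id)
        if self._writer.get(relation) == query_id:
            del self._writer[relation]
```

Because the unit of locking is a whole relation, two queries that update different tuples of the same relation still serialize, which is the usual price of such a coarse granularity.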
3.2.4 DIALOG - Distributed Associative LOGic Database Machine
DIALOG (Wah and Yao, 1980) is a collection of clusters of identical modules organized in a hierarchy. Each cluster is under the control of a controller, which provides communication paths between its nodes and other clusters on the one hand, and the frontend machine on the other hand (Fig. 5). The processing units (i.e., data modules) are designed to store a part of a database and perform the relational operations on it. Data modules are based on the concept of preprocessing. In other words, the data file is searched close to secondary storage and the valid tuples are sent to the system. DIALOG is designed for handling a large on-line database system. Organization of the system is based on heterogeneous storage devices, which allows the system to upgrade itself according to future advances in secondary storage technology. In addition, similarity of the overall structure
FIG. 4. DIRECT organization (front-end machine, back-end controller, storage modules, cross-bar interconnection network, query processors).
FIG. 5. DIALOG organization.
of the clusters and the uniformity of the sequence of operations within the clusters could simplify the overall design phase and fabrication. However, the simplicity of each data module and its adaptability for VLSI technology should be studied in more detail. Moreover, the computational capability of the system for handling a complete set of database operations and aggregate functions in a multi-user distributed environment should be enhanced. Otherwise, the bottleneck at the controller level will degrade the overall performance of the system. Finally, one has to study the overall performance of the system extensively.
3.3 High VLSI-Compatible Database Machines
The systems in this class are basically designed based on the constraints imposed by technology. In general, they are highly parallel with regular and simple architectures.
3.3.1 Systolic Organization

The systolic model proposed by Kung and Lehman (1980) is a collection of basic cells replicated in a two-dimensional space. Each cell has the organization depicted in Fig. 6; $a_{in}$, $b_{in}$, and $t_{in}$ represent the inputs and $a_{out}$, $b_{out}$, and $t_{out}$ represent the outputs of a cell. A cell performs the following sequence of operations:

\[
t_{out} \leftarrow t_{in} \wedge (a_{in} \;\theta\; b_{in}), \qquad
a_{out} \leftarrow a_{in}, \qquad
b_{out} \leftarrow b_{in},
\]

where $\theta \in \{=, \neq, <, >, \leq, \geq\}$. Tuples from the source and target relations move along the a and b paths and the result of the comparison within each cell is accumulated along the t path. As tuples are pipelined to the system, each pair of the tuples (i, j) in the
FIG. 6. A systolic cell.
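source and target relations meet each other in some row. The final results of the comparisons within the rows are accumulated in a binary matrix T, where

\[
t_{ij} =
\begin{cases}
1 & \text{if tuple } i \text{ of the source relation and tuple } j \text{ of the target relation satisfy the comparison,} \\
0 & \text{otherwise.}
\end{cases}
\]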
Then, scanning the bit patterns in T determines the tuples that participate in the operation. Figure 7 shows the sequence of the operations and the formation of T for the join. As can be seen, the operations within cells should be synchronized. In addition, for proper execution of the basic operations, data should be delivered in a special pattern. The potential of this simple organization for handling database operations and file processing operations has been discussed by Lehman (1981). The simplicity of the basic cell and the overall regularity of the architecture provide a suitable basis for VLSI implementation of this architecture.
On the negative side, each relation should be scanned twice. First the relations should be pipelined to the array to determine interrelationships between each pair of tuples, and then they should be scanned again to extract relevant tuples. The investigation of the resultant matrix (i.e., T) is another issue that should be studied in detail. According to the flow of the operations and data, the array should be large enough to accommodate the relations. This means that the feasibility of this design for handling large databases should be investigated in depth. Finally, as can be seen in Fig. 7, the staggered processing of data means that half of the cells will be idle during a clock period. This inefficiency, as discussed by Kung and Lehman (1980), can be removed by preallocation of a relation (say source) in the array and crossing the target tuples along the b path (or a path).
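The two passes described above (forming T, then scanning it) can be sketched functionally. The code below is not a cycle-accurate simulation of the array: it ignores the staggered timing and simply applies the cell operation to every pair of tuples, one cell per compared attribute. The function names and the representation of tuples as Python tuples of join attributes are assumptions made here.

```python
import operator


def cell(a_in, b_in, t_in, theta=operator.eq):
    """One comparison cell: pass a and b through, accumulate the result on t."""
    t_out = t_in and theta(a_in, b_in)
    return a_in, b_in, t_out


def comparison_matrix(source, target, theta=operator.eq):
    """First pass: build the binary matrix T for all (source, target) pairs."""
    matrix = []
    for a_tuple in source:
        row = []
        for b_tuple in target:
            t = True                       # initial value fed into the t path
            for a, b in zip(a_tuple, b_tuple):
                _, _, t = cell(a, b, t, theta)
            row.append(int(t))
        matrix.append(row)
    return matrix


def participating_pairs(matrix):
    """Second pass: scan T for the tuple pairs that take part in the operation."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, bit in enumerate(row)
        if bit
    ]
```

With an equality comparison on the join attributes, the pairs returned by `participating_pairs` are exactly the pairs of source and target tuples that would be concatenated in an equi-join.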
3.3.2 Tree Machines

The tree machine proposed by Bentley and Kung (1979) is a collection of three basic cells, namely the ○-node, □-node and △-node, organized as two mirror-image binary trees sharing the leaf nodes (Fig. 8). The ○-nodes are used to broadcast the data and instructions to the leaf nodes. Each □-node has a limited storage capacity enhanced by a processor to perform increment/decrement and simple selection operations. The □-nodes store records and perform the various broadcast operations in response to a user query. The △-nodes are used to combine the results of the operations in the □-nodes to compute the final result.
Song (1980) has shown that such a model can be used to perform relational operations. By assuming that duplicate tuples are not allowed within a relation, he simplified the original model proposed by Bentley and Kung
FIG. 7. Flow of data in systolic join.
(1979). The system performs the operations based on the principle of pipelining. Hence, it provides a smooth flow of data with a high throughput. However, due to the existing bottleneck at the root of the △-nodes, the pipeline might be forced to stall. The regularity of the model and the simplicity of the connections among the cells, combined with the efficient chip layout format of a binary tree structure, provide an attractive model for VLSI implementation of the system. By unmirroring the binary trees (Fig. 9), the system can be constructed utilizing two types of chips, namely internal and leaf chips. An internal chip contains a combination of the ○-nodes and △-nodes while the leaf chip contains the □-nodes.
FIG. 8. Tree organization.
Data transmission among the cells is the major bottleneck of the model. As a result, in many instances the pipeline cannot maintain a smooth and uniform data flow. This introduces some complexity at the control level. The I/O pin limitation is another drawback which has to be addressed extensively. It is possible to provide a direct parallel I/O facility at the leaf chip. However, problems such as pin limitation have to be discussed. For some relational operations such as join (Song, 1980), resource utilization of the system is low. Finally, the capability of this model for handling large databases is a serious problem which should be investigated extensively.
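The broadcast, leaf-processing and combining phases of the tree organization can be sketched as follows. This is a simplified software illustration, not the design of Bentley and Kung or of Song: the function name and parameters are assumptions, the broadcast tree is reduced to applying the same operation at every leaf, and each pass of the loop stands for one level of the combining tree.

```python
def broadcast_and_combine(leaf_records, leaf_operation, combine):
    """Apply an operation at every leaf, then merge results pairwise up a binary tree."""
    if not leaf_records:
        raise ValueError("the tree needs at least one leaf record")
    # Broadcast phase: the same operation reaches every leaf node.
    results = [leaf_operation(record) for record in leaf_records]
    # Combining phase: one loop iteration per level of the combining tree.
    while len(results) > 1:
        next_level = []
        for k in range(0, len(results), 2):
            if k + 1 < len(results):
                next_level.append(combine(results[k], results[k + 1]))
            else:
                next_level.append(results[k])   # unpaired node passes through
        results = next_level
    return results[0]                            # value emerging at the root
```

A membership query uses an equality test at the leaves with a logical OR as the combiner, while a count uses 0/1 at the leaves with addition. In both cases a single value leaves the root, which hints at why operations producing many result tuples, such as the join, put pressure on the root of the combining tree.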
FIG. 9. Unmirrored tree of Fig. 8.
It should be noted that the tree organization proposed by Bonuccelli et al. (1985) is based on the same philosophy. However, its nodes have a uniform organization, each capable of performing the bookkeeping, communication, and database operations. In addition, the representation of tuples and the sequence of the operations are totally different from those discussed by Song (1980).

3.3.3 ASLM - An Associative Search Language Machine
ASLM (Associative Search Language Machine) is a backend database machine (Hurson, 1981, 1983). The system is composed of a collection of general-purpose frontend machines supported by ASLM (Fig. 10). Security validation of the user and the user’s query, translation of the user’s query into ASLM primitives, and transmission of the final results to the user are the major functions of the frontend systems. ASLM is composed of four modules: i) controller, ii) secondary storage interface, iii) an array of preprocessors, and iv) a database processor.

Controller. The controller i) stores the ASL microinstructions generated by the ASL compiler, ii) decodes the ASL microinstructions, iii) propagates the control sequences to the appropriate modules in the backend, and iv) manages the sharing and distribution of the resources among different concurrent users’ requests. The simplicity of the controller is primarily due to the use of associative hardware for the execution of the ASL primitives.

Secondary Storage Interface. The secondary storage interface is a collection of random access memory modules, augmented by some hardware facilities. This module is an interface between secondary storage and ASLM. It accesses blocks of data from secondary storage and distributes them among the preprocessors.

Preprocessors. The preprocessors act as a filter which screens the data, selecting the valid data. The valid data is then placed in an associative module in the database processor. In addition, the preprocessors perform a projection over the relevant attributes.

Database Processor. The database processor is composed of a set of associative modules enhanced by some hardware capabilities for direct implementation of the relational operators. The result of the operations of the array of preprocessors on a relation r with relation scheme R is a relation r1 with relation scheme R1, which is stored in an associative memory.
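The filtering role of the preprocessors can be illustrated with a minimal software sketch. It assumes that a tuple is a dictionary and that validity is an arbitrary predicate supplied by the query; the names `preprocess`, `is_valid`, and the sample relation are hypothetical and are not part of the ASLM design, which performs these steps in hardware.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def preprocess(relation: Iterable[Row],
               is_valid: Callable[[Row], bool],
               attributes: List[str]) -> List[Row]:
    """Screen tuples with is_valid, then keep only the listed attributes."""
    reduced = []
    for row in relation:
        if is_valid(row):                                    # filtering step
            reduced.append({a: row[a] for a in attributes})  # projection step
    return reduced


# Hypothetical relation with scheme (name, dept, salary).
r = [
    {"name": "Ada", "dept": "db", "salary": 120},
    {"name": "Bob", "dept": "os", "salary": 100},
    {"name": "Cy", "dept": "db", "salary": 90},
]
reduced_r = preprocess(r, lambda t: t["dept"] == "db", ["name", "salary"])
```

Only the reduced relation would then be loaded into the associative modules. The sketch does not remove duplicate tuples after projection, since the text does not state where that step takes place.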
FIG. 10. ASLM architecture.
This unit is a set of identical and independent associative memories of size w * d (where w is the width and d is the depth) augmented by some hardware circuits. The associative modules can be linked together to make a memory of size (K * w) * d (1 ≤ K ≤ n, where n is the number of associative modules) capable of holding d tuples of size K * w bits, or they could be linked to form a memory of size w * (K * d) capable of holding K * d tuples of size w bits. The independence of the associative modules enhances the modularity and, as such, the fault tolerance of the system. An enhanced join module capable of performing high-level operations on null values has been incorporated in the design of the database processor (Miller and Hurson, 1986). The VLSI time/space complexity of the basic cells of an associative memory for handling basic set and relational operations has been discussed by Hurson (1986). Based on such a calculation, the search time would be 13Δt + 3(n - 1)Δt, where Δt and n stand for the average delay time of an inverter and the word length, respectively. In addition, the geometry area of a cell has been estimated at 40λ * 20λ.

The Non-Von DBM project, under investigation since the early 1980s (Shaw, 1980), is composed of two components: a primary processing subsystem (PPS) and a secondary processing subsystem (SPS). The SPS is the major repository of the data and consists of conventional movable-head disks with processing logic per head. The logic is capable of examining arbitrary attributes in each tuple and performing a hash function on them. PPS processing includes associative retrieval, arithmetic comparisons, logical manipulation, and I/O functions.
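Returning to the associative modules of the ASLM database processor, the two linking modes and the search-time estimate can be worked through numerically. The sketch below is only an illustration: it reads the estimate as 13Δt + 3(n - 1)Δt, and the function names, the `LinkedMemory` record, and the sample sizes are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedMemory:
    tuple_bits: int  # width of one stored tuple, in bits
    capacity: int    # number of tuples the linked memory can hold


def _check(k: int, n: int) -> None:
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")


def link_for_width(w: int, d: int, k: int, n: int) -> LinkedMemory:
    """Link k of the n modules side by side: a (k * w) * d memory."""
    _check(k, n)
    return LinkedMemory(tuple_bits=k * w, capacity=d)


def link_for_depth(w: int, d: int, k: int, n: int) -> LinkedMemory:
    """Link k of the n modules end to end: a w * (k * d) memory."""
    _check(k, n)
    return LinkedMemory(tuple_bits=w, capacity=k * d)


def search_time(word_length: int, inverter_delay: float) -> float:
    """13*dt + 3*(n - 1)*dt, with n the word length and dt the inverter delay."""
    return 13 * inverter_delay + 3 * (word_length - 1) * inverter_delay
```

For example, with hypothetical modules of w = 32 bits and d = 1024 words, linking K = 4 of them gives either 1024 tuples of 128 bits or 4096 tuples of 32 bits, and a 32-bit word would be searched in 106 inverter delays under this reading of the estimate.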
3.3.4 Parallel Pipelined Query Processor

The pipelined database machine (Gajski et al., 1984) is a collection of four modules, namely a single-relation query pipe, an aggregation pipe, a join pipe, and a sorting pipe. These pipes share a group of FIFO memories through a crossbar switch network (Fig. 11). Like ASLM (Hurson, 1981), the system is based on the principle of preprocessing.

A single-relation query pipe is a collection of 32 chips, each composed of two parts: a where-evaluation part and a select-evaluation part. The where-evaluation part is used to check the validity of tuples according to the search arguments defined in the user query. The select-evaluation part projects tuples on the attributes defined in the output set. In addition, it can perform some simple arithmetic operations on the attributes specified in the output set. These two parts perform their operations in parallel on the designated tuple.

An aggregation pipe is composed of several units (e.g., squeezer, aggregation unit, duplicate checker, and arithmetic/comparator unit) for handling operations such as select unique, Count, Sum, Max, Min, and Avg.
FIG. 11. Query processor organization.
The join pipe, consisting of 32 identical join chips, performs the operation as a sequence of two steps: first, similar to ASLM, source and target tuples are selected and projected, and then the join is performed on the explicitly generated reduced relations.

The sorting pipe is organized as eight chips of 16 * 16 sorting modules. Figure 12 depicts the overall organization of a 4 * 4 sorting module. It is a matrix of comparator/switches. The delay elements as depicted in Fig. 12 are used to synchronize the arrival and departure of tuples to and from the 4 * 4 matrix at the center.

FIG. 12. A 4 * 4 sorting pipe.

In general, the system is highly adaptable for VLSI implementation and is topologically similar to CRAY. As discussed by Gajski et al. (1984), some of the chips are under implementation. However, because the modules are not general, data transmission within the system is high, which could degrade performance.
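How a fixed arrangement of comparator/switches can sort may be easier to see in software. The sketch below uses odd-even transposition sort as a stand-in network; this is a deliberately simple, well-known comparator network and is not claimed to be the layout of the sorting module in Fig. 12. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def compare_exchange(a: int, b: int) -> Tuple[int, int]:
    """One comparator/switch: route the smaller key first, the larger second."""
    return (a, b) if a <= b else (b, a)


def comparator_network_sort(keys: Sequence[int]) -> List[int]:
    """Odd-even transposition sort: n stages of neighbouring comparators."""
    lanes = list(keys)
    n = len(lanes)
    for stage in range(n):
        # Even stages pair lanes (0,1), (2,3), ...; odd stages pair (1,2), (3,4), ...
        for i in range(stage % 2, n - 1, 2):
            lanes[i], lanes[i + 1] = compare_exchange(lanes[i], lanes[i + 1])
    return lanes
```

Because the sequence of comparisons does not depend on the data, every stage can be realized as a row of identical cells, which is the property that makes such networks attractive for VLSI.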
3.3.5 GRACE

The GRACE relational DBM, currently being developed at the University of Tokyo, is aimed at achieving high performance by adapting processing algorithms based on hash and sort operations (Fushimi et al., 1986). The general philosophy behind the query processing in GRACE is based on clustering the databases via hashing, distributing the clusters uniformly among several temporary storage units, and executing the processing units in parallel on the distributed clusters. For example, in a join operation the hash clustering feature of GRACE allows the source and target relations to be partitioned into disjoint sets of smaller relations (subrelations), so that only the source and target subrelations stored in the same memory module need to be joined together.

Figure 13 depicts the overall architecture of GRACE. The system is composed of two basic rings (a processing ring and a staging ring) and four fundamental modules: the processing module (PM), memory module (MM), disk module (DM), and control module (CM). The processing module performs relational operations on stored data in the memory module in a pipeline fashion. The disk module is a small processor capable of filtering the data according to the user query and distributing the selected tuples among the memory modules. Finally, the control module monitors the flow of data and operations within different modules in the system.
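The hash-clustering idea behind the GRACE join can be sketched in a few lines. The code below is an illustrative software analogue only: the function names, the bucket count, and the use of Python's built-in `hash` are assumptions, and each bucket stands in for one memory module.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, int]


def hash_cluster(relation: List[Row], key: str, buckets: int) -> List[List[Row]]:
    """Partition a relation into disjoint subrelations by hashing the join key."""
    clusters: List[List[Row]] = [[] for _ in range(buckets)]
    for row in relation:
        clusters[hash(row[key]) % buckets].append(row)
    return clusters


def join_cluster(source: List[Row], target: List[Row], key: str) -> List[Row]:
    """Join one pair of subrelations that landed in the same bucket."""
    index = defaultdict(list)
    for s in source:
        index[s[key]].append(s)
    return [{**s, **t} for t in target for s in index.get(t[key], [])]


def hash_partitioned_join(source: List[Row], target: List[Row],
                          key: str, buckets: int = 4) -> List[Row]:
    """Join bucket i of the source only with bucket i of the target."""
    src = hash_cluster(source, key, buckets)
    tgt = hash_cluster(target, key, buckets)
    result: List[Row] = []
    for s_part, t_part in zip(src, tgt):  # each pair could be handled independently
        result.extend(join_cluster(s_part, t_part, key))
    return result
```

Tuples with equal join keys always hash to the same bucket, so no matching pair is lost by joining the buckets separately; this is what allows the bucket pairs to be processed in parallel.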
3.3.6 Delta

The Delta database machine (Hiroshi et al., 1984; Kakuta et al., 1985) is currently being studied at ICOT (Institute for New Generation Computer Technology). It is planned to be connected to a set of inference machines as part of Japan’s fifth-generation project. The architecture of Delta is composed of five components (Fig. 14):

(a) Interface Processor (IP) is an interface between Delta and inference machines. This interaction is through a local area network (LAN).
(b) Control Processor (CP) provides database management functions to manage the resources and control the flow of data and operations.
FIG. 13. The overall organization of GRACE. PM = Processing Module; MM = Memory Module; DM = Disk Module; CM = Control Module.

FIG. 14. Functional architecture of Delta.
(c) Relational DataBase Engine (RDBE) is the key component for processing relational operations in Delta, based on sorting algorithms.
(d) Maintenance Processor (MP) provides functions to enhance Delta’s reliability and serviceability.
(e) Hierarchical Memory (HM) provides functions such as sorting, accessing, clustering, and maintaining relations. This component is composed of a general-purpose processor as controller and a hierarchy of memory units consisting of a semiconductor memory and large-capacity moving-head disks.

The query processing in Delta is as follows: The Interface Processor receives the translated user’s commands (Delta commands) from a host connected to the LAN, and sends them to the Control Processor. The Control Processor translates these commands into a sequence of internal subcommands, which are then issued to the Relational DataBase Engine and Hierarchical Memory. After execution of the commands, the IP transfers the result to the host.

The Relational DataBase Engine, as the key component for handling relational operations, is composed of a general-purpose CPU, a sorting pipe of 12 stages, and a merger. A combination of sorting pipe and merger is used to perform a sort-merge join operation.

As mentioned before, Delta functions as a database server to Personal Sequential Inference (PSI) machines in a local area network. The PSIs use a logic programming language which must interface with Delta. To avoid a possible bottleneck, Delta does not offer a wide range of database functions.

3.3.7 MIRDM-MIchigan Relational Database Machine
The MIRDM (MIchigan Relational Database Machine) is a backend relational machine suitable for supporting concurrent on-line very large databases (Qadah and Irani, 1985). To reduce the data transportation, the system maintains a series of index tables to allow data granularity at the page level. However, this reduction is achieved at the expense of more complex operations and extra space overhead. The MIRDM architecture can be classified as a loosely coupled MSIMD (Multiple Single Instruction-stream, Multiple Data-stream) architecture. As depicted in Fig. 15, it is composed of four basic components:

(a) Master Backend Controller (MBC). Like any other backend system, MBC is an interface between the frontend machine and the backend database system. It translates the user queries, schedules and monitors the query execution, manages and controls the different components of MIRDM, and provides security checking and integrity mechanisms.
(b) Mass Storage Subsystem (MSS). MSS is a two-level memory hierarchy consisting of a Mass Memory (MM) organized as a set of moving-head disks, and a parallel buffer (PB) organized as a set of blocks.
(c) Processing Cluster Subsystem (PCS). This is the major component for performing relational algebra operations on the data. Each processor in a cluster is composed of three modules: local memory, controller, and a processing unit.
(d) Interconnection Network Subsystem (INS). This unit is designed to allow simultaneous and dynamic interactions among the PCSs and PBs on one hand and PBs and MM on the other hand.

As reported by Qadah and Irani (1985), a prototype of MIRDM consisting of a processing cluster with eight modules is under development.
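The trade-off behind MIRDM's page-level index tables, less data moved at the price of extra space and bookkeeping, can be made concrete with a small sketch. Everything below is hypothetical: the structure of the actual MIRDM index tables is not described here, so the code simply maps each attribute value to the set of pages that contain it.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
Page = List[Row]


def paginate(rows: List[Row], page_size: int) -> List[Page]:
    """Split a relation into fixed-size pages (the unit of data transport)."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


def build_page_index(pages: List[Page], attribute: str) -> Dict[object, Set[int]]:
    """Index table: attribute value -> numbers of the pages that hold it."""
    index = defaultdict(set)
    for page_no, page in enumerate(pages):
        for row in page:
            index[row[attribute]].add(page_no)
    return dict(index)


def select_equal(pages: List[Page], index: Dict[object, Set[int]],
                 attribute: str, value: object) -> Tuple[List[Row], int]:
    """Fetch only the indexed pages; return the matches and the pages read."""
    page_numbers = sorted(index.get(value, set()))
    hits = [row for p in page_numbers for row in pages[p] if row[attribute] == value]
    return hits, len(page_numbers)
```

A selection then touches only the pages named by the index rather than the whole relation, while the index itself is the extra space overhead mentioned in the text and must be maintained on every update.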
FIG. 15. The organization of the Michigan Database Machine.

3.4 Related Efforts

Several database machines have been proposed that fall outside of the focus of this paper, but are significant enough to deserve mention. The concept of data flow has been incorporated into several recent DBM proposals. GAMMA (DeWitt et al., 1987), AGM (Bic and Hartmann, 1985), and FLASH (Boral and DeWitt, 1980) are some examples. In addition, a great deal of effort has recently been directed towards developing massive memory DBMs. New architectures in this area include MARS (Eich, 1987), MM-DBS (Lehman, 1986), and MMM (Garcia-Molina et al., 1984).
4. Conclusion and Future Directions
The purpose of this paper was twofold: i) to study the architectural aspects of DBMs, and ii) to investigate the impact of technology on DBM design. We also intended to put technology-oriented DBMs into perspective. To accomplish this, we have proposed a new classification scheme that emphasizes the adaptability of a design to current technology. In the process, several classification schemes of database machines have been overviewed and compared. The new classification scheme provides an extension to Qadah’s database space (Qadah, 1985). The new coordinate classifies database machines according to their compatibility with current advances in technology, as low-, semi-, and high-compatible architectures. Several machines which were discussed are summarized according to our database space in Table II.

One of our goals in doing this survey was to develop an understanding of what architectural aspects of the existing designs make them “practical” with respect to current technology. It is our feeling that the new classification scheme aids the reader in making similar judgments.

Obviously, we did not intend to provide a critique of database machines. However, one should briefly address the lack of widespread use of DBMs after two decades of research and development. One major problem plaguing DBMs is the I/O bottleneck, since little attention has been given to exploiting mechanisms for increasing I/O bandwidth. Boral and DeWitt (1985) have devised two solutions for this purpose. The first solution uses a customized disk controller that permits simultaneous data transfer between the controller and conventional disk drives. The second solution is based on the utilization of an I/O cache. Most database machine designers have failed to address issues central to the future success of their designs. In particular, issues such as how updates are performed have been ignored in many cases.
In addition, designers have generally failed to address recent developments in database theory (Hurson and Miller, 1987; Miller and Hurson, 1986). The importance of such developments in software systems means that designers of practical DBMs cannot continue to ignore such issues.
TABLE II
CLASSIFICATION OF DATABASE MACHINES

Machine | Technology compatibility | Indexing level | Processing location | Parallelism
PPQP* (Gajski et al., 1984) | high | relation | off disk | MOMD
ASLM (Hurson, 1981, 1983) | high | relation | off disk | SOMD
Systolic (Kung and Lehman, 1980; Lehman, 1981) | high | relation | off disk | SOMD
RAP3 (Oflazer and Ozkarahan, 1980) | semi | relation | off disk | MOMD
DIALOG (Wah and Yao, 1980) | semi | relation | hybrid | MOMD
Tree Machine (Bentley and Kung, 1979; Song, 1980; Bonuccelli et al., 1985) | high | relation | off disk | SOMD
Delta (Hiroshi et al., 1984; Kakuta et al., 1985) | high | relation | off disk | SOMD
GRACE (Fushimi et al., 1986) | high | relation | hybrid | MOMD
MIRDM (Qadah, 1985) | high | page | off disk | MOMD
DBC (Banerjee et al., 1978, 1979) | semi | page | hybrid | SOMD
DIRECT (Bitton et al., 1983; Boral and DeWitt, 1981; Boral et al., 1982; DeWitt, 1979; DeWitt and Hawthorn, 1981) | semi | relation | off disk | MOMD
RELACS (Oliver, 1979) | semi | relation | off disk | MOMD
RAP2 (Schuster, 1979) | semi | relation | off disk | MOMD
CASSM (Su, 1979) | low | database | on disk | SOMD
Search Processor (Leilich et al., 1977) | low | database | on disk | MOSD
RARES (Lin et al., 1976) | low | database | on disk | SOMD
RAP1 (Ozkarahan et al., 1975) | semi | database | on disk | SOMD
Moulder (Moulder, 1973) | low | database | off disk | SOMD
Linde (Linde et al., 1973) | low | database | off disk | SOMD
Defiore (Defiore and Berra, 1973) | low | database | on disk | SOMD
CASSS (Healy et al., 1972) | low | database | off disk | SOMD
RAPID (Parhami, 1972) | low | database | off disk | SOMD
Logic Per Track (Parker, 1971) | low | database | on disk | SOMD
Gaines (Gaines, 1965) | low | database | off disk | SOMD
Lee (Lee and Paull, 1963) | low | database | off disk | SOMD
* PPQP: Parallel Pipelined Query Processing

As proposed by Boral and Redfield (1985), future database machine designers should concentrate on two important issues: i) the investigation of the effect and benefit of a specialized database operating system on their DBM designs, and ii) the optimization of the system throughput rather than improving the response time of a single request.

REFERENCES

Ambardar, V. (1985). Data base machines. Proc. 18th Annu. Hawaii Int. Conf. System Sciences 1, pp. 352-372.
Arora, S. K., and Dumpala, S. R. (1981). WCRC: An ANSI SPARC machine architecture for data base management. 8th Annu. Symp. Computer Architecture, pp. 373-387.
Babb, E. (1979). Implementing a relational data base by means of specialized hardware. ACM Trans. Data Base Systems 4, 1-29.
Banerjee, J., Hsiao, D. K., and Baum, R. J. (1978). Concepts and capabilities of a data base computer. ACM Trans. Data Base Systems 3, 347-384.
Banerjee, J., Hsiao, D. K., and Kannan, K. (1979). DBC-A database computer for very large data bases. IEEE Trans. Computer 28, 414-429.
Bentley, J. L., and Kung, H. T. (1979). A tree machine for searching problems. IEEE Int. Conf. Parallel Processing, pp. 257-266.
Berra, B. P. (1974). Some problems in associative processor applications to database management. Nat. Computer Conf., pp. 1-5.
Berra, B. P. (1977). Database machines. ACM SIGIR Forum 13, 4-23.
Berra, B., and Oliver, E. (1979). The role of associative array in data base machine architecture. IEEE Computer 12, 53-61.
Bic, L., and Hartmann, R. L. (1985). Hither hundreds of processors in a database machine. Database Machine 4th Int. Workshop, pp. 153-168.
Bird, R. M., Tu, J. C., and Worthy, R. M. (1977). Associative/parallel processors for searching very large textual data bases. Proc. 3rd Non-Numeric Workshop, pp. 1-9.
Bitton, D., et al. (1983). Parallel algorithms for the execution of relational data base operations. ACM Trans. Data Base Systems 8, 324-353.
Block, E., and Galage, D. (1978). Component progress; its effect on high speed computer architecture and machine organization. Computer 11, 64-76.
Bonuccelli, M. A., Lodi, E., Luccio, F., Maestrini, P., and Pagli, L. (1985). VLSI algorithms and architecture for relational operations. Estratto da Calcolo XXII, 63-90.
Boral, H. L., and DeWitt, D. J. (1980). Design consideration for dataflow database machines. Proc. ACM SIGMOD Int. Conf. Management of Data, pp. 95-104.
Boral, H. L., and DeWitt, D. J. (1981). Processor allocation strategies for multiprocessor database machines. ACM Trans. Database Systems 6, 227-254.
Boral, H. L., and DeWitt, D. J. (1985). Database machines: an idea whose time has passed? a critique of the future of database machines. “Database Machines”, pp. 166-187. Springer Verlag.
Boral, H. L., and Redfield, S. (1985). Database machine morphology. Proc. Very Large Databases, pp. 59-71.
Boral, H. L., DeWitt, D. J., Friedland, D., Jarrel, N. F., and Wilkinson, K. W. (1982). Implementation of the database machine direct. IEEE Trans. Software Engineering SE-8, 533-543.
Boyce, R. F., Chamberlin, D. D., King, W. F., III, and Hammer, M. M. (1975). Specifying queries as relational expression: the SQUARE data sublanguage. Comm. ACM 18, 621-628.
Bray, H. O., and Freeman, H. A. (1979). “Database Computers.” Lexington Books, Massachusetts.
Bray, H. O., and Thurber, K. J. (1979). What’s happening with data base processors? Datamation, 146-156.
Cardenas, A. F. (1975). Analysis and performance of inverted database structures. Comm. ACM 18, 253-273.
Chamberlin, D. D. (1976). Relational data base management systems. Computing Surveys 8, 43-59.
Chamberlin, D. D., Gray, J. N., and Traiger, I. L. (1975). Views, authorization and locking in a relational data base system. Nat. Computer Conf. 44, pp. 425-430.
Champine, G. A. (1978). Four approaches to a data base computer. Datamation, 101-106.
Champine, G. A. (1979). Current trends in data base systems. Computer 12, 27-41.
Davis, E. W. (1974). STARAN parallel processor system software. AFIPS Nat. Computer Conf. 43, pp. 17-22.
Davis, E. W. (1983). Application of the massively parallel processor to database management systems. Proc. AFIPS Nat. Computer Conf., pp. 299-307.
Defiore, C. R., and Berra, P. B. (1973). A data management system utilizing an associative memory. Nat. Computer Conf. 42, pp. 181-185.
Demurjian, S. A., Hsiao, D. K., and Strawser, P. R. (1986). Design analysis and performance evaluation methodologies for database computers. In “Advances in Computers” 25 (M. C. Yovits, ed.), pp. 101-214.
DeWitt, D. J. (1979). DIRECT-A multiprocessor organization for supporting relational database management systems. IEEE Trans. Computers C-28, 395-406.
DeWitt, D. J., and Hawthorn, P. B. (1981). A performance evaluation of database machine architecture. Proc. 7th Int. Conf. Very Large Data Bases, pp. 199-213.
DeWitt, D. J., et al. (1987). A Single User Evaluation of a Gamma Database Machine. In “Database Machines and Knowledge Base Machines” (M. Kitsuregawa and H. Tanaka, eds.), pp. 370-386. Kluwer Academic Publishers.
Eich, M. H. (1987). MARS: the design of a main memory database machine. In “Database Machines and Knowledge Base Machines” (M. Kitsuregawa and H. Tanaka, eds.), pp. 325-338. Kluwer Academic Publishers.
Feng, T. Y. (1976). Some considerations in multiprocessor architecture. In “Multiprocessor Systems” (C. H. White, ed.), pp. 277-286. Infotech state of the art report, England.
Finnila, C. A., and Love, H. H., Jr. (1977). The associative linear array processor. IEEE Trans. Electron Computers 16, 112-142.
Fletcher, J. C. (1984). Report of the study of eliminating the threat posed by nuclear ballistic missiles. MDA 90384 C0031, Task T-3-191.
Flynn, M. J. (1966). Very high speed computing systems. Proc. IEEE 54, 1901-1909.
Fushimi, S., Kitsuregawa, M., and Tanaka, H. (1986). An overview of the system software of a parallel relational database machine GRACE. Proc. Int. Conf. Very Large Databases, pp. 209-219.
Gaines, R. S., and Lee, C. Y. (1965). An improved cell memory. IEEE Trans. Electron Computers C-14, 72-75.
Gajski, D., Kim, W., and Fushimi, S. (1984). A parallel pipelined relational query processor: an architectural overview. Proc. 11th Annu. Symp. Computer Architecture, pp. 134-141.
Garcia-Molina, H., Lipton, R. J., and Valdes, J. (1984). A massive memory machine. IEEE Trans. Computers C-33, 391-399.
Gonzalez-Rubio, R., and Rohmer, J. (1984). The schuss filter: a processor for non-numerical data processing. Proc. 11th Annu. Symp. Computer Architecture, pp. 393-395.
Handler, W. (1982). Innovative computer architecture to increase parallelism but not complexity. In “Parallel Processing Systems” (D. J. Evans, ed.), pp. 1-41. Cambridge University Press, London.
Haskin, R. (1978). Hardware for searching very large text databases. Ph.D. Dissertation, University of Illinois, Urbana, Illinois.
Haskin, R., and Hollaar, L. A. (1983). Operational characteristics of a hardware-based pattern matcher. ACM Trans. Database Systems 8, 15-40.
Hawthorn, P. (1980). Panel on database machines. Proc. 6th Int. Conf. Very Large Data Bases, pp. 393-395.
Healy, L. D., Lipovski, G. J., and Doty, K. L. (1972). The architecture of a context addressed segment-sequential storage. Fall Joint Computer Conf. 41, 691-701.
Held, G. D., Stonebraker, M. R., and Wong, E. (1975). INGRES-A relational data base system. Nat. Computer Conf. 44, 409-416.
Hikita, S., Kawakami, S., and Haniuda, H. (1985). Database machine FREND. Database Machines 4th Int. Workshop, pp. 190-207.
Hiroshi, S., et al. (1984). Design and implementation of the relational data base engine. Proc. Int. Conf. Fifth Generation Computer Systems, pp. 419-426.
Hong, Y. C. (1985). Efficient computing of relational algebraic primitives in a data base machine. IEEE Trans. Computers C-34, 588-595.
Hsiao, D. K. (1980). Data base computers. In “Advances in Computers” 19 (M. C. Yovits, ed.), pp. 1-64. Academic Press, New York.
Hsiao, D. K. (1983). “Advanced Database Machine Architecture.” Prentice Hall.
Hsiao, D. K. (1983). Cost-effective ways of improving database computer performance. Nat. Computer Conf. 52, pp. 292-298.
Hsiao, D. K., Kannan, K., and Kerr, D. S. (1977). Structure memory designs for a database computer. AFIPS Proc. Nat. Computer Conf., pp. 343-350.
Hurson, A. R. (1981). An associative backend machine for database management. Proc. 6th Workshop on Computer Architecture for Non-Numeric Processor.
Hurson, A. R. (1983). A special purpose mini computer for relational database management. ISMM J. Microcomputer Applications 2, 9-12.
Hurson, A. R. (1984). A VLSI design for the parallel finite state automaton and its performance evaluation as a hardware scanner. Int. J. Computer and Information Sciences 13, 491-508.
Hurson, A. R. (1985). A mini processor for database security enforcement. ISMM J. Microcomputer Applications 6, 16-19.
Hurson, A. R. (1986). VLSI time/space complexities of an associative join module. Proc. Int. Conf. Parallel Processing, pp. 379-386.
Hurson, A. R., and Miller, L. L. (1982). A non-numeric processor. AICA Annu. Conf. Italian Computer Society, pp. 845-852.
Hurson, A. R., and Miller, L. L. (1983). Security issues in backend database machines. Proc. 17th Annu. Conf. Information Science and Systems, pp. 690-696.
Hurson, A. R., and Miller, L. L. (1987). A database machine architecture for supporting incomplete information. J. Computer Science and Systems Engineering 2, 107-116.
Kakuta, T., Miyazaki, N., Shibayama, S., Yokota, H., and Murakami, K. (1985). The design and implementation of relational database machine delta. Database Machine 4th Int. Workshop, pp. 13-34.
Kambayashi, Y. (1984). A Database Machine Based on the Data Distribution Approach. Nat. Computer Conf. 53, pp. 613-625.
Kambayashi, N., and Seo, K. (1982). SPIRIT-III: an advanced relational database machine introducing a novel data-staging architecture with tuple stream filters to preprocess relational algebra. Nat. Computer Conf. 51, pp. 605-616.
Kannan, K. (1978). The design of a mass memory for a database computer. Proc. 5th Annu. Symp. Computer Architecture, pp. 44-50.
Kerr, D. S. (1979). Data base machines with large content addressable blocks and structural information processors. Computer 12, 64-81.
Kim, Won (1979). Relational database systems. Computing Surveys 11, 185-211.
Klass, P. J. (1984). New microcircuits challenge silicon use. Aviation Week & Space Technology, 179-184.
Kung, H. T., and Lehman, P. L. (1980). Systolic (VLSI) arrays for relational database operations. ACM/SIGMOD Int. Conf. Management of Data, pp. 105-116.
Lamb, S. (1978). An ADD-in recognition memory for S-100 bus micro computers. Computer Design 17, 162-168.
Lamb, S., and Vanderslice, R. (1978). Recognition memory: low cost content addressable parallel processor for speech data manipulation. Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan.
Langdon, G. G., Jr. (1978). A note on associative processors for data management. ACM Trans. Data Base Systems 3, 148-158.
Langdon, G. G., Jr. (1979). Database machines: an introduction. IEEE Trans. Computers C-28, 381-383.
Lea, R. M. (1976). “Associative Processing of Non-numerical Information.” D. Reidel Publishing Company, Dordrecht, Holland.
Lee, C. Y., and Paull, M. C. (1963). A content addressable distributed logic memory with applications to information retrieval. Proc. IEEE 51, 924-932.
Lee, S. Y., and Chang, H. (1979). Associative search bubble devices for content addressable memory and array logic. IEEE Trans. Computers C-28, 627-636.
Lehman, P. L. (1981). A systolic (VLSI) array for processing simple relational queries. In “VLSI Systems and Computations” (Kung et al., eds.), pp. 285-295. Computer Science Press.
Lehman, T. J. (1986). Design and performance evaluation of a main memory relational database system. Ph.D. Dissertation, University of Wisconsin-Madison.
Leilich, H. O., Steige, G., and Zeidler, H. C. (1977). A search processor for data base management systems. Technical University of Braunschweig, Braunschweig, Germany.
Lin, C. S., Smith, D., and Smith, J. M. (1976). The design of a rotating associative memory for relational database applications. ACM Trans. Database Systems 1, 53-65.
Linde, R. R., Gates, R., and Peng, T. (1973). Associative processor applications to real-time data management. Nat. Computer Conf. 42, pp. 187-195.
Lipovski, S. J. (1978). Architectural features of CASSM: a context addressed segment sequential memory. SIGARCH Newsletter 6, 31-38.
Liuzzi, R. A., and Berra, P. B. (1982). A methodology for the development of special-purpose function architectures. Nat. Computer Conf. 51, 125-134.
Mahoney, W. C. (1981). Defense mapping agency (DMA) overview of mapping, charting, and geodesy (MC&G) applications of digital image pattern recognition. SPIE 281, Techniques and Applications of Image Understanding, 11-23.
Maller, V. A. (1979). The content addressable file store-CAFS. ICL Tech. J. 2, 265-279.
Maryanski, F. J. (1980). Backend database systems. ACM Computing Surveys 12, 3-25.
Miller, L. L., and Hurson, A. R. (1983). Performance evaluation of database machines based on the database relationships. IEEE Workshop on Computer Architecture for Pattern Analysis and Image Database Management, pp. 187-192.
Miller, L. L., and Hurson, A. R. (1986). Maybe algebra operators in database machine architecture. Proc. Fall Joint Computer Conf., pp. 1210-1218.
Minsky, N. (1972). Rotating storage devices as partially associative memories. Proc. AFIPS Fall Joint Computer Conf., pp. 587-595.
Moulder, R. (1973). An implementation of a data management system on an associative processor. Nat. Computer Conf., pp. 171-176.
Mukherjee, A. (1986). “Introduction to nMOS and CMOS VLSI Systems Design.” Prentice-Hall, New Jersey.
Mukhopadhyay, A. (1979). Hardware algorithms for non-numeric computation. IEEE Trans. Computers C-28, 384-394.
Mukhopadhyay, A., and Hurson, A. R. (1979). An associative search language for data management. Nat. Computer Conf., pp. 727-732.
Muraszkiewicz, M. (1981). Concepts of sorting and projection in a cellular array. Proc. 7th Int. Conf. Very Large Data Bases, pp. 76-79.
Myers, G. J. (1978). “Advances in Computer Architecture.” Wiley, New York.
Oflazer, K. (1983). A reconfigurable VLSI architecture for a database processor. Nat. Computer Conf., pp. 271-281.
Oflazer, K., and Ozkarahan, A. E. (1980). RAP3-A multi-microprocessor cell architecture for the RAP database machine. Proc. Int. Workshop on High Level Language Computer Architecture, pp. 108-119.
Oliver, E. J. (1979). RELACS, an associative computer architecture for a relational data model. Ph.D. Dissertation, Syracuse University.
Ozkarahan, E. A., Schuster, S. A., and Smith, K. C. (1974). A data base processor. Technical Report, University of Toronto.
Ozkarahan, E. A., Schuster, S. A., and Smith, K. C. (1975). RAP-an associative processor for data base management. Nat. Computer Conf., pp. 379-387.
Ozkarahan, E. A., Schuster, S. A., and Smith, K. C. (1976). A high level machine-oriented assembler language for a database machine. Technical Report, University of Toronto.
Parhami, B. (1972). A highly parallel computing system for information retrieval. Fall Joint Computer Conf., pp. 681-690.
Parker, J. L. (1971). A logic per track retrieval system. Technical Report, University of British Columbia.
Qadah, G. Z. (1985). Database machines: a survey. Nat. Computer Conf. 54, 211-223.
Qadah, G. Z., and Irani, K. B. (1985). A database machine for very large relational databases. IEEE Trans. Computers C-34, 1015-1025.
PARALLEL ARCHITECTURES FOR DATABASE SYSTEMS
151
Rosenthal. R. S. (1977). The data management machine, A classification. Third Computer Architecture f o r Non-numeric Processing. pp. 35 39. Rudolph, J. A. (1972). A production implementation of an associative array processorSTARAN. Fall Joinr Computer Con/:, pp. 229-241. Schuster, S. A., Nguyen, H. B., Ozkarahan, E. A.. and Smith, K . C. (1979).RAP2 -an associative processor for databases and its application. I E E E Trans. Computers C-28,446-458. Shaw, E. (1980). Relational operator. Minnowhroak Workshop o n Dnfu Buse Machine. Slotnick. D. L. (1970). Logic per track devices. In "Advances in Computers" 10 (F. L. Ah. and M. Rubenoff, eds.). Academic Press, New York. Smith, D. C . ,and Smith, J. M. (1979). Relational database machines. Computer 12, 28- 38. Song, S. W. (1980). A highly concurrent tree machine for database applications. Proc. Int. Conf. Parallel Processing, pp. 259 268. Song, S. W. (1983). A survey and taxonomy of database machines. In "Database Engineering" (W. Kim, D. Batory. A. Herner, R. Katz and D. Reiner, eds.), pp. 5-15. Su, S. Y. W. (1979). Cellular logic devices concepts and application. Computer 12, 1 1-25, Su, S. Y. W.. and Emam, A. (1980).CASDAL: CASSM's Data Language. A C M Truns. Daruhuse Systems 3, 57-91. Su, S. Y. W.. Nguyen. H. B., Emam, A., and Lipovski, G. J. (1979).The architectural features and implementation techniques of the multi cell CASSM. I E E E Trans. Compufers C-28,430-445. Su, S. Y. W., er ul. (1980).Database machines and some issues on DBMS standards. Nnr. Compufer CfJnf.,pp. I9 I - 208. Tdnaka, K., and Kambayashi. Y. ( 1 981). Logical integration of locally independent relational databases into a distributed database. Proc. 7 / h Inr. Cot$ Very Large Dara Rases, pp. 131 -141. Tong, F.. and Yao, S. B. (1982).Performance analysis of database join processors. N u / . Compufer Conf. 51, pp. 627-637. Valduriez, P., and Gardarin, ( 3 . (1984). Join and semijoin algorithms for a multiprocessor data base machine. A C M Trans. 
D u ~ u ~ (Sysrems J s ~ 9, 133-161. Wah, B. W., and Yao. A. S. (1980).DIALOG- A distributed processor organization for database machine. Nut. Computer Conf. 49, 243-253. ~
Optical and Optoelectronic Computing

MIR MOJTABA MIRSALEHI
MUSTAFA A. G. ABUSHAGUR
Electrical and Computer Engineering Department
University of Alabama in Huntsville
Huntsville, Alabama

H. JOHN CAULFIELD
Center for Applied Optics
University of Alabama in Huntsville
Huntsville, Alabama
1. Introduction
2. Basic Operations for Optical Computations
    2.1 Transmittance Function
    2.2 Addition and Subtraction
    2.3 Multiplication
3. Elements of Optical Computers
    3.1 Memories
    3.2 Logic Arrays
    3.3 Input/Output Devices
    3.4 Interconnections
4. Analog Processors
    4.1 Fourier Transform Processors
    4.2 Correlators
    4.3 Spatial Filters
    4.4 Image Processors
    4.5 Nonlinear Processors
    4.6 Applications
5. Digital Processors
    5.1 Number Representation
    5.2 Computing Techniques
    5.3 Architectures and Technologies
6. Hybrid Processors
    6.1 General Hybrid Processor
    6.2 Algebraic Processors
    6.3 Diffraction-Pattern Sampling Processor
7. Conclusion
Acknowledgement
References
ADVANCES IN COMPUTERS, VOL. 28
Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-012128-X
1. Introduction
Today, “optical computing” has become a hot topic. Popular magazines describe its promise. Financial investors seek out its experts. The governments of the world support it (with varying degrees of enthusiasm). International meeting attendance increases at a high rate. We examine here a view of what this field is today and where it will be in the dim dark future of the next decade. This is an introduction, not an exhaustive treatment. Breadth was favored over detail; understanding, over rigor.

By “optical computing” we mean the use of light as a primary (but not exclusive) means for carrying out numerical calculations, reasoning, artificial intelligence, etc. The two features worth noting in this definition are as follows:

1. Optical computing is defined by method (heavily optical), not by effect (numerical calculation, inference, etc.). It attempts no new tasks. Hence, it must offer demonstrable, decisive advantages over its all-electronic competition if it is to succeed.
2. “Pure” optics is neither sought nor possible. As we will show later, there is always some overt or covert electronics in optical computers.
Optical computing emerged as a discipline in the 1980s and became hot in the 1983-1984 period. The fields from which it arose are holography and coherent optical information processing. The major initial goals of these two fields have yet to be achieved. Spawning optical computing may prove to be their ultimate achievement. Optical Fourier transform analysis of time signals and optical image formation for synthetic-aperture radar are probably the most important technical legacies of the pre-1980s work.

Even in recent retrospect it is hard to see what sparked the rapid growth of optical computing. In the late 1970s and early 1980s the most talked about papers were on parallel, analog matrix-vector multiplication (Goodman et al., 1978), optical iterative algorithms (Caulfield et al., 1981a), optical systolic array processors (Caulfield et al., 1981b), and digital multiplication by analog convolution (Psaltis et al., 1980). Interestingly, as originally conceived, these concepts proved considerably less valuable than their authors hoped. What was accomplished was the creation of a community out of which far more useful techniques have arisen.

Today, it is widely accepted that optical computing should not duplicate the architectures that have been used for electronic computers, but should utilize techniques that use the strengths of optics and avoid its weaknesses. One of the most important features of optics is its capability of global interconnections (Goodman et al., 1984). Therefore, areas such as neural networks (Farhat et al., 1985) that utilize this feature are the most promising areas for optical computing.

In this chapter, the fundamentals of optical computing are presented. First, the optical implementations of the basic operations are described in Section 2. Then, the elements of optical computers are introduced in Section 3. These include memories, logic arrays, input/output devices, and interconnections. Sections 4, 5, and 6 are devoted to different types of optical processors, i.e., analog processors, digital processors, and hybrid processors. Finally, in Section 7, the present status of optical computing is summarized and the future of this field in the next decade is predicted.
2. Basic Operations for Optical Computations

In this section, we introduce the basic mathematical operations for optical computing. These are addition, subtraction, and multiplication. More complex operations, such as Fourier transformation and correlation, will be introduced in Sec. 4. Before describing the basic operations, we need to introduce the concept of transmittance function.

2.1 Transmittance Function
In optics, light waves are the agents that carry the information throughout the system. In optical systems, the input functions (either one- or two-dimensional) are usually entered by spatial light modulators (SLMs). Various types of SLMs exist; among them are photographic films, optical transparencies, acousto-optic modulators, electro-optic modulators, and magneto-optic modulators. These devices display functions spatially (in the space domain). In Fig. 1, an optical transparency is illuminated by a light wave $U_1(x, y)$. The emerging wave just to the right of the transparency is $U_2(x, y)$. In general, $U_1$ and $U_2$ are complex quantities whose magnitudes and angles correspond to the amplitudes and relative phases of the input and output waves. The transmittance function, $t(x, y)$, of an SLM is defined as

$$t(x, y) = \frac{U_2(x, y)}{U_1(x, y)}.$$

The magnitude of $t$ represents the attenuation of the light wave as it goes through the SLM, and the angle of $t$ represents the phase difference between the input and the output waves.

FIG. 1. An optical transparency. $U_1(x, y)$ and $U_2(x, y)$ represent the electric fields of the input and the output light, respectively.

2.2 Addition and Subtraction

Let $f(x, y)$ and $g(x, y)$ be the two functions to be added or subtracted optically. First, consider the operation of addition. The two functions are recorded as transparencies $T_1$ and $T_2$, respectively. Collimated light beams illuminate both $T_1$ and $T_2$, as shown in Fig. 2. The addition is performed by the beamsplitter BS. The lens L is used to image both $T_1$ and $T_2$ on the screen, where the sum of the two functions is obtained. In Fig. 2, if the optical path difference between $T_1$ and $T_2$ is an odd multiple of $\lambda/2$, where $\lambda$ is the wavelength of the light, the operation becomes subtraction. This can be achieved by inserting a half-wave plate (a plate that introduces a 180° phase shift) in the system in front of $f(x, y)$.
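The transmittance definition and the beamsplitter superposition can be mimicked numerically. The following is a minimal sketch, assuming NumPy, sampled real-valued transparencies, unit-amplitude collimated illumination, and a lossless beamsplitter; the function names `transmit` and `combine` and the sample values are illustrative and do not come from the text.

```python
import numpy as np


def transmit(u_in, t):
    """Field just behind a transparency: U2 = t * U1, sample by sample."""
    return t * u_in


def combine(f, g, path_difference_in_wavelengths=0.0):
    """Superpose the fields behind two transparencies, as at the beamsplitter.

    A path difference of d wavelengths multiplies the second field by
    exp(i * 2 * pi * d); d = 0.5 (or any odd multiple of 1/2) gives a
    phase factor of -1, which turns addition into subtraction.
    """
    illumination = np.ones_like(f, dtype=complex)  # unit-amplitude beam (assumed)
    phase = np.exp(2j * np.pi * path_difference_in_wavelengths)
    return transmit(illumination, f) + phase * transmit(illumination, g)


# Two small sampled transparencies (hypothetical values).
f = np.array([[0.2, 0.4], [0.6, 0.8]])
g = np.array([[0.1, 0.3], [0.5, 0.2]])

field_sum = combine(f, g)          # in-phase superposition
field_diff = combine(f, g, 0.5)    # half-wavelength path difference

print(np.round(field_sum.real, 3))
print(np.round(field_diff.real, 3))
```

The sketch works with field amplitudes only; attenuation and phase shift by a single transparency correspond to the magnitude and angle of a complex value passed as `t` to `transmit`.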
FIG. 2. Optical system for adding two functions $f(x, y)$ and $g(x, y)$. $T_1$ and $T_2$ are optical transparencies, BS is a beamsplitter, L is a spherical lens, and S is a screen (Iizuka, 1985).

2.3 Multiplication

The multiplication of $f(x, y)$ and $g(x, y)$ can be achieved by various optical systems. A simple method is shown in Fig. 3. The two functions $f$ and $g$ are recorded on the transparencies $T_1$ and $T_2$, respectively. Lenses $L_1$ and $L_2$ produce an inverted magnified image of $f$ (with a magnification factor of $f_2/f_1$) on the plane of $T_2$. The transmitted light from $T_2$ is $f(x, y)\,g(x, y)$. Lens $L_3$ is used to image the result on the screen.

FIG. 3. Optical system for multiplying two functions $f(x, y)$ and $g(x, y)$, where $f_1$ and $f_2$ are the focal lengths of lenses $L_1$ and $L_2$, respectively. The distances $d_1$ and $d_2$ are chosen such that $T_2$ is imaged on the screen S (Iizuka, 1985).

3. Elements of Optical Computers
3.1 Memories

3.1.1 Holographic Memories
Since the invention of digital computers, there has been an increasing demand for memories with large storage capacity, low access time, and low cost. The technologies that have been used for data storage, such as bipolar, MOS, and magnetic tapes, do not satisfy all of the above requirements. High storage capacities usually require large physical sizes, which result in slow access times. The development of reliable lasers in the mid-1960s and the progress in holography motivated research on holographic memories in the 1970s.

Holography is a technique for storing the whole information about an object. The word holograph is taken from the Greek word holographos, where holo means whole and graphos means written. Holography was invented in 1948 (Gabor, 1948), but it only became popular after the invention of the laser in 1960. Figure 4 shows the schematic diagram of a holographic system. During the recording process, a coherent light beam (usually from a laser) is split into two beams: an object beam and a reference beam. The former illuminates the object and, as a result, is affected by both the amplitude and the phase of the transmittance function of the object. In photography, only the amplitude information is recorded, while in holography both the amplitude and phase information are recorded. This is achieved by bringing the reference and object beams together to construct an interference pattern, which is then recorded as a hologram in a photosensitive material. Various materials can be used for recording, such as photographic plates, dichromated gelatin, thermoplastics, and photorefractive crystals. It is important to note that the recording of both amplitude and phase information is possible not because of the type of recording material, but because of the interference pattern that has been created. The coherence of light is needed in order to create this interference pattern. After the hologram is recorded and developed, it can be used to reconstruct an image of the object. During the reconstruction, the same reference beam is used to illuminate the hologram.
It can be shown that, because of the pattern that has been recorded on the hologram, part of the energy of the reference beam will be coupled into a wave that is identical to the object beam. As a result, a person looking at the hologram will see a three-dimensional object which actually does not exist.

FIG. 4. Schematic diagram of a holographic system: (a) recording process, (b) reconstructing process.

Holograms have many applications in industry and science (Caulfield, 1979). Here we briefly describe the application of holography for data storage. For more information, the reader is referred to the article by Gaylord (1979). Holographic memories do not use the 3-D display characteristic of holography. Data is coded in 2-D arrays as dark and bright spots representing zeros and ones, and each data array is recorded as a hologram. There are different types of holograms. The one that is most suitable for data storage is the Fourier hologram. In this type, using a spherical lens, the Fourier transform of the data mask is obtained at the recording medium, where it interferes with a reference beam. The main advantage of Fourier holograms is that, since the information is recorded as the Fourier transform of the data array, a local defect of the recording medium does not produce serious errors in a particular part of the stored pattern. Instead, it results in a small degradation of the entire data array, without changing a particular bit.

The information can be stored in 2- or 3-dimensional form. If a recording material such as photographic film is used, the storage will be two-dimensional, i.e., on the surface. Other materials, such as photorefractive crystals, are capable of recording the information three-dimensionally, i.e., inside the volume. The 3-D storage has the advantage of providing larger storage capacities. Theoretically, the capacity of 2-D storage can reach $1/\lambda^2$ bits/m$^2$, where $\lambda$ is the wavelength of the light. The capacity of a 3-D storage can reach $(n/\lambda)^3$ bits/m$^3$, where $n$ is the index of refraction of the recording material. Considering an argon ion laser with a wavelength of $\lambda = 514.5$ nm and a recording medium with $n = 2$, the storage capacities of 2- and 3-D memories can reach up to about $4 \times 10^6$ bits/mm$^2$ and $4 \times 10^9$ bits/mm$^3$, respectively. Therefore, a 1 cm$^3$ crystal is sufficient to store more than $10^{12}$ bits of information.

The large storage capacity of holographic memories motivated research in this field in the early 1970s. Some commercial holographic memories were introduced in that decade, such as Holoscan by Optical Data Systems and Megafetch by 3M. These systems had storage capacities of about $10^7$ bits, which was significantly below the theoretical limits. The research on volume holographic memories was not very successful in reaching the values predicted by theory. A major problem was to find a suitable material for volume holographic memories. An ideal material should be highly photorefractive, so that real-time recording and reading become possible. Also, it should be capable of long-term storage, so that the data can be stored for long periods, if desired.
Finally, simple techniques should be available for erasing particular parts of a memory without affecting the other parts. Unfortunately, such a material has not been found yet. One of the best known photorefractive materials is LiNbO$_3$, which is a man-made crystal. Although this material can hold the information for long periods (several days and, through the fixing process, several years), it is not sensitive enough for real-time processing. Due to this and other difficulties, and also because of the advances in other technologies, such as optical disks, holographic data storage was abandoned in the early 1980s. However, the potential for building high-capacity and low-access-time holographic memories does exist.

There is another application for holographic memories that does not require a material with all the characteristics mentioned above. This approach, known as control operator processing, has been pursued in the Soviet Union (Basov et al., 1978). In this application, holographic memories are not used for general-purpose computers. Instead, they are used for developing special-purpose processors that perform specific tasks. In this method, each operation is decomposed into a series of transformations over operand sets. These transformations are permanently recorded on a page-oriented holographic memory. There is no need for changing these stored patterns, since they represent the operations, not the operands. Using this technique, fundamental operations such as conjunction, disjunction, right and left shift, number inversion, logic functions, addition, and multiplication can be realized (Elion and Morozov, 1984).

Another application of holography which has recently been explored is the development of associative memories. Currently, the memories that are used in computers are location-addressable memories. In this type of memory, an address is used to locate a part of the memory in which the required information is stored. Another type of memory, which is similar to the human brain, is associative memory. In this type, partial information of the content is used to recall the whole information. For example, you can recall detailed information, such as the phone number, of a close friend if you see his name or his picture. An interesting characteristic of the human brain is that it is capable of self-organization and error correction (Kohonen, 1988). For example, if the name of your friend is misspelled, you will recall the correct name and the correct information about him.

Holography is inherently an associative memory (Gabor, 1969). This is due to the fact that one beam (the reference beam) can be used to reconstruct the other beam (the object beam). In general, both beams can carry information, and one beam can be used to reconstruct the other beam. This is similar to the recall of one content from another in an associative memory. By including a feedback loop and some modification, the holographic system can reconstruct the whole image if just partial information is provided to the system (Owechko et al., 1987; Paek and Psaltis, 1987). Also, research is under way to develop self-organizing neural networks using holograms (Psaltis et al., 1988).
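The recall of a whole pattern from partial or corrupted information can be illustrated with a small numerical stand-in. The sketch below is not the optical system of the cited papers; it assumes a simple correlation-based memory with a thresholding feedback loop, bipolar (+1/-1) patterns, and NumPy. The stored patterns, the probes, and the name `recall` are hypothetical choices made for illustration.

```python
import numpy as np

# Three stored bipolar patterns (hypothetical, chosen to be mutually orthogonal).
STORED = np.array([
    [1,  1,  1,  1, -1, -1, -1, -1],
    [1, -1,  1, -1,  1, -1,  1, -1],
    [1,  1, -1, -1,  1,  1, -1, -1],
])


def recall(stored, probe, iterations=3):
    """Recall a stored pattern from a partial or noisy probe.

    Each pass correlates the current estimate with every stored pattern,
    superposes the stored patterns weighted by those correlations, and
    thresholds the result before feeding it back.
    """
    estimate = np.asarray(probe, dtype=float)
    for _ in range(iterations):
        weights = stored @ estimate            # correlation with each stored pattern
        reconstruction = weights @ stored      # correlation-weighted superposition
        estimate = np.sign(reconstruction)     # thresholding step in the feedback loop
    return estimate.astype(int)


# Partial information: only the first half of the second pattern is known
# (unknown elements are entered as 0).
partial_probe = np.array([1, -1, 1, -1, 0, 0, 0, 0])

# Corrupted information: the third pattern with its last element flipped.
noisy_probe = np.array([1, 1, -1, -1, 1, 1, -1, 1])

print(recall(STORED, partial_probe))
print(recall(STORED, noisy_probe))
```

In this toy model the partial probe plays the role of the incomplete input image, and the thresholded feedback stands in for the nonlinearity that the text says must be added to the holographic system.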
3.1.2 Optical Disks

The storage of information on a disk dates back to the invention of the phonograph by Thomas Edison in 1877, when he recorded and played back an audio signal on a wax cylinder. The development of videodiscs started in the late 1950s. However, most of the research in this area remained secret until 1970. In the early 1970s, videodiscs were introduced by commercial companies in Europe and the United States. Since then, videodisc technology has grown significantly. Today, more than 100 firms are involved in research and development of videodiscs. Among the applications of videodiscs are entertainment, education, and data storage. In this section, only the data storage application is described. For more information on this and other applications of videodiscs, the reader is referred to the book Videodisc and Optical Memory Systems by Isailović (1985).

Videodiscs, when used as data storage systems (memories), are usually referred to as optical disks. In optical disks, digital information is stored as ones and zeros. Presently, magnetic technology is used for data storage in computers. The three most commonly used data storage media are magnetic tapes, magnetic hard disks, and magnetic floppy disks. In optical disks, information is recorded and read by a tiny laser beam. This results in an enormous storage capacity. Among other advantages of optical disks are random access, long lifetime, and cheap reproduction.

Optical disks are of two types: read-only disks and erasable disks. The first type is useful for archival storage and for storing data or instructions that do not need to be changed. In the second type, the recorded data can be erased or changed. This type of memory is needed for temporary data storage, such as in digital computing. Some of the materials used for nonerasable disks are tellurium, silver halide, photoresists, and photopolymers. Among the candidate materials for erasable disks, three groups are more promising. These are magneto-optic materials, phase-change materials, and thermoplastic materials.

Magneto-optic materials are the most developed type of erasable media. In these materials, the information is stored as the directions of the magnetic domains. During the recording process, a laser beam is used to heat up a small spot on the disk. This lowers the coercivity of the magnetic domains. At the same time, data is introduced as a current through a magnetic coil, and a magnetic field is produced in the vicinity of that spot. As a result, the direction of the magnetic domains at that spot is determined by the external magnetic field. Currents with opposite directions are used to record ones and zeros. Hence, the magnetic domains corresponding to ones and zeros have opposite directions. The reading process is performed by using the Kerr effect. A lower-power laser beam is used to illuminate a spot on the disk. According to the Kerr effect, the polarization of the reflected light is determined by the direction of the magnetic domains in that spot. An analyzer is used to allow the reflected light with a polarization corresponding to a stored 1 to pass, while the reflected light with a polarization corresponding to a stored 0 is blocked. A detector behind the analyzer determines the stored bit. To change the stored information, the recording process is repeated for the new information.

Another group of erasable recording materials is the phase-change optical materials. These are materials, such as TeS, that have two stable forms (amorphous and crystal). The recording medium is originally crystallized. This form is usually used to represent zeros. In order to record a one, a laser beam is used to heat up a spot on the disk. This provides sufficient energy to change the form of the material at that spot from crystal to amorphous. The reading process is performed by a low-power laser. If the illuminated spot on the disk is crystal, the light is mainly reflected back, while, if the spot is amorphous, the amount of reflection is negligible. These are detected as zeros and ones, respectively. In order to erase recorded data, a low-power diffuse beam is used. The slow heating produced by this beam results in recrystallization of the recording medium.

The third group of erasable recording materials is thermoplastics. A thermoplastic disk consists of a glass substrate, a conductive layer, a photoconductive layer, and a thermoplastic layer put on top of each other. For recording, the surface of the thermoplastic is uniformly charged, and a laser beam whose intensity is modulated by the input data is focused at a spot on the disk. As a result, the conductivity of the photoconductor at that spot is decreased, and the thermoplastic surface is discharged. At the same time, the thermoplastic surface is heated by the laser to its softening temperature. This results in a surface relief pattern which resembles the input data. The pattern remains, even after the laser beam is removed and the surface is cooled. The reading process is performed by another laser beam of lower energy. The presence of a bump at a spot on the thermoplastic region results in a weak reflection, which is detected as a one. The absence of a bump results in a strong reflection, which is detected as a zero. In order to erase the information, the disk is heated by an erasing beam to smooth out the surface.

Gas lasers, such as He-Ne, He-Cd, and Ar lasers, are usually used for the recording and reading processes of optical disks. The main advantage of gas lasers is their high reliability. However, considering the compactness of diode lasers, there is a great potential for these devices in optical disk technology. Different types of optical disk drives have been designed. In one type, three lasers of different wavelengths are used for recording, playback, and erasing. In another type, only one laser is used. An important feature of some optical disk devices is their ability to read the recorded data right after recording. This is known as direct read after write (DRAW), and is obtained by using a laser beam focused about 10 μm behind the writing beam. This feature is used to detect possible errors and correct them during the recording.

Optical disk technology is a fast growing field. It is expected that small erasable optical disks of 3.5" or 5.5" with capacities of 30 Mbytes to 200 Mbytes will become common in personal computers within a few years. Another application of optical disks is in archival storage. Two such systems have been developed and installed by RCA for NASA and Rome Air Development Center (RADC). These are optical disk “jukebox” mass storage systems that provide direct access to any part of $10^{13}$ bits of stored data within six seconds (Ammon et al., 1985). Each system is divided into two units: a hardware/software controller, and a disk drive unit. The controller serves as an interface to the host system, and controls and monitors all system functions. The disk drive unit serves as the basic read/write unit and performs all the functions needed to retrieve the disks. The drive unit includes a cartridge storage module that contains 125 optical disks, each of $7.8 \times 10^{10}$ bits storage capacity. This storage size is beyond the capacities that are currently available with other technologies.
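As a quick consistency check, the total capacity of the jukebox follows from the two figures quoted above (125 disks of $7.8 \times 10^{10}$ bits each). The short sketch below only multiplies these numbers; the conversion to terabytes assumes 8 bits per byte and $10^{12}$ bytes per terabyte, and the names are illustrative.

```python
DISKS_PER_MODULE = 125      # optical disks in the cartridge storage module
BITS_PER_DISK = 7.8e10      # storage capacity of each disk, in bits


def total_capacity_bits(disks=DISKS_PER_MODULE, bits_per_disk=BITS_PER_DISK):
    """Total capacity of the cartridge storage module, in bits."""
    return disks * bits_per_disk


total_bits = total_capacity_bits()
total_terabytes = total_bits / 8 / 1e12   # assumes 8 bits/byte, 10**12 bytes/TB

print(f"{total_bits:.3g} bits, about {total_terabytes:.2f} TB")
```

The product is about $9.75 \times 10^{12}$ bits, which agrees with the round figure of $10^{13}$ bits quoted for the whole system.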
3.2 Logic Arrays

3.2.1 Intended Uses
All operations in modern digital computers are based on logic gates. It follows that if we can do logical operations optically, we can do digital computing optically. It does not follow that we should do digital computing optically. That is a whole different topic to which we turn next.

For optical digital computers to supplant electronic digital computers they will need advantages that are profound, lasting, and affordable. The need for profound advantages in speed, size, power, cost, and reliability is clear. The electronic digital computers work well and are improving. This latter point is vital. While the optical computers are being built, electronic computers will still be improving. The cost of electronic circuits is already low and will continue to drop as we put more and more elements on the same chip.

The profound advantage of optics is likely to be parallelism. If $10^4$ to $10^6$ optical logic gates can be made to operate in parallel, the number of operations per second (OPS) may be very high. Of course, this requires two-dimensional (2-D) arrays of gates. Ideally, the individual gates should be fast, low-power, cascadable, and have similar performance characteristics. The contrast between outputs 0 and 1 should be very high, so the output can be divided into many pieces for complex operations.

There are several fundamental limitations on logic gates which need to be taken into account. If the energy difference between the 0 and 1 states is less than the thermal energy $kT$, where $k = 1.38 \times 10^{-23}$ J/K is Boltzmann's constant and $T$ is the temperature in kelvin, only noise, not information, will result. At room temperature, $kT = 4.14 \times 10^{-21}$ J. Both current electronics and current optics work far above the $kT$ limit. Indeed, current switching energies of the two technologies are comparable. The difference is that optics can never improve much below, say, $10^4\,kT$, while, in principle, electronics can. The reason optics is bound far away from $kT$ is twofold. First, optics uses photons each of energy $h\nu$, where $h = 6.63 \times 10^{-34}$ J s is Planck's constant and $\nu$ is the frequency of the light. In the visible or infrared region $h\nu$ is greater than $kT$. For example, at $\nu = 5 \times 10^{14}$ Hz, $h\nu = 3.315 \times 10^{-19}$ J. Second, optics must use many photons to achieve reliability in the face of the random (Poisson) arrival rate of photons. Thus for each logic operation we might need 1000 photons. That corresponds to a total energy of $1000\,h\nu$, which is much larger than $kT$. Electrons also have random arrival, but since their energy can be as low as the thermal energy, the fundamental limit on the energy needed in electronic gates is less than the energy needed in optical gates. The optical power is related to the number of operations per second by

$$P \geq \mathrm{OPS} \times 1000\,h\nu. \tag{3.1}$$

An extreme case of what might be possible would have $10^6$ parallel elements working in $10^{-12}$ seconds. This is equivalent to $10^{18}$ OPS. For visible light $1000\,h\nu \approx 3 \times 10^{-16}$ J. So we need at least 300 W optical power. Therefore, optical logic gates for very large systems would require very large optical powers.
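The numbers in this argument can be reproduced with a few lines of arithmetic. The sketch below uses the constants quoted in the text and assumes a room temperature of 300 K (the value consistent with $kT = 4.14 \times 10^{-21}$ J); the function names are illustrative.

```python
BOLTZMANN_K = 1.38e-23   # Boltzmann's constant, J/K
PLANCK_H = 6.63e-34      # Planck's constant, J s


def thermal_energy(temperature_kelvin):
    """Thermal energy kT in joules."""
    return BOLTZMANN_K * temperature_kelvin


def photon_energy(frequency_hz):
    """Energy h*nu of a single photon in joules."""
    return PLANCK_H * frequency_hz


def minimum_optical_power(ops, frequency_hz, photons_per_operation=1000):
    """Lower bound of Eq. (3.1): P >= OPS * (photons per operation) * h * nu."""
    return ops * photons_per_operation * photon_energy(frequency_hz)


ROOM_TEMPERATURE = 300.0   # kelvin (assumed)
FREQUENCY = 5e14           # Hz, visible light

kT = thermal_energy(ROOM_TEMPERATURE)
h_nu = photon_energy(FREQUENCY)

parallel_gates = 1e6
switching_time = 1e-12                 # seconds per operation
ops = parallel_gates / switching_time  # operations per second

power = minimum_optical_power(ops, FREQUENCY)

print(f"kT       = {kT:.3g} J")
print(f"h*nu     = {h_nu:.4g} J  ({h_nu / kT:.0f} times kT)")
print(f"OPS      = {ops:.3g}")
print(f"P (min)  = {power:.4g} W")
```

With the unrounded photon energy the bound comes out near 330 W; the figure of 300 W in the text follows from rounding $1000\,h\nu$ to $3 \times 10^{-16}$ J.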
FIG. 5. Transmission ($T$) versus intensity ($I$) curve for an optical gate; the intensity axis is marked at $I_B$, $I_B + \Delta I$, and $I_B + 2\Delta I$. To implement the AND operation, we might bias the gate with intensity $I_B$. If both inputs A and B are present and of intensity $\Delta I$, the high transmission situation occurs. Otherwise, low transmission is found.
3.2.2 Methods A nonlinear material with a transmission T versus intensity 1 curve such as shown in Fig. 5 can be used to implement an AND gate. Physically, the AND gate might be something like the one shown in Fig. 6. Obviously, setting the bias at lB + A1 makes this an OR gate. Making NAND and NOR gates can be accomplished by (a) using the complementary response curve available in nonabsorbing switches or (b) working with the complementary 2 and B. The most commonly suggested optical switches are bistable optical devices. Figure 7 shows a schematic T versus 1 curve for such a device. These are popular for two reasons. First, the transitions can be quite sharp and use fairly low switching energy. Second, the bistability is a kind of memory or latch. The transmission can stay high even after the switching power has been removed. Bistability is a runaway effect which occurs under some conditions of strong positive feedback. In an etalon or Fabry-Perot interferometer, we can achieve a very narrow spectral transmission versus optical thickness (spacing times index of refraction) curve such as shown in Fig. 8. If the incident frequency is barely outside the high transmission band and if the etalon material has index
FIG. 6. A schematic optical AND gate, where $A$ and $B$ are the inputs and $I_B$ is the intensity of the bias beam. When both $A$ and $B$ carry energy, the gate transmits all three beams. The transmitted bias beam can become the input to the next gate.
FIG. 7. Schematic optical bistable device ($T$ versus $I$). The arrows indicate directions of transmission. The device is called bistable because there are intensities, such as the one marked on the intensity axis, for which two stable outputs exist.
FIG. 8. Transmission, $T$, versus incident frequency, $\nu$, for an etalon with high reflectivity and an optical path length of $l$. The distance between two successive peaks is $c/2l$, where $c$ is the speed of light.
of refraction which depends on the intensity, the incident intensity changes the index so the curve shifts horizontally, causing higher transmission and higher in-etalon intensity. This, in turn, causes a greater index change which causes still higher transmission, and so on. In retrospect it is obvious that the transition can be very slow near the transition intensity. This phenomenon is called "critical slowing down." Therefore, fast switching-up requires operating well above the threshold, while fast switching-down needs operating well below the threshold.

Research in bistable optical devices has two primary thrusts. First, "better" materials and devices are sought, where "better" means all of the obvious things: larger fan-out, higher speed, lower energy, higher transmission, flatter saturation, more stability and control, etc. Second, useful arrays of these devices are needed. This is almost independent of the actual choice of material. Fabrication methods are the focus, and the important criteria are cost, uniformity, high yield, conformity to single-device quality, and robustness. These areas are being studied, and great progress is being made. Several recent reviews (Gibbs, 1985; Szu, 1987) provide detailed information on this research.

Optical bistable devices do not exhaust the possible approaches to optical logic. Other actively studied approaches include look-up tables (Guest and Gaylord, 1980), shadow casting (Ichioka and Tanida, 1984), and the "operator method" (Elion and Morozov, 1984).
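The etalon transmission curve of Fig. 8 can be sketched with the standard Airy formula for a lossless Fabry-Perot etalon at normal incidence, $T = 1/[1 + F\sin^2(2\pi l\nu/c)]$ with $F = 4R/(1-R)^2$. That formula is textbook material rather than something stated in this chapter, and the reflectivity and path length below are assumed values. The sketch only checks the peak spacing $c/2l$; it does not model the intensity-dependent index that produces bistability.

```python
import math

LIGHT_SPEED = 2.998e8   # speed of light, m/s


def etalon_transmission(frequency_hz, path_length_m, reflectivity):
    """Airy transmission of an ideal lossless etalon (assumed model)."""
    finesse_coeff = 4 * reflectivity / (1 - reflectivity) ** 2
    half_phase = 2 * math.pi * path_length_m * frequency_hz / LIGHT_SPEED
    return 1 / (1 + finesse_coeff * math.sin(half_phase) ** 2)


path_length = 1e-3        # assumed optical path length l, metres
reflectivity = 0.9        # assumed mirror reflectivity
peak_spacing = LIGHT_SPEED / (2 * path_length)   # c / 2l

order = 3000                                     # arbitrary resonance order
peak_frequency = order * peak_spacing
on_peak = etalon_transmission(peak_frequency, path_length, reflectivity)
between_peaks = etalon_transmission(
    peak_frequency + 0.5 * peak_spacing, path_length, reflectivity
)
print(f"peak spacing: {peak_spacing:.3e} Hz")
print(f"T on resonance: {on_peak:.4f}, T midway between peaks: {between_peaks:.4f}")
```

With high reflectivity the transmission drops to a fraction of a percent between peaks, which is why a small shift of the curve caused by an index change can switch the device.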
3.2.3 Assessment

All researchers in optical computing agree that 2-D arrays of optical logic elements with adequate parameters (cost, energy, uniformity, yield, saturation, fan-out, speed, etc.) are needed. While these goals are achievable, much serious and expensive research and development remains before such arrays can form the heart of digital optical computers which can displace their electronic ancestors. Some researchers have started to experiment with ways to use optical bistable arrays (Gibbs, 1985; Peyghambarian and Gibbs, 1985; Smith et al., 1985 and 1987). The details of these methods are beyond the scope of this introductory review.

3.3 Input/Output Devices
Ironically, optical computers may involve more electronics than optics. The reasons are primarily as follows:

1. Electronics is already established. The world runs on electronics. Whatever optics we create must receive information from, and deliver information to, electronics.
2. Because light particles (photons) do not affect each other, we need electronic devices such as detectors, amplifiers, and modulators to act on light.
The former circumstance may, but probably will not, pass away. The latter circumstance (the need for electronics to operate on light) is lasting. Optical computing will always require electronics, although that electronics may be hidden, e.g., as photorefractive crystals or photoconductors. All of this is an introduction to the concept that both input and output will be electronic in the sense just described.

The input/output (I/O) devices, not the optics, often limit the characteristics of the system in speed, power, and size. To take full advantage of optical parallelism, we would like to have parallel I/O. This is possible in input if we address optically from a live scene. Of course, parallel addressing buys no speed advantage if the image used in addressing must itself be constructed serially. The name applied to parallel read (but possibly not parallel written) input systems is spatial light modulator (SLM). Some of the parameters of SLMs are

• input or addressing method (optical or electronic),
• dimensionality (1-D or 2-D),
• continuous or discrete in modulation pattern,
• continuous or discrete in time,
• property modulated (intensity, phase, polarization),
• gray levels (binary, multilevel, continuous),
• operating curve (linear, bistable, sigmoid, etc.),
• pixel count or space-bandwidth product,
• resolution or smallest element,
• speed of a full cycle,
• storage (if any),
• optical quality (especially flatness),
• modulation efficiency,
• power or energy consumption,
• cost,
• uniformity,
• yield.
There are dozens of commercially available SLMs and far more in development (Fisher and Lee, 1987). The primary 2-D addressing mechanisms are charge-coupled device (CCD), electrical cross wire, cathode ray tube (CRT), and optical (through photoconducting layers or photocathodes). The primary modulation methods are electro-optics (primarily Pockels effect), magneto-optics, physical motion (piston or cantilever), and phase-transition absorption or reflection change. Some SLMs offer extra features which operate on an electron image. These include gain, polarity reversal, thresholding, and translation.
Optically addressed 2-D SLMs are particularly valuable because they can be addressed in parallel from many patterns on a page-oriented holographic memory (Elion and Morozov, 1984). The 1-D SLMs tend to use either bulk acousto-optics or surface acoustic waves. The input temporal signal modulates a carrier or drive frequency which produces a modulated diffraction grating that moves across the device at the speed of sound.

In contrast with the vast array of input devices there are essentially no parallel output devices. The main reason is that electronics does not need such arrays. For most electronic imaging, CCD arrays are adequate. The sole exception may be military imaging. The cost and complexity of parallel output may be extreme. At this point, optical computer workers have arrived at a way around this problem. We can let the output light pattern strike the input to an optically addressed SLM to allow looping, cascading, etc. Thus, parallel operations can be performed on parallel outputs of intermediate stages. Eventually, however, an output optical pattern has to be converted to an electronic pattern. In some optical computers this step is rate limiting. In other cases, the parallel computation reduces the output data rate to such a level that the optics becomes rate limiting.

3.4 Interconnections

3.4.1 Motivation
To carry signals from one location to another we usually use either wires or their equivalents in integrated circuits. Optics provides two alternative ways to carry signals: fiber optics or their equivalent integrated optical waveguides, and “free space” propagation. There are circumstances in which each is appropriate. Fiber optics is particularly useful for long-distance communication because it is light-weight, low-loss, and inexpensive. In principle very high bandwidth (terahertz) can be achieved. In practice tens of gigahertz are achieved now. Furthermore, once information is in the optical domain it makes sense to keep it in that form and avoid the time, power, and expense losses of conversion. Applications of fiber optics range from undersea telephone lines to local area networks and optical computers. Free-space optical interconnection is the generally accepted term for interconnection over paths not fully defined and constrained by “hardwired” paths. These interconnections can occur in free space but also in glass blocks and two-dimensional waveguides. It is here that the immense power of optics becomes evident. Two properties of free space interconnection are the keys. First, in some but not all imaging configurations all rays traveling from one
focal plane to another take equal times to do so (to within a few femtoseconds). This makes imaging and some quasi-imaging (image at infinity but intercepted at a finite distance) optical systems ideal for clock distribution. In many optical systems, clock skew can be made negligible. Second, unlike electrons, two beams of photons can cross without affecting the paths of each other. However, in the region where both beams are present, the electromagnetic field depends on the fields of the two beams. If they are mutually coherent, the total field will be the vectorial sum of the two fields. As a result, a pattern with dark and bright regions will be obtained. This phenomenon is called interference. When interference is used to account for light propagation, it is called diffraction. Once the various beams are separated, however, they bear no trace of having interacted with each other. Physicists would summarize by noting that electrons are fermions while photons are bosons. This accounts for the fact that closely spaced electrical carriers interact to greatly slow propagation, among other things, whereas adjacent light paths do not slow propagation.

These properties permit two general observations which account for the greatest strengths and the greatest weaknesses of optical computing. The greatest strengths come from massive interconnections. We will show in Section 3.4.3 that the noninteraction allows optics to connect each of $N^2$ inputs to each of $N^2$ outputs with a uniquely weighted interconnect attenuation in parallel. The greatest weaknesses of optical computing also arise from the noninteracting nature of photons. We cannot operate on light with light. We must operate on light through physical media whose electronic properties can be controlled. Thus, the irony of optical computing is that it must also involve electronics.

3.4.2 Fiber-optic Interconnect

Fiber-optic interconnection among computers is well developed and beyond the scope of this review.
Some efforts have been made in fiber-optic interconnections within optical computers. They function as wires but they allow arbitrary rerouting of signals among optical sources, modulators, and detectors. The fiber optics is useful because optical components offer advantages over electronics and we want to avoid optics-to-electronics and electronics-to-optics conversions. Fiber optics also provides convenient, high-accuracy time delays which may be used for processing temporal signals at very high speed (Jackson and Shaw, 1987). One of the first optical computers developed was the Tse Computer (Schaefer and Strong, 1977), in which the technologies of fiber optics, thin films, and semiconductors were utilized to perform logic operations on binary images. Another approach pursued by Arrathoon and Kozaitis (1987) is to use fiber optics for realizing optical programmable logic arrays. Moslehi et al.
(1984) have investigated the application of fiber optics for performing various high-speed time-domain and frequency-domain functions such as matrix operations and frequency filtering. Goutzoulis et al. (1988) have developed a digital optical processor that utilizes fiber optics and laser diodes to perform residue arithmetic.
3.4.3 Free-Space Interconnect

Patterned Interconnect. The most common interconnect, other than imaging, is Fourier transformation. As will be described later, Fourier transformation may be desirable in itself. In optical interconnects, masks in the Fourier plane can be used for high-quality, high-resolution generation of uniform multiple images from single-input images (Caulfield and Lu, 1972). By inserting gratings in both planes, we can perform coordinate transformations (Bryngdahl, 1974).

Imaging. An ordinary reflective or dielectric lens produces uniform time paths between points in conjugate image planes. This is useful for clock synchronization at very low clock skew. If, on the other hand, we use "flat" components, such as holograms and Fresnel lenses, some clock skew, usually on the order of picoseconds, is introduced (Shamir, 1987). The main use of imaging, however, is simply transfer of information. In the process of imaging we can insert devices, such as Dove prisms, for image rotation, or choose configurations for magnification or demagnification. In some respects, analog rotations and magnifications are superior to their digital counterparts. A simple example illustrates this point. Suppose we want to rotate an image clockwise by 12° and then by 11°. If we perform these operations digitally and then rotate the resultant image counterclockwise by 23°, the final image will not be identical to the original image. This is due to the fact that digital rotation or magnification involves interpolation and extrapolation for resampling and inherently introduces errors. This is not necessarily true for analog rotation and magnification.

Spatial Light Modulator Interconnect. The parallel matrix-vector multiplier (Goodman et al., 1978) and the parallel (2-D) matrix-(4-D) matrix multiplier (Caulfield, 1987) have the property of full interconnection. The matrix-vector multiplier connects $N$ inputs to $N$ outputs through $N^2$ interconnections.
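The interconnection count of the matrix-vector multiplier can be illustrated numerically. The sketch below is only a software analogue under assumed values: the weight matrix stands in for the SLM transmittances, and the cyclic-shift routing pattern is a hypothetical example of the special case in which just $N$ of the $N^2$ weights are nonzero, so that each input is routed to exactly one output.

```python
def matrix_vector(weights, inputs):
    """Each output is a weighted sum of all inputs (N*N interconnections)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


N = 4
inputs = [1.0, 0.5, 0.0, 2.0]          # input intensities (arbitrary units)

# Full interconnection: every one of the N*N weights may be nonzero.
weights = [[0.1 * (i + j) for j in range(N)] for i in range(N)]
outputs = matrix_vector(weights, inputs)

# Routing special case: only N nonzero weights, here a cyclic shift
# that sends input (i + 1) mod N to output i.
routing = [[1 if j == (i + 1) % N else 0 for j in range(N)] for i in range(N)]
routed = matrix_vector(routing, inputs)

print("weighted outputs:", outputs)
print("routed outputs:  ", routed)
```

Reprogramming the routing amounts to replacing the weight matrix, which is the software counterpart of rewriting the SLM.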
The matrix-matrix multiplier connects $N^2$ inputs to $N^2$ outputs through $N^4$ interconnections. The matrix-vector multiplier uses a spatial light modulator (SLM) for controlling the $N^2$ interconnection strengths. The matrix-matrix multiplier uses an SLM for the $N^2$ inputs. In both cases, $N$ can approach 1000 in principle and $N = 100$ is straightforward. By making only $N$ interconnects
in the matrix-vector case or only $N^2$ interconnects in the matrix-matrix case, we create an optical crossbar. It is difficult to imagine a crossbar between $10^6$ inputs and $10^6$ outputs made in any other way. The matrix-vector interconnect made through an $N \times N$ SLM is easy to reprogram by reprogramming the SLM. If the SLM is optically addressable, one method is to store the interconnection patterns on side-by-side microholograms and then apply any of these patterns by deflecting a laser beam to the proper hologram. This use of page-oriented holographic memories to reprogram optical free-space interconnects appears to be very powerful (Szu and Caulfield, 1987; Caulfield, 1987).

Holographic Interconnects. Holograms have a number of highly attractive properties for interconnection:

1. A large fraction of incident light can be put where we want it. Holograms rearrange light but do not (directly) throw it away. If light needs to go to a few, selected, arbitrary points, lines, or regions, holograms can do it with more light efficiency than any other method.
2. Holograms are thin (usually 5 μm to 10 μm) and conformable to available substrate shapes. In many cases, holograms take essentially no space, since they can adhere to optics which is present anyway.
3. Holograms are easy to copy and are therefore inexpensive. For example, credit-card holograms and Universal Product Code scanners are possible only for this reason.
4. Holograms can produce fast changing results. Page-oriented holograms (side-by-side holograms on a single substrate) can be changed randomly among 1024 x 1024 (essentially a million) pages using a digital light deflector. Such a deflector for one million spots has been operated in submicrosecond random access time (Schmidt, 1973). It is also possible to multiplex holograms with different angles or wavelengths.
5. Holograms can superimpose wavefronts (light patterns) from many separate locations into a single region of space. These wavefronts can then be separated spatially such that they bear no memory of having overlapped. This is the effect noted in Section 3.4.1.
The application of holograms for interconnections was first proposed by Goodman et al. (1984). They specifically proposed the use of holograms for distributing the clock signals in VLSI systems. Several groups are now pursuing this direction (Clymer and Goodman, 1986; Bergman et al., 1986; Feldman and Guest, 1987). Another application of holographic interconnection is to perform digital operations. Optical Fredkin gates (Shamir et al., 1986; Shamir and Caulfield,
FIG. 9. Basic configuration for an $N^4$ interconnection network. H is a hologram array illuminated by reconstruction beam R, SLM is a spatial light modulator between two lenses $L_1$ and $L_2$ with their respective focal lengths $f_1$ and $f_2$, and D is a detector array or an array of nonlinear optical devices.
1987) in conjunction with page-oriented holographic memories can be used to realize digital processors. This architecture is especially suitable for implementing residue arithmetic (Mirsalehi et al., 1987). Perhaps the most promising application of holographic interconnection is in realizing neural network systems. Psaltis et al. (1988) have investigated the application of photorefractive crystals for implementing adaptive neural networks. The storage capacity of volume holograms that can be recorded in these crystals provides huge optical interconnection networks that are well beyond the capacities of other technologies.

To show the capability of holographic interconnections, we compare the optical and electronic implementations of an interconnection. Let us assume that we want to connect each of $10^6$ input points to each of $10^6$ output points with selectable interconnect strengths in parallel. First, consider the electronic implementation. Suppose we use ultimately small, submicron carriers. We might imagine these being spaced only 1 μm apart (although this is doubtful). The required $10^{12}$ carriers would occupy a 1 m x 1 m cross section. Let us suppose (as is also doubtful) that we could "scramble" these wires to achieve the desired interconnection pattern. Let us also suppose (again very doubtful) that we can somehow adjust the interconnect strengths without space consumption. Then the remaining problem is to connect each of $10^6$ inputs to the proper $10^6$ wires and to connect each of the other ends of the $10^{12}$ wires to the proper $10^6$ terminations. Making 1000 connections per second, it would take more than 30 years to connect all the wires ($10^{12}$ connections at $10^3$ per second is $10^9$ seconds). These arguments show that
the electronic implementation would be impractical. We turn next to the optical implementation (Caulfield, 1987). We use an array of small (for example, 1 mm) holograms which can be illuminated simultaneously (Fig. 9). The hologram in the $(i, j)$ position in that array produces light of intensity $T_{ijkl}$ at the $(k, l)$ position of a spatial light modulator (SLM). The transmission of the $(k, l)$ pixel is $a_{kl}$. Therefore, the intensity of the light from the $(i, j)$ hologram after passing the $(k, l)$ SLM pixel is $T_{ijkl}\,a_{kl}$. This is imaged on the detector. The intensity at the $(i, j)$ output position is

$$I_{ij} = \sum_{k=1}^{N} \sum_{l=1}^{N} T_{ijkl}\, a_{kl}. \tag{3.2}$$

Since $i$, $j$, $k$, and $l$ can range from 1 to $N$, we can have $N^4$ different $T_{ijkl}$ values applied in parallel. The optical implementation is fairly straightforward for $N = 100$ and quite hard for $N = 1000$. Thus we regard $N = 1000$ ($10^{12}$ parallel interconnections) as a rough natural limit.
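The weighted sum of Eq. (3.2) can be written out directly as a small sketch. The function name `interconnect`, the tiny size $N = 3$, and the particular choice of $T_{ijkl}$ (a pattern that routes SLM pixel $(j, i)$ to output $(i, j)$, i.e. a transpose) are assumptions for illustration; in the optical system all $N^4$ products are formed simultaneously rather than in nested loops.

```python
def interconnect(T, a):
    """Evaluate Eq. (3.2): output[i][j] = sum over k, l of T[i][j][k][l] * a[k][l]."""
    n = len(a)
    return [
        [
            sum(T[i][j][k][l] * a[k][l] for k in range(n) for l in range(n))
            for j in range(n)
        ]
        for i in range(n)
    ]


N = 3
# SLM transmittances a_kl, between 0 and 1 (arbitrary test pattern).
a = [[(k * N + l) / (N * N) for l in range(N)] for k in range(N)]

# Hypothetical hologram weights: T_ijkl = 1 when (k, l) == (j, i), else 0.
T = [
    [
        [[1.0 if (k, l) == (j, i) else 0.0 for l in range(N)] for k in range(N)]
        for j in range(N)
    ]
    for i in range(N)
]

output = interconnect(T, a)
weight_count = N ** 4      # number of independent T_ijkl values
print("number of weights:", weight_count)
for row in output:
    print(row)
```

For $N = 1000$ the same count gives $10^{12}$ weights, which is the rough natural limit quoted above.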
4. Analog Processors

Optical systems are analog in nature because of their capabilities of operating on continuous signals. Optical imaging systems are a clear example of this property. In this section we introduce some of the optical analog processors. We start with the building block of the optical processors, which is the Fourier transform processor. Then we introduce the optical correlator, spatial filtering processors, image processors, and nonlinear processors. Finally, we describe the applications of analog processors in synthetic-aperture radar and spectrum analysis.

4.1 Fourier Transform Processors
Optical systems are linear because of the linear nature of Maxwell's equations which govern the propagation of light. One of the powerful tools in analyzing linear systems is Fourier analysis. The use of Fourier analysis as an analytical tool in image formation theory was introduced by Duffieux (1946). Treating a general imaging system as a linear filter, Duffieux found a Fourier transform relationship between the field distributions at the object plane and the focal plane. Marechal and Croce (1953), Elias et al. (1952), Elias (1953), and Rhodes (1953) were among the first authors to describe the analogy between image formation in an optical system and a communication network. Both are vehicles for carrying, transferring, or transforming information (spatial in the optical system, and temporal in the communication system). In this section we introduce the capability of optical systems to perform Fourier transformation.
One of the remarkable properties of spherical lenses is their ability to perform real-time Fourier transformation on two-dimensional signals. In the system shown in Fig. 10 an input transparency is illuminated by a normally incident monochromatic plane wave. The transparency has a transmittance that represents the input function $g(x, y)$ and it is placed at the front focal plane of a spherical lens. We show below that the field distribution on a screen placed at the back focal plane is the Fourier transform of $g(x, y)$. To derive this property we start with the Fresnel diffraction integral, which describes the field distribution at the $(\xi, \eta, z)$ plane in terms of the field at the $(x, y, 0)$ plane:

$$U(\xi, \eta, z) = \frac{e^{jkz}}{j\lambda z} \iint_{-\infty}^{\infty} U(x, y, 0) \exp\left\{ \frac{jk}{2z}\left[ (\xi - x)^2 + (\eta - y)^2 \right] \right\} dx\, dy, \tag{4.1}$$

where $k = 2\pi/\lambda$ is the wave number, $\lambda$ is the wavelength of the illuminating light, and $j = \sqrt{-1}$. The Fresnel diffraction integral is the governing relationship for the propagation of an electromagnetic field in free space. To determine the field distribution at plane $P_2$, we consider the propagation of light from plane $P_1$ to the lens, then through the lens, and finally from the lens to plane $P_2$.
FIG. 10. A Fourier transform system, where $f$ is the focal length of the lens, $P_1$ is the input plane, and $P_2$ is the Fourier plane.
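Applying Eq. (4.1) to the input $g(x, y)$ over the distance $f$ gives the field incident on the lens,

$$U_2(p, q) = \frac{e^{jkf}}{j\lambda f} \iint_{-\infty}^{\infty} g(x, y) \exp\left\{ \frac{jk}{2f}\left[ (p - x)^2 + (q - y)^2 \right] \right\} dx\, dy. \tag{4.2}$$

The field immediately behind the lens is

$$U_2'(p, q) = U_2(p, q)\, t(p, q), \tag{4.3}$$

where $t(p, q)$ is the lens transmittance function, i.e.,

$$t(p, q) = P(p, q) \exp\left[ -\frac{jk}{2f}\left( p^2 + q^2 \right) \right], \tag{4.4}$$

and $P(p, q)$ is the pupil function of the lens, i.e.,

$$P(p, q) = \begin{cases} 1; & (p, q) \text{ inside the lens aperture} \\ 0; & \text{otherwise,} \end{cases}$$

and $f$ is the focal length of the lens. We have

$$U_3(\xi, \eta) = \frac{e^{jkf}}{j\lambda f} \iint_{-\infty}^{\infty} U_2(p, q)\, P(p, q) \exp\left[ -\frac{jk}{2f}\left( p^2 + q^2 \right) \right] \exp\left\{ \frac{jk}{2f}\left[ (\xi - p)^2 + (\eta - q)^2 \right] \right\} dp\, dq. \tag{4.5}$$

Consider the case where $P(p, q) = 1$ everywhere, which is the case of a large lens aperture, or consider only the region of the function close to the optical axis. Then Eq. (4.5) reduces to

$$U_3(\xi, \eta) = \frac{e^{j2kf}}{(j\lambda f)^2} \iiiint_{-\infty}^{\infty} g(x, y) \exp\left\{ \frac{jk}{2f}\left[ (p - x)^2 + (q - y)^2 - p^2 - q^2 + (\xi - p)^2 + (\eta - q)^2 \right] \right\} dx\, dy\, dp\, dq. \tag{4.6}$$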
Completing the square terms and using the Fresnel integral reduces Eq. (4.6) to

$$U_3(\xi, \eta) = \frac{e^{j2kf}}{j\lambda f} \iint_{-\infty}^{\infty} g(x, y) \exp\left[ -j\frac{2\pi}{\lambda f}\left( x\xi + y\eta \right) \right] dx\, dy. \tag{4.7}$$

The integral in Eq. (4.7) is the Fourier transform of $g(x, y)$, and the factor in front of it is a complex constant that depends on the wavelength of the light and the focal length of the lens. Equation (4.7) can be rewritten in a more familiar Fourier transform notation as

$$U_3(f_x, f_y) = \frac{-j}{\lambda f} \iint_{-\infty}^{\infty} g(x, y) \exp\left[ -j2\pi\left( x f_x + y f_y \right) \right] dx\, dy, \tag{4.8}$$

where $f_x$ and $f_y$ are the spatial frequency coordinates in the Fourier transform plane, given by $f_x = \xi/\lambda f$ and $f_y = \eta/\lambda f$, and the constant phase factor $\exp(j2kf)$ has been dropped. Equation (4.8) can be written as

$$U_3(f_x, f_y) = \frac{-j}{\lambda f}\, \mathcal{F}\{ g(x, y) \}, \tag{4.9}$$

where $\mathcal{F}$ represents the Fourier transform operator. The field distribution $U_3(f_x, f_y)$ is also called the optical transform, since it is not exactly equal to the Fourier transform. The system in Fig. 10 is capable of Fourier transforming two-dimensional signals. The inverse Fourier transform operation can be achieved easily with this system by inverting the axes $\xi$ and $\eta$. In the Fourier transform plane (plane $P_2$ in Fig. 10) the light intensity distribution is proportional to the power spectrum of the signal $g(x, y)$, i.e.,

$$I(f_x, f_y) = \left| U_3(f_x, f_y) \right|^2 = \frac{1}{\lambda^2 f^2} \left| \mathcal{F}\{ g(x, y) \} \right|^2. \tag{4.10}$$
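The mapping between spatial frequency and position in the Fourier plane, $\xi = \lambda f f_x$, can be illustrated with a one-dimensional numerical stand-in for the lens. In the sketch below a discrete Fourier transform plays the role of the optical transform; the wavelength, focal length, sampling pitch, and grating frequency are all assumed values chosen for the example.

```python
import cmath
import math


def dft(samples):
    """Plain O(n^2) discrete Fourier transform (numerical stand-in for the lens)."""
    n = len(samples)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * m / n) for m, s in enumerate(samples))
        for k in range(n)
    ]


WAVELENGTH = 633e-9     # assumed illumination wavelength, metres
FOCAL_LENGTH = 0.2      # assumed focal length f, metres
SAMPLE_PITCH = 10e-6    # assumed sampling interval in the input plane, metres
N_SAMPLES = 64
CYCLES = 8              # grating periods across the sampled aperture

# Input transparency: a cosine amplitude grating with transmittance in [0, 1].
grating = [
    0.5 * (1 + math.cos(2 * math.pi * CYCLES * m / N_SAMPLES))
    for m in range(N_SAMPLES)
]

# Power spectrum, the quantity observed as intensity in the Fourier plane.
power = [abs(c) ** 2 for c in dft(grating)]

# Strongest nonzero positive-frequency component.
peak_bin = max(range(1, N_SAMPLES // 2), key=lambda k: power[k])
f_x = peak_bin / (N_SAMPLES * SAMPLE_PITCH)      # cycles per metre
xi = WAVELENGTH * FOCAL_LENGTH * f_x             # position in the Fourier plane

print(f"grating frequency: {f_x:.0f} cycles/m")
print(f"first-order spot at xi = {xi * 1e3:.4f} mm from the axis")
```

A finer grating (larger $f_x$) moves the diffraction spot proportionally farther from the optical axis, which is what makes the back focal plane a physical display of the spectrum.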
4.2 Correlators
Correlation is a key operation in data processing. The crosscorrelation of functions $f$ and $h$ is defined as

$$g(x, y) = f(x, y) \star h(x, y) = \iint_{-\infty}^{\infty} f(\xi, \eta)\, h^*(\xi - x, \eta - y)\, d\xi\, d\eta, \tag{4.11}$$

where $\star$ denotes the correlation operation, and superscript $*$ represents complex conjugation. Taking the Fourier transform of both sides of Eq. (4.11), we obtain

$$G(f_x, f_y) = F(f_x, f_y)\, H^*(f_x, f_y), \tag{4.12}$$
where $G$, $F$, and $H$ are the Fourier transforms of $g$, $f$, and $h$, respectively. The crosscorrelation operation can be implemented optically using the Fourier transform processor. One method of realizing the correlation operation is to use the optical spatial filtering system shown in Fig. 11. The function $f(x, y)$ is inserted as a transparency in plane $P_1$. Lens $L_1$ Fourier transforms $f(x, y)$ and the resulting transform $F(f_x, f_y)$ will be obtained at plane $P_2$. If a transparency representing $H^*(f_x, f_y)$ is placed at plane $P_2$, then the field distribution which emerges from $P_2$ will be $F(f_x, f_y) H^*(f_x, f_y)$. Lens $L_2$ takes the inverse Fourier transform of the field emerging from plane $P_2$. The field distribution across plane $P_3$ is given by

$$\mathcal{F}^{-1}\{ F(f_x, f_y)\, H^*(f_x, f_y) \} = f(x, y) \star h(x, y), \tag{4.13}$$

which is the crosscorrelation of the functions $f(x, y)$ and $h(x, y)$.
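The relation between Eqs. (4.11) and (4.12) can be checked numerically in one dimension. The sketch below is a discrete, circular analogue and not a model of the optical system: it computes the crosscorrelation once directly from the definition and once as the inverse transform of $F H^*$. The signal values and the shift of 3 samples are arbitrary assumptions.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(n^2) discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * k * m / n) for m, v in enumerate(values))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def correlate_direct(f, h):
    """Circular version of Eq. (4.11): g[s] = sum over m of f[m] * conj(h[m - s])."""
    n = len(f)
    return [
        sum(f[m] * complex(h[(m - s) % n]).conjugate() for m in range(n))
        for s in range(n)
    ]


def correlate_fourier(f, h):
    """Eq. (4.12) followed by an inverse transform: g = F^-1{ F * conj(H) }."""
    spectrum_f = dft(f)
    spectrum_h = dft(h)
    product = [a * b.conjugate() for a, b in zip(spectrum_f, spectrum_h)]
    return dft(product, inverse=True)


reference = [0, 1, 3, 1, 0, 0, 0, 0]                # pattern h to look for
shift = 3
scene = [reference[(m - shift) % len(reference)]    # the same pattern, displaced
         for m in range(len(reference))]

direct = correlate_direct(scene, reference)
fourier = correlate_fourier(scene, reference)
peak_position = max(range(len(fourier)), key=lambda s: abs(fourier[s]))

print("correlation peak at shift:", peak_position)
```

The correlation peak appears at the displacement of the pattern in the scene, which is the property that makes the optical correlator useful for locating a known signal.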
4.3 Spatial Filters

Spatial filtering is an operation in which the spatial frequency components are manipulated in order to alter the properties of the image. Spatial filtering has been known for over 100 years as a result of the work by Abbe (1873) and
FIG. 11. Configuration of a 4-f spatial filtering system, where $P_1$ is the input plane, $P_2$ is the Fourier plane, $P_3$ is the output plane, $f_0$ is the focal length of lens $L_0$, and $f$ is the focal length of lenses $L_1$ and $L_2$.
Porter (1906). Ernst Abbe worked on the theory of image formation to improve the quality of microscope imaging. One of Abbe's contributions to the field of spatial filtering was his observation that the imaging of fine details in an object was directly affected by the angular acceptance of the objective lens. This result eventually led to the formulation of the imaging process as a diffraction phenomenon and to the description of the imaging system as a filter for the spatial frequencies of the object. The experimental results published by Porter (1906) confirmed Abbe's theory.

In the early 1950s the similarities between optical and communication systems were found. This led to the use of the developed communication techniques for analyzing and synthesizing optical systems. The essential idea behind optical system synthesis is to accomplish a specific transformation between the object and the image distribution by manipulating the light distribution in the Fourier transform plane of the system. Marechal and Croce (1953) used a system similar to the one shown in Fig. 11, which is now the classic experimental arrangement for spatial filtering. Spatial filters can be classified in the following categories: simple filters, grating filters, complex spatial filters, and computer generated spatial filters (Lee, 1981). We describe two of these filters in some detail here.
4.3.1 Simple Filters
This class of filters is simple to construct, and they modify either the amplitude or the phase of the object spectra. Examples of the first type are low-pass, high-pass and band-pass filters, which block part of the spectra (Porter, 1906). An example of the second type is a simple phase filter which retards the zero frequency component of the object spectrum by 90° (Zernike, 1935). A high-pass filter which blocks only the zero spatial frequency eliminates the average transmittance of the image. A low-pass filter in one dimension, such as a horizontal slit, transmits all the vertical lines of a mesh and eliminates all the horizontal lines (Goodman, 1968).
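The action of simple amplitude filters can be sketched in one dimension by masking a discrete spectrum, as a numerical analogue of placing a stop or a slit in the Fourier plane. The test signal, the cutoff of three frequency bins, and the helper names are assumptions for illustration.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(n^2) discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * k * m / n) for m, v in enumerate(values))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def apply_mask(signal, mask):
    """Multiply the spectrum by a 0/1 mask and transform back (real part kept)."""
    filtered = [s * m for s, m in zip(dft(signal), mask)]
    return [v.real for v in dft(filtered, inverse=True)]


n = 16
# Test signal: average level 1, a coarse component (2 cycles), a fine one (6 cycles).
coarse = [0.5 * math.cos(2 * math.pi * 2 * m / n) for m in range(n)]
fine = [0.2 * math.cos(2 * math.pi * 6 * m / n) for m in range(n)]
signal = [1.0 + c + d for c, d in zip(coarse, fine)]

# High-pass filter that blocks only the zero spatial frequency.
block_dc = [0 if k == 0 else 1 for k in range(n)]
# Low-pass filter that keeps frequencies up to 3 cycles (both signs).
low_pass = [1 if min(k, n - k) <= 3 else 0 for k in range(n)]

without_average = apply_mask(signal, block_dc)
smoothed = apply_mask(signal, low_pass)

print("mean after blocking zero frequency:", sum(without_average) / n)
print("largest deviation of low-pass output from 1 + coarse term:",
      max(abs(s - (1.0 + c)) for s, c in zip(smoothed, coarse)))
```

Blocking the zero-frequency bin removes the average level while leaving both oscillating components, and the low-pass mask removes only the fine component.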
4.3.2 Complex Spatial Filters
Simple filters introduced in the previous section can implement transfer functions that are pure real or imaginary functions. Implementing a filter with a complex transfer function or a complex “impulse response” is not an easy task. The impulse response of a filter is the inverse Fourier transform of its transfer function. Vander Lugt (1964) introduced a novel technique to record
complex spatial filters, which has revolutionized the field of optical signal processing. Vander Lugt's filter, as it is referred to now, is a Fourier transform hologram and is capable of constructing spatial filters with arbitrary transfer functions. This filter is an optical matched filter which can detect a signal in a noisy background. The system shown in Fig. 12 can be used to record the Vander Lugt filter. The desired impulse response of the filter, $h(x, y)$, is placed in the front focal plane of lens $L_1$ and is illuminated by a plane wave. The light distribution in the back focal plane of $L_1$ will then be the Fourier transform of $h(x, y)$, i.e., $H(x_2, y_2)$. In the back focal plane of $L_1$, a plane wave $R(x_2, y_2)$ tilted at an angle $\theta$ interferes with $H(x_2, y_2)$. The complex field distribution in plane $P_2$ is then

$$U(x_2, y_2) = H(x_2, y_2) + R(x_2, y_2), \tag{4.14}$$

where

$$R(x_2, y_2) = R_0 \exp(-j2\pi\alpha x_2). \tag{4.15}$$

$R_0$ is the amplitude of the plane wave and $\alpha$ is the carrier spatial frequency given by $\alpha = \sin\theta/\lambda$. From Eqs. (4.14) and (4.15), the total optical intensity in
FIG. 12. Recording process for a Vander Lugt filter. $P_1$ is the input plane, $P_2$ is the Fourier plane, and $f$ is the focal length of lens $L_1$.
the $P_2$ plane can be obtained as

$$I(x_2, y_2) = |U(x_2, y_2)|^2 = R_0^2 + |H(x_2, y_2)|^2 + R_0 H(x_2, y_2) \exp(+j2\pi\alpha x_2) + R_0 H^*(x_2, y_2) \exp(-j2\pi\alpha x_2). \tag{4.16}$$

If the coherent transfer function $H(x_2, y_2)$ is written as

$$H(x_2, y_2) = |H(x_2, y_2)| \exp[j\phi(x_2, y_2)], \tag{4.17}$$

then the expression for the optical intensity becomes

$$I(x_2, y_2) = R_0^2 + |H(x_2, y_2)|^2 + 2R_0 |H(x_2, y_2)| \cos[2\pi\alpha x_2 + \phi(x_2, y_2)]. \tag{4.18}$$
This expression shows that the phase $\phi(x_2, y_2)$ of the transfer function is encoded as the modulation of the spatial carrier. If a high-resolution photographic film is placed at plane $P_2$, the amplitude transmittance of the developed film would be proportional to the intensity, i.e.,

$$T(x_2, y_2) \propto I(x_2, y_2). \tag{4.19}$$
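The step from the interference of $H$ with the tilted reference wave to the cosine form of Eq. (4.18) can be checked numerically at a few sample points. This is a hedged sanity check only; the sample values of $H$, $R_0$, $\alpha$, and $x_2$ are arbitrary assumptions, and the function names are hypothetical.

```python
import cmath
import math


def hologram_intensity(h_value, r0, alpha, x2):
    """|H + R|^2 with the reference wave R = R0 * exp(-j*2*pi*alpha*x2), Eq. (4.15)."""
    reference = r0 * cmath.exp(-2j * math.pi * alpha * x2)
    return abs(h_value + reference) ** 2


def intensity_eq_4_18(h_value, r0, alpha, x2):
    """Right-hand side of Eq. (4.18), using |H| and the phase phi of H."""
    magnitude = abs(h_value)
    phi = cmath.phase(h_value)
    return (r0 ** 2 + magnitude ** 2
            + 2 * r0 * magnitude * math.cos(2 * math.pi * alpha * x2 + phi))


# (H, R0, alpha in cycles per metre, x2 in metres): arbitrary test points.
samples = [
    (0.7 * cmath.exp(0.4j), 1.0, 50.0, 0.003),
    (2.0 * cmath.exp(-2.1j), 0.5, 120.0, -0.01),
    (0.0 + 0.0j, 1.5, 80.0, 0.02),
]

max_error = max(
    abs(hologram_intensity(*s) - intensity_eq_4_18(*s)) for s in samples
)
print(f"largest difference between the two forms: {max_error:.2e}")
```

The two forms agree to rounding error, confirming that the phase of $H$ survives in the recorded intensity as a shift of the carrier fringes.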
If the filter described by Eq. (4.19) is now placed at the filter plane of the setup shown in Fig. 11, the resulting system acts as a processor that has a transfer function proportional to Eq. (4.16). If a function $f(x_1, y_1)$ is placed at the input plane of Fig. 11, the output of the system will be

$$\begin{aligned} g(x_3, y_3) &= \mathcal{F}^{-1}\{ F(x_2, y_2)\, T(x_2, y_2) \} \\ &\propto R_0^2 f(x_3, y_3) + f(x_3, y_3) * h(x_3, y_3) * h^*(-x_3, -y_3) \\ &\quad + R_0 f(x_3, y_3) * h(x_3, y_3) * \delta(x_3 + \alpha\lambda f, y_3) \\ &\quad + R_0 f(x_3, y_3) * h^*(-x_3, -y_3) * \delta(x_3 - \alpha\lambda f, y_3), \end{aligned} \tag{4.20}$$

where $\delta(x, y)$ is the Dirac delta function, and $*$ represents the convolution operation. The first two terms of Eq. (4.20) are centered at the origin of the output plane. The third term is the convolution of the input function $f(x_3, y_3)$ with the desired impulse response $h(x_3, y_3)$ and is centered at $(-\alpha\lambda f, 0)$. The fourth term is the crosscorrelation of $f(x_3, y_3)$ with $h(x_3, y_3)$ and is centered at $(\alpha\lambda f, 0)$.

4.4 Image Processors
Image processing is an operation on a two-dimensional input image to produce a two-dimensional output image which is of enhanced utility for some subsequent process such as display to a human viewer, feature extraction,
recognition, and defect detection. There are a large number of analog optical operations that are simple, parallel, and (in many senses) superior to their electronic counterparts. These include scale change and rotation. The image-processing activities worthy of being called "optical computing" are of a more complex nature than these. We will limit our remarks to two broad categories: image content transformation and image shape transformation.

4.4.1 Image Content Transformation
The vast majority of these transformations are based on two successive optical Fourier transformations (Fig. 13). The first transformation produces a spatial display of the Fourier transform of the input image. That is, if the electric-field distribution in the input plane is $f(x, y)$, the Fourier-transform plane will contain a pattern

$$F(u, v) = \mathcal{F}\{f(x, y)\}, \qquad (4.21)$$

where $\mathcal{F}$ is the Fourier transform operator. The Fourier transform lens is not infinitely large, so the integral has finite limits. Furthermore, real lenses produce field curvature, distortion, and other unpleasantries which make the Fourier description inexact. The second Fourier system transforms $F(u, v)$ to get an output

$$O(\xi, \eta) = \mathcal{F}\{F(u, v)\} = \mathcal{F}\{\mathcal{F}\{f(x, y)\}\} = f(-x, -y). \qquad (4.22)$$
FIG. 13. Schematic diagram of an optical system for image content transformation. $L_1$ and $L_2$ are spherical lenses of focal length $f$. $P_1$ is the input plane, $P_2$ is the Fourier plane, and $P_3$ is the output plane.
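Equation (4.22) has a simple discrete analogue that can be checked numerically. The sketch below is only an illustration under stated assumptions, not a model of the optical system: it works on a short one-dimensional sampled signal, and the helper name `dft` is hypothetical. It suggests that two successive forward transforms return the input with its coordinate reversed (up to a scale factor equal to the number of samples).

```python
import cmath

def dft(samples):
    """Forward discrete Fourier transform, computed by the direct O(N^2) sum."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]

# Hypothetical test signal standing in for f(x).
f = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(f)

# Two forward transforms in a row, as in the two-lens system of Fig. 13.
twice = dft(dft(f))

# Remove the factor N that the unnormalized DFT introduces.
recovered = [(value / n).real for value in twice]

# Coordinate-reversed input: sample k is taken from position -k (mod N).
reversed_f = [f[(-k) % n] for k in range(n)]

print([round(v, 6) for v in recovered])
print(reversed_f)
```

In this discrete setting the reversal is circular, so the sample at the origin stays in place while the others swap ends.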
Therefore, neglecting the coordinate reversal ($\xi = -x$, $\eta = -y$), this is simply an imaging system. The important thing is that at the intermediate ($u$-$v$) plane, we have physical access to the Fourier transform. By blocking the low-frequency (small $u$ and $v$) portions, we remove everything but the fine details of the image. Thus we enhance edges. By blocking the small $u$ region, we can favor edges in $f(x, y)$ normal to $u$. By rotating such a slit or wedge in the $u$-$v$ plane, we can examine various edge directions in sequence. By blocking the high-frequency parts, we obtain low-resolution “blobs” which still show the spatial distribution of energy. By doing various blockings in parallel on multiple Fourier transforms, we can do parallel pyramidal processing (Eichmann et al., 1988).

Of course, complicated $u$-$v$ masks can be made. For example, a hologram can be used as a mask that not only selectively attenuates the amplitude of light but also selectively phase shifts. Perhaps the most famous mask is $F^*(u, v)$, where the superscript $*$ indicates complex conjugation. This is the “matched filter” described in Sec. 4.3.2. We can generalize this as follows:

(4.23)

Of course, the matched filter is one member of this family. For pattern recognition we often prefer the “phase-only matched filter,” whose magnitude is 1 for all values of $u$ and $v$ (Horner and Gianino, 1984). For deblurring we want

(4.24)

Suppose an ideal image $g(x, y)$ has been blurred in some space-invariant manner (motion, defocus, etc.) in such a way that each point in $g(x, y)$ has been replaced coherently with an impulse response $f(x, y)$. The resultant blurred image is

$$b(x, y) = g(x, y) * f(x, y), \qquad (4.25)$$

where $*$ represents the convolution operation. Taking the Fourier transform of both sides, we have

$$B(u, v) = G(u, v)\,F(u, v), \qquad (4.26)$$

where $B$, $G$, and $F$ are the Fourier transforms of $b$, $g$, and $f$, respectively. If we could realize a mask that has a transmittance function of $1/F(u, v)$, we could use it to deblur the image. This is done by putting the above mask in the Fourier plane to obtain

$$B(u, v)\,\frac{1}{F(u, v)} = G(u, v) \qquad (4.27)$$
and then using a spherical lens to take the Fourier transform of the result. There are subtle noise problems with this, but some limited success is possible as shown in Fig. 14 (Javidi and Horner, 1989). Another technique for image processing is to use morphological operations (Serra, 1982; Maragos, 1987), such as dilation and erosion, on binary images as well as median filtering (Fitch et al., 1984; Ochoa et al., 1987; Goodman and Rhodes, 1988). These operations can be recognized as special cases of the simple cellular-array processors mentioned earlier. The same processors can perform logical operations, such as AND, OR, and NOR, on two images.
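The inverse-filter deblurring of Eqs. (4.25) to (4.27) can be mimicked numerically. The following is a minimal sketch under simplifying assumptions: the signals are one-dimensional, the blur is a circular convolution, the data are noise-free, and the final step uses an inverse DFT rather than the second forward transform performed by the lens (so no coordinate reversal appears). The helper names and the sample values are hypothetical.

```python
import cmath

def dft(samples, sign=-1):
    """Discrete Fourier transform; sign=-1 is forward, sign=+1 is unnormalized inverse."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]

def idft(spectrum):
    """Inverse DFT, including the 1/N normalization."""
    n = len(spectrum)
    return [value / n for value in dft(spectrum, sign=+1)]

def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]

# Ideal "image" g and blur impulse response f (both assumed for illustration).
g = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
f = [0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]

# Eq. (4.25): blurred image b = g * f.
b = circular_convolve(g, f)

# Eq. (4.26): B = G F in the Fourier plane.
B = dft(b)
F = dft(f)

# Eq. (4.27): multiply by the mask 1/F to recover G.
G_estimate = [B_m / F_m for B_m, F_m in zip(B, F)]

# Transform back to the image domain.
g_estimate = [value.real for value in idft(G_estimate)]

print([round(v, 6) for v in b])
print([round(v, 6) for v in g_estimate])
```

The blur kernel here was chosen so that its spectrum never approaches zero. Where $|F(u, v)|$ is small, the division in Eq. (4.27) amplifies any noise in the measured spectrum, which is one way to view the noise problems noted above.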
FIG. 14. Example of a deblurring process: (a) object, (b) blurred image, (c) image obtained from (b) by optical processing (Javidi and Horner, 1989).
If our task is to control the amplitude of the output as a function of the amplitude of the input for all points in the image, an interesting method has arisen. The idea is to multiply the input image at high frequency with an array of carefully specified “dots” or lines, detect that “half toned” product on a hard clipping detector, Fourier transform the resulting image, mask out all but the desired regions of the Fourier transform, and retransform to an image. Operations such as logarithmic transformation, exponentiation, and intensity band selection can be readily implemented by this method (Kato and Goodman, 1975; Strand, 1975).

4.4.2
Shape Transformation
It is possible to change the shape of an image while leaving all other properties unchanged by a one-to-one mapping between the input and output planes. We might transform $f(x, y)$ to $g(u, v)$ by letting $u = \xi(x, y)$ and $v = \eta(x, y)$, where $\xi$ and $\eta$ are functions of $x$ and $y$. For example, we might set $u = (x^2 + y^2)^{1/2}$ and $v = \tan^{-1}(y/x)$. The functions need not be of any simple form, since some optical methods amount to look-up tables going from $(x, y)$ to $(u, v)$. One powerful method is due to Bryngdahl (1974). In essence, space-variant gratings or computer-generated holograms were inserted in both the object and Fourier transform planes. Quite useful transformations, such as conformal mappings, are obtainable. The most general approach is to let each $(x, y)$ pixel be its own hologram which, with or without a lens, maps that $(x, y)$ into the chosen $(u, v)$. This method is due to Case et al. (1981) and Case and Haugen (1982). They used this to map one picture into another, to map a Gaussian profile into a uniform profile, and to do geometric transformations.

4.5
Nonlinear Processors
Nonlinear mathematics is far more difficult to perform than linear mathematics, and is therefore an attractive target for optical computing. As it is possible to perform all nonlinear operations on a digital electronic computer, it follows that it is possible to perform them on a digital optical computer. Indeed, the logical operations are themselves highly nonlinear. There are approaches to nonlinear operations that perform special operations very rapidly. Most of these involve performing a series of linear operations and then operating on the results on a pixel-by-pixel basis with a threshold or another nonlinear operator. The variety of operations that are attainable in this way is very rich. It includes applications to artificial intelligence (Eichmann and Caulfield, 1989), Boolean matrix operations (Caulfield, 1989), optical cellular-array processing (Tanida and Ichioka, 1985a, 1985b), and, perhaps most important, optical neural networks (Farhat et al., 1985).
What the Japanese call the sixth-generation computer is usually called a neural network or an artificial neural system in the United States. The essence of the neural network is the use of a large number of very simple processors (neurons). Each neuron receives signals of various signs and strengths from many others and transmits signals of various signs and strengths to many others. The neural network usually has multiple stages of operations. The neuron responds nonlinearly to the algebraic sum of the incident signal. It is possible to show (Abu-Mostafa and Psaltis, 1987) that there is a variety of problems for which classical digital computers are quite ill suited and for which neural networks are ideally suited. This is probably the reason humans, who use neural networks, are not as good at arithmetic as digital computers but are substantially better at problems such as pattern recognition and judgment making. True artificial intelligence will almost certainly require the construction of computers that have complexity approaching that of the human brain. We are a long way from that. The brain may have roughly $10^{12}$ neurons. With optics it is possible to make up to $10^{12}$ parallel weighted interconnections between $10^6$ inputs and $10^6$ outputs (Caulfield, 1987). This is far beyond what electronics can achieve. Thus optics defines a unique domain which is different from and, in some cases, superior to the earlier neural networks: brains and electronics. Figure 15 shows the uncontested domain of optics. Concerning optical neural networks, there are two questions: (1) does the world need what optics offers? and (2) will optical systems be accurate enough? We address these points here. The thought expressed here is that there is solid land (practical use) only in the two explored parts of the world (marked “brains” and “electronics” in Fig. 15). Clearly, there is explored land only where we have explored.
To inspire exploration of the unknown areas marked “optics,” we must provide the hope that exploration will not be fruitless and that current tools will suffice for the exploration. The behavioral differences between electronic neural networks and brains are immense. We might expect these to reflect something of the complexity of the equipment used by each. In Fig. 15, a line connecting the centroids of the two known territories (electronics and brains) passes through the territory marked optics. Exploring this territory appears to be the next logical step. The fact that it is not clear what to do with so many neurons and interconnections is a cause for celebration, not fear. The unexplored is the stuff of science.

Programming such huge neural networks needs to be explored. One approach involves partitioning into many subnetworks (Farhat, 1987, 1988) in a very special way. Another involves casting classical training problems in a form that can be handled by optical computing: a solution looking for good problems. These are just two of many possible approaches.

FIG. 15. Rough comparison of three implementation technologies for neural networks in terms of number of neurons $N$ and number of interconnections $I$. The dotted line $I = N^2$ corresponds to full interconnection, where each neuron in the input plane is connected to all neurons in the output plane. The dotted line $I = N$ corresponds to the case where each neuron in the input plane is connected to one neuron in the output plane.

Concerning accuracy, most optical methods can be pushed to achieve 10% to 20% accuracy of signal, interconnect strength, threshold uniformity, etc. Most neural networks have been trained on 16- or 32-bit accurate simulators. The fact that the results from electronic simulators tend to fail when implemented with analog accuracy should not be surprising. A fruitful question to ask in this regard is “How much accuracy does a neural network need?” It is hard to believe that brains operate at 16-bit accuracy. Figure 16 shows a simple one-layer network with the heart of the operation being the matrix-vector multiplication $y = Ax$, and the nonlinear operation a sigmoidal function. It can be shown that

(4.28)

where the subscript $t$ indicates the true value and double bars represent the norm. The parameter $\chi(A)$ is known as the condition number of matrix $A$ and it is defined as $\chi(A) = \|A\|\,\|A^{-1}\|$. According to Eq. (4.28), the connection accuracy $\|\delta A\|/\|A\|$ multiplied by the condition number $\chi(A) > 1$ of the matrix limits the relative accuracy of the result vector $y$. Since $\chi(A) > 1$ and $\|\delta A\|/\|A\|$ is at least 0.1, the relative error in $y$ will be greater than 10%. By simple scaling operations we can modify the given $A$ without changing its mathematical meaning. These modifications often reduce $\chi(A)$ to a value that gives a more accurate $y$. In a similar way, we can show

(4.29)

The net effect is that we always lose accuracy during the operation of an optical neural network, but if we reduce $\chi(A)$ to the vicinity of 2 or less, this may not be a major concern. The second approach is using some of the optics margin over electronics to tolerate error better. Takeda and Goodman (1986) hint at encoding the numbers spatially to gain accuracy. The point is not that the accuracy problem has been solved. Rather, the point is that it is explorable. We are limited by nerve and imagination, not by technology.

[FIG. 16. One-layer network: input $x$, matrix $A$, nonlinear operation NL, output $y$.]

4.6
Applications
Analog optical processors are well developed and have found a wide range of applications. Here, we describe two of their most successful applications: synthetic-aperture radar and spectrum analysis.

4.6.1
Synthetic-Aperture Radar
The formation of high-resolution terrain maps using an airborne radar is of great interest. The “azimuthal” resolution of an airborne radar with an antenna of diameter $D$ is roughly $\lambda R/D$, where $\lambda$ is the wavelength of the radar signal and $R$ is the distance between the antenna and the ground. For a radar working in the microwave region, assuming $\lambda = 10$ cm, $D = 1$ m, and $R = 10$ km, the resolution is about 1 km. To get a resolution of 10 cm using the same values of $\lambda$ and $R$ as before, $D$ should be 10 km, which is certainly impractical. The synthetic-aperture radar approach is a practical solution to
this problem. The idea is to synthesize the effect of a large antenna using the data obtained by a short antenna. In the synthetic-aperture radar, an aircraft carries the antenna to a sequence of positions, where the antenna radiates a pulse and then receives and stores the reflected signal. The reflected signal is recorded and processed to obtain a map of the terrain. The synthetic-aperture concept was first studied by Carl Wiley in the early 1950s (Sherwin et al., 1962). The major problem was that the amount of information needed to be stored and the number of computations that were required were so large that real-time analysis could not be achieved. In the early 1960s, the University of Michigan's Radar and Optics Laboratory group solved this problem by developing a coherent optical system for processing the synthetic-aperture radar data. Here, we briefly describe their approach. For more information, the reader is referred to the papers by Cutrona et al. (1960, 1961, 1966) and Brown and Porcello (1969).

FIG. 17. Geometry of the synthetic-aperture radar.

Image Recording. Consider the side-looking radar system shown in Fig. 17. In order to obtain an accurate map of the terrain, a high resolution is required in both the direction of flight (azimuth) and the direction perpendicular to it (ground-range). The ground-range resolution depends on the duration of the pulse radiated by the antenna, and it can be improved by using short pulses. The azimuthal resolution, on the other hand, depends on the antenna pattern, and it can be improved by using the synthetic-aperture radar. To analyze the problem, let a target be located at $(x_0, y_0)$, a distance $r$ from the airplane, and let the wave transmitted from the antenna be represented by

$$U_T(t) = A \exp(-j\omega t), \qquad (4.30)$$

where $A$ is the amplitude and $\omega$ is the angular frequency of the transmitted wave. The signal reflected from the target and received by the antenna is

$$U_R(t) = A' \exp[-j\omega(t - 2r/c)], \qquad (4.31)$$

where $A'$ is a complex amplitude and $c$ is the speed of light. In the radar system the received signal, $U_R$, is first amplified to $U_R'$, which has the same amplitude as $U_T$, and then mixed with the transmitted signal. The mixed signal is square-law detected. The output of the detector $I(x)$ is

$$I(x) = \tfrac{1}{2}\,|U_T(t) + U_R'(t)|^2 = \tfrac{1}{2}\,[U_T(t) + U_R'(t)]\,[U_T(t) + U_R'(t)]^*, \qquad (4.32)$$

where the asterisk indicates the complex conjugation. The above equation can be written as

$$I(x) = |A|^2\,[1 + \cos(2kr)], \qquad (4.33)$$

where $k = \omega/c$. The distance $r$ can be expressed as

$$r = \sqrt{r_0^2 + (x - x_0)^2} \approx r_0 + \frac{(x - x_0)^2}{2 r_0}, \qquad (4.34)$$

where $r_0$ is the distance from the antenna to a line that is parallel to the direction of flight and passes through the target. Substituting for $r$ from Eq. (4.34) into (4.33), we obtain

$$I(x) = |A|^2 \left\{ 1 + \cos\left[ 2kr_0 + \frac{k (x - x_0)^2}{r_0} \right] \right\}. \qquad (4.35)$$

This expression is equivalent to the expression for a one-dimensional Fresnel zone plate with a focal length of $r_0/2$ centered at $x = x_0$ (Jenkins and White,
1976). In very simple terms, a zone plate acts as a lens. That is, when it is illuminated with a parallel beam of light, it focuses the light to a point. In practice, there are other targets at distances other than $r_0$, each reflecting the radar pulse at different times. At the receiver, the returned echo from the target is used to modulate the raster of a cathode ray tube (CRT) display. As soon as the radar pulse is emitted, the scanning raster on the CRT starts along a vertical line as shown in Fig. 18. The intensity of the raster is stepped up when the reflected signal is received by the antenna. The closer the target is to the airplane, the sooner the pulse returns, and the closer the bright point on the CRT will be to the point at which the raster has started. Signals from the targets at the same distance will appear on the same point on the CRT. The CRT screen is imaged on a running photographic film where the echo from equidistant targets is recorded as horizontal lines. The brightness of the display point on the CRT corresponds to $I(x)$ as given by Eq. (4.35). As a result, the amplitude and phase information of the reflected signal is recorded as Fresnel zone plates.

Image Reconstruction. In reconstructing the image, a plane wave produced by a coherent light source is used to illuminate the developed film. Due to the Fresnel zone characteristic of the recorded pattern, the light transmitted through the film will produce an image of the terrain. The distance of each point on this image from the film is proportional to the distance of that point on the ground from the antenna. As a result, the image will be constructed on a slanted plane. To correct for this, a cylindrical lens should be placed behind each one-dimensional zone plate. These lenses can be combined to form a conical lens as shown in Fig. 19. The coherent light emerging from the film-conical lens combination is collimated in the $x$ direction, but it has no focal property in the $y$ direction.
The spherical lens in Fig. 19 focuses the beam in the $x$ direction and the spherical-cylindrical lens combination images the beam in the $y$ direction. The reconstructed image of the terrain can be recorded on a photographic film or observed on a CRT screen. Very high resolution terrain maps have been obtained using this imaging radar system (Cutrona et al., 1966).

FIG. 18. Recording process in synthetic-aperture radar (Brown and Porcello, 1969).

FIG. 19. Optical processor for synthetic-aperture radar analysis.

4.6.2 Spectrum Analysis
The heart of an optical spectrum analyzer is an acousto-optic Bragg cell which is made of a transparent material such as glass (Fig. 20). One end of the cell is attached to a transducer which receives an electric signal with a frequency usually in the MHz range and launches an acoustic wave inside the cell. The other end of the cell is tapered to prevent the reflection of the sound wave. The acoustic wave produces a mechanical strain which results in a periodic change in the index of refraction of the material. Such a periodic structure is known as a grating, and it has special characteristics. An important property of gratings is their capability of diffracting a beam of light in particular directions. It should be mentioned that since the speed of light is much higher than the speed of sound, the grating looks essentially stationary to the light wave. If a beam of light impinges on a grating at a particular angle (known as the Bragg angle), it will be diffracted. The Bragg angle ($\theta_B$) is related to the wavelengths of the light wave ($\lambda$) and the acoustic wave ($\Lambda$) by $\theta_B = \sin^{-1}(\lambda/2\Lambda)$. Another important parameter is the diffraction efficiency, which is the ratio of the power of the diffracted light to the power of the incident light. The diffraction efficiency depends on the parameters of the Bragg cell and the optical and acoustic waves.

Acousto-optic cells have several applications (Berg and Lee, 1983). They are used for spectrum analysis (Anderson et al., 1977), performing operations such as convolution and correlation (Rhodes, 1981), and algebraic processing (Casasent, 1984). Here, we describe their application in spectrum analysis.

FIG. 20. Acousto-optic Bragg cell, where $\omega$ is the angular frequency of the electronic signal which is used to produce the acoustic wave.

Figure 21 shows the schematic diagram of a spectrum analyzer. The device consists of a light source, a spatial filter, a Bragg cell, a Fourier transform lens, and a detector array. Because of its compactness, a diode laser is usually used as the light source. The light produced by the laser is collimated by the spatial filter and illuminates the Bragg cell. At the same time, an acoustic wave is launched in the cell by the transducer. As described above, part of the incident light will be diffracted at an angle that depends on the frequency of the acoustic wave. The diffracted light passes through a Fourier transform lens and is focused at the focal plane of the lens where a detector array is located. The position of the light on the detector array can be used to determine the frequency of the acoustic wave.

FIG. 21. Schematic diagram of a spectrum analyzer.

The device described above can be fabricated in a compact form using integrated optics technology. Figure 22 shows the schematic diagram of an integrated-optical spectrum analyzer. Except for the diode laser, the other components of the device are fabricated on a common substrate. Among the materials that have been used for substrate are glass, lithium niobate, and gallium phosphate. To produce acoustic waves, surface acoustic wave (SAW) transducers are used instead of the bulk Bragg cell. This type of transducer can be easily fabricated by depositing a thin film of a conducting material on the substrate. To analyze microwave signals, the heterodyne technique can be used. In this technique, an input signal with a frequency of $\nu_1$ is mixed with a signal of a fixed frequency $\nu_2$ produced by a local oscillator. The output of the mixer consists of two frequencies, $\nu_1 + \nu_2$ and $\nu_1 - \nu_2$. The latter is separated by an intermediate frequency (IF) filter and it is used to drive the SAW device.

FIG. 22. Schematic diagram of an integrated-optical spectrum analyzer (Tsai, 1979).

Today, integrated-optical spectrum analyzers are manufactured by several companies (Goutzoulis and Abramovitz, 1988). These devices are very compact and powerful. They have bandwidths of several hundred MHz and frequency resolutions of a few MHz, and can detect pulses only 100 nsec long.

5.
Digital Processors
Although analog optical processors can perform computationally intensive operations, such as Fourier transformation and correlation, they suffer from two shortcomings: low accuracy and inflexibility. The accuracy of these processors is about 1% or less. To achieve higher accuracy, digital processors are needed. Also, analog optical processors are limited to specific operations. Digital processors, on the other hand, are more flexible and can implement any operation. These advantages motivate the development of digital optical processors. Extensive interest in performing digital operations with optical signals started in the mid 1960s, when reliable lasers became available. The direction pursued at that time was to use nonlinear characteristics of materials to perform logic operations (Tippett et al., 1965). Although the possibility of performing logic operations with optics was demonstrated, the lack of materials with high nonlinear characteristics made the development of large scale systems impractical. Some critics claimed that digital optical computers will never succeed (Keyes and Armstrong, 1969). As a result, the research projects in this area were mostly abandoned. Starting in the mid 1970s, a second phase of research on digital optical processing was initiated and the interest in this area has increased in the 1980s. This new interest is partially due to the growing need for parallel processing and partially due to the development of new materials for optical bistability and, recently, new models for neural networks. Attention has been given to developing architectures that take advantage of the parallelism of optics. Optical signals, unlike electronic signals in integrated circuits, are not restricted to planar structures. Numerous information channels can be distributed over a two-dimensional array and propagate along the third dimension.
Using 2-D arrays of spatial light modulators and detectors, the information in all channels can be processed and detected in parallel.
During the last two decades, a large number of digital optical processors have been proposed. However, because of technological problems, the proposed architectures have been demonstrated only as small prototype systems, and it is not clear yet which ones will be successful in implementing large and practical systems. As for any new field, there is not a standard classification of digital optical processors at this stage. Here, we categorize these systems according to three aspects: number representation, computing techniques, and architectures.
5.1
Number Representation
An important issue in digital processing is the number representation. Digital electronic systems generally use binary numbers. This is mainly because electronic bistability is easy to achieve. However, due to the carry propagation problem, binary numbers are not very suitable for parallel processing. Two number systems that do not suffer from this problem are residue and signed-digit.

5.1.1 Residue Number System (RNS)

The foundation of residue arithmetic dates back to the first century A.D., when the Chinese mathematician Sun-Tsu published a verse in which he gave an algorithm for finding a number whose remainders on division by 3, 5, and 7 are known. A general theory of remainders (now known as the Chinese remainder theorem) was established by the German mathematician K. F. Gauss in the nineteenth century. The application of residue arithmetic in computers, however, is relatively recent and was first introduced in the mid 1950s by Svoboda and Valach (1957) in Czechoslovakia. Unlike the commonly used binary and decimal number systems, the RNS is an unweighted system. The base of a residue system consists of $n$ pairwise relatively prime (having no common factor) numbers, $m_1, m_2, \ldots, m_n$, called moduli. Any integer $X$ can then be represented by an $n$-tuple $(x_1, x_2, \ldots, x_n)$, where $x_i = |X|_{m_i}$ (read $X$ mod $m_i$) is the positive remainder that is obtained from the division of $X$ by $m_i$. This representation is unique for a dynamic range of $M = \prod_{i=1}^{n} m_i$. An important feature of the RNS is that the fixed-point arithmetic operations can be performed on each digit individually. That is, if $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ are two numbers of the same residue system, $Z = X * Y = (z_1, z_2, \ldots, z_n)$, where $z_i = |x_i * y_i|_{m_i}$ for $i = 1, 2, \ldots, n$, and $*$ represents addition, subtraction, or multiplication. Division can be performed, but it is difficult except for the remainder-zero case (Szabo and Tanaka, 1967).
As an example, consider the set of four moduli {5, 7, 8, 9}. These moduli cover a dynamic range of 2520. In this residue system, the decimal numbers $X = 42$ and $Y = 31$ are represented as $X = (2, 0, 2, 6)$ and $Y = (1, 3, 7, 4)$. The results of performing addition, subtraction, and multiplication on these numbers are $X + Y = (3, 3, 1, 1)$, $X - Y = (1, 4, 3, 2)$, and $X \cdot Y = (2, 0, 6, 6)$, which are the residue representations of the correct answers, i.e., 73, 11, and 1302, respectively. The first optical system that utilized residue arithmetic was the photoelectric number sieve invented by Lehmer (1933). The machine was constructed with 30 spur gears, one for each of the prime numbers from 2 to 113, and it was capable of sifting 20 million numbers per hour. The system was used for finding the factors of large numbers such as the Mersenne number $2^{79} - 1$. The application of residue arithmetic in modern optics was first investigated by Huang (1975) and Huang et al. (1979). They proposed the use of different devices, such as prisms, optical waveguides or fibers, and gratings, to implement optical residue processors. Other architectures were later proposed that utilized spatial light modulators (Collins, 1977), optical correlators (Psaltis and Casasent, 1979), photo-diodes (Horrigan and Stoner, 1979), optical programmable computation modules (Tai et al., 1979), and holograms (Guest and Gaylord, 1980).
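The worked example with the moduli {5, 7, 8, 9} can be reproduced in a few lines. The sketch below is a software illustration only and does not model any of the optical architectures cited above. The function names are hypothetical, and the decoding step uses the standard Chinese remainder theorem construction, which is an assumption about how one might convert back to decimal rather than a method taken from the text.

```python
import operator
from math import prod

MODULI = (5, 7, 8, 9)  # pairwise relatively prime; dynamic range M = 2520

def to_residues(value, moduli=MODULI):
    """Encode an integer as its tuple of remainders, one per modulus."""
    return tuple(value % m for m in moduli)

def residue_op(x, y, op, moduli=MODULI):
    """Apply op digit by digit; no digit depends on any other digit."""
    return tuple(op(a, b) % m for a, b, m in zip(x, y, moduli))

def from_residues(residues, moduli=MODULI):
    """Decode with the Chinese remainder theorem (requires Python 3.8+)."""
    dynamic_range = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = dynamic_range // m
        # pow(partial, -1, m) is the modular inverse of partial modulo m.
        total += r * partial * pow(partial, -1, m)
    return total % dynamic_range

X = to_residues(42)
Y = to_residues(31)
total_xy = residue_op(X, Y, operator.add)
difference_xy = residue_op(X, Y, operator.sub)
product_xy = residue_op(X, Y, operator.mul)

print(X, Y)
print(total_xy, from_residues(total_xy))
print(difference_xy, from_residues(difference_xy))
print(product_xy, from_residues(product_xy))
```

Because each residue digit is processed independently, no carry passes between digits, which is the property that makes the RNS attractive for parallel hardware. The decoding step, by contrast, needs all digits at once.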
5.1.2 Signed-Digit Number System

The signed-digit system, introduced by Avizienis (1961), is a weighted system in which both positive and negative digits are allowed. Of special interest is a system with three permissible digits: $-1$ (or $\bar{1}$), 0, and 1. This system is known in the optics community as the modified signed-digit (MSD) system (Drake et al., 1986). As an example, the decimal number 5 can be represented as $(1\bar{1}01)_{MSD}$, since $(1\bar{1}01)_{MSD} = 1 \times 2^0 + 0 \times 2^1 - 1 \times 2^2 + 1 \times 2^3 = (5)_{10}$. The negative of an MSD number can be obtained by complementing each digit of that number. The complement of 1 is $-1$, of $-1$ is 1, and of 0 is 0. Thus $1 \to \bar{1}$, $\bar{1} \to 1$, and $0 \to 0$. For example, the decimal number $-5$ can be represented by $(\bar{1}10\bar{1})_{MSD}$. The MSD system is redundant, i.e., there is more than one representation for each number. For example, the decimal number 5 can also be represented as $(101)_{MSD}$, $(10\bar{1}\bar{1})_{MSD}$, $(1\bar{1}0\bar{1}\bar{1})_{MSD}$, etc. This redundancy can be used to limit the carry propagation only to the next significant digit (Drake et al., 1986). This, in turn, makes it possible to perform an operation on all digits in parallel. The first optical MSD processor was proposed by Drake et al. (1986). Their architecture was based on arrays of prisms, holograms, and bistable devices to realize the functions needed for addition and subtraction. Other architectures based on truth-table look-up technique (Mirsalehi and Gaylord, 1986b) and symbolic substitution (Li and Eichmann, 1987) have also been proposed.
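The weighting, complementing, and redundancy described above can be illustrated with a short sketch. It is an illustration under stated assumptions: digits are stored as a Python list with the most significant digit first, the function names are hypothetical, and the carry-limited MSD addition rules of Drake et al. are deliberately not implemented here.

```python
def msd_value(digits):
    """Decimal value of an MSD number given as digits in {-1, 0, 1}, MSB first."""
    value = 0
    for digit in digits:
        if digit not in (-1, 0, 1):
            raise ValueError("MSD digits must be -1, 0, or 1")
        value = 2 * value + digit  # radix-2 weighting
    return value

def msd_negate(digits):
    """Complement every digit: 1 -> -1, -1 -> 1, 0 -> 0."""
    return [-digit for digit in digits]

five = [1, -1, 0, 1]            # 8 - 4 + 0 + 1
minus_five = msd_negate(five)   # digit-wise complement

# Several redundant encodings of the same decimal value 5.
encodings_of_five = [
    [1, 0, 1],
    [1, 0, -1, -1],
    [1, -1, 0, -1, -1],
]

print(msd_value(five), msd_value(minus_five))
print([msd_value(e) for e in encodings_of_five])
```

Negation touches each digit independently, so it needs no carry at all. Addition is where the redundancy matters, since it allows the carry to be confined to the neighboring digit.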
5.1.3 Binary Number System
Although the residue and signed-digit number systems are very suitable for parallel processing, most of the proposed architectures for digital optical processors are based on the binary number system. This is due to the fact that the existing digital systems work in the binary system. Therefore, any new processor which is based on a different number system should convert its result into binary in order to communicate with other systems. Unfortunately, the conversion of these systems (especially the RNS) into binary is not easy. Also, the RNS suffers from shortcomings in performing operations such as division and magnitude comparison (Szabo and Tanaka, 1967). As a result, these number systems are suitable only for special-purpose machines that use their strengths and avoid their weaknesses. A general-purpose machine, on the other hand, should be capable of performing any operation and is generally designed to work in binary.

5.2
Computing Techniques
In order to develop practical optical computers, techniques should be used that utilize the strengths of optics. These are not necessarily the same methods that have been developed for electronic systems. Present digital electronic computers are based on the von Neumann machine designed in the mid 1940s. In this machine, a memory cell is accessed by the central processing unit through an address unit (Fig. 23). The advantage of this technique is that the number of required interconnections for accessing the memory is significantly reduced. For example, a 64-Kbit memory can be accessed by only 16 lines. The price that is paid for this reduction is that only one cell can be accessed at a time. This puts a limit on the speed of the processor, and is known as the von Neumann bottleneck. Optics is capable of massive interconnection. Therefore, there is no reason to be restricted to the von Neumann machine. Several techniques for optical computing have been proposed that utilize the parallelism and interconnectivity of optics. Some of these techniques are described below.

FIG. 23. Schematic diagram of a von Neumann machine.

5.2.1
Threshold Gates
A binary threshold gate has n inputs and one output. The output z is obtained from the inputs $(x_1, x_2, \ldots, x_n)$ using the equations
$$z = \begin{cases} 0 & \text{if } \sum_{i=1}^{n} w_i x_i < T, \\ 1 & \text{if } \sum_{i=1}^{n} w_i x_i \ge T, \end{cases} \qquad (5.1)$$
where $w_i$ is the weight factor corresponding to the input $x_i$ and T is a threshold value. The conventional logic gates, such as AND and OR, are special cases of threshold gates. It can be shown that using threshold logic, a function can be implemented with significantly fewer gates than with the conventional gates. A problem with electronic implementation of threshold gates is that, as the number of inputs increases, the construction of these gates becomes impractical. This is due to the fan-in problem in electronics. Optics does not suffer from this problem; numerous beams of light can be connected to the input of an optical system.
The realization of optical threshold-logic gates has been investigated by Arrathoon and Hassoun (1984). In their proposed architecture, the multiplications are performed with Bragg cells and the summation is obtained by a lens (Fig. 24). The output light is converted to an electric signal by a detector and the resultant signal is compared with a threshold voltage. The proposed architecture can be implemented on one substrate using the integrated optics technology.
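As a small illustration of Eq. (5.1), the following Python sketch evaluates a threshold gate and shows AND and OR as special cases. The helper names (`threshold_gate`, `and_gate`, `or_gate`, `majority_gate`) and the unit weights are choices made for this example, not part of the cited architectures.

```python
def threshold_gate(inputs, weights, threshold):
    """Return 1 if the weighted sum of the binary inputs reaches the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is needed per input")
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and_gate(*x):
    # AND: unit weights, threshold equal to the number of inputs
    return threshold_gate(x, [1] * len(x), len(x))


def or_gate(*x):
    # OR: unit weights, threshold of one
    return threshold_gate(x, [1] * len(x), 1)


def majority_gate(*x):
    # Majority: a single threshold gate replaces a small AND/OR network
    return threshold_gate(x, [1] * len(x), len(x) // 2 + 1)


print(and_gate(1, 1, 0), or_gate(1, 1, 0), majority_gate(1, 1, 0))
```

The majority function hints at why threshold logic can need fewer gates: one gate with a suitable threshold does the work of several conventional gates.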
5.2.2 Table Look-Up
There are two methods for implementing a digital function. In the first method, the function is divided into a series of basic operations which are implemented by logic gates. For example, the addition of two n-bit numbers can be realized by cascading n one-bit adders. In the second method, the outputs for all possible inputs are first stored in a memory. To perform a process, the output for a particular input is obtained from the stored information. This method, known as the table look-up technique, in general provides the output faster than the first method. However, for many functions of practical interest, the amount of information that should be stored is too large and the table look-up method becomes impractical. One solution to this problem is to use the RNS. Using the RNS, a large table can be replaced by several small tables. The total number of entries in these small tables is much smaller than the number of entries in the large table while the information contents of the two are equal. Each small table corresponds to a particular modulus and is independent of the other tables. Therefore, the operation on all moduli can be processed in parallel.
FIG. 24. Integrated optical threshold gate (Arrathoon and Hassoun, 1984).
The table look-up technique was first proposed by Cheney (1961). He described the use of magnetic core elements to implement residue functions. Huang et al. (1979) proposed several optical implementations for the table look-up technique. They used position coding to represent residue numbers, and showed how residue functions can be realized by spatial maps. These maps can be implemented by mirrors, prisms, gratings, optical waveguides, or fibers. In general, two types of maps are needed: fixed maps and changeable maps. Fixed maps are needed for performing an operation on the input number with a specific value, for example, adding 3 to the input number. Changeable maps are needed to perform an operation on the input with a variable datum, for example, adding a second number (Y) to the input number (X). Using these two types of maps, any operation on residues can be implemented. In particular, the table look-up technique is very suitable for performing matrix-vector multiplication.
Another optical implementation of the look-up technique was proposed by Horrigan and Stoner (1979). They used an opto-electronic system to realize tables similar to Cheney's. They also introduced an electro-optical parallel summer which can be used for matrix-vector multiplication.
Guest and Gaylord (1980) used the table look-up technique to realize truth tables. In their truth-table look-up processor, all the input combinations that produce a one in each output bit are stored. The process is then performed by comparing the input pattern with these prestored patterns, known as the reference patterns. If the input combination matches one of the reference patterns, a one is assigned to that output bit; otherwise a zero is assigned to it. This is a type of content-addressable memory (CAM). The advantage of the CAM over the location-addressable memory (LAM) is that logical minimization techniques can be used to reduce the number of reference patterns. As a result, the amount of information that should be stored is significantly reduced. More reduction can be achieved by multilevel coding (Mirsalehi and Gaylord, 1986a). For example, the 16-bit full-precision addition of two multilevel coded residue numbers requires the storage of only 300 reference patterns.
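The truth-table look-up idea can be sketched in software as follows. The sketch stores, for each output bit, the set of input patterns that produce a one, and then answers a query purely by pattern matching. The function names and the 2-bit adder used as the example function are assumptions made for illustration; no logical minimization or multilevel coding is applied, so the pattern counts here are the unreduced ones.

```python
from itertools import product


def build_reference_patterns(func, n_inputs, n_outputs):
    """For every output bit, collect the input patterns that make that bit one."""
    refs = [set() for _ in range(n_outputs)]
    for bits in product((0, 1), repeat=n_inputs):
        out = func(bits)
        for k in range(n_outputs):
            if out[k]:
                refs[k].add(bits)
    return refs


def lookup(refs, bits):
    """Content-addressed evaluation: a bit is one iff the input matches a stored pattern."""
    key = tuple(bits)
    return tuple(1 if key in patterns else 0 for patterns in refs)


def add_2bit(bits):
    """Example function (hypothetical): add two 2-bit numbers, giving a 3-bit sum."""
    a1, a0, b1, b0 = bits
    total = (2 * a1 + a0) + (2 * b1 + b0)
    return ((total >> 2) & 1, (total >> 1) & 1, total & 1)


refs = build_reference_patterns(add_2bit, n_inputs=4, n_outputs=3)
print([len(r) for r in refs])      # stored patterns per output bit
print(lookup(refs, (1, 1, 0, 1)))  # 3 + 1, read out by matching only
```

In an optical processor the comparison against all reference patterns would happen in parallel; the Python sets merely stand in for that matching step.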
5.2.3 Symbolic Substitution
Symbolic substitution as a method for optical computing was first proposed by Huang (1983). In this technique, the input data are spatially coded. An operation is performed by detecting specific patterns and replacing them with patterns that correspond to the result. For example, the binary values zero and one can be coded as shown in Fig. 25a. To perform an operation on two numbers, their codes are put on top of each other and the substitution rules are applied. The rules for addition are shown in Fig. 25b. The addition of two n-bit numbers is performed simply by applying these rules n times. Using an optical system, the substitution can be performed on all bits in parallel.
An optical symbolic substitution processor was proposed by Brenner et al. (1986). This processor has four parts: a pattern splitter, pattern recognizers, pattern substituters, and a pattern combiner. The pattern splitter is used to copy the coded input data as many times as needed. For example, in the binary addition case, four copies of the inputs are made, since binary addition has four different rules. Each of these copies is checked by a particular pattern recognizer that corresponds to one of the possible patterns, and the locations that match that pattern are detected. The patterns detected by each recognizer are changed to the corresponding output patterns using a pattern substituter. The outputs of all substituters are then combined to obtain the output.
The symbolic substitution technique is not limited to binary arithmetic. Any Boolean logic operation can be performed by this method. In fact, any arbitrary operation that can be expressed by some rules can be implemented by symbolic substitution. The input data can be coded by different techniques. The method described above is called position coding. In this method, two pixels are needed for each bit. Another technique is to use two orthogonal polarizations to represent 0 and 1. This method of coding has the advantage that it reduces the device area by half.
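The addition rules of symbolic substitution (0 + 0 = 00, 0 + 1 = 01, 1 + 0 = 01, 1 + 1 = 10) can be imitated in software. The sketch below is an assumption-laden illustration rather than a model of the optical hardware: bits are plain integers instead of spatial codes, the two operands are held as a top and a bottom row, and each pass rewrites every column at once, placing the sum bit in the bottom row and the carry one column to the left in the top row. It uses a working field of n + 1 columns and one pass per column, which is a choice of this sketch.

```python
# Substitution rules: (top bit, bottom bit) -> (carry bit, sum bit)
RULES = {(0, 0): (0, 0), (0, 1): (0, 1), (1, 0): (0, 1), (1, 1): (1, 0)}


def to_bits(value, width):
    """Least-significant-bit-first list of the given width."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


def substitute_once(top, bottom):
    """Apply the rules to all columns simultaneously (rows are LSB first)."""
    width = len(top)
    new_top = [0] * width     # carries, shifted one column toward the MSB
    new_bottom = [0] * width  # sum bits stay in their column
    for i in range(width):
        carry, s = RULES[(top[i], bottom[i])]
        new_bottom[i] = s
        if carry:
            if i + 1 >= width:
                raise OverflowError("working field too narrow")
            new_top[i + 1] = 1
    return new_top, new_bottom


def symbolic_add(a, b, n_bits):
    """Add two n-bit non-negative integers by repeated parallel substitution."""
    if not (0 <= a < 2 ** n_bits and 0 <= b < 2 ** n_bits):
        raise ValueError("operands must fit in n_bits")
    width = n_bits + 1
    top, bottom = to_bits(a, width), to_bits(b, width)
    for _ in range(width):
        top, bottom = substitute_once(top, bottom)
    return from_bits(bottom)


print(symbolic_add(5, 3, 3))
```

Each pass preserves the total value of the two rows while pushing the carries one column to the left, so after one pass per column the top row is empty and the bottom row holds the sum.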
FIG. 25. (a) Coding procedure for logical values 0 and 1. (b) Substitution rules for addition: 0 + 0 = 00, 0 + 1 = 01, 1 + 0 = 01, 1 + 1 = 10 (Brenner and Huang, 1985).
5.3 Architectures and Technologies
Digital optical processors have been proposed and realized using different technologies. Some of the architectures are described below.
5.3.1 SLM-Based Processors
Spatial light modulators are widely used in the architectures of digital optical processors. In most cases, an SLM is used to enter the input data into the system. In other processors, it is used to perform digital or logic operations. In this section, only some of the architectures in which SLMs are the major elements are described.
The application of the liquid crystal light valve (LCLV) for digital processing has been investigated by Collins (1977). He used the birefringent effect of the LCLV to rotate the polarization of light. Using the cyclic nature of polarization, the device was capable of performing residue arithmetic. In a later work, Collins et al. (1980) used a Hughes LCLV to realize logic gates.
Athale and Lee (1979) have developed a type of SLM called an optical parallel logic (OPAL) device. The device consists of a layer of twisted nematic liquid crystal which is used as an electro-optic material. This layer is covered by a thin film of SiO$_2$ and CdS with a checkerboard pattern. Athale and Lee fabricated an array of 8 x 8 pixels of the OPAL and showed how different logic functions can be realized with this device. Of particular interest is a half-adder, which can be implemented with two OPAL devices.
An architecture in which the SLM is the key element is shadow casting. Digital processing by shadow casting was first proposed by Ichioka and Tanida (1984). In their proposed system, they used spatial coding to represent binary numbers. Two 2-D arrays were used to code the input numbers. These arrays were put in close contact with each other and were illuminated by four light-emitting diodes (LEDs) that were placed at the corners of a square. A coding mask was used at the output plane to spatially filter the desired function. Using this technique, it is possible to create all 16 logic functions for two binary inputs. Tanida and Ichioka described the use of color and computer-generated holograms for coding the data in their system.
The shadow-casting architecture was extended to multiple-input, multiple-output functions by Arrathoon and Kozaitis (1986). They also described how multiple-valued logic (MVL) can be implemented by shadow casting. The advantage of MVL over binary logic is that more information can be handled by each channel. This is achieved at the price of more pixels and LEDs than used in the binary case. Li et al. (1986) proposed the use of two orthogonal polarizations for encoding the data. A method for the design of logic units with polarization shadow casting is described by Karim et al. (1987).
One of the advantages of the shadow-casting architecture is that it utilizes the parallel-processing capability of optics. The inputs are entered as 2-D arrays of data, and the desired logic operation is performed on all elements of the arrays. Another advantage of this architecture is that it is relatively simple. The system does not require a laser; it works with the incoherent light produced by the LEDs. Also, no lenses are needed. In spite of these advantages, shadow casting is presently limited to small systems. To use this technique for large and practical systems, spatial light modulators with a large number of pixels and fast operating speeds are needed.
5.3.2 Holographic Processors
The application of holography for digital processing was first introduced by Preston (1972). He used different phases (0° and 180°) to encode the digital inputs of 0's and 1's. To realize a logic operation, a hologram was first recorded. The hologram was then developed and used to perform that operation on the input data. In the output plane, the result was obtained as bright and dark spots representing 1's and 0's. Preston experimentally demonstrated the logic operations IDENTITY and EXCLUSIVE OR on two input variables. He also described how the logic operations AND and OR can be realized (Preston, 1972).
Preston's work was extended by Guest and Gaylord (1980). They proposed two types of holographical truth-table look-up processors that function based on logical EXCLUSIVE OR and NAND operations. Figure 26 shows their NAND-based processor. In both systems, the information in the truth tables of the desired operation was stored in a photorefractive crystal, such as LiNbO$_3$, as thick holograms. The crystal was then used to obtain the result of that operation on the input data. During the recording process, two different phases (0° and 180°) were used for encoding zeros and ones. In the reading process, however, the zeros and ones were encoded by opaque and transparent pixels. With some modification in the data encoding, the NAND-based processor can be used for multi-valued functions (Mirsalehi and Gaylord, 1986a).
FIG. 26. NAND-based holographical digital optical processor: (a) recording the truth-table holograms and (b) example of multiplication with the processor. LSB is the least significant bit and MSB is the most significant bit (Guest and Gaylord, 1980).
The application of phase-only holograms for optical symbolic substitution has been investigated by Mait and Brenner (1988). These holograms are made of non-absorbing materials; hence, they have the advantage of high power efficiencies.
5.3.3 Acousto-optic Processors
Acousto-optic processors are based on the interaction of sound and light waves. The heart of these devices is an acousto-optic Bragg cell described in Section 4.6.2. Several architectures for performing matrix operations with acousto-optic cells have been introduced (Rhodes and Guilfoyle, 1984; Casasent, 1984). Most of these architectures function as systolic processors. That is, the elements of a vector or a matrix are entered in a specific order at specific positions on the acousto-optic cell, and the output is obtained by a time-integrating detector. An example of a systolic matrix-vector multiplier is shown in Fig. 27. The system realizes the simple equation
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
Two short electric signals are sequentially entered to the driver of the acousto-optic cell. The amplitudes of these signals are chosen such that they produce two propagating gratings with diffraction efficiencies proportional to $x_1$ and $x_2$. Another set of electric signals is used to drive the light-emitting diodes (LEDs) and produce light intensities proportional to the elements of the matrix. For correct operation, the two signals should be synchronized. After the first clock pulse, the acoustic wave corresponding to $x_1$ will be in front of the lower LED. At this moment, the lower LED creates a light wave with an intensity proportional to $a_{11}$. The created light will be diffracted by the grating and detected by the upper detector. The power of the diffracted light will be proportional to $a_{11}x_1$. After the second clock pulse, the acoustic wave corresponding to $x_2$ will be in front of the lower LED. At this moment, the lower LED creates a light wave with an intensity proportional to $a_{12}$. The power of the diffracted light will be proportional to $a_{12}x_2$ and detected by the upper detector. Since the detectors are of integrating type, the total power detected by the upper detector will be $y_1 = a_{11}x_1 + a_{12}x_2$. Similarly, the output $y_2$ will be obtained.
FIG. 27. Acousto-optic systolic matrix-vector multiplier (Rhodes and Guilfoyle, 1984).
Although the computational power of the systolic acousto-optic processors is high ( 10'" operations/s), they suffer from low accuracy (~8 bits). As a result, they are suitable for applications where high accuracy is not essential. More accurate acousto-optic processors can be achieved by sacrificing the throughput of the system for accuracy. Guilfoyle (1984) has developed a system that is 20-bit accurate and has a computing power equivalent to 2.5 x 10" multiply-add/s. To get high accuracy, he has used an algorithm introduced by Whitehouse and Speiser (1976) to perform digital multiplication by analog convolution (DMAC). According to this algorithm, the result of multiplication of two binary numbers can be obtained by convolving the two strings of ones and zeros that correspond to those numbers. The result will be in the mixed binary system, which has a base of 2 but allows digits larger than 1. For example, consider the multiplication of 19 and 27. The binary representations of these numbers are 10011 and 11011. If these two strings are treated as signals f(t) and g(t) and then convolved, the result will be f(t) * g(t) = 110231121, which is equivalent to the decimal number 513.
Figure 28 shows a schematic diagram of Guilfoyle's matrix-vector multiplier. The system consists of two acousto-optic cells which are driven by signals proportional to the elements of the matrix and vector. The two lenses at the left side are used for expanding and collimating the laser beam. The other lenses are used for imaging and Fourier transformation. The final result is imaged on the detector arrays. It should be mentioned that subsequent analysis (Psaltis and Athale, 1986) has shown that DMAC is not an efficient algorithm, so alternative numerical methods are being explored.
FIG. 28. Systolic acousto-optic binary computer (SAOBiC) configuration (Guilfoyle, 1984).
Acousto-optic cells can be used to implement optical programmable logic arrays (PLAs). The implementations of a full-adder and 2- and 3-bit multipliers have been described by Guilfoyle and Wiley (1988). One advantage of these processors is that, unlike electronic PLAs, they do not suffer from the fan-in problem. The OR operation on a large number of optical beams can be performed by using a lens to focus those beams on a detector. The combinatorial architecture is very powerful since any digital function can be written as a sum-of-products expression and be realized by an optical PLA.
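The DMAC example given earlier in this section (19 times 27) can be reproduced numerically. The following Python sketch is only a software analogue: the function names are invented for this illustration, and the discrete convolution stands in for the analog optical convolution.

```python
def to_bits_msb_first(value):
    """Binary digits of a non-negative integer, most significant first."""
    return [int(c) for c in format(value, "b")]


def convolve(f, g):
    """Discrete linear convolution of two digit sequences."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def mixed_binary_to_int(digits):
    """Evaluate base-2 digits (most significant first) that may exceed 1."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def dmac_multiply(a, b):
    """Multiply by convolving the bit strings, then resolving the mixed binary digits."""
    digits = convolve(to_bits_msb_first(a), to_bits_msb_first(b))
    return digits, mixed_binary_to_int(digits)


digits, result = dmac_multiply(19, 27)
print("".join(str(d) for d in digits), result)
```

The convolution itself needs no carries, which is what lets an analog processor compute it; the carries are resolved only when the mixed binary digits are converted to an ordinary number.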
5.3.4 Integrated-Optical Processors
The field of integrated optics started in the mid 1960s when the thin-film technology was used to develop optical waveguide devices. An optical waveguide is made of a transparent and dense material which has a higher index of refraction than its surrounding media. As a result of this characteristic, a light wave can propagate inside the waveguide without leaking out.
Optical fibers, which are widely used for communication today, are one type of optical waveguide. In integrated optics, optical devices are manufactured on a flat substrate by techniques similar to the ones used in integrated electronics. Here, we are interested in the applications of these devices for optical computing. Integrated-optical devices have several advantages, including small size, small power consumption, and high speed. On the other hand, they are restricted to two dimensions and cannot utilize the 3-D capability of optics.
Optical waveguides have several structures. The simplest structure is the slab waveguide which consists of a thin film on a substrate. The index of refraction of the material used as the thin film is higher than the index of refraction of the substrate. Different materials have been used for fabrication of optical waveguides. The most widely used materials are LiNbO$_3$, GaAs, and Si. So far, no material has been found that has all the properties desired for integrated optics. Most of the developed devices are made of LiNbO$_3$, since the fabrication with this material is easier. On the other hand, GaAs has the advantage that it can be used to fabricate both passive and active devices.
Integrated-optical devices can be used for logical and digital operations. One of these devices, which is based on the Mach-Zehnder interferometer, is shown in Fig. 29. The device, in general, has two electrodes surrounding each arm. A beam of coherent light created by a laser enters from the left side and is split into two beams. If a voltage is applied between the electrodes of one arm, due to the change in the index of refraction, the optical wave in that channel will experience a phase shift. The applied voltage is normally chosen such that the amount of phase shift is 180°. The two optical beams add coherently in the output and produce the result. Using the Mach-Zehnder interferometer, logic operations can be realized. For example, consider Fig. 30a where a voltage proportional to the logic variable a is applied to the electrodes. If a = 1, the optical wave in the top channel will be shifted by 180° with respect to the wave in the lower channel. As a result, the two waves cancel each other and there will be no light in the output, indicating a logical 0. If a = 0, the two waves will remain in phase and there will be an output light indicating a logical 1. Therefore, the device works as an inverter. Using this technique, other logic functions can be realized. Some examples are shown in Fig. 30 (Taylor, 1977).
FIG. 29. The integrated optical implementation of the Mach-Zehnder interferometer.
FIG. 30. Integrated optical implementations of logical functions: (a) NOT, (b) EXCLUSIVE OR, (c) AND (Taylor, 1977).
Integrated-optical devices can also be used for numerical operations, especially matrix-vector and matrix-matrix multiplications (Hurst, 1986). As an example, Fig. 31 shows a matrix-vector multiplier proposed by Verber (1984). A beam of coherent light impinges on a metallic grating which is fabricated on a planar waveguide. The electrodes of this grating are connected to voltages that are proportional to the elements of the vector. As a result, the diffracted light beams will have intensities that are proportional to the elements of the vector. These gratings are used as beam splitters to produce three beams of light with equal power in each channel. Each of the produced beams then impinges on a metallic grating which is connected to a voltage proportional to one of the elements of the matrix. Three lenses are used to combine the beams that correspond to each output term. The outputs are obtained from the detectors placed at the focal points of the lenses. The advantage of this multiplier is that it is fully parallel. Using systolic processing techniques, the integrated-optical processors can be used for matrix-matrix multiplication (Verber, 1984).
FIG. 31. Integrated optical matrix-vector multiplier (Verber, 1984).
Another integrated-optical processor is the device that performs the Givens rotation (Mirsalehi et al., 1986). The elementary rotation matrix may be expressed as
The Givens rotation is obtained when $\phi$ is chosen such that d = 0. A schematic diagram of the integrated-optical Givens rotation device is shown in Fig. 32. The device consists of a metallic grating and two phase shifters. An array of this device can be used for matrix triangularization, which is an important operation for solving systems of linear equations (Gaylord and Verriest, 1987).
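A numerical sketch may help make the Givens rotation concrete. The displayed rotation matrix is not legible in this copy, so the code below assumes the standard plane-rotation form, in which a pair (a, b) is mapped to c = a cos φ + b sin φ and d = -a sin φ + b cos φ; the symbols a, b, and c and all function names are assumptions made for this illustration (only d and the angle appear in the text). The angle is chosen so that d = 0, and repeated rotations are then used to triangularize a small matrix.

```python
import math


def givens_angle(a, b):
    """Angle phi for which -a*sin(phi) + b*cos(phi) = 0 (assumed rotation form)."""
    return math.atan2(b, a)


def rotate(a, b, phi):
    """Apply the assumed elementary rotation to the pair (a, b)."""
    c = a * math.cos(phi) + b * math.sin(phi)
    d = -a * math.sin(phi) + b * math.cos(phi)
    return c, d


def triangularize(matrix):
    """Zero every entry below the diagonal with successive Givens rotations."""
    m = [list(row) for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for col in range(min(n_rows, n_cols)):
        for row in range(col + 1, n_rows):
            phi = givens_angle(m[col][col], m[row][col])
            for k in range(n_cols):
                # rotate the two rows together, one column at a time
                m[col][k], m[row][k] = rotate(m[col][k], m[row][k], phi)
    return m


c, d = rotate(3.0, 4.0, givens_angle(3.0, 4.0))
print(round(c, 12), round(d, 12))
print(triangularize([[3.0, 1.0], [4.0, 2.0]]))
```

Each rotation acts on only two rows, which is why an array of identical two-input devices suffices for triangularization.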
FIG. 32. Integrated optical Givens rotation device (Mirsalehi et al., 1986).
6. Hybrid Processors
The inherent two-dimensional nature of optical systems makes it possible to work with large two-dimensional arrays of data. A second advantage of optical systems is their capability of operating at tremendous data rates. Since signals travel through optical systems at the speed of light, the throughput of these systems is enormous. With state-of-the-art technology, pure optical systems have some drawbacks. First, the majority of optical systems are analog in nature. As with other analog systems, high accuracy is difficult to achieve. The second problem is that optical systems by themselves are not capable of making decisions. Even the simplest decision based on comparison cannot be made without the help of electronics. A third problem is that optical systems cannot be programmed as a general-purpose computer. Purely optical systems can only perform specific tasks, and they can be considered analogous to hardwired electronic analog computers. These deficiencies of optical systems are the strong points of electronic systems. Accuracy, control, and programmability are some of the advantages of digital electronic computers. To combine the strengths of both optical and electronic systems, hybrid processors have been developed.
One of the first suggestions to combine optical and digital electronic processors was made by Huang and Kasnitz (1967). In the early 1970s, several hybrid systems were proposed to perform specific tasks. The articles by Thompson (1977), Casasent (1981), and Leger and Lee (1982) have extensive reviews of the literature on hybrid processors. In the following sections, we introduce some of the hybrid processors. First, we describe the general design characteristics of a hybrid processor, and then we present some specific systems.
6.1 General Hybrid Processor
Depending on the tasks that they are designed for, hybrid processors have different architectures. Figure 33 shows a general design which includes most of the existing hybrid systems. Since a hybrid processor is a combination of an optical system and an electronic one, interface devices between the two systems are essential elements of the processor. The input signal can be in three different forms: (1) a two-dimensional optical signal, (2) a plane wave modulated by an electrical-to-optical (E/O) interface that converts an electronic signal to the optical transmittance of a transparency, and (3) an array of light sources, such as laser diodes (LDs) or light-emitting diodes (LEDs). Beyond the input interface, the signal propagates as an optical signal and is manipulated by the optical system. The optical system may include SLMs whose transmittance functions are controlled by the central control unit. The output of the optical system can be read as an optical signal or converted to an electronic signal by an optical-to-electrical (O/E) interface, which can be a one- or two-dimensional array of photodiodes.
FIG. 33. Schematic diagram of a general hybrid processor.
6.2 Algebraic Processors
Hybrid algebraic processors represent an important class of optical processors. These processors perform the basic operations of linear algebra with flexible architectures. In the last two decades, several optical systems have been proposed that implement the basic linear algebra operations. Cutrona suggested a simple matrix-vector multiplier in 1965. Multiplication of two matrices by coherent optical techniques was proposed by Heinz et al. (1970), and was demonstrated experimentally by Jablonowski et al. (1972) for the simple case of 2 x 2 matrices. An alternative method for matrix-matrix multiplication was introduced by Tamura and Wyant (1977). Schneider and Fink (1975) proposed an incoherent optical system for matrix-matrix multiplication. Krivenkov et al. (1976) and later Bocker (1984) described methods for matrix-matrix multiplication using incoherent light. A very important method for multiplying a vector by a matrix was suggested by Bocker (1974), and an improved version of this method was described by Monahan et al. (1975). Recently, attention has been focused on optical architectures that perform matrix multiplication by systolic, engagement, and outer-product techniques (Caulfield et al., 1984; Athale, 1986).
6.2.1 Matrix-Vector Multiplication
One of the first operations in linear algebra that has been implemented optically is the matrix-vector multiplication (Goodman et al., 1978). As shown in Fig. 34, the vector x is represented by an array of light-emitting diodes (LEDs) and the matrix A is represented by an optical mask. The light produced by each LED is expanded horizontally and illuminates the corresponding row of the optical mask. The light emerging from each column of the mask is collected vertically and illuminates one element of the photodiode array. As a result, the output of the photodiode array represents the vector b where
$$Ax = b. \qquad (6.1)$$
This system is capable of multiplying large-size vectors and matrices. The limitation on the size is influenced by the electro-optic devices used and the medium on which the matrix is written. Another advantage of this system is its high speed of operation. Since the system operates in parallel, the size of the matrix does not affect the speed of the operation. The disadvantage of this system, which is the result of its analog nature, is low accuracy (~1%).
6.2.2 Matrix-Matrix Multiplication
Another important linear algebra operation is matrix-matrix multiplication. This operation can be implemented optically by several techniques (Bocker, 1984; Barakat, 1987). One method is to use the matrix-vector multiplier shown in Fig. 34. Let A be an N x K matrix, B be a K x M matrix, and C be their product. The multiplication AB = C can be written as
$$A[b_1 \,|\, b_2 \,|\, \cdots \,|\, b_M] = [c_1 \,|\, c_2 \,|\, \cdots \,|\, c_M], \qquad (6.2)$$
where $b_i$ and $c_i$ represent the vectors obtained from the ith columns of the matrices B and C, respectively. To perform this operation using the matrix-vector multiplier, we write $b_1$ on the LED array and read the vector $c_1$ from the photodiode array, and then we keep feeding one column of B and reading one column of C at a time. In M cycles we obtain the result of the operation.
FIG. 34. Schematic diagram of an optical matrix-vector multiplier.
The optical matrix-vector and matrix-matrix multipliers described above can be used to solve many linear algebra problems. In the following section, we present an optical hybrid system capable of solving systems of linear equations, matrix inversion, eigenvalue problems, and other linear and nonlinear problems with high accuracy.
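The column-by-column procedure of Eq. (6.2) can be sketched in Python. The model below is idealized and noise-free: `optical_matvec` is a hypothetical stand-in for one pass through the optical matrix-vector multiplier, with the mask holding A and the LED array holding one column of B, and it ignores the limited accuracy of a real analog system.

```python
def optical_matvec(mask, led_vector):
    """Idealized matrix-vector multiplier: each detector sums mask-weighted LED intensities."""
    if any(len(row) != len(led_vector) for row in mask):
        raise ValueError("mask width must match the LED array length")
    return [sum(a * x for a, x in zip(row, led_vector)) for row in mask]


def matmat_by_columns(A, B):
    """Form C = AB by feeding one column of B per cycle, as in Eq. (6.2)."""
    n_cycles = len(B[0])                           # M cycles, one per column of B
    columns = []
    for j in range(n_cycles):
        b_j = [row[j] for row in B]                # write column j on the LED array
        columns.append(optical_matvec(A, b_j))     # read column j of C
    return [list(row) for row in zip(*columns)]    # reassemble columns into C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmat_by_columns(A, B))
```

The mask stays fixed for all M cycles; only the LED drive signals change, which is what makes the matrix-vector hardware reusable for matrix-matrix products.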
6.2.3 The Bimodal Optical Computer Solving a system of linear equations and determining the eigenvalues and the eigenvectors of a system are important problems. These problems arise in solving many physical problems, such as signal extrapolation, phased-array radar, and image enhancement. In most practical cases, the matrices involved
216
MIR MOJTABA MlRSALEHl
are very large ( - 1000 x 1000) and the required computations are very intensive. Although very powerful digital electronic computers have been developed, solving many practical problems is still very time consuming. Because of its inherent parallelism and high speed, optics seems a natural choice for solving this class of problems. Analog optics is very attractive for optical information processing and computing. As shown in the matrix-vector multiplier of Fig. 34, all the elements of the vector can be processed in parallel. At almost the same time that we write the input x, we can read the output b. If the optical path length between the input and output planes is 6 cm, the whole operation of the matrix-vector multiplication can be done in 200 psec. For N = 1000, the number of operations needed to perform Ax = b is on the order of lo6. Therefore, the speed of the processor is about 5 x lOI5 operations/sec. This illustrative example shows the high speed of the analog optics in performing linear algebra operations. Unfortunately, this high speed of operation is combined with low accuracy which is the nature of all analog systems. Analog optics is very fast but inaccurate. On the other hand, digital electronics is very accurate but not as fast as analog optics. The advantages of both analog optics and digital electronics can be achieved in a compromised hybrid system that slows down the processor speed but in return increases the accuracy substantially. The bimodal optical computer (BOC) introduced by Caulfield et al. (1986) is based on this idea. In the following discussion we will show how to solve a system of linear equations using the BOC. Consider an N x N matrix A, and N x 1 vectors x and b. A and b are given, and we want to solve the equation Ax=b
to find the vector x. Equation (6.3) can be solved by analog optic techniques, such as the relaxation method (Cheng and Caulfield, 1982). Consider the hybrid system shown in Fig. 35. First, an initial value for x is assumed and is written on the LEDs. Then the vector x is multiplied by the matrix A. The resultant vector y is compared with b using a difference amplifier, and the difference is fed back to correct x. This process of multiplying the new value of x by A and comparing y with b continues until the difference between y and b becomes negligible and the value of x converges to the solution of Eq. (6.3). This method of solving a system of linear equations is very fast. Its speed is limited only by the speed of the electro-optic devices and the feedback electronic system used, and can be in the nanosecond range. Let us consider now the accuracy of the system. In writing the matrix A on the optical mask (which can be a photographic film or a spatial light modulator) and the vector x on the LED array, large errors are introduced because of the nature of these analog devices. Also, reading the vector b from
[FIG. 35. Schematic diagram of a bimodal optical computer. Recoverable labels: A, photodiode array, LED, processor.]
the photodiode array cannot be done accurately. Therefore, the system in Fig. 35 does not solve Eq. (6.3) but instead it solves

$$A_0 x_0 = b_0, \tag{6.4}$$

where the subscript zeros indicate inaccuracies in the optics and electronics. The solution $x_0$ of Eq. (6.4) can be refined to get the vector x using the following algorithm.

(a) Solve the system in Eq. (6.4) for $x_0$ using the analog optical processor.
(b) Store the solution $x_0$ with high accuracy in the digital processor. Use a dedicated digital processor to calculate the residue

$$r = b - A x_0 = A(x - x_0) = A\,\Delta x. \tag{6.5}$$

(c) Use the optical analog processor to solve the new system of linear equations

$$A y = s r, \tag{6.6}$$

where $y = s\,\Delta x$ and s is a “radix,” or scale factor, which is chosen to use the dynamic range of the system efficiently.
(d) Use the digital processor to refine the solution $x_0$ for $x_1$:

$$x_1 = x_0 + \Delta x. \tag{6.7}$$
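
Steps (a) through (d) can be tried numerically with the following minimal sketch, which repeats the refinement until the digitally computed residue is small. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names (`analog_solve`, `bimodal_solve`), the 4 x 4 test matrix, and the model of the analog processor as an exact solver that only sees coarsely quantized matrix elements, inputs, and outputs. The quantization steps are arbitrary and do not correspond to the devices used in the experiments.

```python
def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x


def quantize(values, step):
    """Round each value to the nearest multiple of step (coarse analog I/O)."""
    return [round(v / step) * step for v in values]


def analog_solve(A0, rhs, step):
    """Hypothetical stand-in for the optical processor: the right-hand side is
    written coarsely, the coarse matrix A0 is used, and the result is read coarsely."""
    return quantize(gauss_solve(A0, quantize(rhs, step)), step)


def bimodal_solve(A, b, matrix_step=0.5, vector_step=1 / 64, tol=1e-9, max_iter=300):
    """Refine a coarse analog solution using digitally computed residues.
    Assumes b is not the zero vector."""
    A0 = [quantize(row, matrix_step) for row in A]   # the matrix as the optics see it
    s = 1.0 / max(abs(v) for v in b)                 # scale b to the full range
    y = analog_solve(A0, [s * v for v in b], vector_step)
    x = [v / s for v in y]                           # step (a): coarse solution x0
    for iteration in range(max_iter):
        r = [bi - ci for bi, ci in zip(b, matvec(A, x))]  # step (b): r = b - A x
        r_max = max(abs(v) for v in r)
        if r_max < tol:                              # accurate enough: stop
            return x, iteration
        s = 1.0 / r_max                              # radix s fills the dynamic range
        y = analog_solve(A0, [s * v for v in r], vector_step)  # step (c)
        x = [xi + yi / s for xi, yi in zip(x, y)]    # step (d): add the correction
    return x, max_iter


A = [[4.3, 1.2, 0.4, 0.9],
     [1.1, 5.2, 0.8, 0.3],
     [0.2, 0.7, 6.1, 1.4],
     [0.6, 0.5, 1.3, 3.7]]
x_true = [1.0, -2.0, 3.0, 0.5]
b = matvec(A, x_true)
x, iterations = bimodal_solve(A, b)
print(iterations, x)
```

Rescaling the residue by s before each analog solve is what lets a processor with a fixed, coarse resolution contribute a useful correction at every pass. In this toy model the loop converges because the quantized matrix stays close enough to A; a badly conditioned matrix or coarser quantization could prevent that.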
If the refined solution $x_1$ is accurate enough, terminate the iterations. Otherwise, return to (b) for a better solution.

The convergence and speed of the solution for the system of linear equations have been studied by Abushagur and Caulfield (1986, 1987), and Abushagur et al. (1987). The convergence of the iterative solution depends on two main factors. The first factor is the condition number of the matrix $A_0$, i.e., $\chi(A_0)$. The smaller the condition number is, the faster the solution will converge. The second factor is the error involved in reading and writing A, x, and b using the electro-optic devices. The higher the accuracy of these parameters, the faster the convergence. A bimodal optical computer has been built and tested experimentally for solving a system of linear equations (Habli et al., 1988). Although the accuracy of the analog processor was about 2 bits, a solution with 16-bit accuracy was obtained by the hybrid system after a few iterations. These experimental results show that a highly accurate hybrid system can be obtained from a low-accuracy analog processor.

6.3 Diffraction-Pattern Sampling Processor
The diffraction-pattern sampling processor is one of the most developed hybrid systems. It is used for automatic data analysis, especially in applications that require the analysis of large amounts of data (Lendaris and
Stanley, 1970; George and Kasdan, 1975). The detector that is usually used in this type of processor is a unique photodiode array which consists of 32 wedge-shaped and 32 ring-shaped photodetectors. The ring-shaped detectors provide information about the spatial frequency content of the diffraction pattern. The wedge-shaped detectors provide information about the orientation of the object. The output from the 64 detectors is digitized and read by a digital computer. Recognition algorithms are developed for specific applications.

The diffraction-based processor has several applications (Casasent, 1981). Here we mention some of them. One of the initial applications of this device is the analysis of aerial imagery. Urban imagery can be distinguished from rural imagery based on their diffraction patterns. Urban imagery contains high-resolution details and regular structures such as streets and buildings. These features appear as high-spatial-frequency components in the diffraction pattern. A classification algorithm can be developed based on the high-frequency and angular content of the diffraction pattern. Another application of the system is the detection of muscular dystrophy. This is possible because of the significant difference in the diffraction patterns of healthy and diseased cells. This hybrid system has industrial applications, especially in quality control of products. For example, hypodermic needles can be automatically inspected by the system while they are on the conveyor belt. Bad needles generate vertical components in their diffraction pattern. The wedge-shaped detectors detect the presence of these components and the needles with defects are rejected. Versions of this system are now commercially available (Clark, 1987) and are being used in various industrial environments.
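
The feature extraction performed by the wedge-ring detector can be imitated digitally, as in the minimal sketch below. It is an illustration under stated assumptions, not a model of any particular device: the diffraction pattern is approximated by the squared magnitude of a discrete Fourier transform, the function names are hypothetical, and only 4 rings and 4 wedges are used instead of 32 of each. Because the power spectrum of a real image is symmetric about the origin, both feature sets are summed over the full frequency plane here, whereas the physical detector devotes one half-plane to rings and the other to wedges.

```python
import cmath
import math


def power_spectrum(image):
    """Squared magnitude of the 2-D DFT of a square image (naive O(n^3) version)."""
    n = len(image)
    w = [[cmath.exp(-2j * math.pi * k * m / n) for m in range(n)] for k in range(n)]
    # Transform along each row first, then along each column.
    rows = [[sum(image[y][x] * w[u][x] for x in range(n)) for u in range(n)]
            for y in range(n)]
    return [[abs(sum(rows[y][u] * w[v][y] for y in range(n))) ** 2 for u in range(n)]
            for v in range(n)]


def wedge_ring_features(image, n_rings=4, n_wedges=4):
    """Sum spectral energy into annular (ring) and angular (wedge) bins."""
    n = len(image)
    spectrum = power_spectrum(image)
    rings = [0.0] * n_rings
    wedges = [0.0] * n_wedges
    r_max = n / 2
    for v in range(n):
        for u in range(n):
            fu = u if u < n / 2 else u - n   # signed horizontal frequency
            fv = v if v < n / 2 else v - n   # signed vertical frequency
            rho = math.hypot(fu, fv)
            if rho == 0 or rho >= r_max:     # skip the DC term and the corners
                continue
            energy = spectrum[v][u]
            rings[int(rho / r_max * n_rings)] += energy
            theta = math.atan2(fv, fu) % math.pi   # fold directions into [0, pi)
            wedges[min(int(theta / math.pi * n_wedges), n_wedges - 1)] += energy
    return rings, wedges


# Two 16 x 16 test patterns: the same stripes in two orientations.
vertical = [[1.0 if x % 4 < 2 else 0.0 for x in range(16)] for y in range(16)]
horizontal = [[1.0 if y % 4 < 2 else 0.0 for x in range(16)] for y in range(16)]
print(wedge_ring_features(vertical))
print(wedge_ring_features(horizontal))
```

The two test patterns have the same stripe period, so their energy falls in the same ring, while the wedge that collects the energy changes with the stripe orientation. A recognition algorithm of the kind described above would operate on these two short feature vectors rather than on the full image.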
7. Conclusion
Since the invention of the laser in 1960, there has been an increasing interest in the field of optical information processing. Elements of optical computers, such as memories, logic arrays, and input/output devices, have been developed with various degrees of success. The development of holographic memories pursued in the 1970s was not very successful because of material problems. Optical disks, on the other hand, are promising devices and have found applications in data storage. Although the feasibility of optical logic arrays has been demonstrated, large arrays of practical interest have not been developed yet. The input/output devices for optical computers, especially the spatial light modulators, are among the key elements that should be improved in order to take full advantage of optics. One of the important features of optics is its capability of global interconnections, which is far beyond what can be achieved with electronics. Also, optical interconnections can be made programmable. This feature is useful for performing digital operations, such as discrete Fourier transformation, or for implementing neural networks.
Optical processors can be classified in three categories: analog processors, digital processors, and hybrid processors. Although advances in all three areas have been achieved, analog processors have been investigated and developed the most. Among the operations that can be easily performed with optics are Fourier transformation, correlation, convolution, and spatial filtering. Also, optical processors have found applications in image processing. Among other important applications of these processors are spectrum analysis and synthetic-aperture radar data analysis.

There is an increasing interest in developing digital optical processors. At this stage, most of the efforts in this field are concentrated on developing computing techniques and architectures. Among the promising techniques are symbolic substitution and table look-up. Various architectures using different technologies are under investigation for optical computing. Among the most promising technologies are acousto-optics and integrated optics. Also, processors based on holography and spatial light modulators have been developed.

Hybrid processors combine the advantages of analog optics with those of digital electronics. These processors have found applications in both laboratory and industry and are promising architectures for computationally intensive problems such as solving large sets of linear or nonlinear equations.

Today, research in all areas of optical computing is pursued, and this makes the prediction of the future of the field difficult. Knowing the pitfalls of predicting with greater clarity, we tackle the more modest task of doing a linear extrapolation of the present activities. We suspect that the future will hold more, not less. In terms of what will be achieved during the next 10 years, here are our thoughts.

1. Optics will find more applications in electronic computers. Optical disks, which are now commercially available, will become more popular. In particular, erasable optical disks will be further developed, and probably will replace the present hard disk memories. Optics will be used for clock distribution in VLSI circuits of high-speed systems.
2. Hybrid optoelectronic processors will offer high speed for linear and near-linear algebra problems.
3. Coherent and partially coherent optical pattern recognition will find industrial and military applications in the field.
4. The utility of optics for expert systems will be established.
5. Optically implemented multi-layer neural networks will allow real-time control of complex systems.
Even if only these accomplishments take place, optical computing will have established a major role in the total computer arena.
ACKNOWLEDGEMENT

We are grateful to Joseph Shamir for helpful comments on a draft of this article.

REFERENCES

Abbe, E. (1873). Beiträge zur Theorie des Mikroskops und der mikroskopischen Wahrnehmung. Archiv f. Mikroskopische Anat. 9, 413-468.
Abu-Mostafa, Y. S., and Psaltis, D. (1987). Optical neural computers. Scientific American 256 (3), 88-95.
Abushagur, M. A. G., and Caulfield, H. J. (1986). Highly precise optical-hybrid matrix processor. Proc. Soc. Photo-Opt. Instrum. Eng. 639, 63-67.
Abushagur, M. A. G., and Caulfield, H. J. (1987). Speed and convergence of bimodal optical computers. Opt. En
Bryngdahl, O. (1974). Geometrical transformations in optics. J. Opt. Soc. Am. 64, 1092-1099.
Casasent, D. (1981). Hybrid processors. In “Optical Information Processing Fundamentals” (S. H. Lee, ed.), pp. 181-233. Springer-Verlag, Berlin and New York.
Casasent, D. (1984). Acoustooptic linear algebra processors: architectures, algorithms, and application. Proc. IEEE 72, 831-849.
Case, S. K., and Haugen, P. R. (1982). Partitioned holographic optical elements. Opt. Eng. 21, 352-353.
Case, S. K., Haugen, P. R., and Løkberg, O. J. (1981). Multifacet holographic optical elements for wave front transformations. Appl. Opt. 20, 2670-2675.
Caulfield, H. J., ed. (1979). “Handbook of Optical Holography.” Academic Press, New York.
Caulfield, H. J. (1985). Optical interference machines. Opt. Commun. 55, 259-260.
Caulfield, H. J. (1987). Parallel N^4 weighted optical interconnections. Appl. Opt. 26, 4039-4040.
Caulfield, H. J., and Lu, S. (1972). “The Applications of Holography.” Wiley, New York.
Caulfield, H. J., Dvore, D., Goodman, J. W., and Rhodes, W. T. (1981a). Eigenvector determination by noncoherent optical methods. Appl. Opt. 20, 2263-2265.
Caulfield, H. J., Rhodes, W. T., Foster, M. J., and Horvitz, S. (1981b). Optical implementation of systolic array processing. Opt. Commun. 40, 86-90.
Caulfield, H. J., Horvitz, S., Tricoles, G. P., and Von Winkle, W. A., eds. (1984). Special issue on Optical Computing, Proc. IEEE 72, 755-979.
Caulfield, H. J., Gruninger, J. H., Ludman, J. E., Steiglitz, K., Rabitz, H., Gerfand, J., and Tsoni, E. (1986). Bimodal optical computers. Appl. Opt. 25, 3128-3131.
Cheney, P. W. (1961). A digital correlator based on the residue number system. IRE Trans. Electron. Comput. EC-10, 63-70.
Cheng, W. K., and Caulfield, H. J. (1982). Fully-parallel relaxation algebraic operations for optical computers. Opt. Commun. 43, 251-254.
Clark, D. (1987). An optical feature extractor for machine vision inspection. Proc. Vision ’87 (Soc. of Manufacturing Engineers) pp. 7-23 to 7-49.
Clymer, B. D., and Goodman, J. W. (1986). Optical clock distribution to silicon chips. Opt. Eng. 25, 1103-1108.
Collins, S. A. (1977). Numerical optical data processor. Proc. Soc. Photo-Opt. Instrum. Eng. 128, 313-319.
Collins, S. A., Fatehi, M. T., and Wasmundt, K. C. (1980). Optical logic gates using a Hughes liquid crystal light valve. Proc. Soc. Photo-Opt. Instrum. Eng. 231, 168-173.
Cutrona, L. J. (1965). Recent developments in coherent optical technology. In “Optical and Electro-optical Information Processing” (J. Tippett et al., eds.), Chap. 6, MIT Press, Cambridge, Massachusetts.
Cutrona, L. J., Leith, E. N., Palermo, C. J., and Porcello, L. J. (1960). Optical data processing and filtering systems. IRE Trans. Information Theory IT-6, 386-400.
Cutrona, L. J., Vivian, W. E., Leith, E. N., and Hall, G. O. (1961). A high-resolution radar combat-surveillance system. IRE Trans. Military Electronics MIL-5, 127-131.
Cutrona, L. J., Leith, E. N., Porcello, L. J., and Vivian, W. E. (1966). On the application of coherent optical processing techniques to synthetic-radar. Proc. IEEE 54, 1026-1032.
Drake, B. L., Bocker, R. P., Lasher, M. E., Patterson, R. H., and Miceli, W. J. (1986). Photonic computing using the modified signed-digit number representation. Opt. Eng. 25, 38-43.
Duffieux, P. M. (1946). “L’intégrale de Fourier et ses applications à l’optique.” Faculté des Sciences, Besançon, France.
Eichmann, G., and Caulfield, H. J. (1985). Optical learning (interface) machines. Appl. Opt. 24, 2051-2054.
Eichmann, G., Kostrazewski, A., Ha, B., and Li, Y. (1988). Parallel optical pyramidal image processing. Opt. Lett. 13, 431-433.
Elias, P. (1953). Optics and communication theory. J. Opt. Soc. Am. 43, 229-232.
Elias, P., Grey, D. S., and Robinson, D. Z. (1952). Fourier treatment of optical processes. J. Opt. Soc. Am. 42, 127-134.
Elion, H. A., and Morozov, V. N. (1984). “Optoelectronic Switching Systems in Telecommunications and Computers.” Marcel Dekker, New York.
Farhat, N. H. (1987). Architectures for optoelectronic analogs of self organizing neural networks. Opt. Lett. 12, 6-8.
Farhat, N. H. (1988). Optoelectronic analogs of self-programming neural nets: architecture and methodologies for implementing fast stochastic learning by simulated annealing. Appl. Opt. 26, 5093-5103.
Farhat, N. H., Psaltis, D., Prata, A., and Paek, E. (1985). Optical implementation of the Hopfield model. Appl. Opt. 24, 1469-1475.
Feldman, M. R., and Guest, C. C. (1987). Computer generated holographic optical elements for optical interconnection of very large scale integrated circuits. Appl. Opt. 26, 4377-4384.
Fisher, A. D., and Lee, J. N. (1987). The current status of two-dimensional spatial light modulator technology. In “Optical and Hybrid Computing” (H. H. Szu, ed.), SPIE vol. 634, pp. 352-371.
Fitch, J. P., Coyle, E. J., and Gallagher, N. C. (1984). Median filtering by threshold decomposition. IEEE Trans. Acoust. Speech Sig. Proc. ASSP-32, 1183-1188.
Gabor, D. (1948). A new microscope principle. Nature 161, 777-778.
Gabor, D. (1969). Associative holographic memories. IBM J. Res. Develop. 13, 156-159.
Gaylord, T. K. (1979). Digital data storage. In “Handbook of Optical Holography” (H. J. Caulfield, ed.), pp. 379-413. Academic Press, New York.
Gaylord, T. K., and Verriest, E. I. (1987). Matrix triangularization using arrays of integrated optical Givens rotation devices. Computer 20 (12), 59-66.
George, N., and Kasdan, H. L. (1975). Diffraction pattern sampling for recognition and metrology. Proc. Electro-optical Systems Design Conference, pp. 494-503.
Gibbs, H. M. (1985). “Optical Bistability: Controlling Light with Light.” Academic Press, New York.
Goodman, J. W. (1968). “Introduction to Fourier Optics,” Ch. 7. McGraw-Hill, New York.
Goodman, J. W., Dias, A., and Woody, L. M. (1978). Fully parallel, high-speed incoherent optical method for performing discrete Fourier transforms. Opt. Lett. 2, 1-3.
Goodman, J. W., Leonberger, F. J., Kung, S. Y., and Athale, R. A. (1984). Optical interconnections for VLSI systems. Proc. IEEE 72, 850-866.
Goodman, S. D., and Rhodes, W. T. (1988). Symbolic substitution applications to image processing. Appl. Opt. 27, 1708-1714.
Goutzoulis, A. P., and Abramovitz, I. J. (1988). Digital electronics meets its match. IEEE Spectrum 25 (8), 21-25.
Goutzoulis, A. P., Malarkey, I. C., Davies, D. K., Bradley, J. C., and Beaudet, P. R. (1988). Optical processing with residue LED/LD look-up tables. Appl. Opt. 27, 1674-1681.
Guest, C. C., and Gaylord, T. K. (1980). Truth-table look-up optical processing utilizing binary and residue arithmetic. Appl. Opt. 19, 1201-1207.
Guilfoyle, P. S. (1984). Systolic acousto-optic binary convolver. Opt. Eng. 23, 20-25.
Guilfoyle, P. S., and Wiley, W. J. (1988). Combinatorial logic based digital optical computing architectures. Appl. Opt. 27, 1661-1673.
Habli, M. A., Abushagur, M. A. G., and Caulfield, H. J. (1988). Solving system of linear equations using the bimodal optical computer (Experimental results). Proc. Soc. Photo-Opt. Instrum. Eng. 936, 315-320.
Heinz, R. A., Artman, J. O., and Lee, S. H. (1970). Matrix multiplication by optical methods. Appl. Opt. 9, 2161-2168.
Horner, J. L., and Gianino, P. D. (1984). Phase-only matched filtering. Appl. Opt. 23, 812-816.
Horrigan, F. A., and Stoner, W. W. (1979). Residue-based optical processor. Proc. Soc. Photo-Opt. Instrum. Eng. 185, 19-27.
Huang, A. (1975). The implementation of a residue arithmetic unit via optical and other physical phenomena. Proc. Int. Opt. Comput. Conf., pp. 14-18.
Huang, A. (1983). Parallel algorithms for optical digital computers. Tech. Digest, IEEE Tenth Int. Opt. Comput. Conf., pp. 13-17.
Huang, A., Tsunoda, Y., Goodman, J. W., and Ishihara, S. (1979). Optical computation using residue arithmetic. Appl. Opt. 18, 149-162.
Huang, T. S., and Kasnitz, H. L. (1967). The combined use of digital computer and coherent optics in image processing. Proc. Soc. Photo-Opt. Instrum. Eng. vol. 10, pp. 182-188, Seminar on Computerized Imaging Techniques.
Hurst, S. (1986). A survey: Developments of optoelectronics and its applicability to multiple-valued logic. Proc. 16th Int. Symp. Multiple-Valued Logic, pp. 179-188.
Ichioka, Y., and Tanida, J. (1984). Optical parallel logic gates using a shadow-casting system for optical digital computing. Proc. IEEE 72, 787-801.
Iizuka, K. (1985). “Engineering Optics.” Springer-Verlag, Berlin and New York.
Isailović, J. (1985). “Videodisc and Optical Memory Systems.” Prentice-Hall, Englewood Cliffs, New Jersey.
Jablonowski, D. P., Heinz, R. A., and Artman, J. O. (1972). Matrix multiplication by optical methods: Experimental verifications. Appl. Opt. 11, 174-178.
Jackson, K. P., and Shaw, H. J. (1987). Fiber-optic delay-line signal processors. In “Optical Signal Processing” (J. L. Horner, ed.), pp. 382-404. Academic Press, New York.
Javidi, B., and Horner, J. L. (1989). Multi-function nonlinear optical processor. Opt. Eng. 28 (to appear).
Jenkins, F. A., and White, H. E. (1976). “Fundamentals of Optics,” 4th Ed., pp. 385-386. McGraw-Hill, New York.
Karim, M. A., Awwal, A. A. S., and Cherri, A. K. (1987). Polarization-encoded optical shadow-casting logic units: design. Appl. Opt. 26, 2720-2725.
Kato, H., and Goodman, J. W. (1975). Nonlinear filtering in coherent optical systems through halftone screen processes. Appl. Opt. 14, 1813-1824.
Keyes, R. W., and Armstrong, J. A. (1969). Thermal limitations in optical logic. Appl. Opt. 8, 2549-2552.
Kohonen, T. (1988). “Self-organization and Associative Memory,” second edition. Springer-Verlag, Berlin and New York.
Krivenkov, B. E., Mikhlyaev, S. V., Tverdokhleb, P. E., and Chugui, Y. V. (1976). Non-coherent optical system for processing of images and signals. In “Optical Information Processing” (Y. E. Nesterikhin, G. W. Stroke, and W. E. Kock, eds.), pp. 203-217. Plenum Press, New York.
Lee, S. H. (1981). Coherent optical processing. In “Optical Information Processing” (S. H. Lee, ed.), pp. 43-68. Springer-Verlag, Berlin and New York.
Leger, J. R., and Lee, S. H. (1982). Signal processing using hybrid systems. In “Applications of Optical Fourier Transforms” (H. Stark, ed.), pp. 131-207. Academic Press, New York.
Lehmer, D. H. (1933). A photo-electric number sieve. American Mathematical Monthly 40, 401-406.
Lendaris, G. G., and Stanley, G. L. (1970). Diffraction-pattern sampling for automatic pattern recognition. Proc. IEEE 58, 198-216.
Li, Y., and Eichmann, G. (1987). Conditional symbolic modified signed-digit arithmetic using optical content-addressable memory logic element. Appl. Opt. 26, 2328-2333.
Li, Y., Eichmann, G., and Alfano, R. R. (1986). Optical computing using hybrid encoding shadow casting. Appl. Opt. 25, 2636-2638.
Mait, J. N., and Brenner, K.-H. (1988). Optical symbolic substitutions: system design using phase-only holograms. Appl. Opt. 27, 1692-1700.
Maragos, P. (1987). Tutorial on advances in morphological image processing and analysis. Opt. Eng. 26, 623-632.
Maréchal, A., and Croce, P. (1953). Un filtre de fréquences spatiales pour l’amélioration du contraste des images optiques. C. R. Acad. Sci. 237, 607-609.
Mirsalehi, M. M., and Gaylord, T. K. (1986a). Truth-table look-up parallel data processing using an optical content-addressable memory. Appl. Opt. 25, 2277-2283.
Mirsalehi, M. M., and Gaylord, T. K. (1986b). Logical minimization of multilevel coded functions. Appl. Opt. 25, 3078-3088.
Mirsalehi, M. M., Gaylord, T. K., and Verriest, E. I. (1986). Integrated optical Givens rotation device. Appl. Opt. 25, 1608-1614.
Mirsalehi, M. M., Shamir, J., and Caulfield, H. J. (1987). Residue arithmetic processing utilizing optical Fredkin gate arrays. Appl. Opt. 26, 3940-3946.
Monahan, M. A., Bocker, R. P., Bromley, K., Louie, A. C. H., Martin, R. D., and Shepard, R. G. (1975). The use of charge coupled devices in electro-optical processing. Proc. 1975 Intern. Conf. on the Applications of Charge-Coupled Devices.
Moslehi, B., Goodman, J. W., Tur, M., and Shaw, H. J. (1984). Fiber-optic lattice signal processing. Proc. IEEE 72, 909-930.
Ochoa, E., Allebach, J. P., and Sweeney, D. W. (1987). Optical median filtering using threshold decomposition. Appl. Opt. 26, 252-260.
Owechko, Y., Dunning, G. J., Marom, E., and Soffer, B. H. (1987). Holographic associative memory with nonlinearities in the correlation domain. Appl. Opt. 26, 1900-1910.
Paek, E. C., and Psaltis, D. (1987). Optical associative memory using Fourier transform holograms. Opt. Eng. 26, 428-433.
Peyghambarian, N., and Gibbs, H. M. (1985). Optical bistability for optical signal processing and computing. Opt. Eng. 24, 68-73.
Porter, A. B. (1906). On the diffraction theory of microscopic vision. Philosophical Magazine and Journal of Science 11, 154-166.
Preston, K. (1972). “Coherent Optical Computers,” pp. 232-250. McGraw-Hill, New York.
Psaltis, D., and Athale, R. A. (1986). High accuracy computation with linear analog optical systems: A critical study. Appl. Opt. 25, 3071-3077.
Psaltis, D., and Casasent, D. (1979). Optical residue arithmetic: A correlation approach. Appl. Opt. 18, 163-171.
Psaltis, D., Casasent, D., Neft, D., and Carlotto, M. (1980). Accurate numerical computation by optical convolution. Proc. Soc. Photo-Opt. Instrum. Eng. 232, 151-156.
Psaltis, D., Brady, D., and Wagner, K. (1988). Adaptive optical networks using photorefractive crystals. Appl. Opt. 27, 1752-1759.
Rhodes, J. E. (1953). Analysis and synthesis of optical images. Am. J. Phys. 21, 337-343.
Rhodes, W. T. (1981). Acousto-optic signal processing: Convolution and correlation. Proc. IEEE 69, 65-79.
Rhodes, W. T., and Guilfoyle, P. S. (1984). Acoustooptic algebraic processing architectures. Proc. IEEE 72, 820-830.
Schaefer, D. H., and Strong, J. P. (1977). Tse computers. Proc. IEEE 65, 129-138.
Schmidt, U. J. (1973). Present state of the digital laser beam deflection technique for alphanumeric and graphic displays. In “Progress in Electro-Optics” (F.-Z. Camatini, ed.), pp. 161-180. Plenum Press, New York.
Schneider, W., and Fink, W. (1975). Incoherent optical matrix multiplication. Opt. Acta 22, 879-889.
Serra, J. (1982). “Image Analysis and Mathematical Morphology.” Academic Press, New York.
Shamir, J. (1987). Fundamental speed limitations on parallel processing. Appl. Opt. 26, 1567.
Shamir, J., and Caulfield, H. J. (1987). High-efficiency rapidly programmable optical interconnections. Appl. Opt. 26, 1032-1037.
Shamir, J., Caulfield, H. J., Miceli, W. J., and Seymour, R. J. (1986). Optical computing and the Fredkin gates. Appl. Opt. 25, 1604-1607.
Sherwin, C. W., Ruina, J. P., and Rawcliffe, R. D. (1962). Some early developments in synthetic aperture radar systems. IRE Trans. Military Electronics MIL-6, 111-115.
Smith, S. D., Janossy, I., MacKenzie, H. A., Mathew, J. G. H., Reid, J. J. E., Taghizadeh, M. R., Tooley, F. A. P., and Walker, A. C. (1985). Nonlinear optical circuit elements as logic gates for optical computers: the first digital optical circuits. Opt. Eng. 24, 569-574.
Smith, S. D., Walker, A. C., Wherrett, B. S., Tooley, F. A. P., Craft, N., Mathew, J. G. H., Taghizadeh, M. R., Redmond, J., and Campbell, R. J. (1987). Restoring optical logic: demonstration of extensible all-optical digital systems. Opt. Eng. 26, 45-52.
Strand, T. C. (1975). Nonmonotonic nonlinear image processing using halftone techniques. Opt. Commun. 15, 60-65.
Svoboda, A., and Valach, M. (1955). Operatorove obvody (Operational circuits), Stroje na Zpracovani Informaci, Sbornik III, ČSAV, Praha.
Szabo, N. S., and Tanaka, R. I. (1967). “Residue Arithmetic and Its Application to Computer Technology.” McGraw-Hill, New York.
Szu, H. H., ed. (1987). “Optical and Hybrid Computing,” SPIE vol. 634.
Szu, H. H., and Caulfield, H. J. (1987). Optical expert systems. Appl. Opt. 26, 1943-1947.
Tai, A., Cindrich, I., Fienup, J. R., and Aleksoff, C. C. (1979). Optical residue arithmetic computer with programmable computation modules. Appl. Opt. 18, 2812-2823.
Takeda, M., and Goodman, J. W. (1986). Neural networks for computation: Number representations and programming complexity. Appl. Opt. 25, 3033-3046.
Tamura, P. N., and Wyant, J. C. (1977). Matrix multiplication using coherent optical techniques. Proc. Soc. Photo-Opt. Instrum. Eng. 83, 97-104.
Tanida, J., and Ichioka, Y. (1985a). Optical-logic-array processor using shadowgrams. II. Optical parallel digital image processing. J. Opt. Soc. Am. A 2, 1237-1244.
Tanida, J., and Ichioka, Y. (1985b). Optical-logic-array processor using shadowgrams. III. Parallel neighborhood operations and an architecture of an optical digital-computing system. J. Opt. Soc. Am. A 2, 1245-1253.
Taylor, H. F. (1977). Integrated optical logic circuits. Proc. Topical Meeting on Integrated and Guided Wave Optics, pp. TuC4-1 to TuC4-4.
Thompson, B. J. (1977). Hybrid processing systems-An assessment. Proc. IEEE 65, 62-76.
Tippett, J. T., Berkowitz, D. A., Clapp, L. C., Koester, C. J., and Vanderburgh, A., eds. (1965). “Optical and Electro-Optical Information Processing.” MIT Press, Cambridge.
Tsai, C. S. (1979). Guided-wave acoustooptic Bragg modulators for wide-band integrated optic communications and signal processing. IEEE Trans. on Circuits and Systems CAS-26, 1072-1098.
Vander Lugt, A. B. (1964). Signal detection by complex spatial filtering. IEEE Trans. Inf. Theory IT-10 (2), 139-145.
Verber, C. M. (1984). Integrated-optical approaches to numerical optical processing. Proc. IEEE 72, 942-953.
Whitehouse, H. J., and Speiser, J. M. (1976). Linear signal processing architectures. In “Aspects of Signal Processing” Part 2 (G. Tacconi, ed.), pp. 669-702. D. Reidel Publishing Company, Dordrecht-Holland/Boston-U.S.A.
Zernike, V. F. (1935). Das Phasenkontrastverfahren bei der mikroskopischen Beobachtung. Zeitschrift für Technische Physik 16, 454-457.
Management Intelligence Systems

MANFRED KOCHEN*
School of Medicine (Mental Health Research Institute) and School of Business Administration (Computer & Information Systems)
University of Michigan
Ann Arbor, Michigan

*Deceased.

1. Introduction
   1.1 Background
   1.2 Six Aspects/Kinds of Technology
   1.3 Need for a MINTS
   1.4 An Effectiveness Condition
   1.5 Organization of the Chapter
2. On the Nature of Intelligence
   2.1 Intelligence as in Organizational Intelligence
   2.2 Natural Intelligence
   2.3 Intelligence as Computation (AI)
3. What is a MINTS: Requirements and Uses
   3.1 Market Intelligence
   3.2 Technology Intelligence
   3.3 Financial Intelligence
   3.4 Organizational Intelligence
   3.5 Environmental Intelligence
   3.6 Requirements for Intelligence in General
4. Analysis, Design and Maintenance of MINTSs
   4.1 Architecture of a MINTS
   4.2 MINTS Development Lifecycles
   4.3 The Effectiveness Condition
   4.4 A Model for Relating Natural and Artificial Intelligence
5. Managerial Issues
   5.1 Management OF a MINTS
   5.2 Management WITH a MINTS
   5.3 Management IN a MINTS Environment
   5.4 Communication, Competition and Cooperation
   5.5 Emergent Properties and Systemic Intelligence
6. Conclusion
Acknowledgement
References

1. Introduction

A management intelligence system (MINTS) is intended by the organization it serves to scan the organization's environment so that management can better assess its position with a view to enhancing the value of the
organization and its services. Simple versions of such systems have existed for a long time. The Bible states that Joshua sent from his position east of the Jordan two men to scan the environment on the West Bank and Jericho. Returning from this covert mission, they reported that “the Lord has delivered all the land into our hands, and, moreover, all the inhabitants of the country do melt away because of us” (Joshua 2:24). Contemporary intelligence systems rely far more on overt, public sources than on covert espionage missions. They must screen, evaluate, correlate, interpret, analyze and synthesize vastly more information. These activities require judgment, hypothesis-formation, reasoning, and a great deal of knowledge, understanding, and intelligence. They are performed by persons with the help of computers. But it takes them a long time. It often takes more talented people than can be mobilized and paid to produce high-quality intelligence. The timing and complexity of what an organization must do to identify what is in its interest is rapidly changing. If the capabilities of human intelligence analysts and managers in one organization can be amplified to produce higher quality intelligence much more quickly, and in the face of vastly more information, uncertainty and ambiguity, then competitive organizations will also seek to amplify their capabilities whenever possible. Therefore, the management intelligence systems required by competing organizations in government and business will be the best they can obtain. The best are likely to make good use of advanced technology, such as artificial intelligence. The purpose of such man-machine MINTSs is to support professional strategists, planners, and researchers as would a good semiautomated research assistant, enabling the strategists to produce better plans, more quickly and at lower cost. 
The requirements of such MINTSs can provide needed direction to research in artificial intelligence and its relation to natural intelligence because, as will be argued, both are necessary for a MINTS to be effective. They also challenge researchers to investigate some basic scientific questions about the nature of intelligence. The purpose of this chapter is to emphasize the importance of Management Intelligence Systems and to encourage studies involving them. The remainder of this introduction provides some background for the claim that MINTSs constructed with some artificial intelligence and used by managers with natural intelligence and competence above a certain threshold will meet important needs. It is followed by an elaboration of why formal, computerized management intelligence systems are needed and why they are feasible. The introduction concludes with a condition for the effectiveness of a MINTS and presents the logical structure for the remainder of this chapter. The reader interested primarily in more technical or artificial intelligence aspects might wish to skip to Section 4 and Fig. 2.
MANAGEMENT INTELLIGENCE SYSTEMS
1.1 Background
The question of whether computers can be programmed to exhibit intelligence and thought has been seriously considered since at least the seminal work of Turing (1950). The early realization that computers are general-purpose symbol processors, steered by a coded program of stored instructions that can be processed just like data, with the computer being able not only to carry out the instructions but to change them and to carry out these changes right away as well, was one of those deep insights central to the intellectual development of computer science. Examples of symbol processing include not only numerical computations but derivations using algebra and the calculus, logical deductions, syntactic analysis, synthesis of music, art and chemical compounds, and self-modifying programs. Presumably any process that can be represented symbolically could be programmed for execution by a computer. It has long been an intellectual challenge to discover what could not, in principle, be so represented. First, non-computable numbers were found (Turing, 1937). These corresponded to undecidable propositions. Next, the bounds of algorithmic complexity were found (Solomonoff, 1960, 1978, 1988; Kolmogorov, 1965, 1968; Chaitin, 1974, 1975a, 1975b). Many of the ideas that seem so contemporary, such as indexing, high-level languages, evolutionary programming, genetic algorithms, etc., were conceived in the early days of computing (the 1950s). There may have been much more creative and fundamental activity in the development of algorithms and exciting concepts before the explosive growth of technology than after it.
The absence of powerful technologies may have contributed to the intellectual ferment because imagination could be exercised with fewer constraints, because the most intellectually creative persons were attracted to the field (e.g., mathematicians), and because they could devote all their time to this effort rather than to mastering use of the technologies and exploiting them. Nonetheless, the revolutionary impact of the transistor and of integrated circuits took even the most farsighted and imaginative thinkers by surprise. Reality proved to be far more astounding than the products of the most fertile minds. The seemingly limitless capabilities of the rapidly developing technology stimulated new and different ideas. While the earliest advances were idea-driven and hardware-limited, this was soon reversed. In the later 1950s and 1960s, advances were hardware-driven and demand-limited. Soon, however, advances became demand-driven and software-, data-, and algorithm-limited. One of the earliest demands for computer use came from the intelligence community and from decision-makers who felt that computers should help them make more informed decisions. The latter spawned interest in “Management Information Systems” or MISs. Because they tended to be prescriptive, building on methods and
results of operations research and management sciences, MISs were not widely and effectively used, and emphasis shifted toward systems that support managers, with the term “Decision Support Systems” (DSS) replacing MIS, to be followed by numerous other acronyms such as SIS (Strategic Information Systems; Wiseman, 1988). What we call a “Management Intelligence System,” or MINTS, is an executive support system, i.e., a decision support system for high-level managers responsible for strategic decision-making and planning, supplying them with intelligence rather than information, and based on advanced information technologies, notably artificial intelligence (AI). In recent years, the business community has become aware of the need for formal MINTSs, as the military intelligence community did before it (Burke, 1984; Rothschild, 1984; Levite, 1987; Miller, 1987; Porter, 1980, 1985; Sammon et al., 1987; Wagers, 1986; Ljungberg, 1983; Lancaster, 1978). This is in large part the result of increasing global competition, with newer firms entering the market that have rapidly commercialized technology and have recently grown quite rapidly. Moreover, they have used technologies in strategic ways. For example, product development and delivery schedules have been shortened. Which firm can deliver a differentiated product first is often a horse race; the winner may lead by a few days. This makes the availability of intelligence more important.
1.2 Six Aspects/Kinds of Technology
It is useful in explaining management intelligence systems to distinguish six kinds, or aspects, of technology. The most familiar is what I call material technology. This is the hardware aspect of computer information systems. It involves all aspects of design, manufacturing and maintenance of components, of circuits, cards and boards. The performance of this aspect of computer technology (and, to a slightly lesser extent, of communications technologies) has been doubling per dollar per year, and this trend is likely to continue for at least a decade. The second kind of technology, software, includes programs of all kinds. These, too, have a material aspect in the sense that some are embodied in diskettes, for example. Documentation is part of such programs, whether they are operating systems, languages, application generators, applications, etc., and it appears in hard copy or online as programs. The third kind of technology is in the form of data and associated forms that make it possible to transform data into information and knowledge. The distinction between these three concepts is important. We regard knowledge to be represented by forms such as sentences, propositions, and formulas, in which there may be blanks to be filled in. For example, the sentence, “The melting point of ice under standard conditions is 0°C,” represents knowledge.
If “0°C” is deleted and replaced by a blank, the “sentence” is a form to be filled in; the string of symbols, “0°C,” as given, illustrates data. If it is designated for insertion in the blank of the appropriate form, as illustrated here, it becomes information, and the completed form is knowledge. The structuring of data, information and knowledge so that it can be easily updated and accessed by a variety of programs and users without introducing inconsistencies or unnecessary redundancy, and so that deductive and inductive inferences can be easily made, comprises this third aspect of technology. These are data/knowledge-base management systems. The fourth aspect or kind of technology, model-base management systems, consists of models, algorithms, rules and techniques of inference, methods and concepts and ways of organizing and accessing them. It includes the architecture of hardware and software. This aspect of computers is what occupied the attention of pioneers prior to the hardware revolution. It is intellectual technology, though pioneers such as Turing and von Neumann not only developed algorithms but built machines for making them operational. Today, spreadsheet technologies permit anyone to construct simple operational models. The organization of all these models and algorithms, implemented as operational programs or not, so that the collection can be updated to maintain its integrity, accessed and fully used, comprises this fourth type of technology. The fifth kind of technology consists of operating procedures. Both it and the fourth kind are to some extent documented in publications. But procedures and operational principles exist largely in the memories and habits of practitioners. It is the stuff of expertise. It is what a knowledge engineer tries to capture when he interviews and observes experts while he builds an expert system.
It includes such expertise as what hardware and what software to recommend, what data to get and where to obtain it, and what existing concepts, principles, models, methods, heuristics, and algorithms to use. It includes not only know-what (i.e., substantive knowledge) but know-how, know-who (whom to turn or refer to when local expertise or resources are inadequate), know-when, know-where, know-why (justification), and know-how-much. We call this tacit knowledge know-X. The sixth, and probably the most important, kind of technology consists of leadership, organizational and managerial structures. It has been called “heartware” by the Japanese, “orgware” by some Russians (i.e., Dobrov) and “peopleware” by some Americans. We shall call it the sociotechnological aspect. The problems it is to help solve are those of strategic planning, of formulating and prioritizing missions, objectives, and goals. It is to support execution of plans by helping in the selection of suitable persons, structuring their interrelations, and, above all, stimulating the awareness of values. It includes incentives and mechanisms for enforcement. In a sense, it differs in
kind from the other five aspects of technology in that they support the sixth. Actually, each aspect supports the other five in varying degrees. Management takes place in the environment pervaded by these six aspects of technology. A few managers must manage with the help of these six kinds of technologies as tools. They must use them wisely. Even fewer managers really control these six technologies. This sixth kind of technology is used on itself. The very procedures and organizational decisions and designs are called into question and affected by the introduction of technology. That is the meaning of “high” in “high technology,” when technology advances to the level at which it modifies itself, calling for new management perspectives. Each of these six kinds or aspects of technology is a codified, communicable way of solving problems. They were present in a computer information system even during World War II before the first electronic computers. To help solve the decision problem in mathematics and logic, the concept of computability and the abstract model of a universal Turing Machine (Aspect 4) was invented and used. Turing and his team at Bletchley during World War II needed hardware and software as well as data and ideas (Aspects 1-4) to solve problems of deciphering ENIGMA codes. Of course the procedural and organizational aspects (5 and 6) were necessary for coping with such problems as secrecy, security and keeping up with German changes. This forerunner to the system developed at Manchester was really a first modern MINTS. All six kinds of technology should be integrated and brought to bear in a coherent way on the strategic planning of an organization. Information, knowledge, understanding, intelligence, and wisdom are now necessary for organizations (and individuals) to cope. We will focus here primarily on intelligence.
One aim of this chapter is to show that these six kinds of technologies are emerging in ways that, if integrated and properly directed, could meet the growing needs for MINTS.
1.3 Need for a MINTS
A military unit that is responsible for carrying out the will of its commander in the presence of an adversary clearly requires information about the intentions, capabilities and commitments of that adversary. It also requires information about environmental conditions, such as terrain, weather, etc., to which intentions and commitments cannot be attributed. Thus, the unit needs intelligence about the barriers to the tasks facing both it and its adversaries, about the technologies to overcome these barriers, about the resources, etc. It needs such intelligence in time for effective plans and actions to be taken; obviously, the intelligence must also be reliably accurate, appropriately precise, comprehensible and credible to those receiving it.
A business organization also needs intelligence. A firm in a rapidly changing market, in an industry undergoing fast technological changes, in an increasingly global and competitive climate with dramatic changes in labor, capital, land and knowledge must plan in the face of increasing uncertainty as well as in the face of more exacting timing and expertise requirements (Harvard Business Review, 1987). It needs intelligence about: its markets, such as probable changes in customer preferences, spending patterns, etc.; the technologies for improving processes of production, distribution, servicing, as well as for new products and services; the availability of capital, property, labor; and the intentions, capabilities and commitments of adversaries. Other firms in more atypical situations may not need intelligence as much. Without such intelligence, a vulnerable firm is less likely to make a sound plan that will further its interests, or that will keep it from being moved into intolerable positions, such as bankruptcy, hostile takeover, etc. Some large firms have begun to recognize this need for corporate intelligence functions to support strategic management (Gilad and Gilad, 1986; Sammon et al., 1987; Fahey et al., 1981; Burke, 1984). But such functions are interpreted narrowly and assigned to a corporate (human) library staff that is among the first to be reduced if cost savings must go into effect. Yet the first serious proposal for a Business Intelligence System (Luhn, 1958) involved computerized content analysis of documents and led to the invention of “Selective Dissemination of Information Systems.” Long-range strategic planning in the 1950s was practiced to a greater extent by what were then developing or recuperating countries (e.g., Japan, Korea, and West Germany) than in the United States. The relative positions of U.S. and East Asian firms are to a large extent due to this discrepancy in the emphasis on intelligent, intelligence-based long-term planning decades ago.
For example, the Japanese in the late 1940s realized the trend toward an information society; that they, having no natural resources but only brains, had to adapt to this trend; that they would face a labor shortage within the three or four decades it would take them to lead in these forthcoming knowledge industries, so that they must automate and develop robots; and that to capture world markets by reversing the quality image of their products, they must develop a highly educated and motivated workforce. A few organizations in the United States are beginning to realize that competitiveness and survival can only be attained with a planning horizon of decades rather than years. That will require long-range strategic planning on the part of U.S. firms as well, with appropriate intelligence capabilities to support it. Fortunately, a scholarly literature in this area has begun to emerge (Porter, 1980, 1985; Rothschild, 1984; Ansoff, 1979).
1.4 An Effectiveness Condition
Assuming that MINTSs will be needed because the preconditions for such needs and for their technological realization are emerging, it is important to know the conditions under which a MINTS is effective. I argue in Section 4.3 that
the users of a MINTS must have sufficient competence if it is to be effective. If these levels are too low, the system will amplify their errors and decrease effectiveness below what it was in the organization before a MINTS was used. If these levels are high enough, the MINTS will amplify its users’ performance and greatly increase the organization’s effectiveness and efficiency. Effectiveness refers to the production of timely and high-quality intelligence that is likely to be used as well as to prove useful to a strategic planner. It is the value of the product. Efficiency refers to the value of the resources and effort that are expended in the production of output of a given value. Effectiveness should be as high as needed, and inefficiency as low as available resources can tolerate; if it is lower, unused resources can be reallocated to other productive activities.
1.5 Organization of the Chapter
The purpose of this chapter is to suggest the emergence of an important new field of inquiry in computer science, particularly in managing the impact of computer technologies on organizations. MINTSs are already in demand and in operation, though without the required capacities. Methods of artificial intelligence, which are pervading all six aspects of technology, are likely to help meet these requirements. In addition to demand for MINTSs by commercial and government organizations and availability of technology to meet the demand, the third factor necessary for making MINTSs viable and important in practice (and as objects of scholarly inquiry) is the availability of managers responsible for strategic planning who use MINTSs with sufficient competence. To show the above, the following will be presented:
(a) Clarification of the nature of intelligence in its three meanings for subsequent use in specifying what a MINTS is and does, and how well.
(b) Description of a MINTS in terms of its requirements and uses, including examples.
(c) Issues in the analysis, design and maintenance of MINTSs.
(d) Issues in management and organization related to MINTSs.
(e) Other issues, such as basic computer science aspects (why this is a major new development in computer science); ethics, competitiveness/cooperation, new business perspectives, employment.
2. On the Nature of Intelligence
Modern usage ascribes at least three meanings to “intelligence” of interest in this study. These are: intelligence as in business, military or political
intelligence (Wilensky, 1967; Montgomery and Weinberg, 1979); intelligence
as in natural intelligence exhibited by people and animals (Woon, 1980; Fancher, 1985; Fischler, 1987; Cattell, 1987); intelligence as in artificial intelligence (Harmon and King, 1985; Winston and Prendergast, 1984; Silverman, 1987). Each meaning admits of four ways of analyzing intelligence: (1) as a kind of behavioral process or performance, (2) as a set of capabilities, competencies or functionalities, (3) as a product, and (4) as a property. We argue that these reduce to just (1) and (2).
Intelligence as a process is often observed by sampling intelligence products at various times. Intelligence analysts will furnish their clients from time to time with intelligence reports, briefings, gaming simulations and other products of the intelligence process. Intelligence as a product is the intelligence report. This resembles the scholarly paper in that it is creative, adds to knowledge and contributes something new. In both, the author must carefully document and check his sources. He must check the validity of his claims, perhaps using several methods of justification. His presentation must be lucid and well organized. He generally brings years of intensive study in one or more specialties and in general fields to his work. He must use judgment in deciding when to stop research and to write. He has a chance to produce a work that is good enough or one that is a source of pride, a work of artistic and/or scientific distinction. But the intelligence paper differs basically from a scholarly paper in that the former must be useful to the executive who must base his decisions on the content of the report. Usefulness means timeliness: not later than needed for the strategic time at which a decision must be made, nor sooner. Usefulness also means that it can be quickly understood and used by a busy executive. Timeliness also refers to the age of information.
The value of tactical intelligence (the kind used in military combat or in a business negotiation) depreciates at about 10% per day, at least; strategic intelligence in an active dispute (e.g., wartime) depreciates at 10% per month, at least; strategic intelligence in peacetime depreciates at 20% per year; intelligence about semipermanent features, such as roads, etc., depreciates at 10% per year (Platt, 1957). Intelligence as a functionality is a potential that may be only partially realized. The greater the repertoire of viable strategies to choose from, combined with strategies for organization of this repertoire and selecting from it, the higher the level of intelligence. The various functional capabilities can be regarded as attributes or properties. It suffices to consider only two rather than four ways of analyzing intelligence: as a performance and as competence, similar to Chomsky's distinction. These two ways are complementary. Competence is necessary for performance. But competence cannot be observed except by observing samples of competence-limited and competence-driven performance. But observing performance samples cannot establish that general-purpose abilities and principles are at work.
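The depreciation rates attributed to Platt (1957) can be made concrete with a short calculation. The sketch below assumes that each rate compounds per period (geometric decay), which the text does not state; the names `RATES` and `remaining_value` are illustrative only.

```python
# Illustrative sketch: the compounding model is an assumption, not from the source.
# Each entry maps a kind of intelligence to (fractional loss per period, period unit),
# using the rates quoted in the text.
RATES = {
    "tactical": (0.10, "day"),
    "strategic_wartime": (0.10, "month"),
    "strategic_peacetime": (0.20, "year"),
    "semipermanent": (0.10, "year"),
}


def remaining_value(kind: str, periods: float, initial: float = 1.0) -> float:
    """Value left after `periods` units of time, assuming compound depreciation."""
    rate, _unit = RATES[kind]
    return initial * (1.0 - rate) ** periods


if __name__ == "__main__":
    for kind, (rate, unit) in RATES.items():
        left = remaining_value(kind, 5)
        print(f"{kind}: {left:.3f} of initial value after 5 {unit}s")
```

Under this assumed model, tactical intelligence retains somewhat less than half of its value after one week, which illustrates why timeliness dominates the usefulness of an intelligence report.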
As a behavioral process, all three meanings of intelligence have a common core: rapid and appropriate zooming (expansion or contraction) over levels of specialized knowledge and understanding. This is the primary thesis of this chapter, and the focus of this section. It will be justified by analyzing the three meanings. This explication fills an important need in the conceptual repertoire of computer science. It is also needed to show that business intelligence can be effectively and efficiently provided.
2.1 Intelligence as in Organizational Intelligence
An organization is seen to occupy a position in its environment, and that position is assigned a utility. People differ both in their perception of such positions and in how they value the positions. Scholars of corporate strategy attempt to objectify and render scientific their perceptions of a firm’s position, generally in the various markets in which the firm competes. That often takes the form of simplistic models in which a market is depicted as a linear space with two dimensions, such as market share and profitability (Fig. 1). To this might be added a family of indifference curves, the curve in Fig. 1 being one such indifference curve. Here firm A likes being at any point on that curve as much as at any other point on that curve. That is, firm A is indifferent between low market share and high profitability on the one hand and low profitability but high market share on the other hand. This is for a market dominated by two firms that share the total market between them. Of course the market could expand or change and new firms can enter; relative positions of existing players can change. But a firm also occupies a position in the use and development of technology, in various communities, or constituent groups such as its current
FIG. 1. Illustration of market positions and an indifference curve for one firm (vertical axis: profitability; horizontal axis: market share, with ticks at .3 and .7).
and prospective employees, customers, suppliers, owners, regulators, and affected non-participants. It also occupies a position in financial markets, such as its standing on various stock exchanges, and this may or may not be related to its position in its markets or its value. It also occupies a position in the technological environment that may enable it to control the supply, demand, or shaping of technological advances, apart from its use of technology as a strategic weapon.
2.1.1 Organizational Intelligence as a Process
Organizational intelligence is not solely the concern of the intelligence community. There is intelligence implicit in the internal mental maps of decision-makers and in their use of these maps to relate their assessment of the organization’s current positions to the positions they intend the organization to move into. Decision-makers base their decisions on imperfect internal maps and on local information. An intelligent decision-maker learns from the consequences of these actions by improving his internal map, which in turn enables him to obtain and assimilate better as well as more information and knowledge. In doing so, he uses a MINTS. The more intelligent organization will appear to the observer to be clear about the positions it values, to learn quickly and to move deliberately toward these. The less intelligent organization will appear to respond to local conditions rather than plan, to be unclear about the positions it values, to move more randomly. This is, of course, a rational view, but it is useful in defining intelligence. According to a less rational but more organic view, an organization can not only muddle through but can attain great success without being clear about positions it values or moving deliberately toward them. It has an “intuitive” understanding of these, recognizing them when it encounters them. It may be lucky and appear wise afterwards. But we would call this at best understanding rather than intelligence. High levels of intelligence imply understanding. Useful rationality transcends the purely organic. Yet a higher form of intelligence stresses pursuit of interests rather than positions (Fisher and Ury, 1981). This involves the search for win-win situations, in which a firm can cooperate with its competitors, so that all gain something, even if not as much as one firm might gain if it did not cooperate.
“Who should we pick a fight with in the industry and with what sequence of moves?” (Porter, 1980) is no longer the most intelligent question to pose.
2.1.2 Organizational Intelligence as a Capability or Competence
What properties of an organization can an observer note that enable him to infer that it will behave intelligently? The more intelligent organization will
have in key positions managers with richer internal models that reflect goals, interests, assumptions, capabilities, and response options, and who constantly improve and use these models. They are generalists who can specialize (or call upon specialists) appropriately as needed. They have the “profound knowledge” (E. Deming’s term) to improve quality, to innovate, and to manage technology. The analysis of intelligence as a competence resembles a selectionistic, Darwinian approach rather than an instructional one (Edelman and Mountcastle, 1978). New patterns are not generated but discovered among a pre-existing repertoire of all useful possibilities and selected from them. Intelligent organizations, then, are those that survive competitive contests in an environment that requires accurate knowledge about current, and valued future, interests and positions and about how to get there. Their managers are selected from a pre-existing population of competent ones, and these continually improve their models. According to this view, MINTSs serving surviving organizations are the products of a process of natural selection.
2.2 Natural Intelligence
In Section 1.2 a distinction between data, information and knowledge was introduced. A fair test of a person’s knowledge is a set of questions that he should answer by producing or filling in the forms that we assumed to represent knowledge. This can be explicit or implicit when the respondent replies with “true” or “false” to given statements plus instructions to do so. Thus, multiple-choice, fill-in and essay examinations all test for knowledge.
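The view of knowledge as a completed form, and of a knowledge test as a request to fill in a blank, can be illustrated with a small sketch. It is only one possible reading of the distinction introduced in Section 1.2; the class `Form`, the marker `BLANK`, and the function `answers_correctly` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

BLANK = "___"  # hypothetical marker for the gap in a knowledge form


@dataclass(frozen=True)
class Form:
    """A sentence with exactly one blank; by itself it is neither data nor knowledge."""
    template: str

    def fill(self, datum: str) -> str:
        """Insert a datum into the blank; the completed form stands for knowledge."""
        if self.template.count(BLANK) != 1:
            raise ValueError("this sketch expects exactly one blank per form")
        return self.template.replace(BLANK, datum)


def answers_correctly(form: Form, response: str, key: str) -> bool:
    """A fill-in test item: does the respondent's datum complete the form as the key does?"""
    return form.fill(response.strip()) == form.fill(key)


if __name__ == "__main__":
    ice = Form("The melting point of ice under standard conditions is ___.")
    datum = "0°C"              # a bare string of symbols: data
    print(ice.fill(datum))     # datum designated for the blank: information; the result: knowledge
    print(answers_correctly(ice, "100°C", key=datum))
```

A true/false item fits the same scheme: the respondent is shown a completed form and asked whether it matches the keyed completion.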
2.2.1 Distinctions between Information, Knowledge and Understanding
Information adds value to a datum in that it specifies in which blanks and forms the datum is to be inserted. Thus, tabulated data is information if the tables are properly annotated. Knowledge adds value to information in that it provides a complete and integrated form. Thus, the datum “0°C” becomes information when inserted into the blank in the form “The melting point of ice under standard conditions is ___,” and the completed form becomes knowledge. This definition of information is consistent with the one offered by Yovits (Yovits and Ernst, 1969; Yovits et al., 1977; Yovits and Foulk, 1985; Yovits et al., 1987). A person (or animal or machine) with mastery of the content of a gazetteer, part of a telephone book or all of human skeletal anatomy has a vast amount of knowledge. But he may have little understanding. That is, he may be unable
to readily comprehend or integrate a new item of knowledge into his knowledge repertoire other than adjoining it to a list or just concatenating it. To comprehend a new knowledge item is to discover or construct a blank in the structured assembly of knowledge attained so far, into which the new item fits. Awareness of such blanks or gaps, for which new knowledge items are sought, manifests itself in question-asking. Thus, a person’s understanding of a domain could be assessed by the questions he asks. A unit of understanding could be modeled as a structured set of knowledge units, somewhat analogously to a knowledge form, by $u(x)$, where $x$ denotes the location of a possible gap in the structure. A person (or machine) can broadly comprehend a domain of knowledge and he can have deep understanding of several specialized subdomains, without being able to switch from the perspective of a generalist to that of an appropriate specialist. We propose to conceptualize natural intelligence as that ability. It does not mean that an intelligent person must command one or more specialties. But he must recognize when such specialized knowledge is needed and how to obtain it and bring it together with other items from other specialties. If $u$ denotes a domain that is understood broadly from a generalist perspective (e.g., the principle of relativity, that the laws of physics should not change with a transformation of coordinates), and $u_1$ denotes a specialized subdomain (e.g., the tensor calculus), then intelligence depends on recognizing relations $R(u, u_1; s)$ and $R'(u_1, u; s)$ between $u$, $u_1$ and a state of the world $s$ in which it is appropriate to switch from $u$ to $u_1$ and from $u_1$ to $u$, respectively. Actually, intelligence would need to be specified by a set of such relations for each possible pair or n-tuple of domains, which are either immediately accessible or rapidly computed. Such a set could comprise a unit of intelligence.
Intelligence as intellectual zooming (Kochen, 1972) is related to, but goes far deeper than, what has been called “navigation of a conceptual hierarchy” in the use of menus, such as in Smalltalk, the Macintosh or LISP machines. The latter requires intelligence of the user. In general, intelligence requires knowledge as well as understanding.
2.2.2 Natural Intelligence as a Process/Performance
The behavior of an organism indicative of such switching would enable an observer to call the organism intelligent. Polya (1954) has shown how finding analogies, generalizing and specializing serve as powerful principles for making mathematical discoveries, and these have been further explicated in a pilot program (Kochen and Resnick, 1987) for making discoveries in plane geometry, such as the Pythagorean theorem. There are no simple discovery algorithms despite programs claimed to simulate processes of scientific discovery using heuristics (Grabiner, 1986). Formulating and solving
problems (not only mathematical ones) that were previously unencountered, inductive and deductive reasoning, comprehending new patterns and new linguistic expressions, and improving in the performance of all these tasks are indicators of intelligence. They all require the above-mentioned switching. But there is more to intelligence than problem-solving. For example, given the problem of adding 1 + 2 + 3 + ... + 100, intelligent behavior consists of trying to add different pairs, such as 1 + 100, 2 + 99, etc., in the hope of finding a pattern. Less intelligent behavior is to add 1 + 2 to get 3, then 3 + 3, then 6 + 4, etc. More intelligent behavior, motivated by the idea that there might be a less tedious way to find this sum, is to try adding the numbers in a different order, e.g., the first and last. After noting that both 1 + 100 and 2 + 99 sum to 101, comes an inductive leap that all 50 such pairs add to 101, so that the sum is 50 x 101. Moreover, this method of pairing (or structuring the procedure) might be transferred to an entire domain of analogous problems when any of these arise. Another problem that can discriminate between more or less intelligent behavior is that of determining the number of matches that must be played (assuming no ties) until a final winner emerges from a pool of 1024 tennis players, where only winners advance. A less intelligent way is to divide the pool in half, with 512 matches played in the first round; then divide the pool of 512 winners in half, with 256 matches played in the second round, etc. Then add up all the matches: 512 + 256 + 128 + ... + 1 = 1023. A more intelligent way is to notice that every player except the final winner must lose in exactly one match, where he is eliminated, since only winners play each other. Thus, the number of matches is equal to the number of players less 1, the final winner, or 1023.
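The contrast between the tedious and the structured procedures in these two problems can be checked with a short sketch. The function names are illustrative only, and two simplifying assumptions are made: the pairing shortcut is written for an even number of terms, and the round-by-round count assumes a pool size that is a power of two, as in the 1024-player example.

```python
def sum_sequential(n: int) -> int:
    """Less intelligent: add 1 + 2, then + 3, then + 4, and so on up to n."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_by_pairing(n: int) -> int:
    """More intelligent: pair first with last; n/2 pairs each sum to n + 1.

    Assumes n is even so that every term has a partner.
    """
    if n % 2 != 0:
        raise ValueError("this sketch handles only an even number of terms")
    return (n // 2) * (n + 1)


def matches_round_by_round(players: int) -> int:
    """Less intelligent: add up the matches played in each round.

    Assumes `players` is a power of two, so each round halves the pool exactly.
    """
    matches = 0
    while players > 1:
        matches += players // 2  # half the pool plays the other half
        players //= 2            # only the winners advance
    return matches


def matches_by_elimination(players: int) -> int:
    """More intelligent: every player but the final winner loses exactly once."""
    return players - 1
```

Both routes agree on the answers given in the text (5050 and 1023); the difference lies in how much work is needed to reach them and in how readily the structured method transfers to analogous problems.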
To determine whether an organism is intelligent according to this view, then, is to observe how it solves problems, how it reasons, how it learns. The observer will look for a cognitive strategy, for how the organism uses its cognitive processes to plan, to encode stimuli, to switch between the general and the specific, and to analogize (Sternberg, 1986, 1988; Sternberg and Wagner, 1986). The criteria are the use of level-switching strategies just described and illustrated with the tennis tournament problem, even if the problem is solved faster in a few special cases without such strategies.
2.3 Intelligence as Computation (AI)

2.3.1 AI as a Process
How can an observer of two artifacts determine which one behaves more intelligently than the other? This is a variant of the Turing imitation game, in which an observer is required to distinguish between a machine and a
person who tries his best to confuse the observer. Assume that both artifacts have the same hardware, the same knowledge bases, and are subjected to the same operating procedures (e.g., same instructions) under identical organizational conditions. Only some of the architectures, the computational algorithms, and their implementations as programs differ. To be specific, suppose both have programs for playing chess and checkers. But one has a program that knows about board games in general and can decide when it should play chess or checkers on the basis of its opponent’s behavior. The other does not. The first automaton is more intelligent, in our view, though this is a very simple task. It becomes more interesting if the more intelligent machine must learn the rules of a variety of games by playing them and inferring them from the opponent’s behavior as well as from consequences of its own rule violations, and discovering that there are such games as chess and checkers and forming its own programs for playing the games, possibly transferring ideas from one board game to the other. Again, the observer looks for appropriate switching between generalist and specialist perspectives. The organization theory literature distinguishes (Scott, 1985; El Sawy, 1985) three ways to identify effectiveness that are applicable to identifying intelligence. The first way is via structure: the capacity to perform well and the size of the knowledge base. Size may not, however, be important. The program CHAOS, which had attained the world championship in computer chess, was also the smallest; its rules were carefully chosen. It embodies intelligence transferred from its designers. But it is not intelligent. The second way is via process (e.g., ability to retrieve knowledge). The third way is via outcome (e.g., are actions intelligent).
2.3.2 AI as Functionality or Competence
Here the observer does not evaluate the machine on the basis of its inputs and outputs but by analyzing the documentation that describes its functionality. He asks whether a machine can, not whether it does, switch from a generalist perspective to appropriately specialized ones, and vice versa, in situations requiring that. Hardware components organized into massively parallel architectures, such as neural nets, and supporting software with non-deterministic algorithms may give rise to different functions and to newly emerging properties. Self-organization and evolutionary programming (Kochen, 1988) may give the machine new, unpredictable functionalities exhibiting greater adaptivity and intelligence. Our primary concern here is with AI in strategic management (Holloway, 1983) and particularly in the MINTSs that support strategic planning.
3. What is a MINTS: Requirements and Uses

In this section, a MINTS will be characterized in terms of its functional requirements from a strategy planner’s point of view. Competent use of a MINTS is intended to amplify his performance. In what follows, a business firm is used as an example of an organization because it has much in common with other organizations requiring intelligence, because of the experience the author gained in teaching a course on this topic in a business school, and because of the growing importance of MINTS for business firms. This chapter is based on a new course, with the same name as this paper’s title, offered by the School of Business Administration at the University of Michigan in the Winter of 1987. It was well received by 45 students, mostly enrolled for the MBA, but also several Ph.D. students in Industrial Engineering, Information Science and other fields. All students, working in teams, produced actual intelligence reports. In several cases, these were used by local firms and entrepreneurs. Each was supplemented by an analysis of how the use of AI technologies does or would improve the process of producing such reports. Several teams produced operational expert systems. One of these, for example, generated, in good English and online, an analysis and recommendations for a firm’s strategic positioning, given financial data in LOTUS 1-2-3 and based on ratios such as fixed assets to net worth, return on equity, etc. It was demonstrated as a commercial product at the Avignon 1987 Exposition under the name of LEADER. It resembled a prior system using fuzzy set theory, FAULT (Whalen et al., 1982). The first general requirement for a MINTS is to help the firm it serves to clarify its map or image of the firm’s environment, to clarify the concept of “position” and interest in that environment, and to discriminate between positions and interests it values highly and ones it values negatively.
The MINTS is to help managers with strategic planning and professional strategists by offering research assistance that increases their productivity and performance. As Porter indicated, competitive intelligence begins with the activities of collectors, such as the firm’s sales force, its engineering staff, its suppliers, advertising agencies, security analysts, etc., and scanning of public sources, such as articles, speeches by competitors’ management and several special publications such as The Data Informer by Information USA, the Corporate 1000 by the Washington Monitor, the Information Sourcebook for Marketers and Strategic Planners by Chilton Book Company, the Handbook of Strategic Planning by Wiley, the Information Weapon by W. Synnott, How to Check Out Your Competition by J. W. Kelly, How to Find Out About Companies by Washington Researchers, the Thomas Register of American Manufacturers, etc. Fuld (1985) introduced the “intelligence triangle,” in which the base consists of the technique and the foundation. The middle part comprises the
basic sources, and at the apex are creative sources. Basic sources include investment manuals, industry directories, government documents, newspapers in the competitors' locations, current industrial reports, financial reports, patent or court records, SEC filings, credit services, state corporate filings, and state industry directories. Creative sources include classified ads, environmental impact statements, trade shows, yellow pages and city directories, visual sightings, such as the number of cars in a competitor's parking lot or expansion of the parking lot, and interviews with people who meet the competitor.

Data from such sources must be compiled, as by clipping services or regular situation reports on competitors. It must be organized, by maintaining files on competitors, for example. It must be digested and summarized. It must be communicated to strategists, by means of competitor newsletters or briefings on competitors during planning sessions, for example. But there is more to intelligence than competitor intelligence, as set forth in what follows.

3.1 Market Intelligence
Drucker (1985) proposed the following very simple, yet profound and far-reaching assertion: that the purpose of a firm is to create customers. This means creating goods and services of value to certain consumers. It means marketing and innovation. It means enhancing the value of the firm, its suppliers, and the community of which it is part. To market is to target potential customers, discover their needs, arouse these needs and meet them better than they are being met and at prices they are willing to pay. The added value of a marketing mix lies in a bundle of satisfactions. A MINTS is required to support the following marketing functions.
3.1.1 Market Analysis
Should the firm push a new product or service that its champion feels will certainly meet a latent need? Or should it analyze expressed or established needs for which markets already exist? The MINTS is required to furnish intelligence on which to base this decision. How large is the market in which a firm is already operating? How rapidly is it growing? What determines its size and growth? What are the market shares of competitors and those offering substitute products and services? How have they been growing, and what factors govern their growth? Answering these questions and those to follow is the primary responsibility of market researchers, not of the MINTS. The MINTS is required to support these researchers, to act as an intelligent research assistant. If market researchers find it difficult to obtain reliable data, the MINTS should suggest
additional sources, both for verification and for enlarging the supply of data as described in Section 4. It might refer the researchers to external services, such as Washington Researchers (1986, 1987a, 1987b) or to online databases such as that of the Conference Board, a computerized collection of over 800 economic time series. If no source can supply data that reliably answer some questions, the MINTS is required to find reliable data from which answers to the questions can be inferred. (This is similar to “backward chaining” in AI.) This applies particularly to information about the market strengths and weaknesses, intentions and commitments of competitors in markets of vital importance. Intelligence is particularly important about competitors in global markets (Jaffe, 1975; Montgomery and Weinberg, 1979). To repeat, the purpose of a MINTS is to support the professional planners as would an automated research assistant, permitting them to be more productive and at lower costs.

3.1.2 Consumer Analysis
Who uses the firm’s products and services, and how? Who makes the purchasing decisions? What are the customers’ revealed preferences? For the sales force, such questions are usually answered by networking. A good salesman has acquaintances whose acquaintances may be good prospects, and he actively seeks out these acquaintance chains and uses them (Kochen, 1989). The MINTS is required to support this networking process by inferring likely prospects from a knowledge base about a large population, by storing a starter’s set of acquaintances, and the sets of acquaintances of all these once-removed acquaintances, etc., and by providing for easy access to all these named persons. Advertisers use the answers to such questions for determining what audiences to try to reach, by what media, with what messages and how to present them. The MINTS is required to help them by synthesizing answers to the questions and by directing them to models likely to help them. Most important, it is required to report changing patterns in demand, in fashion and in taste, with long lead times.

3.1.3 Trade Analysis
What wholesalers and retailers are in place between the firm and the customers? Which ones have left, which are likely to leave, and which are likely to enter the value network? More broadly, who are all the participants in the network who add (or subtract) value in transforming factors of production and materials or goods at various stages of finishing into finished products and services? This includes warehouses, suppliers, transporters, and, of course, government at several levels. How is the structure of the network changing? It
includes shifting patterns of flows and interactions as well as changes in the network nodes. Here the MINTS may be required to analyze a cluster of firms as an interacting system.

3.1.4 Economic Analysis

This includes requirements to estimate the fixed and variable costs and break-even points of various marketing programs as well as their expected utilities. Some of the variables characterizing a market mix are: the number of different product lines that satisfy the same need-type; the number of different product types in each line (differing in color, size, or shape, for example); and the degree of similarity between lines in end use, in production technology, and in distribution channels. The requirement for the MINTS is to bring together knowledge from economics in general with details from highly specialized subfields specific to the product and its technology. All this knowledge is required to be of high quality and transformed into relevant, usable form if it is not in that form when retrieved.

3.1.5 Market Repositioning
Two kinds of actions are generally considered: adding features to products and services for increased differentiation; and reducing costs and changing prices (Beatty and Ives, 1986). Here the MINTS is required to provide expected consequences of such changes and of the likely moves of competitors. The MINTS may also be required to generate selected rumors or leaks (e.g., to the press, to financial analysts) in advance of or in place of changes. In general, the MINTS is required to perform two activities at the same time. (a) To monitor selected indicators and alert management if their values fall outside a specified “normal” region. This resembles exception reporting. (b) To search its accumulated knowledge base for emerging and noteworthy patterns that have been seen before very rarely or not at all. Activity (a) may be conducted periodically, continually, at pre-specified times or on an ad hoc basis. Only reasonably current data are acquired, and from several sources, to increase reliability by cross-checking. Leading, concurrent, and lagging indicators are preferred, in that order. The MINTS is required to evaluate the costs and utilities of these indicators. Activity (b) may also be conducted at various times, or as a constant background activity when the facilities are available. What makes a pattern
noteworthy is its ability to trigger understanding: to uncover a blank in a knowledge structure, such as a contradiction, or a gap, and to generate new hypotheses.

3.2 Technology Intelligence
A firm’s position depends in part on its production function. This, in turn, depends on the technology used. Creating customers means marketing and innovation, and innovation means the processes of production, from product development to design to manufacturing to delivery to servicing, and continuing improvements in those processes. A firm can improve its position not only by improving its position in its existing markets and by entering new but existing markets, but also by creating new markets and by changing its production or distribution processes. Innovation can lead to new markets and to changes in process (Diebold, 1984; Tushman and Moore, 1982). A technology is often used by several firms. It may characterize an entire industry. Consider, as an example, the information industry. Two dimensions have been proposed (Harvard University, 1980) to characterize simple technological properties of various goods and services in that industry. One dimension varies from pure form and no substance, such as blank paper or a courier service, to predominantly substance and little form, such as books or professional services. The other dimension ranges from what is primarily a product, such as a file cabinet or a film, to what is primarily a service, such as the U.S. mail or a financial service. A computer is about equally a product and a service and also about equally form and substance. A computer manufacturer may regard itself in the middle of this map of the industry, and may seek to shift toward the service end in the face of competitors entering at the product end. Or it may try to innovate in the product-substance region, in which there is a scarcity of goods and services.

3.2.1 Technology Analysis
An important role of technology is in the production and distribution process. Here, three of the general technological properties of main contemporary interest are degree of flexibility, labor-amplifying potentials and quality. By flexibility we mean the ease with which a productive unit can be switched from the performance of one function to another (Kochen and Deutsch, 1973). In the context of a flexible manufacturing cell or a CNC machine this could be measured by the setup speed, the inverse of the time it takes to change the settings of a lathe, for example. By labor-amplifying potential, we mean the ratio of person-hours it takes to do a given task with the technology to the number of person-hours it takes to do the same task
without it. Quality has perceptual and objective aspects. Customers ultimately choose according to perceived quality. But their perceptions are influenced by ratings, such as those offered in Consumer Reports, reports from other customers, advertisements, etc. A firm may position itself according to where on these three dimensions it fits. An ideal position may be one in which its productive system turns out high-quality products and services, automatically with the least human labor and with the flexibility of a job-shop that can accommodate individual customers’ requirements very quickly and reliably. A technology found increasingly important for high-quality competitive production is sociotechnology. New or better ways to organize and motivate people have been found to be more critical than the use of Flexible Manufacturing Systems (FMS), for example. The requirements for the MINTS are to anticipate technological advances likely to affect these and other variables. This includes the assessment of existing technologies for applicability to the firm. It means sifting through masses of articles and reports for a few that could indicate promising technological advances. Whether these are likely to be commercialized depends on many other factors, and reliable documents pertaining to all these must be brought together for a valid assessment to be made. Predicting the success of attempts at commercialization is even harder. It is less predictable and controllable than breakthroughs in science. In any case, the MINTS is required to screen, evaluate, interpret and synthesize all these items into a coherent report. In the near future this will be done exclusively by human intelligence scholars, but these may count on increasing degrees of support from technology.

3.2.2 Technology Management
The use of advanced technology in production systems does not guarantee faster and more reliable delivery, better quality, lower inventories, higher throughput, greater flexibility and lower cost. The purpose of FMSs is to combine the ability of a job shop to custom-tailor products and services to clients and the ability of an assembly line to mass produce standard commodities. The former takes advantage of economies of scope, stressing the production of a variety of goods and services, with the unit cost of producing the last variant decreasing with variety. The latter takes advantage of economies of scale, stressing large batch runs, with the unit cost of producing the last item in a batch run decreasing with batch size. If manufacturers use FMSs primarily to exploit economies of scale while their competitors use them to exploit economies of scope (Jaikumar, 1986), they are mismanaging the technology and may suffer losses in their relative position and interests. The MINTS of a firm is required to assess the quality of technology
management, both by competitors and by its own firm. It should also forecast the likelihood of various programs in these firms, including management recruiting strategies, for improving that quality.

3.2.3 Intelligent Production
A most important requirement for a MINTS is that it provide direct support for the “smartness” of the workforce, both management and labor. The task is to advise management about intelligent artifacts, used by sufficiently intelligent operators, for scheduling production, for quality process/quality control, for inventory control, for design/layout, for diagnosis of failures, etc. A variety of expert systems exist and many more are being developed to support these functions. The MINTS should incorporate these into its model base and information about them into its knowledge/database.
3.2.4 Intelligence in Products
Increasingly, products such as cars and appliances have built-in computer/communication systems that are transparent to their users. Also increasingly, these systems are more intelligent. Cars are likely to be guided by intelligent chips distributed on highways that communicate with onboard computers to let the system know where the car is and the options available to the driver. Intelligence in an appliance captures the user’s intention by eliciting from him instructions in simple, general terms, and it carries them out without requiring him to provide detailed commands or dial settings. The requirement for a MINTS is to make available to the firm state-of-the-art technology options for the introduction of intelligence into products as well as into services, to be discussed next.

3.2.5 Intelligence in Services
Since industry discovered AI in the 1970s, its impact has been mainly in the services. Across the world the labor force is shifting into services. The first industrial revolution shifted production from material-intensive activities, in which human and animal labor was the main source of energy and intelligence, to energy-intensive activities, in which human labor was the primary source of intelligence. Machines augmented energy-expending labor. The current industrial revolution is shifting production from energy-intensive activities to intelligence-intensive activities. Information machines are augmenting intelligent labor, and displacing some. Experimental expert systems for diagnosis in medicine, for exploration in geology, in research and in engineering are showing considerable potential for increasing productivity and quality of services in these areas.
Technology Assessment and Anticipation have become major fields of study, as has what is called Technology Transfer. The first is performed by the Office of Technology Assessment of the U.S. Congress and by those it supports through research grants. It, like the National Academy of Sciences, Engineering and Medicine and other agencies, provides technology intelligence to the U.S. Congress. The Executive Branch of the U.S. Federal government has its own MINTS, as do state and some local governments. Computer networks have been used to exchange certain kinds of information. For example, the experience of one township in the use of materials other than salt to control icy road conditions may be shared with other townships with similar problems and conditions. Of course, every major firm does its own technology assessment as well. Technology forecasting, which might better be called anticipation, has a large literature. If it were the case that technology advances are usually preceded by scientific discoveries, then there should be a major science intelligence effort. Scientific breakthroughs are quite unpredictable, but no more so than technological breakthroughs. Yet the publication and emphasis of major discoveries, for example how the body's own proteins may be the source of the most effective drugs ever, or how metals could be replaced with wood or superplastics (Science 85, 1985), could stimulate innovation. Imaginative and well-presented projections (e.g., Ishikawa, 1986), even good science fiction, can help shape the future and serve to enhance self-fulfillment of the vision.

3.3 Financial Intelligence
Creditors need intelligence about potential debtors for use in accurately assessing risk. Individuals in search of mismanaged, undervalued firms that they might take over need intelligence about potential prey. And weak firms on the lookout for potential predators that could threaten hostile take-overs need intelligence. Forecasting and analyzing mergers and acquisitions is a prime function of industrial analysts who provide their reports and services for a fee. Such analysts are of course MINTS. (For examples, see Section 3.6.) It is the requirement of a firm's MINTS to stay abreast of such services and their output, and to synthesize them into a report tailored for its client firm. Intelligence embodied in services and products of (and to) the financial sector, such as "smart cards," belongs in Sections 3.2.5 and 3.2.4.

3.4 Organizational Intelligence
This includes intelligence about people, their qualifications, intentions, capabilities, commitments, associations, moves, and relations to one another and to institutions. In certain industries, a firm's critical success factor is to
attract and keep key technical personnel. Intelligence about the likelihood of such persons leaving their firms, even about who they are, is of great value. Information about who reports to whom, who is on a fast track and who is not, is of value not only to salesmen in choosing whom to contact but for assessing the strengths of a competitor. A great deal about a company’s intentions, capabilities and commitments can be inferred from its organizational structure. It is required of a MINTS not only to provide complete, accurate and timely characterizations and assessments of its own and its competitors’ organization, but to determine what can be inferred from such knowledge (Dutta and King, 1983; Levite, 1987; Ljungberg, 1983).

3.5 Environmental Intelligence
In its broad sense, a firm’s environment includes the market, technology, the financial world, and organizational aspects. It even includes the firm’s internal environment. It also includes environments in their narrower meanings: the natural environment, such as the terrain where plants are located, climate, amenities, etc.; the political environment, including regulations affecting the firm from various levels of government, the stability of governments, public policies and programs affecting the firm; the economic climate in which the firm operates; the social environment; and the ecological environment. The MINTS of a firm is required to provide intelligence about each of these aspects.
3.6 Requirements for Intelligence in General
It is important to stress that intelligence differs from information, knowledge and understanding in that it brings together into a coherent, interpreted whole carefully screened and evaluated elements of what is known and understood in several specialized domains, and in that it brings it to bear on decision-making, policy-making and planning. The intelligence may be of strategic or of tactical importance, but more intelligence is required for effective higher-level, long-range strategic planning. Intelligence is generally understood to be needed in situations of competition and conflict. There, it is vital for each firm to have an accurate assessment of its adversaries’ (a) intentions, especially as they affect the firm; (b) capabilities for carrying out these or other intentions; (c) commitments to a course of action; (d) progress (successes, failures, ability to adapt and learn from experience) in pursuing the chosen course of action. Such assessments are at least as important in non-conflict situations. A firm should at all times be looking out first for opportunities, including the opportunities for cooperating with other firms, for discovering win-win situations. The MINTS
is required to help in this task. Secondarily, a firm should be vigilant in detecting threats or traps, and in this, too, it depends on its MINTS. Capability refers to the quantity and quality of resources of all kinds, their state of readiness and availability, morale and the ability of management to mobilize resources rapidly and sometimes secretly so that they can be brought to bear at a time and place chosen by the firm with the desired effect (e.g., surprise, victory in the case of a contest, etc.). To assess relative capability, of both adversaries and its own firm, a MINTS must combine estimates about the balance of strengths and weaknesses on all relevant factors into a composite judgment of the probabilities and risks of various actions and their consequences. An example of contemporary intelligence of a general kind is the analysis of trends and issues in the computer industry that securities analysts and similar organizations (e.g., Gartner, Bernstein Research, and the Conference Board) frequently produce. It is argued by Marc G. Shulman of Salomon Brothers, for example, that IBM’s introduction of its PC in August 1981 (which they thought would result in 250,000 units sold by the end of 1986, but resulted in sales of three million) eventually strengthened the competitive position of DEC at the expense of IBM because it led to the proliferation and legitimization of end-user computing rather than computing by data-processing professionals. The lack of integrative products and services to meet the demand for peer-to-peer networks generated by end-user computing hurt IBM more than it did DEC. DEC had positioned itself in distributed processing, which became end-user computing. DEC can now price its products on the basis of their value rather than on the basis of their cost. Its position depends on the company’s ability to gain widespread acceptance of VAX networks plus the ability to resist the spread of UNIX.
The analysis then goes on to examine IBM’s two-pronged strategy (OS/2 in relation to Systems Application Architecture (SAA) and Personal System/2 in a key role in SAA) and what DEC needs to do by mid 1988 to counter it: volume shipments of VAX 8800-based processors; VAX-compatible LOTUS 1-2-3 packages; application software not available on IBM PCs; commoditization at physical and logical levels, with DECnet supported by workstation vendors (e.g., SUN Microsystems) and supercomputer vendors. DEC must offer more differentiated products to replace what it will lose when its competitors attach their products to DEC networks, which, analysts claim, means that DEC must be the leader in the software of the 1990s: artificial intelligence and expert systems. Other analysts (e.g., Bernstein Research) envision a three-way industry-wide contest. This is an alternative to Shulman’s analysis. One party is a revitalized, market-driven IBM using SAA. Another is DEC, joined by Apple and Cray, using DECnet-based services. The third group includes AT&T, SUN Micro, Xerox, NAS, Stratus, Amdahl with UNIX as a standard and RISC (reduced instruction set based on most frequently used instructions) microprocessors.
This group is attacking DEC in its bid to be the technology leader, while DEC’s key target is IBM. There are, of course, other perspectives that take account of significant players, such as the Japanese, the Europeans, Hewlett-Packard, Wang, CPQ, Tandem, Unisys and NCR. Some of these may shift toward IBM as the safe alternative. Some may shift toward the third group epitomized by SUN Micro and the open systems model. There is already an organization in Europe called X/Open, comprising 15 companies that hope to work together to position themselves to provide functionality that is distinctly different from IBM’s. But they seem to have more diverse intentions in their market strategy than in their political rhetoric. Some may form fourth or fifth new foci to compete with the other three. The Europeans, if they can be grouped, seem to be in the SUN Micro camp already. These are but illustrations of what analysts tell investors. It is not clear that investors, vendors, customers or other decision makers (e.g., in government or independents) will be persuaded by these reports, and if they are, whether their decisions based on such intelligence will be sound. In other words, either these intelligence analyses are not sufficiently sound from the point of view of scientific objectivity and scholarship, or they offer the best that can be done and this is not enough to characterize the risk, ambiguity, and uncertainty in an acceptable way. Because the issues are very complex, enveloped by thick clouds of confusion, and may defy clarification in terms of a few simple variables such as the demand for networking, customer advantage, delivered costs, enhancement of variety, vendor dependence, cost of integration, demand for compatibility, etc., it may not be possible to reduce or at least clarify risks to levels that all users consider acceptable. Are such analysts MINTS?
Shulman serves as an intelligence officer (he is part of the MINTS) for Salomon Brothers, and to the extent that the firm offers his products to others, he is a MINTS serving a larger clientele. A firm such as DEC has its own MINTS to support its top management, and it will use the output of other MINTS, such as the above, in its own analyses. A final point that needs emphasis is that a MINTS should not be required to (a) leave no stone unturned; (b) never make a mistake; or (c) never miss an “unanticipated event,” even a fatal “bolt out of the blue.” Even if it were theoretically possible to do (b) and (c), it would be far too costly, and the surveillance activity may cause more damage, by possible invasions of privacy, than it prevents. A MINTS should be designed for use as a tool to beat statistical odds. Intelligence failures that are statistically very rare or improbable, and hence very hard to anticipate, are more excusable than failures that could quite readily have been anticipated. The latter failures are by far the most common and the most damaging (e.g., the surprise attack on Pearl Harbor or the 1941 invasion of the USSR), and it is toward preventing them that top priority in the design of MINTS should go. (For more examples, see Section 4.3.)
4. Analysis, Design and Maintenance of MINTSs
Developing a MINTS employs all the procedures used in developing any computerized information system, such as requirements analysis, rapid prototyping, feasibility studies and implementation. Because of the opportunity to introduce AI into MINTS and into the MINTS development process, there are variants in the procedures as well as new procedures. After describing what a MINTS with AI features is like, this section will indicate these changes in the conventional methods of systems analysis and design as well as the new procedures.
4.1 Architecture of a MINTS
Figure 2 sketches how a MINTS might look to a system developer and to a user. Raw data streams into the system in response to environmental scanning, as shown at the top of the figure. Two basic types of analysis are shown in box 1. One is monitoring indicators. The other is searching for novel patterns. Only the first is applied to the incoming data stream. The second requires comparing and correlating incoming data with what has been accumulated, which is no longer raw data. It is neither useful nor feasible to keep all incoming data. Hence, it must be screened on the basis of estimated reliability, utility, precision, clarity and novelty. The screening function can be partially automated with the help of an expert system if criteria for data evaluation can be specified and the judgments of experts can be expressed in the form of programmable algorithms. Currently the functions in box 1 are generally performed by persons, whose capabilities for dealing with vast and diverse data and knowledge streams are limited. Monitoring well-defined indicators is easily automated, but the search for novel or rarely encountered patterns is a challenge. It resembles the search for patterns of tracks in a bubble chamber photograph corresponding to experimentally induced nuclear events that have not been observed before. The idea is to single out patterns that fail to correspond to any known patterns or that are unusual variations of known patterns. The physics problem is far easier because all the patterns to be scanned already exist on the photo. Here, the population consists of all possible patterns that can be formed by several data time series that might be correlated. Only sampling in an a priori restricted universe of possible patterns makes the search feasible, and then only at the risk of missing interesting patterns.
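The two automatable parts of box 1, screening incoming items and monitoring well-defined indicators, can be sketched as follows. This is only a minimal illustration under stated assumptions: the five criteria are the ones named in the text, but the equal-weight scoring rule, the threshold, the `Item` record and the `(low, high)` norm ranges are hypothetical choices made here, not part of the design described above.

```python
from dataclasses import dataclass

# The five screening criteria named in the text.
CRITERIA = ("reliability", "utility", "precision", "clarity", "novelty")


@dataclass
class Item:
    """One incoming datum with (hypothetical) expert-assigned scores in [0, 1]."""
    text: str
    scores: dict


def screen(items, weights, threshold):
    """Keep the items whose weighted mean score reaches the threshold."""
    total = sum(weights[c] for c in CRITERIA)
    kept = []
    for item in items:
        score = sum(weights[c] * item.scores.get(c, 0.0) for c in CRITERIA) / total
        if score >= threshold:
            kept.append(item)
    return kept


def out_of_norm(readings, norms):
    """Return the names of indicators whose reading lies outside its (low, high) norm."""
    alerts = []
    for name, value in readings.items():
        low, high = norms[name]
        if not low <= value <= high:
            alerts.append(name)
    return alerts
```

A caller would pass a weight per criterion (for instance `{c: 1.0 for c in CRITERIA}`) and a table of norms per indicator. The search for novel patterns is deliberately left out, since the text argues that it is the part that resists this kind of simple automation.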
[FIG. 2. Architecture of a MINTS (flow of control). Recoverable box labels: environment, solicited data, unsolicited data; gross screening (transform information into knowledge); 1. compare with norms of indicators, search for noteworthy patterns; screen for archiving; archival; 2. generate hypotheses (use insight, abduction, exploratory data analysis); 5. find answers (use confirmatory analysis; test and eliminate hypotheses); seek and evaluate new sources (use an expert system); representation (essence of intelligence); generate and synthesize the intelligence report (transform into intelligence); report to client; reset scanning parameters.]
The trained human mind and eye may be very good at noticing the unusual or unfamiliar, and the computer, in symbiotic partnership with a person, could display data in various ways, using methods of Exploratory Data Analysis (Mosteller and Tukey, 1977; Tukey, 1977), to enhance this ability. Moreover, much of the data in intelligence analysis is qualitative, in the form of unformatted text and graphics, and this
would have to be translated into a canonical language, perhaps one resembling the form of influence diagrams (Howard and Matheson, 1984; Bodily, 1985). The most important function in Fig. 2 is in box 2. Here the intelligence analysis process really begins. It starts in response to an alert or to a stimulus that motivates hypothesis or idea formation. This motivation may stem from external data. But it may also stem from the reflections or meditations of a human analyst. The knowledge archives are partly in his own long-term memory; they extend it. It may be an insight on his part that actually triggers the alert. For example, suppose that data about the growth of a Japanese automobile component supplier indicates that growth is very high (knowledge), but it is not in the critical region of some indicator that would cause an alert. Suppose that there is further knowledge that this supplier has an outlet in the United States in a state where cars are produced. That, in itself, is also not remarkable. Suppose, further, that the supplier uses advanced technology very effectively, resulting in deskilling of most jobs, while high-skilled jobs such as design are in Japan. That, too, does not by itself justify an alert to a U.S. firm. If all three statements are studied together, they may give rise to the suspicion (idea, hypothesis) that the Japanese firm intends to create high-skilled jobs in Japan with a consequence, perhaps unintended, of further deskilling jobs in the United States. At this point, knowledge is transformed into understanding (box 3). Analysts realize that they need knowledge not available to them. They ask questions. Question-asking programs (e.g., SHRDLU) in well-defined domains of discourse (e.g., stacked blocks) have been constructed. It may not be possible to construct a domain-independent algorithm that asks good questions. A physicist may have a general understanding of a large domain, such as science, and ask simple questions about it.
He is likely to have a deeper understanding of a more specialized domain, such as physics, about which he can ask more profound questions. In the same way, in box 2, some domain restriction may occur, and a specialized question-asking program is selected (box 4). The questions it generates define the strategy of the investigation to be conducted by the intelligence analyst. The next phase, question-answering (box 5), draws upon information retrieval (IR) and artificial intelligence. The state of the IR art is such that the questions need to be transformed into a query language, generally Boolean combinations of search terms, directed to one of 3000 online bibliographic databases. The latter consist of indexed references to documents that might contain the answer. Another role for AI or an expert system is to advise the investigator about which database(s) to use. The questioner (currently a human investigator, in the future, perhaps his automated research assistant) is
presented first with the number of articles retrieved from a database that are indexed with the specified terms, so that he can revise his search terms and combine them; he can also see a sample of titles that would be retrieved, to guide him better. He finally receives a printout of the titles of and bibliographic information about articles that match his search specification. He must then scan these for answers to his questions. Sometimes the questioner may pose his question in a query language that can automatically search a database or a knowledge base. For example, if he wishes to know for the past five years the number of Americans hired by that Japanese firm in its U.S. plant and the skill levels of those hired, there may be a database in a database management system that provides this data. In response to his general question, he should be informed about the existence of databases of possible use to him so that he (or a surrogate person or program) could, if he wished, formulate requests these database systems can process. Many database management systems (DBMS) are directly coupled with statistical analysis packages (SAP) so that confirmatory statistical analysis (hypothesis testing) can be done in a continuous process (e.g., by multitasking). Ideally, a DBMS and a SAP should also be integrated with simulation systems or modeling packages, such as IFPL (Interactive Financial Programming Language), and also with tutorial systems, so that the investigator can get online guidance about which methods and programs or languages to use when. If the investigator obtains no direct answer to his question, he turns to an artificial intelligence program (box 6) that searches its knowledge base for units of knowledge from which the answer to the question may be inferred.
For very restricted domains of discourse, automatic question-answering algorithms for English-like questions have been developed (Kochen, 1969a, 1969b). (This could also be done at various other points in Fig. 2.) If that fails after a reasonable effort, the needed knowledge may be sought from an external source. If that fails, or if the inquiry strategy proves to be fruitless, the line of investigation is abandoned, and replaced by a new one that is expected to do better. This is done in box 7 by zooming back to the general domain and selecting a different set of specialized domains or some other representation shift procedure (Amarel, 1962). I contend that this is the essence of intelligence. If, as a result of a shift that leads to a more fruitful inquiry strategy on the research path, a reliable answer is produced (box 8), then understanding (expressed as the question in box 4) is transformed into intelligence. It should not be inferred that we consider an investigator highly intelligent only if he or his automated research assistant does this general-special switching very rapidly. If he mulls over the shift slowly and deliberately, he should not be excluded from the class of intelligent investigators. But if other factors are the same, the faster switcher is more intelligent.
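The retrieval step of box 5, in which a question is turned into a Boolean combination of search terms and the questioner sees first a count and then a sample of titles, might be sketched as below. The three-document collection, the index terms and the function names are all hypothetical, and a real online bibliographic service would of course expose its own query language rather than this toy interface.

```python
from collections import defaultdict

# Hypothetical mini-collection standing in for an online bibliographic database:
# document id -> (title, set of index terms).
DOCS = {
    1: ("Hiring at Japanese auto suppliers in the U.S.", {"japan", "automobile", "hiring", "us"}),
    2: ("Deskilling and advanced manufacturing technology", {"deskilling", "technology", "automobile"}),
    3: ("Skill levels in U.S. component plants", {"hiring", "us", "deskilling"}),
}


def build_index(docs):
    """Map each index term to the set of document ids indexed with it."""
    index = defaultdict(set)
    for doc_id, (_title, terms) in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index, all_of=(), any_of=(), none_of=()):
    """Evaluate (AND of all_of) AND (OR of any_of) AND NOT (OR of none_of)."""
    hits = set().union(*index.values())  # start from every indexed document
    for term in all_of:
        hits &= index.get(term, set())
    if any_of:
        hits &= set().union(*(index.get(t, set()) for t in any_of))
    for term in none_of:
        hits -= index.get(term, set())
    return hits


def report(docs, hits, sample=2):
    """Return the hit count first, then a small sample of titles to guide revision."""
    titles = [docs[i][0] for i in sorted(hits)]
    return len(titles), titles[:sample]
```

The count-then-sample shape of `report` mirrors the interaction described in the text: the questioner inspects the number of hits, narrows or broadens the terms, and only then asks for the full list.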
The remaining two processes (boxes 9 and 10) complete a learning loop. Learning by the MINTS is necessary if only to enable it to keep up with a changing environment. For it to improve it must learn faster than required by the changing environment. To meet the requirements stated in general terms in Section 3, a MINTS can be regarded as organized into subsystems. Each subsystem has its own knowledge base and specialized hypothesis generators, question askers, question answerers, and expert systems (boxes 2-6). But there are also hypothesis generators at the system level. The subsystems are further organized into more specialized sub-subsystems, in which there is increasing expertise, as suggested in Section 3. It is the integration of all these subsystems that makes the production of intelligence possible.
4.2 MINTS Development Lifecycles
Some aspects of a MINTS are conventional computer information systems. Some aspects are expert systems. Other aspects are non-computerized research systems. The traditional system development lifecycle for conventional computer information systems consists of overlapping phases such as
(a) Analysis. Given a system, estimate performance under various conditions. This involves determining performance requirements, determining feasibility and writing specifications, and using empirical, mathematical and simulation methods. Some of this is now done with computer-based tools.
(b) Design. Given specified performance criteria, determine a system likely to meet them. This may involve rapid prototyping, buy or lease decisions, invention, mathematical computation, and heuristics.
(c) Fabrication. This involves scaling up of the prototype, writing software, establishing databases and programming, and installing the system.
(d) Testing. This is being recognized as a major problem. Some very large systems cannot be tested under realistic conditions such as high-speed conflict, and it is dangerous to place too much faith in the competence or performance of such untested systems.
(e) Installation or Migration. This includes setting up operating procedures and an organizational structure, as well as conversion or migration from a previous system.
(f) Operation and Maintenance. This includes updating databases, improvement of functionality, and correcting imperfections. About two thirds of the effort of most programmers is expended on such maintenance activities (Fox, 1982).
(g) Phasing Out and Replacement. This is planned obsolescence in the expectation of a new generation of technology. It is also based on projections of unacceptable cost and reliability due to excessive patching on top of patches. It gave rise to the idea of composite information systems (Madnick and Wang, 1988).
The development of expert systems (ES) can be regarded as having two lifecycles, neither identical to that of a conventional system (Sviokla, 1986), one for the ES, and one for the Knowledge Base. For management of program development, a variety of tools are available. At the level of hardware, devices as inexpensive as personal computers, and also special LISP machines, are in use. Expert systems on PCs generally consist of fewer than 400 rules; on LISP machines, they have between 500 and 1000 rules. At the other extreme in levels is a specified problem domain in which a human expert’s models, heuristics, knowledge and inference strategies are observed and analyzed by a knowledge engineer, who tries to represent this expertise in production rules or frames or some other means. This results in expert systems such as XCON (Waterman, 1986). Expert systems are written with programming tools such as EMYCIN, TIMM, INSIGHT, S.1, M.1, etc. The knowledge engineer may do this in a “knowledge engineering environment,” such as KEE, LOOPS, ART or OPS5, which automates many processes for him. High-level languages such as PROLOG, INTERLISP-D, LISP, FOCUS, IFPS or C serve as even more general-purpose tools. These run under operating systems such as UNIX, which appears to be emerging as a standard. Expert system development goes through the following phases but with a great deal of looping:
(a) Initial definition and system identification.
(b) Construct a first prototype. This is done in place of requirement determination for ordinary systems.
(c) Formulate a plan for the program, involving the user.
(d) Design the documentation.
(e) Develop the skill and improve it adaptively.
(g) Field test the system under realistic conditions.
(h) Operation.
(i) Maintenance.
A common way to represent knowledge in expert systems is by means of “production rules,” or sentences of the form, “If any firm in market A raises its price, then firm A will raise its price to the same level within a week, almost certainly” (Kochen, 1971). Since then, limitations on the applicability of
production rules for representing knowledge have been discovered, and modifications have been proposed (Kochen and Min, 1987), for example to separate declarative and procedural knowledge (Anderson, 1983), though Heidegger had distinguished action and description long before. Knowledge bases are often measured by the number of such rules, though such a simple numerical count cannot be meaningful. (See the previously mentioned example of CHAOS, a chess program.) Still, it is asserted that the knowledge base of an expert system such as R1 has grown linearly from about 500 rules in early 1980 to 3250 rules by year end 1983 (Sviokla, 1986). But the utility of a knowledge base does not grow linearly with the number of rules. Consider as an extreme case a knowledge base of propositions for an axiomatic mathematical domain. A minimal knowledge base would comprise the axioms. Fewer than the minimum number are not enough. Restatements of the axioms or minor variants of theorems implied by the axioms would not add as much as would significant, non-obvious theorems or alternative noncontradictory axioms. As the number of propositions in the base grows, the number of ways of combining them for use in proofs of new propositions grows exponentially, but so does the effort required to find fruitful or significant new conjectures and their proofs. It is possible that when a knowledge base becomes large enough, the value of the last item added to it decreases with KB size relative to the effort needed to generate that item and update the knowledge base with it. Thus, the value relative to effort of a KB may vary with age, assuming linear growth, as shown in Fig. 3. The time at which this curve attains its maximum could be regarded as its natural lifespan, and M its maximum size. Thereafter, the domain may divide into specialized domains, with more specialized KBs replacing the original ones.
[FIG. 3. A possible relation between the size of a knowledge base and its effectiveness/cost ratio. Vertical axis: value/effort ratio; horizontal axis: KB size (number of production rules), with M marked on the axis.]
There is now a significant literature about how to gather and use intelligence (Fuld, 1985; Ghoshal and Kim, 1986; Miller, 1987; Wagers, 1986). Such intelligence can be used to build the knowledge bases, though most of the valuable inputs will be current, as shown in Fig. 2. Because of the long history of experience with intelligence production, some basic principles have emerged (Platt, 1957). These are unlikely to become invalid as advanced technologies are introduced into intelligence production.
1. Principle of Purpose. Every intelligence project must be directed by the use to which the results are to be put. This includes the problem chosen for attack, its formulation, and a clear vision of how a solution to the problem would serve as a guide to policy or action.
2. Principle of Exploitation of Sources. Explore and assess all sources that can shed light on the project. Vary the sources and use them to cross-check one another. Identify strengths and weaknesses of each source.
3. Principle of Significance. Give meaning to bare data. For example, compare facts at one time with corresponding facts at the same date a year ago. Interpret and explain all facts.
4. Principle of Cause and Effect. Seek causes and effects whenever possible in search for the key factor.
5. Principle of Morale. In assessing a competitor or adversary, or even a potential partner, assess the will-to-win of his leadership and his staff. Is he unusually aggressive or unusually defeatist?
6. Principle of Trends. Estimate the direction of probable change.
7. Principle of Degree of Certainty. Attach reliabilities to statements of fact, degrees of precision to quantitative data, and probabilities or other measures of the weight of evidence to estimates and conclusions. The Bayesian approach is probably the soundest one.
8. Principle of Conclusion. The intelligence project is not complete until it offers conclusions, answers to the question “So what?”
4.3 The Effectiveness Condition
There is a rich history of intelligence failures in the military, politics, criminology and business (Strong, 1969). There are also examples of brilliant successes. Consider examples of such successes and failures in six categories (A-F, defined as follows) to suggest that in simple situations, intelligent management is necessary for success, whether or not advanced technology is used; in complex situations, advanced technology must also be used.
Category   Success or Failure?   Complexity?   Intelligent Management?   Use of Advanced Technology or AI?
A          Failure               Simple        No                        No
B          Failure               Simple        No                        Yes
C          Success               Simple        Yes                       No
D          Success               Simple        Yes                       Yes
E          Failure               Complex       Yes                       No
F          Success               Complex       Yes                       Yes
A good example of A, an industrial intelligence failure, is the introduction of Michelin tires into the American market. U.S. tire companies failed to note that Michelin built plants in Canada with six times the capacity of the Canadian market, where, unlike in the United States, they were permitted. Another famous example of A is Sorge's message in 1941 to the Russian government about the exact date and place of the planned German invasion. Stalin had overwhelming corroborating evidence about the impending attack from a variety of reliable sources. Both are relatively simple and straightforward situations, with intelligence from reliable sources, obtained with conventional rather than sophisticated technological means. The failure was not that of intelligence agents or analysts, but of the client. Whether he suffered from paranoia, cognitive dissonance (in this case, ignoring information that was inconsistent with prior beliefs), competing hypotheses to explain the information, lack of appreciation for the importance of the information, etc., the cause is a lack of natural intelligence or competence in the system, as defined here. Examples of B include Pearl Harbor and the Bay of Pigs. The United States had broken the Japanese code prior to Pearl Harbor with the help of such technologies as were available, and had reliable intelligence about the planned attack on Pearl Harbor. Lack of natural intelligence and competence in the chain of command prevented the message from reaching the President. Similar errors occurred in the ill-fated Bay of Pigs invasion in Cuba and several incidents in the Pacific. The research, unaided by advanced technology, by Bernstein and Woodward that uncovered the Watergate affair illustrates C. It was also a relatively simple puzzle. The investigation succeeded all the way because of the natural intelligence of the elected representatives of the American people who became concerned.
Lest it be thought that the introduction of technology might detract from success, consider the Cuban missile crisis as an example of D. Detecting the missiles was relatively simple, using photo intelligence; success was due to the natural intelligence of John F. Kennedy and his staff. When the situation is very complex, as is the 1988 revolt of Palestinians
under Israeli occupation (type E), and if natural intelligence is attributed to Israeli leadership, it would seem that the use of sophisticated technologies might have avoided the failure to estimate the intensity of discontent or the intentions and strategies employed by PLO leaders. Another example of this complex kind was the failure to anticipate the success of Khomeini in ousting the Shah of Iran; that is more likely to be a failure of natural intelligence. An example of F is the deciphering of the German Enigma codes by A. Turing in Project Ultra during World War II. The total situation was complex. The natural intelligence of all those involved in Ultra, which included very few, such as Winston Churchill, F. D. Roosevelt, and a few they trusted, was high. It could not have been done without using the most advanced computing technology that could be brought to bear. It led to interception of key messages between General Rommel and the German high command in Berlin, which played an important role in defeating the Germans in North Africa. It contributed greatly to the Allied victory in that war by many more instances of this kind. The most important cause of intelligence failure is the “poverty of expectations” (Platt, 1957). This is a routine obsession with a few, familiar dangers and opportunities. Other major causes are, in order of decreasing frequency: insufficient knowledge; general incompetence; biases, such as cognitive dissonance; deception by the competitor or adversary; self-deception; mirror imaging, in which the principal assumes that his competitor will do what he would do were he in the other person’s position; judging new phenomena solely in the light of past experience; misreading of signals and indicators; overload and resulting lack of attention and vigilance; inadequate communications; unclear or ambiguous command and control structures.
If a MINTS could help to stimulate and guide imagination, it would meet a major need and help overcome a main cause of intelligence failure; this is a key requirement. The general proposition advanced here is as follows. If the natural intelligence of users of a MINTS exceeds a certain level, then the introduction of advanced information technologies, such as AI, will increase their performance in leading their organization to success and survival. If the natural intelligence of these users is below that level, these technologies will decrease their effectiveness. They may help them lead their organization to failure or increase the chaotic aspects of performance.
4.4 A Model for Relating Natural and Artificial Intelligence
We now try to prove this proposition by explicating the concepts of intelligence with an abstract and simplified model. Some informal philosophical preliminaries help to motivate the formalism. We start with “things” and
“ideas” as two universal constituents. Things are material entities that interest scientific experimenters, and which we know from sensory experience. Brains are things. Platonic ideas have no material counterpart. They are pure “thought.” They are not things. Minds are not things. “Ideas” include values, concepts, beliefs, and hypotheses. Persons are both things and ideas, brain and mind. They can generate and comprehend ideas. Above all, they are governed by values, i.e., preferences, revealed or explicit. Machines, even if produced automatically by other machines, are not governed by values. They do not express preferences except as imputed to or designed into them by persons. They may generate and process concepts, beliefs or hypotheses, but they have no preferences. Persons and machines are two basic constituents of organizations. Natural intelligence applies to persons. Artificial intelligence applies to machines. Both are required for the organizational intelligence needed to cope with complex situations because that requires the making and use of accurate value-maps of the world; natural intelligence is needed to make maps; artificial intelligence helps in using them. The key concept to be explicated is that of a value-map. The idea of a state-space was originated by physicists in the last century; state-transition system concepts were adapted by early automata theorists. Utility-theory concepts were developed by early decision-theorists and economists. They were first combined into a model of organizations as information systems (Kochen, 1954; Kochen, 1956).
[FIG. 4. Illustrating the effects of actions, i.e., shifts in position in a state-space (such as that of positions in the market).]
Suppose for the sake of discussion that we, as objective observers, can represent the “world” in which an organization (denoted by A for actor or decision-maker) moves, survives and thrives or suffers as an object in a state-transition space, illustrated in Fig. 4. The illustration uses just two
dimensions (say market share $ms_A$ and profitability $pf_A$) that we “know” to be relevant for A’s well-being in the sense that if we were to observe A in certain states we would observe A to be alive, alive and prospering, alive and suffering, dead, etc. To be sure, “we” cannot know whether and how A is suffering except by observing and inferring, or asking and believing. The state-space could as easily consist of four dimensions, for example, including the market share and profitability of his major competitor B as well (say $ms_B$ and $pf_B$, with $ms_A + ms_B = 1$). Preferred states for A would be toward the upper right of the space, with $ms_A$ and $pf_A$ as high as possible. Generally, there are regions of the space that “we know” A to value positively (perhaps in varying degrees), others that we know A to value negatively (perhaps in varying degrees), and possible paths for A to traverse. Figure 4 shows in a highly simplified way one path from a starting state $s_0$ to a state in the positively valued region. To traverse it, A must choose action $a_{02}$ rather than $a_{01}$, which are the two choices “we know” are available to him when he is in state $s_0$. This causes a transition to state $s_1$, where he must choose $a_{13}$ rather than $a_{12}$. He would experience a downward gradient, and he could learn from the feedback that he is heading in the wrong direction, at least temporarily. It is possible that the (only) path to the positive state from $s_0$ must take a downward turn (local minimum), but “we know” that it will eventually get him toward a highly valued “goal”-state. Now, A does not know Fig. 4. He does act, experience and evaluate the consequences of his actions. He also observes, communicates and reflects, generating ideas that such variables as $ms_A$ and $pf_A$ are important for him, i.e., he expresses his values. He encodes in symbols the state he was in, the action he took and the state this led to. He correlates the three, thus forming a local map.
If he finds himself to be in the same state again he remembers that, and if the action taken previously led to greatly increased value, he repeats it. He checks whether the same transition rule recurs; if not, he revises it using probabilistic estimates so that it is consistent with all his prior experience; if so, he tries to analogize and generalize to similar states and actions. A cannot build a useful value-map solely by patching together the local maps thus acquired from experience. He must “imagine” how the local maps fit together into a pattern that reflects the continuities and discontinuities (i.e., the topology) among the states he values positively and negatively. The image is analogous to a landscape with a few mountain ranges and peaks corresponding to highly valued states, a few canyons, chasms and nadirs corresponding to negatively valued states, and mostly mesas, steppes or flat terrain. Superposed on this relief are the branched paths that A imagines are accessible to him from various sites, depending on what he chooses to do from the options he believes are available to him. A has, at any time t, such an imperfect map (incomplete and inaccurate relative to what “we know” to be the case). What is important is that A has a global (bird’s eye) perspective reflecting his values as he perceives them at t (he may change these perceptions when he finds that a state he
MANAGEMENT INTELLIGENCE SYSTEMS
expected to value highly causes him great distress and he is in it) as well as the possibility of switching to local (worm’s eye) perspectives at various levels in an appropriately flexible way. Internal maps of the kind illustrated by Fig. 4 have been modeled as expert systems (Kochen and Min, 1987). A MINTS has also been modeled as an expert system that does two things: (a) It speeds up the improvement of the system that represents a strategic planner’s value-map. (b) It integrates two or more such expert systems, for example one denoting his value-map and another denoting his map of his major competitor. A system that performs function (a) or (b) is a meta-expert system. It operates on, modifies, and combines other expert systems. It can be said to facilitate “learning” or adaptive improvement. As such, it represents a high level of flexibility. We propose to define A’s natural intelligence by the rate at which he modifies his map in the direction of increased completeness and accuracy and by q, the quality of his map in this regard. Both aspects are important, because the world (the “actual terrain” of Fig. 4 as “we” see it) may change, or there may be plateaus in how q increases over the long term (e.g., new state variables may add to or replace old ones, values may change, constraints on and opportunities for action may be added, etc.). This pair of variables is only one of several that characterize competence. It is the most important in the context of this chapter, and focusing on it will provide more insight than a comprehensive analysis.
The other variables are: k, the number of important and relevant questions a respondent can answer well enough, somewhat as in traditional educational testing for knowledge (this is know-what); k (know-how), the number of key tasks and procedures a respondent R can do with a high enough level of skill; k (know-who), the number of valuable personal contacts a respondent can draw on and think of using for getting help, support, referral, etc.; k (know-when), R’s sense of timing, awareness of the existence of strategically critical time windows and priorities, as evidenced by the number of good opportunities not missed; k (know-where), which corresponds to R’s sense of place, such as where to look for and quickly find certain important items, and where to be at appropriate times; k (know-why), which refers to R’s ability to justify and persuade, to explain and make credible his positions; k (know-how-much), which reflects R’s sense of quantity, his ability to make sound quantitative estimates, to express judgments about what is too much or too little; u (understanding), the number of important, incisive, answerable and relevant questions that R is able to formulate and thinks of posing; i (intelligence), which refers to how adeptly and quickly R can switch from
MANFRED KOCHEN
the perspective of a generalist to that of a team of specialists in appropriate specialties at appropriate levels; and w (wisdom), which reflects R’s ability to bring to bear values and integrate all the other aspects of his know-X on deciding what to do when and how to do it. The role of machines in a MINTS, and AI technology in particular, is to help A determine and assess his current state, to apprise A of opportunities and threats in the near and more distant future, to help him select goals (targets of opportunity), to check consistency with his values and to advise A in choosing action sequences. This means map-utilization. Action sequences associated with probabilistically branching paths can be viewed as programs for algorithms accompanied by a claim that they will lead to the most highly preferred or valued states from the current state, with high probability and at a given risk. We propose to measure AI by the extent to which such automatically formed programs are correct according to A’s map at the time, and by the speed with which they are formed and executed. Such machines support A by telling him the probable consequences of various courses of action, but leaving it to his judgment to decide, according to his values, how much risk is acceptable, when and where to seize opportunities, and what goals to pursue. If q, the quality of A’s map, is below a threshold q0 (i.e., insufficient natural intelligence), then there is a high probability that (a) states will be misvalued; (b) paths will be missing; (c) paths will be mislabeled, with incorrect associations between states, actions and the transition states. Using such an imperfect map will result either in random behavior or in movement in downward directions much of the time. There is a very small chance that an imperfect map is biased in a positive direction, because relatively few paths are highly valued and relatively fewer paths lead to them.
If the expected number of such flaws in the map is sufficiently large, the probability of random behavior or of dysfunctional strategies, with negatively valued outcomes in either case, will be very high. The use of such maps by AI will lead to such negative outcomes more rapidly and with much higher probability. By contrast, if q exceeds q0, the reverse occurs. Thus, q0 is the point at which AI begins to pay off. But q0 may increase with time. Thus, though q > q0 at one time, that condition may no longer hold due to qualitative changes in the world, such as technological, economic, political or social discontinuities. Then, if the rate at which the map is improved is large enough, it may cause q to increase until it exceeds q0 once again. It appears that in most industries only 5-10% of the firms that start when the industry is launched survive a decade, and those that do are transformed discontinuously every 5-10 years by changing strategy (products, markets), structure, people (replacing the entire top management team), process and possibly even values (Tushman et al., 1987), generally in anticipation of such major external discontinuities.
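The threshold argument can be made concrete with a deliberately crude toy model. It is a sketch under assumptions that are not in the text: q is read as the probability that a step recommended by the map really leads upward, each step is worth +1 or -1, and AI support is represented only by the number of steps executed per period. In this toy model the break-even quality happens to be 0.5; the chapter does not give a numerical value for q0.

```python
def expected_gain_per_period(q, steps_per_period):
    """Toy model: each executed step moves A up (+1) with probability q and
    down (-1) otherwise, so the expected gain per step is 2q - 1."""
    return steps_per_period * (2 * q - 1)


Q0 = 0.5  # break-even map quality in this toy model only (an assumption)

for q in (0.3, 0.5, 0.7):
    unaided = expected_gain_per_period(q, steps_per_period=1)
    aided = expected_gain_per_period(q, steps_per_period=10)
    print(f"q={q:.1f}  unaided={unaided:+.1f}  AI-aided={aided:+.1f}")
```

Below the break-even point the faster, AI-aided execution only multiplies the expected loss; above it, the same speed multiplies the expected gain, which is the qualitative claim made above.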
Hopefully the discussion in this section has stimulated the research-oriented reader to ask many questions to be answered by further research. The above conjecture regarding the relation between natural intelligence, artificial intelligence and the benefit of business or organizational intelligence has the status of an empirically testable hypothesis. Managerial and leadership competencies other than natural intelligence are necessary as well. Social networking is also a necessary competence, at least for the successful adoption of technology (Kochen and Chin, 1989). Generally, informal sources of information will also play a more important role in the future (Compaine and McLaughlin, 1987). The concept of a map to aid in strategic planning needs much more elaboration. In Figs. 1 and 4, constraints operate and make certain regions of the space inaccessible and certain transitions forbidden. Rules in expert systems that are equivalent to state-transition maps should apply to sets of points, perhaps to fuzzy sets, rather than to individual points. The transitions should be regarded as probabilistic and not all applied in the same time interval in a synchronous way. The indifference curves, as in Fig. 1, are better modeled as partial orderings (preference) than as crisp curves. Above all, the repertoire of possible actions by all the players in combination should be extended. These are but a few of the more obvious steps that are needed. On a more fundamental level, it can be argued that the AI support systems could contain enough intelligence to compensate for the user’s lack of natural intelligence, and, in principle, displace the human analyst or manager altogether, as posited by, say, Fredkin. We do not take this position, because AI is basically a human construct for which a human, perhaps the designer or the owner, ultimately bears responsibility, unless humans lose control over this technology or become extinct.
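The suggestion above that preferences be modeled as partial orderings rather than crisp indifference curves can be illustrated as follows. This is a minimal sketch that assumes one particular partial ordering, dominance on every dimension, with higher market share and higher profitability both counted as better; the function names and the sample states are hypothetical, and other orderings would serve equally well.

```python
def prefers(x, y):
    """True if state x is at least as good as y on every dimension and
    strictly better on at least one (a strict partial order)."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def compare(x, y):
    """Classify a pair of states under the dominance ordering."""
    if prefers(x, y):
        return "x preferred"
    if prefers(y, x):
        return "y preferred"
    return "equal" if x == y else "incomparable"


# Hypothetical states given as (market share, profitability)
high = (0.60, 0.20)
low = (0.40, 0.10)
mixed = (0.70, 0.05)

print(compare(high, low))    # high dominates low on both dimensions
print(compare(high, mixed))  # neither dominates: more share, less profit
```

Unlike an indifference curve, this ordering leaves some pairs of states unranked, so a map built on it makes explicit where A's own values, rather than the map, must decide.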
5. Managerial Issues
MINTSs are likely to become important in the decades to come. So will the search for a scientific foundation underlying their design and use. Schools of Business will have to teach management of MINTS, as well as management with MINTS as tools, after their faculties learn this. All of us will have to learn how to manage and cope in the environment of MINTS.
5.1 Management of a MINTS
To appreciate the difficulties of managing an intelligence organization, we need only to look at the relation between the CIA and the branches of government to which it reports. The most critical issue is the degree of autonomy, authority, power and responsibility delegated to the MINTS. On
the one hand, intelligence is so necessary for leadership that those who control it have a great deal of power. By withholding or distorting information reported to a chief executive (or to others), the chief intelligence officer (CINTO) can shape the kinds of policies and decisions made. He can even do this inadvertently by the kind of information collection policy he employs. Yet, to be effective, he must have the complete trust of the chief executive officer (CEO) to support him. If the CEO is open-minded and the CINTO loyal and trustworthy, then the CEO’s management of his MINTS (represented by the CINTO) is likely to be effective. But in politics, as perhaps also in war, sports, business and love, persons in power do best to be kind and cautious. The CEO may be open-minded and ready to change course as long as his power base is not threatened, for which he is likely to remain vigilant. There is good reason for caution, because sudden changes in the environment are likely to call for radical and sudden responses from organizations, and that includes the possibility of his being replaced (Tushman et al., 1987; Nadler and Tushman, 1986). The same holds for the CINTO. Hence, the CEO’s open-mindedness has, in practice, limits, as does the CINTO’s loyalty. If the CINTO is subordinate to (and funded by) the CEO, with the latter having the power to discharge or promote the former, the value of services rendered by the MINTS may be compromised. The CEO may always suspect the CINTO of acting in self-interest. In a situation in which providing the CEO with valid intelligence could threaten the CINTO’s position, self-interest would imply that the CINTO would withhold or distort such intelligence. If, on the other hand, the CINTO is independent of the CEO, the latter may doubt the CINTO’s incentives and motivation in supplying intelligence.
Intelligence cannot be paid for according to the value of reports to the CEO, or else he will get biased reports and possibly act inconsistently with the organization’s values and policies. A system of checks and balances, similar to that between the three branches of the U.S. government, is probably the best way to manage an independent MINTS. Even the best MINTS, headed by the most competent and loyal CINTO, however, will not help the survival of an organization that lacks leadership. This means vision about what to do, and secondarily how to do it well (Bennis and Nanus, 1985). It requires of the leader also a strong personality, with connections that he is willing and able to use. It requires clear intention, the ability to mobilize capabilities, commitment and carrying through. It requires the ability to transform at times of change (Tichy and Devanna, 1986). What does it mean for an organization to “have” an independent MINTS? Can the same MINTS serve more than one client organization? Does it manage and support itself? Does it compete with other MINTSs on a market for intelligence? Could there be demand for the services of many small MINTSs which specialize? Not according to our conceptualization of intelligence, which emphasizes appropriately flexible switching among various levels of specialization and across specialties. If each organization has its own MINTS, will there not be a common overlapping use of coverage that it might pay all of them to share? Many of these and other issues concerning the management of MINTSs are subordinate to the broader issues of competition and cooperation, discussed in Section 5.4.
5.2 Management with a MINTS
The most important principle that is implied by the main thesis of this chapter is that the CEO responsible for deciding how to use a MINTS as a tool in strategic planning should permit only sufficiently competent people in his organization to use it. He should then ensure that the MINTS aids them proactively as well as responsively. A good MINTS is a very flexible and versatile tool. It must be used as such. A MINTS is like a combination of private detective, lawyer, accountant, librarian/information analyst, and consultant/adviser, which maintains continual surveillance over everything of importance to the CEO and which is looking out for his interests. How does he manage with its services? In its zeal to justify and motivate his payment for its services, the MINTS may overload him with input. The essence of useful intelligence is that it is carefully screened and prioritized. At all times the effective manager reminds the MINTS of his values, which are the basis of prioritization, offering concrete feedback about the priorities of what is supplied and specifying beforehand what he prefers, whenever possible, in a form that permits clear determination of whether it is useful or not.
5.3 Management in a MINTS Environment
The issues here are too many to cover. Since much of intelligence is about people, issues of privacy, confidentiality and secrecy are primary. If everyone in an organization pervaded by a good MINTS feels that the MINTS maintains a growing and secret record about him or her that is likely to be of value to top management (which may include evaluations of performance and potential, personal and other sensitive matters), then fear may also be pervasive. “Secret” means accessible only to those with a “need to know,” as interpreted by someone empowered by the CEO to make such judgments. If the person about whom the record is kept is denied access to the complete file, this fear may be greater, perhaps justifiably so, since he has no
opportunity to check the accuracy of the record. On the other hand, knowing that negative evaluations are in the record may make him less content than not knowing that. Fear also increases with suspicion, sometimes justified, that unauthorized persons can gain access to the record. The secrecy issue also applies to technologies embodied in products and production processes. Insofar as a technology is codified (e.g., embodied in blueprints), it is readily imitated and appropriated by competitors, suppliers, customers, etc. Patents, copyrights and trade secrecy offer limited protection. Even technological know-how that is very tacit, in the expertise of experts that even they cannot articulate and communicate, can be appropriated by hiring those experts. Hardware is perhaps most readily appropriated, by reverse engineering, though highly efficient production processes, resulting from numerous incremental improvements, are more tacit. Software can be imitated and appropriated with increasing facility, as we move toward greater standardization, among other factors. Knowledge and databases, like TV and recorded performances, can also be copied. Model bases, particularly idiosyncratic heuristics, may be somewhat more tacit, though once they are computerized, they become subject to imitation. Only procedures and sociotechnology remain tacit. How much investment to ensure secrecy, to thwart potential imitators, is justified? A small business firm cannot afford as much to guard its technology as a large imitator firm can afford to appropriate or replicate the technology. Moreover, the larger firm may have the needed cospecialized technologies and complementary assets (e.g., distribution channels, service, competitive manufacturing, a popular brand name) in place when the innovative small firm does not.
5.4 Communication, Competition and Cooperation
We generally think of using a MINTS in a world of competing adversaries, often as if they were in a zero-sum conflict situation. Clearly, an increase in one firm’s share of a given market comes at the expense of decreases in the shares of some competitors. In many sports contests or some cultural competitions, gains by one contestant are losses by the other(s). In an armed conflict, too, gains by one side are losses by the other, at least on the surface. In a love triangle, too, success of one suitor spells failure for the other. Yet, to compete, adversaries must communicate, if only by widely available products and services at known prices or other transactions open to public scrutiny. (In the love triangle, communication among competing suitors occurs at least through the object of their love.) Success in conflict often depends on one party’s ability to conceal (or reveal deceptively) its own intentions, capabilities, commitments and actions, even if that party’s
capabilities are weaker than its adversaries’ but compensated for by surprise (Levite, 1987), stealth, concentration and speed. Such fighting to win, in which communication is at best unintentional, may be beyond the “rules of the game.” Even war, and certainly competition in business, sports, culture and perhaps in the love triangle, is governed by some rules of “fair play.” Extreme forms of deception may be considered unfair. Industrial espionage is as frowned upon as fraud and crime (Eels and Nehemkis, 1984) and not to be identified with business intelligence. But communication can pave the way for reciprocity, said to be a way of life in the U.S. Senate (Matthews, 1960; Mayhew, 1975), which, in turn, illustrates the emergence of cooperation (Axelrod, 1984). There are few situations in which the interests of all (both) parties are completely opposed to one another. There are always some win-win opportunities, in which each party gains something of value. This kind of situation is better modeled by the non-zero-sum prisoner’s dilemma game than by a zero-sum game. Here, the reward to each of two players if both cooperate is greater, say 3 (the pair of numbers in the upper left cell), than if they both compete, in which case both get, say, 1 (the pair of numbers in the lower right cell). But if one offers to cooperate while the other intends to compete, the former gets the “sucker’s payoff,” say 0, and the latter gets rich quick with, say, 5. This is shown in the off-diagonal cells of the table in Fig. 5. Cooperation is interpreted in the case of two burglars imprisoned for a crime as not testifying to the partner’s guilt, while competing means betraying the other in the hope of gaining release. Could MINTSs be used to find those opportunities in which every party wins something of value to it, even if not as much as it would gain if it were the only or the #1 winner?
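The payoffs just described can be written down and played repeatedly. The sketch below uses the illustrative values 3, 1, 0 and 5 from the text; the strategy functions, the number of rounds and all identifiers are assumptions made for the example. The reciprocating strategy shown (cooperate first, then copy the other side's previous move) is the one usually called Tit-for-Tat.

```python
COOPERATE, COMPETE = "cooperate", "compete"

# PAYOFF[(move of org 1, move of org 2)] = (payoff to org 1, payoff to org 2)
PAYOFF = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, COMPETE): (0, 5),
    (COMPETE, COOPERATE): (5, 0),
    (COMPETE, COMPETE): (1, 1),
}


def reciprocate(own_history, other_history):
    """Cooperate first, then copy the other side's previous move."""
    return other_history[-1] if other_history else COOPERATE


def always_compete(own_history, other_history):
    """Never cooperate, whatever the other side has done."""
    return COMPETE


def play(strategy_1, strategy_2, rounds):
    """Play the game repeatedly and return the two cumulative payoffs."""
    history_1, history_2 = [], []
    total_1 = total_2 = 0
    for _ in range(rounds):
        move_1 = strategy_1(history_1, history_2)
        move_2 = strategy_2(history_2, history_1)
        payoff_1, payoff_2 = PAYOFF[(move_1, move_2)]
        history_1.append(move_1)
        history_2.append(move_2)
        total_1 += payoff_1
        total_2 += payoff_2
    return total_1, total_2


both_reciprocate = play(reciprocate, reciprocate, rounds=10)   # (30, 30)
one_sided = play(reciprocate, always_compete, rounds=10)       # (9, 14)
both_compete = play(always_compete, always_compete, rounds=10)  # (10, 10)
```

Over ten rounds, two reciprocating organizations each collect 30, whereas the consistent competitor facing a reciprocator collects only 14: the one-time gain of 5 is outweighed by the mutual competition that follows.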
Could it help parties in conflict switch from interactions based on concern for position to interactions based on concern for their interests (Fisher and Ury, 1981)? Or, if the Tit-for-Tat strategy submitted by A. Rapoport, which consistently won in the tournament staged by Axelrod, or an even better strategy claimed to have been found by use of Holland’s genetic algorithm, leads to cooperation, is a MINTS unnecessary? A MINTS is useful, if only to make
                          Org 2
                   Cooperate   Compete
Org 1  Cooperate     3, 3        0, 5
       Compete       5, 0        1, 1

FIG. 5. Prisoner’s dilemma payoff matrix. The first number in each cell is the payoff to organization 1. The second number is the payoff to organization 2. For example, the benefit to organization 1 by a unilateral move to compete when organization 2 seeks to cooperate is 5, while the benefit to organization 2 is 0, as in the lower left cell.
parties aware of the payoff matrix, the nature of the game and the optimality of the Tit-for-Tat or reciprocity strategy; also, the evolution of cooperation can take very long, and the use of MINTS can accelerate its pace. Generally, the assumption of a completely informed, rational person, which underlies much of economic theory, requires either a learning approach based on continual improvement (Kochen, 1971) or a MINTS or both. The use of shared knowledge base items that each of several MINTSs in competing organizations collects, and which they know their competitors collect, could be a step in the direction of cooperation while competing. All would gain. But the most valuable kind of intelligence is still that which no competitor has or knows that the organization in question has, and which supports strategically advantageous moves by it.
5.5 Emergent Properties and Systemic Intelligence
A society comprises many living organizations and institutions, each directed by persons with intentions, capabilities (for mobilizing resources), commitments and courses of action in various stages of completion. Some, such as IBM, DEC and Apple, are business firms competing in one or more markets. Others, such as the members of MCC, have begun to cooperate on certain dimensions, such as Research and Development up to the development of prototypes. Advanced information technologies are pervading nearly all organizations in developed societies, and may soon pervade those in less developed societies as well. As they affect effectively managed manufacturing firms by increasing productivity, quality, and throughput, and lowering product delivery time, costs, variabilities and uncertainties, these technologies increase the wealth generated and the value added. But they do so by displacing and transforming the requirements for human labor (OTA, 1988). Yet, it is people who must meet the demands for greater creativity, to invent new products and services likely to be in demand (assuming insatiability), so that they can acquire the purchasing power to enjoy this share of the increased wealth. Otherwise, societies may become fragmented into two tiers. In one tier are the few who have the competence to manage the advancing, wealth-producing technologies and who thus own or control most of the wealth that is generated. In the other tier are the many without enough means. The arguments against this scenario are based on the assumption that societies such as nation-states will continue to perceive our world of limited resources as a zero-sum game (Cyert and Mowery, 1987).
It is argued that job losses in the United States are due to the loss to foreign competition rather than due to labor-displacing technologies in the United States, and that the introduction of automation would, by increasing quality and productivity and by lowering costs, expand our market share sufficiently to create many more jobs than are
lost. But this should mean lost jobs in competing societies. If proper use of the technology enables one highly skilled person to do the work of a dozen less skilled persons without the technology, some low-skilled jobs will be lost somewhere in the world. This is an example of an emergent or systemic property. Another example is the realization, before long, that it is against the national interest of a society such as the United States to invest in research and development that leads to public knowledge, hence easily appropriated by competitors better positioned to commercialize it than firms in the United States. Yet investment in basic R&D has a very high rate of social return on a world-wide basis. Suppose that a MINTS could discover a win-win strategy, such as a Tit-for-Tat strategy by the United States, to use against a competitor who commercializes U.S. innovations but fails to equitably share the rents from this combination of innovation and commercialization. Should such a strategy work, the United States could continue to do what it does best (e.g., research, development, and innovation) while, say, Japan does what it is best at (e.g., commercialization), with both gaining, though not as much as if either were the sole winner. The MINTS might generate systemic intelligence to the effect that the most survivable future is one in which there is no #1, #2, etc., in a linear hierarchy of players, but every player gains something of value to him.
6. Conclusion
A Management Intelligence System is a stable structure with the function of screening, evaluating, and synthesizing information, based on knowledge and understanding of its environment to help the intelligent managers it supports in setting goals, assessing their organization’s position, selecting appropriate strategies, tactics and actions, and in carrying these out in appropriate ways. Business firms that face shorter product cycles, easily imitated technologies, and intense global competition for high-quality, low-priced, customized, rapidly delivered and reliably supported products and services are beginning to recognize the need for a MINTS. They are learning the management of MINTS, with these as tools and in environments pervaded by them. System professionals are being challenged to develop such MINTS and system scientists have an opportunity to create scientific underpinnings for the analysis, design and use of MINTS. Some of the main concepts and issues toward a theoretical foundation for these scientific underpinnings were presented here. Principles based on experience and research with other computerized information systems were brought together and applied to MINTS. The resulting analyses and designs were found to be sound and likely to have a significant impact on strategic business planning.
A MINTS will help an organization improve or maintain its interests only if its leaders, the MINTS users, are sufficiently competent. Competence includes natural intelligence, which is defined as the ability to shift rapidly and appropriately among different levels of specialization in domains of knowledge and understanding needed for making useful cognitive value-maps. A MINTS with sufficient functionality to help an organization facing opportunities and threats in complex situations requires artificial intelligence. The latter is defined as the ability for, and performance in, reading and using cognitive value-maps. A MINTS is necessary but not sufficient to ensure competitiveness. Foreign competition has been characterized by Gomory (1988): (a) tight ties between manufacturing and development; (b) an emphasis on quality; (c) the rapid introduction of incremental improvements; (d) great effort by those in the production process to be educated in the relevant technologies, in the competitors’ products, and in events in the world. An organization may fail because it does not have or use a MINTS when one is necessary to detect opportunities and threats. It may fail even if it uses a MINTS because the MINTS provided inaccurate, incomplete or ambiguous intelligence, or presented it unpersuasively; because the valid output of the MINTS was not paid attention to; or because valid output was attended to but used erroneously. It may succeed without a MINTS by chance. It may also succeed because of the experience-based exceptional intuition, talent or competence of its leaders or because of the simplicity of the situation. With situations becoming more complex, or competitive, high-performance MINTS will be needed more and more. Persons who can manage, build and operate them will be in greater demand. The scientific foundation for their professional use must be created. Some cornerstones for this foundation were laid in this chapter.
ACKNOWLEDGEMENT
Thanks are due to Professors R. Copper, J. Fry and M. Gordon of the University of Michigan Business School for their helpful comments on an earlier draft, to Ph.D. students Moonkee Min and Choon Lee, who helped prepare the “course pack” used as a text for the course/seminar based on the MINTS concept, and to all the students in the Business School of the University of Michigan, many of whom will be leaders, who participated in the course. The excellent work of Betty Wolverton in typing this manuscript is also greatly appreciated.
REFERENCES
Amarel, S. (1962). On the Automatic Formation of a Computer Program which Represents a Theory. In “Self-Organizing Systems” (M. Yovits, G. Jacobs, and G. Goldstein, eds.), pp. 107-175. Spartan, Washington, D.C.
Anderson, J. (1983). “The Architecture of Cognition.” Harvard University Press, Cambridge, Massachusetts.
Ansoff, I. (1979). “Strategic Management.” Wiley, New York.
Axelrod, R. (1984). “The Evolution of Cooperation.” Basic, New York.
Beatty, C. M., and Ives, B. (1986). Competitive Information Systems in Support of Pricing. MIS Quarterly 10 (1), 85-96.
Bennis, W., and Nanus, B. (1985). “Leaders.” Basic, New York.
Bodily, S. E. (1985). “Modern Decision Making.” McGraw-Hill, New York.
Burke, R. C. (1984). “Decision-Making in Complex Times: The Contribution of a Social Accounting Information System.” Society of Management Accountants of Canada, Hamilton, Ontario.
Cattell, R. B. (1987). “Intelligence.” North-Holland, Amsterdam.
Chaitin, G. (1974). Information-theoretic Limits of Formal Systems. J. ACM 21 (July), 403-424.
Chaitin, G. (1975a). Randomness and Mathematical Proof. Scientific American 232 (May), 47-52.
Chaitin, G. (1975b). A Theory of Program Size Formally Identical to Information Theory. J. ACM 22 (July), 329-340.
Compaine, B. M., and McLaughlin, J. F. (1987). Management Information: Back to Basics. Information Management Rev. 2 (3), 15-34.
Cyert, R. M., and Mowery, D. C., eds. (1987). “Technology and Employment.” National Academy Press, Washington, D.C.
Diebold, J. (1984). “Making the Future Work: Unleashing Our Powers of Innovation for the Decades Ahead.” Simon and Schuster, New York.
Drucker, P. F. (1985). “Innovation and Entrepreneurship.” Harper and Row, New York.
Dutta, B. K., and King, W. R. (1983). A Competitive Scenario Modeling System. Management Science (March).
Edelman, G. M., and Mountcastle, V. W. (1978). “The Mindful Brain.” MIT Press, Cambridge, Massachusetts.
Eels, R., and Nehemkis, P. (1984). “Corporate Modeling and Espionage.” McMillan, New York.
El Sawy, O. A. (1985). Personal Information Systems for Strategic Scanning in Turbulent Environments: Can the CEO Go Online? MIS Quarterly 9 (1), 53-60.
Eysenck, H. J. (1962). “Know Your Own IQ.” Penguin, Harmondsworth, United Kingdom.
Fahey, L., King, W. R., and Narayan, V. K. (1981). Environmental Scanning and Forecasting in Strategic Planning - The State of the Art. Long Range Planning 14 (February).
Fancher, R. (1985). “The Intelligence Men: Makers of the IQ Controversy.” Norton, New York.
Fischler, R. B. (1987). “Intelligence.” Addison-Wesley, Reading, Massachusetts.
Fisher, R., and Ury, W. (1981). “Getting to Yes: Negotiating Agreement Without Giving In.” Houghton-Mifflin, Boston.
Fox, J. M. (1982). “Software and Its Development.” Prentice-Hall, Englewood Cliffs, New Jersey.
Fuld, L. M. (1985). “Competitor Intelligence: How to Get It; How to Use It.” Wiley, New York.
Ghoshal, S., and Kim, S. K. (1986). Building Effective Intelligence Systems for Competitive Advantage. Sloan Management Rev. 49 (Fall).
Gilad, T., and Gilad, B. (1986). Business Intelligence - The Quiet Revolution. Sloan Management Rev. 49 (Summer).
Gomory, R. (1988). Bridge 18 (Spring), 13.
Grabiner, J. (1986). Computers and the Nature of Man: A Historian’s Perspective on Controversies about Artificial Intelligence. Bull. Amer. Math. Soc. 15 (2), 113-126.
Harmon, P., and King, D. (1985). “Expert Systems: Artificial Intelligence in Business.” Wiley, New York.
Harvard Business Review (1987). Do You Think There is a Competitiveness Problem? (May/June), I ~ - 2 7.
Harvard University (1980). Mapping the Information Business (J. McLaughlin and O. A. Biriny), Research Report P-80-5; 1986 update by McLaughlin and A. L. Antonon. A New Framework for the Information Arena (B. M. Compaine), Research Report P-80-3. Guest presentations on
276
MANFRED KOCHEN
Command, Control, Communications and Intelligence, 1980,1981, 1982, by W. E. Colby, B. R. Inman, W. Odom, R. Tate, W. 0.Baker, J. H. Cushan, R. D. DeLauer, H. Dickinson, R. H. Ellis, D. C. Richardson, C. Rose, W. C. Miller, and many others. Holloway, C. (1983).Strategic Management and Artificial Intelligence. Long Range Planning 16 (October). 89-93. Howard, R. A., and Matheson, J. E. (1984). Influence Diagrams, In “The Principles and Applications of Decision Analysis,” Vol. 11. Strategic Decision Group, Menlo Park, California. Ishikawa, A. (1986). “Future Computer and Information Systems.” Praeger, New York. Jafk, E. D. (1975).Multinational Marketing Intelligence: An Information Requirements Model. Management International Rev. 19,53-60. Jaikumar, R. (1986). Postindustrial Manufacturing. Haroard Bus. Rev. (November/December), 69-16. Kochen, M. ( 1954). An Information-Theoretic Model of Organizations. Trans. IRE-PGIT 4, 61-75. Kochen, M. (1958).Organized Systems with Discrete Information Transfer. General Systems 11, 48-54. (Ph.D. Dissertation, Columbia University 1956). Kochen, M. (1969a). Automatic Question-Answering of English-like Questions about Simple Diagrams. J . ACM 16 (I), 26-48. Kochen, M. (1969b). Automatic Question-Answering of English-like Questions about Arithmetic. Proc. Purdue Centennial Year Symp. Information Processing 1, pp. 249-273. Kochen, M. (I971j. Cognitive Learning Processes: An Explication. In “Artificial Intelligence and Heuristic Programming(N. V. Findler and B. Meltzer, eds.), pp. 261-317. Edinburgh University Press, Edinburgh. Kochen, M. (1972). Directory Design for Networks of Information and Referral Centers. In “Operations Research: Implications for Libraries” (D. Swanson and A. Bookstein, eds.). University of Chicago Press, Chicago. Kochen, M., ed. (1989).“The Small World.” Ablex, Norwood. New Jersey. Kochen, M., and Chin, W. (1989). Social Networking for Successful Adoption of Technology. Proc. Hawaii Int . 
Con$ System Sciences-22. Kochen, M., and Deutsch, K. W. (1973).Decentralization by Function and Location. Management Science LO (April), 841 -856. Kochen. M., and Hastings, H. M. (1988). “Advances in Cognitive Science: Steps Toward Convergence.” AAAS Selected Symposium. Westview, Boulder, Colorado. Kochen, M., and Min, M. (1987). Intelligence for Strategic Planning. Proc. 7th Int. Workshop on Expert Systems and their Applications. Kochen, M., and Resnick, P. (1987). A Plausible Mathemachine. Human Systems Management 7 (2). 163-169. Kolmogorov, A. N. (1965). Three Approaches to the Quantitative Definition of Information, Information Transmission I, 3-1 I . Kolmogorov. A. N. (1968).Logical Basis for Information Theory and Probability Theory. I E E E Trans. Information Theory IT-I4 (September), 662-664. Lancaster, F. W. (1978). “Toward Paperless Information Systems.’’ Academic Press, New York. Levite, A. (1987). “Intelligence and Strategic Surprise.” Columbia University Press, New York. Ljungberg, S. (1983).Intelligence Service-A Tool for Decision Makers. Int. Forum on Information 8 (December), 23-26. Luhn, H. P. (1958). A Business Intelligence System, I B M J . Research and Deoelopment 2 (4), 3 14-3 19. Madnick, S . E.. and Wang, Y. R. (1988). A Framework of Composite Information Systems for Strategic Advantage. Proc. 21st Annu. Hawaii Int. ConJ System Sciences, 111, pp. 35-43. Matthews, D. R. (1960). “US. Senators and Their World.” University of North Carolina Press, Chapel Hill. North Carolina.
Mayhew, D. R. (1975). “Congress: The Electoral Connection.” Yale University Press, New Haven, Connecticut.
Miller, T. (1987). Staying Alive in the Jungle, Focus: Competitive Intelligence. Online Access Guide (March/April), 43-91.
Montgomery, D. B., and Weinberg, C. B. (1979). Toward Strategic Intelligence Systems. J. Marketing 43 (Fall), 41-52.
Mosteller, F., and Tukey, J. W. (1977). “Data Analysis and Regression: A Second Course in Statistics.” Addison-Wesley, Reading, Massachusetts.
Nadler, D., and Tushman, M. (1986). “Strategic Organization Design.” Scott Foresman, Homewood, Illinois.
OTA (1988). “Technology and the American Economic Tradition: Choices for the Future.” U.S. Congress Office of Technology Assessment, Washington, D.C.
Platt, W. (1957). “Strategic Intelligence Production.” Praeger, New York.
Polya, G. (1954). “Mathematics and Plausible Reasoning,” Vols. I and II. Princeton University Press, Princeton, New Jersey.
Porter, M. E. (1980). “Competitive Strategy.” The Free Press, New York.
Porter, M. E. (1985). “Competitive Advantage.” The Free Press, New York.
Rauch-Hindin, W. B. (1985). “Artificial Intelligence in Business, Science and Industry.” Prentice-Hall, Englewood Cliffs, New Jersey.
Rothschild, W. E. (1984). “How to Gain (or Maintain) the Competitive Advantage in Business.” McGraw-Hill, New York.
Sammon, W. L., Kurland, M. A., and Spitalnik, R. (1987). “Business Competitor Intelligence.” Wiley, New York.
Science 85 (1985). The Next Step. 25 Discoveries that Could Change our Lives. Science 85 6 (9).
Scott, W. R. (1985). “Organizations.” Prentice-Hall, Englewood Cliffs, New Jersey.
Silverman, G. B., ed. (1987). “Expert Systems for Business.” Addison-Wesley, Reading, Massachusetts.
Solomonoff, R. J. (1960). “A Preliminary Report on a General Theory of Inductive Inference,” ZTR-138. Zator Company, Cambridge, Massachusetts.
Solomonoff, R. J. (1978). Complexity-based Induction Systems: Comparison and Convergence Theorems. IEEE Trans. Information Theory IT-24 (4), 422-432.
Solomonoff, R. J. (1988). The Application of Algorithmic Probability to Problems in Artificial Intelligence. In “Advances in Cognitive Science” (M. Kochen and H. M. Hastings, eds.). Westview/AAAS, Boulder, Colorado.
Sternberg, R. J. (1986). Inside Intelligence. American Scientist 74 (March/April), 137-143.
Sternberg, R. J., ed. (1988). “Advances in the Psychology of Human Intelligence.” Erlbaum.
Sternberg, R. J., and Wagner, R. K., eds. (1986). “Practical Intelligence: Nature and Origins of Competence in the Everyday World.” Cambridge University Press.
Strong, K. (1969). “Intelligence at the Top.” Doubleday, Garden City.
Sviokla, J. J. (1986). Business Implications of Knowledge-Based Systems. Database (Summer), 5-19; (Fall), 5-16.
Synnott, W. R. (1987). “The Information Weapon: Competition Through Technology.” Wiley, New York.
Tichy, N., and Devanna, M. A. (1986). “The Transformational Leader.” Wiley, New York.
Tukey, J. W. (1977). “Exploratory Data Analysis.” Addison-Wesley, Reading, Massachusetts.
Turing, A. M. (1937). On Computable Numbers, with an Application to the Entscheidungsproblem. Proc. London Math. Soc. 42 (2), 230-265.
Turing, A. M. (1950). Computing Machinery and Intelligence. Mind 59 (October), 433-460.
Tushman, M. L., and Moore, W. L., eds. (1982). “Readings in the Management of Innovation.” Pitman, Boston.
Tushman, M. L., Newman, W. H., and Romanelli, E. (1987). Convergence and Upheaval: Managing the Unsteady Pace of Organizational Evolution. California Management Rev. 29 (1).
Wagers, R. (1986). Online Sources of Competitive Intelligence. Database (June), 28-38.
Washington Researchers (1986). “How to Find Company Intelligence On-Line.”
Washington Researchers (1987a). “European Markets: A Guide to Company and Industry Information Sources.”
Washington Researchers (1987b). “How to Find Information About Companies.”
Waterman, D. A. (1986). “A Guide to Expert Systems.” Addison-Wesley, Reading, Massachusetts.
Whalen, T., Schott, B., and Ganoe, F. (1982). Fault Diagnosis in a Fuzzy Network. Proc. Int. Conf. Cybernetics and Society, pp. 35-39.
Wilensky, H. L. (1967). “Organizational Intelligence.” Basic, New York.
Winston, P., and Prendergast, K. A. (1984). “The AI Business: Commercial Uses of AI.” MIT Press, Cambridge, Massachusetts.
Wiseman, C. (1988). “Strategic Information Systems.” Irwin, Homewood, Illinois.
Woon, P. A. (1980). “Intelligence.” North-Holland, Amsterdam.
Yovits, M. C., and Ernst, R. L. (1969). Generalized Information Systems: Consequences for Information Transfer. In “People and Information” (H. P. Pepinsky, ed.), pp. 3-31. Pergamon, New York.
Yovits, M. C., and Foulk, C. R. (1985). Experiments and Analysis of Information Use and Value in a Decision-Making Context. JASIS 36 (2), 63-81.
Yovits, M. C., Rose, L. L., and Abilock, J. G. (1977). Development of a Theory of Information Flow and Analysis. In “The Many Faces of Information Science” (E. C. Weiss, ed.). Westview Press, Boulder, Colorado.
Yovits, M. C., deKorvin, A., Kleyle, R., and Mascarenhas, M. (1987). External Documentation and Its Quantitative Relationship to Internal Information State of a Decision Maker: The Information Profile. JASIS 38 (6), 405-419.
AUTHOR INDEX Numbers in italics indicate the pages on which the complete references are given
A
Abbe, E., 179, 221 Abilock, J. G., 238, 278 Abramovitz, I. J., 196, 223 Abu-Mostafa, Y. S., 181, 221 Abushagur, M. A. G., 218, 221, 223 Adlassnig, K. P., 103 Aguero, U., 3, 6, 11, 47, 52, 62, 63 Akashi, H., 104 Akin, O., 3, 63 Alagic, S., 21, 63 Aleksoff, C. C., 198, 226 Alexander, C., 3, 9, 63 Alfano, R. R., 204, 224 Allebach, J. P., 185, 225 Amarel, S., 256, 274 Ambardar, V., 109, 146 Ammon, G. J., 163, 221 Anderson, D. B., 193, 221 Anderson, J., 259, 274 Ansoff, I., 233, 275 Arbib, M. A., 21, 63 Armstrong, J. A., 196, 224 Arnold, C. S., 5, 65 Arora, S. K., 146 Arrathoon, R., 171, 200, 201, 204, 221 Artman, J. O., 214, 223, 224 Athale, R. A., 154, 173, 208, 214, 221, 223 August, R. R., 193, 221 Avizienis, A., 198, 221 Awwal, A. A. S., 204, 224 Axelrod, R., 271, 275
B
Babb, E., 123, 146 Bandler, W., 91, 103 Banerjee, J., 125, 145, 146 Barakat, R., 215, 221 Basov, N. G., 160, 221 Baum, R. J., 125, 145, 146 Beatty, C. M., 245, 275 Beaudet, P. R., 172, 223 Belady, L. A., 26, 26-27, 27, 63, 65 Bell, C. G., 14, 66 Bellman, R. E., 74, 103 Bendall, D. G., 18, 63 Bennis, W., 268, 275 Bentley, J. L., 131, 131-132, 145, 146 Berg, N. J., 193, 221 Bergman, L. A., 173, 221 Berkowitz, D. A., 196, 226 Berra, B. P., 113, 115, 120, 146 Berra, P. B., 120, 147, 150 Bezdek, J. C., 103 Bic, L., 144, 146 Bird, R. M., 108, 147 Bitton, D., 127, 145, 147 Block, E., 116, 147 Bocker, R. P., 198, 214, 215, 221, 222, 225 Bodily, S. E., 255, 275 Boehm, B., 33, 63 Bonucelli, M. A., 109, 118, 135, 145, 147 Boral, H. L., 116, 127, 144, 145, 147 Borrione, D., 63 Boyce, R. F., 147 Boyde, J. J., 193, 221 Bradley, J. C., 172, 223 Brady, D., 161, 174, 225 Bray, H. O., 114, 115, 147 Brenner, K.-H., 202, 203, 206, 221, 224 Broadbent, G., 2, 3, 63 Bromley, K., 214, 225 Brown, W. M., 190, 192, 221 Bryngdahl, O., 172, 186, 222 Bullock, A., 30, 63 Burke, R. C., 230, 233, 275
C
Calabria, J. A., 163, 221 Campbell, R. J., 168, 226 Cardenas, A. F., 147 Carlotto, M., 154, 225 Carlsson, C., 103 Casasent, D., 154, 194, 198, 206, 213, 219, 222, 225
Case, S. K., 186, 222 Cattell, R. B., 235, 275 Caulfield, H.I., 154, 158, 172, 173, 173-174, 174, 175, 186, 181, 214, 216, 218, 221, 222,223,225,226 Chaitin, G., 229, 275 Chamberlin, D. D., 147 Champine, G. A., 1l3, 115, 147 Chang, H.,149 Chamiak, E., 53,63 Cheney, P.W., 201,222 Cheng, W. K., 216,222 Cherri, A. K., 204,224 Chin, W., 267,276 Chugui, Y.V.,214,224 Cindrich, I., 198, 226 Clapp, L. C., 196,226 Clark, D., 219, 222 Clymer, B. D., 173,222 Collins, S.A., 198, 203, 222 Compaine, B. M., 267,275 Conway, L.,5, 66 Coyle, E. J., 185, 223 Craft, N.,168,226 Croce, P., 175, 180, 225 Cross, N., 2, 3,4, 63 Cutrona, L. J., 190, 193, 214, 222 Cyert, R. M., 272,275 Cmgala, E., 103
D Damm, W., 21, 41, 63 Darbik, T. J., 173,221 Darke, J., 3,32, 63 Dasgupta, S., 2, 3, 5, 7, 8, 10, 11, 14, 16, 17, 21, 28, 32, 35, 36, 39, 41, 46,47, 51, 52, 63, 64 Davidson, S., 5, 64 Davies, D. K., 172, 223 Davis, E. W., 121, 147 deBakke.r, J., 40,64 Defiore, C. R., 120, 146,147 deKorvin, A., 238,278 DeMillo, R.,45, 64 De Mori, R.,103 Demurjian, S. A., 109, 147 Deutsch, K.W., 246,276 Devanna, M. A., 268,277
DeWitt, D. 1.. lZ7,145, 147 Dias, A., 154, 172,214,223 Diebold, J., 246,275 Dijkstra, E. W., 2,21,40,64 Dixon, J., 20, 64 Dohman, G., 63 Duhmen, 21 Doty, K. L., 122, 146, 148 Drake, B. L., 198,222 Drucker, P.F., 243, 275 Dubois, R., 103 Duffieux, P.M.,175, 222 Dumpala. S. R.,146 Dunning, G.J., 161, 225 Dutta, B. K., 250,275 Dvore, D., 154,222
E Edelman, G. M.,238,275 Eels, R.,271, 275 Eich, M. H.,109, 118, 144, 147-148 Eichmann, G., 184, 186, 198,204,222,224 El Sawy, 0. A., 241, 275 Elias, P.,175, 223 Elion, H.A., 161, 168, 170,223 Emam, A., 151 Encarnacao, J., 2,20, 64 Emst, R. L., 238, 278 Esener, S. C., 173,221 Evans, B., 2, 64 Eysenck, H.J., 275
F Fahey, L., 233, 275 Fancher, R.,235,275 Farhat, N. H.,154, 186, 187,223 Fatehi, M. T.,204,222 Feldman, M. R.,173,221,223 Feng, T.Y.,UO, 148 Fienup, J. R., 198, 226 Fink, W., 214,225 Finnila, C. A., 148 Fischler, R. B., 235, 275 Fisher, A. D., 169, 223 Fisher, R.,237, 271, 275 Fitch, J. P.,185, 223
Fletcher, J. C., 108, 148 Floyd, R. W., 40, 64 Flynn, M. J., 110, 148 Foster, M. I., 154, 222 Foulk, C. R., 238, 278 Fox, J. M., 257, 275 Freeman, H. A., 114, 115, 147 Freeman, P., 3, 64 Friedland, D., 127, 145, 147 Fu, K. S., 105 Fuld, L. M., 242, 260, 275 Fushimi, S., 109, 118, 137, 140, 145, 148
G Gabor, D., 161, 223 Gaines, B. R.,104, 105 Gaines, R.S.,120, 146, I48 Gajski, D.,109, 118, 137, 140, 145, 148 Galage, D., 116, 147 Gallagher, N. C., 185, 223 Ganoe, F., 242,278 Garcia-Molina, H., 109, 144, 148 Gardarin, G., 15I Gates, R., 120, 121, 146, 150 Gaylord, T. K.,158, 168,198,201,202,205, 206,211,212,223,225 George, N., 219,223 Gerfand, J., 216,222 Gero, J. S., 2, 3. 34. 64 Ghoshal, S.,260,275 Gianino, P. D., 184,223 Gibbs, H. M., 168,223. 225 Gibson. P. M., 218,221 Giertz. M., 74, I03 Gieser, J. L., 5, 66 Gilad, B., 233,275 Gilad, T.,233,275 Giloi, W.K.,2, 3, 64 Goguen, J. A., 70,I03 Gomory, R.,Zl4.275 Gonzalez-Rubio, R., I48 Goodman, I. R., I03 Goodman,J. W.,154, 171-172, 172, 173, 180, 186, 189. 198, 201,214, 222,223, 224, 225,226 Goodman, S. D., 185, 223 Gopalakrishnan,G. C., 21,64 Could, S.J., 28, 64
Goutzoulis, A. P., 172, 196, 223 Grabiner, J., 239, 275 Gray, J. N., 147 Grey, D. S., 175, 223 Gries, D. G., 21, 40, 64 Gruninger, J. H., 216, 222 Guest, C. C., 168, 173, 198, 201, 205, 221, 223 Guilfoyle, P. S., 206, 207, 208, 223, 225 Gupta, M. M., 91, 103 Guttag, J., 12, 65
H Ha, B., 184, 222 Habli, M. A., 218, 223 Habli, M., 218, 221 Hall, G. O., 190,222 Hall, L. O., 92, 103 Hamilton, M. C., 193,221 Hammer, M. M., 147 Handler, W.,110, 148 Haniuda, H., I48 Hanson, N. R., 32,64 Harmon, P.,235, 275 Ham, R.,5 5 . 6 4 Hartmann, R. L., 144, 146 Haskin, R.,108, 148 Hassoun, M.H., 200,201,221 Hastings, H. M., 276 Haugen, P. R.,186,222 Hawthorn, P. B., K7, 145, 147, I48 Hayes-Roth, F.,2.64 Healy, L. D., 122, 146, I48 Heinanen, J., 7, 10, 51, 64 Heinz, R. A., 214, 223,224 Held, G. D., 148 Hikita, S., I48 Hiroshi, S., 140, 145, 148 Hoare, C. A. R.,6,21,23,40,41,64.65 Hollaar, L. A., 148 Holloway, C., 241, 276 Hong, S. J., 34, 65 Hong, Y. C., 148 Hopkins, W. C., 5, 65 Homer, J. L., 185, 194,223,224 Homitz, E.. 15.65 Honigan, F. A., 198, 201,223 Horton, M. J., 5, 65 Horvitz, s., 154,214,222
Howard, H. C., 3,30,66 Haward, R. A., 255,276 Hsiao, D. K., 108, 109, 111, 113, 115, 125, 145, 146, 147, I48 Huang, A., 198,201,202, 203,221,224 Huang, T. S., 2U, 224 Hubka, V.,2, 65 Hurson, A. R., 109, 118, U5, l37,144, 145, 148, 149, I50 Hum, S., 210, 224
I Ichioka, Y.,168, 186, 204, 224, 226 Iizuka, K., 157, 224 Irani, K. B., 142, 143, 150 Isailovic, J., 162, 224 Ishihara, S., 198,201,224 Ishikawa, A., 249,276 Ives, €3.. 245, 275
J Jablonowski, D. P.,214,224 Jackson, K. P.,171, 224 Jaffe, E. D., 244,276 Jaikumar, R.,247,276 Janossy, I., 168,226 Jaques, R., 2, 65 Jarrel, N. F., Q7, 145, 147 Javidi, B., 185, 224 Jenkins, F. A., 191-192, 224 Johnston, A. R., 173,221 Jones, J. C., 2, 3, 8, 30, 65
K Kacpryk, J., I03 Kakuta, T., 140, 145, I49 Kambayashi, N., 149 Kambayashi, Y., 149, 151 Kandel, A., 70,82,91,92, 94,99, 103, 104,105 Kannan, 125, 145, 146 Kannan, K., 148, I49 Karim, M. A., 204,224 Kasdan, H . L.,219,223 Kasnitz, H.L., 213,224 Katevenis, M. G. H., 6, 65
Kato, H.,186,224 Kaufmann, A., I04 Kawakami, S., 148 Ken,D. S., 148, 149 Keyes, R. W., 196, 224 Kickert, W. J. M., I04 Kim, S. K., 260,275 Kim,W., 109, 118, 137, 140, 145, 148, 149 King, D., 235, 275 King, W. F., 111, 147 King, W. R., 233, 250, 275 Kiska. J. B., 91, I03 Kitsuregawa, M.,140, 145, 148 Klass, P.J., 116, 149 Kleyle, R., 238, 278 Kochen, M., 239,241,244,246,256,258,259, 263,265,267,2?2,276 Koester, C. J.. 196,226 Kogge, P.N., 17, 65 Kohonen, T.,161,224 Kohout, L. J., I03 Kolmogorov, A. N., 229,276 Kostrazewski, A , , 184,222 Kozaitis, S., 171, 204, 221 Krivenkov, B. E., 214,224 Kuhn, T. S . , 30, 57, 65 Kung, H.T.,109, 118, 130, U1, 131-132, 145, 146, I49 Kung, S. Y . , 154, 173,223 Kurland, M.A., 230, 233,277
L Lakatos, I., 57, 65 Lamb, S., I49 Lancaster, F. W., 230,276 Langdon, G. G . , Jr., I49 Langley, P., 34, 65 Lasher, M. E., 198, 222 Lasker, G. E., 104 Latombe, I. C., 2, 34, 65 Laudan, L.,55, 57, 65 Lawson, B., 3,4, 20, 65 Lea, R. M., I49 Lee, C. Y.,120,122, 146, 148, 149 Lee, J. N., 169, 193, 221, 223 Lee, S. C., 104 Lee, S. H., 173, 180, 204,213, 214, 221,223, 224
Lee, S. Y.,149 Leger, J. R.,213, 224 Lehman, M. M., 3,26,26-27,27, 63,65 Lehman, P.L., 109, 118, 130,131, 145, 149 Lehman, T. J., 109, 144, 149 Lehmer, D. H., 198, 224 123, 145. 149 Leilich, H. 0.. Leith, E. N., 190, 193,222 Lenat, D. B., 2, 64 Lendaris, G . G . , 218-219, 224 Leonberger, F.J., 154, 173, 223 Levite, A., 230, 250, 271,276 Li, Y., 184, 198,204, 222,224 Lin, C. S., 123, 145, 150 Linde, R. R., 120,121, 146, 150 Lipovski, G . J., 122, 146, 148, I51 Lipovski, S. J., 150 Lipton, R. J., 45, 64,109, 144, 148 Liskov, B., 12, 65 Liuzzi, R. A,, 150 Ljungberg, S., 230, 250, 276 Lakberg, 0. J., 186,222 Lodi, E., 109, 118, l35, 145, 147 Losee, J., 55. 66 Louie. A. C. H., 214, 225 Love, H. H., Jr., 148 Lu, s., 172,222 Luccio, F.,109, 118, 135, 145, 147 Ludman, J. E., 216, 222 Luhn, H. P., 233, 276
M MacKenzie, H. A., 168, 226 Madnick, S. E., 258, 276 Maestrini, P., 109, 118, 135. 145, 147 Mahoney, W.C., 108, I50 Mait, J. N., 206, 224 Maller, V. A., I50 Mamdani, E. H., 104 Maragos, P.,185,224 March, L., 3, 66 Markcha], A., 175, 180, 225 Malarkey, E. C., 172, 223 Marom, E., 161, 225 Martin, R. D., 214, 225 Maryanski, F. J . , 150 Mascarenhas, M., 238,278 Matheson, J. E., 255, 276
Mathew, J. G. H., 168, 226 Matthews, D. R., 271, 276 May hew, D.R ., 271. 277 Maynard Smith,J., 18, 66 McDermon, D., 53, 63 McLaughlin, J. F., 267, 275 Mead, C . , 5, 66 Medawar, P. B., 55, 66 Miceli, W. I., 173, 198, 222, 225 Middendorf, W.H., 2, 30, 32, 66 Mikhlyaev, S. V., 214, 224 Miller, L. L., 137, 144, 149, 150 Miller, T.,230, 260, 277 Mills, H. D., 21, 66 Min, M., 259, 265, 276 Minsky, N., 150 Mirsalehi, M. M., 174, 198, 202, 206, 211, 212, 225 Miyazaki, N., 140, 145, 149 Monahan, M. A., 214.225 Montgomery, D.9,234,244,277 Moore, W.L., 246,277 Morozov, V. N., 160, 161, 168, 170, 221, 223 Moslehi, B., 171-172, 225 Mosteller, F., 253, 277 Mostow. J., 3, 20, 34, 66 Moulder, R., 120,145, 150 Mountcastle, V. W.,238, 275 Mowery, D. C., 272, 275 Mueller, R. A., 16, 36, 38, 66 Mukherjee, A., 117, I50 Mukhopadhyay, A., 150 Murakami, K., 140, 145, 149 Muraszkiewicz, M., 150 Musgrave, A . , 57, 65 Myers, G . J., 150
N Nadler. D., 268,277 Nanus, B., 268, 275 Narayan, V. K.,233, 275 Naughton, J., 4, 63 Neft, D., 154, 225 Negoita, C. V.,104 Nehemkis, P.,271, 275 Newell, A . , 14, 36, 38, 66 Newman, W.H., 266,268,278 Nguyen, H. B., 118, 123, 151
Nickels, T., 55, 66 Nishida, T., 104 Nixon, R., 173, 221 Nyguen, H.T., 103
O
Ochoa, E., 185, 225 Oflazer, K., 109, 118, 123, 125, 126, 145, 150 Oliver, E. J., 145, 146, 150 Owechko, Y., 161, 225 Ozkarahan, A. E., 109, 118, 123, 126, 145, 150 Ozkarahan, E. A., 118, 123, 145, 150, 151
P
Paek, E. G., 154, 161, 186, 223, 225 Pagli, L., 109, 118, 135, 145, 147 Parhami, B., 122, 146, 150 Parker, J. L., 121, 146, 150 Parlermo, C. J., 190, 222 Patterson, R. H., 198, 222 Paull, M. C., 120, 122, 146, 149 Pedrycz, W., 103 Peng, T., 120, 121, 146, 150 Perlis, A., 45, 64 Peyghambarian, N., 168, 225 Platt, W., 235, 260, 262, 277 Polya, G., 239, 277 Popov, U. M., 160, 221 Popper, K. R., 32, 55, 57, 66 Porcello, L. J., 190, 192, 193, 221, 222 Porter, A. B., 180, 225 Porter, M. E., 230, 233, 237, 277 Powell, J. A., 2, 65 Powell, J., 64 Prata, A., 154, 186, 223 Prendergast, K. A., 235, 278 Preston, K., 204, 205, 225 Psaltis, D., 154, 161, 174, 186, 187, 198, 208, 221, 223, 225
Q
Qadah, G. Z., 109, 115, 116, 142, 143, 144, 145, 150
R
Rabitz, H., 216, 222 Ralescu, D. A., 104 Ramamoorthy, C. V., 33, 66 Rauch-Hindin, W. B., 277 Rawcliffe, R. D., 190, 226 Redfield, S., 116, 147 Redmond, J., 168, 226 Rehak, D. R., 3, 20, 30, 66 Reid, J. J. E., 168, 226 Resnick, P., 239, 276 Rhodes, J. E., 175, 225 Rhodes, W. T., 154, 185, 194, 206, 207, 222, 223, 225 Rittel, H. W., 31, 66 Robinson, D. Z., 175, 223 Rohmer, J., 148 Romanelli, E., 266, 268, 278 Rose, L. L., 238, 278 Rosenthal, R. S., 112, 115, 151 Rothschild, W. E., 230, 233, 277 Rowe, P. G., 3, 66 Rudolph, J. A., 121, 151 Ruina, J. P., 190, 226 Ruse, M., 18, 66
S Sahni, S., 15, 65 Sammon, W. L., 230,233,277 Sanchez, E., 103, 104 Sata, J., 2,66 Schaefer, D. H.,171, 225 Scherlis, W.L.,45, 66 Schlechtendahl, E. G., 2,20,64 Schmidt, U. J., 173,225 Schrnucker, K.J., I04 Schneider, M.,82,94,104 Schneider, W.,225 Schon, D. A,, 4, 66 Schon, B.,242,278 Schuster, S. A., 118, 123, 145, I50, I51 Scott, D. S., 45, 66 Scott, W. R., 241,277 Seo, K.,I49 Serra, J., 185,225 Seymour, R. J., 173,225 Shahdad, M.,S,60, 66
Shamir, J., 172, 173, 173-174, 174, 225 Shaw, E., 13, 151 Shaw, H. J., 171, 171-172, 224, 225 Shepard, R. G., 214, 225 Sheraga, R. J., 5, 66 Sherwin, C. W., 190, 226 Shibayama, S., 140, 145, 149 Shimura, M., 105 Shriver, B. D., 2, 3, 5, 16, 63, 64 Siewiorek, D. P., 14, 66 Silverman, 235 Simon, H. A., 1, 2, 3, 6, 7, 15, 32, 34, 35, 36, 38, 66, 67 Sincovec, R., 33, 67 Skala, H. J., 104 Slotnick, D. L., 113, 121, 123, 151 Smith, A. J., 46, 47, 67 Smith, D. C., 109-110, 151 Smith, D. R., 21, 64 Smith, D., 123, 145, 150 Smith, J. M., 109-110, 123, 145, 150, 151 Smith, K. C., 118, 123, 145, 150, 151 Smith, S. D., 168, 226 Soffer, B. H., 161, 225 Solomonoff, R. J., 229, 277 Sommerville, I., 33, 67 Song, S. W., 109, 114, 115, 118, 131, 133, 135, 145, 151 Speiser, J. M., 207, 226 Spillers, W. R., 2, 3, 67 Spitalnik, R., 230, 233, 277 Sriram, D., 3, 20, 30, 66 Srivas, M. K., 21, 64 Stallybrass, O., 30, 63 Stanley, G. L., 218-219, 224 Steige, G., 123, 145, 149 Steiglitz, K., 216, 222 Sternberg, R. J., 240, 277 Stonebraker, M. R., 148 Stoner, W. W., 198, 201, 223 Strand, T. C., 186, 226 Strawser, P. R., 109, 147 Streibl, N., 202, 221 Strong, J. P., 171, 225 Strong, K., 260, 277 Su, S. Y. W., 114, 115, 123, 145, 151 Sugeno, M., 104 Suppe, F., 55, 67 Sviokla, J. J., 259, 277 Svoboda, A., 197, 226
Swartzlander, 117 Sweeney, D. W., 185, 225 Synnott, W. R., 277 Szu, H. H., 168, 173, 226
T Taghizadeh, M. R., 168, 226 Tai, A., 198, 226 Takagi. T., 104 Takeda, E., 104 Takeda, M., 189, 226 Takeguchi, T., I04 Talbot, T., 2, 64 Tamura, P. N . , 214,226 Tanaka, H., 140, 145, 148 Tanaka, K . , 105, 151 Tanaka, R. I., 197, 199,226 Tanenbaum, A. S., 14, 67 Tanida, J., 168, 186, 204, 224, 226 Taylor, H. F. , 210, 226 Termini, S., 104 Thomas, D. E., 34, 67 Thomas, D. T, 163, 221 Thompson, B. J., 213, 226 Thurber, K. J., 147 Tichy. N., 268, 277 Tippett, J. T., 196, 226 Tong, F., 151 Tooley, F. A. P., 168, 225-226, 226 Traiger, I. L., 147 Tricoles, G. P., 214, 222 Trillas, E., 104 Tsai, C. S . , 195, 226 Tsoni, E., 216, 222 Tsunoda, Y., 198, 201, 224 Tu, J. C., 108, 147 Tukey, J. W., 253,277 Tur. M., 171-172, 225 Turing, A. M.,229, 277 Turner, R.,53, 67 Tushman, M. L., 246,266,268,277,278 Tushman, M.,268,277 Tverdokhleb, P.E., 214, 224
U
Ury, W., 237, 271, 275
V
Valach, M., 197, 226 Valdes, J., 109, 144, 148 Valduriez, P., 151 Vander Lugt, A. B., 180, 226 Vanderburgh, A., 196, 226 Vanderslice, R., 149 Varghese, J., 16, 36, 38, 66 Vari, A., 104 Verber, C. M., 210, 211, 226 Verriest, E. I., 211, 223, 225 Vivian, W. E., 190, 193, 222 Von Winkle, W. A., 214, 222
W
Wagers, R., 230, 260, 278 Wagner, A., 21, 41, 64 Wagner, K., 161, 174, 225 Wagner, R. K., 240, 277 Wah, B. W., 127, 145, 151 Walker, A. C., 168, 226 Walker, D., 4, 63 Wang, P. P., 104 Wang, Y. R., 258, 276 Warman, E. A., 2, 66 Wasmundt, K. C., 204, 222 Waterman, D. A., 2, 64, 258, 278 Webber, M. M., 31, 66 Wegner, P., 60, 67 Weinberg, C. B., 234, 244, 277 Westerberg, A. W., 2, 67 Whalen, T., 242, 278 Wherrett, B. S., 168, 226 Whewell, W., 55, 67 White, H. E., 191-192, 224 Whitehouse, H. J., 207, 226 Wiener, R., 33, 67 Wilensky, H. L., 234, 278 Wiley, W. J., 208, 223 Wilkinson, K. W., 127, 145, 147 Wilsey, P. A., 7, 10, 51, 64 Winston, P., 235, 278 Wirth, N., 21, 23, 41, 65, 67 Wiseman, C., 230, 278 Wong, E., 148 Woody, L. M., 154, 172, 214, 223 Woon, P. A., 235, 278 Worthy, R. M., 108, 147 Wu, W. H., 173, 221 Wyant, J. C., 214, 226
Y
Yager, R. R., 104, 105 Yao, B. S., 127, 145, 151 Yao, S. B., 151 Yokota, H., 140, 145, 149 Yovits, M. C., 238, 278 Yu, P., 173, 221
Z
Zadeh, L. A., 69, 70, 78, 91, 105 Zeidler, H. C., 123, 145, 149 Zemankova-Leech, M., 99, 105 Zernike, V. F., 180, 226 Zimmerman, G., 3, 67 Zimmerman, H. J., 105 Zysno, P., 105
SUBJECT INDEX
A
Abstraction level, 47-48
Acousto-optic Bragg cell, 193-194
Acousto-optic processors, 206-208
Acousto-optic systolic matrix-vector multiplier, 206-207
Addition, optical computing, 155-157
Algebraic processors, 214-218
  bimodal optical computer, 215-218
  matrix-matrix multiplication, 214-215
  matrix-vector multiplication, 214-215
Algorithmic approach, 38-39
Analog optics, 216
Analog processors, 175
  applications
    spectrum analysis, 193-196
    synthetic-aperture radar, 189-193
  correlators, 178-179
  Fourier transform processors, 175-178
  image processors, 182-186
  nonlinear processors, 186-189
  spatial filters, 178-182
Analysis-synthesis-evaluation paradigm, 30-34
  problems, 31-33
AND gate, optical, 166
Architecture
  block-oriented database, 121-123
  database machines, 109, 118
  endo-architecture, 35
  exo-architecture, 14, 24-26
  fully parallel associative, 120-121
  management intelligence systems, 253-257
Artificial intelligence
  as functionality or competence, 241
  model for relation to natural intelligence, 262-267
  paradigm, 34-38
  as process, 240-241
ASE paradigm, 30-34, 60
  problems, 31-33
ASLM, 135-137
Associative array approach, 113
Associative memory
  ASLM, 137
  holographic, 161
Associative Search Language Machine, 135-137
Automatic microcode generator, structure, 16-17
Axiom of assignment, 41-42
B
Backend system, 113
Berra’s classification, database machines, 113
Bimodal optical computer, 215-218
Binary number system, 199
Biological evolution, 17-18
Bistable optical devices, 166-168
Block-oriented database architecture, low VLSI-compatible database machines, 121-123
Boral and Redfield’s classification, database machines, 116
Bounded rationality, 7, 17
Bragg angle, 193
Bray and Freeman’s classification, database machines, 114
C
Cache memory, design, 46-47
Cache miss ratio, 46
CAD systems, 2
CAFS, 123
CASSM, 123
CASSS, 122-123
Cellular logic approach, 113
Cellular organization, structure, 121-122
Certainty factor, 95
Champine’s classification, database machines, 113
Chinese remainder theorem, 197
COFESS, 94-99
  certainty factor, 95
  example, 95-99
  pattern recognition, 94-95
  types of relations, 94
Coherent transfer function, 182
Communication, management intelligence systems, 270-272
Competition, management intelligence systems, 270-272
Complex spatial filters, 180-182
Computer-aided design, see CAD systems
Computer hardware description languages, 4-5
Computer language, design, 60-61
Computing systems design, 2
Concurrent space, definition, 110
Consumer analysis, 244
Content Addressable File Store, 123
Content-addressable memory, 202
Context Addressed Segment Sequential Memory, 123
Context Associated Segment Sequential Storage, 122-123
Controller, ASLM, 135
Control operator processing, holographic memories, 160-161
Control processor, Delta, 140
Cooperation, management intelligence systems, 270-272
Correlators, 178-179
Critical slowing down, 168
D
Data, 231
Database computer, 125-126
Database machines, 108-110, 119, 143-144
  architecture, 109
  associative-based, 118
  classification, 110-112, 144-146
    based on architectural characteristics and technology, 118
    Berra, 113
    Boral and Redfield, 116
    Bray and Freeman, 114
    Champine, 113
    Hsiao, 113-114
    new, 116-119
    Qadah, 116
    relationship among, 115
    Rosenthal, 112
    Song, 114, 116
    Su, 114
  indexing level, 116, 119
  low VLSI-compatible, 120-123
  multiprocessor backend, 126-128
  pipelining, 110-111
  recent development in technology, 117-118
  semi VLSI-compatible, 123-129
  tree-structured, 118
  see also High VLSI-compatible database machines
Database management systems, 256
Database processor, ASLM, 135
Database systems, indexed, 108
Data dependencies, 17
Data/knowledge-base management systems, 231
Data path, 9-10
DBC, 125-126
Deblurring process, 184-185
Delta, 140-142
Demarcation criterion, 56
Design
  assembly of components, 58
  characteristics, 3, 28-29
    as change, 3-5
    decision-making, 15
    engineering-science distinction, 4
    incompleteness of requirements, 7-8
    module dependency diagram, 13
    plausibility dependency graph, 12-13
    plausibility statement, 12
    preciseness of requirements, 6-7
    representation of target, 8-13
    requirement specifications, 5-8
    structural form production, 9
    values, 4-5
  as hypothesis, 58
  as implementation, 48-49
  as scientific discovery, 55-61
    basis, 57-61
    hypothetico-deductive form of reasoning, 55-56
    Kuhnian paradigms, 57
  as specification, 48-49
  as theorem, 45
  theory of plausible designs, 11-12
Designer, world view, 32, 38
Design method, 29-30
Design paradigms
  algorithmic approach, 38-39
  analysis-synthesis-evaluation paradigm, 30-34
  artificial intelligence paradigm, 34-38
  formal design paradigm, 40-47
  methods, 29
  pattern, 30
  see also Theory of plausible designs
Design processes, 1-3
  based on theory of plausible designs, 52
  evolutionary nature, 17-28
    concepts of testing and adaptation, 18-19
    design-while-verify approach, 21-22
    distinction between iteration and evolution, 20
    laws of program evolution dynamics, 27
    modification of original requirements, 25-26
    ontogenic and phylogenic evolutions, 27-28
    parallels with hypothetico-deductive form of reasoning, 58-59
    reconciliation with formal design paradigm, 44-45
    stepwise refinements, 21
    succession of versions, 26-27
  satisficing nature, 14-17
Design style, 32, 36
Design theory, aims, 2
Design-while-verify approach, 21-22
DIALOG, 127, 129
Diffraction, 171
Diffraction-pattern sampling processor, 218-219
Digital optical processors, 220
Digital processors, 196-197
  acousto-optic processors, 206-208
  advantages, 196
  computing techniques, 199-203
  holographic processors, 204-206
  integrated-optical processors, 208-212
  number representation, 197-199
  SLM-based processors, 203-204
DIRECT, 126-128
Direct search processing, 114
Distributed associative LOGic database machine, 127, 129
Distributed network data node, 112
E
Economic analysis, 245
Effectiveness condition, management intelligence systems, 260-262
Efficiency constraints, 12
Endo-architecture, 35
ENIAC, 117
Environmental intelligence, 250
Exo-architecture, 14, 24-26
Expert systems, see Knowledge-based systems
Extension principle, fuzzy sets, 77-78
F
Falsification principle, 56
Fess system, 91-94, 102
Fiber optics, 170
  interconnect, 171-172
Financial intelligence, 249
Flexible Manufacturing Systems, 247
Formal design paradigm, 40-47
  axiom of assignment, 41-42
  limitations, 45-47
  postcondition, 40
  precondition, 40
  proof outline, 43-44
  reconciliation with design evolution, 44-45
  rules of inference, 42
Fourier analysis, 175
Fourier transformations, image content transformation, 183-186
Fourier transform processors, 175-178
Free space interconnect, 170, 172-175
Fresnel diffraction integral, 176-177
FRKB model, 99-101
Fully parallel associative architecture, low VLSI-compatible database machines, 120-121
Functionally specialized approach, 113-114
Fuzzy, definition, 69-70
Fuzzy expected interval, 82-87
  additions, 84
  mapping table, 84-87
Fuzzy expected value, 79-81
  weighted, 88-90
Fuzzy Relational Knowledge Base model, 99-101
Fuzzy sets, 69-70
  application to expert systems, 90-91
    certainty of conclusion, 93
    COFESS, 94-99
    Fess system, 91-94
    measure of belief and disbelief, 92-93
    relational knowledge base, 99-101
  bounded difference, 74
  bounded sum, 74
  complement, 74
  elasticity of, 91
  equality, 73
  extension principle, 77-78
  fuzzy expected interval, 82-87
  fuzzy expected value, 79-81
grades of membership, 71-73 intersection, 74 left-square, 75 normalized, R π-function, 76 product, 74 set theoretic operations, 73-78 S-function, 76 versus theory of probability, 78-79 union of, 73 weighted fuzzy expected value, 88-90 Fuzzy set theory, history, 72
G Gas lasers, 163 Givens rotation, 211-212 GRACE, 140-141 Gratings, 193 Guilfoyle's matrix-vector multiplier, 207-208
H Hardware description language, data fragment, 11
Heuristics, 36, 38 Hierarchical memory, Delta, 142 High-level microprogramming languages, 5 High VLSI-compatible database machines, 130-143 ASLM, 135-137 Delta, 140-142 GRACE, 140-141 MIRDM, 142-143 parallel pipelined query processor, 137-140 systolic organization, 130-132 tree machines, 131-135 Hoare formula, 40-42 Holographic interconnects, 173-175 Holographic memories, optical computers, 157-161 Holographic processors, 204-206 Holographic system, schematic diagram, 158-159 Holography, 158 Hsiao's classification, database machines, 111, 113-114 Hybrid processors, 212-213, 220
advantages, 216 algebraic processors, 214-218 diffraction-pattern sampling processor, 218-219 general processor, 213 Hypothetico-deductive form of reasoning, 55-56 parallels with evolutionary scheme, 58-59 relation to theory of plausible designs, 61
I Ill-structured problems, 34-35 Image content transformation, 183-186 Image processors, 182-186 image content transformation, 183-186 shape transformation, 186 Imaging, 172 Indifference curve, 236, 267 Inductive model, 32 Information, 231 distinctions between knowledge and understanding, 238-239 Input-output devices, optical computers, 168-170 Integrated optical matrix-vector multiplier, 210-211 Integrated-optical processors, 208-212 Integrated-optical spectrum analyzer, 194-196 Intelligence, 236 analysis, 235 as computation, 240-241 definitions, 234-235 environmental, 250 essence of, 256 financial, 249 functional capabilities, 235 general requirements, 250-252 market, 243-246 natural, 238-240 nature of, 234-236 organizational, 236-238, 249 in products, 248 in services, 248-249 technology, 246-249 timeliness, 235 triangle, 242-243 see also Artificial intelligence; Management intelligence systems
Intelligence report, 235 Intelligent production, 248 Interconnection, optical computers, 170-175 Interconnection network subsystem, MIRDM, 143 Interface processor, Delta, 140 Interference, 171 Internal maps, 237, 263-265
K KB systems, see Knowledge-based systems Kerr effect, 162 Kuhnian paradigm, 30 Knowledge, 230 distinctions between information and understanding, 238-239 Knowledge base, 35 as K-paradigm, 58-59 relation between size and effectiveness/cost ratio, 259-260 Knowledge-based systems, 2, 34, 36 application of fuzzy set theory, see Fuzzy sets certainty factor, 90-91 development, 258 objective, 90 K-paradigm, 30, 57, 59
L Large associative processor, 113 Large backend system, 112 Law of continuing change, 27 Law of increasing entropy, 27 Law of statistically smooth growth, 27 Light-emitting diodes, 206 Liquid crystal light valve, 203 Local maps, 264 Location-addressable memory, 202 Logic arrays, optical computers, 164-168 Logic functions, integrated optical implementations, 209-210 Logic-in-memory system, 113 Logic per track retrieval system, 121-122 Low VLSI-compatible database machines, 120-123
block-oriented database architecture, 121-123 fully parallel associative architecture, 120-121
M Mach-Zehnder interferometer, 209-210 Magneto-optic materials, 162 Maintenance processor, Delta, 142 Management Information Systems, 229-230 Management intelligence systems, 227-228 architecture, 253-257 diagram, 254 question-answering programs, 255-256 question-asking programs, 255 screening function, 253 aspects/kinds of technology, 230-232 background, 229-230 basic principles, 260 communication, competition and cooperation, 270-272 development lifecycles, 257-260 effectiveness condition, 233-234, 260-262 emergent properties and systemic intelligence,
M-m environmental intelligence, 250 financial intelligence, 249 general requirements for intelligence, 250-252 management in a MINTS environment, 269-270 management of, 267-269 management with, 269 market intelligence, 243-246 model for relating natural and artificial intelligence, 262-267 need for, 232-233 organizational intelligence, 236-238, 249 purpose, 228 requirements and uses, 242-243 role of machines, 266 technology intelligence, 246-249 see also Intelligence Mapping table, 84-87 Market analysis, 243-244 Market intelligence, 243-246 consumer analysis, 244 economic analysis, 245 market analysis, 243-244 market repositioning, 245-246
trade analysis, 244-245 Market repositioning, 245-246 Mass storage subsystem, MIRDM, 143 Master backend controller, MIRDM, 142 Material technology, 230 Matrix-matrix multiplication, 172, 214-215 Matrix-vector multiplier algebraic processors, 214-215 acousto-optic processors, 207-208 free-space interconnect, 172-173 integrated-optical processors, 210-211 Means-ends analysis, 36, 38 Methodology, definition, 30 Michigan Relational Database Machine, 142-143 Microcode synthesis system, 16-17, 36-38 Microoperation sequence, 37 Minimally encoded micro-instruction organization, 39 MIRDM, 142-143 Model-base management systems, 231 Modified signed-digit system, 198 Module dependency diagram, 13 Moore’s law, 117 Multiple processor direct/indirect search, 114 Multiplication matrix-matrix, 172, 214-215 matrix-vector, see Matrix-vector multiplication optical computing, 156-157 Multiprocessor backend database machine, 126-128 MultiProcessor Combined Search, 114
N NAND-based holographical digital optical processor, 205-206 Natural intelligence, 238-240 as a process/performance, 239-240 model for relation to artificial intelligence, 262-267 Neural networks, 174 accuracy effect, 188-189 behavioral differences from brains, 181 implementation technologies, 187-188 Nonlinear processors, 186-189 Non-Von DBM project, 137
Number representation, digital processors, 197-199
O Ontogenic design evolution, 28 Operating procedures, 231 Operational version, 49 Optical computers, 219 classification, 220 holographic memories, 157-161 input-output devices, 168-170 interconnections, 170-175 fiber optic interconnect, 171-172 free space interconnect, 172-175 motivation, 170-171 logic arrays, 164-168 assessment, 168 intended uses, 164-165 methods, 165-168 optical disks, 161-164 see also Analog processors; Digital processors; Hybrid processors Optical computing, 154-155 accuracy, 187-189 addition and subtraction, 155-157 advantages, 164 definition of, 154 domain, 181-188 multiplication, 156-157 spatial light modulator, 169-170 transmittance function, 155 Optical disks, optical computers, 161-164 Optical parallel logic device, 204 Optical programmable logic arrays, 208 Optical spectrum analyzer, 193-196 Optical threshold-logic gates, 200-201 Optical waveguides, 208-209 Optimal solution, computational cost, 15-16 Organizational intelligence, 236-238, 249 as a capability of competence, 237-238 as a process, 237 Overall certainty, definition, 93
P π-Function, 76 Paradigm shift, 57, 59
Parallel pipelined query processor, 137-140 Pascal, axiom and proof rules for, 41-42 Pattern combiner, 202 Patterned interconnect, 172 Pattern recognizer, 202 Pattern splitter, 202 Pattern substituter, 202 Phase-change optical materials, 162-163 Phylogenic evolution, 28, 34 Physical implementation, 49 Pipelining, 110-111 Plausibility dependency graph, 12-13 Plausibility state, 50 verification, 51-52 Plausibility statement, 12, 50-51 Postcondition, 40 Preciseness, of design requirements, 6-7 Precondition, 40 Preprocessors, ASLM, 135 Problem space, 34 current design, 35-36 definition, 35 Processing cluster subsystem, MIRDM, 143 Proof rules, 42 Proximity relation, 101
Q Qadah’s classification, database machines, 116
R RAP, 123-125 RAPID, 122 RARES, 123 Redesign, 33 Relational associative processors, 123 Relational database engine, Delta, 142 Relational knowledge base, fuzzy, 99-101 Residue number system, 197-198 Rosenthal’s classification, database machines, 112 Rotating Associative Processing for Information Dissemination, 122 Rotating Associative Relational Store, 123 Rules of inference, 42
S
Satisfactoriness, criterion, 15 Secondary storage interface, ASLM, 135 Semi VLSI-compatible database machines, 123-129 database computer, 125-126 DIALOG, 127, 129 DIRECT, 126-128 relational associative processors, 123 Set, definition, 71 Set theory, 71-72 S-function, 76 Shadow casting, 204 Shape transformation, 186 Signed-digit number system, 198 Similarity relation, 100 Single processor direct search, 114 Single processor indirect search, 114 S*M, design, 7-8 Smart peripheral system, 112 Sociotechnological aspect, 231 Software, 230 development, waterfall model, 33 phylogenic evolution, 34 Song’s classification, database machines, 114, 116 Sorting pipe, 139-140 Spatial filters, 178-182 Spatial light modulator, 155-156, 170 digital processors, 203-204 interconnect, 172 parameters, 169 Spherical lens, Fourier transform, 176 Standard procedure, 20 STARAN, 121 State-space, 263-264 Statistical analysis packages, 256 Stepwise refinement, 21 Storage hierarchy, 113 Subtraction, optical computing, 156 Surface acoustic wave transducer, 194, 196 Su’s classification, database machines, 114 Symbolic substitution, 202-203 Synthetic-aperture radar, 189-193 geometry, 191 image reconstruction, 192-193 image recording, 190-192 optical processor, 192-193 Systolic join, flow of data in, 131-132
Systolic organization, high VLSI-compatible database machines, 130-132
T Table look-up, 200-202 Technology analysis, 246-247 Technology intelligence, 246-249 production, 248 in products, 248 in services, 248-249 technology analysis, 246-247 technology management, 247-248 Technology Transfer, 249 Theory of plausible designs, 11-12, 47-55, 62 characteristics, 54-55 constraints, 49 dependency graph, 53-54 evolutionary structure, 54 logical rules of partitioning, 52 paradigm structure, 52-54 plausibility statements, 50-51 plausibility states, 50 premises, 47-49 relation to hypothetico-deductive method, 61 verification, 51-52 Theory of probability, versus fuzzy set theory, 78-79 Thermoplastics, 163 Threshold gates, 200-201 Trade analysis, 244-245
Transmission, versus incident frequency, etalon, 166-167 Transmittance function, optical computing, 155 Tree machines, 131-135 data transmission, 133 organization, 131, 133 unmirror tree, 132, 134
U Understanding, distinctions between information and knowledge, 238-239 Uniprocessor computer, design, 35 Utility-theory concepts, 263
V Valid design problem, 4 Value-map, 263-264 Vander Lugt filter, 181-182 Videodiscs, as data storage systems, 162 von Neumann bottleneck, 199-200 von Neumann machine, 199
W Waterfall model, software development, 33 Weighted fuzzy expected value, 88-90 Well-structured design problems, 15-17
Contents of Previous Volumes
Volume 1
General-Purpose Programming for Business Applications
CALVIN C. GOTLIEB
Numerical Weather Prediction
NORMAN A. PHILLIPS
The Present Status of Automatic Translation of Languages
YEHOSHUA BAR-HILLEL
Programming Computers to Play Games
ARTHUR L. SAMUEL
Machine Recognition of Spoken Words
RICHARD FATEHCHAND
Binary Arithmetic
GEORGE W. REITWIESNER
Volume 2
A Survey of Numerical Methods for Parabolic Differential Equations
JIM DOUGLAS, JR.
Advances in Orthonormalizing Computation
PHILIP J. DAVIS AND PHILIP RABINOWITZ
Microelectronics Using Electron-Beam-Activated Machining Techniques
KENNETH R. SHOULDERS
Recent Developments in Linear Programming
SAUL I. GASS
The Theory of Automata: A Survey
ROBERT MCNAUGHTON
Volume 3
The Computation of Satellite Orbit Trajectories
SAMUEL D. CONTE
Multiprogramming
E. F. CODD
Recent Developments of Nonlinear Programming
PHILIP WOLFE
Alternating Direction Implicit Methods
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
Combined Analog Digital Techniques in Simulation
HAROLD F. SKRAMSTAD
Information Technology and the Law
REED C. LAWLOR
Volume 4
The Formulation of Data Processing Problems for Computers
WILLIAM C. MCGEE
All-Magnetic Circuit Techniques
DAVID R. BENNION AND HEWITT D. CRANE
Computer Education
HOWARD E. TOMPKINS
Digital Fluid Logic Elements
H. H. GLAETTLI
Multiple Computer Systems
WILLIAM A. CURTIN
Volume 5
The Role of Computers in Election Night Broadcasting
JACK MOSHMAN
Some Results of Research on Automatic Programming in Eastern Europe
WLADYSLAW TURSKI
A Discussion of Artificial Intelligence and Self-organization
GORDON PASK
Automatic Optical Design
ORESTES N. STAVROUDIS
Computing Problems and Methods in X-Ray Crystallography
CHARLES L. COULTER
Digital Computers in Nuclear Reactor Design
ELIZABETH CUTHILL
An Introduction to Procedure-Oriented Languages
HARRY D. HUSKEY
Volume 6
Information Retrieval
CLAUDE E. WALSTON
Speculations Concerning the First Ultraintelligent Machine
IRVING JOHN GOOD
Digital Training Devices
CHARLES R. WICKMAN
Number Systems and Arithmetic
HARVEY L. GARNER
Considerations on Man versus Machine for Space Probing
P. L. BARGELLINI
Data Collection and Reduction for Nuclear Particle Trace Detectors
HERBERT GELERNTER
Volume 7
Highly Parallel Information Processing Systems
JOHN C. MURTHA
Programming Language Processors
RUTH M. DAVIS
The Man-Machine Combination for Computer-Assisted Copy Editing
WAYNE A. DANIELSON
Computer-Aided Typesetting
WILLIAM R. BOZMAN
Programming Languages for Computational Linguistics
ARNOLD C. SATTERTHWAIT
Computer Driven Displays and Their Use in Man-Machine Interaction
ANDRIES VAN DAM
Volume 8
Time-Shared Computer Systems
THOMAS N. PYKE, JR.
Formula Manipulation by Computer
JEAN E. SAMMET
Standards for Computers and Information Processing
T. B. STEEL, JR.
Syntactic Analysis of Natural Language
NAOMI SAGER
Programming Languages and Computers: A Unified Metatheory
R. NARASIMHAN
Incremental Computation
LIONELLO A. LOMBARDI
Volume 9
What Next in Computer Technology
W. J. POPPELBAUM
Advances in Simulation
JOHN MCLEOD
Symbol Manipulation Languages
PAUL W. ABRAHAMS
Legal Information Retrieval
AVIEZRI S. FRAENKEL
Large-Scale Integration-An Appraisal
L. M. SPANDORFER
Aerospace Computers
A. S. BUCHMAN
The Distributed Processor Organization
L. J. KOCZELA
Volume 10
Humanism, Technology, and Language
CHARLES DECARLO
Three Computer Cultures: Computer Technology, Computer Mathematics, and Computer Science
PETER WEGNER
Mathematics in 1984-The Impact of Computers
BRYAN THWAITES
Computing from the Communication Point of View
E. E. DAVID, JR.
Computer-Man Communication: Using Graphics in the Instructional Process
FREDERICK P. BROOKS, JR.
Computers and Publishing: Writing, Editing, and Printing
ANDRIES VAN DAM AND DAVID E. RICE
A Unified Approach to Pattern Analysis
ULF GRENANDER
Use of Computers in Biomedical Pattern Recognition
ROBERT S. LEDLEY
Numerical Methods of Stress Analysis
WILLIAM PRAGER
Spline Approximation and Computer-Aided Design
J. H. AHLBERG
Logic per Track Devices
D. L. SLOTNICK
Volume 11
Automatic Translation of Languages Since 1960: A Linguist’s View
HARRY H. JOSSELSON
Classification, Relevance, and Information Retrieval
D. M. JACKSON
Approaches to the Machine Recognition of Conversational Speech
KLAUS W. OTTEN
Man-Machine Interaction Using Speech
DAVID R. HILL
Balanced Magnetic Circuits for Logic and Memory Devices
R. B. KIEBURTZ AND E. E. NEWHALL
Command and Control: Technology and Social Impact
ANTHONY DEBONS
Volume 12
Information Security in a Multi-User Computer Environment
JAMES P. ANDERSON
Managers, Deterministic Models, and Computers
G. M. FERRERO DIROCCAFERRERA
Uses of the Computer in Music Composition and Research
HARRY B. LINCOLN
File Organization Techniques
DAVID C. ROBERTS
Systems Programming Languages
R. D. BERGERON, J. D. GANNON, D. P. SHECHTER, F. W. TOMPA, AND A. VAN DAM
Parametric and Nonparametric Recognition by Computer: An Application to Leukocyte Image Processing
JUDITH M. S. PREWITT
Volume 13
Programmed Control of Asynchronous Program Interrupts
RICHARD L. WEXELBLAT
Poetry Generation and Analysis
JAMES JOYCE
Mapping and Computers
PATRICIA FULTON
Practical Natural Language Processing: The REL System as Prototype
FREDERICK B. THOMPSON AND BOZENA HENISZ THOMPSON
Artificial Intelligence-The Past Decade
B. CHANDRASEKARAN
Volume 14
On the Structure of Feasible Computations
J. HARTMANIS AND J. SIMON
A Look at Programming and Programming Systems
T. E. CHEATHAM, JR. AND JUDY A. TOWNLEY
Parsing of General Context-Free Languages
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Statistical Processors
W. J. POPPELBAUM
Information Secure Systems
DAVID K. HSIAO AND RICHARD I. BAUM
Volume 15
Approaches to Automatic Programming
ALAN W. BIERMANN
The Algorithm Selection Problem
JOHN R. RICE
Parallel Processing of Ordinary Programs
DAVID J. KUCK
The Computational Study of Language Acquisition
LARRY H. REEKER
The Wide World of Computer-Based Education
DONALD BITZER
Volume 16
3-D Computer Animation
CHARLES A. CSURI
Automatic Generation of Computer Programs
NOAH S. PRYWES
Perspectives in Clinical Computing
KEVIN C. O'KANE AND EDWARD A. HALUSKA
The Design and Development of Resource-Sharing Services in Computer Communication Networks: A Survey
SANDRA A. MAMRAK
Privacy Protection in Information Systems
REIN TURN
Volume 17
Semantics and Quantification in Natural Language Question Answering
W. A. WOODS
Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base
NAOMI SAGER
Volume 18
Image Processing and Recognition
AZRIEL ROSENFELD
Recent Progress in Computer Chess
MONROE M. NEWBORN
Advances in Software Science
M. H. HALSTEAD
Current Trends in Computer-Assisted Instruction
PATRICK SUPPES
Software in the Soviet Union: Progress and Problems
S. E. GOODMAN
Volume 19
Data Base Computers
DAVID K. HSIAO
The Structure of Parallel Algorithms
H. T. KUNG
Clustering Methodologies in Exploratory Data Analysis
RICHARD DUBES AND A. K. JAIN
Numerical Software: Science or Alchemy?
C. W. GEAR
Computing as Social Action: The Social Dynamics of Computing in Complex Organizations
ROB KLING AND WALT SCACCHI
Volume 20
Management Information Systems: Evolution and Status
GARY W. DICKSON
Real-Time Distributed Computer Systems
W. R. FRANTA, E. DOUGLAS JENSEN, R. Y. KAIN, AND GEORGE D. MARSHALL
Architecture and Strategies for Local Networks: Examples and Important Systems
K. J. THURBER
Vector Computer Architecture and Processing Techniques
KAI HWANG, SHUN-PIAO SU, AND LIONEL M. NI
An Overview of High-Level Languages
JEAN E. SAMMET
Volume 21
The Web of Computing: Computer Technology as Social Organization
ROB KLING AND WALT SCACCHI
Computer Design and Description Languages
SUBRATA DASGUPTA
Microcomputers: Applications, Problems, and Promise
ROBERT C. GAMMILL
Query Optimization in Distributed Data Base Systems
GIOVANNI MARIA SACCO AND S. BING YAO
Computers in the World of Chemistry
PETER LYKOS
Library Automation Systems and Networks
JAMES E. RUSH
Volume 22
Legal Protection of Software: A Survey
MICHAEL C. GEMIGNANI
Algorithms for Public Key Cryptosystems: Theory and Applications
S. LAKSHMIVARAHAN
Software Engineering Environments
ANTHONY I. WASSERMAN
Principles of Rule-Based Expert Systems
BRUCE G. BUCHANAN AND RICHARD O. DUDA
Conceptual Representation of Medical Knowledge for Diagnosis by Computer: MDX and Related Systems
B. CHANDRASEKARAN AND SANJAY MITTAL
Specification and Implementation of Abstract Data Types
ALFS T. BERZTISS AND SATISH THATTE
Volume 23
Supercomputers and VLSI: The Effect of Large-Scale Integration on Computer Architecture
LAWRENCE SNYDER
Information and Computation
J. F. TRAUB AND H. WOZNIAKOWSKI
The Mass Impact of Videogame Technology
THOMAS A. DEFANTI
Developments in Decision Support Systems
ROBERT H. BONCZEK, CLYDE W. HOLSAPPLE, AND ANDREW B. WHINSTON
Digital Control Systems
PETER DORATO AND DANIEL PETERSEN
International Developments in Information Privacy
G. K. GUPTA
Parallel Sorting Algorithms
S. LAKSHMIVARAHAN, SUDARSHAN K. DHALL, AND LESLIE L. MILLER
Volume 24
Software Effort Estimation and Productivity
S. D. CONTE, H. E. DUNSMORE, AND V. Y. SHEN
Theoretical Issues Concerning Protection in Operating Systems
MICHAEL A. HARRISON
Developments in Firmware Engineering
SUBRATA DASGUPTA AND BRUCE D. SHRIVER
The Logic of Learning: A Basis for Pattern Recognition and for Improvement of Performance
RANAN B. BANERJI
The Current State of Language Data Processing
PAUL L. GARVIN
Advances in Information Retrieval: Where Is That /#*&@¢ Record?
DONALD H. KRAFT
The Development of Computer Science Education
WILLIAM F. ATCHISON
Volume 25
Accessing Knowledge through Natural Language
NICK CERCONE AND GORDON MCCALLA
Design Analysis and Performance Evaluation Methodologies for Database Computers
STEVEN A. DEMURJIAN, DAVID K. HSIAO, AND PAULA R. STRAWSER
Partitioning of Massive/Real-Time Programs for Parallel Processing
I. LEE, N. PRYWES, AND B. SZYMANSKI
Computers in High-Energy Physics
MICHAEL METCALF
Social Dimensions of Office Automation
ABBE MOWSHOWITZ
Volume 26
The Explicit Support of Human Reasoning in Decision Support Systems
AMITAVA DUTTA
Unary Processing
W. J. POPPELBAUM, A. DOLLAS, J. B. GLICKMAN, AND C. O’TOOLE
Parallel Algorithms for Some Computational Problems
ABHA MOITRA AND S. SITHARAMA IYENGAR
Multistage Interconnection Networks for Multiprocessor Systems
S. C. KOTHARI
Fault-Tolerant Computing
WING N. TOY
Techniques and Issues in Testing and Validation of VLSI Systems
H. K. REGHBATI
Software Testing and Verification
LEE J. WHITE
Issues in the Development of Large, Distributed, and Reliable Software
C. V. RAMAMOORTHY, ATUL PRAKASH, VIJAY GARG, TSUNEO YAMAURA, AND ANUPAM BHIDE
Volume 27
Military Information Processing
JAMES STARK DRAPER
Multidimensional Data Structures: Review and Outlook
S. SITHARAMA IYENGAR, R. L. KASHYAP, V. K. VAISHNAVI, AND N. S. V. RAO
Distributed Data Allocation Strategies
ALAN R. HEVNER AND ARUNA RAO
A Reference Model for Mass Storage Systems
STEPHEN W. MILLER
Computers in the Health Sciences
KEVIN C. O'KANE
Computer Vision
AZRIEL ROSENFELD
Supercomputer Performance: The Theory, Practice, and Results
OLAF M. LUBECK
Computer Science and Information Technology in the People’s Republic of China: The Emergence of Connectivity
JOHN H. MAIER