Advances
in COMPUTERS VOLUME 14
Contributors to This Volume
RICHARD I. BAUM
T. E. CHEATHAM, JR.
SUSAN L. GRAHAM
MICHAEL A. HARRISON
J. HARTMANIS
DAVID K. HSIAO
W. J. POPPELBAUM
J. SIMON
JUDY A. TOWNLEY
Advances in COMPUTERS

EDITED BY

MORRIS RUBINOFF
Moore School of Electrical Engineering, University of Pennsylvania, and Pennsylvania Research Associates, Inc., Philadelphia, Pennsylvania

AND

MARSHALL C. YOVITS
Department of Computer and Information Science, Ohio State University, Columbus, Ohio

VOLUME 14

ACADEMIC PRESS   New York   San Francisco   London   1976
A Subsidiary of Harcourt Brace Jovanovich, Publishers
COPYRIGHT © 1976, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD.
24/28 Oval Road, London NW1

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 59-15761
ISBN 0-12-012114-X
PRINTED IN THE UNITED STATES OF AMERICA
Contents

CONTRIBUTORS  ix
PREFACE  xi

On the Structure of Feasible Computations
J. Hartmanis and J. Simon
1. Introduction  1
2. Feasible Computations and Nondeterminism  6
3. Memory-Bounded Computations  17
4. Nondeterministic Tape Computations and the lba Problem  27
5. Random Access Machines  30
6. Conclusion  40
References  41

A Look at Programming and Programming Systems
T. E. Cheatham, Jr., and Judy A. Townley
1. Introduction  45
2. Some Background  46
3. Classes of Programs  48
4. Facilities for Small-Scale Programs  53
5. The EL1 Language and ECL System  55
6. Aids for the Nonexpert Programmer  69
7. Aids for the Production of Complex Programs  72
References  75

Parsing of General Context-Free Languages
Susan L. Graham and Michael A. Harrison
1. Introduction  77
2. Preliminaries  79
3. The Cocke-Kasami-Younger Algorithm  107
4. Earley's Algorithm  122
5. Valiant's Algorithm  140
6. The Hardest Context-Free Language  176
7. Bounds on Time and Space  181
References  184

Statistical Processors
W. J. Poppelbaum
1. Pros and Cons of Statistical Information Representation  187
2. An Overview of Time Stochastic Processing  190
3. Fluctuations and Precision of Stochastic Sequences  194
4. Generation of Random and Quasi-Random Sequences  195
5. Examples of Time Stochastic Machines  197
6. Bundle Processing and Ergodic Processing  205
7. Examples of Bundle and Ergodic Machines  211
8. An Overview of Burst Processing  216
9. Preliminary Results in Burst Processing  224
10. Outlook in Statistical Processing  226
References  228

Information Secure Systems
David K. Hsiao and Richard I. Baum
1. Prologue  231
2. Introduction  234
3. Toward an Understanding of Logical Access Control Mechanisms  241
4. Some Thoughts on Information-Theoretic Protection  254
5. Can We Build Information Secure Systems?  256
6. Summary and Prospectus  270
References  271

AUTHOR INDEX  273
SUBJECT INDEX  276
CONTENTS OF PREVIOUS VOLUMES  284
Contributors to Volume 14

Numbers in parentheses indicate the pages on which the authors' contributions begin.

RICHARD I. BAUM, Department of Computer and Information Science, The Ohio State University, Columbus, Ohio (231)
T. E. CHEATHAM, JR., Center for Research in Computing Technology, Harvard University, Cambridge, Massachusetts (45)
SUSAN L. GRAHAM, Computer Science Division, University of California at Berkeley, Berkeley, California (77)
MICHAEL A. HARRISON, Computer Science Division, University of California at Berkeley, Berkeley, California (77)
J. HARTMANIS, Department of Computer Science, Cornell University, Ithaca, New York (1)
DAVID K. HSIAO, Department of Computer and Information Science, The Ohio State University, Columbus, Ohio (231)
W. J. POPPELBAUM, Department of Computer Science, University of Illinois, Urbana, Illinois (187)
J. SIMON,* Department of Computer Science, Cornell University, Ithaca, New York (1)
JUDY A. TOWNLEY, Center for Research in Computing Technology, Harvard University, Cambridge, Massachusetts (45)

* Present address: Universidade Estadual de Campinas, Campinas, Sao Paulo, Brazil.
Preface
Volume 1 of Advances in Computers appeared in 1960. It is now my pleasure to edit and write the Preface for Volume 14, the intervening volumes having appeared regularly over the last 15 years. This serial publication is thus one of the oldest and most regular in this relatively new and dynamic discipline. Over the years a wide variety of diverse topics has been discussed, most of which have been timely and of considerable current, as well as long range, interest. Many of the chapters have had a significant effect on the future course of activities in computer and information science. Taken as a whole, these volumes and the chapters and topics comprising them can be considered an authoritative summary and review of computer and information science and its applications. The contributions have been written by authors who are generally considered experts in their fields.

It is a great privilege for me to serve as a coeditor of this prestigious serial publication, a role I assumed with Volume 13 upon the retirement of Dr. Franz L. Alt. It is a particular pleasure for me to work with Dr. Rubinoff, who has been an editor since Volume 3.

Volume 14 treats several particularly important and significant topics. Each area is one in which much current activity is underway and one of central interest to computer science. Furthermore, the authors who have contributed chapters to this volume are held in particularly high regard by the community.

In their chapter on the structure of feasible computations, Hartmanis and Simon of Cornell University discuss computational complexity, which over the last 15 years has become a principal part of computer science. They go on to indicate a major connection between a wide class of problems, and they also identify some central problems in complexity theory. Solution of these problems would add considerably to the understanding of computation and should have considerable practical application.

The chapter on programming and programming systems by Cheatham and Townley of Harvard University examines the programming process and considers the implications for programming systems. They point out that a language alone is not a sufficient goal inasmuch as a complete programming environment is necessary. Their comments are based on considerable experience with actual programming systems. Their experiences at Harvard are discussed in detail.
Graham and Harrison of the University of California at Berkeley discuss the parsing of general context-free languages. To be able to perform such parsing is, of course, important in computer analysis of language. This chapter focuses on grammars rich enough to generate all context-free languages. They then discuss in detail three algorithms for parsing these grammars and compare their similarities and differences. They further discuss the time bound for each algorithm.

In the chapter on statistical processors, Poppelbaum of the University of Illinois writes about statistical representation of information and suggests some alternative processing systems which utilize such representations. After analyzing, evaluating, and comparing these systems with more classical processors, he concludes that in the near future statistical processors will be practical, economical, and useful. They will, he believes, be serious alternatives to the classical binary methods.

In the final chapter, Hsiao and Baum of Ohio State look at the very important problem of providing security for information. This is a growing problem which affects everyone. Of concern is the security in procedures, hardware, software, and data base. Hsiao and Baum are largely concerned with the access control problem and privacy protection. They believe that highly efficient information-secure systems will soon be realized in computer hardware. They further claim that their model can serve as a framework for the study of future information secure systems.

I should like to thank the contributors to this volume, who have given extensively of their time and energy to make this a timely, important, and useful publication in the field of computer and information science. Editing this volume has been a most rewarding undertaking.
MARSHALL C. YOVITS
On the Structure of Feasible Computations¹

J. HARTMANIS and J. SIMON*
Department of Computer Science
Cornell University
Ithaca, New York
1. Introduction  1
2. Feasible Computations and Nondeterminism  6
3. Memory-Bounded Computations  17
4. Nondeterministic Tape Computations and the lba Problem  27
5. Random Access Machines  30
6. Conclusion  40
References  41
1. Introduction
The theory of computational complexity is the study of the quantitative aspects of computing, and it has been an active area of computer science research for some 15 years. During this time computational complexity theory has developed so rapidly that by now it is a principal part of theoretical computer science and one of the most active research areas in all of computer science.

The development of computational complexity theory can be divided roughly into two parts. The first phase of this development established the general properties of specific computational measures, such as computation time and memory used, and provided an elegant axiomatic approach for the systematic investigation of properties common to all computational complexity measures. For an overview of this work, see Hartmanis and Hopcroft (1971) and Borodin (1973).

¹ This research has been supported in part by National Science Foundation Grants GJ-33171X and DCR75-09433, Grant 70/755 from Fundacao de Amparo a Pesquisa do Estado de Sao Paulo, and by Universidade Estadual de Campinas.
* Present address: Universidade Estadual de Campinas, Campinas, Sao Paulo, Brazil.
At the risk of oversimplifying, we can say that the first phase of this research was dominated by the search for global properties of computational complexity measures and general results about the complexity of computations. The study of the complexity of specific (practical) problems yielded some interesting results, but they were isolated cases and did not dominate the main effort during the first phase of this development.

During the last five years, the emphasis in computational complexity theory has changed, and the research in this second phase of development is dominated by interest in the computational complexity of specific problems and by a general desire to understand the less complex computations which can be considered as feasible and could yield practical insights about the complexity of computations. This research has progressed very rapidly and has increased our understanding of feasible computations dramatically. As a matter of fact, this work has revealed some very deep and unexpected connections in the range of feasible computations and has exhibited an unsuspected unity of this research area. Furthermore, this research has identified several super-problems and has shown their importance for the proper understanding of feasible computations. Thus these results have not only added to our understanding of feasible computations, but have also unified this field of research and are very likely going to determine its further development by creating a consensus about what problems are important and deserve to be thoroughly investigated.

The purpose of this paper is to give the reader an overview of this development in the study of feasible computations and to give him an insight into some of the recent results, as well as an appreciation of the unity and structure that have emerged in this area of research. To achieve these goals, this overview treats only a selected set of topics which, we believe, illustrate the major ideas and show the unity of the emerging understanding and emphasize the essential open problems. For other discussions of research in this area, see Aho et al. (1974), Borodin (1973), Borodin and Munro (1975), Hartmanis and Hunt (1974), Stockmeyer (1974), and Meyer and Stockmeyer (1975).

In the first part of this paper, we discuss the definition of feasible computations, which leads to the two central families of languages which currently dominate the study of low-level complexity computations, P and NP. P is the family of languages which can be accepted by deterministic Turing machines whose running time is bounded by a polynomial (in the length of the input), and NP is the corresponding family of languages which can be accepted by nondeterministic Turing machines in polynomial time. Certainly, the languages or problems in P must be considered as feasibly computable. On the other hand, it turns out that many problems of practical importance are in NP; that is, many important problems can
be solved in polynomial time if we permit nondeterministic methods (guessing and verifying). Clearly, the deterministic polynomial time-acceptable languages form a subset of the nondeterministic polynomial time-acceptable languages, P ⊆ NP; on the other hand, it is not known whether this containment is proper. In spite of an intensive study of this problem, we are still unable to prove or disprove that P = NP. That is, it is not yet known whether any nondeterministic algorithm can be replaced by a deterministic one by increasing the computation time only polynomially. All known methods of converting nondeterministic algorithms to deterministic ones increase the computation time exponentially, but there is no proof that all methods must require an exponential loss of time. This problem is by now recognized as one of the most important open problems in the theory of computation and is known as the P = NP? problem. It was first formulated by Cook (1971), who encountered it and recognized its importance in the study of the computational complexity of proof procedures in logic. Karp (1972) exhibited many practical problems which are in NP and added considerably to the understanding of these classes of problems. Cook (1971) also showed that there exist specific problems in NP which are complete; namely, there is a language L in NP such that

L is in P   iff   P = NP.
Surprisingly, many problems of practical importance are complete in NP. Furthermore, if we have a deterministic polynomial time algorithm for any complete problem in NP, then we can effectively translate this algorithm into deterministic polynomial time algorithms for all other problems in NP. One such complete problem in NP is to determine whether two regular expressions, written over 0, 1 and with the operations of set union and concatenation, designate different sets. The corresponding language is

L_R = { (R_i, R_j) | R_i, R_j are regular expressions over 0, 1, ·, ∪, (, ) and L(R_i) ≠ L(R_j) },
where L(R) is the set of strings designated by the regular expression R. Several other NP-complete problems are mentioned in this paper, and many more are known. Many of them turn up in problems of practical importance, dramatically emphasizing the need to solve the P = NP? problem.

Another family of languages considered in this overview are those languages which can be recognized by Turing machines on polynomially bounded tape. This class is designated by PTAPE, and we know that

P ⊆ NP ⊆ PTAPE.
Furthermore, it follows from Savitch (1970) that the languages acceptable by deterministic and nondeterministic Turing machines on polynomial tape coincide: PTAPE = NPTAPE. On the other hand, it is not known whether NP = PTAPE or P = PTAPE, again indicating two very interesting open problems and exposing our lack of understanding of some very fundamental problems relating computation time and memory.

The class PTAPE also has complete languages, and the situation is very similar to the NP case. For example, the problem of determining whether two regular expressions written over 0, 1 with the operations of set union, concatenation, and Kleene star denote different regular sets is such a problem. The corresponding language is

L_R* = { (R_i, R_j) | R_i, R_j are regular expressions over 0, 1, ·, ∪, *, (, ) and L(R_i) ≠ L(R_j) }.

We show that L_R* is in PTAPE, and furthermore,

L_R* is in NP   iff   NP = PTAPE.

Similarly,

L_R* is in P   iff   P = PTAPE.
Though it is very unlikely that P = PTAPE, if we knew a deterministic polynomial time algorithm for the recognition of L_R*, we could effectively translate it into deterministic polynomial time algorithms for all languages in PTAPE.

It is interesting to note that in the languages L_R and L_R*, the Kleene star, *, seems to characterize the difference between nondeterministic polynomial time and polynomial tape computations, if there is any difference! A very similar situation arises when we consider random access machines with and without built-in multiplication, discussed later in this paper.

Since the regular sets are closed under intersection (∩) and complementation (¬), we obtain an extended regular expression language if we add these two operations. Let

L̄_R = { (R_i, R_j) | R_i, R_j are regular expressions over 0, 1, ·, ∪, *, ∩, ¬, (, ) and L(R_i) ≠ L(R_j) }.
Though this looks like a minor extension of the descriptive power of regular expressions, the computational complexity results show that this is definitely not the case. It has been shown by Stockmeyer (1974) and Meyer and Stockmeyer (1975) that L̄_R is not feasibly recognizable, since there is no Turing machine which can recognize L̄_R on

2^2^···^2^n   (a stack of k 2's)

tape for any fixed integer k. This is a striking example of a simple-looking specific problem whose computational complexity grows so rapidly that even for strings of modest length (a few hundred symbols long), we do not have a chance of computing whether the two regular expressions designate the same set, or whether a given regular expression designates the empty set. Thus the addition of complementation and intersection to the regular expression language increases the "expressive power" of the language to an extent that it permits a shortening of (some) regular expressions by any desired number of exponentials.

The language L_R* also plays an interesting role among the context-sensitive languages. For definitions and background material, see Hopcroft and Ullman (1969). We recall that it is not known whether the deterministic and nondeterministic context-sensitive languages are the same, nor is it known whether the context-sensitive languages are closed under complementation. The language L_R* is a context-sensitive language, and L_R* is a deterministic context-sensitive language iff the deterministic and nondeterministic context-sensitive languages are the same. Furthermore, the context-sensitive languages are closed under complementation iff the complement of L_R* is a context-sensitive language. Finally, if a deterministic Turing machine is given which recognizes L_R* on linear tape (i.e., using no more tape than required to write down the input), then we can find effectively for any nondeterministic Turing machine an equivalent deterministic Turing machine using no more tape than the nondeterministic one (provided it uses at least a linear amount in the length of the input). For related topics, see Hartmanis and Hunt (1974).

In the last section, we consider random access machines, which are a somewhat more realistic model of a real computer than a Turing machine. We denote these machines by RAM if they do not have a built-in multiplication (which in one step can multiply the contents of two registers) and by MRAM if they do have built-in multiplication. It can easily be seen that the languages accepted in polynomial time by RAMs are the same as the family of languages accepted in polynomial time by Turing machines, namely,

P = PTIME-RAM   and   NP = NPTIME-RAM.
Thus the classic P = NP? problem reappears for the random access machines. A much more difficult proof shows that for random access machines with multiplication, the polynomial time recognizable languages form exactly the same family as those recognizable by Turing machines on polynomial tape:

PTIME-MRAM = PTAPE,

and therefore in polynomial time the deterministic and nondeterministic MRAMs accept the same family of languages,

PTIME-MRAM = NPTIME-MRAM = PTAPE.

For related work, see Pratt et al. (1974) and Hartmanis and Simon (1974). Thus, from the previous results we read off that we can simulate multiplication in polynomial time by random access machines without multiplication iff P = PTAPE for Turing machine computations, that is,

PTIME-RAM = PTIME-MRAM   iff   P = PTAPE.

From this we get the strange and unexpected result that the equality of sets described by regular expressions can be recognized in deterministic polynomial time, L_R* in P, iff PTIME-MRAM = PTIME-RAM, which happens iff P = PTAPE for Turing machines.
2. Feasible Computations and Nondeterminism
From the early research on effective computability emerged in the first half of this century a consensus about the precise meaning of the concept "effectively computable." This consensus is expressed in Church's thesis, which in essence asserts that "a function is effectively computable (or simply computable) if and only if there exists a Turing machine which computes it" (Rogers, 1967). The class of effectively computable functions contains functions which are practically computable as well as functions which require arbitrarily large amounts of computing time (or any other resource by which we measure computational complexity), and thus these functions cannot be practically computed. So far there has not emerged any real consensus as to which functions are in principle practically computable. It is also not clear whether the concept of practically computable functions is in any sense fundamental and whether it has a mathematical invariance comparable to the class of effectively computable functions. At the same time, there is already a general agreement that a process whose computation time (on a Turing machine) cannot be asymptotically
bounded by a polynomial in the length of the input data is not a practical computation. For example, any function for which every Turing machine computing it requires at least a number of steps exponential in the length of the input is not practically computable. Thus we shall define below a computation to be feasible iff it can be computed in polynomial time on a Turing machine. This definition of feasibility of computations, besides being intuitively acceptable, has some very nice mathematical properties and shows a very rugged invariance under changes of the underlying computing model. The last point will be particularly emphasized in Section 5, when we study random access machines with different sets of operations.

To make these concepts precise and to simplify our notation, we will consider all through this paper the computational problem of accepting languages or solving suitably encoded problems with "yes" and "no" answers, such as "Does the binary input string represent a prime number in binary notation?" or "Does the given regular expression designate the set of all finite binary sequences?" In all these problems the length of the input sequence is the number of symbols in the sequence, and we express the amount of computing resource used in terms of the length of the input sequence.

Convention: A computation is feasible iff it runs in polynomial time (in the length of the input) on a deterministic Turing machine (Tm).
To be able to talk about the class of all feasible computations, we introduce the following.

Definition: Let PTIME, or simply P, designate the family of languages accepted in polynomial time by deterministic Turing machines.
It is easily seen that the class of feasible computations, PTIME, includes a wide variety of languages and solutions of problems and that it is quite invariant under changes of computing models. We will return to the last point when we discuss random access machines in a later section and encounter the same polynomially bounded classes. On the other hand, there are very many other problems and languages whose feasibility is unknown; that is, no polynomial time-bounded algorithms have been discovered for them, nor has it been shown that such algorithms do not exist. Many of these problems are of considerable practical importance, and substantial effort has been expended to find deterministic polynomial time-bounded algorithms for them.

A wide class of such important practical problems has the property that they can be computed in polynomial time if the computations can be nondeterministic. Remember that a nondeterministic Tm may have several possible transitions from a given state, and it accepts a string w if there is
a sequence of moves, starting with the initial configuration with input w and ending with a configuration in which the finite control is in a final state. The amount of resource used to accept w is the minimum over all such accepting sequences. Thus a nondeterministic Turing machine can guess a solution and then verify whether it has guessed correctly. For example, consider the set
L = { w | w ∈ 1(0 ∪ 1)* and w does not denote a prime number }.
It is not known whether L is in PTIME, but it is easily seen that L can be accepted in polynomial time by a nondeterministic Tm which guesses an integer [a binary sequence in 1(0 ∪ 1)* not longer than w] and then tests deterministically whether the integer divides w.

To give another, more theoretical computer science oriented example, let R_i and R_j be regular expressions over the alphabet consisting of 0, 1, ·, ∪ and the delineators (, ), and let L(R_i) denote the set of sequences designated by R_i. Since we have not permitted the use of the Kleene star, *, in the regular expressions, we see that we can only describe finite sets, and that the longest string in the set cannot exceed the length of the expression. Furthermore, the language
L_R = { (R_i, R_j) | L(R_i) ≠ L(R_j) }
can easily be recognized in polynomial time by a nondeterministic Tm which guesses a binary sequence w whose length does not exceed the length of the longest expression R_i or R_j, and then verifies that w is in L(R_i) or L(R_j) but not in both; the sketch below illustrates the verification step. So far no deterministic polynomial time algorithm has been discovered for this problem.
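To make the verification step concrete, here is a small sketch (ours, not from the chapter) of the deterministic check applied to a guessed witness w. It assumes the star-free expressions have already been parsed into nested tuples; with memoization the check runs in time polynomial in the size of w and of the expressions.

```python
# A sketch (ours) of the deterministic "verify" half of guess-and-verify for L_R.
# Expressions are assumed pre-parsed into tuples: ("sym", "0"), ("union", e1, e2), ("concat", e1, e2).
from functools import lru_cache

@lru_cache(maxsize=None)
def matches(expr, w):
    """True iff the string w belongs to the finite set described by the star-free expression."""
    op = expr[0]
    if op == "sym":
        return w == expr[1]
    if op == "union":
        return matches(expr[1], w) or matches(expr[2], w)
    if op == "concat":
        # Try every way of splitting w between the two subexpressions.
        return any(matches(expr[1], w[:k]) and matches(expr[2], w[k:])
                   for k in range(len(w) + 1))
    raise ValueError("unknown operator: %r" % op)

def witnesses_inequality(r_i, r_j, w):
    """The guessed string w certifies (R_i, R_j) in L_R iff w lies in exactly one of the two sets."""
    return matches(r_i, w) != matches(r_j, w)

# Example: R_i = 0 ∪ 1 and R_j = 0 differ, and the witness w = "1" certifies it.
r_i = ("union", ("sym", "0"), ("sym", "1"))
assert witnesses_inequality(r_i, ("sym", "0"), "1")
```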
The multitude of problems and languages of this type has led to the definition of the corresponding nondeterministic class of languages.

Definition: Let NPTIME, or simply NP, denote the family of languages accepted in polynomial time by nondeterministic Turing machines.

To emphasize the importance of this class of problems or languages, we list some such problems. In all these problems, we assume that we have used a straightforward and simple encoding of the problem. For a detailed discussion of such problems, see Cook (1971) and Karp (1972), and for an extensive list of NP-complete problems and a good bibliography, see Aho et al. (1974). Fagin (1974) gives an interesting discussion of NP-complete problems in logic (see also Garey et al., 1974; Hunt and Szymanski, 1975).

1. Given R_i, R_j regular expressions over 0, 1, ·, ∪, (, ), determine if the sets of sequences denoted by R_i and R_j are different, i.e., if L(R_i) ≠ L(R_j).
2. Given a formula of the propositional calculus involving only variables plus connectives (or a Boolean expression in conjunctive normal form), determine if it is "true" for some assignment of the values "false" and "true" to its variables.
3. Given a (directed) graph G, determine if the graph has a (directed) simple cycle which includes all nodes of G.
4. Given a graph G and integer k, determine if G has k mutually adjacent nodes.
5. Given an integer matrix C and integer vector d, determine if there exists a 0-1 vector x such that Cx = d.
6. Given a family of sets and a positive integer k, determine if this family of sets contains k mutually disjoint sets.
7. Given an (n + 1)-tuple of integers (a_1, a_2, ..., a_n, b), does there exist a 0-1 vector x such that Σ a_i x_i = b? (See the sketch below.)
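As a concrete instance of this pattern, consider problem 7: a nondeterministic machine simply guesses the 0-1 vector x and then checks the sum. The sketch below (ours) separates the cheap deterministic verification from the guessing, which a deterministic program can only simulate by exhaustive search.

```python
# A sketch (ours) of "guess and verify" for problem 7: does some 0-1 vector x give sum(a_i * x_i) = b?
from itertools import product

def verify(a, b, x):
    """Deterministic, polynomial-time check of a guessed certificate x."""
    return sum(ai * xi for ai, xi in zip(a, x)) == b

def exists_certificate(a, b):
    """Deterministic simulation of the guess: try all 2^n candidate vectors."""
    return any(verify(a, b, x) for x in product((0, 1), repeat=len(a)))

# Example: for (a_1, ..., a_4, b) = (3, 5, 7, 11, 12), the vector x = (0, 1, 1, 0) is a certificate.
assert verify((3, 5, 7, 11), 12, (0, 1, 1, 0))
assert exists_certificate((3, 5, 7, 11), 12)
```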
It is easily seen that all of these problems are in NPTIME by a straightforward "guessing and verifying" method. On the other hand, no deterministic polynomial time algorithm is known for any of these problems. This list of problems can easily be extended, and it is clear that it contains many practical problems for which we would very much like to have deterministic polynomial time algorithms. The question whether such algorithms exist is by now known as the

P = NP?

problem, and it has to be considered as one of the central problems in computational complexity (see Cook, 1971; Aho et al., 1974). Intuitively, we feel that P ≠ NP, though all attempts to prove it have failed. As we will show below, to prove that P = NP we do not have to show that every problem in NP has an equivalent solution in P. All we have to show is that any one of the seven previously listed problems has a deterministic polynomial time bounded algorithm. This simplifies the P = NP? problem considerably, but it still seems quite unlikely that such a deterministic polynomial time algorithm could exist. Conversely, a proof that any one of these seven problems is not in P would prove that P ≠ NP.

On the other hand, the exciting thing is that if P = NP then its proof is very likely to reveal something fundamentally new about the nature of computing. To emphasize this fact, recall that prime numbers have been studied for over two thousand years without discovering a fast (i.e., deterministic polynomial time) algorithm for their testing. Since the set of binary strings representing primes is in NP, this is just one more instance of the P = NP problem (V. R. Pratt, unpublished manuscript, 1974). As an illustration
we recall that in 1903 F. Cole showed that

2^67 - 1 = 193,707,721 × 761,838,257,287

and claimed that it had taken him "three years of Sundays" to show that 2^67 - 1 was not a prime, as previously conjectured. It is also striking how easily one can check whether the given factorization is correct, thus dramatically illustrating the difference between "finding a solution" and "verifying its correctness," which is the essence of the P = NP problem (for related problems, see Hartmanis and Shank, 1968, 1969).
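Indeed, the verification amounts to a single multiplication; the following one-line check (ours) confirms the factorization in a fraction of a second.

```python
# Verifying Cole's factorization of the Mersenne number 2^67 - 1 takes one multiplication.
assert 193707721 * 761838257287 == 2**67 - 1
```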
A very important property of the class NP was discovered by Cook (1971) when he proved that there exists a language L in NP such that if there exists a deterministic polynomial time algorithm for the recognition of L, then P = NP and we can effectively find (from the deterministic polynomial time algorithm for L) deterministic polynomial time algorithms for every L' in NP. To make these concepts precise, we define complete languages in NP as those languages to which all other languages in NP can be "easily" reduced. Note that the concepts of complete languages and reducibility will be used repeatedly in this study and that they play important roles in recursive function theory and logic (Rogers, 1967).

Definition: A language L is NP-complete (or complete in NP) iff L is in NP and for all L_i in NP there exists a function f_i, computable by a deterministic Tm in polynomial time, such that

w is in L_i   iff   f_i(w) is in L.

Proposition 2.1: If L is an NP-complete language, then L is in P iff P = NP.

Proof: To see this, note that P = NP implies that L is in P. On the other hand, if L is in P, then there exists a deterministic Tm, M, which in polynomial time accepts L. For any other L_i in NP, there exists, by definition of NP-completeness, a deterministic Tm M_i which computes a function f_i such that

w ∈ L_i   iff   f_i(w) ∈ L.
Let M_D(i) be the deterministic Tm which on input w applies M_i to compute f_i(w) and then applies M on f_i(w) to test whether f_i(w) is in L. Clearly, the deterministic Tm M_D(i) accepts L_i and operates in polynomial time since M_i and M do. Thus L_i is in P, which completes the proof.
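In programming terms, M_D(i) is just the composition of the two machines; a minimal sketch (ours, with decide_L and reduce_i standing in for M and M_i):

```python
# A sketch (ours) of M_D(i): feed the output of the polynomial-time reduction into a
# (hypothetical) polynomial-time decider for the NP-complete language L.
def decide_Li(w, reduce_i, decide_L):
    """Accepts w iff f_i(w) is in L; the composition of two polynomial-time steps is polynomial."""
    return decide_L(reduce_i(w))
```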
Next we prove that NP-complete languages actually exist by constructing a "universal NP" language L_U. This language is somewhat artificial, but it reveals very clearly why NP-complete problems exist and demonstrates a technique which has many other applications. After this proof, we show that there are also "natural" NP-complete languages. As a matter of fact, all the previously listed problems 1-7 are NP-complete.

Theorem 2.2: There exist NP-complete languages.
Proof: We will show that L_U defined below is NP-complete. Let
L_U = { #M_i#CODE(x_1x_2...x_n)#^{3|M_i|t} | x_1x_2...x_n is accepted by the one-tape, nondeterministic Tm M_i in time t },

where M_i is given in some simple quintuple form, |M_i| designates the length of the representation of M_i, and CODE(x_1x_2...x_n) is a fixed, straightforward, symbol-by-symbol encoding of sequences over alphabets of arbitrary cardinality (the input and tape alphabet of M_i) into a fixed alphabet, say {0, 1, #}, with the provision that |CODE(x_j)| ≤ cardinality of the tape alphabet of M_i. It is easily seen that a four-tape nondeterministic Tm M can accept L_U in linear time. We indicate how M uses its tapes: on the first sweep of the input, M checks the format of the input, copies M_i from the input on the first working tape and #^{3|M_i|t} on the second working tape. The third working tape is used to record the present state of M_i (in a tally notation) during the step-by-step simulation of M_i. It is seen that with the available information on its working tapes M can simulate M_i on the input in time 2|M_i|t (for an appropriate, agreed-upon representation of M_i). Thus M operates in nondeterministic linear time and accepts L_U. Therefore, L_U is in NP, and the assumption P = NP implies that L_U is accepted by a deterministic Tm M' operating in deterministic time n^p. Then for any nondeterministic Tm M_i operating in time n^q, we can recursively construct a Tm M_σ(i) operating in polynomial time as follows:
For input x1x2. . .xn M,(i) writes down
#Mi#CODE(21x2. . .
~ ~ ) # “ ~ i ’ n ~
2. M,(i) starts the deterministic machine M’ on the sequence in (1) and accepts the input 21x2. . .x,, iff M’ accepts its input.
Clearly, M i and Mu(i)are equivalent, and furthermore M,(i, operates in time less than 2[3
I M i I nq + I #Mi#CODE(x1x2. . .xn) 11” 5
Cnpq.
Thus Mccil operates in deterministic polynomial time, as was to be shown. The previous proof shows that if Lu is in P, then we can recursively obtain for every M i running in time n q an equivalent deterministic T m run-
12
J. HARTMANIS A N D J. SIMON
ning in time 0Cn~q-J. Unfortunately, for a given T m we cannot recursively determine the running time and thus we do not know whcthcr Mi runs in polynomial time or not. Even if we know that M i runs in polynomial time, we still cannot recursivcly dctcrmine thc dcgrec of the polynomial. Our next result shows that, nevertheless, we can get a general translation result (for a related result, see Hartmanis and Hunt, 1974). Theorem: 2.3: P = N P iff thcrc cxists a recursive translation u and a positive integer k, such that for every nondeterministic T m Mi, which uses timc T;(n)2 n, Mu(,)is an equivalent deterministic T m working in time OITi(n)k]. Proof: The “if” part of the proof is obvious. To prove thc “only if” part, assume that P = NP. We will outline a proof that we can recursively construct for any Mi,running time Ti (n) 2 n, an cquivalent deterministic ] a fixcd k. T m Mu(;)operating in time O I T i ( n ) k for I n our construction, we usc two auxiliary languages:
B;’ = { #w#‘ I M ; accepts w in less than t t,irnc) Bi”
=
{ # w # ~I M ion input w takes more than t time).
Clearly, the languages Bi‘ and 2* - Bi” can be accepted in nondeterministic linear time. Therefore, by our prcvious result, wc can recursively and Mi”’ which accept B1‘and construct two deterministic machines Mi’ 2* - B;“, respcctivcly, and operate in time O[np]. From Mi‘“ wc can obtain a deterministic polynomial time bounded T m Mi“ which accepts Bit’, sincc if Z* - Bi” is in NP, it is in P (by hypothesis) , and P is closed under complements. From M,‘ and Mi“ wc can recursively construct the deterministic T m Mu(;,,which operates as follows: 1. For input w, M s ( i )finds the smallest to such that #w#%is not in Birr. This is done by checking with Mi” successively, #w#,# w # ~#, w # ~ , 2. starts Mi’ on input #w#h and accepts w iff Mi’ accepts #w#h.
....
Clearly, Mu(<,is equivalent to M iand M u ( i )operates in time
c P]
Ti(n)
O[
=
o[Ti(n)p+’].
1-1
By setting k = p
+ 1, we have completed the proof.
We conclude by observing that Lv is an NP-complete problem, as defined above.
13
STRUCTURE OF FEASIBLE COMPUTATIONS
Next we assert that there exist very many natural NP-complete problems and that finding fast algorithms for some of them is of considerable practical importance. We will prove that LR = { (Ri,Rj) 1 R;, R, are regular expressions over 0, 1,
*,
u, (,) a n d L ( R d
+ L(Rj))
is NP-complete. We have chosen to use this language since the proof utilizes a technique of describing Turing machine computations by means of regular expressions, and this technique has interesting further applications. Theorem 2.4: LR is NP-complete and so are all the languages associated with the problems 1-7. Proof: We prove only that the first problem on our list is NP-complete; for the other proofs, see Karp (1972) or Aho et al. (1974).
The proof that LR is NP-complete relies heavily on the fact (proved below) that regular expressions can be used to describe the “invalid computations” of nondeterministic Tm’s. More explicitly, for every nondeterministic Tm Mi operating in polynomial t*ime,there exists a deterministic Tm which for input x I z . , . xn in polynomial time writes out a regular expression describing the invalid computations of Mi on input 21x2 . . . x,. Note that the input xlxz . . . xn is accepted by Mi iff there exists a valid Mi computation on this input, which happens iff the set of invalid M i computations on 21x2 . . . x, is not the set of all sequences (of a given length). Thus the test of whether 21x2 . . . xn is accepted by Mi can be reduced to a test of whether the regular expression, describing the invalid computations of M i on 21x2 . . xn,does not describe all sequences (of a given length). This implies that if LR is in P then P = NP, and thus we see that LR is NP-complete. We now give the above outlined proof in more detail. Let M i be a onetape, nondeterministic Tm which operates in time nk (we assume without loss of generality that Mi halts after exactly nk steps). Let S be the set of states of Mi,qo the unique starting state, q t the unique accepting final statc, and let I; be the tape alphabet of Mi.An instantaneous description of Mi is a sequence in Z*(X X S)Z* which indicates the tape content of Mi, the state Mi is in, and which symbol is being scanned. VCOMP (XIXZ. . .x n ) denotes the set of valid computations of Mi on input 51x2 . . xn. A valid computation consists of a scquence of instantaneous descriptions
.
.
# ID1 # ID2 # ID, # . . . # ID,’ #
14
J. HARTiMANIS AND J. SIMON
such that (a)
IDl
=
(a, @)x&
. . . xnbnk-"
(b) IDn&E Z'(2 X (y,))Znk-'-' (c) IDi+l follows from ID,, 1 5 j that for all i, 1 ID, I = nk.
5 k - 1 by one move of Mi. Note
Define I' as
r
=
L: u z :
x s u I#)
and let the invalid set of computations be givcn by
NVCOMP(x122 . . . xn)
= I'nek+nk+l
- VCOMP(x1xz . . . zn).
We show next that there exists a deterministic Tm, MD, which constructs fur every input z1x2. . . xn in polynomial time a regular expression using only ., u, denoting thc set of sequences NVCOMP. To scc this, note that NVCOMP consists of
Rl = set of scquencrs which do not start correctly Rz = set of sequcnces which do not end correctly
R3 = set of sequences which do not have a proper transition from IDj to IDj+l* Thus
NVCOMP(x122 . . . z,~) = Ri
U I22 U
R3.
IJct $ dcsignate any y, y # x, y E I?. Note that the regular expression for 3 has length of the order of the sizc of rri.e., a constant. Then
R~ = #rn'k+nk R~ =
ur
{ r- (I:x
qo) {q,))
u p$2rn2k+nk-2 u .
rn2k+nk-1
)nzk+nk+l
u
pek+nk
I
,
u rnk+l$rnzk-l
(r - { # I )
P-0
wherc CORRECT(alaza3) is the set of correct M itransitions in one move from u1u~u3,in the following instantaneous description. These triples arc sufficient to specify the transitions, since in a single move only the square being scanned may be modified and the only other possible change in the ID is the position of the read/write head. Since the head moves a t most one square, the set CORRECT suffices to characterize
STRUCTURE OF FEASIBLE COMPUTATIONS
15
valid transitions. For example, if the Tm in state q, upon reading a 0 may either print a 1 or a 0 and move right, remaining in the same state, then C O R M K T (ni (0, q ) 6 3 ) CORRECT ( (0, Q) uz, 03) CORRECT (cia2 (0, q ) )
a ) ) , (a11( ~ 3 ,q ) ) ]
=
{ (a10(Q,
=
{ (0 (az, q ) ~
=
{ (aiazO), (ciazl) 1.
, (1(62,q ) ~ 3 1)
3 )
It is easily seen that for any given T m CORRECT is a finite set depending only on the alphabet and on the transition rules of the machine, but not on the input. A straightforward computation shows that R1 U Rz u RI can be written out for input XIXZ . . . xn by a deterministic T m in polynomial time in n. Thus the desired M D exists. If LR is in P , then we have a deterministic T m M c which in polynomial time accepts (Ri,Rj), provided L (Ri) # L (Rj).But then, combining MD with this Mc, we get a deterministic polynomial time T m which for input xlxz , , , xn writes out (using M D ) NVCOMP(2ixz. . . xn) and then checks (using M c ) whether the expressions are unequal, NVCOMP(xlxz . . . 2,) #
rnPk+nk+i.
Clearly, the rcgular expressions are unequal iff there is a valid computation of M i on ~1x2. . . xn, but that happens iff M i accepts this input. Since M B and MC operate in deterministic polynomial t,ime, the combined machine accepting L ( M i ) also operates in deterministic polynomial time. Thus LR in P implies P = N P . Clearly, P = N P implies that LR is in P since it is easily seen that LRis in NP. This completes the proof. For the sake of completeness, we mention that it is not known whether the language
Lp
=
( w ( wf l ( O U l ) * and w designates a composite number]
is an NP-complete problem. It is easily seen that LP is in N P , as stated before. A somewhat more difficult proof shows that, surprisingly,
LP =
{w1 w E l ( 0
U 1)*
and w designates a prime)
is also in N P (V. R. Pratt, unpublished manuscript, 1974). Thus the “guess and verify” method can be used to design (nondeterministic) polynomial time algorithms to test whether an integer is or is not a prime. Since LP and LP (or LP = (0 U 1 ) * - LP) are in N P , and for no NP-
complete problem L is it known that & is in NP, it seems unlikcly that either LP or &p could be NP-complete. A very recent result (Miller, 1975) shows that if the Generalized Riemann Hypothesis (GRH) holds, then tho set L p is in P. Unfortunately, there is no proof that the GRH holds. For results about the amount of tape required in the recognition of the set LP and L p , see Hartmanis and Shank (1968,1969), and Miller (1975). Note that, if we could show for an NP-complete problem L that E is not in NP, then we would have a proof that P # NP, since L’ in P implies that is in P and thus in NP. A proof of this type could possibly show that P j r N P without giving any insight into the actual deterministic time complexity of the class NP. Our current understanding of these problems is so limited that we cannot rule out either of the two extremes: (a) that P = N P and we need only polynomial time-bounded deterministic algorithms, or (b) that there exist L in N P which requirc an exponential amount of time for their recognition. As stated before, it appears that a proof that P = N P will have to reveal something new about the nature of computation. Similarly, a proof that for all L in NP, & is in NP, which could happen even if P # NP, would have to reveal something unexpected about the process of computation. To emphasize this, consider again the set of unequal regular expressions over 0, 1, - 1 u, (,I: LR = { (Rtl Rj) I L ( E ) f I J ( R , )1. As observed before, LR is easily seen to be in NP. On the other hand, it seems impossible (with our current statc of knowledge about computing) that this computation could be carried out in deterministic polynomial time. Similarly, it seems impossible that the set of pairs of equal regular expressions, L R =~ { (Rt, &) I L ( R , ) = L(%) 1, could be in NP, since in this case we would have to give a proof in nondeterministic polynomial time that there does not exist any sequence on which R , and R, differ. This appears to be a completely different situation than the proof that LR is in NP, and we do not know any methods which can exploit the power of nondcterminism to yield such a proof. It can be shown that L R - Ihas the same “completeness” property with respect to NPc, i.e., the set of languages L such that Z* - L is in NP, as LR has with rcspcct t o NP. We conclude this section by observing that if P # NP, and N P is not closed under complementation, then P and N P show a good low complexity level analogy to the recursive and recursively enumerable sets; P corresponding to the set of recursive sets and N P to the sets of recursively enumerable sets.
3. Memory-Bounded Computations
In the study of the complexity of computations, there are two natural measures: the time or number of operations and the tape or memory space used in the computation. It is strongly suspected that there exist interesting and important connections between these two complexity measures and that a central task of theoretical computer science is to understand the trade-offs between them (Minsky, 1970). In this section we discuss the problem of how much memory or tape is required for the recognition of the classes P and NP and some related problems. It will be seen that this study again leads us very quickly to some interesting open problems and reveals some intriguing analogies with previous problems which will be further pursued in Section 5, which deals with random access machines.

In analogy to the time-bounded Tm computations, we define memory-bounded complexity classes.

Definition: Let PTAPE (NPTAPE) denote the family of languages accepted by deterministic (nondeterministic) Turing machines in polynomial tape.
Clearly, a Turing machine operating in polynomial time can visit only a polynomially bounded number of different tape squares, and therefore we have
P ⊆ PTAPE   and   NP ⊆ NPTAPE.
Furthermore, any nondeterministic Tm M_i operating in time n^k can make no more than n^k different choices. On polynomial tape a deterministic Tm can successively enumerate all possible q^{n^k} sequences of choices M_i can make and for each sequence of choices simulate deterministically on polynomial tape the corresponding M_i computation. Therefore we obtain

Proposition 3.1:
NP ⊆ PTAPE.
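The enumeration argument behind Proposition 3.1 can be phrased as a program skeleton. The following sketch (ours; the step function and the constants are hypothetical stand-ins for M_i) makes explicit that only one choice sequence is simulated at a time, so the same polynomial-size work space is reused throughout.

```python
# A sketch (ours) of the Proposition 3.1 simulation: a deterministic machine tries every
# sequence of nondeterministic choices in turn, reusing the same polynomial-size work space.
from itertools import product

def accepts_deterministically(w, start, step, accepting, q, time_bound):
    """Exponentially many choice sequences, but only one configuration lives at any moment."""
    for choices in product(range(q), repeat=time_bound):
        config = start(w)                 # restart on the same work space
        for c in choices:
            config = step(config, c)      # follow this particular sequence of choices
        if accepting(config):
            return True
    return False
```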
On the other hand, it is not known whether there exists a language L in PTAPE that is not in NP. Intuitively, one feels that there must be such languages, since in polynomial tape a Turing machine can perform an exponential number of operations before halting. At the same time, nobody has been able to prove that this exponential number of operations, restricted to polynomial tape, can be utilized effectively to accept some language not in NP. Thus we are led to another central problem in computational complexity: is NP = PTAPE or possibly P = PTAPE?
At the present time we have to conjecture that NP ≠ PTAPE, since nothing in our knowledge of memory-bounded computations suggests that PTAPE computations could be carried out in polynomial time. Furthermore, NP = PTAPE would have, as we will see later, some very strong and strange implications. As stated in the previous section, it is not known whether P = NP, and this is a very important problem in complexity theory as well as for practical computing. The situation for tape-bounded computations is different (Savitch, 1970).

Theorem 3.2: Let L(n) ≥ log n be the amount of tape used by a nondeterministic Tm M_i. Then we can effectively construct an equivalent deterministic Tm using no more than [L(n)]^2 tape.

From this result we get immediately

Corollary 3.3:
PTAPE = NPTAPE.
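The chapter cites Savitch (1970) for Theorem 3.2 without reproducing the argument; the following sketch (ours, not from the text) illustrates the standard divide-and-conquer idea usually attributed to that proof, with configs and step_relation as assumed stand-ins for the set of L(n)-tape configurations and the one-step relation of the nondeterministic machine.

```python
# A sketch (ours) of the recursive reachability idea behind Theorem 3.2: c1 reaches c2 within
# t steps iff some midpoint cm is reachable from c1 and reaches c2, each within about t/2 steps.
# Recursion depth is about log t; a Turing machine realization keeps one candidate midpoint per
# level, giving deterministic space O(L(n) * log t) = O(L(n)^2) when t = 2^{O(L(n))}.
def reachable(c1, c2, t, configs, step_relation):
    """True iff c2 is reachable from c1 in at most t steps of the nondeterministic machine."""
    if t <= 1:
        return c1 == c2 or step_relation(c1, c2)
    return any(reachable(c1, cm, t // 2, configs, step_relation) and
               reachable(cm, c2, t - t // 2, configs, step_relation)
               for cm in configs)
```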
In Section 5, we will give a new proof (without using Savitch's result) that PTAPE = NPTAPE, as part of our characterization of the computational power of multiplication in random access machines.

Next we will show that PTAPE has complete problems, just as NP did, and thus to show that NP = PTAPE (P = PTAPE), we only have to show that one specific language which is complete in PTAPE is also in NP (P).

Definition: We say that a language L in PTAPE is tape-complete iff for every L_i in PTAPE there exists a deterministic polynomial time computable function f_i such that
w ∈ L_i   iff   f_i(w) ∈ L.

From this we immediately get
LuT
=
{#M,#CODE(z1z2. . . xn)#IM~lr I M , accepts z1z2. . . xn using no more than 2 I M , I 1 CODE(zls . . . zn)I
+
+ t tape squares}
LR*
=
+
( R , I R, is a regular expression over 0, 1, U, a n d L ( R , ) # (0 U 1)*)
a,
*,
(,)
LUT and LR* are tape-complete languages. Thus iff JJUT E N P ( P ) N P = PTAPE ( P = P T A P E ) iff LR* E N P ( P I .
Theorem 3.5:
STRUCTURE OF FEASIBLE COMPUTATIONS
19
Proof: It is easily seen that LuT is accepted on the amount of tape needed to write down t,he input, if we permit nondeterministic operations. Thus LUTis in PTAPE. Furthermore, if Li is in PTAPE, then there exists a deterministic Tm M i which accepts Li in nk tape, for some Ic. But then thcrc exists a Tm, M,(i), which for input zlz2 . . . zn writes
#Mi#CODE(zlz2 . . . zn)#IMilnk on its tape in deterministic polynomial time. Designate the function computed by MCci) by fi.Then w is in Li iff fi(UI)is in LUT,and we see that LUTis tapc-complete. To prove that LR* is tape-complete, observe that a nondeterministic T m can guess a sequence and then on a linear amount of tape and using standard techniques (McNaughton and Yamada, 1964) check that the sequence is not in Ri. Thus LR* is in P T A P E ; as a matter of fact, LR* is a contextsensitive language, as is Lup. Now we will again exploit the power of regular expressions to describe invalid T m computations efficiently. For a Tm M i which opcrates on nktape, we define VCOMP(zlz2.. . x,)
=
#IDl#ID2#.
. . #IDHALT#
just as in Section 2, with 1 I D j I = nk.Since we now have the Hleene star available in our regular expressions, we define
NVCOMP
=
r* - VCOMP.
Thus NVCOMP(XIX~ . . . 2),
=
Ri U Rz
U R3,
where R1, R2, and R3 represent the sets of strings which do not start right, which do not end right, and in which there is a,n incorrect transition from IDi t o IDi+l, respectively. The details are quite similar to the proof in Section 2, and we write down the expressions for R1, R2, and R3 to indicate the use of the *, which was not available in the other proof. It should be pointed out that M i could perform an exponential number of steps before halting and therefore we cannot use the techniques of the previous proof. This proof makes an essential use of the Kleene star. Again Z denotes any y, y # x,and y in r = Z U { # ) U Z X S, qo the initial state and qt the final state, q f Z 4 0 ,
[# U #[ (21,U)(21, q o ) [Zz U z2[f3 U zQ[.. . U $1. . .IF* R2 = - (Z x f q r I ) l * u r*(r - { # I > R3 = r * u l a 2 u 3 ~ - ~r3 [ - CORRECT (uiuZu3)]I?* Ri
=
where CORRECT(uluzu3) is a correct sequence of the next ID if in the previous I D in the corresponding place appears u1u2u3,including #'s.
20
J. HARTMANIS AND J. SIMON
.
It is easily seen that a deterministic Tm exists which for input X g a . .xn writes out the regular expression R1 U Ra U Rson its tape in deterministic polynomial time. If we denote this function by f i , we see that iff fi(w) = RI U R2 U Rs # I'* since w is accepted by M iiff there exists a valid computation of Mi on w, but this happens iff NVCOMP(w) # I"*. This concludes the proof that LUTand LR*are tape-complete languages, and we see that N P = PTAPE (P = PTAPE) iff LE*or LUTis in N P (P). It should be pointed out that LR* is just one example of tape-complete problems about regular expressions. We can actually state a very general theorem which characterizes a large class of such problems or languages (Hartmanis and Hunt, 1974) , which shows that, for example, ( R I R regular expression and L ( R ) # L (R*)1 { R I R regular expression and L ( R ) is cofinite) are two such tape-complete languages; many others can be constructed using this result. It is interesting to note that w is accepted by Mi
LR = ( (Ri, Rj) I Ri,Rj regular expressions over 0 , 1, U, *,
(,I
andL(Ri) f L ( R j ) )
is an NP-complete problem, and when we added the expressive power of the Kleene star, *, the language
LR* = { (Ri, Rj) I Ri, Rj regular expressions over 0, 1, U,
., *, (,I and L(R4
f
L(Ri))
became a tape-complete language. Though we cannot prove that N P f PTAPE, we conjecture that they are different, and therefore the Kleene star made the decision problem more difficult by the difference, if any, b e tween N P and PTAPE. It should be noted that without the Kleene star, we cannot describe an infinite regular set and with , U, *, all regular sets can be described. From this alone, we would suspect that the decision problem (recognition of) LR*should be harder than for LR. Whether it really is harder, and by how much, remains a fascinating and annoying open problem. Another interesting tape-complete combinatorial problem related to the board game of HEX is given by Even and Tarjan (1975). To emphasize a further analogy between N P and PTAPE, we take a quick look at logic. Recall that all the expressions in propositional calculus which for some assignment of variables become true form an NP-complete language. Thus we have from Cook (1971) ,
STRUCTURE OF FEASIBLE COMPUTATIONS
21
Theorem 3.6: The problem of recognizing the satisfiable formulas of the propositional calculus is an NP-complete problem. Similarly, the set of true sentences (tautologies) of the propositional calculus is a complete language for N P .
The next simplest theory, the first-order propositional calculus with equality (1EQ) is a language that contains quantifiers, but no function symbols or predicate symbols other than = . The following result characterizes the complexity of this decision problem. The problem of recognizing the set of true sentences for the first-order propositional calculus (1EQ) is complete in P T A P E . Theorem 3.7:
Proof: See Meyer and Stockmeyer (1973).
Again we see that the difference in complexity between these two decision problems is directly related to the difference, if any, between N P and P TAPE. Next we take a look at how the computational complexity of the decision problem for regular expressions changes as we permit further operations. We know that all regular sets can be described by regular expressions using the operators u, -, *. At the same time, regular sets are closed under set intersection and set complementation. Therefore we can augment our set of operators used in regular expressions by n and 1 , and we know from experience that these two operators can significantly simplify the writing of regular expressions. The surprising thing is th a t it is possible to prove that the addition of these operators makes the decision problems about regular expressions much harder (Meyer and Stockmeyer, 1972, 1975; Hunt, 1973a). I n particular, as we will see, the addition of the complementation operator makes the decision problem for equivalence of regular expressions practically unsolvable (Stockmeyer, 1974; Meyer and Stockmeyer, 1972, 1975). Theorem 3.8:
The language

    L_R¬ = { (Ri, Rj) | Ri, Rj regular expressions over 0, 1, ·, ∪, *, ¬, (, ) and L(Ri) ≠ L(Rj) }

cannot be recognized, for any k, on

    2^(2^( ... ^(2^n) ... ))        (a stack of k 2's)

tape. In other words, L_R¬ cannot be recognized on tape bounded by an elementary function. The basic idea of the proof is very simple: if we can show that using extended regular expressions (i.e., with ·, ∪, *, ¬) we can describe the valid computations of Tm's using very large amounts of tape by very short regular expressions that are easily obtained from the Tm and its input, then the recognition of L_R¬ must require very large amounts of tape. To see this, note that to test whether w is in L(M_i) we can either run M_i on w, using whatever tape M_i requires, or else write down the regular expression R = VCOMP(w), describing the valid computations of M_i on w, and then test whether L(R) = ∅. Since w is accepted by M_i iff there exists a valid M_i computation on w, if the expression R is very short and the recognition of L_R¬ does not require much tape, then this last procedure would save us a lot of tape. This is impossible, since there exist languages whose recognition requires arbitrarily large amounts of tape, and these requirements cannot be (essentially) decreased (Stearns et al., 1965; Hopcroft and Ullman, 1969). Thus either method of testing whether w is in L(M_i) must require a large amount of tape, which implies that the recognition of L_R¬ must require a large amount of tape. The reason the addition of complementation permits us to describe very long Turing machine computations economically is also easy to see, though the details of the proof are quite messy. The descriptive power is gained by using the complement to go from regular expressions describing invalid computations to regular expressions of (essentially) the same length describing valid computations. For example, consider a Tm M_i which, on any input of length n, counts up to 2^n and halts. Using the techniques from the proof of Theorem 3.5, we can write down
NVCOMP(x1x2 ... xn) for this machine on cn tape squares, where c is fixed for M_i. But then
¬NVCOMP(x1x2 ... xn) = VCOMP(x1x2 ... xn), and we have a regular expression of length ≤ cn + 1 < Cn describing a computation which takes 2^n steps; thus VCOMP(x1x2 ... xn) consists of a single string whose length is greater than 2^n. Next we indicate how the above regular expression is used to obtain a short regular expression for NVCOMP(x1x2 ... xn) of Tm's using 2^n tape squares. A close inspection of the proof of Theorem 3.5 shows that NVCOMP(x1x2 ... xn) = R1 ∪ R2 ∪ R3 and that the length of R1 and R2 grows linearly with n and does not depend
on the amount of tape used by the Tm. Only R3 has to take account of the amount of tape used in the computation, since R3 takes care of all the cases where an error occurs between successive instantaneous descriptions. In the proof of Theorem 3.5, we simply wrote out the right number of tape symbols between the corresponding places where the errors had to occur. Namely, we used the regular expression

    Γ^(n^k − 2)

as a "yardstick" to keep the errors properly spaced. The basic trick in this proof is to have short regular expressions for very long yardsticks. As indicated above, by means of complements we can write a regular expression for VCOMP(x1x2 ... xn) of M_i which grows linearly in n and consists of a single sequence of length greater than 2^n. With a few ingenious tricks and the accompanying messy technical details, we can use this regular expression as a yardstick to keep the errors properly spaced in a Tm computation using more than 2^n tape. Thus a regular expression which grows linearly in length can describe Tm computations using 2^n tape. By iterating this process, we can construct for any k a regular expression whose length grows polynomially in n and which describes computations of Tm's using more than

    2^(2^( ... ^(2^n) ... ))        (a stack of k 2's)

tape squares for inputs of length n. From this we conclude, by the previously outlined reasoning, that L_R¬ cannot be recognized by any Tm using tape bounded by an elementary function. For the sake of completeness, we will mention a result (Hunt, 1973a) about regular expressions without ¬, but with ∪, ·, *, and ∩.
Theorem 3.9: Let

    L_R∩ = { (Ri, Rj) | Ri, Rj regular expressions over 0, 1, ∪, ·, *, ∩, (, ) and L(Ri) ≠ L(Rj) }.
Then the recognition of L_R∩ requires tape L(n) ≥ 2^(n^(1/2)).

From the two previous results, we see that the additional operators in regular expressions added considerably to the descriptive power of these expressions, in that the added operators permitted us to shorten regular expressions over ∪, ·, *, and that this shortening of the regular expressions is reflected in the resulting difficulty of recognizing L_R¬ and L_R∩, respectively. Thus, in a sense, these results can also be viewed as quantitative results about the descriptive powers of regular expressions with different operators. Finally we note the rather surprising result that for unrestricted regular expressions over single-letter alphabets the equivalence is decidable in elementary tape.

Theorem 3.10: The language

    L_SLA = { (Ri, Rj) | Ri, Rj regular expressions over 1, ∪, ·, *, ¬, (, ) and L(Ri) = L(Rj) }

can be recognized on L(n) ≤ 2^(2^n) tape.

Proof: See Rangel (1974) for an elementary bound, which can be improved to the above result (Meyer, personal communication).
In the previous proofs we established the complexity of the recognition of unequal pairs of regular expressions by the following method: we described valid or invalid Tm computations by regular expressions and then related the efficiency of describing long Tm computations by short regular expressions to the complexity of the decision problem; the more powerful the descriptive power of our expressions (or languages), the harder the corresponding decision problem. We can actually state this somewhat more precisely as follows.

Heuristic Principle: If in some formalism one can describe with expressions of length n or less Tm computations using tape up to length L(n),
then the decision procedure for equality of these expressions must be of at least tape complexity L(n). For example, if a formalism enables us to state that a Tm accepts an input of length n using tape at most 2^n, and the length of such an expression is n^2, then any procedure that decides equality of two such expressions will have tape complexity of at least 2^(n^(1/2)). In Hunt (1973a) it is shown that regular expressions over 0, 1, ∪, ∩, ·, * are such a formalism, from which Theorem 3.9 follows. Clearly, this principle also implies that if the formalism is so powerful that no computable function L(n) can bound the length of tape used in Tm computations which can be described by expressions of length n, then the equivalence problem in this formalism is recursively undecidable. Thus this principle gives a nice view of how the expressive power of languages
escalates the complexity of decision procedures until they become undecidable, because the length of the Tm computations is no longer recursively bound to the length of the expressions describing them. Thus a formalism in which we can say "the ith Tm halts," so that the length of this formula grows recursively in i, must have an undecidable decision problem. Very loosely speaking, as long as we can in our formalism make assertions about Tm computations without describing the computations explicitly, we will have undecidable decision problems. As long as we must describe the Tm computations explicitly, the equivalence problem will be soluble, and its computational difficulty depends on the descriptive power of the formalism. Some of the most interesting applications of this principle have yielded the computational complexity of decision procedures for decidable logical theories. The results are rather depressing in that even for apparently simple theories the decision complexity makes them practically undecidable. We cite two such results. We first consider the decision procedures for Presburger arithmetic (Presburger, 1930). We recall that Presburger arithmetic consists of the true statements about integer arithmetic which can be expressed by using the successor function S, addition, and equality. More formally, the theory is given by the axioms of first-order predicate logic augmented by
    (x = y) → (S(x) = S(y))
    (S(x) = S(y)) → (x = y)
    S(x) ≠ 0
    x + 0 = x
    x + S(y) = S(x + y)

together with the induction scheme,
where x is not free in A and A[x](y) means y substituted for every occurrence of x in A. The theory can express any fact about the integers that does not involve multiplication. For example, by writing

    S(S( ... S(0) ... ))        (i times)

we get a formula that denotes the integer i; by writing

    x + x + ... + x        (n times)

where x is a formula, we may denote nx (i.e., multiplication by a constant); by writing (∃z)[s + z = t], we express the fact that s ≤ t; and by writing

    (∃z)[ (r = s + z + z + ... + z) ∨ (s = r + z + z + ... + z) ]        (n occurrences of z in each sum)

we are stating that r ≡ s (mod n).
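As a concrete instance of this schema (added here for illustration, taking n = 3), the congruence r ≡ s (mod 3) is expressed by the sentence

    (∃z)[ (r = s + z + z + z) ∨ (s = r + z + z + z) ],

which is true of nonnegative integers r and s exactly when their difference is a multiple of 3.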
It is a famous result of Presburger (1930) that this theory is decidable: basically the reason is that all sentences of the theory can be put effectively into the form of a collection of different systems of linear diophantine equations, such that the original sentence is true iff one of the systems has a solution. Since linear diophantine equations are solvable, the theory is decidable. The transformation into and solution of the equations is costly in terms of space and time: the best known algorithm (Oppen, 1973) has an upper bound of

    2^(2^(2^(pn log n)))

on the deterministic time and storage required for a sentence of length n (p is a constant greater than 1). Recently it has been shown (Fischer and Rabin, 1974) that any decision procedure will require at least a super-exponential number of steps. More precisely,

Theorem 3.11: Let M be a Tm (possibly nondeterministic) recognizing the true theorems of Presburger arithmetic. Then there exists a constant c > 0 and an integer n0, such that for all n > n0 there is a formula of length n which requires M to perform 2^(2^(cn)) steps to decide whether the formula is true.

The proof of this result is technically quite messy, but again follows the principle of describing by short formulas in Presburger arithmetic long Tm computations, thus forcing the decision procedure to be complex. Next we look at a surprising result due to Meyer (1973) about the decision complexity of a decidable second-order theory. A logical theory is second-order if we have quantifiers ranging over sets in the language. It is weak second-order if set quantifiers range only over finite sets. All second-order theories have, in addition to first-order language symbols, a symbol, e.g., ∈, to denote set membership. The weak monadic second-order theory of one successor has the two predicates
    [x = S(y)]   (or x = y + 1)        and        [y ∈ X]
with the usual interpretation. It was shown to be decidable by Büchi (1960) and Büchi and Elgot (1959). We shall abbreviate the theory by WS1S, and the set of sentences of its language by L_S1S.
Theorem 3.12: Let M be a Tm which, started with any sentence of L_S1S on its tape, eventually halts in a designated halting state iff the sentence is true. Then, for any k ≥ 0, there are infinitely many n for which M's computation requires more than

    2^(2^( ... ^(2^n) ... ))        (a stack of k 2's)

steps and tape squares for some sentence of length n. In other words, the decision procedure is not elementary recursive.

These results actually hold already for small n (of the order of the size of the Tm). Since they hold for the amount of tape used, the same bounds apply to lengths of proofs, in any reasonable formalism. Therefore, there are fairly short theorems in these theories (less than half a page long) that simply cannot be proved; their shortest proofs are too long ever to be written down! The implications of these "practical undecidability" results are not yet well understood, but their philosophical impact on our ideas about formalized theories may turn out to be comparable to the impact of Goedel's undecidability results (Rogers, 1967).

4. Nondeterministic Tape Computations and the lba Problem
It is known, as pointed out before, that PTAPE = NPTAPE and that a nondeterministic L(n)-tape-bounded computation [L(n) ≥ log n] can be simulated deterministically on L^2(n) tape (Savitch, 1970). On the other hand, it is not known whether we cannot do better than the square when we go from nondeterministic to deterministic tape-bounded computations. As a matter of fact, we do not know whether we cannot eliminate nondeterminism in tape-bounded computations by just enlarging the tape alphabet and not the amount of tape used. This problem of how much memory we can save by using nondeterministic computations has been a recognized open problem since 1964, when it first appeared as a problem about context-sensitive languages or linearly bounded automata (Myhill, 1960; Landweber, 1963; Kuroda, 1964; Hartmanis and Hunt, 1974). For the sake of completeness we recall that a linearly bounded automaton (lba) is a one-tape Turing machine whose input is placed between endmarkers which the Tm cannot go past. Thus all the computations of the
lba are performed on as many tape squares as are needed to write down the input, and since the lba can have an arbitrarily large (but fixed) tape alphabet, we see that the amount of tape for any given lba (measured as length of equivalent binary tape) is linearly bounded by the length of the input word. If the Tm defining the lba operates deterministically, we refer to the automaton as a deterministic lba, otherwise as a nondeterministic lba or simply an lba. Since the connection between linearly bounded automata and context-sensitive languages is well known (Hopcroft and Ullman, 1969), we will also refer to the languages accepted by nondeterministic and deterministic lba's as nondeterministic and deterministic context-sensitive languages, respectively. Let the corresponding families of languages be denoted by NDCSL and DCSL, respectively. The lba problem is to decide whether NDCSL = DCSL. It is also an open problem to decide whether the nondeterministic context-sensitive languages are closed under complementation. Clearly, if NDCSL = DCSL, then they are closed under complementation, but it still could happen that NDCSL ≠ DCSL and that the context-sensitive languages are closed under complementation. We now show that there exist time and tape hardest recognizable context-sensitive languages. That is, the family NDCSL has complete languages and, as a matter of fact, we have already discussed such languages in this paper. Recall that
    L_R* = { Ri | Ri a regular expression over 0, 1, ∪, ·, *, (, ) and L(Ri) ≠ (0 ∪ 1)* }

and let

    L_LBA = { #M_i#CODE(x1x2 ... xn)# | x1x2 ... xn is accepted by lba M_i }.
Theorem 4.1:

1. DCSL = NDCSL iff L_R* is in DCSL iff L_LBA is in DCSL.
2. (L in NDCSL implies the complement of L is in NDCSL) iff the complement of L_LBA is in NDCSL.
3. DCSL ⊆ NP (P) iff L_LBA is in NP (P) iff L_R* is in NP (P).
The proof is quite similar to the previous proofs that L_UT and L_R* are complete in PTAPE. It is interesting to note that if L_LBA or L_R* can be recognized on a deterministic lba, then all nondeterministic tape computations using L_i(n) ≥ n
tape can be replaced by equivalent deterministic computations using no more tape. Furthermore, there is a recursive translation which maps the nondeterministic Turing machines onto the equivalent deterministic Turing machines.
Corollary 4.2: DCSL = NDCSL iff there exists a recursive translation σ such that for every nondeterministic Tm M_i which uses L_i(n) ≥ n tape, M_σ(i) is an equivalent deterministic Tm using no more than L_i(n) tape.
Proof: The proof is similar to the proof of Theorem 2.3 (for details, see Hartmanis and Hunt, 1974).
From the above results we see that if DCSL = NDCSL, then all other deterministic and nondeterministic tape-bounded computations using more than a linear amount of tape are the same. On the other hand, we have not been able to force the equality downward. For example, we have not been able to show that if all deterministic and nondeterministic tape-bounded computations using L_i(n) ≥ 2^n tape are the same, then DCSL = NDCSL. Similarly, it could happen that DCSL = NDCSL, but that the log n tape-bounded deterministic languages are properly contained in the nondeterministic log n tape-bounded computations. Note also that at the present time we do not know whether P ⊆ NDCSL or NDCSL ⊆ P, or whether these families of languages are incomparable under set inclusion. Similarly, we do not know any set containment relations between the families NP and NDCSL or DCSL. We only know that these families cannot be equal (Book, 1974).

Theorem 4.3:
NDCSL ≠ P, DCSL ≠ P, NDCSL ≠ NP, and DCSL ≠ NP.

Proof: We will prove only the case NDCSL ≠ P; the others follow by similar, if somewhat more complicated, arguments. Assume that NDCSL = P. Then the universal NDCS language L_LBA is acceptable by a deterministic Tm M_LBA in time n^k0 for some integer k0 > 0. Let L be any language in NDCSL accepted by the nondeterministic lba M. Then the mapping

    x1x2 ... xn → #M#CODE(x1x2 ... xn)#

reduces L to L_LBA. Furthermore, this reduction can be performed by a deterministic Tm in linear time. Combining this Tm with M_LBA, we get a deterministic Tm which accepts L in time c·n^k0. Since L was an arbitrary member of NDCSL, we see that all context-sensitive languages are recognizable in time n^(k0+1). Because there exist languages acceptable in time n^(k0+2) but not in time n^(k0+1), we conclude that P ≠ NDCSL, as was to be shown.
It is worth mentioning that Greibach (1973) has exhibited a context-free language (cfl) which plays the same role among context-free languages as L_LBA does for context-sensitive languages. Namely, this context-free language is the hardest time and tape recognizable cfl, and there also exist two recursive translations mapping context-free grammars into Turing machines recognizing the language generated by the grammar in the minimal time and on the minimal amount of tape, respectively, though at this time we do not know what is the minimum time or tape required for the recognition of context-free languages.
5. Random Access Machines
In this section we study random access machines, which have been proposed as abstract models for digital computers and which reflect many aspects of real computing more directly than Turing machines do. On the other hand, as will be seen from the results in this section, the study of the computational power of random access machines with different instruction sets leads us right back to the central problems which arose in the study of Tm computations. Thus, quite surprisingly, we will show that the difference in computing power of polynomially time-bounded RAM's with and without multiplication is characterized by the difference between PTIME and PTAPE for Tm's. More specifically, it is known that the computation time of random access machines without multiplication is polynomially related to the equivalent Tm computation time, and vice versa. Thus the question of whether the deterministic and nondeterministic polynomially time-bounded random access machine computations are the same is equivalent to the question of whether P = NP for Tm computations, a problem we discussed before. In contrast, when we consider random access machines with the power to multiply in unit time, the situation is completely different. We show that for these devices nondeterministic and deterministic computation time is polynomially related, and therefore for random access machines with built-in multiplication, P = NP (Simon, 1974; Hartmanis and Simon, 1974). Furthermore, we give a complete characterization of the computational power of these devices: the family of languages accepted in polynomial time by random access machines with multiplication is exactly PTAPE, the family of languages accepted by Tm's in polynomial tape. Thus the additional computing power that a random access machine with multiplication has over such a machine without multiplication is characterized by the difference between PTIME and PTAPE for Tm computations. Recall that we do not know whether PTIME ≠ PTAPE and therefore
multiplication could be simulated in polynomial time by addition and Boolean operations iff PTIME = PTAPE; again, an open problem which we have already discussed. For related results about other random access machine models and for more detailed proofs than given in this paper, see Hartmanis and Simon (1974), Pratt et al. (1974), and Simon (1974). To make these concepts precise, we now describe random access machines, RAM's, with different operation sets and step counting functions. Note that we again consider these devices as acceptors.

Definition: A RAM acceptor, or RAM, with instruction set O is a set of registers R0, R1, ..., each capable of storing a nonnegative integer in binary representation, together with a finite program of (possibly labeled) O-instructions. If no two labels are the same, we say that the program is deterministic; otherwise it is nondeterministic. We call a RAM model deterministic if we consider only deterministic programs from the instruction set.
Our first instruction set O1 consists of the following:

    Ri ← Rj (or Ri ← =k)                    (assignment)
    Ri ← (Rj)  and  (Ri) ← Rj               (indirect addressing)
    Ri ← Rj + Rk                            (sum)
    Ri ← Rj − Rk                            (proper subtraction)
    Ri ← Rj bool Rk                         (Boolean operations)
    if Ri comp Rj label 1 else label 2      (conditional jump)
    accept
    reject
comp may be any of <, ≤, =, ≥, >, ≠. For Boolean operations, we consider the integers as bit strings and do the operations componentwise. Leading 0's are dropped at the end of operations: for example, 11 nand 10 = 1. bool may be any binary Boolean operation (e.g., ∧, ∨, eor, nand, ⊃, etc.). accept and reject have obvious meanings. An operand of =k is a literal and the constant k itself should be used. The computation of a RAM starts by putting the input in register R0, setting all other registers to 0 and executing the first instruction of the RAM's program. Instructions are executed in sequence until a conditional jump is encountered, after which one of the instructions with label "label 1" is executed if the condition is satisfied and one of the instructions with label "label 2" is executed otherwise. Execution stops when an accept or reject instruction is met. A string x ∈ {0, 1}* is accepted by the RAM if there is a finite computation ending with the execution of an accept instruction.
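To make the acceptor model concrete, the following Python sketch (not part of the original text) interprets a small fragment of the instruction set O1 for deterministic programs. The tuple encoding of instructions, the treatment of the input as a nonnegative integer, and the step bound are assumptions introduced only for this illustration.

    # Minimal sketch of a deterministic RAM acceptor over part of O1.
    # Instruction encoding (an assumption of this sketch):
    #   ("assign", i, ("reg", j)) or ("assign", i, ("lit", k))   Ri <- Rj  /  Ri <- =k
    #   ("sum", i, j, k)                                          Ri <- Rj + Rk
    #   ("jump", i, j, comp, l1, l2)    if Ri comp Rj go to instruction l1 else l2
    #   ("accept",)  and  ("reject",)

    import operator

    COMPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
             ">=": operator.ge, ">": operator.gt, "!=": operator.ne}

    def run_ram(program, input_value, max_steps=10_000):
        """Run the program with input_value in R0; return True iff it accepts."""
        regs = {0: input_value}            # all other registers default to 0
        pc, steps = 0, 0
        while steps < max_steps:
            steps += 1
            instr = program[pc]
            op = instr[0]
            if op == "accept":
                return True
            if op == "reject":
                return False
            if op == "assign":
                _, i, (kind, v) = instr
                regs[i] = regs.get(v, 0) if kind == "reg" else v
                pc += 1
            elif op == "sum":
                _, i, j, k = instr
                regs[i] = regs.get(j, 0) + regs.get(k, 0)
                pc += 1
            elif op == "jump":
                _, i, j, comp, l1, l2 = instr
                pc = l1 if COMPS[comp](regs.get(i, 0), regs.get(j, 0)) else l2
            else:
                raise ValueError("unsupported instruction: " + op)
        return False                       # a computation that runs too long rejects

    # Example program: accept exactly the inputs whose value is at least 5.
    prog = [("assign", 1, ("lit", 5)),
            ("jump", 0, 1, ">=", 2, 3),
            ("accept",),
            ("reject",)]
    assert run_ram(prog, 7) and not run_ram(prog, 3)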
The complexity measures defined for RAM's are:
(Unit) time measure: the complexity of an accepting computation is the number of instructions executed in the accepting sequence. The complexity of the RAM on input x is the minimal complexity of accepting computations.

Logarithmic, or length, time measure: the complexity of an accepting computation is the sum of the lengths of the operands of the instructions executed in the accepting sequence. When there are two operands, we take the length of the longer; when an operand has length 0 we use 1 in the sum. The complexity of the RAM on input x is the minimal complexity among accepting computations.

Memory measure: the maximum number of bits used at any time in the computation. (The number of bits used at a given time is the sum of the number of significant bits of all registers in use at that time.)

Unless otherwise stated, time measure will mean unit time measure. We shall call RAM's with instruction set O1 RAM1's, or simply RAM's. For a discussion of RAM complexity measures, see Cook (1972) or Aho et al. (1974). We will consider another instruction set O2, which
is O1 plus the instruction

    Ri ← Rj · Rk        (product)

This instruction computes the product of the two operands (which may be literals) and stores it in Ri. RAM's with instruction set O2 will be called MRAM's (M for multiplication). We denote by PTIME-MRAM and by NPTIME-MRAM, respectively, the families of languages accepted in polynomial time by deterministic and nondeterministic MRAM's. We shall outline below the proof of the main results about MRAM's.
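The source of the extra power can be seen informally before the formal statements: with unit-cost multiplication an MRAM can square a register repeatedly, so after t steps it can hold a number of about 2^t bits, far beyond the linear growth available from t additions alone. A quick Python check of this growth rate (illustrative only):

    # After t squarings of 2, the register holds 2^(2^t), a number with 2^t + 1 bits.
    x, t = 2, 10
    for _ in range(t):
        x = x * x
    assert x.bit_length() == 2**t + 1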
Theorem 5.1: PTAPE ⊇ NPTIME-MRAM.
Theorem 5.2: PTIME-MRAM ⊇ PTAPE.
Thus for MRAM's we have that deterministic and nondeterministic polynomial time computations are the same.

Corollary 5.3: PTAPE = NPTAPE.
This follows from the fact that the proofs of Theorems 5.1 and 5.2 actually imply that PTAPE ⊇ NPTIME-MRAM and that PTIME-MRAM ⊇ NPTAPE. We now sketch a proof of Theorem 5.1.
Suppose the MRAM M operates in time n^k, where n is the length of the input. Our Tm simulator T will write out on one of its tapes a guess for the sequence of operations executed by M in its accepting computation and check that the sequence is correct. The sequence may be written down deterministically, by enumerating all such sequences of length n^k in alphabetical order. Since the number of instructions of M's program is a constant, the sequence will be of length cn^k for some constant c. To verify that such a sequence is indeed an accepting computation of M we need to check that one step follows from the previous one when M's program is executed, which is only a problem in the case of conditional instructions, when we must find out the contents of a register. We shall define a function FIND(r, b, t) which will return the value of the bth bit of register r at time t. Our theorem will be proved if this function is computable in polynomial tape, the subject of the remainder of this part. Note that since we are testing for an accepting sequence, it does not matter whether we are simulating deterministic or nondeterministic machines. First, let us prove that the arguments of FIND may be written down in polynomial tape. Note that in t operations, the biggest possible number that may be generated is a^(2^t), produced by successive multiplications: a, a^2, a^2 · a^2 = a^4, a^4 · a^4 = a^8, ..., a^(2^t), where a is the maximum of x and the biggest literal in M's program. To address a bit of it, we need to count up to its length, that is, up to log2(a^(2^t)) = 2^t log2 a, which may be done in space log2(2^t log2 a). In particular, for t = n^k, space n^(k+1) will suffice, so that b may be written down in polynomial tape. Clearly, t may also be written down in polynomial tape. There is a small difficulty with r: due to indirect addressing, M might use high-numbered registers, even though it uses only a polynomial number of them. However, by using a symbol table, at a cost of a squaring of the running time, we may assume that a machine operating in time t uses only its first t registers. It is clear that in that case r may be written down in polynomial tape. Now let us describe FIND and prove that it operates in polynomial tape. Informally, FIND works as follows: FIND(r, b, 0) is easily computed given the input. We shall argue inductively. FIND(r, b, t) will be computed from previous values of FIND; clearly the only interesting case is when r was altered in the previous move. For example, if the move at t − 1 was r ← p ∨ s, then FIND(r, b, t) = FIND(p, b, t − 1) ∨ FIND(s, b, t − 1). This recursion in time does not cause any problems, because we may first compute FIND(p, b, t − 1) and then reuse the tape for a call of FIND(s, b, t − 1), so that if l_(t−1) is the amount of tape needed to compute FINDs for time up to t − 1, we have the recurrence l_t = l_(t−1) + c (l_0 = cn^(k+1)), which has the solution l_t = c'n^(k+1).
In the case of multiplication of two l-digit numbers, we may have to compute up to l factors and get the carry from the previous column in order to obtain the desired bit. Since l may be 2^(n^k), we must be able to take advantage of the regularity of operations in order to be able to compute within polynomial tape. Also, the carry from the previous column may be quite large: in the worst case, when we multiply (1)^l by (1)^l, the carry may be l. This is still manageable, since in time n^k, l ≤ 2^(n^k), so an accumulator of length n^k will suffice. We also need to generate up to l pairs of bits, multiply them in pairs, and add them up. This may be done as follows: we store the addresses of the two bits being computed, compute each of the two bits of the product separately, multiply the two results and update the addresses to get the addresses of the two bits of the next product. The product is added to an accumulator and the process is repeated until all product terms have been computed. Then we need the carry from the previous column. We cannot compute this carry by a recursive call of FIND, because since the length of the register may be exponential, keeping track of the recursion would take exponential tape. Instead, we compute the carries explicitly from the bottom up, i.e., we first compute the carry at the rightmost column (finding the bits by recursive calls of FIND on pairs and multiplying them), and then, with that carry and FIND, we compute the carry from the second rightmost column, and so on. The space needed is only for keeping track of which column we are at, one recursive call of FIND, one accumulator, and one previous carry holder. Each of these may be written down in space n^(k+1), so that we have the recursion

    l_t = l_(t−1) + cn^(k+1)        with l_0 = n

which implies l_t ≤ cn^(2k+1), and the simulation of multiplication may be carried out in polynomial space. The argument for + (sum) is similar but much easier, since only 2 bits and a carry of at most 1 are involved. With the above comments in mind, it is easy to write out a complete simulation program and see that it runs in polynomial space. This ends the proof of our theorem, i.e.,
Theorem 5.1: Polynomial time bounded nondeterministic MRAM recognizable languages are recognizable in polynomial tape by Turing machines.
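The following Python sketch (added for illustration; it is not from the original text) shows the shape of the FIND recursion for a few easy cases: bits are recomputed on demand from the guessed instruction trace, so no register contents are ever stored. The trace encoding and operation names are assumptions of this sketch; sums and products would additionally need the explicit carry computation described above.

    # Schematic FIND(r, b, t): value of bit b of register r after step t of a guessed
    # instruction trace.  Only assignment and bitwise AND/OR are handled here.

    def find(trace, input_value, r, b, t):
        if t == 0:
            # Initially register 0 holds the input and every other register holds 0.
            return (input_value >> b) & 1 if r == 0 else 0
        instr = trace[t - 1]                 # the instruction executed at step t
        op, target = instr[0], instr[1]
        if target != r:                      # register r was not touched at step t
            return find(trace, input_value, r, b, t - 1)
        if op == "assign_lit":               # ("assign_lit", r, k):  r <- k
            return (instr[2] >> b) & 1
        if op == "or":                       # ("or", r, p, s):  r <- p OR s
            return (find(trace, input_value, instr[2], b, t - 1)
                    | find(trace, input_value, instr[3], b, t - 1))
        if op == "and":                      # ("and", r, p, s):  r <- p AND s
            return (find(trace, input_value, instr[2], b, t - 1)
                    & find(trace, input_value, instr[3], b, t - 1))
        raise NotImplementedError("sum/product need the carry scan described in the text")

    # r1 <- 12, then r0 <- r0 OR r1; with input 3 the final r0 is 15, so bit 3 is 1.
    trace = [("assign_lit", 1, 12), ("or", 0, 0, 1)]
    assert find(trace, 3, 0, 3, 2) == 1 and find(trace, 3, 0, 4, 2) == 0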
Now we sketch the ideas behind the proof of Theorem 5.2. They are basically a set of programming tricks that enable us to do operations in parallel very efficiently. To simplify our proof we will use a special RAM model referred to as a CRAM (for concatenation). A CRAM is a RAM with the ability to
concatenate the contents of two registers in one operation, and it also has the operator SUBSTR(A, B), which replaces A by the string obtained from A by deleting the initial substring of length l, where l is the length of B. It can be seen that CRAM computations may be simulated easily by MRAM's and that SUBSTR is not essential to the construction. For any given Tm T operating in polynomial tape on input x, a CRAM can first generate all possible configurations of this Tm computation (a configuration of T on input x consists of the state of T, the contents of the work tape and the positions of T's heads). From this set of all possible configurations, the CRAM can obtain the matrix of the relation "follow in one move", i.e., if A is the matrix of the relation, then a_ij = 1 iff T passes from the ith to the jth configuration in one move. Clearly, x is accepted by T iff a*_be = 1, where A* is the transitive closure of A and b and e are the initial and accepting final configurations, respectively.

First we indicate how to compute efficiently the transitive closure of a matrix A. We suppose that initially the whole matrix is in a single register. Remember that A* = I ∨ A ∨ A^2 ∨ A^3 ∨ ... ∨ A^n ∨ ..., where A is n by n and A^i is the ith power of A in the "and-or" multiplication (i.e., if C = A·B, then c_ij = ∨ over k of a_ik ∧ b_kj). Moreover, we may compute only

    (I ∨ A), (I ∨ A)^2, (I ∨ A)^2 · (I ∨ A)^2 = (I ∨ A)^4, ...

where the exponent of (I ∨ A) is a power of 2. Since there are only log n of these [(I ∨ A)^m = (I ∨ A)^n for all m ≥ n], transitive closure of n by n matrices can be done in time log n times the time for multiplication. Throughout this proof, "multiplication" will mean "∧", and "multiplication of matrices" will mean "and-or" multiplication. Also, for simplicity, we assume n to be a power of 2. To multiply two matrices efficiently, we observe that if we have several copies of the matrix stored in the same register in a convenient way, we can obtain all products in a single "∧" operation: all we need is that for all i, j, and k, a_ik be in the same bit position as b_kj. For example, if we have

    (row 0 of A)^n (row 1 of A)^n ... (row n−1 of A)^n
        = (a_0,0 a_0,1 ... a_0,n−1)^n (a_1,0 a_1,1 ... a_1,n−1)^n ... (a_n−1,0 a_n−1,1 ... a_n−1,n−1)^n

in one register [where (row i)^n means n-fold concatenation] and

    [(column 0 of B)(column 1 of B) ... (column n−1 of B)]^n
        = [(b_0,0 b_1,0 ... b_n−1,0)(b_0,1 b_1,1 ... b_n−1,1) ... (b_0,n−1 b_1,n−1 ... b_n−1,n−1)]^n
in the other, the "∧" of the two registers yields all terms a_ik ∧ b_kj. Supposing we are able to produce these forms of the matrices easily, all we have
to do is collect terms and add (∨) them up. To collect terms, if we are able to take advantage of the parallel operations at their fullest, we should not have to do more than log n operations, since each c_ij is the sum of n products. Note that in our case c_0,0 is the sum of the first n bits, c_0,1 of the next n, and in general c_ij is the sum of the bits in positions i·n^2 + j·n to i·n^2 + (j+1)·n − 1. We use the following idea: to add up a row vector of bits, take the second half of the row, add it in parallel to the first half and call the procedure recursively for the new first half. The reader is encouraged to write a routine, using the mask M' = 0^(n/2) 1^(n/2) to select the second half (n is the length of the vector) and prefixing strings of 0's to registers to get proper alignment. It is possible to design the algorithm in such a way that this procedure may be done in parallel to several vectors, stored concatenated to each other in a single register. In particular, if one starts with n^2 copies of the mask M', then the following procedure obtains all terms of the matrix product C = A·B from all the products a_ik ∧ b_kj:
    ADDUP: PROC
        M = (0^(n/2) 1^(n/2))^(n^2)
        K = n/2
        while K ≥ 1 do
            B = A ∧ M
            A = ((0^K · A) ∨ B) ∧ M
            K = K/2
            M = (0^K · M) ∧ M
        end
    end: ADDUP
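A rough Python analogue of ADDUP (added here for illustration): Python's arbitrary-precision integers stand in for the single CRAM register, and the mask is built with an ordinary loop rather than by concatenation. Each pass folds the upper half of every n-bit block onto its lower half, so the bits of each block are ORed together in about log2 n passes.

    def or_reduce_blocks(A, n, p):
        """OR together the n bits of each of the p consecutive n-bit blocks packed in A.

        Block j occupies bit positions j*n .. j*n + n - 1; on return, bit j*n holds
        the OR of all of block j's bits (n is assumed to be a power of two).
        """
        k = n // 2
        while k >= 1:
            mask = 0
            for j in range(p):                 # mask selecting the low k bits of each block
                mask |= ((1 << k) - 1) << (j * n)
            A = (A | (A >> k)) & mask          # fold upper halves onto lower halves
            k //= 2
        return A

    # Two blocks of width 4: block 0 = 0000 (OR = 0), block 1 = 0100 (OR = 1).
    packed = (0b0100 << 4) | 0b0000
    result = or_reduce_blocks(packed, 4, 2)
    assert (result >> 0) & 1 == 0 and (result >> 4) & 1 == 1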
ADDUP uses 0^K and K/2 as primitive operations, but K/2 = SUBSTR(K, 1), and 0^K may be obtained by successive concatenations of a string with itself: after p steps we get a string of 0's of length 2^p. In order to perform matrix multiplication, one must be able to expand matrices from some standard input form into the two forms we need in forming the product. In addition, for transitive closure, we must "pack" the result back into standard form. We do not give the details: the sort of programming is illustrated by ADDUP. Basically one uses masks and logical operations to get the required bits from their original places; then, using concatenations, one "slides" a number of them simultaneously to where they belong. The masks and "sliding rules" are updated and the process
repeated. It can be shown that all these operations require only time polynomial in the logarithm of the size of the matrix. In fact, transitive closure of n x n Boolean matrices may be found in O(log^2 n) CRAM moves. We still have to convince the reader that given a polynomial tape bounded Tm with input x, we can obtain the matrix of the "follow in one move" relation easily. We shall do this in an even sketchier way than our exposition of the method for computing transitive closures. If a Tm operates on an input of length n in tape n^k, there are at most O(2^(cn^k)) different configurations. Let us take a convenient encoding of these in the alphabet {0, 1} and interpret the encodings as integers. By convenient encoding we mean one that is linear in the length of the tape used by the machine, where the positions of the heads and the state may be easily found, and which may be easily updated. Then, if we generate all the integers in the range 0 to (2^(cn^k) − 1) (where c depends only on the encoding), we shall have produced encodings of all configurations, together with numbers that are not encodings of any configuration. The reader might amuse himself by writing a CRAM program that produces all integers between 0 and M = 2^p − 1 in time p. Now, in the operation of the Tm, the character under the read-write head, the two symbols in the squares immediately to the right and left of it, the state of the finite control and the position of the input head uniquely determine the next configuration. This is the sort of localized change that may be checked by Boolean operations. More precisely, it is not hard to write a CRAM routine that checks that configuration c_i follows from configuration c_j in O(l) moves, where l is the length of the configurations. Moreover, the operations executed by the program do not depend on the contents of c_i or c_j; in particular, the routine may be adapted to check, for vectors of configurations c_i_t, t = 0, 1, ..., p, and c_j_k, k = 0, 1, ..., p, whether c_j_k follows from c_i_t, still using only O(l) moves. Now the way to generate the transition matrix in time O(n^2k), where n is the length of the input, is: (a) we generate all integers in the range 0 to (2^(n^k) − 1) and call these configurations c_i; (b) as in the matrix product routine, we form (c_0)^m (c_1)^m ... (c_m−1)^m, where m = 2^(n^k) and (c_i)^m means m-fold concatenation, and (c_0 c_1 ... c_m−1)^m, in O(log m) = O(n^k) operations, and in O(n^k) operations determine simultaneously for all i and j whether c_j follows from c_i (i.e., obtain a vector of bits which is 1 iff c_j follows from c_i). This completes the description of our simulation algorithm: putting everything together we have a procedure which runs in polynomial time, since the matrix may be computed in O((log 2^(cn^k))^2) moves and its transitive closure in O((log 2^(cn^k))^2) = O(n^2k) moves. This completes the outline of the proof for the special CRAM used.
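At a much coarser grain, the following Python sketch (illustrative only; it enumerates configurations one by one instead of packing them into a single register) mirrors the overall structure of the simulation: form the "follow in one move" matrix, close it under repeated Boolean squaring, and accept iff the accepting configuration is reachable from the initial one.

    def reachable(num_configs, follows, start, accept):
        """Is `accept` reachable from `start` under the one-move relation?

        follows(i, j) returns True iff configuration j follows from configuration i
        in one move.  The closure is computed by repeated Boolean squaring of
        (I or A), so only about log2(num_configs) matrix multiplications are needed.
        """
        n = num_configs
        # (I or A): reachability in at most one move
        m = [[i == j or follows(i, j) for j in range(n)] for i in range(n)]
        steps = 1
        while steps < n:
            m = [[any(m[i][k] and m[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]        # Boolean "and-or" product of m with itself
            steps *= 2
        return m[start][accept]

    # Toy relation: configuration i moves to configuration i + 1.
    assert reachable(8, lambda i, j: j == i + 1, 0, 7)
    assert not reachable(8, lambda i, j: j == i + 1, 7, 0)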
Let us restate the results of this section. We defined a reasonable RAM model, the MRAM, that has multiplication as a primitive operation and proved two important facts about its power as a recognizer:

1. Deterministic and nondeterministic time complexity classes are polynomially related, i.e., PTIME-MRAM = NPTIME-MRAM.
2. Time-bounded computations are polynomially related to Tm tape, i.e., PTIME-MRAM = PTAPE.

Since it can be proved that RAM time and Tm time are polynomially related, we have also proved

3. RAM running times with and without multiplication are polynomially related if and only if Tm time and tape measures are polynomially related, i.e., PTIME = PTAPE iff PTIME-MRAM = PTIME-RAM.
This last observation is interesting, since it seems to imply that the elusive difference between time and memory measures for Tm's might perhaps be attacked by "algebraic" techniques developed in "low level" complexity theory. We also note that RAM's may simulate MRAM's in polynomial time, as long as the MRAM's operate in polynomial space and time. Therefore, MRAM's are more powerful than RAM's if and only if the unit and logarithmic time measures are not polynomially related, i.e., if (in our "polynomial smearing" language) the two are distinct measures. Many "if and only if" type corollaries follow, in the same vein, from 1, 2, and 3. For example:

Corollary 5.4: The set of regular expressions whose complements are nonempty (i.e., L_R* of Section 3) is accepted in polynomial time by a deterministic Tm iff every language recognized by an MRAM in polynomial time is recognized by a deterministic RAM in polynomial time.
The reader may write many of these; some of them sound quite surprising at first. Minsky (1970) suggested that one of the objectives of theoretical computer science should be the study of trade-offs (e.g., between memory and time, nondeterminism and time, etc.). Our constructions trade exponential storage for polynomial time (simulation of Tm's by MRAM's) and polynomial tape for exponential time in the other simulation. Whether this trade-off is real or the result of bad programming is not known, since P = PTAPE? is an open problem. If P ≠ PTAPE, then PTAPE would provide us with a class of languages which have a trade-off property: they may be recognized either in polynomial time or in polynomial storage, but not simultaneously.
Corollary 5.5: PTIME ≠ PTAPE iff there exists a language L which can be recognized by MRAM's in polynomial time and polynomial memory, but not simultaneously.
Note that if such an L exists, any tape-complete problem may be chosen to be it, for example L_R*. As we saw, if MRAM's are different from RAM's, they must use more than a polynomial amount of storage (in our simulation, it was an exponential amount). This suggests asking whether it is sufficient to have a RAM and exponential tape to get an MRAM's power, or, equivalently, to look at operations that make RAM-PTIME classes equivalent to PTAPE. The answer is that almost anything that expands the length of the registers fast enough will do, as long as we have parallel bit operations: multiplication, concatenation, and shifting all have this property. In particular, concatenation, tests, and parallel bit operations (no indirect addressing) will do. On the other hand, adding more and more powerful operations (indirect addressing, shifts by shift registers, division by 2, SUBSTR, multiplication, integer division) does not make the model more powerful, once we have a fast memory-augmenting device. The stability of this class of RAM's makes them a nice characterization of memory-bound complexity classes. See Pratt et al. (1974) for related results. Since we believe that P ≠ NP (and therefore PTIME-RAM ≠ NPTIME-RAM) but PTIME-MRAM = NPTIME-MRAM, it seems interesting to ask what happens to the P = NP? question in an abstract setting, when we allow a fixed but arbitrary set of recursive operations in a single step. The surprising result (Baker et al., 1975) is that there are instruction sets for which P = NP and instruction sets for which P ≠ NP; in other words, the problem becomes meaningless when asked in such a general setting. Since we have argued that P = NP? is a central problem of theoretical computer science, the result appears to us to be a general warning that, by becoming too general too soon, we can "generalize away" the problems of interest to computer science and wind up with uninteresting abstractions. In particular, a proof that P ≠ NP would have to deal somehow with the nitty-gritty combinatorics of the problem. We note that our technique for proving inclusion among sets, diagonalization, is usually insensitive to such details. The requirement that diagonal arguments be extremely efficient is peculiar to computer science, and the discovery of such a technique may be as big a breakthrough as the discovery of priority methods (nonrecursive but recursively enumerable diagonal methods) was in recursion theory.
6. Conclusion
The results described in this paper reveal some deep connections between a wide class of problems and identify some central problems in computational complexity theory. Besides many other problems this work has revealed, the questions about
    P = NP = PTAPE?
must now be recognized as among the most outstanding problems in the quantitative theory of computing. The situation in this area of research has changed radically. Previously it was easy to state many unsolved problems which appeared to be hard, and some of which had remained unsolved for many years. On the other hand, it was hard to know which ones were important, and there was no real consensus about which ones should be singled out for a thorough investigation. Now we know at least a few outstanding problems and see their importance very clearly. We can expect that these problems will focus research effort on themselves in a way no other problems in theoretical computer science have been able to do before. We are also convinced that their solution will add considerably to our understanding of the quantitative nature of computational processes and may influence practical computing. With our present knowledge, we have to conjecture that P, NP, and PTAPE are all different classes of languages. On the other hand, it would be very exciting if this were not the case, since a proof to this effect would have to reveal something quite unexpected about the nature of computing. It may also turn out that the P = NP? problem cannot be solved at all and that eventually it will be shown, for example, that it is independent of the axioms of number theory. We regard this as an unlikely possibility, but so far it cannot be excluded. Our general belief is that during the next decade these problems will be much better understood, but it is conceivable that in this time the P = NP? problem will not be solved in spite of a concentrated research effort. The P = PTAPE? problem seems more likely to be resolved in the not too distant future. It is also interesting to note that the P = NP? problem cannot be generalized (abstracted) very much. Already for random access machines with multiplication we have that PTIME-MRAM = NPTIME-MRAM and, as mentioned in the text, there exist recursive operations which, added to the random access machine operations, guarantee that the corresponding P and NP classes are different. Thus we see that the P = NP? problem becomes meaningless under too much "abstraction." This may be a general warning for computer science research that too much generality in problem formulation may throw away the essence of the problem that
one is trying to understand. We do not doubt that abstraction and generalization will play an important role in theoretical computer science. At the same time, the P = NP? question shows that this problem becomes uninteresting if generalized too far. On the other hand, it is impressive that the introduction of the classes P and NP and polynomial time reducibility revealed a beautiful unity among a wide range of combinatorial problems which had not been understood before. We believe that this case illustrates very nicely the need for finding in theoretical computer science just the right level of generality or abstraction so that one can see the unity of the field and gain generality and power in results, without allowing the essential computer science problems and the original motivation to slip out during the abstraction of the problem. Finding the right mathematical models to investigate in computer science remains one of the most important problems, and we have no doubt that the P = NP? problem is a successful formulation of a very important question. One should also note that the Turing machine model was appropriate for this research and that the P = NP? problem remains invariant under substantial changes of the underlying computer model. It is rewarding to note that certain topics studied in automata theory, such as regular expressions, gain new importance, and we see that a deep understanding of their computational nature can solve the P = NP = PTAPE? problem. Thus, even in this very new field of theoretical computer science, we see that research guided by the "inner logic" of the field can lead to central problems whose practical importance may be recognized only later. We also believe that the results about the complexity of decision procedures of decidable mathematical theories will play an important role in future developments of computer science. It is quite surprising that these results were not sought after earlier by logicians and that they emerged from computer science. In computer science, we expect, they will eventually lead to a much better understanding of what aspects of programming can and cannot be automated and how these tasks must be structured to facilitate their understanding and possible automation. We have had a reasonably good understanding of what is and what is not effectively computable; now we have seen the initial results of a theory of feasible decisions, which is likely to become an important part of theoretical computer science.
REFERENCES

Aho, A., Hopcroft, J. E., and Ullman, J. D. (1974). "The Design and Analysis of Computer Algorithms." Addison-Wesley, Reading, Massachusetts.
Baker, T., Gill, J., and Solovay, R. (1975). Relativization of the P =? NP question. Soc. Ind. Appl. Math. J. Comput. 4, 431-442.
Book, R. V. (1974). Comparing complexity classes. J. Comput. Syst. Sci. 14, 213-229.
Borodin, A. B. (1973). Computational complexity: theory and practice. In "Currents in the Theory of Computing" (A. V. Aho, ed.), pp. 35-89. Prentice-Hall, Englewood Cliffs, New Jersey.
Borodin, A. B., and Munro, I. (1975). "Computational Complexity of Algebraic and Numeric Problems." Amer. Elsevier, New York.
Büchi, J. R. (1960). Weak second order arithmetic and finite automata. Z. Math. Logik Grundlagen Math. 6, 66-92.
Büchi, J. R., and Elgot, C. C. (1959). Decision problems of weak second order arithmetics and finite automata, Part I. Amer. Math. Soc. Notices 5, Abstr. 834.
Cook, S. (1971). The complexity of theorem-proving procedures. Proc. 3rd Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 151-158.
Cook, S. (1972). Linear time simulation of deterministic two-way pushdown automata. Inform. Process. 71, 75-80.
Even, S., and Tarjan, R. E. (1975). A combinatorial problem which is complete in polynomial space. Proc. 7th Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 66-71.
Fagin, R. (1974). Generalized first-order spectra and polynomial-time recognizable sets. SIAM-AMS (Soc. Ind. Appl. Math.-Amer. Math. Soc.) Proc. 7, 43-73.
Fischer, M. J., and Rabin, M. O. (1974). Super-exponential complexity of Presburger arithmetic. SIAM-AMS (Soc. Ind. Appl. Math.-Amer. Math. Soc.) Proc. 7, 27-41.
Garey, M. R., Johnson, D. S., and Stockmeyer, L. (1974). Some simplified NP-complete problems. Proc. 7th Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 47-63.
Greibach, S. (1973). The hardest context-free language. Soc. Ind. Appl. Math. J. Comput. 2, 304-310.
Hartmanis, J., and Hopcroft, J. E. (1971). An overview of the theory of computational complexity. J. Ass. Comput. Mach. 18, 444-475.
Hartmanis, J., and Hunt, H. B., III (1974). The lba problem and its importance in the theory of computing. SIAM-AMS (Soc. Ind. Appl. Math.-Amer. Math. Soc.) Proc. 7, 1-26.
Hartmanis, J., and Shank, H. (1968). On the recognition of primes by automata. J. Ass. Comput. Mach. 15, 382-389.
Hartmanis, J., and Shank, H. (1969). Two memory bounds for the recognition of primes by automata. Math. Syst. Theory 3, 125-129.
Hartmanis, J., and Simon, J. (1974). On the power of multiplication in random access machines. IEEE Conf. Rec. 15th Symp. Switching Automata Theory, 1974, pp. 13-23.
Hopcroft, J. E., and Ullman, J. D. (1969). "Formal Languages and their Relation to Automata." Addison-Wesley, Reading, Massachusetts.
Hunt, H. B., III (1973a). On time and tape complexity of languages. Ph.D. Dissertation, Cornell University, Ithaca, New York.
Hunt, H. B., III (1973b). On time and tape complexity of languages. Proc. 5th Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 10-19.
Hunt, H. B., III, and Szymanski, T. (1975). On the complexity of grammar and related problems. Proc. 7th Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 54-65.
Karp, R. (1972). Reducibility among combinatorial problems. In "Complexity of Computer Computations" (R. Miller and J. Thatcher, eds.), pp. 85-104. Plenum, New York.
Kuroda, S. Y. (1964). Classes of languages and linear bounded automata. Inform. Contr. 3, 207-223.
Landweber, P. S. (1963). Three theorems on phrase structure grammars of type 1. Inform. Contr. 2, 131-136.
McNaughton, R., and Yamada, H. (1964). Regular expressions and state graphs. In "Sequential Machines: Selected Papers" (E. F. Moore, ed.), pp. 157-175. Addison-Wesley, Reading, Massachusetts.
Meyer, A. (1973). "Weak Monadic Second Order Theory of Successor Is Not Elementary Recursive," Proj. MAC TM 38. Massachusetts Institute of Technology, Cambridge.
Meyer, A., and Stockmeyer, L. (1972). The equivalence problem for regular expressions with squaring requires exponential space. IEEE Conf. Rec. 13th Symp. Switching Automata Theory, 1972, pp. 125-129.
Meyer, A., and Stockmeyer, L. (1973). Word problems requiring exponential time. Proc. 5th Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 1-9.
Meyer, A., and Stockmeyer, L. (1975). "Inherent Computational Complexity of Decision Problems in Logic and Automata Theory," Lecture Notes in Computer Science. Springer-Verlag, Berlin and New York.
Miller, G. L. (1975). Riemann's hypotheses and tests of primality. Proc. 7th Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 234-239.
Minsky, M. (1970). Form and content in computer science. J. Ass. Comput. Mach. 17, 197-215.
Myhill, J. (1960). Linearly bounded automata. WADD Tech. Note 60-165.
Oppen, D. C. (1973). Elementary bounds for Presburger arithmetic. Proc. 5th Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 34-37.
Pratt, V., Stockmeyer, L., and Rabin, M. O. (1974). A characterization of the power of vector machines. Proc. 6th Annu. Ass. Comput. Mach. Symp. Theor. Comput., pp. 122-134.
Presburger, M. (1930). Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. C. R. Congr. Math. Pays Slaves, 1st, 1930, pp. 92-101.
Rangel, J. L. (1974). The equivalence problem for regular expressions over one letter alphabet is elementary. IEEE Conf. Rec. 15th Symp. Switching Automata Theory, 1974, pp. 24-27.
Rogers, H., Jr. (1967). "Theory of Recursive Functions and Effective Computability." McGraw-Hill, New York.
Savitch, W. J. (1970). Relations between nondeterministic and deterministic tape complexities. J. Comput. Syst. Sci. 4, 177-192.
Simon, J. (1974). "On the Power of Multiplication in Random Access Machines," TR 74-205. Department of Computer Science, Cornell University, Ithaca, New York.
Stearns, R. E., Hartmanis, J., and Lewis, P. M. (1965). Hierarchies of memory limited computations. IEEE Conf. Rec. 6th Symp. Switching Automata Theory, 1965, pp. 179-190.
Stockmeyer, L. J. (1974). "The Complexity of Decision Problems in Automata Theory and Logic," Proj. MAC TR 133. Massachusetts Institute of Technology, Cambridge.
A Look at Programming and Programming Systems

T. E. CHEATHAM, JR., and JUDY A. TOWNLEY

Center for Research in Computing Technology
Harvard University
Cambridge, Massachusetts
1. Introduction
2. Some Background
3. Classes of Programs
   3.1 Small-Scale Programs
   3.2 Medium-Scale Programs
   3.3 Large-Scale Programs
4. Facilities for Small-Scale Programs
   4.1 Extensible Languages
   4.2 Experience with the PPL System
5. The EL1 Language and ECL System
   5.1 Some Preliminaries
   5.2 The Basic Language
   5.3 Procedures in ECL
   5.4 Case Statement
   5.5 Extended Modes; User-Defined Mode Behavior
6. Aids for the Nonexpert Programmer
   6.1 Tailoring and Packaging Software
   6.2 Program Reduction: Closure
7. Aids for the Production of Complex Programs
References
1. Introduction
Over the last several years considerable effort has been directed toward understanding and developing a spectrum of programming languages and systems. We have been involved in several such efforts, but now find ourselves maintaining a wholly different perception of programming and programming languages than we had at the outset. In this chapter we will examine this perception of the programming process and consider the implications for programming systems.
To be somewhat more specific, we feel that a language alone (regardless of how well designed) is clearly not a sufficient goal. For any user of computers, whether with essentially toy problems or highly complex ones, a complete programming environment (including a variety of system aids and facilities) is wanted. The technology is available, but a slightly different approach to the process of program-building is required. Our comments are derived from experience with a number of modern programming systems, one of which we will describe in some detail. We have seen a very rapid growth in the number of uses and users of computers over the past few years. Many of these users are what we might term "casual," in the sense that their primary interests are outside of computer science as a discipline. Some are students just learning about the computer and computing. Others are using the computer to aid in the solution of a particular problem, the solution of which might be difficult or impossible without automatic computation. Along with this growth in the number of users, there has been mounting concern for the high cost of software, particularly as regards the development and maintenance of large programs or program systems. For example, it is now a widely accepted fact that the cost of software development and maintenance far exceeds that of hardware, and the ratio is growing rapidly. Also, it is now a matter of record that a large number of software development projects have simply failed, at least in the sense of even coming close to their original design goals.
2. Some Background
Over the past few years the notion of structured programming (or top-down programming) has gained currency as a method for alleviating many of these problems. Perhaps there is little that is basically new in this notion (most large programming projects which have been accomplished in a timely and economic fashion have used the notions of structured programming, at least implicitly), but some excellent recent work (Dahl et al., 1972; Dijkstra, 1968) has gone far in demonstrating its viability and suggesting techniques which might aid in the enforcement of structured programming practices. The explicit incorporation of these notions in a management structure and approach to program construction has paid off quite handsomely (Baker, 1972; Mills, 1972). In essence, the idea of structured programming is to start with a clear and succinct statement of the desired effect of some program and then to develop the detailed program by modularization and refinement. During the process of program development, a given part or module may be
divided into subparts or submodules, or a given part may be explicated by defining actual data representations and operations to implement the abstract objects and operations. The process is continued until at some stage a version sufficiently detailed for machine execution is obtained. Once an executable version of the program is confronted with data, it may well be that one has to fall back and make different choices of representation or means of realizing certain operations in order to attain the efficiency required. With explicit modularization it is to be expected that this "tuning" process (falling back to make more efficient choices) will be simplified. One issue that does not always come out clearly in the structured programming literature, but which is surely implicit in this approach, is that one wants to deal insofar as possible with the behavior which the program and its parts are to exhibit, and to suppress the detail of just how this is to be accomplished, demanding of course that the "how" be consistent with (that is, correctly implement) the desired behavior.

Our plan is as follows. Section 3 is devoted to a discussion of the various classes of users of a computing system: we will characterize three basic user classes and discuss the kinds of language and system facilities suggested by the needs and desires of each class. In Section 4 we will explore the particular needs of the "small" user and describe a language and system which seem quite appropriate for this group, calling on some actual experience with such users at Harvard. Section 5 is devoted to a technical discussion of a particular language and system, the EL1 language and ECL system, which have been the basis of much of our work and experience in attending to the needs of certain medium- and large-scale users. The development of ECL began as an effort to implement an extensible language and system, but we now see it as providing a framework for developing various kinds of programming tools of the future. Section 6 describes our current research efforts to develop the tools which will aid in the production of medium-scale programs, the specializing of general program packages, and the development of means for semiautomatically selecting data representations, and which, finally, will help in dealing with "optimizing" and verifying the resulting programs. In Section 7 we describe a collection of tools to create an appropriate environment for the development of complex or large programs.

There is a bias in our view of modern computing which intrudes throughout the discussion: that the user of a computer, particularly one developing, debugging, documenting, and maintaining programs, approaches the computer via the medium of a terminal attached to an interactive system which has a reasonable file system for housekeeping the volumes of program text, program documentation, and data with which he deals. Although
only a minority of users approach the computer this way today, the trend is clear. To make use of a language and system of the sophistication we think necessary and feasible, operating in the "batch" mode is out of the question: the ability to view each session with the computer as a process of updating files (modifying a few things, inputting some new data or text, making some test to verify functionality or to gather performance data, and so on) is really quite necessary. To try to operate in this fashion with a batch-only system is, at best, very cumbersome.

To conclude our introductory remarks, let us consider an analogy which may provide a focus for one of the fundamental difficulties in programming, and especially in the development of a large program system. Suppose that we have two groups of people, one group assigned to design and construct a bridge over some river and another assigned to design and construct some program, say an operating system for a small computer. At some point prior to completion of their respective projects each group might well report that its project is 95% complete. With the bridge builders, it is relatively easy to verify whether or not their estimate is reasonable: we simply go to the site and look for a bridge. Now it is pretty hard to fool someone as regards a bridge: either it exists or it does not, and expertise in bridge-hood is not really required to so assess the project. Of course the question of whether it is a reliable bridge or a cost-effective bridge is not easily determined by inspection, but that it is in fact a bridge, and a well structured one at that, is clear. With our group developing the operating system, determining that their estimate is reasonable is rather more difficult. Unlike "bridge-hood" there just is no notion of "operating-system-hood" with respect to which almost anyone could say "sure, it is about 95% complete." And this is perhaps the heart of the matter: in order to develop successful, economic, and reliable software we need to develop the kinds of languages, tools, and facilities which, to the greatest extent possible, let us perceive what is going on in a programming project in much the same way a bridge builder does.
3. Classes of Programs
We want to discuss three basic classes of programs (or programmers) which have quite different profiles (and whose needs regarding just the appropriate language and system differ as well). Of course there are no hard and fast lines which permit such a division to be made for every particular undertaking, but we do think that the distinction provides a useful way of viewing the programming process. We classify programs as small, medium, and large in scale. By small we mean small both in size and
in the amount of time devoted to the program development; typically there is a single person involved and the lifetime of the program produced is very short. By medium we again mean medium as regards size and time required. Here there might be more than a single individual involved and the lifetime of the product might be somewhat longer. By large we wish to include programs that are large in size, in the amount of time needed to create them, and in the number of people involved. We also wish to include those which, irrespective of size, have a long lifetime and thus will require maintenance or modification, or both, for a variety of different computer systems. We will discuss these in turn.

3.1 Small-Scale Programs
Perhaps a good way to characterize this class of programs is as "student" programs. It is not only students who write such programs, but the term has an appropriate connotation. For these programmers we think there are quite special requirements for a language and system. In most instances, what is important is not the efficiency (in terms of time or space) of the finished product, but the efficiency of creation. Affecting the "human" efficiency are such things as smooth facilities for the input, editing, and execution of a program; convenient means for combining several program parts into a whole; good facilities for testing and monitoring the resulting program; and so on. Currently, most people in this group confront the computer through the medium of a FORTRAN or BASIC system. This is not a very happy circumstance, since the systems they confront are seldom truly interactive (the editing may be, but the compile and execute stage seldom is). Furthermore, and perhaps rather more important, these languages are quite constrained in terms of what one can directly talk about. The programmer is essentially restricted to the world of integers and reals, plus arrays of, and arithmetic operations over, these entities. If one just happens to have a problem whose representation is naturally in terms of numbers, arrays of numbers, and the usual arithmetic operations, then things may be fine. But more and more users of the computer, particularly those who are not engineers or physical scientists, have problems whose natural realm has little to do with numbers, but that instead involve constructs like strings of text, graphs, or other rich data structures. To have to "shoehorn" such a problem into FORTRAN or BASIC guarantees that much of the real intent will be hidden under the programming language, rather than expressed in it, and that the possibilities for error will be large. Consequently, the probability that such a user will continue to turn to the computer for assistance decreases rapidly to zero.

APL has been heralded as the solution for the small user. It has been the case that many who deal with an APL system (particularly if they arrive
after the usual frustrations with FORTRAN or BASIC) do find APL most attractive. Indeed, the availability of APL has resulted in a large group of people (often termed "APL nuts") who find in APL the ultimate fulfillment of their needs. While this may be so for those dealing strictly with the world of numbers, APL is surely less than ideal for the rest. There is an important lesson to be learned from APL, however, and that is the importance of structural simplicity and of an excellent (smooth and comfortable) system surrounding the language.

3.2 Medium-Scale Programs
Examples of programs which are medium-sized include many of those generally referred to as scientific and engineering calculations, as well as simulation models, statistical analyses, and a wide variety of so-called "data processing" applications. We think that, in terms of the facilities required, it is useful to divide the users of such programs into two subclasses: the first consists of those who have a truly unique program to develop and hence must "start from scratch;" the second consists of those users who could take advantage of some existing program package; that is, they could specialize a general program to their particular application. Examples of the latter group include those requiring matrix computations, solution of many kinds of differential and integral equations, use of a particular kind of statistical analysis, a particular instance of a general simulation, and so on. For this group, the ideal facility might well be a program package with which one can interact to provide the specifics of the application at hand, the result being a program custom-tailored to the current needs. Indeed there exist a number of such program packages today, varying from reasonably trivial affairs which merely assemble a selected subset of a collection of subroutines, to such sophisticated packages as sort generators, which are highly parameterized and are capable of generating an extremely large number of quite efficient customized sorting programs. What is needed for this group of users is perhaps more and better program packages. As we shall discuss more fully in Section 6, it is our opinion that the developments of the very near future will provide us with the means to easily generate highly sophisticated packages for a wide variety of applications.

Let us then return to consider the poor fellow who has a unique application (or who at least cannot get his hands on just the right package). Chances are that this user is not a professional programmer: his expertise probably lies in a particular application area and his use of the computer is probably sporadic. Nevertheless, he may have a need for efficiency in the resulting program which is far more crucial than would be the case for the
small-scale user discussed above. If his needs are directly accommodated by numbers and arrays, and arithmetic operations on them, then the languages and systems available today probably provide a reasonable tool. If, on the other hand, the world which he wants to explore via his program is not that of numbers, but involves, for example, sets and set operations, simulation of an assembly line, developing a concordance, exploring some sort of network, routing pipes to minimize cost, and so on, he is not particularly well served by the conventional languages and systems: they simply do not permit him to deal directly with the objects with which he is concerned, but demand that he encode his world into that of numbers and arithmetic. That is, if he must use a language like FORTRAN or BASIC he may have serious difficulty in representing his world naturally, and/or he may be hard-pressed to attain the efficiency of representation and execution his budget demands without employing an arcane approach wherein the true nature of what he is manipulating is hidden in some collection of numbers, arrays, and arithmetic processes over them. That is, the only way our nonexpert can get the efficiency he requires, using the few techniques known to him, is to produce a very unnatural program. This is just the sort of consequence that runs contrary to the tenets of structured programming and produces a result that is difficult to understand or verify, and probably impossible to change or maintain. It is this individual for whom a structured programming system (one which provides data and operations natural to his application and sophisticated means for choosing effective representations through refinement and experiment) will be very important. We will return to this in Section 7, characterizing a programming system that would be quite appropriate for this user.

3.3 Large-Scale Programs
At the present time, the world of large-scale programs divides rather neatly into several parts. First, there are the large business applications, which are normally programmed in COBOL. These programs, while large, are mostly quite straightforward, except for some small "core" or "nucleus" (handling the allocation of resources, directing data flow, and calling appropriate "transaction" modules) that is highly intricate and complex. Normally a large part of the problem with such programs derives from their sheer size and the attendant necessity for large amounts of bookkeeping of data names, module specifications, and so on. There are several things that seem to contribute to the cost and difficulty of these programs. First, there is COBOL itself. While it is undeniably a widely used language and not likely to be supplanted soon, the usual implementations of it thoroughly discourage modularity, a feature which, if the structured programming
advocates are to be believed, is at the heart of effective programming. A mild alteration of COBOL and more sensible COBOL systems could alleviate this problem. Another problem that is very important is that most COBOL programmers have available to them computers and systems which are simply too small for program development; the systems may have been carefully designed to be nearly optimal for the application intended, but are poor to inadequate for program development. Again, some mild alterations of COBOL would permit COBOL systems to be developed which would alleviate these problems somewhat, and the availability of networks of computers, which permit one easily to employ one machine for development and another for production, will surely be an asset. There is presently underway a development effort to provide the mechanisms necessary for just this kind of distributive program development [called the National Software Works (Warshall, 1975)].

Another class of large-scale programs are those usually referred to as systems programs; examples would include operating systems, compilers, data management systems, transaction systems (e.g., airline reservations or banking), and so on. Here, until recently, there has been very little help available (although a number of reasonable systems programs have been developed in FORTRAN, ALGOL, and PL/I); the current collection of "systems programming languages" does provide a not unreasonable linguistic medium, but the host systems in which these are imbedded are often weak, leaving the burden of interfacing, cross referencing, documentation, and the like to the user. Again, a system that provides programming aids for collecting information and keeping the records, which permits one to determine easily how he got where he is, what modules depend on what others, and so on, would help this user considerably.

There is one particular application area which is not widely known but has recently come to our attention. It dramatizes the problems inherent in all large-scale programs. This is the area of programming Automatic Test Equipment (ATE). Modern Automatic Test Equipment typically consists of a minicomputer to which is attached a variety of sensors and sources (for example, digital voltmeters whose readings are input to the mini and voltage sources whose outputs are controlled by the mini). To test some piece of equipment, say a truck, the sources and sensors are attached to appropriate points and the mini emits probes and records data, the digestion of which results in a report that the vehicle is in fine shape or, for example, that the rings need to be changed. Other applications involve testing complex electronic equipment, space vehicles, controlling various industrial processes, and so on. What makes this problem particularly interesting, and may make language and system facilities appropriate for it applicable to large-scale programming in general, derives from three things:
(1) the final program is to be run on a mini (the main point being that there is a minimal run-time environment),
(2) there may be many levels of specification of the test, each expressed in terms of different linguistic concepts, and
(3) the same basic test may be applicable to several pieces of equipment which differ in minor respects.

For example, to illustrate the second point, one aspect of a test may be described by a design engineer in terms of the truck's "idling smoothly," whereas the mechanic may conceive of it as a sequence of actions, involving starting the engine, getting the engine rpm to 5000, reading the manifold pressure, and so on. The instructions to the mini, however, would be something like a request to transmit a current from pin A to pin B for a certain period of time. Thus we find that a single test is expressed in the vocabulary of a variety of people and machines, and that the concepts included in the test must be traceable from one level of specification to another. With regard to point (3), we find a similar situation, except that the variation in the test derives from minor variations in the vehicle being tested or the computer being used. The application thus seems to be a natural for exploitation of the goals of structured programming: the original program is given in terms of operations and data relevant to, say, trucks and their components, while the final refinement is a program administering a collection of voltage sources and sensors to provide data which are subjected to simple arithmetic massaging. Further, a given "high-level" program might have any number of different refinements, each tailored to a particular ATE and test environment. It would appear that this application demands a structured programming system.

4. Facilities for Small-Scale Programs
In this section we want to discuss the language and system facilities appropriate for small-scale programs. It is our thesis that the technology for providing such facilities for this domain of problems is already at hand. Indeed, we will presently cite some experience in using a particular system meeting these goals, the PPL System developed by Standish and his colleagues at Harvard (Standish, 1969; Taft, 1972) for student programming. Our experience strongly suggests that this system provides a much better medium for small-scale programs than the usual FORTRAN or BASIC systems, particularly in situations where the application is not naturally concerned with numbers and arithmetic.

4.1 Extensible Languages
First, however, we would like to make a brief historical comment about the work of the past several years carried out under the general rubric of
"extensible languages." During the mid-sixties, primarily as a reaction to languages like PL/I, ALGOL-68, and LISP-2, a number of people started working on the notion of an "extensible" language. The idea was that, rather than throw every kind of data structure and operation which might be of use in some application area into one big pot, thus producing a very large language and probably necessitating a very large and expensive compiler, we should try to find some base or nucleus language and provide means for extensions of the data, operations, and syntax. One would thus extend from the base to provide a language specifically tailored to each application area. The hope was that with such languages we could develop variants for any class of applications, but have a common system, common compiler, and so on, underlying all the variants. In each application area the user would "see" only the facilities appropriate for him and would not be burdened with the problem of just how to model his world in a language either confined by certain domains or committed to interpretations not necessarily consistent with his needs. We will not recite the history of the various attempts to develop extensible languages, but simply note that both PPL and the EL1 language to be described in the next section are results of this work, and further note that the work on these languages has resulted in a rather different view of the whole programming process, which will be discussed in later sections.

4.2 Experience with the PPL System
PPL was built on the foundation provided by APL; as a system, the PPL system resembles the APL system very strongly. Thus, program creation and editing, files of modules ("workspaces"), ease of program change, elaborate debugging tools, and so on are all provided. The basic difference between APL and PPL is that the bias toward numbers and arrays has been eliminated, and the programmer is provided with a variety of basic data types (integers, reals, characters, text strings, booleans, symbols, and so on) plus the ability to construct composite data objects by employing homogeneous arrays (i.e., vectors) and nonhomogeneous arrays (called "structures"), possibly recursively. He can also define functions; new operators can be introduced, and any operator can have a variety of meanings, the particular one being dependent on the data types of its operands. Using PPL, it is very easy, for example, to introduce symbolic algebraic expressions and to redefine the usual arithmetic operators to produce symbolic results when confronted with symbolic operands, while retaining the ability to produce the usual arithmetic results when confronted with numbers. Now it is not our intention here to "sell" PPL; it may well sell itself, and the reader is invited to explore the excellent
programming manuals available (Taft, 1972). Rather, our point is that this kind of language and system provides a distinct advantage over the likes of FORTRAN and BASIC, particularly if the application is nonnumeric. The evidence for this derives from the use of PPL in an undergraduate course in introductory computing at Harvard over the past several years. This particular course (labeled Natural Sciences 110) is aimed at the undergraduate who is neither primarily a physical or mathematical scientist (or trying to become one) nor intends to major in computer science. Some 300 to 400 students take this course each year, and the students' majors range from history to social relations to biology. The examples employed for classroom instruction or homework include some primarily numerical applications, but the majority are taken from such applications as simple simulation models (e.g., of cell growth), trivial word-substitution translators (e.g., from English to French), and so on. Prior to using PPL for the course, BASIC and FORTRAN were employed. When we started using PPL, we found it possible to take a deeper look at computing, cover more material, and motivate the students much more strongly to employ the computer subsequently in various projects in their field of major interest. Indeed, since the introduction of PPL as the programming language, the number of students who demand some kind of follow-on course, where they can apply computers in their particular area of interest, has grown dramatically. While it used to be the case that almost all of them were probably happy when the course was over, some 15-20% now want to continue seriously using the computer.
ECL
System
In this section we want to provide a general introduction to the EL1 language and the ECL system.’ We lead the reader down.this path of a seeming diversion for a couple of reasons. First, and primarily, we feel that the tools and ideas presented in the succeeding sections can best be understood and evaluated after the reader has seen ECL. Familiarity with the system permits the reader to better put the notions into perspective and to understand where the complexities lie. Second, having looked a t ECL, the reader will quickly see the kinds of biases with which we approach the problems of programming. For those who want a more detailed discussion of ECL, the ECL Manual (1974) is available. The ECL system is the programming system which implements the EL1 language and also includes numerous tools and facilities for program development, debugging, and execution. We will often use ECL to mean both the language and system, hopefully without ambiguity.
The EL1 language was first described in early 1970 (Wegbreit, 1970) and the ECL system was designed in late 1970; the system was first operational in late 1971. Since that time it has enjoyed considerable use at Harvard and elsewhere, with numerous modifications and improvements being made during that time.
5.1 Some Preliminaries

Before launching into details of the language and system, let us present three of the basic goals we had for EL1: (1) it would be possible to compile highly efficient machine code; (2) the language and system were to be machine-independent insofar as was sensible;[2] and (3) the language was to be as simple as possible, consistent with providing features which were deemed essential if advantage were to be taken of the usual machine facilities. Put another way, we did not try to minimize the number of constructs in the language (arguing that anything desired could be obtained by extension), but tried to strike a sensible balance between providing only a minimal number of constructs and providing less primitive constructs which were deemed to be susceptible of efficient implementation on modern computers.

[2] The only serious machine dependence is in the representation of, and arithmetic over, integers and reals; not employing the built-in machine facilities for these surely runs counter to (1), and if true machine independence is desired a common means of representing numbers and their arithmetic can be obtained by extension.

There are several basic features of the language and system which follow from these decisions and which are quite important to its understanding. The more important include the following.

1. The size of any data object (for example, an array) is fixed at the time that the object is created (allocated) and does not change throughout the lifetime of the object. There are means for achieving variadic behavior by extension (for example, flexible arrays or variable length text strings) and these will be discussed below.

2. The run-time storage consists of a stack and a heap (that is, a zone of storage administered by a storage allocator and garbage collector). Those data objects which represent local variables in a block and actual parameters to a procedure are kept on the stack and all other data are kept in the heap. The run-time environment of an ECL program (that is, the collection of identifiers and associated values which have meaning when an expression is being evaluated) includes: (a) a set of variables, called the top-level variables, which may be thought of as globals, and (b) the set of variables introduced as locals in a block and as formals of a procedure. The variables
in this latter set come and go as blocks and procedures are entered and exited and are kept on the stack; the global variables persist for a complete session with ECL and are kept in the heap. Given any particular identifier, say X, its value is found by searching back through the stack (i.e., for the most recent binding) and, failing to find X in the stack, taking its top-level binding. Thus, there is no lexical scope; scope is dynamic (much like LISP, except that there is no special handling of functions in ECL as there is in LISP).

3. The mode or data type of each data object is fixed at the time the object is created (allocated) and remains fixed throughout the lifetime of the object. This property distinguishes EL1 from most other extensible languages (as well as from ALGOL-68, which permits a union data type). We think that there are overwhelming arguments for it, however. For one, compilation of efficient code is considerably simpler (particularly for generic operators, operators which are defined for a variety of operands). A second advantage is that assignment can be done without an act of storage allocation (or implementing assignment via sharing, surely a poor idea). In a more subjective vein we can argue that a user is often more comfortable (or makes fewer errors) when he can state just what kind of beast a given identifier is to represent, and is then admonished if he tries to break this rule. In any event, one can achieve "typeless" behavior for variables by extension, so that the base system being "hard typed" is no real restriction.[4]

[4] A somewhat more technical argument is that to achieve typeless behavior the system must, in effect, employ pointers, automatic storage allocation, and garbage collection to manage storage for all objects whose size may vary from time to time. There are numerous schemes for doing this, differing widely in flexibility and efficiency. With ECL, one is free to implement the scheme of his choice to accommodate typeless data.

4. The storage mechanisms are "fail safe" in the sense that one can never have a pointer to some object which has ceased to exist (and whose original memory space thus has garbage in it); both ALGOL-68 and PL/I are not safe in this sense, surely a severe defect.

5. The system has an interpreter and a compiler which are fully compatible. That is, the only reason for compiling is simply to obtain a representation of an expression which permits more rapid evaluation; one need not compile a program in order to execute it. Put another way, one compiles some module when he thinks that the increased efficiency of its subsequent execution will pay for the cost of compiling it.

6. All expressions (often referred to as "forms") in the language have a value, and when a value of some particular sort is required, any expression evaluating to a value of that sort will do. In particular, there are procedure-
valued and mode-valued expressions in the language, and these are, in a very strong sense, no different from, say, integer-valued expressions. In this regard the language very much resembles LISP.

7. There are three different representations for executable code (procedures), referred to as SUBRs, CEXPRs, and EXPRs. SUBRs are the built-in procedures employed by the interpreter (e.g., for addition or constructing a mode value); they are so constructed that highly efficient linkage to them is possible (i.e., low procedure call overhead). CEXPRs are "compiled expressions," and are the result of compilation. Again they are (machine code sequences which are) relatively efficient. Finally, EXPRs are an interpretable internal representation of a program; program text input to the system is parsed into this internal representation and the compiler's business is to translate from this form of an expression into the CEXPR form.

With these preliminaries out of the way, let us now proceed to a description of the language.
5.2 The Basic Language

Variable names are constructed in the usual fashion as a sequence of alphanumeric characters (with the initial character being alphabetic) and also as a sequence of "operator" characters terminated by a "nonoperator" (frequently a blank). Examples are[5]

A    ALPHA    Z11G    +++    .,.    ?*?
We will presently discuss how variables are introduced into a program. There are several primitive or built-in modes; with each there are certain character strings representing the constants of that mode. Some examples follow.

    Mode    Sample Constants
    INT     1  36  9273
    REAL    1.0  1.5E6
    BOOL    TRUE  FALSE
    CHAR    %A  %%  %.
    NONE    NOTHING
    REF     NIL
    MODE    INT  REAL  NONE  MODE
[5] We will employ upper case words to represent lexemes in EL1; the system permits both upper and lower case, but restriction to upper case here will hopefully render the text more readable.

Note particularly that MODE is a mode and there are several constants of
that mode; there are also expressions evaluating to values of mode MODE, as we shall see presently. The mode NONE is included to permit expressions which evaluate to nothing (e.g., the constant NOTHING); objects of mode REF are essentially pointers to some data object (and in that sense NIL is the pointer to NOTHING). There are several built-in mode constructors (procedures which evaluate to modes). Let m, m1, . . . , mk (k ≥ 1) be expressions evaluating to modes; i1, . . . , ik be identifiers; and n be an expression evaluating to an integer. Then
STRUCT(i1:m1, . . . , ik:mk)

is a mode; an object of this mode has k components of modes m1, . . . , mk, respectively. If X names an object of this mode then X.ij and X[j] select the jth component (and permit the use of or assignment to this component). The expression
VECTOR(n, m)

is a mode and an object of this mode has n components, each of mode m. If X names an object of this mode then X[j] selects the jth component.
SEQ(m)

is also an expression evaluating to a mode. An object of the resulting mode, when constructed, will have some fixed number of components, all of mode m (patience, we will describe the means presently). If X names an object of this mode, then again X[j] selects the jth component. Any mode which includes a component of SEQ(. . .) mode is said to be length unresolved, the resolution of the length always being made at the time an object of the mode is created (and therefore fixed for the lifetime of the object). The reason for employing both VECTOR(. . .) and SEQ(. . .) is that the internal representation of an object of mode VECTOR(n, m) requires exactly n times the space required for an m, while that for SEQ(m) where, say, the length is resolved to be n, requires n times the space required for an m plus a word to store n. Thus VECTOR(5, BOOL) requires 5 bits, while SEQ(BOOL) requires at least one word, no matter how many BOOL's are in the sequence.
PTR(m1, . . . , mk)

is a mode, and an object of this mode points to an object whose mode is (exactly) one of m1, . . . , mk. Thus PTR(m1, . . . , mk) can point only to an m1 or . . . or an mk, while a REF can point to anything. One reason for employing PTR(. . .) in addition to REF is so that the programmer can restrict the class of objects pointed to and be assured that any violation of
this will result in an error indication. An additional technical reason for employing both PTR(. . .) and REF is that the internal representation of a REF requires two addresses (one being that of the object pointed to and the other indicating its mode), while a PTR(m1, . . . , mk) requires an address (of the object pointed to) plus log2(k + 1) bits[6] to indicate which type of object is currently being pointed to.

[6] This is not quite true. The heap in ECL is paged and some pages are restricted to objects of a certain mode; in this case the mode of the object pointed to can be determined from its address, and thus fewer bits may be required to represent certain PTR(. . .)'s.

In order to construct objects of some mode the built-in procedure CONST is employed. Some examples:
CONST(SEQ(INT) OF 1, 2, 3)

constructs a sequence of three integers, namely 1, 2, and 3.
CONST(STRUCT(RE:REAL, IM:REAL) OF 0.0, 1.0)

constructs a structure with two components, both reals, the first (second) having the value 0.0 (1.0).
CONST(VECTOR(3, INT) OF 1, 2, 3)

constructs a vector of three integers. There are default values for objects of every mode, and failure to initialize the value of an object results in its value being the default value for that mode. For the built-in modes these values are 0 for INT, 0.0 for REAL, FALSE for BOOL, the null character for CHAR, NOTHING for NONE, NIL for REF, and NIL for MODE (for technical reasons). Thus
CONST(VECTOR(3, INT))

constructs a vector of three components, each zero.
CONST(SEQ(INT) SIZE 62)

constructs a sequence of 62 integers, each having value zero. Without the SIZE indicator, the length of the sequence would have been zero.
CONST(PTR(INT, REAL))

constructs a pointer whose value is NIL, but constrained to point to an object which is either an integer or a real.
CONST(STRUCT(L:INT, V:SEQ(REAL)) SIZE 4)
constructs a structure with two components; the first (L) has value zero and the second (V) has as value a sequence of four reals, each having value 0.0. Multiple selection is possible; for example, if FOO is the value produced by the preceding CONST then
FOO.L    FOO.V    FOO.V[1]

select, respectively, the value of L (the first component of FOO), the value of V (the second component of FOO, a sequence of four reals), and the value of the first component of the second component of FOO.

Blocks similar to those in ALGOL-60, ALGOL-68, or PL/I are included, and consist of a sequence of statements separated by semicolons and bracketed either by BEGIN and END or by [) and (]. Declaration statements may be included, and the result of a declaration is the creation of a local variable whose lifetime coincides with (control being in) the block, as in the ALGOLs. However, unlike most programming languages, declarations are not constrained to appear first, but may be intermingled with other statements. Declarations may, optionally, include an initial value for the variable declared; in the absence of this the default value for objects of that mode is used. Some declarations (statements) are given below.
DECL I:INT

the local variable I is introduced and initialized to zero.
DECL X:REAL BYVAL Z + 1.0

the local variable X is introduced and initialized to the current value of Z incremented by one.
DECL V:SEQ(INT) SIZE 10

the local variable V is introduced and initialized to be a sequence of ten integers, each initialized to zero.

Blocks also differ from the conventional ALGOL blocks in the sense that they provide the mechanism for conditionals. That is, a block may include a statement of the form
p ⇒ e

where p is some expression (e.g., a block) evaluating to mode BOOL and e is an arbitrary expression. The evaluation of p ⇒ e proceeds as follows: p is evaluated and if it does not produce a value of mode BOOL an error is
signaled. If it does produce a value and its value is TRUE then e is evaluated, the block is exited, and the value of the block is that of e; otherwise (p evaluates to FALSE) e is not evaluated and evaluation continues with the next statement. If control "falls off the END" of a block, the value of the block is that of the last statement of the block. There is also a "nonexit" conditional (statement) of the form

p → e
whose evaluation is as follows: p is evaluated and if the result is not of mode BOOL an error is signaled. Otherwise, according as p is TRUE or FALSE, e is or is not evaluated and, in either event, control goes to the next statement in sequence. Consider the following mode-valued expression:

BEGIN
  SCALAR ⇒ INT;
  DECL N:INT BYVAL [) DOUBLE ⇒ 20; 10 (];
  KNOWN ⇒ VECTOR(N, INT);
  SEQ(INT)
END

The value of this block, depending on the values of SCALAR, DOUBLE, and KNOWN, is

  SCALAR   DOUBLE   KNOWN    Value
  TRUE     TRUE     TRUE     INT
  TRUE     TRUE     FALSE    INT
  TRUE     FALSE    TRUE     INT
  TRUE     FALSE    FALSE    INT
  FALSE    TRUE     TRUE     VECTOR(20, INT)
  FALSE    TRUE     FALSE    SEQ(INT)
  FALSE    FALSE    TRUE     VECTOR(10, INT)
  FALSE    FALSE    FALSE    SEQ(INT)
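As a further minimal illustration of the two conditional forms (an example of our own, not taken from the original text, and using only the constructs just introduced), the block

[) X LT 0 ⇒ 0 - X; X (]

evaluates, for a real X say, to the absolute value of X: if X LT 0 yields TRUE the block is exited with the value 0 - X; otherwise evaluation continues, control falls off the END, and the value of the block is that of its last statement, X.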
Iteration is based on a special kind of block, the repeat block, which has the form

REPEAT s1; . . . ; sn END

where the sj are statements (including declarations). When control "falls off the END" of a repeat block, the evaluation of the whole block is
repeated. Exit (via a conditional) terminates a repeat block. An example:

BEGIN
  DECL V:REAL BYVAL 1.0;
  DECL VN:REAL;
  REPEAT
    VN ← (X/V + V)/2;
    ABS(VN - V) LT 1.0E-6 ⇒ VN;
    V ← VN
  END
END
computes an approximation of the square root of X by the classical Newton-Raphson method. A repeat block may be prefixed by a for-clause which provides for conventional iteration; for example,
FOR I TO N REPEAT S ← S + X[I] END

sums (assuming proper initialization of S) the components of an array X. One may also control the initial value and step size of the iteration variable. Thus a repeat block may be exited either through evaluation of an exit conditional or the exhaustion of the values prescribed for the iteration variable. The variable I has as lifetime the scope of the repeat block. Note that the WHILE p DO s and DO s UNTIL p iterators are modeled by

REPEAT NOT p ⇒ NOTHING; s END

and

REPEAT s; p ⇒ NOTHING END
. . . , zn:mn; mr) e
where the zl, . . . , zn are the formal parameters, constrained to have modes ml, . . . , mn, respectively. mr is the result mode and e the body. The application of this procedure to a set of actual parameters, say al, . . . , an, has the same effect as a block which consists of n declarations of $1, . . . , zn of modes ml, , . . , mn initialized to al, . . . , an, followed by the evaluation of e ; the result of this evaluation (insured to be of mode mr) is the result of the procedure.
EXPR(X:INT; INT) X + 1

which has as value "that procedure which adds one to its integer argument."
EXPR(X:INT, B:BOOL; INT) [) B ⇒ X + 1; X - 1 (]

is "that procedure which adds (subtracts) one to (from) its first argument according as its second argument evaluates to TRUE (FALSE)." To accommodate procedures which may take a variety of argument types (for example, the procedure for addition that we might want to define for, say, integers, reals, and complex quantities), there is a further mode, namely,
ONEOF(m1, . . . , mn)

This says, in effect, that such an object, when created, is (exactly) one of an m1, . . . , or an mn. There are no actual objects of this mode; that is, the declaration
DECL X:ONEOF(INT, REAL, FOO, FUM) BYVAL Z

will have the following result: Z is evaluated, and if its mode is none of INT, REAL, FOO, or FUM an error is signaled. Otherwise, the declaration is exactly as though one had declared
DECL X:m BYVAL Z

where m (one of INT, REAL, FOO, FUM) is the mode of the result of evaluating Z. Thus ONEOF(. . .) is specifically not a mechanism for mode union, as is present, for example, in ALGOL-68. In ECL one may simulate mode union in any number of ways; the PTR(. . .) mode provides mode union except for a level of indirection which is quite straightforward to suppress linguistically. The mode ANY allows for the handling of an object of any mode whatever. Noting that MD(. . .) is the built-in function returning the mode of its argument and LENGTH(. . .) is that function returning the number of components of an object, consider the following procedure:
EXPR(X:ONEOF(INT, SEQ(INT)), Y:ONEOF(INT, SEQ(INT)); ONEOF(INT, SEQ(INT)))
BEGIN
  MD(X) = INT AND MD(Y) = INT ⇒ X + Y;
  MD(X) = MD(Y) AND LENGTH(X) = LENGTH(Y) ⇒
    BEGIN
      DECL RES:SEQ(INT) SIZE LENGTH(X);
      FOR I TO LENGTH(X)
        REPEAT RES[I] ← X[I] + Y[I] END;
      RES
    END;
  BREAK( );
END
Suppose this procedure is applied to two arguments, A and B. Then, if A and B are both integers, their sum results. If A and B are both sequences of integers and of the same length, then the result is another sequence whose components are the sums of corresponding components of A and B. Otherwise the procedure named BREAK is called with no argument. (BREAK is a built-in procedure whose effect is to terminate computation, preserving the environment in which it was called; the environment in this instance includes the local variables X and Y bound, respectively, to the values A and B.)

5.4 Case Statement
Significant improvement in clarity of the above procedure can be obtained through the case statement. The case statement permits selection of code bodies on the basis of an arbitrary predicate applied to the actual arguments to CASE and the form(s) inside the square brackets in the CASE body. The example above makes a choice on the basis of mode, but any type of check is possible. We might rewrite the preceding procedure body in the following way with formal arguments to CASE being MD(X) and MD(Y) and predicate, EQUAL. CASE[MD(X), M(Y)] (EQUAL) [INT, I N T I =+ X + Y ; [SEQ(INT), SEQ(INT)] LENGTH(X) = LENGTH(Y) BEGIN DECL RES :SEQ(INT) SIZE LENGTH (X) ; FOR I TO LENGTH(X) Y[I] END; REPEAT RES[I] +- X[I] END: TRUE BREAK( ) ; END;
*
+
Each bracketed list of actual values to be tested against the bracketed formals may be supplemented by an arbitrary predicate [as in LENGTH(X) = LENGTH(Y) above]. Or the application of the predicate may be defaulted (as in the last line of the CASE body above). Thus
the EL1 case statement is very general, yet it provides sufficient structuring to enhance readability and assist in efficient compilation and code selection. The more standard case statement may be obtained in EL1 by the following construction:

CASE [I]
  [1] ⇒ A1;
  [2] ⇒ A2;
  . . .
  TRUE ⇒ OUT\OF\BOUNDS(I);
END;

where we take advantage of the fact that EQUAL is the default predicate for CASE.

5.5 Extended Modes; User-Defined Mode Behavior
The final feature of the language that we want to discuss is the extended mode facility (also referred to as the user-defined mode facility). This facility is unusual and quite powerful; it constitutes the basis for many of the sophisticated extensions to which we have alluded earlier and which we shall discuss in later sections. The notion of mode, as discussed thus far, provides for a storage representation for data objects as well as for certain procedures for manipulating objects of that mode. Thus, if X has, say, the mode m = STRUCT (I :INT, B: BOOL) than we have the following expectations : 1. An assignment, X + e, is meaningful exactly when e evaluates to an object of mode m,in which case that value (since all objects of the same mode are represented in the same way) replaces the old value of X (in storage). 2. Selection of either of the two components of X is defined, with X.1 or X[1] selecting the first and X.B or X[2] selecting the second. Any other selection (e.g., X.FOO or X[lOO]) is undefined and will result in an error. 3. Generation (or construction) of a value of mode m is possible in three ways : (a) default generation [in response to CONST ( m )3 yielding an m whose components are the default values 0 and FALSE, respectively; (b) Construction out of its parts, that is of an integer and a boolean [e.g., CONST(m OF 6 , TRUE)]; and (c) construction as a copy of some value of mode m [e.g., CONST(m BYVAL FOOP]. 4. There would be no automatic conversion of a value of mode m in the event that some value of that mode were supplied where a value of some other mode was required. 5. Some default print format would be provided for a value of mode m.
The thing that modes do not provide is any finer control over the behavior of an object of that mode than indicating, as in the above example, that the data object is a pair, one component of which is an integer and the other a boolean. If we are truly to represent the behavior of some object, more is needed; in particular, we require an ability to separate the issue of the underlying (memory) representation from the intended behavior. By behavior we mean such things as the meaning of assignment, selection, generation, conversion, and printing. For example, were we to represent an integer constrained to range between 0 and 100, we might well employ INT as the underlying representation, but we also need to insure that the behavior (the integer lies in the range [0, 100]) is guaranteed. We would want to make sure that assignment of a value outside that range is not permitted, for example, and that instances of the "small integers" cannot be generated which violate the condition.

Another example is that of representing a rational number. Here we might well take a pair of integers, the numerator and denominator, as the base or underlying representation, but the behavior we might wish to insure would be that induced by the constraints that (a) zero have a unique representation (e.g., numerator of zero and denominator of one), (b) the denominator be positive (so that the sign is carried with the numerator), and (c) the numerator and denominator be reduced to lowest terms. Among other advantages this constraint gives us a unique representative of the equivalence class for each rational value. To achieve this behavior and to provide certain niceties, we might want to provide the following kinds of facilities: (a) that generation of a rational from its parts (the values for numerator and denominator) insure the constraints (e.g., construction from the two parts 2 and -4 would result in a numerator of -1 and a denominator of 2, etc.); (b) that assignment of either a rational or an integer to a rational be permitted; (c) that printing of a rational use some particular format; (d) that, to protect some imagined group of users, selection of components not be permitted; and (e) that conversion of a rational occurring where an integer or real is required be done in the obvious manner.

One accomplishes these kinds of things in ECL by employing the extended mode facility. Defining an extended mode, say that for the rationals discussed above, involves supplying three things: (1) an underlying mode to be employed for the basic (memory) representation; (2) an identifier which names the extended mode; and (3) specific functions for conversion, assignment, selection, printing, and generation in cases where those automatically supplied by the system (as discussed above) are not to be used. Thus to implement rationals as described above we could provide the following: (1) the mode STRUCT(N:INT, D:INT) to be used for the
underlying representation; (2) the identifier RATIONAL to be used to name the extended mode; and (3) the following procedure to be used for assignment (just to give one example):

EXPR(L:RATIONAL, R:ONEOF(RATIONAL, INT); RATIONAL)
CASE [MD(R)]
  [INT] ⇒ [) L.N ← R; L.D ← 1; L (];
  [RATIONAL] ⇒ LIFT(LOWER(L) ← LOWER(R), RATIONAL)
END

Some comments on this procedure are in order.

1. The assignment function will always be supplied two arguments, the expressions on the left- and right-hand sides of the assignment operator; the mode of the left-hand side will be RATIONAL (that is, this particular function is called exactly when we are evaluating an assignment whose left-hand side has mode RATIONAL) and the modes permitted for the right-hand side are the user's choice (here RATIONAL or INT).

2. The behavior when an INT occurs on the right-hand side is to set the numerator to that integer and the denominator to 1 (returning the left-hand side, L, as the result of the assignment is consistent with system-defined assignment).

3. The behavior when a RATIONAL occurs on the right-hand side is simply to do the assignment. Note, however, that we could not have written

[RATIONAL] ⇒ L ← R
as this would cause a call on the assignment function for RATIONALs, inducing an endless loop. Rather, we lower the left- and right-hand sides; that is, LOWER(L) and LOWER(R) are the same quantities as L and R but their modes are STRUCT(N:INT, D:INT), the mode of the underlying representation, for which system assignment can be used. Once the assignment is made [LOWER(L) ← LOWER(R)], resulting in an object [LOWER(L)] whose mode is STRUCT(N:INT, D:INT), we lift the mode to be that of the extended mode, RATIONAL, via

LIFT(LOWER(L) ← LOWER(R), RATIONAL)
which simply changes the mode from STRUCT(N:INT, D:INT) to RATIONAL, again without affecting the value. The situation as regards the other functions which might be provided is similar [for example, the selection function is called when any selection is encountered and it takes two arguments, the object and the selector
(an integer or the field name), and so on]; further details are given in the ECL Manual (1974).
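To illustrate how the assignment function above would come into play (a usage sketch of our own; the variable name R is invented), once RATIONAL has been defined one might write

DECL R:RATIONAL;
R ← 3;

The right-hand side has mode INT, so the [INT] arm of the CASE is selected: internally the numerator is set to 3 and the denominator to 1, and R itself is returned as the value of the assignment, just as with system-defined assignment.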
6. Aids for the Nonexpert Programmer
Having introduced ECL in the previous section, we can now go on to describe some of the current efforts, using ECL, to develop mechanisms for the class of users we associated with medium-scale programs. We will turn to the problems of the construction of large programs in the next section. Recall our description of programs which might be termed "medium" in scale as those involving, for example, engineering, scientific, or mathematical computations, statistical analyses, certain data-processing applications, and so on. In many instances we imagine that programs which come close to meeting the needs of this group already exist. The problem is that when they do not quite "fit," it is usually not a simple matter to transform or tailor them. We believe significant gain can often be achieved by providing general program packages for, say, solutions of linear or differential equations, or matrix computations, which can then be specialized by the engineer or mathematician for the particular type of data and processing appropriate. This is only feasible if (1) the programs are written at a high enough level that their meaning is clear to a nonexpert programmer and (2) they are susceptible to fine tuning so as eventually to produce efficient programs.
The first problem logically to be solved is the development of a language or languages which provide a sufficiently rich linguistic medium. This is a problem which a number of people and conferences (see Proc. Symp. Very High Level Lung., 1974) have addressed, and which we believe is essentially solved. Once a very abstract program has been written, however, representations for the data must be carefully chosen. We believe that through dialogue with the user about his data and inspection of the program, this choice can be made semiautomatically. Given information about the data and perhaps some sample input, we believe that a system is able to make quite a good choice mechanically from an available collection of representations (Low, 1974). To expand on the idea a bit, suppose we consider a domain such as sets or matrices. It is straightforward to develop notation and the basic objects and operations appropriate to the domain (for example, via simple extension of EL1). It is also possible to construct a library of underlying
70
T. E. CHEATHAM, JR. AND JUDY A. TOWNLEY
representations for the objects. Each representation and basic operation upon that representation can be given a space and time cost figure and some information regarding applicability. Then, given a particular collection of programs operating in this domain, we gather information about the way the data are used. We usually require assistance from the programmer in this phase. Providing facts about the data to be processed is not always a simple matter. There are systems which solve this problem by requesting that a stream of parameters be provided by the user to characterize the data. But this is surely not the ideal solution. The system could engage the user in a question-answering session. The answers would essentially lead the system through the search tree of possible representations. The user may also be able t o provide information via predicates or simple assertions, specifying such things as the density and range of dimensions of the matrices. The system still may not be able to come up with what appears t o be an optimal solution, implying that further experimentation with actual data and some measurements may be needed. An analysis is made of the program to determine which basic operations it employs. For example, it may be reasonable to choose several different representations for the same object being used in a large program if functionally different uses are made of the object, but account must be taken of the cost of multiple copies or the transformation from one representation to another. In certain settings knowledge about the data could greatly affect the eventual algorithm. We could remove certain loops, for example, if the matrices were sparse, or we could eliminate membership checking if a set union operation never attempted to add a redundant member. If possible, advantage should be taken of the information, and the algorithm tailored correspondingly. All this is absorbed and used in the selection of the representations and the results are presented to the user in a meaningful manner for further improvement if necessary.
6.2 Program Reduction: Closure Finally, given satisfactory choices for the data representation, the program is viewed in its extended form, with each step broken down into the basic operations with an eye toward “optimization.” Here we mean by optimization a source-to-source process of simplification and reduction. Our name for it is “closure.” Given information about the representation (mode) of the data, facts about other variables (e.g., the size of sequences), an indication of which arm of a CASE statement will be selected, and so on, the closure mechanism tries to eliminate checks and reduce or simplify a program. The following are the kinds of simplifications which closure can make.
PROGRAMMING AND PROGRAMMING SYSTEMS
71
1. The call of a constant procedure applied to constant arguments can be replaced by the result of the application. 2. A procedure call can be replaced by the equivalent block, consisting of a declaration of each formal initialized to the actual, followed by the procedure body. 3. The call of a constant generic procedure applied to arguments with known constant modes could be replaced by a block which included declarations of the formals initialized to the actuals, followed by the particular arm of the CASE body. 4. A declaration like
DECL X:ONEOF(. . . , m, . . .) BYVAL Z could, if the mode of Z is known to be m, be replaced by DECL X:m BYVAL Z 5 . A declaration like DECL X : m BYVAL Z where Z is known to have mode m, could be eliminated, if X is never modified, by substituting Z for occurrences of X. 6. The block [) . . . ; TRUE =+ e ; . . . (1could be replaced by [) . . . ; e (1and the statement TRUE + e by e ; similarly the statements FALSE * e or FALSE + e could be eliminated. 7. If a block contains a single statement, the block can be eliminated. 8. Unnecessary computations (of the sort CONST(m O F a, b ) [l] if it is side-effect free) could be eliminated and common subexpressions removed. In actuality, closure is not just a straightforward process of doing substitutions followed by simplifications, as there are a number of rather subtle questions which arise (Wegbreit, 1973). The idea, however, is quite straightforward, and a simple closure mechanism is currently available in the system and is always used prior to using the code generator. ECL has a considerable impact on the work described above in several ways. The ability to defer commitments to particular underlying representations and delay describing how a procedure is to achieve its results is most helpful in writing the initial, very high level programs. The generic modes and CASE construct permit the construction of clear, clean programs. Yet given information about actual arguments, the CASE statement permits
72
T. E. CHEATHAM, JR. AND JUDY A. TOWNLEY
simplifications more readily than general blocks. With hard-typing it is possible to determine exactly how much space an object will take and what its accessing privileges will be. The mode behavior routines provide a powerful, yet concise way of defining the behavior of objects. They also provide a way of neatly packaging the relevant information about a particular representation, considerably simplifying the process of data representation selection. And, of course, the ability to manipulate programs and extend EL1 to include the concepts of other domains is fundamental.
7. Aids for the Production of Complex Programs
For the class of nonexpert programmers who require unique, yet efficient and readable programs, often contending with nonarithmetic objects, and for that large class of system builders (compilers, intricate transactionoriented systems, and so on), we envision yet another collection of tools, to provide a “structured programming environment.” By the use of the word complex in the title we mean to imply that there may be considerable trial-and-error and experimentation in the development of the program and that the road t o an adequate solution may not be a t all clear from the start. Of real assistance to these two classes of users are mechanisms which rccord the names and representations chosen t o model each concept in the program, the subdivision of the problem into its logical components, a history of the refinements of the program and data objects, the documentation to be associated with each module of the program, cross-referencing facilities, and so on. As we noted above, much has been written recently lauding the principles of structured programming. Perhaps the greatest impact of structured programming t o date is the increased attention to, and general acceptance of, the need to create clear, clean (even if sometimes slightly less efficient) programs. This practice has led t o some remarkable events. People are actually able to pick up the programs of others and understand and debug them. In fact, usually far fewer errors are made in the first place. The disappointment with some of the proposals acclaimed as good structured programming practices is caused by a tendency to be concerned with surface niceties (e.g., syntactic constraints) rather than substantive enhancement of the mechanisms available t o the programmer. Other efforts (Baker, 1972) which have produced very impressive results are of a primarily administrative and people-structuring nature. But the problems being tackled by this group are usually quite straightforward. Administrative aids fall woefully short when the problems are not of this sort, when the search space for solutions is very large, and the organization of the solution
PROGRAMMING AND PROGRAMMING SYSTEMS
73
is not clear from the start. And in many programming situations the world is complicated in just this sense. We intend to describe in the rest of this chapter some tools and facilities for improving the lot of those with complex programming problems. An important factor affecting the ease of implementation is the ability to derive a sufficiently abstract version of the problem. This implies the availability of a rich linguistic medium as well as an ability to delay making commitments about the details of function definition and data representation. Thus the first step in program creation is to derive an understanding of the intended behavior of the functions and data and to understand the constraints and be able to make both explicit. Actually, in addition to having the information explicit, we would like a programming system to be able to check decisions made in the program against our stated constraints. These statements would also serve as the basis for documentation of the program. The efficiency of a program (and this is not restricted to just these two groups of users) is often dependent upon the choice of data representation. Arriving at an appropriate choice, however, is seldom a straightforward matter; rather it may require substantial experimentation. Inherent in the notion of experimentation is a collection of tools for the analysis of a program. We do not mean simply a mechanism which takes some special action each time a function is entered, recording the values of the function’s arguments and the value of the result. Rather, we expect information about the behavior of the data, profiles of its dynamically changing size, monitoring of “activities” at a finer level than function call, and a characterization of the interdependence of activity and data. (Activities here might be selection or alteration of a component of a data object, updating a file, adding a node to a tree, and the like.) These monitoring features are certainly feasible, though not necessarily trivial. For instance, detecting an instance of a monitored activity which bridges several functions is difficult. One can argue that perhaps this particular problem would arise infrequently, or not at all, if the program were well-structured. Even more difficult,however, is deriving a handle on the effects the various representations of the data and function specification have on each other. But this is not an insoluble problem by any means. Another feature implied by the notion of experimentation is the ability to easily change decisions made about the data and routines. A combination of mechanisms and a history-keeping facility are key concepts here. When in the business of improving programs it is particularly useful to be shown every place that a change will have an effect. Thus we should like to provide a programmer with information about all occurrences of the relevant concept (whether data reference or routine) in the program-the
74
T. E. CHEATHAM, JR. AND JUDY A. TOWNLEY
location of the occurrence and the effect of the change. The effects of a change (particularly in data representations) are usually spread throughout a program, contrary to the beliefs of most structured programming advocates. The intention here, however, is to present the effects as a unit and at least preserve the appearance of locality of effects. In addition, with assurance of all such occurrences being detected and information about the effects, validity of the change can be checked. A general pattern-replacement facility, possibly triggered by predicates, would be of assistance in making the changes to a program mechanically. We would want the history of the decisions and changes, as well as the original information about the behavior of the data and routines and constraints among them recorded. If we maintain a data base of all this information, we could also store in it results of experimental runs for future reference. A few words about the nature of the constraints we are talking about may be useful at this point. A constraint might indicate the (maximum) dimensions of an array or the search time permitted to find a particular element. It may be necessary to update an object in a fixed amount of time or constrain two data objects which reflect opposite states of the program to be updated simultaneously. Such relationships between variables would routinely be recorded in the data base. Relations like “is used by” or “uses” are important in determining the effects of a change and provide assistance in the debugging and maintenance of a program. A useful way of encoding constraints on an object is via its mode (and the extended mode facility described in Section 5.) Additional information which can be used for verification exists in the program : information, for instance, about the modes of the objects used in calls to a procedure or the use of a result of a procedure. The information can be derived mechanically and may suggest additional constraints upon the procedures or data: for example, that certain procedures must be defined for the types of objects used in the various calls. The information about these new constraints can be recorded in the data base and brought to the attention of the user for verification. Some of the verification can be done mechanically, using the knowledge already in the data base. TOsummarize, we have proposed a number of features to assist in the production of complex programs, some of which already exist in ECL, others of which are being developed. These include: 1. A mode-behavior definition facility which allows a user to define the behavior of objects of a certain mode by specifying particular assignment, conversion, generation, selection, and printing functions which are to be called when an object of that mode is involved.
PROGRAMMING AND PROGRAMMING SYSTEMS
75
2. A general “rewrite” facility which permits convenient refinement of certain expressions, essentially by conditional macro replacement. 3. A “checker” which performs such tasks as verifying, where possible, that the modes of the arguments to a procedure are correct, finding all occurrences of use of some concept (identifier, mode, and so on), verifying constancy of certain identifiers or expressions over some scope, and so on. 4. A collection of facilities which aid in the development of appropriate environments for the execution of a module or collection of modules. 5. A variety of metering and measuring facilities which may be employed in experimental executions of modules to develop the statistics which will direct the user’s choice of efficient representations for various data structures and operations. 6. An audit trail mechanism, particularly useful in the debugging and maintenance of programs, indicating the who, why, and when for changes made t o a program. 7. A collection of testing aids, including a monitor to check that all parts of the program have been executed during a test and to provide some visual assistance in determining the flow of control. The results of various test runs should also be maintained for purposes of comparison with later tests after changes have been made.
Many of these facilities already exist and we are engaged in an effort to bring them together to form a coherent “programming system.” But these kinds of facilities are just indicative of the type of environment we perceive as being of substantive assistance in the process of developing programs. REFERENCES Baker, F. T. (1972). Chief programmer team management of production programming. IBM Syst. J . 11, 56-73. Cheatham, T. E., Jr., and Townley, J. A. A proposed system for structured programming. Center Res. Comput. Technol. Rep. Harvard University, Cambridge, Massachusetts. Cheatham, T. E., Jr., and Wegbreit, B. (1972). A laboratory for the study of automating programming. PTOC. AFZPS 40, 11-21. Cheatham, T. E., Jr. et al. (1971). Papers prepared for the international symposium on extensible languages. T R No. 12-71, Center Res. Comput. Technol., Harvard University, Cambridge, Massachusetts. Dahl, 0-J, Dijkstra, E. W., and Hoare, C. A. R. (1972). “Structured Programming.” Academic Press, New York. Denning, P. J. (1974). Is “structured programming” any longer the right term?. SIGPLAN 9. Dijkstra, E. W. (1968). A constructive approach to the problem of program correctness. BIT 8, 174-186.
76
T. E. CHEATHAM, JR. A N D JUDY A. TOWNLEY
Earley, J. (1974). High level iterators and a method for automatically designing data structure representation, Memo. ERGM425, Electron. Res. Lab., Univ. California, Berkeley, California. Gottlieb, C. C. and Tompa, F. W. (1974). Choosing a storage schema. Acta Inform. 3, 297-319. Hoare, C. A. It. (1969). An axiomatic approach to computer programming. Comrnun. ACM (Ass. Comput. Mach.) 12, 39-45. Knuth, D. E. (1974). Structured programming with go to statementa. Comput. Sum. 6. Korfhage, R. R. (1974). On the development of data structures. SICPLAN Not. 9. Low, J. H. (1974). Automatic Coding: Choice of Data Structures, Ph.D. thesis, Stanford Univ., Stanford, California. Mills, H. 1). (1972). Mathematical foundations for structured programming, FSD Rep. No. FSC 72-6012, IBM, Gaithersburg, Maryland. Standieh, T. A. (1969). Some features of P P G A polymorphic programming language. Proc. Extensible Lang. Symp., SIGPLAN Not. 4. Taft, E. A. (1972). “PPL User’s Manual.” Center Res. Comput. Technol., Harvard University, Cambridge, Massachusetts. Warshall, S. (1975). “An Overview of the National Software Works.” Massachusetts Comput. Ass., Wakefield, Massachusetts. Wegbreit, B. (1970). Studies in Extensible Programming Languages, Ph.D. thesis, Harvard University, Cambridge, Massachusetts. Wegbreit, €3. (1971). The ECL programming system. Proc. Fall Jt. Comput. Conf. 39, 253-262. Wegbreit, B. (1973). “Procedure Closure in ELI.” Center Res. Comput. Technol., Harvard University, Cambridge, Massachusetts. Wegbreit, B. (1974). The treatment of data types in EL1. Commun. ACM (Ass. Comput. Mach.) 17, 251-264. Winograd, T. (1975). Breaking the complexity barrier again. Proc. SZPLAN-SZGZR Interface Meet., SZGPLAN Not. 10. Wirth, N. (1971). Program development by stepwise refinement. Commun. ACM (Ass. Comput. Mach.) 14, 221-227. Wirth, N. (1973). “Systematic Programming: An Introduction,” Prentice-Hall, Englewood Cliffs, New Jersey. ECL Programmer’s Manual (1974). No. T R 23-74, Center Res. Comput. Technol., Harvard University, Cambridge, Massachusetts. Proc. Symp. High Cost Software (1973). Air Force Off. Sci. Res., Rep. AFOSR/ARO/ ONR, Washington, D.C. Proc. Symp. Very High Level Lang. (1974). SZGPLAN Not. 9.
Parsing of General Context-Free languages SUSAN 1. GRAHAM and MICHAEL A. HARRISON Computer Science Division University o f California af Berkeley Berkeley, California
Introduction .
3.
4.
5. 6.
. .
.
1. Preliminaries . 2. The CockeKasami-Younger Algorithm
.
Introduction . . 2.1 The Recognition Algorithm . 2.2 The Parsing Algorithm . 2.3 A Turing Machine Implementation . 2.4 Linear Grammars-A Special Case . Earley’s Algorithm . Introduction . . 3.1 The Recognition Algorithm . 3.2 Correctness of the Algorithm . 3.3 The Time and Space Bounds . 3.4 The Parsing Algorithm . Valiant’s Algorithm. . Introduction . 4.1 Recognition as a Transitive Closure Problem . 4.2 Strassen’s Algorithm and Boolean Matrix Mult.iplication 4.3 Valiant’s Lemma . 4.4 Computing D+ in Less than O(n3)Time . 4.5 An Upper Bound for Context-Free Parsing . . The Hardest Context-Free Language . Bounds on Time and Space . References .
77 79 . 107 . 107 . 107 . 112 115 119 . 122 . 122 . 122 . 126 . 128 136 . 140 . 140 141 . 145 . 149 . 166 176 . 176 . 181 . 184
. .
. .
.
Introduction
One of the major advances both in the study of natural languages and in the use of newly defined languages such as programming languages came with the realization that one required a formal and precise mechanism for generating the infinite set of strings of a language. Both programming linguists and natural linguists independently formulated the notion of a 77
7s
SUSAN L. QRAHAM AND MICHAEL A. HARRISON
context-free grammar as an important generative schema. If we regard a context-free grammar as the defining mechanism for the syntax of a language, the problem of recognizing that a given string is generated by the grammar is important, because this is clearly part of the task of computer analysis of the language. I n the present chapter, we shall focus on this recognition problem and its related problem of “parsing,” which mcans to find a derivation tree of a string in the language. A variety of methods are now known for parsing classes of context-free grammars. I n some sense the crudest method is systematic trial-and-error ; that is, a deterministic simulation of the nondeterministic choice of next step in a derivation. However, such a simulation can require a number of steps, which is exponential in the length of the string being analyzed. I n this chapter, our attention will be focused on those classes of grammars which are rich enough to generate all the context-free languages. (The interesting and important case of the linear time parsable languages will be treated in a sequel.) We shall concentrate on three algorithms for parsing classes of context-free grammars. We show that each method parses a class of grammars sufficiently large to generate all the context-free languages. Furthermore, each method has a time bound which is shown to be a t worst cubic in the length of the string being parsed. The three methods are presented within a consistent framework and notation, so that it is possible to understand both their similarities and their diff crenccs. The first method is applicable only for grammars of a special form, in which the trees have essentially binary branching. I n general it requires an amount of time proportional to the cube of the length of the string being parsed. However, it is shown that this time bound can be reduced for certain special cases. The second method has a time bound of the same order of magnitude in general, but is applicable for all context-free grammars. Again, the time bound can be improved in special cases. The third method is again applicable only for grammars of a special form, but has a subeubic time bound. We also present a context-free language which, up to constant factors, requires as much time to parse as any other context-free language. In some sense, it is a hardest context-free language. The view that we have taken here is to regard parsing as a special problem in the analysis of algorithms. We shall be concerned with time and space bounds and with differences in the underlying models of computation. I n the restricted space available to us it was not possiblc to cover all the many contributions to this subject. Nonetheless, the three main methods discussed here represent important lines of investigation into the problem which should be widely known.
79
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
The chapter is organized in the following way. We first present in Section 1 , some mathematical preliminaries used in subsequent sections. Section 2 is devoted to the parsing method due independently to Cocke, Kasami, and Younger (see Kasami, 1965; Younger, 1967). In Section 3 we present the method due to Earley (1968, 1970). Section 4 contains an exposition of the method due to Valiant (1975). In Section 5 we present the “hardest” context-free language, introduced by Greibach (1973). Finally, Section 6 describes some additional time and space bounds for context-free parsing under various circumstances. For the most part, the results in Section 6 are only sketched, but they do indicate our comparative ignorance. We do not know any language that requires more than linear time for its off-line recognition and we do not know any way to parse faster than essentially n2.81.That represents a great gap.
1. Preliminaries
In this section we describe the problem we are studying, establish our notation, and introduce the results from the theory of formal languages that we need in order that this chapter be self-contained. We begin with some basic notions from mathematics. Throughout this chapter we shall be concerned about sets. If S is a set, then by I S I we will mean the number of elements in S. The empty set will be designated 0. We shall need to deal with relations between sets. When X and Y are sets, then any set p E X X Y is a relation (between Xand Y ) .Let p E X X Y a n d a C Y X Z. We define pa =
and, if X
=
po = pn+l
{ (x, z )
1 zpyaz
for some y ) ,
Y, {
( 2 , 2)
= P”P,
p* =
V
1x E
X)
(the diagonal),
n 2 0, pn
(the reflexive and transitive closure of
p),
n>O
p+ = p*p
(the transitive closure of
p).
Next, we need the usual grammatical concepts. Definition.
A context-free grammar (hereafter a grammar) G is a
4-tuple.
G = ( V , 2, p , S), where V and L‘ arc two alphabets. L’ C V (letters in 2 and in N = V - Z
80
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
are called teminals and nonterminals, respectively), S E N is called the start symbol, and P is a finite relation, P C N X V* (the set of productions or rules) As usual, we write A -+ a is in P instead of ( A , a ) E P. Definition. Let G = ( V , Z, P , S) be a context-free grammar. We define a relation =+ 5 V* X V* as follows. For any a,p E V*, a * p iff a = alAaz, 0 = alua2 and A -+ u is in P for some A E N and all az,u E V*. In particular, if a1 E 2" or a2 E 2* we write a =+L /3 or a JR p , respectively. For any n 2 0, ai E V*,where 0 5 i 5 n, we call a0 * a1 * . . . =+a n a derivation of a n from (YO. A derivation a0 J L a1 =+L . . . *L a n ((YO =+R a1 ==+R . . . =+R a,) is leftmost (rightmost). Any a E V* is called a (canonical) sentential form iff S +* CY ( S =+* a ) . The language generated by G is the set of strings
L ( G ) = (W E Z * ( S = + * W } , Two grammars are called equivalent iff they generatc the samc language. = ( V , 2 , P , S ) and some a,p E V*, u such that for all az, u E V*, = a1ua2. In this case we say that A -+ u determines &p
If for some context-free grammar G
a
* @,then there is some production A -+
a1Aa2and p The determining production need not be unique. For example, if a grammar contains rules A -+ Ab and A bA then the relationship xAAy =+ xAbAy is determined by either rule. However, any step a <* L a;+l or at J R a,+1in a derivation 010 =+ a1 + . . , 3 an is determined by a unique rule. Consequently, we can represent leftmost or rightmost derivations by a sequence of determining productions. This leads to the following definition : a =
---f
Definition. Let G = ( V , Z, P, S) be a context-free grammar. Let the productions of P be numbered 1, 2, . , . , p and let the j t h production be designated r j . The sequence j ~j z,, . . . , j,, where for 1 2 i 5 n, ji E { 1, 2, . . . , p } is a left parse (right parse) if there is some leftmost derivation (rightmost derivation) a0 3 L a1 =+L . . . =+L an (a0 * R a1 JR . . . =+R an) such that for 1 5 i 5 n, uji determines ai-1- ai. Some examples may help to clarify these issues. Consider a grammar
1. 2. 3. 4.
S+uASb S+SS S+AA S 4 b
5. A 4 AS 6. A - + A A 7. A d a
'LetXandY besetsofstrings.LetXY = (xyIxEX,yEY), wherexyktheconeatemtion of x and y. Define XO = (A], where A is the null siring. For each i 2 0, define X" = XiX and X* = uii0Xi. Let X + X*X. If z is a string, let Ig(s) denote the length of I which is the number of occurrences of letters in I.
-
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
81
z = ( a , b ) , V - Z = N = ( S ,A ] , the start symbol is S, and the production set is shown above. It is more compact and notationally equivalent to combine all rules with the same nonterminal on the left into one BNF-like2 rule using the meta symbol I ”. For example, the grammar above can be written S--+aASb I X S I A A I b A--+ASIAAla Some sample derivations from the grammar are as follows:
S=+SS=+AAS Thus A A S is a sentential form. Also we have
S =+aASb * aaSb + aabb
(1.1)
Consequently it is known that aabb is in L ( G ), the language generated by G. Since (1.1) is a leftmost derivation, the sequence 1, 7, 4 is a left parse. It is very convenient to use trees to represent derivations. For instance, if A -+B1. . . B, is a production, we represent it by the “elementary subtree”
The idea extends naturally to derivations. For instance, a tree for (1.1) is shown below.
b
b
Note that the tree does not tell us whether A ---f a was used before or after S -+ b. However, there is a one-to-one correspondence between trees and rightmost (or leftmost) derivations. It is possible that a string in the language may have more than one tree (equivalently rightmost derivation). For example, consider the following two derivation trees for the string bbb. BNF stands for Backlls normal form or Backus-Naur form.
82
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
This leads to the following definition. Definition. A context-free grammar G = ( V , 2 , P, S) is said to be ambiguous if there is some string x € L ( G ) which has a t least two rightmost derivations.
We have just seen that our running example has an ambiguous grammar. We can also consider a special kind of ambiguity, called “structural ambiguity.” Notice that the nodes of a derivation tree arc labeled by symbols from thc vocabulary of the grammar. An unlabeled derivation tree is a derivation tree from which the labels (grammar symbols) are removed. Definition. A context-free grammar G = ( V , 2, P , S ) is said to be structurally ambiguous if there is some string z E L ( G ) which has at least two rightmost derivations for which the corresponding unlabeled derivation trees are nonisomorphic (ix., have different shapes).
Clearly every structurally ambiguous grammar is ambiguous, but the converse is false, as can be seen by considering the example
A grammar is (structurally) unambiguous if it is not (structurally) ambiguous. A languagc is (structurally) unambiguous if it has some (structurally) unambiguous grammar. A language which is not an unambiguous language is called an inherently ambiguous language. Thus an inherently ambiguous language has the property that every one of its (infinitely many) grammars is ambiguous. It is not clear that inherently ambiguous languages exist, but they do. There is an interesting discussion of these points in Aho and Ullman (1972-1973) and Hopcroft and Uilman (1969). Examples
S+Ab A-+Aa[a
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
83
This grammar is easily sccn to be unambiguous. [In fact, this is a n example of a left-linear grammar and it is known (Hopcroft and Ullman, 1969) that the language generatcd by evcry left-linear grammar is unambiguous.] The language L = ( w w ~ I w E { a ,b ) * ) is easily seen to bc unambiguous. On the other hand, the set
L*
= {WWT
IwE
( a ,b } * } *
is inherently ambiguous. Moreover, the situation is even more complex than that. Let us consider the minimal degree of ambiguity for L*. That is, in some grammar for L*, how slowly can the number of trees grow for a string in L* as a function of its length? The answer is exponential. That is, there arc an exponential number of rightmost derivations for strings in L*. These examples are meant to give some crude idea as to how complex grammatical analysis can be. Now we turn to the general problem which interests us the most, namely parsing. We shall consider two closely related problems, recognition and parsing. 1. The Recognition Problem. Give an algorithm which takes as input a context-free grammar G = ( V , Z, P, S) and a string 20. The output is YES if w E L ( G ) and ERROR otherwise. 2, The Parsing Problem. Give an algorithm which takes as input a context-free grammar G = ( V , Z, P,S) and a string w.The output is ERROR if w 6 L ( G ). If w E L ( G ), the algorithm produces a derivation or a parse of w in G. There may be some instances in which the parsing problem may be redefined to give all the derivation trees, but we shall not do so here, as we know that there may be exponentially many such trees. There are a few mathematical preliminaries that are mentioned now to avoid deIay later. Our choice of a definition for context-free grammars allows arbitrary strings on the right-hand side of a rule. I n particular, this permits null rules, that is, rules of the form A+A Since such rules complicate parsing, it is of interest to give algorithms for their elimination. A grammar is A-free if it contains no null rules. Our proof that null rules can be eliminated from a grammar uses the following lemma. Lemma
a,p
1.1.
E V*. If
Let G a
=
( V , 2 , P, S ) be a context-free grammar and let
p for some r 2 0 and if a =
a1
. . . anfor some n 2 1,
84 ai
SUSAN L. G M W M AND MICHAEL A. HARRISON
I n then there exist t i 2 0 , pi
E V*for 1 5 i
that B
E V* for 1 <_ i 5 nsuch
81. . . on, ai * Bi, and ti
=
n
Proof. We induct on r, the length of the derivation. Basis: Let a
. . an * 0. Then a = 8, so let pi = ai and ti = 0. 0
= a1.
* pi and
Hence ad
0
n
Cti = o = r. i-1
Induction Step: Our induction hypothesis is that if a then there exist t i
2 0, pi E V* such that p
=
PI. . . On,
= a1
. . . an & fi
ti
ai
* Bi and
Now consider
*
Then a = (YI . . . an y & 8. Because G is a context-free grammar, the rule used in a y was of the form A -+ 4. Let Lyk, 1 5 k 2 n, contain the A that was rewritten in a y. Then = a'A$ for some a', 8' E V*. Let
*
*
{
Ti =
Then
ai
ififk
a'(@'
if i =k.
* y; if i # k and * yk. Thus 1
0
ai
(Yk
By the induction hypothesis, there exist ti, pi so that /3 =
81
. . . Bn,
* 86 ti
yi
n
C ti = r. i-1
Combining the derivations, we find 0
ti
ai+yi*@i
for i # k
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
85
and Let
Using this lemma, we show in Theorem 1.1 that any context-free grammar can be transformed to an equivalent grammar in which null rules are used only in derivations of A. Let G = ( V , 2, P, S) be a context-free grammar. There is an algorithm for producing a grammar G’ = (V’, Z,P’, 8’)so that Theorem 1 .l.
(i) L(G’) = L ( G ) (ii) A + A i s i n P ’ i f a n d o n l y i f A E L ( G ) a n d A = S’ (iii) S’ does not occur on the right-hand side of any production in P‘.
I N I and W1 =
{A I A + A ) . For each k 2 1 let Wk+l = Wk U {A I A -+a for some a E Wb*). It is easy to see that Proof. Let n
=
1. Wi E. W;+I for each a 2 1 2. If Wi = Wi+l then Wi = Wi+,,, for each m 2 1 3. Wn+l = Wn
4. Wn
=
(A E
5. A E L(G)
~ 1 ~ 2 . ~ 1
if and only if S E Wn. To construct the new grammar, define G’ = ( B u f S’) , 2 , P’, S’) where if S E Wn {S’+A} Po = 0 otherwise P’ = ( S ’ - + S ] UP* U {A + A t . . . Ak I k 2 1, Ai E V , there exist al, .. .,ak+l E W,* so that A 3 alA1 . . . a k A m + ~is in P).
{
86
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Clearly S' does not appear on the right-hand side of any production in P'. Moreover, A E L (G') if. and only if S' A is in PI if and only if S E W , --f
if and only if S 3 A if and only if A E L (G) . It only remains for us to show that L (G') = L (G) . First we see that for each production A -+ A1 . . . Ak in P', there must have been a production A + a1A1 . . . a k A k a k + l in P, where ayiE Wn* for 1 5 i 5 k 1. Thus
+
& A and so ff
ai
A
2 alA1 . . . ( Y k A k C Y k + l $
A1
. . . Ak.
Thus every production in G' can be simulated in G by a derivation so that L (GI) C L (G). To see that L (G) 5 L (GI), we shall prove by induction on the length h of a derivation that if A
Basis: h = 1. If A
3
Induction Step: If A
$ w, w E 2+ then A =$ w. w is in P , then since w # A, A
3 A1 . .
,
Ak
=$w, w E
-+
w is in P'.
S+, then by Lemma 1.1,
there exist wi E Z* so that w = w1 . , , wk and Ai
G
wi. If wi E Zf then
& wi by thc induction hypothesis. If wi = A then Ad E W , and A + G' Ail . . . Ajp is in P', where 1 5 p 5 k and A j , . . . Ajp is the subsequence of A1 . . . Ak obtained by deleting those A i whose wi = A. Therefore * A 2 Aj, , . A j p 3 w j l . . . Wjpl where wj, . . .wj, = w. Ai
Corollary. There is an algorithm that decides if A E L(G) for any context-free grammar G.
Proof.
A E L ( G ) if and only if S E W,.
Example.
We can illustrate the construction with the following
grammar :
S
--f
aOa
0 + P I aOa I00
P421E E-A
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
87
The grammar is part of ALGOL. To see this, make the following associations : S stands for (string) 0 stands for (open string) P stands far {proper string) E stands for (empty) a stands for ’ 2 stands for (any sequence of basic symbols not containing ’). Here we regard x as a terminal for simplicity. We compute
Wi= {El
w2 =
(E’P)
W3 = { E ,P, 0 ) and
W 4= W3.The new grammar is S‘
+
s
S --+ aoa I aa 0
-+
P 1 aoa I00 10 I aa
P-+X[E Note that the new grammar contains many redundancies. When the useless rules (such as P + E ) are eliminated, it may not even resemble the original one. Note that these changes all came merely from the single rule E -+ A. In more complex examples, the grammars are changed drastically. We shall not arbitrarily exclude A rules from the specification of programming languages, as it is more convenient for the designer to have them present. Now that we have developed this grammatical transformation, it is easy to see how to pass from any context-free grammar G to a A-free grammar which generates L ( G ) - { A ) . A context-free grammar can contain rules which can never be used in a derivation of a terminal string, either because they are not reachable from the start symbol or because they do not generate terminal strings. It is sometimes desirable to eliminate such “useless” rules. (One reason might be to reduce the amount of space needed by a parser.) We next show that such useless rules can always be eliminated. lemma 1.2.
that 1, ( G ) #
For each contest-free grammar G
=
( V , 2, P, S) such
8, one can effectively construct a context-free grammar G’
=
88
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
(V’, 2,P’, S ) such that L (G) = L (G’) , V’ C V , and for every A in V’ 2, there is some x E Z* such that
-
Asx. Q’
Proof. The proof will construct G’. Define
W1 Wk+l
=
(A E N I A
3
x is in P for some x E 2*]
{A E N I A
= wk U
+
x is in P for some x E (2 U W k ) * ) .
Intuitively, Wk contains those nonterminals which will derive some string of terminals in a derivation tree whose height does not exceed k.
C Wk+l (by construction).
Claim 1.
wk
Claim 2.
If there exists an i such that W , = Wi+l then
Wi
=
Wi+,,,
for all m
1: 0.
Proof of Claim. By induction on m. Basis: Wi = Wi+o Induction Step: Suppose Wi = Wi+m. Then A E W;+,,,+ltjA E Wi+,,, *A
E Ws
(ZUWi+,)*
or
A+xisinPforsomexE
or
A - + z is i n P for somex E (Z UW;)*
A E Wd+l A E Wi+n+l - A
E
’+‘i’i.
(Note that the above proof would generalize for any sequence of sets defined in this manner and any predicate, since it did not use the properties of the predicate.) Claim 3.
Wn = Wn+l
where n = I N
1.
Proof of Claim. From Claim 1 we know
w1 s. wz c ws c . .. E w, But for all i, Wi C N
* 1 W; I 5 I N I. w n
=
E Wn+l c
..
*
Therefore
Wn+l.
Claim 4. A E Wi if and only if for some z E 2*, A tion whose tree has height 5 i.
2 x by a deriva-
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
89
Proof of Claim. By induction on i. (We leave the details for the reader.) Now define
G’
=
(V’, 2 , P’, S )
V’
=
w,u 2 u IS}
P’
=
( A+ x
in P I A , x
E
( 2 UW,)*).
It follows from Claim 4 that for every A in W,, there is some x E Z* such that AGx. G’
Now all that must be shown is that L (G‘) = L ( G ). Since P‘ G P we have L (G’) E L ( G ) . To show L (G) C L (G’) let w E L ( G ) . Then
S$w
w E
and
2;*.
Since w E Z*, all rules used in S A w must contain only nonterminals G
A such that for some x E 2*, A
$ x. But from Claims 2 , 3 , and 4 we know
that W , contains all such nonterminals. Therefore by the definition of P’ we know it contains all rules used in S % w. It follows that G
Therefore L (G)
=
L (G’) , which completes the proof.
The hypothesis that L ( G ) # P, is included so that even S is forced to generate a terminal string. In some language theory texts, the hypothesis L ( G ) # P, is dropped, but then the conclusion must be changed to claim that all nonterminals except possibly S can generate a terminal string. Example. We illustrate the construction of Lemma 1.2 on the following grammar: S + aBa
B
---f
Sb / bCC / DaB
C + abb
E
---f
aC
D
---f
aDB
90
SUE4AN L. GRAHAM AND MICHAEL A. HARRISON
The sequence of sets, Wi, would be
wl=
{C)
since C + abb is in P
Wz = { C , B , E )
since B -+ bCC and E
W s = (6,B, E , S ]
since S 3 aBa is in P
w 4 =
3
aC are in P
Wa.
Thus for the grammar G’ we would have
V‘ = ( A , B, C , Ej and P’ consists of
S + aBa
B + Sb I bCC
C + abb
E
-P
aC
If we consider G’ we see that although every nonterminal can generate some string of terminals, the nonterminal E cannot be reached from S, and hence removing it along with the production E 4 aC would not change L (G’) . The next lemma proves this in general. Lemma 1.3. For each context-free grammar G = (V, Z, P, 8) such that L (G) # 0 one can effectively construct a context-free grammar G’ = (V’, 2 , P’, S) such that L(G’) = L (G ) and for each A E N there exist a,B E V* and w E Z* so that
Proof. We can assume without loss of generality that G satisfies the conditions of the previous lemma. For each A E N define
Wi(A) = { A ) W k + l ( A )= W k ( A ) U ( C E N 1 there existcqp E V* so thatB E W k ( A ) and B + aCB is in P ) . Intuitively Wk ( A ) contains all nonterminals reachable from A in fewer than k steps. The wk ( A ) have the following properties: Claim I .
Wk(A) C Wk+l(A).
Claim 2.
If W k ( A )
Wk+m(A ) *
=
Wk+l(A) then for all m 2 0, w k ( A )
=
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
Claim 3.
W , L ( A )= W,+,(A) where n
=
91
I N I.
Claim 4. W , ( A ) = ( B E N I A =$aBP where (Y, E V * ) . The proof of these facts completely parallels that used in the previous theorem and is left for the reader. Now consider W , ( S ). This is precisely the set of nonterminals reachable from S. Thus we let
G’ = (V’, 2, P’, S) V’
=
z UW,(S)
P’
=
( A - + a in P 1 A , a E (Z u W , ( S ) ) * ) .
To show that L(G’)
=
L ( G ) we note that P’ C_ P and hence L(G‘) G
L ( G ) . On the other hand, if w
E L ( G ) , then S
w and hence every rule
. all such rules are in G’ used can contain only nonterminals from W , ( 8 ) But and hence S
> w.Therefore L (G’) G’
=
L ( G ), which completes the proof.
Example, Let us return to the previous example and consider G, where P consists of the following productions:
S
4
a3a
B + S b I bCC
-
C ---t abb E
-+
aC
Let us compute Wq ( S ) .
WI(S)
=
(S)
W , ( S ) = ( 8 ,B ) Wa(S) = ( S , B, C )
=
W,(X).
Thus E cannot be reached from S and the construction produces G‘, where
C = {a,b ) S -+ aBa
N
=
( S ,B, C)
I3 -+ Sb I bCC C -+abb Finally, we combine the two lemmas to get the property we sought.
92
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Definition.
P
=
A context-free grammar G
0 or for everyA E N , S
aAB
=
( V , 2 , P, 8) is Teduced if
< w for some
a,0 E
V*, w E 2*.
Theorem 1.2. For each context-free grammar G = (V, 2, P , S) one can effectively construct a context-free grammar G‘ such that L(G’) = L (G) and G‘ is reduced. Therefore we may assume, without loss of generality, that if G is a context-free grammar then it is reduced.
It is also useful to eliminate so-called “chain rules” of the form A + B from grammars. This we do in the next theorem. Theorem 1.3. For each context-free grammar G = ( V , 2, P , S) there is a grammar G’ = (V, 2 , P’, S) so that L (G’) = L (G) and P’ has only productions of the form
S-tA
E (V - S)+,
A
~r
A-ta
aE2.
lg ( a ) 2 2
Proof. We may assume, without loss of generality, that G = ( V , 2,
P,S) satisfies the conditions of Theorem 1.1. Let A E N . We shall eliminate all chain rules A
-t
B, B E N by the following construction. Define
Wi(A) = { A ] and, for each i
2
1
W i + l ( A )= W i ( A ) IJ { B E N I C + B i s i n P f o r s o m e C E W , ( A ) ) . The following facts are immediate: (i) For each i 2 1, W i ( A ) 5 Wi+l(A) (ii) If W i ( A ) = W i + l ( A ) then for all m Wi+m ( A ) (iii) W , ( A ) = W,+,(A) where n = I N I
(iv) w , ( A )
=
(BE N I A
2
0, W i ( A )
=
G BI.
Next, we define G’ = (V , 2, P’, S ), where
P’
=
{ A --+ a I a 6 N and B -+a is in P for some A
EN
for which
B E W,(A)).
It is clear that G‘ satisfies the conditions of the theorem. It is straight[7 forward to verify that L (G’) = L (G).
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
93
Example
E-+E+TIT T -+ T*F I F F
Ia
+
Clearly W 3 ( F )= { P i , W3(T) = IT, F ) , W 3 ( E ) = ( E , T,F ] . Thenew grammar is E+E TI T*FI ( E ) 1 a
+
T
3
T*F I ( E ) [ a
F
3
( E ) In
In many applications, and particularly in Sections 2 and 4, it is essential that our trees have binary branching except a t the leaves. Grammars in this form were first studied by Chomsky. Definition. A context-free grammar G = ( V , 2, P, S) is in Chomsky normal form if each rule is of the form
(i) A -+ BC with B, C E N (ii) A + a with a E Z (iii) AS3 A
- (8)
Next we show that any context-free language is generated by a grammar in Chomsky normal form. Theorem 1.4. For each context-free grammar G = ( V , 2, P, S ) there is a grammar G' = (V', 2, P', S) so that L(G') = L(G) and G' is in Chomsky normal form. Proof. We may assume that G satisfies the conditions of Theorem 1.3. Furthermore, for each production in P,A 3 B1,. . . , B,, Bi E V , T 2 2 we may assume that each Bi E N . For if some Bi E 2,replace Bi by a new nonterminal symbol C and add a production C 4 Bj. (Recall Bj E 2 . ) This is repeated for all such instances in all the rules. Now we can construct P'. 1. If A a! is in P and lg (a) S 2 then A 3 a! is in P'. 2. Let A B1. . . B,, r 2 3, Bi E N be in P. Then P' contains A 4 BlCl ---f
--f
Ci + B2Cz
94
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
.
where C1, . . , Cr--2are all new symbols. (Note that if r = 3, the rules are A -+ &C1, C1 4 BzB3.)This transformation is illustrated in Fig. 1. It is now a straightforward matter to verify that L (G') = L (G).
0 Example.
Consider the following grammar: S -+ TbT T-+TaTIca
First we eliminate terminals on the right and produce S 4 TBT
T + TAT T+CA B-+b A+a
c-tc Only the first two rules need modification t o achieve Chomsky normal form. Thus we have T+CA S-tTDI B+b D1 4 BT A-+a T -+ TDz c+c Dz 4 A T
/*\ B1
i s equivalent to E2
E,
'r-1
Er
FIG.1. The transformation used in the proof of Theorem 1.4.
95
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
Now, we shall introduce another grammatical form that has a number of important theoretical and practical implications. Definition. A context-free grammar G = ( V , 2, P , S) is said to be in Greibach normal form (GNF) if each rule is of one of the following forms:
A -+ aB1. . . B, A+a S+A
.
where B1, . . , B, E N - { S ),a E 2. A grammar is said to be in m-standard form if G is in Greibach form as above and n 5 m. Thus a grammar is in Greibach form if and only if it is in m-standard form for some m. This form has a number of important implications. Ignoring S ---f A, all other rules have right-hand sides which begin with a terminal. If G is in GNF and w E L (G) , w # A, then every derivation of w from S in G has exactly lg (w) steps. The following proposition generalizes this fact and states it precisely. Proposition.
Let G
=
( V , 2, P , S ) be a context-free grammar in
Greibach normal form. If w E Z*, a E N*, and S tion of S
L
L
w then every deriva-
w a has exactly max{1, Ig(w) } steps.
The most important practical applications of GNF are to top-down parsing. A problem occurs in top-down parsing when we have “left recursive” variables. Definition.
Let G = ( V , Z, P , S ) be a context-free grammar. A non-
terminal A E N is said to be left recursive [right recursive] if A
+
4
Aa for
+ V* [respectively, A * aA]. A grammar is left (right) recursive
some a E if it has at least one left (right) recursive nonterminal.
In “recursive descent” parsers, the presence of left recursion causes the device to go into an infinite loop. Thus the elimination of left recursion is of practical importance in such parsers. Note that a grammar in GNF is never left recursive. Thus, Theorem 1.5 solves an important problem in top-down parsing. We are now ready to prove this result. The argument proceeds by considering two lemmas. Lemma 1.4. ?r
Let G
=
(V, 2 , P, S) be a context-free grammar. Let B + p1 1 . . . 1 B, include all rules
= A + a1Ba2 be a production in P and
96
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
in P with R on the left-hand side. Define G1 = (V, 2, PI, S ) where PI = I a l P p l 2 ) . Then L (GI) = L ( G ). ( P - { r ) ) U ( A + adm I a&az 1
.
Proof. Clearly L(G1) C L ( G ) since if A + aIpia2is used in a GI derivation, then A * crlBa2+ a l @ moccurs in G. G
G
Conversely, note that A -+alBa2 is the only production o f G not in GI. If A -+ a1Ra2 is used in a G derivation of a terminal string, thdn B must be rewritten by some B + pi. These two steps are combined in G. Therefore L (G) C L ( G I ) a
The next lemma involves certain important regular sets. lemma 1.5.
Let G = ( V , 2, P , S) be a A-free context-free grammar.
Let
A4AaiI
. . . IAa,
where ai # A for all i, 1 5 i 5 T, be all the rules with A on the left such that the leftmost symbol of the right-hand side of the rule is A (except possibly for the rule A -+ A ) . Let
A - t o 1 I . . . I P* be the remaining rules with A on the left. Let GI = (V U { Z) , 2,PI, S ) , where 2 is a new nonterminal symbol and all the productions with A on the left are replaced by A -+biz1 . . I B.2 I PI I . . I P a
2~~12~...~~crrZ~crl~...~ar Then L(G1) = L ( G ) . Proof. The effect of this construction is to eliminate the left recursive variable A . In each place, we have a new right recursive variable Z. Note that none of the new A rules is directly left recursive because none of the begins with A . Since Pi # A, we cannot get left recursion on A by “going through” 2. Also note that 2 is not a left recursive variable because cri # A for all i and because none of the ai begins with Z. To complete the proof, note that the original productions in G with A on the left generate the regular set (pl u . . . u p.) (a1U - . . U a,)*.Moreover, this is the set generated by A (with the help of 2) in GI. With the aid of this remark, it is straightforward to verify that L (GI) = L (G). We 0 omit the details.
Now we turn to the main result. Theorem 1.5. Every context-free language is generated by a contextfree grammar in Greibach normal form.
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
97
Proof. Suppose, without loss of generality, that L = L ( G ) , where G = ( V ,8,P, S) is reduced, in Chomsky normal form, and assume (for the moment) that G is A-free. Let N = ( A l , . . . , A , ) with S = A1. We begin by modifying the grammar so that if A , + Aja is in P , then j > i. Suppose that this has been done for all rules starting with A1 and proceeding to Ak. Thus A i -+A,a for 1 i 5 k implies j > i. We now deal with the rules which have Ak+l on the left. If Ak+l -+ Aja is a production 1, we generate a new set of productions by substituting for with j < k Aj the right-hand side of each of the Aj rules according to Lemma 1.4. By repeating this operation at most k - 1 times, we will obtain rules of the form &+I-+ aa, a E 8 or Ak+l -+ Ata with 1 2 k 1.
+
+
+
The rules with 1 = k 1 are replaced according to Lemma 1.5, in which we introduce a new nonterminal Zk+l. Repeating the construction for each original nonterminal gives rules of the following forms: 1. A h - + A l a
2. A k - ) a a
3.
ze + a
( ( N - ( 8 ) )U {Zl, . . . , Zn))* a E 2 , a E ( ( N - ( S ) )U {ZI,..., Zn))* a E ( ( V - I S ) ) u 121,. . . , Z,))*. with 1
> k, a E
The reader should verify that the exact quantification given above is correct. This requires a separate proof and employs the fact that G was in Chomsky normal form. Note that the leftmost symbol on the right-hand side of any A , production must be terminal by (1) and the fact that there is no higher indexed nonterminal. For A,-l, the leftmost symbol generated is terminal or A,,,. If it is A,, generate new rules by use of Lemma 1.4. These new rules all begin with a terminal symbol. Repeat the process for A,+ . . . , A1. Finally, examine the 21, . , Z, rules. These rules begin with a terminal or a n Ai. For each Ail use Lemma 1.4 again. Because G was in Chomsky normal form, the only terminal on the right-hand side of any rule is the first one. To complete the proof, we must deal with the case where A E L . If this occurs, let G = ( V , 8 , P , S ) be the GNF grammar for L - { A ) . Define G’ = (V U (8’1,8,P’, S’),whereP’ = P u {S’+ S ) U {S’+ A I A E L } . One application of Lemma 1.4 produces the desired grammar. 0
..
Corollary. Every context-free language is generated by a grammar which is not left recursive.
98
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Example.
Consider G as shown below.
AI
-+
AiA, 1 0 I 1
Az-+0( 1
G is A-free and in Chomsky normal form. Using Lemma 1.5
Az--tO[ 1
-+oz 112 10 I 2 -+ AzZ I A2
A1
J
Substituting for Az in the 2 rules gives
A1 4 0 2 I12 10 11 Az-+O/1
I
2 3 02 I 12 10 1 Note that AP is no longer reachable. Our next development will be to analyze the number of steps in a derivation of a string of length n. We will use this result in subsequent sections to analyze the number of steps required by certain parsing algorithms. Unless some restriction is placed on the grammar or the derivation, the length of a derivation may be unbounded. For example, consider the grammar G whose rules are S-+ S S 1 a 1 A There are an infinite number of derivations of every string in L (G). Our purposes will be served by restricting the grammar to be of the following type. Definition. A context-free grammar
for each A
+ E N , A + A is impossible.
G
=
( V , 2, P , S) is cycle-free if
Now we can state the pertinent result. Theorem 1.6.3 Let G = ( V , 2, P , S) be a cycle-free context-free grammar and let 1 = max(lg(a) I A -+ a is in P, A E N ) . There exist constants co, cl, and cz which depend only on 1 N I and 1 such that CI > cz
and for all A E N , (Y E V*,if A (a) if (b) if
n = 0
n
>0
=& a,lg(a)
then i I co then i I cln
= n then
- cz
The statement of the theorem and its proof are due t o A. Yehudai.
99
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
Proof. First we must consider derivations of strings whose length is a t
most one.
Claim 1.
Suppose A
2 a. Then
(i) If a = A, no path in the derivation tree is longer than / N I. (ii) If a = a E Z, then the path from the root labeled A to the leaf labeled a is no longer than I N I. (iii) If a = B E N then the path from the root to the leaf labeled B is no longer than 1 N I - 1.
Proof of Claim. Consider a path from the derivation tree of A
as shown in Fig. 2. Suppose the path from A
=
+
3 a
Al to a is of length k. Then we have
+
Since A i + A; is not possible, we must have that all of All . . . , Ak, and aaredistinct.Ifa=BENthenk< I N I - l . I f a = a E L:~(A},then k I N 1. This completes the proof of Claim 1.
<
Now we are in a position to complete the analysis of A derivations. Note A has a tree of height that a derivation A a t height j in the tree is a t most li. Thus
< I N 1. The number of nodes
Let co be this constant in Eq. (1.2) which depends only on 1 and I N we have proved (a) of the theorem.
FIG.2. A derivation tree of A
*+ a lg (a)5 1.
1.
Thus
100
BUBAN L. GRAHAM AND MICHAEL A. HARRISON
Now we must work out the case whcrc a nonterminal derives another nonterminal. Claim 2. Suppose A =$B , B E V . If the length of the path from the root A to a leaf B is k then
i
I k ( ( 2 - 1)co + 1 ) .
Proof of Claim. The argument is an induction on k.
Basis: k
=
0 implies i = 0 and the basis is verified.
Induction Step: Assume k > 0 and that the result is true for the path to the node labeled B of length k - 1. Then A *Al..
.A ,
where
Ad=% B
for some d, 1
Aj$ A
j # d.
5d5m
and
By the induction hypothesis id
5
(k
- 1) ( ( 1 - 1)Co + 1 ) .
Moreover ij 5 co
if j # d
by our proof of (a). Therefore
i= 1
$.id
f
cij
i#d
i5 1
+ (k - 1 ) ( ( 1 - 1)CO + 1) + ( I - 1)co = k ( ( l - 1)Co + 1)
using that m 5 1. This completes the proof of Claim 2. If we combine Claims 1 and 2, we have already shown the following result. Claim 3.
Suppose A
is
I
=$B, B E V . Then
I N I ZINl
if
2>1
INl((Z--l)Co+1)
if
BEV
(~N~-l)((l-l)co+l)
if
BEN
PARSING OF GENERAL CONTEXT-FREE LANQUAQES
101
Now we are ready for the main claim, which will complete the argument. Claim
4. If A
2 a,lg(a) = n > 0 then
Proof of Claim. The argument is an induction on n.
Basis: n = 1. We compute ~1.1
So i 5
c1-1
-
~2
=
I N l ( ( l - l)&
+ 1).
- cz by the second case of Claim 3.
Induction Step: Assume the result true for all strings of length at least i *
1 and less than n. Suppose that A a,lg(a) = n 5. 2. The tree of this derivation must have a node with at least two immediate descendants producing non-null strings, as shown in Fig. 3, Let D be the highest such descendant of A . Thus we have
A
,* * D 3 D1... D,
where
i
FIG3. The general derivation of A =+ a.
102
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
for each j, 1
5 j 5 m. Moreover m
a1. . . % = a
and
x n j = n j-1
and there exist j , , j 2 such that aj,
f A # (~iz
since l g ( a ) 2 2. To simplify the notation of the proof, assume that there is some p, 2 I p 5 m, such that CYI,
. . . ,a p # A
and apt1
=
... = L
Y ~=
h
(There is no loss of generality in this assumption, since if this condition were not satisfied, we could introduce new indexing to make it so.) Then
1-1
j-1
i-Pfl
By the induction hypothesis,
j- 1
=
cln
- cz.
j-ptl
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
103
Since the induction has been extended, the proof of Claim 4 and the proof of the theorem are complete. Because our fundamental goal in this chapter is to examine parsing and the resources required for its implementation, we must discuss how to implement it and how to compare algorithms. The usual technique is to implement algorithms on a standard model of a computer and then to estimate the time and storage requirements of each algorithm. For that approach, standardized models of computation must be used. There are two models in common usage which we shall employ, the Turing machine and the random access machine (or RAM for short). A Turing machine is exemplified in Fig. 4.The device has a finite state control and a read-only input tape on which two-way motion is allowed. There are also k work tapes on which the device may read or write symbols from some fixed alphabet and on which two-way motion is allowed. The work tapes are two-way infinite tapes, in that if we attempt t o move right or left on a work tape, we always find a new tape square. Work tapes are always blank a t the beginning of a computation. An atomic move of the device is governed b y a single-valued transition function4 which depends on the internal state, present input, and present contents of each of the k work tapes. I n one move, we change state and change all k work tapes and move a t most one square on the input and on each work tape. Certain states are designated as accepting states. The device accepts a n input if it halts in an accepting state. Examples of this type of machine may be found in Aho et al. (1974) and Hopcroft and Ullman (1969). The time complexity T ( n ) of a Turing machine is the maximum number of moves made by the machine in processing any input of length n. The space complexity S ( n ) is I
I I
la1
I I I I I ~1
read-only input tape
State Control
k work tapes
t FIG.4. A Turing machine. Because it is a function, this device is deterministic.
104
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
the maximum number of squares traveled from the initial position on any work tape in processing an input of length n. Example. Let us design a Turing machine which will accept if and only if the input is of the form anatwherc n 2 1. It is convenient to use the following identity for the calculation of n2: n-1
n* =
C (2i + 1 ) . i-0
This identity may be verified by induction on n. Assume we have a Turing machine with one work tape. The following intuitive description of the Turing machine should bc sufficiently clear that a more formal version can be easily written down from it. 1. Advance the input one square. Accept if the end of the input has been reached. This handles the special case when n = 1. 2. Initialize the work tape by writing bbb on it. 3. Simultaneously advance the input while reading the work tape. Note that the direction that we read across the work tape alternates. Accept if the end of both the input and the work tape is reached simultaneously. 4. Add two b’s to whichever end of the work tape we happen to be a t and return to step 3.
It is assumed that we halt and reject in the above description if an abnormal condition arises, for example, if we run out of input in the middle of step 3. In Fig. 5 a computation which accepts ae is shown. It is easy to see that the time bound and space bound for this computation are linear in the length of the input, which is itself quadratic in n. I n the analysis of Turing machine computations, a distinction is made between “on-line” and “off -line” computations. Suppose we are using a Turing machine to recognize some language. In an of-line computation, no restriction is placed on the motion of the input head and on when the device accepts. In an on-line computation, the device must read the input from left to right. After reading the ith input, the machine must indicate whether or not a1 . , . a , is or is not in the language accepted by the device. An online restriction severely constrains the freedom of the device in processing an input. For this reason, one can sometimes prove a lower bound in this case when one is not available in the general case. Although Turing machines have an important role in the theory of computation, they are not natural models for algorithmic analysis. Turing machine computations require a great deal of time moving back and forth along thc tapes looking for information and do not have the familiar ran-
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
WORK TAPE
INPUT
TIME
105
a a a a a a a a a 0
t
t
1
t
4
t
T b b b
t b b b
5
t
T b b b
6
t
t
b b b
7
t
8
t
9
t
t b b b b
t b b b b b
t b b b b b
10
t
T
b b b b b 11
t
f
b b b b b 12
t
t b b b b b
13 14
t
T
t
f accept
FIQ.5. A computation for aD.
dom access memory properties of computers. This suggests that another model is needed and has led to the use of RAM’S (named for random access memory model). A RAM is a device of the type shown in Fig. 6. It has an input tape which can be read. Unlike a Turing machine tape, every square may con-
106
BUBAN L. GRAHAM AND MICHAEL A. HARRISON
ccuuuun I
Program
I
read-only input tape
Re l s t e r s
F
a
w r i t e a n l y output ]a-pet[
FIG.6. ARAM.
tain an integer.s The output tape is of the same type, except that it can be written but not read. The RAM has an unbounded number of registers, each of which is capable of storing an integer. There is no bound on the size of the integers stored in a register, nor is there a bound on the number of registers which may be utilized. There is a program in a RAM which resides in a program memory which cannot be altered. Each register can be directly accessed by the program. A program counter indicates which instruction is currently being executed. The output is printed on a write-only output tape, each square of which can hold an arbitrary integer. There is tin instruction set which can be anything that is sufficient to compute any partial recursive function. For example, a simple instruction set might be
LOAD u STORE u SUBTRACT u READ u WRITE u BRANCH ZERO I where u identifies a register and 1 identifies an instruction in the program. We are being deliberately vague about the actual instruction set [cf. Aho et al. (1974) for a number of different choices]. The reason we are vague is that we wish to use computers at a high level and so we shall write algorithms for a RAM in an ALGOLlike dialect. It would be a straightforward task to translate such a language into RAM code. Example. Suppose locations A . to A,-I in our RAM each contain an integer. We wish to write a program to sort the integers into ascending
There is no a priori bound on the size of the integers used.
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
107
order. Let BIT have the values true or false and let SWAP exchange the values of its arguments. initialization: BIT := false; main loop: repeat fori:=Oton- 1do if A ; > A++l then begin SWAP ( A , Ai+l); BIT := true end until BIT = false.
We leave it to the reader to analyze this algorithm, but note that in the worst case, we might have n (n - 1) comparisons. Thus this is an algorithm which runs in quadratic time but linear space. Notice that in our ALGOL dialect, program labels (such as “initialization” and “main loop” in the example) will have no computational significance, but will simply serve as reference points for the discussion. In addition to program variables containing integer or boolean values, we will allow variables to take on (unordered) sets as values. Our algorithms will often deal with two-dimensional matrices. The matrices will use O-origin addressing. That is, an m X n matrix will consist of rows 0 to m - 1 and columns 0 to n - 1. We will use the following matrix concepts. A matrix M = ( m i , j )is (strictly) upper triangular if i 1 j (i > j ) implies m i j is empty. The principal diagonal of an n X n matrix M consists of the elements mo,o,ml,l, . . . , mn-l,n-l.
2. The Cocke-Karami-Younger
Algorithm
Introduction
The first parsing algorithm to be presented has been known since the early 1960’s. Various forms of it exist in the literature under different names. One version is attributed to Cocke and is reported in Younger (1967). Another version is given by Kasami (1965) and Kasami and Torii (1969), while the most accessible version is due to Younger (1967). We present a slightly different formulation than has been given previously, in order to make the algorithm compatible with the other methods to be discussed. 2.1 The Recognition Algorithm
The algorithm to be presented requires grammars in Chomsky normal form. As we saw in Section 1, there is no loss of generality in this assump-
108
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
tion, since we can transform any grammar to an equivalent grammar in Chomsky normal form. The string A is in a language if and only if the Chomsky normal form grammar for the language has a rulc S ---f A, where S is the start symbol. Furthermore, the rule S --t A cannot be used in the derivation of any strings of length greater than 0. Consequently, without loss of generality, we can restrict our attention to input strings of length greater than 0 and A-free grammars in Chomsky normal form. The key to the Cocke-Kasami-Younger algorithm is to form a triangular matrix of size (n 1) (n 1 ) where n is the length of the input string. The entries in the matrix are subsets of variables. We can then determine membership in the language by inspecting one matrix entry. We can also use the matrix to generate a parse. We next present the algorithm for constructing that matrix.
+
+
Algorithm 2.1.1. Let G = ( V , 2, P, S) be a grammar in Chomsky normal form (without the rule S 4 A) and let w = aluz . . . a,, 12 2 1 be a string, where for 1 5 k 5 n, ak E 2. Form the strictly upper triangular (n 1 ) X (n 1 ) recognition matrix T as follows, where each element ti,j is a subsct of V - Z and is initially empty.”
+
loop 1: loop 2:
+
for i := 0 t o n - 1 do
ti,i+l := { A 1 A + u ; + ~ is in P ] ; ford := 2 t o n do for i := 0 t o n - d do beginj := d i; t i , j := ( A 1 there exists k, i 1 Ik 5j - 1 such that A BC is in P for some B E t ; , k , C E t k , $ ] end.
+
+
Example
S-,
SS I AA I b
A+ASIAAIa
w
=
aubb
The recognition matrix is
l--l-l~lAi~l ----I I I A
T = l
S,A
S,A
I
S
Recall the O-origin addressing convention for matrices.
A,S
I
S
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
109
In computing 1, the superdiagonal stripe is filled in by loop 1. As the following diagram illustrates, the remaining computation is done down the diagonals ( j - i = d ) and working up to the right-hand corner as d = 2, . . . , n. Thus the rows are filled left-to-right and the columns are filled bottom to top.
As shown in Fig. 7, the (i,j)th element of the matrix is computed by scanning across the ith row and down the j t h column, always to the left and below the entry in question. Thus we successively consider the pairs of entries
+ 1 ) (i + 1 , j ) (4 i + 2) (i + 2, j > (i,i
110
SUSAN L, GRAHAM AND MICHAEL A. HARRISON
Each matrix entry is a set of nonterminals. For each pair of entries, we find every grammar rule with a right part composed of a symbol from the first entry followed by a symbol from the second entry. The left parts of these rules constitute the (i,j ) th entry in the recognition matrix.' The next theorem and its corollary show that the algorithm is correct, in that it recognizes all and only strings generated by the associated grammnr . Theorem 2.1 .l. Let G = ( V , 2,P, S) be a grammar in Chomsky normal form (without the rule S -+ A ) . Let w = a1a2 . . . a,, n 2 1 be a string, where for 1 5 k 5 n, ak E 2, and let T = (ti,j) be the recognition matrix
if and only if A
constructed by Algorithm 2.1.1. Then A E ai+&+2. . . aj.
+
*
Proof. Since the recognition matrix is strictly upper triangular and (n 1) X ( n l ) , thc algorithm must set entries t i , j , wherc 0 5 i 5 n - 1, 1 5 j 5 n, and i < j . Consequently 1 I j - i I n. The first loop handles the case j - i = 1. In the second loop, since j = d i, we know that d = j - i. For a given value of d, we must have j = d i I n. Consequently i 5 n - d. It follows that Algorithm 2.1.1 sets every entry of the recognition matrix and each element is assigned a value only once. We show by induction on d that the values of T correspond appropriately to derivations.
+
+
+
+
Basis: j - i = 1, s o j if and only if A =+ ai+l.
-1
=
i. Clearly, after the first loop, A E ti,i+l
Induction Step: For some d = j all d' < d. After the second loop, A E
ti,j
Since k - i
- i, 1 < d S
if and only if there exists k , i < k A ---f BC is in P for some B E t i . k ,
n, assume the result for
<j
cE
such that tk,j.
< d and j - k < d, the induction hypothesis applies to both
7 Algorithm 2.1.1 is essentially the same computation found elsewhere in the literature for this parsing method. However, it differs in the indexing of elements within the parsing matrix. We have transformed columns to rows and rows to diagonals, as indicated below, in order to be consistent with the matrices for the other metho& presented subsequently.
111
PARSING OF GENERAL CONTEXT-FREE LANGUAGES ti,k
and t k , j , and we have
B E
ti,k
c E tk.j
if and only if
B
+ ai+l . . . a k + +
if and only if C =+a k + l
.. . a j
Combining these results, we have
A E
ti,j
if and only if
A
This extends the induction. Corollary. w E L(G) if and only if
+ ai+l . . . a+ * BC =+ 0 S
to,,,.
Algorithm 2.1.1 is an “off-line” algorithm in the sense that all of the input is read a t the beginning and the successive matrix entries computed correspond, by Theorem 2.1.1, to sets of successively longer substrings of the entire input string. An “on-line” algorithm would compute the matrix entries in such a way that after reading ai, the recognition matrix would indicate which nonterminals generate a1a2 . . a{. I n fact, Algorithm 2.1.1 can be rewritten as an on-line algorithm. We present this algorithm so that the reader can compare it with the on-line algorithm due to Earley which is discussed in the next section.
.
Algorithm 2.1.2. Let G = ( V , 2, P , S ) be a grammar in Chomsky normal form (without, the rule S t A) and let w = alaz . . . a,,, n 2 1 be a string, where for 1 5 i 5 n, ai E Z. Form the strictly upper triangular ( n 1) X (n 1) recognition matrix T as folIows, where each element t i , j is a subset of V - Z:
+
+
forj := 1 t o n do begin := { A 1 A -+ a j is in P } ; €or i := j - 2 downto 0 do tj.-l,j
+
t i , j := ( A I there exists k , i 1 such that A t BC is in P for some B E end.
5 k 5j ti,k,
C E
1
tk,j)
Notice that this algorithm fills in the matrix column-by-column and the column elements are computed “bottom-to-top.” We leave it t o the reader to verify that Algorithm 2.1.1 and Algorithm 2.1.2 yield the same recognition matrix for any grammar and string. Next we estimate the number of steps required by the algorithm, as a function of the length of the string being analyzed. Observe that for a given ~, of ( A 1 A + ai+.l is in P ) takes terminal symbol u ~ + computation a fixed amount of time. Similarly, for fixed i, j, and k, computation of
112
S U S A N L. GRAHAM A N D MICHAEL A . HARRISON
( A I A 3 BC is in P, B E t i , k , C E t k , j } takes an amount of time bounded by a constant with respect to the length of the string being analyzed. Theorem 2.1.2. Algorithm 2.1.1 requires O(n3) steps to compute the recognition matrix.
Proof. Loop 1 takes cln steps to initialize the superdiagonal. The setting of tilj in the body of loop 2 takes cz(d - 1) steps, since k can take on d - 1 values. Therefore the algorithm requires n
C~TI
+ C (d - 1)
(TI
- d + 1)
=
+
~ 1 %
d-2
- n,
6
=
o ( n ~steps. >
The space required by the algorithm is determined by the number of elements in the strictly upper triangular matrix. Since each element contains a set of bounded size, we get immediately the next theorem. Theorem 2.1.3.
Algorithm 2.1.1 requires n ( n
+ 1)/2 cells of space.
2.2 The Parsing Algorithm
In the preceding section we have given a recognition algorithm. Algorithm 2.2.1, which follows, provides a means of obtaining a parse for a given string from its recognition matrix. The algorithm will also be used for the recognition matrices defined in Section 4. Algorithm 2.2.1. Let G = ( V , Z, P , S) be a grammar in Chomsky normal form (without the rule S + A) with the productions numbered 1, 2, . . , , p and the ith production designated ri.Let w = a l a . . . an, n 2 1 be a string, where for 1 5 k 5 n, ak € Z and let T = (ti,j) be the recognition matrix for w constructed by Algorithm 2.1.1. I n order to produce a left parse for w, define the recursive procedure f
PARSE (i,j , A ) which generates a left parse for A =+ ai+lui+z . . . a j by procedure PARSE (i,j, A ) ; begin if j - i = 1 and rm = A -+ a;+l is in P then out,put ( m ) else if k is the least integer i < k < j such that rm = A 4 BC is in P where B E t ; , k and C E t k , j then begin output (m) ; PARSE (i, k , B ) ; PARSE@, j , C ) end ; end;
113
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
Output a left parse for w as follows: main program:
if S E tO,n then PARSE(0, n, S) else output (“ERROR”).
Notice that Algorithm 2.2.1 produces the leftmost parse “top-down” with respect to the syntax trees. By contrast, Algorithm 2.1.1 builds the recognition matrix “bottom-up.” The reader can easily modify the algorithm to produce a rightmost parse. We next verify that Algorithm 2.2.1 works and derive its running time. The bheorem is followed by an example. Theorem 2.2.1. Let G = ( V , Z, P , S ) be a grammar in Chomsky normal form (without the rule S 4 A ) , with productions designated r1, T ~.,. . , rP.If Algorithm 2.2.1 is executed with the string w = alaz . . . a,, n 2 1, where for 1 5 i I n, ai E 2 , then the algorithm terminates. If w E L (G) then the algorithm produces a left parse; otherwise it announces error. The algorithm takes time 0 (nz).
Proof. We establish the theorem by a sequence of partial results.
Whenever PARSE (i,j , A ) is called, then j
Claim 1.
> i and A E
ti,+
Proof of Claim. The procedure PARSE is called once from the main program and twice from within the body of PARSE. In all cases the call is conditional on the satisfaction of this property of the arguments. Claim 2. PARSE (i,j , A ) terminates and produces a left parse of A ai+lai+2.. a+
+
.
Proof of Claim. We induct on j
Basis: j - i
+
=
- i.
1. By Claim 1, A E
t;,i+l.
It follows from Theorem 2.1.2
that A * ai+l and therefore that P contains a rule T,,, = A 4 ai+l. Clearly, the procedure PARSE terminates and produces the left parse m. Induction Step: Assume the result for j - i 5 d. Consider PARSE(i, j , A ) , where j - i = d 1. Since j - i # 1, the else portion of the conditional statement in PARSE is executed. It follows from Claim 1 and
+
+
*
Theorem 2.1.2 that for some B, C in V - 2,A + BC ai+lai+2 . . . aj and consequently that P contains some rule T,,, = A -+ BC, where B E ti,k and C E tk,+ Therefore PARSE produces (m, PARSE (i, k, B ) , PARSE (k,j, C) ) . By the induction hypothesis, PARSE (i, k, B ) and PARSE (k,j, C)
114
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
terminate and produce the appropriate left parses. Consequently the else
+
portion of PARSE terminates and produces a left parse of A =+ ai+~ui+~. . . aj. Claim 3. For each d , 1 5 d 5 n, a call of PARSE (i,j , A ) takes a t most - i = d and c is some constant.
cd2 steps, where j
Proof of Claim. We induct on d. Basis: d
=
1. PARSE (i,i
+ 1, A ) takes one step.
Induction Step: Assume the result for j - i I d. Suppose PARSE (i, j , A ) is called, with j - i = d 1. Clearly the else portion of the procedure is executed. Except for calls of PARSE, for some constant CI, the else portion takes at most c1( j - i - 1) 5 cl ( j - i) steps, since k takes on j - i 1 values. Therefore if we choose c = c1, then including the recursive calls on PARSE and using the induction hypothesis, PARSE requires at most c ( j - i) c(k - i)' c ( j - k)2steps. Let k = i z andj = i z y. Then the number of steps is
+
+
+
+
c ( ( j - i)
+ (k -
+ (j-
i)2
k)2)
= c(z
+ +
+ y + z 2 + y*).
Using the elementary relationship (5
+ Y)/2 s ZY,
we get 5
+ y + + y2 5 + y2 + 2xy = (z + y)2 = (j- i)z. x2
x2
This completes the induction proof. Putting the partial results together, it follows from Theorem 2.1.2 that if w B: L ( G ) , then S 6 to,,. Clearly the algorithm terminates and announces error in this case, Otherwise the algorithm calls PARSE (0, n, 8).It follows from Claim 2 that the algorithm terminates and produces a left parse for
S
+
*
. . . a,
~ 1 %
=
w.It follows from Claim 3 that the algorithm takes
0
O ( n z ) steps.
Example. Continuing the previous example, we number the productions
1. S + S S 2. S - A A 3. S + b
A+AS 5. A + A A 6. A - a
4.
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
115
The parse of w = aabb is generated in the following way: Since S E t 0 . 4 , PARSE(0, 4, S) is called. Since ‘lr2 = S --t AA is in P , A E to.1, and A E t1.4, we get
(2, PARSE(0, 1, A ) , PARSE(1, 4, A ) ) Since x4 = A + A S is in P , A E
t1,2,
=
S -i SS is in P , S E
t2,31
(2, 6, PARSE(1, 4,A ) .
and S E h,4, we get
(2,6,4,PARSE(1,2,A),PARSE (2,4,S)) Since ?rl
=
=
and S E
(2,6,4,6,PARSE(2,4,S)) . t3,4,
we get
(2,6,4,6,1,PARSE(2,3,S),PARSE (3,4,S)) = (2,6,4,6,1,3,PARSE(3,4,5)) =
(2,6,4,6,1,3,3)
which corresponds to the tree
b‘
bI
By combining our recognition and parsing results, we get a time bound for general context-free parsing. Theorem 2.2.2. Every context-free language L can be parsed in time proportional to n3 and in space proportional to n2,where n is the length of the string to be parsed.
Proof. By Theorem 1.4, every context-free language L E 2* is generated by a grammar G = ( V , 2, P , S) in Chomsky normal form. For any a,,ifn=Othenw=AisinLifS-+AisinG.Ifn > 0 stringw = a then by Theorem 2.1.1 we can determine whether w E L by a computation which, by Theorem 2.1.2, requires at most O(n3)steps. If w E L, then by Theorem 2.2.1, we can generate a parse for w in another O(n2)steps. The space bound follows from Theorem 2.1.3. 0 2.3 A Turing Machine Implementation
Although we have shown how Algorithm 2.1.1 recognizes strings in time O(n3) on a random access computer, it does not follow that this can be
116
SUSAN L. QRAHAM AND MICHAEL A. HARRISON
done on a Turing machine within the same time bound.8 It might be the case that the Turing machine cannot organize its tapes sufficiently well to avoid many bookkeeping steps which will change the time bound. I n this section we show that in fact recognition can be done on a Turing machine in time 0 (n3). Our model will consist of a Turing machine with a read-only input tape and two work tapes. Theorem 2.3.1. Let G = ( V , 2 , P, 8)be a A-free context-free grammar in Chomsky normal form and let w = a1 . . . a,, ai E 2, 1 5 i I n. There is a Turing machine A which computes the recognition matrix T = ( t i , j ) in time at most cn3 for some constant c.
Proof. The initial configuration of A is shown below.
T where rJ and $ are endmarkers on the tapes and B is a “blank” symbol. The idea of the argument will be to have the upper triangular entries of t i , j laid out on both tapes 1 and 2. On tape 1 the entries of T will be written by rows, while on tape 2 the entries will be by columns. The computation proceeds by filling in the matrix by diagonals, filling entries in the same order as does Algorithm 2.1.1. The first phase of the computation is to initialize A to the configurations
0,l
Tape 1 (by rows)
l#lh,l\
...I
0,n 1,2
Y
t
1,n
It1,21.4
1 . . .I
n-2,n-1 tn--2.38-1
I
n-2,n n-1,n
I~
I $J
1 . n
Y
n-1
n
0,l 0,2 1,2 Tape 2 [j!(tO,ll It13 I. (by columns) 2 ’
t
*.I
0,n-1
I . . .]tn-2,n-11 n-1
0,n
n-1,n
I . . .”It,--l.* ,1 9b 1 n
* See Ah0 et al. (1974, chapter 1) for a general comparison of computation times for these two models.
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
117
Since Tape 1 and Tape 2 each have 0 (n’) entries, the initialization takes a t least O ( n z )steps. It is left to the reader to show that the initialization can be done in time 0 (n’). We next show how to compute any element ti,,, where i 1 < j 5 n. The configuration just prior to the computation is shown below.
+
The procedure to compute t i , jis as follows once we are in the position above. Move right on tapes 1 and 2 , computing t i , j in the finite state control until a blank is encountered on tape 1. This is square (i,j) on tape 1, so write the entry there. Move left on tape 2 to the first blank, which is (i,j),and make the entry there. This computation takes 0 ( j - i) steps. Next we must get into position for the ti+l,j+l (if we were not previously at the “end” of the diagonal). Move right on tape 1 to the first nonblank; since the rows are filled leftto-right, this is ti+l,d+2. Move right on tape 2. Since the columns are 1) filled bottom-to-top, the first blank we encounter is in square (0, j and the subsequent first nonblank is in position (i 2, j 1 ) . If we reach $ on the work tapes then we have concluded a diagonal. Thus, the previously computed entry was ti,,,. The next entry to be computed will be (0, n 1 - i) . To position the work tapes:
+
+
+
+
1. Move to f on both tapes 1 and 2. If no blanks were encountered along the way, then look at the last entry t i , j in the finite state control. This was to,%.If S € then halt and accept, else halt and reject. 2. If there were blanks along the way, then move the head on tape 1 t o the first square to the right of f. This is to,l. On tape 2,scan right and find the first blank, then go one square to the right of the blank. The blank is in position 0, n 1 - i. The square to the right is tl.,,+l--i. The tapes are then positioned correctly.
+
To estimate the number of steps, note that the computation of any element t i , j , i 1 < j _< n takes at most cn steps. (One can compute c from
+
118
SUSAN L. GRAHAM A N D MICHAEL A. HARRISON
studying the tape motion). Since there are 0 (nz)elements, at most c1n3 steps are required to compute all the elements. I n addition we must consider the tape repositioning. To compute the next element in a diagonal requires 0 (n)steps for repositioning. For each of the n diagonals, we must reposition the heads over the entire work tapes. Consequently repositioning requires 0 (+) steps. Thus, including the initialization, the entire computation takes
O(n3)
+ O(n3) + O(nz) = O(n3)
steps.
0
Example. Consider the grammar G used in the previous examples and shown below. Let w = aabb.
S-+SSIAAIb A +AS
I A A I a.
After initialization the configuration of the Turing machine is
We scan right on both tapes to compute 0,2. This is entered in the 0,2 square of tape 1, which is the current position of the tape head. On tape 2, we scan left to the first blank to fill in the entry. Then we go right on tape 1 to the first nonblank. On tape 2, we go right on through one or more nonblanks and then through blanks and stop a t tzVa. This computation is repeated. The following is the configuration just before k , 4 is computed:
0,l 0,2 0,3 0,4 1,2 1,3 1,4 2,3 2,4 3,4
[ # I A ISlAI
I
IA I A I
1SI T
1 s 1 $1
0,l 0,2 1,2 0,3 1,3 2,3 0,4 1,4 2,4 3,4 / # / A IS,AIA I IA I S I I 1 1s I$]
T
PARSING OF QENERAL CONTEXT-FREE LANGUAQES
119
We compute t 2 , 4 as before and make the entry. When we go right we reach the endmarker and must move left. This yields
t 0,l 0,2 0,3 0,4 1,2 1,3 1,4 2,3 2,4 3,4
[ # ] AIS,AI
I
Ix
IA l A I
t
IS IS
I$]
0,l 0,2 1,2 0,3 1,3 2,3 0,4 1,4 2,4 3,4
l # l A 1 X,AjA
1 1A
I 1 s 1s 1 9 6 1
[ S I
T
and so it goes . . . . As is true in the random access case, the recognition matrix can be computed on-line by a Turing machine, provided the bookkeeping is done slightly differently. We leave this construction to the reader. 2.4, linear Grammars-A
Special Case
Although, as we have shown, the Cocke-Kasami-Younger algorithm has an 0 (d)time bound, it is possible to obtain better bounds for some subclasses of context-free languages. In this section, we show that we can obtain an 0 (n2)bound if the grammar has an additional property known as linearity. Definition. A context-free grammar G = ( V , 2, P , S) is linear if each rule is of the form A-+uBv or A-tu,
where A , B E N ; u,v E Z*. A language is linear if it can be generated by a linear grammar. Example. L = {ZOW' 1 w E ( a , b } * ) is a linear language which is generated by the linear grammar shown below.
S--taSa I bXb I A
The language
L' = {aWcaibj 1 i, j 2 0 ) is not a linear context-free language. A discussion of this language occurs in Greibach (1966). It turns out that we can show that Algorithm 2.1.1 works very well on linear grammars. To do this, we introduce a normal form for linear grammars.
120
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Theorem 2.4.1. Every linear context-free language is generated by a grammar G = ( V , 2 , P , 8)whose rules are in linear normal form;namely, whose rules are only of the following forms:
S-tA
if
A E L(G)
A+a
a
E Z,A E N.
Proof. Assume, without loss of generality, that G is A-free. By using Theorem 1.3, we may eliminate all “chain productions” A + B. Assume, without loss of generality, that this has been done. Suppose the original grammar has a rule A -+ a1 . . . akBbi . . . br. Replace this rule by
A + alAl Ai
+ ai+lAi+l
A k +
for 1
5i
Br-ibr
Bi+l + Bjbifl
for 1
- 1 >j2
1
BI + Bbi, where AI, . . . , A k and Btdl, . . . , B1 are new variables. Note that suitable conventions allow this transformation to have the proper values when k < 2 or I < 2. It is also clear that after applying the transformation to all offending rules, we have a grammar of the desired form. Clearly this transformation does not change the language. 0 Now we are ready to prove that every linear context-free language can be parsed in time O(n2) by a form of the Cocke-Kasami-Younger algorithm. Algorithm 2.4.1 provides a variant of the recognition matrix construction of Algorithm 2.1.1, which handles linear grammars of the form introduced in Theorem 2.4.1. (The difference between the two algorithms is only in the way in which the t i i ) are ~ computed in the second loop.) Algorithm 2.4.1. Let G = ( V , 2, P, S) be a grammar in linear normal form (without the rule S + A) and let w = a m . . . an, n 2 1 be a string, where for 1 6 i 5 n,ad E 2. Form the strictly upper triangular (n 1) X
+
121
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
+
(n 1) recognition matrix T as follows, where each element ti,,. is a subset of v - z: loop 1: loop2:
f o r i := 0 t o n - 1 do := ( A I A -+ ai+l is in P }; f o r d := 2 t o n d o for i := 0 to n - d do i; beginj := d ti,j := ( A 1 A -+ ai+J3 is in P for some B E U ( A 1 A -+ Baj is in P for some B E end.
+
ti+l,j) ti,j-~}
Theorem 2.4.2. Let G = (V,2 , P,S) be a grammar in linear normal form (without the rule S -+ A ) . Let w = ala2 . . . a,, n 2 1 be a string, where for 1 I k 5 n, ak E Z and let T = ( t i , j ) be the recognition matrix
constructed by Algorithm 2.4.1. Then A E
ti.j
if and only if A
+
+ ai+l . . . aj.
Proof. The argument parallels the proof of Theorem 2.1.2, except that in analyzing a derivation, the first step is either A + ai+lB or A -+ Baj. We do not present the details here. 0 In order to obtain our time bounds, we simply analyze Algorithm 2.4.1. Theorem 2.4.3. Every linear context-free language L can be parsed in time proportional to n2, where n is the length of the string to be parsed. Additionally every such language can be recognized using only a linear amount of space.
Proof. By Theorem 2.4.1, there is no loss of generality in assuming that L is generated by a grammar in linear normal form. Using Algorithm 2.4.1, an input string of length n is in the language if S E to,,, (or if n = 0 and S -+ A is in the grammar). Following the timing analysis of Theorem 2.1.2, loop 1 requires c1n steps. Since the setting of t i , j in loop 2 requires inspection of only two matrix entries, it takes at most a constant cz number of steps. Therefore the algorithm requires n
s n + C z ~ ( n - d + + )= s n +
- n)
= O(n2)steps
d--2
to construct the recognition matrix. Using Algorithm 2.2.1, it follows from Theorem 2.2.1 that a parse can be produced in 0 (n2)steps. Since in loop 2 each t i , j is computed using only entries from the previous diagonal, all that must be remembered for recognition are the previous diagonal (or the input) and the current diagonal. Since the diagonals have at most n elements, this is a linear space requirement. (However, it does not suffice if a parse is to be generated.) 0
122
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Just as we modified Algorithm 2.1.1 to obtain a simpler construction of the recognition matrix in the case of linear normal form grammars, so too we can modify Algorithm 2.1.2 to obtain an on-line O(n2)algorithm for linear normal form grammars which needs only linear space for recognition. In fact these time bounds can be achieved with an on-line Turing machine. Also, we can modify Algorithm 2.2.1to obtain a parse from the recognition matrix for a linear normal form grammar in linear time. We leave the details of these modifications to the reader.
3. Earley's Algorithm Introduction
In this section we develop three versions of a very useful parsing algorithm due to Earley (1968, 1970). One of the nicest features of this algorithm is that it works on any context-free grammar. I n other words, A rules are acceptable and no grammatical normal forms are necessary. The time and space bounds are 0 (n3)and 0 (n') , respectively, and the constants are small enough for the method to be practically useful. 3.1 The Recognition Algorithm
As in the Cocke-Kasami-Younger algorithm, we form an upper triangular recognition matrix.9 However, unlike the method of Section 2, the present algorithm introduccs entries on the main diagonal of the matrix. Furthermore, the matrix entries now carry more information than in the previous method. To describe the entries, we need the following notion. Definition. Let G = ( V , Z, P, S) be a context-free grammar. Let metasymbol not in V . Then for any A -+a/3 in P , where a, p E V*,A is a dotted rule.
- be a
--$
a*@
Entries in the Earlcy recognition matrix are sets of dotted rules, rather than the sets of nonterminals of the previous method. Notice that the number of distinct dotted rules is determined by the grammar. As before, we can determine membership in the language by inspecting one element of the matrix, and we can use the matrix to generate a parse. We next present 0 One usually sees Earley's algorithm described in the literature in terms of lists of items, where an item is a dotted rule together with a nonnegative integer. The item (A --t a*@,i) is on the j t h list if and only if the dotted rule A + a-19is contained in the i, j t h entry of the recognition matrix according to our formulation of the algorithm.
123
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
the first matrix construction algorithm, followed by some examples and an explanation. Algorithm 3.1.1. Let G = (V, 2, P , S ) be a context-free grammar and let w = alaz . . . a,, n 2 0 be a string, where for 1 5 k 2 n, a k C 2. Define the function PRED mapping nonterminals to set's of dotted ruIes by
PRED(A) = ( B 4 ( r e p I B -+ ap is in P , for some y E TI*, A
By,
and a & A ) The function is extended to a function PREDICT mapping sets of nonterminals to sets of dotted rules by PREDICT(X)
=
U PRED(A). A EX
The program variable SEEN has as values subsets of V - 2. The program variable TEMP takes sets of dotted rules as values. Form the upper triangular (n 1) x (n 1) recognition matrix T as follows, where each element t i , j is a set of dotted rules:
+
+
PREDICT ( {S ]) ; for j := 1 t o n do begin to,o =
SEEN := 0; for i := j - 1 downto 0 do begin
scanner : completer:
predictor :
:= ( B 3 aup.7 I B 4cu-upy is in f ; , j - I , u = a j , andp**Af; TEMP := { B3 aAp. y 1 there exists Ic, i 5 k 5 j - 1 such that B 3 a.A/3y is in ti.k, A -+ 0 - is in t k , j l and p J* A ] ; while (TEMP U t i , j ) # t ; , j do begin t i , j := t i , j U TEMP; TEMP := ( B 3 aAp.-y 1 B 4a - A p y is in ti+, A -+ 0 . is in t i , j , and p =+* A ) end; SEEN:=SEENu{A E V - ZIB+a*Apisinti,i) end; t j ,j := PREDICT (SEEN) end ti,j
124
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Example.
Consider the trivial grammar
S+AB
B
+ CDC
D-+d Let us see how the algorithm works for the only input in the language, w = d. The first statement yields 20,o =
( S 4 .AB, A
3
0 ,
S
-+
A-B,B
-+
*CDC,
C + *, B
-+
C - D C , D -+ *d)'O
The outer for loop is executed once, for j = 1. After setting SEEN to 0, the inner for loop is executed once, with i = 0. The scanner sets to,l t o { D -+ d. ) . Next the completer sets TEMP t o { B -+ C D - C , B -+ CDC. } , since C generates A. Since TEMP contains dotted rules which are not in &,I, the body of the while statement is executed. The value of to.1 becomes ( D - + d . , B-+CD-C, B-CDC.} and thc value of TEMP becomes ( B 3 C D . C,B 3 CDC. , S -+ A B - ) . Again, TEMP contains dotted rules which are not in to,l, so the body of the while statement is executed again, yielding t 0 , l = (D+ d., B 3 C D * C ,B 3 CDC., S - - + A B * ) TEMP = ( B 3 CD*C, B -+ CDC., S -+ A B . ) . Since TEMP is now contained in to.l, the while loop terminates. SEEN is then set to ( C ) and the predictor sets t l , l to { C -+ ). Since S -+ A B . is in tO.l,we shall see that d E L ( G ).
-
Example.
Let G be the grammar
E
-+
E*E I E
+
+E I a
and let w = a a * a. Notice that the string w has two parses. We present the recognition matrix and ask the reader to verify the computation. 10
Notice that we let
- concatenate with
A.
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
E - + * E * E E+E**E E+E. +E E+.E+E E+aE4.a
E+E
+E
E+E + E E+E*rE E+E. +E
*
125
E+E*.E
E+E+EE+E-*E E+E- +E E+E*E-
E+E**E
E+E*E* E+E*+E EdE.f E
E+-E*E E+-E+E E+.a
E+E-*E E+E- + E E4a.
I
T-
Since E 4 E w E L(G).
+ E . is in
(and E
-+ E* E .
is in to,5),we conclude that
Just as in the on-line version of the Cocke-Kasami-Younger algorithm, Algorithm 3.1.1 fills in the Earley recognition matrix column by column, and within a column the elements are computed “bottom-to-top.” For each element, the scanner enters information dependent on the next input symbol. Subsequent additions (by the completer) to the (i,j )t h element of the matrix depend, as in the Cocke-Kasami-Younger algorithm, on a simultaneous scan across the ith row and down the j t h column, always to the left, and at or below the entry in question. Since additions to the ( i , j ) t h element depend in part on its previous value, a kind of closure operation is carried out on each element (by the while statement). After the nondiagonal elements of the column have been computed by the scanner and the completer, the diagonal element is set by the predictor. The difference in the two algorithms stems from the following observations. As we saw from Theorem 2.1.1, a variable A entered in matrix ele-
126
SUSAN L. GRAHAM AND MICHAEL A, HARRISON -t.
ment ti,j of the Cocke-Kasami-Younger matrix indicates that A 3 ai+l . . , aj. Thus the entry rccords only the first and last sentential forms of a derivation. However, as we will see in Theorem 3.2.1, an entry A -+cu.8 in matrix element t i , j of the Earley matrix indicates that A + cub & . . . a,& as well as recording other information (namely, that for some
.
E V*, S 2 a l h . . a i A y ) . Thus, in some sense, Earley’s algorithm records information about (leftmost) derivations at each step, rather than only at the end. The scanner serves to update those dotted rules (or partial subtrces) whose “expected” next element is the next input symbol. The predictor indicates which rules might possibly generate the next portion of the input. Its entries are always in the diagonal clement. The role of the completer is to update those dotted rules whose “expected” next element is a nonterminal which generates a suffix of the input read thus far. The program variable SEEN records those nonterminals immediately preceded by a dot in some dotted rule entered in the column. In Earley’s original formulation of the algorithm, the predictor adds only dotted rules having no symbol to the left of the dot and the completer moves the dot across only one symbol of a rule. Consequently if the grammar contains null rules, a number of completer and predictor steps may be necessary to make those entries in the matrix which record derivations of A. We have instead incorporated derivations of A in the scanner, predictor, and completer directly, by defining these steps so as to move the dot across sequences of symbols which generate A. This modification both simplifies and speeds up the algorithm.
y
3.2 Correctness of the Algorithm
Since the matrix elements are of bounded size, it is easily shown that Algorithm 3.1.1 terminates. The next theorem gives the correspondence between entries of the recognition matrix and derivations, thereby establishing that the matrix has the desired recognition and parsing properties. By contrasting this theorem with Theorem 2.1.1, the reader can see the relationship of the Cocke-Kasami-Younger algorithm to this one. Theorem 3.2.1. Let G = ( V , 2, P,S ) be a context-free grammar. Let ala2 , a,, n 2 0 be a string, where for 1 5 k 5 n, ak E Z, and let T = ( t i , j ) be the recognition matrix constructed by Algorithm 3.1.1. Then 20 =
A
-+
..
a - p is
in
ti,j
in V* such that S
if and only if a yA6 and y
. . . a j and there exist y and 6
ui+lai+2
alaz
. . . ai.
127
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
We can better understand the statement of the theorem by considering the intuitive picture in Fig. 8, where the derivation is represented in the form of the corresponding syntax tree. Intuitively, if A --+ a - 8is in t i , j , then we are working on a potentially valid parse in that we know that there is a sentential form yA6, where
& a l h . . . ai. Furthermore, the dotted rule A a! 3 ai+l . . . aj. We know nothing yet about 8 or 6. y
-+ a.8
indicates that
Thus dotted rules introduced by the predictor represent rules that could be used in the derivation since their left-hand sides occur in valid sentential forms. The completer causes the effect of a completed subtree t o propagate up the tree. We do not give a proof of Theorem 3.2.1 here, but we refer the reader to the proof of Theorem 4.9 in Aho and Ullman (1972-1973). The proof in Aho and Ullman (1972-1973) is incomplete in the “only if” direction, but the details can be supplied. In the “if” direction the argument is quite elegant. The algorithm presented here is somewhat different than the one in Aho and Ullman (1972-1973) and Earley (1968, 1970), but the reader can observe that rules 1, 2, 3 of Aho and Ullman (1972-1973) correspond to PREDICT ( { S ]) , rule 4 is contained in our scanner, and rule 6 is contained in our predictor. Rule 5 is contained in our completer if i < j and can be in the completer, predictor, or scanner if i = j . One can derive important corollaries from Theorem 3.2.1. Corollary.
only if S
Let G, w,T be as in Theorem 3.2.1. Then w f L ( G ) if and is in to,,, for some a! E V*.
3 a!.
Proof. By Theorem 3.2.1, S al . . . a, = w. Corollary.
.--)
a.
is in to., if and only if S
*
Earley’s algorithm is an on-line recognition algorithm. S
FIG.8. A pictorial representation of Theorem 3.2.1.
a!
&
128
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
3.3 The Time and Space Bounds
We next analyze Earley's algorithm to estimate the time and space required, as a function of the length of the input. Our analysis is much like that for the Cocke-Kasami-Younger algorithm. Of course our estimates consider only orders of magnitude. For practical applications of Earley 's algorithm, the implementer would be concerned about the constants as well, and might therefore consider other realizations of the algorithm (see, for example, Townley, 1972). Suppose we have a grammar G = ( V , 2 , P , 8).Let I P I = p and for 1 5 i 5 p, if ri = A i+ pi, define Zi = lg (pi). Then the number of different dotted rules is
+
since for every rule there are l i 1 positions for the dot. C is a constant depending only on the grammar (not on the length of the input). Consequently the number of steps to scan a matrix element is constant with respect to the length of the input. It follows that the number of steps required to update SEEN is also independent of the length of the input. Observe also that since the evaluation of PREDICT for any subset of the nonterminals depends only on the grammar, the number of steps to evaluate PREDICT ( { S 1) or PREDICT (SEEN) is also independent of the length of the input. Theorem 3.3.1. Algorithm 3.1.1 requires O(n3) steps to compute the recognition matrix.
Proof. We first analyze the number of steps to execute the body of the inner for loop for fixed values of i and j . As we saw in the proof of Theorem 1.1, it is possible to compute the set of nonterminal symbols of a grammar which generate A. By appealing to Lemma 1.1 we see that for any sequence of symbols j3,B -* A if (and only if) each symbol in the sequence generates A. Consequently the scanner requires at worst c1 steps for some constant ci depending only on the grammar. The first setting of TEMP in the completer requires c 2 ( j - i) steps, since k can take on j - i different values. The while loop can execute at most C times and takes a t most a constant cs number of steps. The setting or updating of SEEN requires at worst some constant c4 steps. Thus the inner for loop requires at most c1 ~ (- i) j c3C c4 steps. The setting of to,orequires at most c6 steps and
+
+
+
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
129
the predictor requires at worst c6 steps, where c6 and c6 are constants. The inner for loop is executed j times; the outer for loop is executed n times. Combining all these results, the algorithm requires at most
= c7
+ can + cen2 +
for constants c7, ca, c9, and
1-1
i-0
CIO.
c10
cC n
i-1
i-1
i-0
( j - i!
steps
But
i-1
- (n
k-1
+ 2) (n + 1)n
=
o(n3)*
6 Consequently the algorithm requires at most 0 (n3) steps.
0
Since each element in the upper triangular matrix is of bounded size, as is the auxiliary program variable SEEN, the next theorem follows easily. Theorem 3.3.2. The space required by Earley’s algorithm to process a string of length n is of the order n2.
The reader may enjoy trying the algorithm on the grammar S + SS I a with the input w = an.Time proportional to n3 is actually needed for this example. We leave the determination of space requirements to the reader. It is possible to improve Algorithm 3.1.1 by taking a more careful look at the completer. Notice that in the first example following Algorithm 3.1.1, the first computation by the completer adds B -+ CDC. to to,^. It is this dotted rule which causes the first execution of the body of the while statement to add an additional new dotted rule (namely, S -+ A B - ) to t o , l . The reason is that since C =+A, “completing” D causes “completion” of B. By a modification of the techniques used in the proofs of Theorems 1.1-1.3, one can compute the relation A =$B for all nonterminals A , B in any context-free grammar. Using the results of this computation, the completer can be modified to incorporate chains of completions such as that of
130
S U S A N L. GRAHAM AND MICHAEL A. HARRISON
the example. We make that modification in Algorithm 3.3.1. Algorithm 3.3.1 produces the same matrix as Algorithm 3.1.1. However, we do not present the correctncss proof here. By making the appropriate modifications to the proof of Theorem 3.3.1, the reader can observe that Algorithm 3.3.1 also requires O(n3)steps in the worst case. Clearly the space required is again 0 (~2).By comparing Algorithm 3.3.1 with Algorithm 2.1.2 the reader can get a good understanding of the similarities and differences between the Cocke-Kasami-Younger recognition method and the method due to Earley. It is a straightforward matter to simplify Algorithm 3.3.1 in the case that the grammar G contains neither null rules nor chain rules. Algorithm 3.3.1. Let G , w, T, SEEN, and PREDICT be as in Algorithm 3.3.1. Form the recognition matrix T as follows:
:= P R E D I C T ( ( S ) ) ; for j := 1 to n. do
t0.o
scanner :
begin SEEN := 0; for i := j - 1 downto 0 do begin t i , j := { B + aap-y I B + a-apy is in t+.l, andp;
completer:
a
= aj,
A};
( B -+ a A p * y [ there exists k, i + 1 5 k 5 j - 1 such that B + a - A p y is in
t i , j := t i , j U
ti,k,
A
+u
s
is in t k , j ,and p
1
&. A } ;
t i , j := ti,j u ( B + a d p a y B -+ a . A p y is in ti,ir
p
predictor:
A, and for some C in V - Z such that
A 5 C , C --, U S is in ti,j) SEEN := SEEN U { A E V in t i , j ) end; ti,j := PREDICT (SEEN) end
- Z I B + a.A@is
Theorem 3.3.1 analyzes the time complexity of Earley’s algorithm in the most general case. For certain special cases, the algorithm requires less time. One such special case is an unambiguous grammar. Notice that the only part of Algorithm 3.1.1 which requires time proportional to jzper column, or time proportional to n3 overall is the completer. Since the number of dotted rules in the column is proportional to j, the time bound must
131
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
stem either from unnecessary computation or from adding particular dotted ruIes to particular matrix elements in more than one way. We show in Lemma 3.3.1 that for unambiguous grammars, any matrix entry is generated only in one way by the algorithm. We then show that b y modifying the algorithm, we can form the recognition matrix for a n unambiguous grammar in 0 (nz) steps. For convenience, we use the following definition: in carrying out Algorithm 3.1.1 for some grammar and input, let 0 5 i L k 5 j 5 n, where the use of i,j, k is as in the algorithm, and for some rules B -+ a A p y and A 3 u, B -+ a-Apy is in t i , k and A + u . is in t k , j . Then we say that the addition of B -+ a A p . y to ti,j by the completer depends o n the pair (A, k) . Lemma 3.3.1. Let G = ( V , 2, P , S ) be a reduced unambiguous context-free grammar. Let w = a l e . . . a,, n 2 0, be a string, where for 1 5 k 5 n, a k E z. I n executing Algorithm 3.1.1, no dotted rule added to any matrix element ti,i by the completer depends on more than one pair ( A , k ) foranyA E V - ZandO 5 k5 j.
Proof. Let B -+cw.y be a dotted rule and let ti,j be a matrix element. The completer adds dotted rules to t i , j by inspecting entries t i , k and t k , j for values of k between i and j - 1. Since the while loop is executed a constant-bounded number of times, each pair of entries is inspected a, number of times independent of n. Suppose the addition of B + ar.y depends on (A1, k) and also on (A2, Z), where AI, A2 E V - I: and either A1 # AZ or k # 1. Then for
al,
LYZ,
PI, pz, ul, uz E V* such that
01
& A,
8 2 A, and a = ar1Alpl = a2A2p2,t k , j must contain a dotted rule A1 + u1. and t2.i must contain a dotted rule A2 -+ uz.. Furthermore, t i , k must contain B -+ arl.A&y and t;,l must contain B -+ a~-A~/32y. By Theorem 3.2.1, there exist el and ez such that for some w E Z* S & &BBz2 alaz , . . aiB&
* alaz . . . aialAlplyOz
*
=+a l e . .
. a,w
132
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
and also S
elB&& alaz . . . aiBOz
* =+a1a2 . . . aia;+l . . . alal+l . . . aj&7&
* * alaz . . . ajw, But we have just exhibited two distinct derivations of the same string, contradicting the unambiguity of G. We can conclude that for unambiguous grammars, the addition of each dotted rule to each t ; , j depends on only one pair ( A , k). 0 lemma 3.3.2. Let G = ( V , Z, P , 8) be a reduced unambiguous context-free grammar. Let w = alaz . . . a,, n 0, be a string, where for 1 k 5 n, ak E 2. I n executing Algorithm 3.1.1, no dotted rule is added to any matrix element t i , j in more than one way."
>
<
Proof. Let B 3 cr.8 be any dotted rule and let t i , j be any matrix entry to which it is added. If i = j, then the dotted rule is added to t ; , j only by the predictor, which is executed once per column. Suppose i # j. Then the dotted rule can be added to t i , , only by the scanner or by the completer. We claim that no dotted rule is added to ti.j by both the scanner and the completer. Suppose to the contrary that some dotted rule B -+ a.7 is added to t i , j by both the scanner and the completer. Then the rule must have the form a = cu'~glA@~
where B 3 cr'.a@lA@2r is in ti.i-1 and for some k, i < k < j , B -+ a'aBr.A&r is in t i , k and A + ( r e is in t k , j . In a fashion analogous to the proof of Lemma 3.3.1 one shows that this is possible only if G is ambiguous. Therefore we conclude that the scanner and the completer are disjoint in their effects. Since the scanner is executed once for each t i , j , the lemma is immediate in that case. For the completer, the lemma follows from Lemma 3.3.1. 0 11 Lemma 4.6 of Ah0 and TJllman (1972-1973) states a simiiar result, restricted to the case that u # A. However, the proof given there haa minor errors.
133
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
Notice that in the proof of Lemma 3.3.1 and 3.3.2 we need not consider ambiguities which cause a matrix element t k , j to have two entries A 6and A -+ u where 6 # u. Consequently the lemma is true as well for ambiguous grammars in which all ambiguities are of this kind (that is, grammars which are structurally unambiguous). In Algorithm 3.1.1, the addition of a dotted rule t,o t i , j by the complcter can come from a simultaneous search across the ith row and down the j t h column. It is this process which results in O ( j z ) steps per column of the matrix. Since, in the case of an unambiguous grammar, a t most C of those pairs of entries can yield an entry in t i , j , one can obtain a faster algorit,hm by inspecting only those pairs of entries which add dotted rules to t i , j . In Algorithm 3.3.2 we present a modified forx of Algorithm 3.1.1 which achieves this result. The modifications are primarily to the completer. However, wc must also introduce additional bookkeeping which collects those entries in a column having dotted rules with the same nonterminal immediately following the dot and which keeps track of those entries to the current column that in-.-olve rules with dots at the end. Also, we execute the scanner for each element of the column before we do any of the completer steps. --f
a ,
Algorithm 3.3.2. Let G, w,T, SEEN, TEMP, and PREDICT be as in Algorithm 3.1.1. Let PENDING and DOTBEFORE be additional program variables, where PENDING is a set of ordered pairs ( A , i) where A E V - Z and 0 S i 5 n. For 0 5 j L. n and A in V - 2 , DOTBEFORE/ is a set of row indices, each between 0 and j - 1. Form the recognition matrix T as follows:
scanner:
to,o := PREDICT((X}); for each A in V - 2 do DOTBEFORE# := ( i I B --f a.AP is in t i , o }; f o r j := 1 t o n do begin PENDING := 0; for i := j - 1 downto 0 do begin ti,i := { B --+ aup.7 I B + cu.upy is in ti,j-l, a
=
andP&A}; for each C in V - Z do if C 3 U - is in t i , j then PENDING := PENDING U { (C, i ) ) end;
ai,
134
completer:
predictor:
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
while PENDING # 0 do begin remove any ( A , Ic) from PENDING; for each i in DOTBEFOREkA do begin TEMP := { B -+ a A @ . y I B + a . A @ yis in t i , k and /3 A); €or each C in V - do if C 3 u - is in TEMP and there is no a E V* such that C -+ a - is in t i , j then PENDING := PENDING U { (C, i) } ; t i , j := t i , j U TEMP end end; SEEN := { A E V - 2 1 there exists i, 0 5 i < j such that B --f a . A @ is in t i , j ] ; t j , j := PREDICT (SEEN) €or each A in V - I: do DOTBEFOREjA := { i [ 0 5 i 5 j and B 3 a *A@ is in t i , j } end.
The scanner and predictor portions of Algorithm 3.3.2 are the same as those parts of Algorithm 3.1.1, except for some additional bookkeeping and the fact that the scanner is executed for the entire column before any completer steps are carried out. The way the completer works is as follows. Each addition by the completer to a matrix element t_{i,j} arises because for some k, i ≤ k < j, t_{i,k} contains a dotted rule with a dot directly preceding a nonterminal, call it A (it is only the A which is relevant for subsequent processing), and t_{k,j} contains a dotted rule of the form A → υ·. Since the matrix is generated column by column, the entries in previous columns remain fixed. Consequently, by maintaining in PENDING a record of entries of the form A → υ· made in the current column, it is possible to make subsequent entries based on that one without searching the current column. When we later "process" an entry A → υ· made in element t_{k,j}, we can avoid searching the kth column for entries with a dot before an A by inspecting the set DOTBEFORE_k^A of those elements of column k containing dotted rules with a dot before A. (In the original version of the algorithm given by Earley, this strategy can fail when we are processing an entry A → υ· made in element t_{j,j}, that is, when the column being searched for entries with a dot immediately preceding A is the column currently being computed. However, by generalizing the predictor, we have circumvented this difficulty.) We leave it to the reader to verify that for a given grammar and input, Algorithms 3.1.1 and 3.3.2 yield the same matrix. In the following theorem we examine the time bound for our improved algorithm.

Theorem 3.3.3. Algorithm 3.3.2 requires at most O(n³) steps to compute the recognition matrix. If G is (structurally) unambiguous, then the algorithm requires at most O(n²) steps, where n is the length of the string to be processed.
Proof. It is easily shown that the setting of t_{0,0}, the computation of DOTBEFORE_0^A for all A in V − Σ, and "bookkeeping" statements of the form "for each C in V − Σ do . . ." each require at most a constant amount of time. The setting of SEEN in the predictor and the computation of the sets DOTBEFORE_j^A require time proportional to the number of matrix elements. Consequently the time required for all of the algorithm except for the completer is proportional to the number of elements in the matrix (O(n²)). It remains to analyze the time required by the completer.

Suppose the value of j is fixed. The scanner adds entries to PENDING at most once for each i and C, 0 ≤ i ≤ j − 1. The completer adds pairs (C, i) to PENDING only the first time that a "completion" C → υ· is added to t_{i,j}. Consequently no pair (C, i) is placed in PENDING more than once. Since the number of nonterminals is fixed and since i ranges between 0 and j − 1, the body of the while loop for the completer is executed at most |V − Σ|·j times. For fixed A and k, DOTBEFORE_k^A contains at most k + 1 values. Since k ≤ j − 1, the body of the for loop which cycles through the values in DOTBEFORE_k^A is executed at most j times. The body of that loop requires at most a constant amount of time. Consequently the completer requires at most O(j²) steps. Hence Algorithm 3.3.2 requires at most O(n³) steps.

Suppose the grammar is (structurally) unambiguous. Let j be fixed. The reader can easily verify the following three claims:

Claim 1. (A, k) is in PENDING only if for some υ, A → υ· is in t_{k,j} and k < j.

Claim 2. (A, k) is in PENDING only if DOTBEFORE_k^A ≠ ∅.

Claim 3. For fixed A and k, each t_{i,k} with i in DOTBEFORE_k^A contains a dotted rule with a dot immediately preceding A.

Claim 4. For each (A, k) removed from PENDING and each i in DOTBEFORE_k^A, a dotted rule is added to t_{i,j} which was not already there.
Proof. By Claim 1, A → υ· is in t_{k,j}. By Claims 2 and 3, B → α·Aβγ is in t_{i,k}. Consequently B → αAβ·γ is placed in TEMP and subsequently added to t_{i,j}. By Lemma 3.3.1, B → αAβ·γ depends on at most one (A, k). Since each (A, k) is "processed" only once, and since by Lemma 3.3.2 the actions of the scanner and the completer are disjoint, B → αAβ·γ could not be added to t_{i,j} more than once.
It follows from Claim 4 that for each (A, k) in PENDING and for each value taken on by i for that (A, k), a "new" dotted rule is added to the jth matrix column. Each matrix element has at most C entries. Consequently at most C(j + 1) entries are made in the jth column. Summing up, the number of different triples of values taken on by A, k, and i is proportional to the number of entries in the jth column. That is, the completer requires O(j) steps. It follows that Algorithm 3.3.2 requires at most O(n²) steps for a (structurally) unambiguous grammar. □
The reader should observe that in Algorithm 3.3.2, the values of PENDING and DOTBEFORE and the number of elements in DOTBEFORE can be as large as n, the length of the input. In the previous algorithms, the values of the recognition matrix and the auxiliary program variables depended only on the size of the grammar. It is not known whether O(n²) can be achieved if all values must be independent of n. The class of linear grammars constitutes another special case in which the time bound can be reduced. The reader can verify that if G is a linear grammar, a dotted rule B → αA·β in t_{i,j} potentially depends on at most one pair (A, k), where k = i + lg(α). Additionally (if i < j), B → αA·β can be added to t_{i,j} at most once by the scanner, in the case that B → α'·aAβ is in t_{i,j−1}, a = a_j, and A ⇒* Λ, where α = α'a. Consequently Algorithm 3.3.2 requires O(n²) steps. The reader can also verify that there exists a (right) linear grammar which requires time and space proportional to n² under Earley's algorithm. For any class of grammars, if it can be shown that the number of nonnull elements per column is independent of the column index (that is, the value of j), then by maintaining sets of indices for terminals preceded by dots in the same way that Algorithm 3.3.2 maintains such sets for nonterminals, a linear bound can be achieved.
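To make the bookkeeping concrete, the following Python sketch (our own illustration, not the algorithm of this section verbatim) recognizes a string using ordinary Earley item sets; the per-column dictionary named wait plays the role of DOTBEFORE, so the completer looks up the items waiting on a completed nonterminal instead of rescanning a column, and completions are processed as they are discovered, as PENDING does. Lambda-rules and the PREDICT precomputation are deliberately omitted, and the toy grammar is an expression grammar of the kind used in the examples of this section.

# Earley-style recognizer illustrating the DOTBEFORE/PENDING idea.
# Items are (lhs, rhs, dot, origin).  No Lambda-rules are handled here.

GRAMMAR = {
    "E": [("E", "*", "E"), ("E", "+", "E"), ("a",)],
}
START = "E"

def recognize(word):
    n = len(word)
    columns = [set() for _ in range(n + 1)]   # one item set per input position
    wait = [dict() for _ in range(n + 1)]     # wait[j][A]: items in column j with the dot before A

    def add(j, item):
        if item in columns[j]:
            return
        columns[j].add(item)
        lhs, rhs, dot, origin = item
        if dot < len(rhs) and rhs[dot] in GRAMMAR:          # predictor + DOTBEFORE bookkeeping
            wait[j].setdefault(rhs[dot], []).append(item)
            for alt in GRAMMAR[rhs[dot]]:
                add(j, (rhs[dot], alt, 0, j))
        elif dot == len(rhs):                               # completer: direct lookup, no column scan
            for (l2, r2, d2, o2) in wait[origin].get(lhs, []):
                add(j, (l2, r2, d2 + 1, o2))

    for alt in GRAMMAR[START]:
        add(0, (START, alt, 0, 0))
    for j in range(1, n + 1):                               # scanner, one whole column at a time
        for (lhs, rhs, dot, origin) in list(columns[j - 1]):
            if dot < len(rhs) and rhs[dot] == word[j - 1]:
                add(j, (lhs, rhs, dot + 1, origin))
    return any(lhs == START and origin == 0 and dot == len(rhs)
               for (lhs, rhs, dot, origin) in columns[n])

print(recognize("a+a*a"))    # True
print(recognize("a+*a"))     # False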
3.4 The Parsing Algorithm
The algorithm that follows provides a means of obtaining a rightmost parse for a given string from the Earley recognition matrix. The algorithm is similar to that used for the Cocke-Kasami-Younger recognition matrix.
Algorithm 3.4.1. Let G = (V, Σ, P, S) be a cycle-free context-free grammar with the productions designated π₁, π₂, …, π_p. Let w = a₁a₂ … a_n, n ≥ 0, be a string, where for 1 ≤ i ≤ n, a_i ∈ Σ, and let T = (t_{i,j}) be the recognition matrix for w constructed by Algorithm 3.1.1. Define the recursive procedure PARSE(i, j, π_m), which generates a right parse for A ⇒ α ⇒* a_{i+1} … a_j, where π_m = A → α, as follows.
procedure PARSE(i, j, π_m = A → A₁A₂ … A_{p_m});
begin
  output(m); j₀ := j;
  for l := p_m downto 1 do
    if A_l ∈ Σ then j₀ := j₀ − 1
    else if k is the greatest integer, i ≤ k ≤ j₀, such that for some α ∈ V*, A_l → α· is in t_{k,j₀} and A → A₁A₂ … A_{l−1}·A_l … A_{p_m} is in t_{i,k}
      then begin PARSE(k, j₀, A_l → α); j₀ := k end
end;

Output a right parse for w as follows:

main program: if for some α ∈ V*, S → α· is in t_{0,n} then PARSE(0, n, S → α) else output("ERROR").

Example. Consider the second example of Section 3.1, with grammar
1. E → E*E    2. E → E+E    3. E → a

and string w = a+a*a. Since E → E+E· is in t_{0,5}, PARSE(0, 5, π₂ = E → E+E) is called, yielding (2, PARSE(2, 5, π₁ = E → E*E), PARSE(0, 1, π₃ = E → a)). PARSE(2, 5, π₁ = E → E*E) generates (1, PARSE(4, 5, π₃ = E → a), PARSE(2, 3, π₃ = E → a)), which yields (1, 3, 3). PARSE(0, 1, π₃ = E → a) yields 3. Thus the output of the algorithm is (2, 1, 3, 3, 3). If the grammar contains cycles, then Algorithm 3.4.1 may not terminate. However, if we modify the algorithm so that in the case that some t_{k,j₀} contains two or more dotted rules A_l → α· and A_l → γ· we choose the dotted rule that was entered first by Algorithm 3.1.1, it can be shown that Algorithm 3.4.1 always terminates. We leave these details to the reader.
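As a quick sanity check on the example, the following Python fragment (our own illustration; the dictionaries are a hypothetical encoding of productions 1-3) replays the output sequence (2, 1, 3, 3, 3) as a rightmost derivation, applying each production to the rightmost nonterminal, and confirms that it derives a+a*a.

# Replay a parse emitted by PARSE as a rightmost derivation.

PRODUCTIONS = {
    1: ("E", ["E", "*", "E"]),
    2: ("E", ["E", "+", "E"]),
    3: ("E", ["a"]),
}
NONTERMINALS = {"E"}

def replay_right_parse(parse, start="E"):
    form = [start]
    for m in parse:
        lhs, rhs = PRODUCTIONS[m]
        # locate the rightmost nonterminal; it must match the production's left-hand side
        pos = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        assert form[pos] == lhs, "production does not apply"
        form[pos:pos + 1] = rhs
        print(" ".join(form))
    return "".join(form)

assert replay_right_parse([2, 1, 3, 3, 3]) == "a+a*a"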
Theorem 3.4.1. Let G = (V, Σ, P, S) be a cycle-free context-free grammar with productions designated π₁, π₂, …, π_p. If Algorithm 3.4.1 is executed with the string w = a₁a₂ … a_n, n ≥ 0, where, for 1 ≤ i ≤ n, a_i ∈ Σ, then the algorithm terminates. If w ∈ L(G) then the algorithm produces a right parse; otherwise it announces error. The algorithm takes time O(n²).
Proof. The proof follows that of Theorem 2.2.1. We outline the proof, giving only the details that differ significantly.

Claim 1. Whenever PARSE(i, j, π_m = A → α) is called, then j ≥ i and A → α· is in t_{i,j}.
Claim 2. In the sequence of calls on PARSE during the execution of Algorithm 3.4.1, no call PARSE(i, j, π_m = A → A₁A₂ … A_{p_m}) generates another call PARSE(i, j, π_m = A → A₁A₂ … A_{p_m}).
Proof. Given a call PARSE(i, j, π_m = A → A₁A₂ … A_{p_m}), each call PARSE(k, j₀, A_l → α) directly generated has the property that i ≤ k ≤ j₀ and j₀ ≤ j. It follows inductively that this property (namely, that successive values of j are nonincreasing and, for fixed values of j, successive values of i are nondecreasing) is true of the sequence of calls generated by PARSE(i, j, π_m = A → A₁A₂ … A_{p_m}). Suppose the call PARSE(i, j, π_m = A → A₁A₂ … A_{p_m}) directly generates the call PARSE(i, j, π_n = B → B₁B₂ … B_{p_n}). It follows that for some l, 1 ≤ l ≤ p_m, A_l = B and that A → A₁A₂ … A_{l−1}·A_l … A_{p_m} is in t_{i,i}. It follows from Claim 1 and Theorem 3.2.1 that

A ⇒ A₁A₂ … A_{l−1}A_l … A_{p_m} ⇒* A₁A₂ … A_{l−1} a_{i+1} … a_j A_{l+1} … A_{p_m} ⇒* a_{i+1} … a_j

and that A₁A₂ … A_{l−1} ⇒* Λ and A_{l+1} … A_{p_m} ⇒* Λ. Hence A ⇒⁺ B. Inductively, if PARSE(i, j, π_m = A → A₁A₂ … A_{p_m}) generates a call with the same arguments, then A ⇒⁺ A, contradicting the fact that G is cycle-free.

Claim 3. PARSE(i, j, π_m = A → α) terminates and produces a right parse of A ⇒ α ⇒* a_{i+1} … a_j.
Proof. Consider the sequence of calls on PARSE during the execution of Algorithm 3.4.1. It follows from the argument in Claim 2 that the sequence of values for j is nonincreasing and that for fixed j the sequence of values for i is nondecreasing. It follows from Claim 1 that 0 ≤ i ≤ j.
Consequently we conclude from Claim 2 that the number of calls is finite. Clearly, if PARSE does not make a call, then it terminates. Therefore the algorithm terminates. The production of a right parse then follows by a straightforward induction on the number of calls on PARSE.

Claim 4. A call of PARSE(i, j, π_m = A → α) takes at most c(j − i)² steps for some constant c if j > i, and at most c steps if j = i.
Proof.¹³ We first analyze the number of steps required by the call on PARSE, excluding the recursive calls. The for loop is executed at most q times, where q is the length of the longest right-hand side of a production. For each execution of the for loop, if A_l ∈ Σ the number of steps is c₃ for some constant c₃. If A_l ∈ N, the number of steps excluding recursive calls is at most c₄(j₀ − k + 1) for some constant c₄. Since i ≤ k and j₀ ≤ j for all values of k and j₀ during the call, the total number of steps excluding recursive calls is at most qc₅(j − i + 1), where c₅ = max(c₃, c₄). Since each recursive call has arguments j₀, k such that i ≤ k and j₀ ≤ j, each recursive call requires at most qc₅(j − i + 1) steps, excluding its recursive calls.
By Claim 3, PARSE(i, j, π_m = A → α) produces a right parse of A ⇒ α ⇒* a_{i+1} … a_j. Let p be the length of that derivation. By Theorem 1.6, if j = i, then p ≤ c₀, and if j > i, then p ≤ c₁(j − i) − c₂, where c₀, c₁, and c₂ are constants. Since a production index is printed when and only when PARSE is called, this call on PARSE generates at most p recursive calls. Combining results, the total number of steps for the call on PARSE is at most

    c₀qc₅                                if j = i        (3.1)

and

    qc₅(j − i + 1)(c₁(j − i) − c₂)       if j > i.       (3.2)

Simplifying Eq. (3.2) yields

    qc₅(j − i + 1)(c₁(j − i) − c₂) = qc₅(c₁(j − i)² + (c₁ − c₂)(j − i) − c₂)
                                   ≤ qc₅(c₁(j − i)² + c₁(j − i))
                                   < 2qc₅c₁(j − i)².

Letting c = max(c₀qc₅, 2qc₅c₁) completes the claim. □
Further refinements can be made to Earley's algorithm (see, for example, Bouckaert et al., 1973; Hotz, 1974; Pager, 1972). The formulation given here lends itself to a technique which results in an n³ Turing machine implementation. There is yet another version of the algorithm which works in time proportional to n³/log n.
¹³ The outline of this proof is due to Ralph Merkle.
4. Valiant's Algorithm

Introduction
In the previous sections, we have seen two algorithms which can recognize and parse context-free languages in time proportional to n³. Both algorithms were practical in that they could be implemented reasonably efficiently, and indeed this has been done. We are now about to give a procedure due to Valiant (1975) that will result in a time bound proportional to n^2.81. To achieve this, we will resort to techniques that are asymptotically superior but result in such huge constants that this method is of only theoretical interest. The importance of this result should not be discounted. It is likely to suggest practical new algorithms which are more efficient. Our approach to the algorithm is different from what can be found in the literature. At the heart of the method is a lemma which is not proven in detail in Valiant (1975). We give a detailed proof here and carefully determine which axioms are used. This allows us to use the lemma in other contexts and to prove a number of new results which will be reported elsewhere.

We first give an overview of the ideas of the algorithm. Let G = (V, Σ, P, S) be a context-free grammar in Chomsky normal form. Suppose w is a string of length n to be parsed. Our algorithm produces the same recognition matrix as Algorithm 2.1.1, but the computation proceeds in quite a different way. We form the same starting matrix as in the Cocke-Kasami-Younger algorithm. We note that one can naturally define a (nonassociative) product operation on nonterminals of the grammar. In a properly generalized sense, the recognition matrix can be found by taking the "transitive closure" of the starting matrix. It turns out that one can do this efficiently if one can take the "product" of two upper triangular matrices efficiently. One can reduce the computation of this product to the computation of ordinary boolean matrix products. This, in turn, reduces to studying efficient methods for computing ordinary matrix products.

The subsequent sections are organized as follows. In Section 4.1 we formulate the computation of the recognition matrix as a transitive closure problem. In Section 4.2 the matrix multiplication techniques of Winograd (see Fischer and Probert, 1974) and Strassen (1969) are given, as well as techniques for dealing with boolean matrices. Section 4.3 is devoted to a proof of Valiant's lemma. The proof requires a good deal of background material needed to talk about nonassociative products. In Section 4.4 we present the transitive closure algorithm, demonstrate that it works, and analyze its time bound. We are then able, in Section 4.5, to obtain our time bound for context-free recognition.
4.1 Recognition as a Transitive Closure Problem
In this section we present another method, due to Valiant (1975), for computing the Cocke-Kasami-Younger matrix. The computation is a form of transitive closure and has more general applicability than just to context-free language recognition (for general background, see Fischer and Meyer, 1971; Munro, 1971). Wherever possible, we express the computation in its more general form. We first define our more general notion of transitive closure. Given any binary operation on a set S we can extend that operation to matrices whose elements are subsets of S in a natural way. Suppose A = (a_{i,j}), B = (b_{i,j}), and C = (c_{i,j}) are n × n matrices¹⁴ whose elements are subsets of S. Then we define C = A*B by

    c_{i,j} = ∪_{k=0}^{n−1} a_{i,k} * b_{k,j}    for 0 ≤ i, j ≤ n − 1.
Also, by definition, A ⊆ B if for all i and j, a_{i,j} ⊆ b_{i,j}. Of course, we define C = A ∪ B by

    c_{i,j} = a_{i,j} ∪ b_{i,j}    for 0 ≤ i, j ≤ n − 1.

Using any product operation for sets and ordinary set union, we define a transitive closure operation on matrices whose elements are sets.

Definition. Let D be an n × n matrix whose elements are subsets of a set S and let * be any binary operation on S. Let D^(1) = D and for each i > 1 let

    D^(i) = ∪_{j=1}^{i−1} D^(j) * D^(i−j).

The transitive closure of D is defined to be D⁺ = D^(1) ∪ D^(2) ∪ … . We say that a matrix D is transitively closed if D⁺ = D. Since the underlying binary operation may be nonassociative, as is the operation to be used for constructing the recognition matrix, we have defined D^(i) so as to include all possible associations. Although we write

    D⁺ = D^(1) ∪ D^(2) ∪ … ∪ D^(i) ∪ …

as an infinite union, if S is finite then the number of unions is always finite, as the number of possible entries is finite and so there are only finitely many such matrices. We leave it to the reader to find an upper bound for t in this case so that D⁺ = D^(1) ∪ … ∪ D^(t).
¹⁴ Recall our convention of 0-origin addressing for matrices.
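The definitions translate directly into code. The following Python sketch (our own illustration; op stands for whatever subset product is chosen) computes the generalized matrix product and the closure D⁺ as the union of the powers D^(i). It is the naive route, not Valiant's fast method, and the caller supplies the number of powers to union; for the strictly upper triangular matrices used below, n powers suffice (cf. Theorem 4.1.1).

# Generalized product and transitive closure over matrices of frozensets.

def mat_product(A, B, op):
    n = len(A)
    return [[frozenset().union(*(op(A[i][k], B[k][j]) for k in range(n)))
             for j in range(n)] for i in range(n)]

def mat_union(A, B):
    return [[A[i][j] | B[i][j] for j in range(len(A))] for i in range(len(A))]

def transitive_closure(D, op, max_power):
    # D+ = D^(1) U ... U D^(max_power); every order of association is covered
    # because D^(i) is the union over j of D^(j) * D^(i-j).
    n = len(D)
    powers = [D]                                   # powers[i-1] holds D^(i)
    for i in range(2, max_power + 1):
        new = [[frozenset()] * n for _ in range(n)]
        for j in range(1, i):
            new = mat_union(new, mat_product(powers[j - 1], powers[i - j - 1], op))
        powers.append(new)
    closure = powers[0]
    for P in powers[1:]:
        closure = mat_union(closure, P)
    return closure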
Example. Let S = {0, 1} and define a binary product by the following table:

    *   0   1
    0   0   0
    1   0   1
Let D be any matrix over S. The transitive closure of D is the "standard transitive closure" treated in Fischer and Meyer (1971) and Munro (1971). In the usual fashion, D describes a binary relation ρ, and D⁺ corresponds in the same way to the transitive closure ρ⁺ of ρ.

The heart of the algorithm for constructing the recognition matrix is the following binary operation on subsets of the nonterminals of a grammar.

Definition. Let G = (V, Σ, P, S) be a context-free grammar in Chomsky normal form. Let N₁, N₂ ⊆ V − Σ. Define

    N₁ · N₂ = { C ∈ V − Σ | C → AB is in P for some A ∈ N₁, B ∈ N₂ }.

Example.
Let G contain the following rules:

    C → AB    D → CE    E → AF    F → BE

There may be other rules as well, but these need not concern us. Take N₁ = {A}, N₂ = {B}, N₃ = {E}. Note that

    N₁(N₂N₃) = {E} ≠ {D} = (N₁N₂)N₃.

This establishes that this product of subsets is not associative. Using the product operation just defined, we claim that the following algorithm computes the Cocke-Kasami-Younger recognition matrix. Notice that we have not specified in detail how to compute D⁺. We leave the details for subsequent sections.

Algorithm 4.1.1. Let G = (V, Σ, P, S) be a grammar in Chomsky normal form (without the rule S → Λ) and let w = a₁a₂ … a_n, n ≥ 1, be a string, where for 1 ≤ k ≤ n, a_k ∈ Σ. Form the strictly upper triangular (n + 1) × (n + 1) recognition matrix T as follows. Let D be a strictly upper triangular (n + 1) × (n + 1)
matrix, where each element d_{i,j} is a subset of V − Σ and initially the value of each element is ∅.
for i := 0 to n − 1 do d_{i,i+1} := { A | A → a_{i+1} is in P };
T := D⁺
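The following Python sketch (ours, with a hypothetical grammar encoding: BINARY maps pairs of nonterminals to the left-hand sides of the C → AB rules, LEXICAL maps terminals to the left-hand sides of the A → a rules) shows the subset product of the definition above and the starting matrix of Algorithm 4.1.1.

# Subset product for a Chomsky-normal-form grammar and the start matrix D.

BINARY = {("A", "B"): {"C"}, ("C", "E"): {"D"}}      # e.g. C -> AB, D -> CE
LEXICAL = {"a": {"A"}, "b": {"B"}}                   # e.g. A -> a, B -> b

def set_product(n1, n2):
    """N1 . N2 = { C | C -> AB is in P for some A in N1, B in N2 }."""
    out = set()
    for a in n1:
        for b in n2:
            out |= BINARY.get((a, b), set())
    return frozenset(out)

def start_matrix(word):
    """d[i][i+1] = { A | A -> a_{i+1} is in P }; all other entries empty."""
    n = len(word)
    d = [[frozenset() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n):
        d[i][i + 1] = frozenset(LEXICAL.get(word[i], set()))
    return d

# T := D+ can now be taken with a closure routine, using set_product as *.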
To show that the algorithm works, we establish the following result.

Theorem 4.1.1. Let G = (V, Σ, P, S) be a context-free grammar in Chomsky normal form (without the rule S → Λ) and let w = a₁a₂ … a_n, n ≥ 1, be a string, where for 1 ≤ k ≤ n, a_k ∈ Σ. Let D⁺ be the matrix constructed by Algorithm 4.1.1 and let T be the Cocke-Kasami-Younger recognition matrix. Then D⁺ = T.
Proof. We first prove that the only nonempty elements of D^(d) are along the dth diagonal.

Claim. In D^(d) we have

    d^(d)_{i,j} = t_{i,j}   if j − i = d,
    d^(d)_{i,j} = ∅         otherwise.
Proof of Claim. The argument is an induction on d = j − i. The basis of the induction, namely d = j − i = 1, is immediate.

Induction Step: Consider forming the product D^(d+1). We have

    d^(d+1)_{i,j} = ∪_{s=1}^{d} ∪_{k=0}^{n−1} d^(s)_{i,k} * d^(d+1−s)_{k,j}.
It follows from the property that for any subset N of the variables, N * ∅ = ∅ * N = ∅, that the only way we can get a nonempty entry is if there exist some s, 1 ≤ s ≤ d, and some k, 0 ≤ k < n, so that d^(s)_{i,k} ≠ ∅ and d^(d+1−s)_{k,j} ≠ ∅. It follows from the induction hypothesis that in such a case

    k − i = s   and   j − k = d + 1 − s.

Adding the equations produces j − i = d + 1, which shows that d^(d+1)_{i,j} = ∅ except if j − i = d + 1. If j − i = d + 1 it follows from the definition of the product operation and the induction hypothesis that

    d^(d+1)_{i,j} = { A | A → BC is in P, where for some k, 0 ≤ k < n, B ∈ t_{i,k} and C ∈ t_{k,j} } = t_{i,j}.
This completes the proof of the claim. Next, note that as a corollary of the Claim, D^(d) = ∅ if d > n. Therefore

    D⁺ = D^(1) ∪ D^(2) ∪ … ∪ D^(n) = T.  □
Corollary 1. B ∈ d⁺_{i,j} if and only if B ⇒⁺ a_{i+1} … a_j.

Proof. d⁺_{i,j} = t_{i,j} and we invoke Theorem 2.1.1.
Corollary 2. w ∈ L(G) if and only if S ∈ d⁺_{0,n}.

Example. [The original displays here, for a small grammar, the starting matrix D — with superdiagonal entries such as {A} and {B} — together with the computed closure D⁺; the matrices are not reproduced.]
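For the strictly upper triangular matrices of this section the closure can be computed diagonal by diagonal, which is exactly the Cocke-Kasami-Younger computation mentioned next. The sketch below (ours) assumes the set_product and start_matrix functions sketched after Algorithm 4.1.1, together with their hypothetical BINARY/LEXICAL tables.

# Straightforward computation of D+ for a strictly upper triangular start
# matrix: by the Claim above, D^(d) lives on the d-th diagonal, so entries
# are filled in order of increasing span.

def closure_upper_triangular(d, product):
    n = len(d) - 1
    t = [row[:] for row in d]
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            entry = set()
            for k in range(i + 1, j):
                entry |= product(t[i][k], t[k][j])
            t[i][j] = frozenset(entry)
    return t

# With the hypothetical tables above (C -> AB, A -> a, B -> b), the closure
# of the start matrix for the word "ab" puts C into position (0, 2):
assert "C" in closure_upper_triangular(start_matrix("ab"), set_product)[0][2]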
Since we have established that Algorithm 4.1.1 yields the Cocke-Kasami-Younger recognition matrix, we can of course use the parsing algorithm of Section 2 for this matrix. Furthermore, using the proof of Theorem 4.1.1 as a guide, the reader can verify that a straightforward computation of D⁺ which takes advantage of the fact that only the product of nonempty sets yields a nonempty set causes Algorithm 4.1.1 to be stepwise equivalent to Algorithm 2.1.1. Consequently it is easily shown that Algorithm 4.1.1 can be carried out in O(n³) steps. It remains to show that, when viewed as a transitive closure problem, the time bound for this computation can be further improved.

4.2 Strassen's Algorithm and Boolean Matrix Multiplication
We now begin the development of a fast method for computing the transitive closure defined in the previous section. The time bound for computing the transitive closure will depend on the time bound for matrix multiplication. In this section we present a method for matrix multiplication due to Strassen and Winograd.¹⁵ We then show that the method can be extended to more general binary operators (such as the one used in constructing the recognition matrix) via boolean matrix multiplication. Let us first consider multiplication of matrices over an arbitrary ring.¹⁶ The key idea is to study the 2 × 2 case first and then to reduce the general case to it.

Lemma 4.2.1 [Winograd (see Fischer and Probert, 1974); Strassen (1969)]. The product of two 2 × 2 matrices whose elements are from any ring can be computed with 7 multiplications and 15 additions.
Proof. Let A = (a_{i,j}), B = (b_{i,j}), and C = AB = (c_{i,j}), 1 ≤ i, j ≤ 2. The following computation computes the c_{i,j}. Each s_i term involves one addition and each m_i involves a single multiplication. [The display of the addition terms s_i and of the seven products m₁, …, m₇ of the Winograd scheme appears at this point; see Fischer and Probert (1974).]

¹⁵ Strassen (1969) was the first to find a 7-multiplication scheme for multiplying 2 × 2 matrices. The scheme given here is due to Winograd (see Fischer and Probert, 1974) and uses 7 multiplications but fewer additions.

¹⁶ Recall that a ring includes additive inverses and an additive identity, that addition is commutative, that both addition and multiplication are associative, and that multiplication distributes over addition.
It is a straightforward task to verify that these identities correctly compute the c_{i,j}. □
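For concreteness, here is a Python rendering of one standard Winograd-form arrangement of the seven products (our transcription; the grouping of the additions may differ from the arrangement printed above, but it uses 7 multiplications and 15 additions and is easily checked).

# A standard Winograd-form 2x2 scheme: 7 multiplications, 15 additions.
# It is valid over any ring; here it is checked on integer matrices.

def winograd_2x2(a, b):
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    s1 = a21 + a22; s2 = s1 - a11; s3 = a11 - a21; s4 = a12 - s2
    s5 = b12 - b11; s6 = b22 - s5; s7 = b22 - b12; s8 = s6 - b21
    m1 = s2 * s6; m2 = a11 * b11; m3 = a12 * b21
    m4 = s3 * s7; m5 = s1 * s5; m6 = s4 * b22; m7 = a22 * s8
    t1 = m1 + m2; t2 = t1 + m4
    return ((m2 + m3, t1 + m5 + m6),
            (t2 - m7, t2 + m5))

A = ((1, 2), (3, 4))
B = ((5, 6), (7, 8))
assert winograd_2x2(A, B) == ((19, 22), (43, 50))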
Now we shall treat general matrix multiplication in terms of the 2 × 2 case.

Theorem 4.2.1. Two n × n matrices whose elements come from an arbitrary ring can be multiplied in time proportional to n^2.81.
Proof. Let us consider the case n = 2^k. Let us write the equation C = A*B involving n × n matrices as

    [C₁,₁ C₁,₂; C₂,₁ C₂,₂] = [A₁,₁ A₁,₂; A₂,₁ A₂,₂] [B₁,₁ B₁,₂; B₂,₁ B₂,₂],    (4.1)

where each C_{i,j}, A_{i,j}, and B_{i,j} is an (n/2) × (n/2) matrix. Thus we may regard A, B, and C as 2 × 2 matrices over a different ring. It is this observation which will allow us to use Lemma 4.2.1. Let M(n) be the number of scalar multiplications necessary to multiply two n × n matrices. Let A(n) be the number of scalar additions necessary to multiply two n × n matrices. Let T(n) = M(n) + A(n). Notice that n² scalar additions are needed to add two n × n matrices. Using Eq. (4.1) and Lemma 4.2.1, we see that seven multiplications of (n/2) × (n/2) matrices and 15 additions of (n/2) × (n/2) matrices are needed; that is,
+
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
147
Thus
M(n)
=
(3
7M -
and 2
A(n)
=
7A(:) 4-15(5)
Using the initial conditions M (2) = 7 and A (2) = 15, we conclude that if n = 2k, M ( n ) = 7k
A (n) = 5 ( 7 k - 4 9 and
T ( n ) = 6*7k- 5*4k as the reader may easily verify by induction on k. Note that k = logzn, so that Eq. (4.2) is
M (n) = 7 k
= 7 1 0 = ~ n l o ~ t *= n2.81.
Also observe that
T ( n ) = 6*n2.81 -5
< 6.n2.81.
(4.3)
Note that if n is not a power of 2, one can pad the matrices out t o the next larger power of 2. This can (at worst) double n, which involves increasing the constants by 6 in Eq. (4.3). 0 We could instead assume in Theorem 4.2.1 that n = 2“h, where h is odd. It is then possible to derive an expression of the form
T ( n ) = c.nZ.81, where G < 6. We leave this refinement for the reader. Thus we have an 0 (n””) time bound for multiplication of n X n matrices over a ring. However, the product operation of a ring is associative, but the operation we are using to construct the recognition matrix is not. Consequently we extend Strassen’s result to multiplication of boolean matrices and then reduce our product operation to boolean products. Boolean matrices are n X n matrices whose elements are in the set (0,1).The associated scalar operations and *, given in Fig. 9, satisfy all the properties of a ring except that the element 1 has no additive inverse. As usual, we define matrix multiplication by the “sum of products” rule.
+
148
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
+
0
1
0 1
0 1
1 1
‘ 0 -
1
0
0
0
1
0
1
FIG.9. Tables for boolean addition and multiplication. Example
Since we do not have a ring, we cannot apply our technique for fast multiplication to boolean matrices directly. Nevertheless, as the next theorem indicates, we can multiply boolean matrices within the samc time b0und.l’ Theorem 4.2.2. The product of two n X n boolean matrices can be computed in time proportional to n2.81 steps.
Proof. Given two n X n boolean matrices A and B, compute C = A*B by the algorithm given in Theorem 4.2.1 using Zn+l,the integers modulo (n l ) , as the ground ring. Notice that for 1 5 i,j 5 n, 0 5 ci,j 5 n. Let D be the boolean product of A and B.
+
Claim.
di,j
= 1 if and only if 1
5 c i , j 5 n.
Proof of Claim. If & , j = 0 then there is no k such that a i . k b k . j = 0 and hence c i , j = 0. If d i , j = 1 then for some value of k , ai,k = 1 and b k , j = 1. Therefore 1 5 c i , j 5 n and both the claim and the t,heoremfollow. 0 We next give an algorithm which reduces matrix multiplication of a very general kind to boolean matrix multiplication. Let S be any set and let * be any binary operation on subsets of S. We demand only that * satisfy the following distributive laws. For any X , Y , Z C S X * ( Y u 2) = X*Y u x*z (D1)
(X u Y)*Z = x*z u Y*Z. (D2) Suppose S = { A 1 ,Az, . . . , A } .Let B = ( b i . j ) be a matrix whose elements are subsets of S. Define a set of boolean matrices Bi, &, . . . , B,, where for 1 5 k 5 s the i, j t h element of Bk is 1 if Ak E bi,j and 0 otherwise. In fact, W. L. Ruaao has recently found a method for doing boolean matrix multiplication which is faster than any presently known method for doing matrix multiplication over an arbitrary ring.
LANGUAGES
PARSING OF GENERAL CONTEXT-FREE
Example.
149
Let S = (All A21 and n = 3.
'0 B =
Then
{o B1=
0
0
(A21
0
0
0
. /
'0 0 1\
1 1)
0 0 0
,o
( A I } (AI,
and B2
=
0 0,
0 0 1
,o
.
0 0,
Next we give an algorithm for multiplication. Algorithm 4.2.1.
subsets of S
=
{ Al,
Given two n X n matrices B and C whose entries are . . . , A,}, our goal is to compute B * C.
1. Compute B1, . . . , B, and CI, . . . , C,, where these are defined as above. 2. Compute the s2boolean products B i * Cj for 1 5 i,j 5 s. 3. Let D = where d i , j = ( E I E = ( A , } * ( A , } , where (Bp* c*)i,j= 11.
We leave it to the reader to verify that for operations * satisfying distributive laws ( D l ) and ( D 2 ); Algorithm 4.2.1 computes D = B * C. We can conclude that if M ( n ) is the time necessary to multiply two n X n matrices with elements which are subsets of some set S and B M ( n ) is the time to compute the boolean product of two n X n boolean matrices, then Theorem 4.2.3.
M (n) 5 cBM (n) for some constant c. 4.3 Valiant's Lemma
Having posed the recognition problem as a transitive closure problem and having reduced the multiplication aspect of that problem to fast boolean matrix multiplication, it remains to reduce the transitive closure problem to one with a time bound proportional to that for boolean matrix multiplication. This we do in Section 4.4. The present section is devoted to proving the lemma due to Valiant on which the construction rests. In order to prove the lemma it is necessary to develop a notation for nonassociative products in terms of binary trees. The reader who wishes first to get an overall picture of the transitive closure algorithm may prefer
150
SUSAN L. QRAWM AND MICHAEL A. HARRISON
to skip this notation and the proof of thc lemma on his first reading and return to it after the “big picture” is better understood. As we stated before, Valiant’s lemma and the transitive closure algorithm have applications to areas other than parsing. In the discussion that follows we will consider matrices whose elements are subsets of some set S. The multiplication operation * defined on these subsets is assumed to have only the two distributive properties (Dl) and (D2) used in Section 4.2 to reduce matrix multiplication to boolean matrix multiplication and the additional axiom s * a =o*s (N1)
=o
used in Section 4.1 to reduce construction of the recognition mat,rix t o a transitive closure problem. We will restrict our attention to strictly upper triangular matrices. It will be convenient to use the following facts, which are immediate consequences of the axioms. lemma 4.3.1.
A
If A and B are strictly upper triangular matrices then
* B is also strictly upper triangular and n-1
V
ai.k
* bk,j
j-1
V
=
ai,k
* bk,j.
k-i+l
k-0
Proof. Follows directly from Axiom (Nl) . 0 lemma 4.3.2 (The Distributive Property). Let S be any set and let * be any binary operation on subsets of S satisfying axioms ( D l ) and (D2). Let sl, sz, . . , , st, t 2 1 be subsets of S and let z(s1, sz, . . . , st) denote the ordered composition of s1, s2, . . . , s t under * and some order of association. If I is any family of subsets of S, then for any 1 5 k 5 t , r ( S 1 , $21
v
* *
, sk-1, v ij s k + l , -
*
.
7
st)
=
iEI
v
(sll 82,
...
1
sk-1,
ii
sk+l,
* *
. st) j
iEI
Proof. Follows directly from distributive axioms ( D l ) and (D2).
CI
4.3. I Nonassociative Products and Binary Trees
In order to describe the nonassociative products that we will deal with in Valiant’s lemma, we first introduce a set of such products called terms. We then present a notation for terms and a sequence of proposit,ionsestablishing the properties of the notation. The verification of these propositions is straightforward and is left for the reader. Definition. Let B = sets of some set 8.Let
(bi,j)
be an n X n matrix whose elements are sub-
bii,is, b i s , i , , *
. , big,it+i *
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
151
be any sequence of matrix elements. We say the sequence is acceptable in the case that for 1 5 j , k 2 t 1, i f j < k thenij < ik. The composition under * of the elements of an acceptable sequence under some order of association is called a term of B or, more specifically, an (il, &+I) t e r n with t components. The notation
+
~ * ( b i ~ , bi*,i8, il,
..
denotes any one of the terms of B with components h1,i2,. . . , bi,,;t+l under the operation *. For example bi.2
*
(bz.8
* b3.4)
and
* b 3 , 5 ) * b5-7
(b2.3
are terms, but
* b2.3 * b 3 , 4
b1.z
is not a term because no order of association is given, and bi,z * bz,z
is not a term because the indices do not increase. Definition. Two (i,j) terms are formally distinct if either they are composed of distinct sequences of elements or they have different orders of (B) be the set of all formally distinct (i, j ) terms association. Let having exactly d components from the matrix B. Let n-1
3*(B) = U U 3 8 : ~(8) d>l i,j-0
be the set of all formally distinct terms with components from the matrix B. (We omit the explicit mention of B when the matrix is understood.) Proposition 4.3.1.1. Let B = ( b i , j ) be an n X n matrix whose elements are subsets of S. Then for all 0 5 i, j 5 n - 1,
u
b?? = 1 0 1 T*
T*
E 38!j)(B)
and
bij = U dkl
U T*. T*E3g;j)
The proof is straightforward and is left for the reader.
152
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
\
A natural way to describe terms of the type just introduced is to use binary trees. For example, the ( 2 , 7 ) term (b2,** bs,e) * b6,, can be represented by a binary tree as shown below.
b2,3
b3.5
In order to define precisely the trees that describe terms, we introduce a functional notation. However, we shall continue to use the trees themselves as a descriptive device to explain the formalism. We shall introduce functions T (xl, x2,, , . , xt) which denote binary trees having t leaves labeled x1 to z t from left to right. If B = ( b i , j ) is an n X n matrix whose elements are subsets of some set S, then the subclass of trees ~ ( 2 1xa, , . . . , 2,) we get by restricting xl, z2, . . . , z t to be an acceptable sequence of elements from B will be in one-to-one correspondence with the terms of 8. For example, the terms xa, $4) =
Ti*(%,
21,
Tz* (51,
xz, 2 3 ,
(21
* 52) *
(23
* $4)
and x4)
will correspond to the functions nating the binary trees
T~
=
51
*
(%
*
(23
* xd )
( x ~XZ, , 22,x4) and
72
(a, 2 2 , x3,x4) desig-
The formal definition of these functions is now given. Definition. Let 7
(xil, xi,,
X =
(XI, 22, .
. . . , z i t ) ,t 2 1 over X
. .} be
a set. The class 3 of binary trees is defined inductively as follows:
(i) For any x E X , T ( Z ) = x isin 3. (ii) For any xi,, zir in X , T ( z ~%is) ~ , is in 3.
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
(%)
153
If 71 (zil,, . . , z i p )7,2 (zip+l,. . . , z i t ) , and 73 (zjl, zj2) are in 5 for some 1 5 p < t , then 7(2il,
.. . ,Zit)
=
73(71(2il,
...
1
. . . ,Xir))
Z i p ) , 72(2ipfl,
is in 5. (iv) All elements of 5 are constructed from (i), (ii), and (iii) . By a suitable choice of elements, we get the class of trees corresponding to nonassociative products. Definition. Let B = ( b i , j ) be an n x n matrix, where each b i , j is a subset of some set S. The acceptable class 5 of binary trees over B is defined inductively by
(i’) For any element, bi,j of 6, ~ ( b i , j )= b i , j is in 3. (ii’) If yl, yz is an acceptable sequence then ~(yl,yz) is in 5. (iii’) Let zl, z2, . . . , xplzp+ll. . . , zr be an acceptable sequence of elements of 6. If n(zl, . . . , z p ) , r ~ ( x ~ + .l ., . , z t ) , and ~ ~ ( yy2) l , arein 5 , for some 1 5 p < t , then 7(z1,
..
zt)
=
73(71(21,
.
7
xp),
72(2p+l,
*
-
*
zt))
is in 5 . (iv‘) All elements of 5 are constructed from (i’) , (ii’) , and ( 9 ) . Clearly, an acceptable class of binary trees over a matrix is a special case of a class of binary trees over a set. These definitions are illustrated in Figs. 10a and lob. Figure 10a represents (i) or (i’) . Figure 10b illustrates (iii) or (iii’).
FIG.IOa. The simplest binary tree.
FIG.lob. The intuitive meaning of part (iii) or (iii’) of the definition of binary trees.
154
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
The 7 functions satisfy the following properties. Proposition 4.3.1.2. Let 3 be an (acceptable) class of binary trees. For each 7(z1, . . . , z t ) ,t > 1,in 3, there exist k , 1 5 k < t, and three functions, 71, 7 2 , and T~ all in 3 such that 7 (XI,
* *
-
=
$1)
j
..
7 8 (71 (21, *
,
zk) 7 2 (zk+l,*
--,
zt) ) *
This proposition is a restatement of the definition. Our next principle, which is illustrated by Fig. 11, is familiar to us from language theory. Proposition 4.3.1.3. Let 3 be an (acceptable) class of binary trees. Given T I ( Z ~., . . , z,) and T ~ ( Y I , . . ,yt) in 5 and some k, where 1 5 k 5 s, then there exists a unique T in 3 such that
.
T(z1,
- ..
=
2
zk--1, 3/17
-
71 (21, *
.. %-I,
9 3/19
. %> . * , 3 / t ) , zk+l, .
zh+l,
7 2 (3/1,
*
9
2 . 1
*
We next introduce the notion of a factorization for binary trees. Definition. Let 3 be an (acceptable) class of binary trees. For any . , zt) in 3 and any p , q such that 1 5 p 5 q 5 t, if for some T I , T Z in 3,
..
T(z~,
T ($1,
- ,zt)
Then the quadruple
= Sl(X1,
(TI, TZ, p ,
* * 9
zp-1, 72 (zp,
.
q ) is a factorization of
*
7
2,)
j
$,+I,
* * 9
(a,. . . , zt) .
Our next proposition is a unique factorization result for trees.
y1
-
Yt
FIG.11. The Composition Principle.
zt)
-
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
Let 3 be an (acceptable) class of binary trees.
Proposition 4.3.1.4.
. . , rr)in 3 and 1 5 p’
1. Given 7 (zl, . p , and q such that (a)
5 q’ 5 t, then there exist 71, 7 2 ,
P I P’
(b) Q
2
q’
and (c)
155
(71, 7 2 ,
p , q ) is a factorization of
T(Q,
. . . , 2,).
2. If there also exist 71’) 72’, r, s such that
(4 r I
P‘
(e) s 2 Q’ and
(f)
(Q’, n’, r, s) is a factorization of
. . .,
~(21,
$1)
then either (g) r L. P and s
2
P
(h) p 5 r and q
2
s.
or 3. There exists a unique factorization satisfying la, lb, and l c and such that (p - p ) is minimal.
Although, we have written Proposition 4.3.1.4 out in great detail, the intuitive picture is shown in Figs. 12a and 12b. By Case 2, we see that either 7 2 is nested inside of 72’ or vice versa. This situation suggests the following definitions. Definition. A factorization of the type described by Case 3 in Proposition 4.3.1.4 is called the smallest (p’, q’) factorization of 7.
TA x
,... xp-,
xp x;,
x;
xq xq+,
xt
FIG.12s. A factorization of T according to Proposition 4.3.1.4, Case 1.
156
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
r
p and s 2 q
p 5 r and q
1. s
FIG.12b. Factorizations of T according to Case 2 of Proposition 4.3.1.4. The smallest (p', p') factorization is exemplified in Figs. 13a and 13b. In those figures, let us choose p' = 5 and q' = 9. The unique choices of p and q are p = 4,q = 9. The trees r1 and r2 arc shown in Figs. 13a and 13b. Note that the root of T~ is the least common ancestor of z6and z9. Now we define a smallest tree. Definition. r ( a , . . . , zl)is said to be (u, v ) minimal if the smallest (u, v) factorization of r is (i,r , 1, t ) , where i is the identity function of one variable, ie., i ( z ) = z.
Returning to Fig. 13(b), the tree r2 (24, , . . , z ~ is ) (8,9) minimal. In a (u,v ) minimal tree, the least common ancestor of xu and zois the root of the entire tree. Now we present two lemmas concerning these minimal trccs.
Lemma 4.3.1.1. Let ~ ( $ 1 , . . . , 21) = n(z1, . . . , ~ - 1 , n(z,, . . . , z8), x8+l,. . . , z t ) for some t 2 1, 1 5 r 5 s 2 t. Then for any y1, . . . , ya-r+l n(yl, . , . , is (u - T 1, v - r 1) minimal if and only if (71, 7 2 , T, s) is the smallest (u,v ) factorization of r .
+
'1
'2
'3
'4
+
'5
'6
'7
'8
9'
'10 '11 '12
FIQ.13a. A typical binary tree 3.
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
157
FIG.13b. The smallest (5.9) factorization of 3.
Proof. The result follows directly from the definitions and Proposition 4.3.1.4. 0 The next result is also an obvious propert,y of these trees. Let 3 be an (acceptable) class of binary trees. Let be in 3 and let t 2 1, 1 5 p 5 q 5 t . Then
lemma 4.3.1.2. 71, T Z , 72‘
if and only if rZ =
72’.
Now we extend the composition principle and the factorization notions to terms by establishing a correspondence between trees and terms. Proposition 4.3.1.5. Let B = (b;,i) be an n X n matrix whose elements are subsets of some set S. Let 3 be an acceptable class of binary trees over B. Let * be any binary operator on subsets of S satisfying axioms (Dl), (D2) and ( N l ) . Then there is a one-to-one correspondence between the set of terms 3* and the set of trees 3. The correspondence is defined inductively by
(i)
for any element bi.i of 7
(ii) for any and
TI
(bi,i) c-)T*
(bil,it, bil,i8, . .
T2(bip,ip+i,
...
B
-,
(bi,j)
bip-i,ip)
bis,it+i)
158
SUBAN L. GRAHAM AND MICHAEL A. HARRISON
Using this correspondence, we extend the notion of minimality to terms in the obvious way. Definition. Let B be an n X n matrix whose elements are subsets of S. Let 7 * ( b i , . i Z ,. . . , b i t , < t + lbe ) a term of 8. Then for any p , q, 1 5 p 5 q 5 1, r* is (ip,i p )minimal if the corresponding tree r is (ip,ip)minimal.
4.3.2 Central Submatrices and Matrix Reduction
We next introduce two concepts related to matrices which are useful in conjunction with Valiant’s lemma and the transitive closure algorithm. The first concept is the notion of a central submatrix. Intuitively, as shown in Fig. 14, a central submatrix is a matrix contained within a larger matrix such that the principal diagonal of the smallcr matrix lies along the principal diagonal of the larger matrix. Definition. Let B be an n x n matrix. Then A is a central submatrix if there exist m 2 0 and r , 0 5 T 5 n - 1 such that A = (ai,j) is an m X m matrix and for 0 i, j 5 m - 1, ai,j = bi+r-l,j+r-I.
<
The most important property of central submatrices for our purposes is given by the next theorem and its corollary. Theorem 4.3.2.1 (The Central Submatrix Theorem). Let B be a strictly upper triangular matrix. If A is a central submatrix of B then for any d 2 1, the corresponding central submatrix of Bed) (or B+) is A(d) (or
A+).
Proof. Let B be an n X n matrix and A be m X m with = bs+r--l.L+r-~ for each 0 5 s, t 5 m - 1 as the indices range over A. These are the i,j positions of B where r I i, j < r m. Thus, let i’ = i - r 1 and jl = j -r 1. We will show by induction on d that
+
+
+
Basis: d = 1. This is immediate. Induction Step: Suppose d
FIG.14.
A
> 1 and the result holds for d - 1. It follows
pq
is a central submatrix of B.
159
PARSING O F GENERAL CONTEXT-FREE LANGUAGES
from the definitions and Lemma 4.3.1 that d-1
j-1
* bfTu).
b t i = V W b$ u-1 k - i f 1
If wc let k'
=
k -r
+ 1 and employ the induction hypothesis, then we get
b!d? = $01
d-1
j'-1
u-1
k'=i'+l
v
a$'b,&j,u) = a$).,. ,I
(J
This completes the proof for Bed). For B+, note that B(+) =
u
B(d).
ia
Corollary 4.3.2.2. Let B be a strictly upper triangular matrix which is transitively closed. If A is any central submatrix of B then A is transitively closed.
The second useful concept is that of the reduction closure operation on a matrix. Let C = (ci,j) be a strictly upper triangular n X n matrix and let n 2. r 2. n/2 2 1. Define the 2(n - r ) X 2(n - r ) matrix D = (&,?I to be the r reduction of C if Definition.
dipj=
where 'i = i
I
ci,j
if
cp.i
if i
ci,jt
if i _< n - r - 1 , j > n
ci?.3
if
+ 2r - n, j'
=j
O_
> n - r - 1,j I n-r -1
i,j
-r-
1
>n-r -1
+ 2r - n.
The r reduction of C is illustrated in Fig. 15a. Intuitively, we form D by deleting the ( n - r 1)st to rth rows and columns. Using the notion of the r reduction of a matrix, we can define the reduction closure. Intuitively (as illustrated in Fig. 15b) we take the transitive closure of the r-reduced matrix and then restore each element to its original place in C.
+
Definition. Let C = (ci.j) be a strictly upper triangular n X n matrix and let n 2 r 2. n/2 2 1. Let D = (di,j) be the T reduction of C and let E = ( e i , j ) be the transitive closure of D. Define the n X n matrix C+@)=
160
SUSAN L. GRAHAM AND MICHAEL A, HARRISON r A
r
r
C= r
/
i
2r-n
1
3
7
9
0.
FIG.15a. The r reduction of
C.
(ct:?)) to be the reduction closure of C , where for 0 5 i, j 5 n - 1
I
ei,j
c;,~
c+(d = i,i
or n - r - l < i S r - l
e;),j
ei,jv eit,,,
wherei’ = i
-
if i , j s n - r - l if n - r - l < i < r - l
>r -l,j5n -r -1 TL - r - l , j > T - 1 i, j > r - 1
if i
if i 5 if
(2r - n ) ,j’ = j
-
(2r
- n).
FIQ.15b. The reduction closure of
C.
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
161
4.3.3 Statement and Proof of Valiant’s Lemma
Using the results of the previous two subsections, we now present the computat,iondue to Valiant. lemma 4.3.3.1 (Valiant’s Lemma). Let B be a strictly upper triangular n X n matrix and let n 2 r 2 n/2 2 1 and suppose that 2r-n
A r
r
where the r X r submatrices B1 =
El ‘El and b =
of B are transi-
tively closed. Define E, as the submatrix of B given by r
r
Er =
where the shaded area above is taken from B. Then B+ = (B U E:)+@). Proof. Since 61and & are transitively closed, and since B is strictly upper triangular, the only part of B+ that is unknown is region 3, in which 0 5 i 5 n - r - 1 and T < j 5 n - 1. This follows from Corollary 4.3.2.2 on central submatrices. Let 3* be the set of terms of B and let 3 be the corresponding set of acceptable trees., Let T* be any (il, ir+l) term of 6, where il 5 n - T - 1 and 1. Then for some i, 5 n - T - 1 and iQ+l 2 T , T* contains a unique (n - r - 1, T - 1) minimal (ip, iQ+l) term T ~ * .Moreover, either Claim 1.
it+l
2r-
162
S U S A N L. GRAHAM A N D MICHAEL A. HARRISON
p = q or for some ra*,r4*E 3* and for some i8such that n - r r - 1, 72*
=
. . , bi8-1,ia)*
7a*(bipripCI, *
~q*(b<,,.+lt
- .- ,
- 1 < i, 5
biqlp+l).
Proof of Claim. Choose u and v such that
I n - r - 1 < iu+l
i, and
I r - 1 < i,.
i,l
Let 7 be the binary tree corresponding to 7* (by Proposition 4.3.1.5) and let (71, 7 2 , i,, &+I) bc the (unique) smallest (iu,i,) factorization of 7 (by Proposition 4.3.1.4). Then 72*(bip,i,,+l,. . . , biq,iq+l)is the unique minimal (ip,iq+l) term. Since i, 5 i, and i,+l 2 i,, we know i, 5 n - r - 1 and iq+l > r - 1. If p = (I we are done. If not, then it follows from Propositions 4.3.1.2 and 4.3.1.5 and Lemma 4.3.1.2 that for some n*, r4* E 3*
. . , bi,-l,i,) * n*(bi,,ia+i, - - b i q , e l ) It remains to show that n - r - 1 < ia 5 r - 1. But this is a direct conseTZ*
= ?a*
(bi,,,;,+l,
*
quence of the minimality of 72* and the way u and v were chosen. This completes thc proof of Claim 1.
For any 2,y, 0 S x 5 n - r - 1 5 r - 1 5 y 5 n - 1, let U,,,(B) be the set of all formally distinct (n - r - 1, r - 1) minimal (2,y) terms of B. Claim2.
Foranyxsn-r-
l a n d y > r - 1, r-1
V T*C
7*
=
b,,, U U
Ua,u(B)
bZ,8
* be,,.
(4.4)
a-n-r
Proof of CEaim. By Proposition 4.3.1.1, b;,. is the union of all formally distinct (2,s) terms of B and b:, is the union of all formally distinct (s, y) terms of B. Therefore, by Claim 1, r-1
U T*E
Uz,u(B)
7*
=
b,,,
U U
b z a * bEu.
a-n-r
Since B1 and b are transitively closed, b,,# = b:,, and b8,# = bEv for all 5 5 n - r - 1, n - T I s 5 r - 1,y > r - 1. Note that Eq. (4.4) can be computed by the following procedure. 1. Matrix multiply partitions 2 and 6 of B. 2. Form the union of the resulting (n - r ) partition 3 of B.
x
(n - r ) matrix with
163
PARSING OF GENERAL CONTEXT-FREE LANGUAQES
Let C be the (n - r ) X (n - r ) matrix which is in position 3 of B .u:E We proceed to analyze C. It is clear from Claim 2 and the construction of C that the following is true. Claim 3. For any i
5 n - r - 1 , j > r - 1,
In words, c i , j + is the union of all (n - r - 1, r - 1) minimal ( 2 , j ) terms of B. One can note that if j = i 2r - n, then Eq. (4.5) reduces to the following:
+
ci,i-n-r
u
= T*E
T*.
Ui,i+sr--n(B)
Let D be the r reduction of B U E,2 obtained by deleting the central 2r rows and columns. D is a 2 (n - r ) X 2 (n - r ) matrix. Clearly
atj = v
v 7
E
-n
7
T~;](D)
from Proposition 4.3.1.1. Moreover, each T contains exactly one element from C. This follows from the strict ordering of the subscripts on the terms and the method of constructdon of D. To finish the proof of the lemma requires establishing a correspondence between the terms of D and of 8.
+
Claim 4. Let i 5 n - r - 1, j > r - 1, and j’ = j - 2r n. Every (i,j ’ ) term of D is a union of some formally distinct (i,j ) terms of B.
Proof. Let 2 = 7*(di,il,d i e V i s.,. . ,
be an (i,j’) term of D, where
L is selected so that ik 5 n - r - 1 and &+I > n - r - 1. Such a k exists by our hypotheses about i and j. Then 2 = T*(bi,iq, b i B , i a ,
* * * 7
b i b i l i t i Cik.ik+l-n+?j
bit+i+2r-n,ib+e+2r-n,
-
u
7 * ( b i , i i ,*
-
+
..
1
9
bir+irr-n.j)
bi&i.ik,
C,
--
* 9 bit+irr-n,j)
C E Uik,ik+l+8r-n(B)
using Claim 3 and Lemma 4.3.2. Let T be the binary tree corresponding to T*. By Claim 3, for any c E Uik,it+l+2r-n (B), there exists some TI* E S* such that c = ~ ~ * ( b i ~ , ib;i ,i , i ; ,
164
SUSAN L. ORAHAM AND MICHAEL A. HARRISON
. . . , b;rr,ik+l+tr-n). Let 71 be the binary tree corresponding to Composition principle, there exists a unique 7 2 E 3 such that 7@i,i27
* *
b i b c t , i b , 71(bik,id, *
-.
bik+i+2r-n,it+r+lr-n,
=
72(bi,ir,
*
7
*
-
2
71*.
By the
birr ,ik+i+?r-n),
bii+Zr-n,j)
bi*-t,iky b i k , i # , ,
. - , bip,,ii+l+2r-n) *
* *
-
bi&r--n,j)*
J
The term 72* corresponding to 7 2 is an (i,j) term of 8. It follows from Lemma 4.3.1.2 that since the terms of U;k,ik+,+zr-n(B) are formally distinct, the (i,j)terms of B are formally distinct. This completes the proof of Claim 4. We now turn to the reverse direction.
+
Claim 5. Let i 5 n - r - 1, j > r - 1, and j’ = j - 2r n. Every (i,j ) term of B occurs in the formation of a unique (i,j’)term of D.
Proof. Let z = ~ * ( b ; , ;.~. ., , be an (i,j ) term of B. Let binary tree corresponding to T*. Choose u and v so that
i, 5 n - r
7
be the
- 1 < iu+l
and i,-l
and let write
=
- .., ..
71 @;,is,
< i,
i,, i,) be the smallest (iw,i,) factorization of
(71, 72,
7(bi,;aj
Ir - 1
1
7.
Then we may
bit,j)
bi,-t,i,,
Ta (bip,ig+l,
.
* t
bir1.ig),
biq,ip+l)
-
* *
biiJ) *
Let 72* be the term corresponding to T~ and let 71* be the term corresponding to 71. Then 72*(bip,ip+l, . . . , biT-l,i,,) is an (n - r - 1, T - 1) minimal (ip, i,) term of B and must be in UiP,iq,Thus by Claim 3 x is one of the terms in
u
- ,
71* (bi,ii,
* *
C E Ui,,i,tb)
However, since i,
5 i,
n
bi,-I,i,,,
C, bi,,ig+l,
-- , *
bit,j)
(4.6)
- T - 1 and i, 2 i, > T - 1
c i p ,ip-r
= di, ,i,,+n-zr
because of the way D is formed. Thus (4.6) becomes 71*(di,ip, * *
-
9
d i p - 1 , i p ) dip,ipt-n-2r,
We claim that
i,
diptn-Zr,ig+i+n-2r,
< i, - 2r
4- n.
* * * t
dit-2r+n,if).
(4.7) (4.8)
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
To prove (4.8),note that since i, 5 i, and i,
5 i,, we have that
- ip) 2 (i, - i”). > r - 1 and i, 5 n - r - 1, it follows from (4.8) that (i, - ip) 2 (i, - i,) > r - (n - r ) (i,
Since i,
165
(4.9)
or
(i, - ip)
It follows that i, - 2r
+ n > i,.
> 2r - n.
Therefore Eq. (4.7) is an ( i , j ’ ) term of
D.
There is still the uniqueness issue. Suppose z occurs in the formation of some other (i,j’) term of D, say
. - - dip~.ip*-2r+n, di~~-2r+n,ip’+l-zr+n, . - - dit-zr+n.j’) (4.10) where ipt5 ?z - r - 1, i,, - 2r + n > n - r - 1. (Thus dipt.iq#-2r+n is ~ 3 * ( d i , ; P ,d i , , i , ,
the term which “crossed” the deleted section of the matrix.) Note that the sequence of subscripts in (4.10) depends on the same sequence of subscripts which appear in the expression of x, although the number of arguments of r3* may be different from rl* and p‘, q’ may differ from p , q. By an analysis similar to the first part of the proof Claim 5, there exists 7 4 3 such that dip#.ip’-2r+n
and for c
= tip' 3iul-r
Uipl.iqt
c =
T4*(big#.ip’+1,
and
x
. . . ,bi*L,,i@3)
- . - C, - . . . . , c, . . . ,
=
~*(di.ip,
di(-Zr+n,jO
=
T3*(bi,i*,
bi,f,j.).
Since c is an (n - T - 1, r - 1) minimal (ipr,ip,)term of B and since for the values of u and v that we have chosen, c is (u - p’ 1, v - p’ 1) minimal. It follows from Lemma 4.3.1.1 that (TI’, 74,p’, q‘) is the smallest (u, v ) factorization of T. Hence by uniqueness, r1 = TI’, TZ = 74, p‘ = p , and q = q’. This completes the proof of Claim 5. To complete the main proof, it follows from Claim 4 that d t j #is a union of distinct (i,j ) terms of 8. From Claim 5, every (i,j) term of B appears in d: Thus is exactly the union of all formally distinct (i, j) terms of 8. From Proposition 4.3.1.1,
+
+
+
for all i 5 n - r, j > r, and j ’ = j - 2r n. But this is exactly the region of B which had to be verified, and the proof is complete.
166
SUSAN
L. O R A W M AND MICHAEL A. HARRISON
Notice that since r
E:
=
we have just shown that
a+ =
This will provide a simple recursive method for the computation of B+. 4.4 Computing D+ in Less than O(na)Time
If D is a strictly upper triangular matrix whose elements are subsets of some set S and * is an arbitrary binary operation on S which satisfies axioms (Dl), (D2), and (Nl), we wish to compute D+ as efficiently as possible. Since * is not necessarily associative, techniques from the literature which make that assumption are useless here (Ah0 et al., 1974; Fischer and Meyer, 1971; Munro, 1971). It should be observed that our method will be as efficient as any which is known in the associative case. It is convenient to assume that n is a power of 2 in what follows. We shall return to this point later and will relax this assumption. We now introduce a family of procedures which will be useful. Definition. We define the operation Pk of taking the transitive closure of an n X n strictly upper triangular matrix B under the assumption that the (n - n/k) X (n - n / k ) submatrices B1 and h (shown below) are already transitively closed.18
Recall that a matrix D is t r a d $ v e l y & a d if D+ = D.
PARSING OF GENERAL CONTEXFFREE LANQUAQES
167
n n-K
n-f
It will turn out that we need only construct Pz,Pa, and Pq. Let T i ( n )be the time required t o compute Pi for an n X n matrix. Also let T R ( n ) be the time required t o compute the transitive closure of an n X nmatrix. Now we given an algorithm to compute D+ in terms of P2. Algorithm 4.4.1. Let A be an n X n strictly upper triangular matrix. We define a procedure TRANS (A) which returns the transitive closure of A.
1. If n = 2 then TRANS(A) = A. n T
2. Consider D =
3. Compute TRANS (AI) and TRANS (At).
4. Let 6
5.
=
Compute P2(B) = A+.
168
SUSAN
L. GRAHAM AND MICHAEL A. HARRISON
Now we will argue that the algorithm works, and analyze it. lemma 4.4.1.
Algorithm 4.4.1 correctly computes A+ and
Proof. To see that PZ(B) = A+, we induct on k where n = 2k. Note that the result is true for k = 1. Suppose Ic > 1 and that TRANS works correctly for n = 2k. Applying the algorithm to a matrix A of size 2k+1,we correctly compute TRANS (Al) and TRANS (A*) where
by using the induction hypothesis. Let
Now B has the square submatrices A1+ and yields the closure of B, which is
A2+
of size 2k. Applying Pz
Clearly the time bound is
T R ( n ) _< 2 T R ( n / 2 )
+ Tz(n) + O ( n 2 ) .
0
Next we turn to the interesting task of computing Ps. Algorithm 4.4.2. Let B be an n X n matrix as shown below, where each square is of size 4 4 .
B-
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
169
Assume that 1 . 5
B1'
2 6 '
and
are already transitive closed. 1. If n = 2 then B+ = B.
2. Apply Pz to B3
=
Apply Pa to B5
=
4.
to yield
5.
Apply P4 to
to yield
2
15 16
3
'
4
5
6
7'
8'
9
10 11
12
13 14
lemma 4.4.2.
= B3+.
v] 14
1
Kl
to yield B+.
15 16
The previous algorithm computes Pz and
Tz(n) I Tz(n/2)
+ 2T3(3%/4) + T*(n).
Proof. The inequality is trivial to verify. To see that the algorithm works, note that we were allowed to assume that B1 and b were transitively closed. Note that and are transitively closed because they are central submatrices of transitively closed matrices (cf. Corollary 4.3.2.2). Thus we apply Pz to Bs to get B8+. Next consider step 3. To apply P 3 to B4 we must check to see if
170
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
is transitively closed, and it is by assumption. Moreover we require that
= B4+. Now be transitively closed, which it is by step 2. Therefore P3(B4) we want to apply Psto Bg. But
are transitively closed by step 2 and our assumption. To invoke P4in step 5 we must only check that B4+ and B6+ are transitively closed. But this is true from steps 2 and 4, respectively. Thus we obtain B+. 0 With the aid of Valiant’s lemma, it is easy to give algorithms for Pa and P4.
Algorithm 4.4.3. Computation of Pa. Given an n X n upper triangular matrix B. Let us represent B by
where each square is of size n/3. We may assume that
and
PARSING OF GENERAL CONTEXFFREE LANGUAGES
1. Compute B U E,
* E,
=
171
C, where r = 2n/3. Recall that
Thus
2. Compute C+(') for r = 2n/3 using Pz, which means that we compute
PZ of
This is permissible since
and
are closed.
The algorithm for P4 is similar, with r = 3 4 4 . Algorithm 4.4.4 (for the computation of P l ) . Let B be an n X n upper triangular matrix which is represented by
B=
-1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
where each entry is a matrix of size 4 4 . Assume that
81 =
F] = BT
9
10
11
172
SUSAN L. QRAHAM AND MICHAEL A. HARRISON
and that
1. Compute B U E,
* E,
= C
where r = 3 4 4 so 0
2
3
0
0
0
0
8
0 0 0 1 2 0
0
0
0
5
6
7
8
c=- 9 2. Compute C+(*)for r
=
10 11 1 2
3 4 4 , using P2,i.e., apply Pz to
Now we begin t o analyze these procedures. lemma
4.4.3. Algorithms 4.4.3 and 4.4.4 compute Pa and Pq, respec-
tively. Moreover
+ T~(2n/3)+ O(n7 7'4(n) I M ( n ) + T~(n/2)+ O(n2).
Ta(n)
5 M(n)
Proof. Correctness follows from Valiant's lemma, while the inequalities follow trivially. 0
A t this point, we have reduced the computation of the transitive closure to the computation of the product of two upper triangular matrices. We must estimate the time required to do this. Let M (n)be the time necessary to multiply two such n X n matrices.
173
PARSING O F GENERAL CONTEXT-FREE LANGUAGES
Lemma 4.4.4. If n that for all m
=
zk for some k 2 1 and there is some y
2 2 such
M(2"+') 2 27M(2m)
then TZ(n)
I
t
c1M (n)
for some constant
if
czM (n) log n
for some constant cz if
c1
y
>2
y = 2.
Proof. If wc substitute the result of Lemma 4.4.3 into the result of Lemma 4.4.2, we get TZ(n)
I 4Tz ( 4 2 )
+ 2M ( 3 4 4 ) + M (n) + 0(n2).
By the monotonicity of M, M(3n/4) Tz(n)
I 4Tz(n/2)
5 M(n ) so
+ 3M(n) + O(n2)
(4.11)
The solution of (4.11 ) is claimed to be log n
Tz(n) I O(n2Iogn)
+ 3 ~ ( n )C 2(~-7)m.
(4.12)
m-0
If we let n = 2k, tho claim becomes k
+
Tz(2k) 5 ~ ( k 2 ~ 3M(2k) ~ )
2(2-7)m
(4.13)
m-0
for some constant c 2 c1/4, where C I is the constant in (4.11). We can verify this by induction on k. We shall just do the induction step here. Since M (2k) I: 2-7M (Zk+'), it follows that T(2k+1) 5 c ( k
k
+ 1)22(k+1)+ 3M(2k+1) 1 + 2(2-7)
[
2(2-7)m] P
O
or ~ ( 2 & + 15) c ( k
k+1
+ 1)22(k+1)+ 3 ~ ( 2 k + 1 )C 2(*-7)m. m-0
From inequality (4.11) Tz(2"+')
5 4 T ~ ( 2+ ~ )3M(2k+')
+ ~i(2")).
By the induction hypothesis, k
T z ( ~ ~ +5' ) 3M(2"')
+ ~ l ( 2 +~ ~4 ~) ( k 2 + ~ ~12~'W(2~) )
2(2-7)m.
m-0
(4.14)
174
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Since c
2
4 4 , we have 4ck
+
~1
+ 4~ = 4c(k + 1).
5 4ck
Multiplying both sides by 22kyields
+
+
~ 1 2 ' ~ 4 ~ k 2 ~~ 1 2 ' ~ ~ k 2 ~ 5 + 'c(k
+ 1)2'".'
Substituting this in (4.14) yields k
+ 12M(2k) C 2(2eT)m+ ~ ( +k 1)22k+2.
T2(2k+1)5 3M(2"')
m-0
This completes the induction proof of (4.13) and also establishes (4.12). Since the series zrnconverges if 1 x 1 < 1,then the result follows. 0
x;-o
Next we use the present result to estimate TR (n) . Lemma 4.4.5. If n = 2k for some L 2 1 and if there is some 8 2 2 so that for all m Tz(2"+') 2 2dT2(2m) then TR(n) 5 O(Tz(n)).
Proof.
By Lemma 4.4.1
T R ( n ) 5 2TR(n/2)
+ T2(n) + O(n2).
If n = 2k,one can easily prove by induction on k that k
TR(2k) 5 O(22k)
+ Tz(2k) C 2(l-Om. -0
Thus log n
TR(n)
5
O(n'>
+ Tz(n) C 2(1a)m. m-0
By our growth assumption about TZand by Lemma 4.4.4, we have
TR(n)
5 0(7'2(n)).
Now we can combine our various results into one theorem, in which we relax our assumptions about n being a power of 2. Theorem 4.4.1.
If there exists y M(2"')
2 2 and 6 2 2 such that for all m 2 27M(2m>
and TR(2'"+')
1 26TR(2m)
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
175
and moreover there is a constant c such that for all n cM ( n ) 2 M (2’10~“1) then O(M(n))
if
y > 2
O(M(n)log n )
if
y = 2.
TR(n) =
Proof. It may have seemed that we would run into difficulty in assuming that n = 2k and using P3, which would seem to reqiire that the size of the matrix be divisible by 3. Close observation of P2,which calls P3, indicates that if n = 2k1P3 is applied to matrices of size 3 4 4 = 3.2k-2. When we take the “reduction by one-third” operation, we deal with matrices of size 2k-2 and there is no problem. To complete the proof, we merely pad out an n X n matrix to the next higher power of 2 if n # 2k,which is 2lloganl. The method of padding is illustrated below. n
We may estimate T R ( n ) by using the previous lemmas and the fact that CM( n ) 1 M (2110g “1). The result follows easily. 0 Now we can combine our various results to obtain the following important theorem. Theorem 4.4.2. Let S be a finite set and * an arbitrary binary operation on S that satisfies axioms (Dl), (D2), and (Nl). If D is a strictly upper triangular matrix whose entries are subsets of S then the time required to compute D+ is T R ( n ) and
cBM(n) 5
if B M ( n )
cn210gn
if
TR(n) 5 BM(n)
2
-
n2+* for some
>0
n2.
Proof. The result follows from Theorems 4.2.2, 4.2.3, and 4.4.1.
0
176
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
4.5 An Upper Bound for Context-Free Parsing
Adding the results from Section 4.1 to those of Section 4.4, we can analyze Algorithm 4.1.1 for computing the Cocke-Kasami-Younger algorithm in order to obtain a time bound for context-free recognition. Theorem 4.5.1. Let G = ( V , 2 , P , S) be a context-free grammar in Chomsky normal form. The time needed to recognize a string of length n generated by G is at most a constant times
B M ( n ) i n2.*I
ifle BM(n) = n2+e for some
n210gn
if
BM(n)
-
E
>0
n2.
Proof. Algorithm 4.1.1 requires 0 (n2)steps to initialize the matrix and
TR (n + 1) steps to compute the transitive closure. It is easily shown that the binary operation defined in Section 4.1 satisfies axioms ( D l ) , (D2) , and ( N l ) . The time bound then follows from Theorem 4.4.2. 0
-
5. The Hardest Context-Free language
The purpose of this section is to exhibit a single context-free language which has the property that if one can parse this language in timef(n) 2 n and space g ( n ) 2 n then any context-free language can be parsed within these bounds. The significance of this result is that it was unnecessary to derive general time and space bounds for each of the parsing algoIithms that have been studied. We could have shown instead that these algorithms had those bounds on just one particular grammar. We did not choose this approach because the general techniques of algorithmic analysis are often more informative and they lead to notice of significant special cases. The discovery of this language is due to Greibach (1973). Let A denote our favorite model of a computer, for instance a random access machine or some form of Turing machine. Our first result concerns such devices. Lemma 5.1. Let A be any computational model of the type just described which accepts L C A* in time p (n) , where p is some polynomial. Let 4: Z* + A* be a homomorphism. The set cp-'(L) = (wE 2* 1 4w E L ] can be accepted in time p (n) by a device A' of the same type as A . 19 W. L. Rurzo has shown that BM(n) _< ~ ~ 0 . 8(log 1 log nbog n).' 5 n2.E1but there is not space to include this result here and so only the weaker result is stated.
177
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
Proof. We construct A’ to work as follows: 1. Scan w and translate it to +w. This takes kIg (w)steps for some constant k. ) steps. 2. Simulate A on +w. This takes p (lg ($w) 3. Thus w may be processed in
P ( k (+w)>
+ k k (w) 5 k.’P(k (w))
for some constant k’. Note that a similar result holds for space. Next, we recall the Dyck set on two letters. Definition.
Let &
=
{allh,61, & } and let D be the language generated
by
S
---t
SS 1 aiS& 1 @S& 1 A
D is called the Dyck set on two letters. Another way to describe D is as the set of all strings which cancel to A under the relations xa161y = xy
.R?
xa2&y.
It is important to note that this is one-sided concellation. Thus &az Z A. Now we can formally define the special language which interests us. This language can be intuitively described as a nondeterministic version of the Dyck set. Definition.
LO = { A } U
Let
2; =
(all U Z , 61,&, $, c}. Define
(wy1cz1CE . . . xncynczndI n
2 1, yl . . . yn E $D, zi,
for all i, 1 I i 5 n, yi E {al, a,61,& ) * for all i xi)s and zi)s can contain c’s and $’s.)
2
zi
E
2;*
2 ) . (Notice that the
It is easy to see that Lo is context-free. Imagine a pushdown automaton (Hopcroft and Ullman, 1969) which “guesses” a substring yl in the first block and processes it in the natural way a pushdown automaton would work on the Dyck set (i.e., stack everything and pop when the top of the stack is a; and the input is 6,).The computation is repeated for each block as delimited by d’s. We are now ready to begin the main argument. Theorem 5.1. If L is a context-free language, then there is a homo(Lo- ( A ) ) . morphism so that L - { A } =
+
+-’
178
SUSAN L. GRAHAM AND MICHAEL A. FIARRISON
Proof. There is no loss of generality in assuming that A is not in L. (If A E L then the argument to be given works for L - A. Since c$A = A E Lo for any homomorphism, the construction also works for L.) We may also assume, without loss of generality, that L = L ( G ) , where G = ( V , 2,P, S ) is in Greibach form. Thus every rule A: of P is of the form
A
-+
UCY,
where A E N , a E 2,and a E (N - { S 1 ) *. Let us indexN as { AI, . . . , A , ) where A1 = S. We begin by defining two mappings from P into &*. If 7rk is Ai 3 a then ?7rk
If
Ak
= &(Zz'&.
is A i --+ aAj, . . . Ajmfor somem
2 1 then
?rh = alhicil alatjmal. . . alazilal.
To define 7,we recall that i is the index of the left-hand side and if i # 1 then T A k = k k else T A k = $ala2al?rk Since T and ? encode only the nonterminals in the production and not the terminal, we let Pa = ( rpl,. . . , rP,] be the set of all productions whose right-hand side begins with a E 2. We define the homomorphism c$ by da
=
if Pa # 0 then CT (rPJ c, eke $$.
. . c7
( 1 ~ cd ~ )
It only remains for us to show that L = q5-l (LO- { A ] ) . The following claim is the key to the proof and gives the exact correspondence between the derivation in G and the structure of LO. Claim. For each bl,
. . . , be in 2 ; Ail, . . . , Air in N -
{ S ] ,we have
that
. *Air under production sequence (apl,, . . , xPh)if and only if (i) $(bi . . . b k ) = Z l q i m l d . . . d X k C y k C Z k d (5) y l . . . y k E INIT($D) = {x [ xw E $D for some w E 2*) (iii) r ( y 1 . . . y k ) = +ata2'ru1 . . . ata2'1a1, where for each w, p(w) is the b l . . bkA;,
a
.
unique minimal length word which can be obtained from w by cancellation.
179
PARSINQ OF GENERAL CONTEXT-FREE LANQUAffES
Since (ii) and (iii) hold if and only if y1 . . . yk E #D,we have that w E L if and only if (i) +w = xlcylczld . . . xkcykczhd (ii) y1 . . . yk E y!D (iii) y; = rnpi E &* for i 2 2 which holds if and only if +w
E LO-
{A).
Now we shall prove the claim by induction on k. Basis:
k
=
1.
Case 1. The derivation is
Ai*a where this production is np+ Then applying T to this production gives = $al&al&&&
TTpi
If P,, = { r p l.,. . , r A ) , we have +U
=
. . . CT (n,)
CT ( r p lc)
cd
xlcylczld
=
If we let y1 = T ( T , ~ ) then p(yl) = q! and so yl E INIT(#D) and all the properties of the claim are established. Conversely if each part of the claim is satisfied, there must a derivation A1 =+a and Case 1 has been verified. Case 6. The derivation is A1 where m
=
. . Aj,,,
2 1and the production is npi.By definition, T
If P,,
aAj,Aj,.
(r,,,
...,
T
T,},
+a =
= ~ #alazal~1&alala2ima1 ~ ..
.
alazi1al
then CT (T,,) C
. . . CT (7rh)Cd = XiCyiCzid
satisfying property (i) of the claim. As in Case 1, we let yl = Then p(y1) = #al&imal . . . alazila,
T ( T ~ ~ ) .
satisfying property (iii) . Consequently yl E INIT ($0) , satisfying property (5). Conversely, if each part of the claim is satisfied, there must be a derivation A1 * aAjlAj2. . . Ajm and the basis has been verified.
180
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Induction Step: Assume the result for k
2
1. Suppose we have
. . bkAi,. . . . . . Aj, and . . . bk) = #U1Uz"Ui . . . U I G i l U I . Let
S+ bl,
Air
the production rP is A i , + bk+lAj, where p ( y 1 . . . y k ) dl&i%lalazj~ul . . . alazi1al and 4 ( b k t . 1 ) dXkCykCZkd,
1( y l
.
*
. gk+l)
= p
=
+(bl
=
xk+lcyk+lczk+lcd.
- .
(al&"al
=
pk+1
xlcylczld ,
=
T(Xp)
.. =
We compute
al%"alyk+l)
. . . alCJ&"alalaz~~Ul . . * aluzj~a,
a1CJ&'ra1
while
S
% .. bl
&Ail
. .Air *
bl
..
bkbk+lAjl
.
AjIAi,
.. 6
Air-
This establishes one dircction of the proof. On the other hand, suppose +(bl . . . b k ) = xlcylczld . . . dxk+lcyk+lczk+ld and p(y1 . . . yk+l) E #(al,&I*. By the construction of 4, the only way this can happen is if p (y1
. .y k )
=
#alazi'al ,
. . ala2"al
T(Xp)
=
d1@dlala2iral
and p((yk+l)
=
. . . al@,ilal
where r pmust be a production
A , -+ b k + l A j l . . . Aj,
(5.1)
But since p (yl . . . y k + l ) E # { all az)*, there must be some cancellation between p (yl . . . Y k ) and p (yk+l), i.e., s = &. By the induction hypothesis, S
bl
. . . bkAi,. . . At,
and by using Eq. (5.1) , we have
S3
bl..
.bkAj,. . . Aj&, . . . A;,
Therefore the induction has been extended.
0
Intuitively the homomorphism C#I encodes the set of productions that a top-down parser might use in processing the next input symbols. The ('guess'' made by the pushdown automaton is which of the possible productions to use. What we can conclude from Lemma 5.1 and Theorem 5.1 is that the time needed to recognize the words of some language L is proportional to the time necessary to recognize the subset of LOcorresponding to the encoding related to a grammar generating L.
181
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
6. Bounds on Time and Space
We have considered a number of parsing algorithms, and for each of these we have computed the time and space requirements. I n this section, we shall summarize the least upper bounds and greatest lower bounds now known for the time and memory used in parsing. If we let R(n) be the time required to recognize (or parse) a string of length n, then we have the following result. Theorem
6.1. n _< R(n) 5 cBM(n) < n2.*l.
Proof. The upper bound follows from Valiant’s algorithm. The lower bound is obvious since we must read the entire string before giving an answer. 0 Let us define R‘ (n) to be the time required to recognize a string of length n by an on-line Turing machine. I n this case, we must formulate the problem as in Section 1 to prevent the machine from copying the input to a work tape and then proceeding as in an off-line computation. Theorem 6.2.
For on-line computation,
n2 5 R’(n) S log n
n3 - < n3. log n
Proof. The upper bound of n3 follows from Earley’s algorithm. The na/ (log n) result can be obtained by a parsing method which combines the algorithms of Earley and Valiant. This is an unpublished result which will be reported in Graham et al. (1976). The lower bound is due to Gallaire (1969). We shall not reproduce the argument here but will note several features of the proof. Let
z
= (0, I],
L
=
c, s 6 z
(2*22*cSuls.. . SUfS 1 t
2 1, 2 E z*, lg
(2)
> 0, uj E z*
for l l j l t and ui = zT for some i, 1 S i
5
t}.
It is interesting to note that L is a linear context-free language. Yet Gallaire shows that L requires at least n2/(log n ) time for on-line recognition. Since L is linear, it can be recognized (even on-line) in time proportional to n2. Let us now concentrate on the space requirements for recognition, Let S (n)be the space required to recognize a string of length n.
182
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Theorem 6.3.
log n
5 S ( n ) _< 1og2(n)
Proof. The proof of this result is quite long and we shall give only a concise sketch here. First lct us work on the lower bound. There is no loss of generality in dealing with single-tape Turing machines because the change in the tape bound is only a constant [cf. Theorem 10.2 of Hopcroft and Ullman (1969)l. Suppose we have any single-tape, off-line, L (n) tape-bounded Turing machine. The instantaneous descriptions (ID’s for short) of such a machine are of the form (q, i, a) where q is a state, a is the portion of the work tape scanned so far, and i is the cell of the work tape being scanned (so 1 i 5 Ig (a)) . The maximum number of ID’s is
r ( n ) = sL(n)tL‘,’ where s is the number of states and t is t,he cardinality of the work alphabet. Let C1, . , , , C,(,) be an enumeration of all ID’s. For each n 2 0 and each v E 2* with lg (v) I n, define a transition matrix T, (v) to be an r ( n ) X r ( n ) matrix whose entries are from the set {00,01, 10, 11 ). t i , j is defined as follows: The first digit of ti,j is 1 if and only if the Turing machine started in tape configuration C; while scanning the first cell of v$ can reach ID Ci without moving the input head left out of v$. 2. The second digit of titj is 1 if and only if when started in Ci while scanning the first cell of v$, the device can move the head left out of v$ and can be in ID Cj the first time it does so. 1.
Thus T ( v ) describes the behavior of the device on v when v is a suffix of the input. Then we have the following useful Lemma. Lemma 6.1. If lg (uv)5 n for some uv E L, where L is the language recognized by a single-tape Turing machine M and for some ‘v E Z*, T n ( v ) = T,(v’) then uv’ E L.
Proof. The argument is a straightforward “crossing sequence” proof and is omitted. Now, consider the language
L
= {wcwT1 w
E (0,1]*).
We shall shortly show that any on-line Turing machine requires at least log n tape to recognize L. Let n be a fixed odd integer and consider all strings in S = ( 0 , 1 ] n(n-1)’2.
183
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
Claim 1.
No two strings in S have the same transition matrix.
Proof. Suppose w, W' E S with w # E L, but by Lemma 6.1, we have
W'
and T ~ ( w )= Tn(w'). Then
WCWT
WC(W')T E L,
which is a contradiction. If all such strings have distinct transition matrices and there are 2n(n-1)/2such strings but only 4'(") transition matrices, we must have 4r(7d > - 2n(n--1)/2 > 2(n--1)/2. (6.1)
-
Taking logs of Eq. (6.1) yields 2r(n) 2 (n - 1)/2
or 4r (n) = 4sL (n)tL(") 2 n
- 1.
(6.2)
If n > 1 then n - 1 2 n/2. Also for all L ( n ) , becomes
2 L ( n ) ,so Eq. (6.2)
8 ~ t ~2 ~n.( ~ )
(6.3)
Taking logs of Eq. (6.3) yields log 8s
+ 2L (n)log t 2 log n.
Thus
If n >
for n
log n - log 8s L(n) 2logt ( 8 ~then ) ~ log n > 2 log 8s, so Eq. (6.4) becomes
>
( 8 . ~ )Thus ~ . for infinitely many n (all odd n
> 8sz) we have
L ( n ) 1 clogn where c = 1/(4 log t ) . Therefore
S ( n ) 2 log 72. We note in passing that L can be recognized in log n space. Claim 2.
L = { wcwT 1 w E { 0 , 1 ]* can be recognized in space log n.
The argument to establish that S (n) 5 log% is too long to be included here and the reader is referred to Hopcroft and Ullman (1969) and Lewis et al. (1965). 0 The situation for on-line tape bounds is quite simple.
184
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Theorem 6.4,
In t,he on-line case
S’(n) = n. Proof. Note that if S’ ( n ) = n, we may copy the input on a work tapc and use the upper bound of Theorem 6.3 for recognition. Thus S‘(n) _< n. To handle the lower bound, again let
L
=
{WcwTIw E ( 0 , 1 ) * ] .
There are 2“ words of length n for w such that wcwT E L.The ID’S of the Turing machine when scanning the c must be different, or else we would accept wcwT and W C ( W ’ )when ~ w # w’.But the number of different ID’S would be T (n) = SL (n)tL(n)2 2“. By an analysis similar to that in the proof of Thcorem 6.3, we find that
L ( n ) 2 n. Thus S’(n)
2
n and therefore
S’(n) = n.
ACKNOWLEDGMENT
This work waa supported by the National Science Foundation under grants GJ-474, GJ-43332 and DCR7407644A01. W. L. Ruzzo provided much insight about the material presented in thie chapter, particularly that of Section 4. Helpful comments were also made by many of the students in course CS 277 at Berkeley. REFERENCES Aho, A. V., and Ullman, J. D. (1972-1973). “The Theory of Parsing, Translation and Compiling,” Vole. I and 11. Prentice-Hall, Englewood Cliffs, New Jersey. Aho, A. V., Hopcroft, J. E., and Ullman, J. D. (1974). “The Design and Analysis of Computer Algorithms.” Addison-Wesley, Reading, Massachusetts. Bouckaert, M., Pirotte, A., and Snelling, M. (1973). Improvements to Earley’s contextfree parser, Ges. Inform. 3, 104-112. Earley, J. (1968). An efficient context-free parsing algorithm. Ph.D. Thesis, CarnegkMellon University, Pittsburgh, Pennsylvania. Earley, J. (1970). An efficient context-free parsing algorithm. Commun. Ass. Cmput. Mach. 13, 94-102. Fischer, M. J., and Meyer, A. R. (1971). Boolean matrix multiplication and transitive closure. IEEE Conf. Rec. Symp. Switching Automata Theory, 19th 1971 pp. 124-131. Fkcher, P. C., and Probert, R. L. (1974). Efficient procedures for using matrix algorithms. I n “Automata, Languages and Programming” (J. Loeckx, ed.), Vol. 14, Lecture Notes in Computer Science, pp. 413428. Springer-Verlag, Berlin and New York. Gallaire, H. (1969). Recognition time of context-free languagea by on-line Turing machines. Inform. Contr. 15, 288-295. W. L., On-line context-free recognition in Graham, S. L., Harrison, M. A., and RUZZO, leas than cubic time. Cunj. Rec. Annu. Symp. Theory C m p u t . , 8th (in press).
PARSING OF GENERAL CONTEXT-FREE LANGUAGES
185
Greibach, S. A., (1966). The unsolvability of the recognition of linear context-free languages, J . Ass. Comput. Mach. 13, 582-587. Greibach, S. A. (1973).The hardest context free language. SIAM (SOC.I d . Appl. Math.) J . Comput. 2, 304-310. Hopcroft, J. E., and Ullman, J. D. (1969). “Formal Languages and Their Relation to Automata.” Addison-Wesley, Reading, Massachusetts. Hotz, G. (1974). Sequentielle Analyse kontextfreier Sprachen. Acta Inform. 4, 55-75. Kasami, T. (1965). “An Efficient Recognition and Syntax Analysis Algorithm for Context Free Languages,” Sci. Rep. AFCRL-65-758. Air Force Cambridge Res. Lab., Bedford, Massachusetts. Kasami, T., and Torii, K. (1969). A syntax analysis procedure for unambiguous context free grammars. J . Ass. Comput. Mach. 16, 423-431. Lewis, P. M., Steams, R. E., and Hartmanis, J. (1965). Memory bounds for recognition of context-free and context-sensitive languages. IEEE Conf. h c . Switching Circuit Theory Logical Des., 1965 pp. 191-202. Munro, I. (1971). Efficient determination of the transitive closure of a directed graph. Inform. Process. Lett. 1, 56-58. Pager, D. (1972). “A Fast Left-to-Right Parser for Context-Free Grammars,” Tech. Rep. PE240. Information Sciences Program, University of Hawaii, Honolulu. Strassen, V. (1969). Gaussian elimination is not optimal. Numer. Math. 13, 354-456. Townley, J. A. (1972). The measurement of complex algorithms. Ph.D. Thesis, Harvard University, Cambridge, Massachusetts. (Also available aa Report T R 14-73, Center for Research in Computing Technology, Harvard University.) Valiant, L. G. (1975). General context-free recognition in less than cubic time. J . Comput. Sy& Sci. 10, 308-315. Younger, D. H. (1967). Recognition and parsing of context-free languages in time n*. Infurm. Contr. 10, 189-208.
This Page Intentionally Left Blank
Statistical Processors W. J. POPPELBAUM Department o f Computer Science University o f Illinois Urbana, Illinois
Pros and Cons of Statistical 1nformat.ion Representation An Overview of Time Stochastic Processing . Fluctuations and Precision of Stochastic Sequences . Generation of Random and Quasi-Random Sequences Examples of Time Stochastic Machines . Bundle Processing and Ergodic Processing . Examples of Bundle and Ergodic Machines . 8. An Overview of Burst Processing . 9. Preliminary Results in Burst Processing . . 10. Outlook in Statistical Processing . References . 1. 2. 3. 4. 5. 6. 7.
. . . .
187 190 194 195 . 197 . 205 . 211 . 216 . 224 . 226 . 228
1. Pros and Cons of Statistical Information Representation
We shall call “statistical representation” any information representation in which averaging must be used to extract the meaning (von Neumann, 1963). For the moment we shall impose two conditions: 1. We shall assume time-averaging (of time sequences) 2. We shall assume that the information signal consists of pulses in predetermined time slots, i.e., that we have a serial synchronous digital system (Poppelbaum et al., 1967).
Note that there are averaging systems (Ring, 1969) (see Section 6 below) in which space-averages replace time-averages. It is also possible to timeaverage analog signals or noneynchronized pulses. We shall not discuss such systems in this paper (Afuso, 1968). Under the restrictions above we can still choose between weighted (binary) systems-e.g., a “least significant digit first pulse train of fixed length”-and unweighted systems. In the latter category we can distinguish at least two ways of looking at things. 187
188
W. J. POPPELBAUM
1. We can use “open-ended” unweighted sequences of randomly occurring pulses: this is done in time stochastic processing (TSP). The probability of occurrence is then the information carrier. 2. We can use fixed length blocks, representing numbers in an m out of n system in some (usually deterministic) way: this is done in burst processing (BP) . The average over many blocks then carries the precise information.
It is quite evident that neither TSP nor BP uses optimal ways of coding information as far as bandwidth is concerned (Peterson, 1961). It will be shown below that for 1% precision one needs roughly lo4slots in TSP and lo2 slots in BP! For 10% precision things look better-1O2 slots in TSP and 10 slots in BP are now sufficient. But in 10 slots we could have transmitted 1024 different weighted binary values-even BP needs therefore roughly 100 times more bandwidth than binary computation. But now comes the important point : in many applications we underutilize the bandwidth of a dedicated channel anyway. Examples are as follows. 1. All cases in which a channel of standardized configuration-e.g., a 4000 baud wire with a ((mass” ground-is used to transmit ON/OFF-like signals (engine-room telegraph). 2. Transmission of telemetry signals corresponding t o slowly varying variables of low precision (antenna position indicator). 3. Systems in which the calculational speed of the processors is the limiting factor. It is clearly useless to design a 10-MHz, 10-bit precision pulse code modulation (PCM) system to feed a 10- or 100-psec multiplication circuit! 4. Systems in which the speed limitations of the output actuators form the bandwidth bottleneck (rudder and aileron controls).
This being admitted, i.e., not concerning ourselves for the moment with “wasted bandwidth,” we might give considerable weight to the advantages of TSP and BP (discussed in detail in the following sections) : 1. Simple arithmetic units. Arithmetic in TSP is extremely simple:
multiply and OR’S add. Arithmetic in BP is quite simple: Counters or current summing registers are the essential elements. 2. Error tolerance. Both TSP and BP use an unweighted representation and averaging: an occasional supernumerary or missing pulse does not show. 3. Constant availability of results. TSP with RC integrators and BP with summation in a length n register give a useful output at all times: even short observations can be acted upon.
AND’S
189
STATISTICAL PROCESSORS
4. Use jor numerical and communication purposes. It will turn out that TSP, but especially BP, can cope equally well with data transmission and voice (or video) signals. 5. Loose clocking and synchronization. Both TSP and BP interpret averages for processing-this usually eliminates clocking and “beginsignal” difficulties. 6. Reliability. The considerable simplicity mentioned under (1) leads to higher reliability. Checking (with local coding tricks, see below) is usually affordable. 7. Ease of multiplexing. Switching noise is eliminated and reassembly of frames ( = blocks in BP) is easy, as in all PCM systems. Both TSP and BP are forms of PCM and as such partake of the wellknown PCM advantage : easy signal renormalization in long-distance transmission. (Whether a distance is “long” is determined by the noise of the environment) Our statistical systems are doubly noise-proof-pulses are usually unaltered, but if a few are altered, we can live with the result. It is also worth considering the transducer question. The atomic (Le., quantized) nature of our universe demands that some sort of integration be performed at the input (a photocell’s capacitance integrates current spikes corresponding to photons!). If we used the random pulses before integration, we would have a natural stochastic system with maximum information output a t any given time. The future might bring such systems. Leaving such possibilities aside, we have shown in Fig. 1 in graphical WEIGHTED BINARY TRANSDUCER
MICROOMPUTER
AID
SPEED
NOISE SENSITIVE
TRANSDUCER
A/S
SPROCESSOR TO 1TOLERANT
LLW
SPEED
S/A
190
W. J. POPPELBAUM
form how BP (called burst) and TSP (called stochastic) compare with wcighted binary in cost. It has been assumed that the input signal is dc-like, so that converters are necessary. The relative complication is shown by the size of the boxes.
2. An Overview of Time Stochastic Processing
The fundamental idea of TSP is to use the probability of appearance of a pulse in a given time slot as thc information carrier (Afuso, 1964; Ribeiro, 1964; Gaines, 1967a, c; Poppelbaum et al., 1967). Now the probability p is defined experimentally by considering t,he frequency of occurrence of an event (“pulse in a time slot”) in the limiting case of a n infinite number of time slots. If there arc n pulses in N slots for a given wire, and if n / N tends towards a limit as N --+ Q, , we set p = lim ( n / N ) . n-
w
This means, of course, that for a small number of time slots we may obtain an erroneous assessment of the probability and the number it represents. In Fig. 2 we show [for a synchronous random pulse sequence (SRPS) , i.e., a sequence with identical pulses which, if they occur, occur with a standard shape within regularly clocked time slots] what can happen. On top we see threc pulses in ten time slots, leading to the conclusion that the number transmitted is .3. In the middle we have an alternate arrangement (equally likely), leading again to -3. If now we just take the first three slots of th e
3 IN 10 (AVERAGE)
+ 3
DIFFERENT (AND EQUALLY LIKELY) ARRANGEMENT OF 3 IN 10 (ALSO 4 . 3 )
SHORT SEOUENCES MEAN GREAT UNCERTAINTIES
FIG.2. Uncertainty and sequence length. Representationof a number by the average frequency (probability) in a synchronous random pulse sequence (SRPS).
191
STATISTICAL PROCESSORS
w
0
0-
q=4
FIG.3. Stochastic multiplication of two SRPS’s by digital means in an AND. Stochastic processing uses digital circuits in small numbers to achieve arithmetic operations.
middle sequence, as shown at the bottom, we would arrive a t the conclusion that we are trying to transmit .66 . . . , i.e., 2/3. It follows that short sequences are u n ~ r u s t w o r t ~ must y ~ e average over many slots. The fundamental trick of stochastics is to use th e theorems of p r o ~ a ~ z l ~ t ~ theory to per-fomn arithmetic. In Fig. 3 this is shown for multiplication. The top sequence hitting the AND corresponds to probability .5, the bottom sequence to .4.The probability of an output pulse is the probability of having both incoming time slots occupied simultaneously. If there is no causal relationship between the two sequences (one could think of something as ridiculous as cutting out every fifth pulse of the top sequence to obtain the bottom sequence!) , i.e., if they are uncorrelated, the probability of simultaneous occupancy is the product of the two input probabilities (Petrovic and Siljak, 1962), i.e., .2. We have cheated ever so slightly in Fig. 3 by arranging the pulses in such a way that the assessment of probabilities can be made on a small number of time slots. Encouraged by multiplication, we might now turn to the design of Fig. 4 for addition (Esch, 1969a). Something should warn us that the OR circuit q = 1, i.e., all slots of the output would have to be will not work. Here p occupied. Only in the extremely rare case of the two input sequences q for p = .8 “meshing” would this actually be possible. Furthermore, p and q = .9 would have to give p q = 1.7-a number no longer representable by a probability. We have overflow. Figure 5 shows a way out of the dilemma. We first scale the increasing sequences by a factor .%, i.e., we multiply by a “masking sequence” of probability .5. But we use a trick: the masking sequences, although both representing .5, are complementary, i.e., where one has pulses, the other one does not, and vice versa. This
+
+
+
192
W. J. POPPELBAUM
I
o oo : Io I nn-
.
I I 1;:
It; I
lo1 01 01 I 01 SHOULD
BE COUNTED
means that the products cannot have a n y overlapping. pulses and can be piped into an OR for summation! Note that our scaling also eliminates the overflow problem mentioned above. We must now turn to subtraction and division. It is entirely possible to subtract an SRPS from another SRPS if we agree to store subtrahend pulses when no minuend pulses are available, and if we “balance accounts” as soon as possible. Certain questions about the randomness of the result arise (Afuso, 1968), but it turns out that no damage is done by the pro-
p: 6
2:
5
i =5
9:4
2”d SMUENCE
FIQ, 5. Workable stochastic adder. Scaling of sequencm by two complementary “masks” as a means for addition by an OR. The multiplication by the “mask”and the “complementary mask” eliminatm the overlap problem and allows “meshing.”
193
STATISTICAL PROCESSORS
cedure. Obviously this method corresponds to the “signed absolute value system,” and it therefore suffers from all its disadvantages: we have to know which number is bigger before we start the operation. I n an SRPS the answer may be available only after averaging for a considerable time, and it seems altogether preferable to use a linear remapping (like t,he two’s complement in normal binary), so as to represent both positive and negative numbers by probabilities between 0 and 1. One of the many methods is to remap a number a in such a fashion that it has a machine representation a’ given by (Esch, 1969b) a’ = (1 - a ) / 2
(-1
5a5
+l).
(2-2)
Symbolizing an SRPS corresponding to value a and representation a’ by A = ( a , a ’ ] , it is casy to verify that the following identities hold:
A AB
= (-a, 1 - a’) =
{ l- $(I - a ) (1 - b ) , a’b’)
+ a ) (1 + b ) - 1, a’ + b’ - a’b’) (when a’b’ = 0) ( A V B)disj= ( a + b - 1, a‘ + b’l AC V AB = (1 - i ( 1 - a ) (1 - C) - $(1+ a ) (1 - b ) , + (1 - a’)b’) A @ B = (ab, ~ ’ ( 1 b’) + (1 - a’)b’) (C = B ) Z B V ZC = ( $ ( b + c), +(b’ + c ‘ ) ] ) (2 = (0, 31). A V B = ($(1
U’C’
(2-31
Here 2 is the result of passing A through a NOT, and A B that of ANDing the sequences A and B. ( A V B ) d i s j means that we OR sequences A and B and assume that their pulses never coincide. This is accomplished specifically jn Eq. (2-3) by Aming B and C with sequence 2 having on the average half the slots filled, and with its complement 2 which has the other half-filled. Now sign inversion is done by a NOT (adding the negative gives subtraction), multiplication by a n EXCLUSIVE OR, and addition (with a scaling factor 3) by ANDing with Z and 2 and oRing the result! Note that a special (and easily generated) case for Z and Z is the mask and complementary mask in which every second slot contains a pulse, i.e., a (deterministic) periodic mask. Randomness is not destroyed by using a periodic sequence for masking. I t can also be shown (Afuso, 1968) that division can be done successfully with little circuitry. Here, however, the result unfortunately is no longer random, i.e., bunching occurs because of the form of the algorithm. In all practical designs one extends the remapped SRPS system into a two-wire
194
W. J. POPPELBAUM
system, using a numerator and a denominator (remapped) SRPS. Division then corresponds to renaming the numerator sequence of the divisor denominator and vice versa and multiplying, by EXCLUGIVE OR'S, the new numerators and denominat>ors. 3. Fluctuations and Precision of Stochastic Sequences
If we consider m time slots, and the probability of appearance of a pulse in a slot is p (independently of whether previous slots were filkd or not), we would expect fi = mp to be filled. (We shall not discuss the refinements necessary to make f i an integer for small values of m.) In a practical measurement we would, of course, find n f f i filled and attribute this to the random fluctuations of n around f i for finite sequences. It is therefore necessary to assess the probability P,(n) of having n slots filled in a sequence of length m. The theory is well known (Ryan, 1971). The probability of having n slots in a row filled is p", the probability of having the remaining ones empty is (1 - p)m-". Since we can drop the condition "in a row," there are actually m!/m!(m - n) ! ways of obtaining n out of m, i.e., P,(n)
- n)!]p*(l - p)"".
= [m!/n!(m
(3-1)
Direct inspection verifies that P,(n) is effectively maximized for fi = mp. Setting p = n / m we can rearrange (3-1) to read
()L Z).
(
P,(n) = 1 - -
1--
. . (1
- n+)
and it is clear that for small p and large m, A -+ 1, B and we obtain the Poisson distribution
x (I
-+
-
;)"
1, C -+ exp(--ri)
P,(n) = e - y (fi)"/n!].
(3-31
Now it is well known that by using thc Sterling approximation
+ +) I n n - n + $ I n 2 1 as well as the approximation In (1 + = t - (22/2), we obtain Inn!
=
(n
(3-41
2)
Pm(n)
= I / ( ~ T A ) " ' expi-[
i.e., a Gauss distribution. Calculating an integral, we scc that
2
(2)= Fi +
(n - ~ ) ~ / 2 f i ] )
(3-5)
= z ; n 2 P m ( n ) in the limit by
(6)2
(3-6)
195
STATISTICAL PROCESSORS
I I I
I I
I I I I
I I
I I I I I
I I
A
’
3x
2A
WAp
FIG.6. Standard deviation and confidence level.
i.e., the standard deviation p2 =
p
in (3-5) is given by
(n - f i ) z
=
(2)-
(fi)2
= fi.
+
(3-7 1
+
Now we know [by integrating (3-5) between fi p and fi - p, fi 2p and fi - 2 p , etc.] that the probability P of obtaining a result between fi - Xu and f i Xp is given by the curve of Fig. 6. If we are willing to live with a fluctuation of k p , 68% of all experiments would be acceptable for measuring f i and hence mp = fi. If we remain with this standard of precision, we have for the relative error p (relative to the average!)
+
p = p/fi =
1/5.“2
= 1/ (mp)”2=
(l/p1/2)(l/ml/Z).
(3-8)
Finally, we can assume the first bracket of order unity. (This slightly contradicts the assumption that p is small) and obtain as an average percentage error R
R
=
100/m1’2.
(3-9)
On this basis we made our statements on precision in Section 1.
4. Generation of Random and Quasi-Random Sequences
The generation of SRPS’s with controlled average frequency was originally entrusted to the design shown in Fig. 7 (Afuso, 1968). We use a noise diode to generate random noise (actually a high-pass filtered version) and detect the number of occurrences of noise spikes (sequence A) higher than an adjustable detection level. We then normalize the “above threshold” part of the noise in height (sequence B). Then we sample by a regular clock (sequence C) , and place a pulse (standardized in height and length) into the next slot if and only if the sampling device has detected a pulse in
196
W. J. POPPELBAUM
the preceding slot. The resultant sequence D can be shown to be completely random, yet its frequency is controlled by the detection level! Nonlinearitips can be climinated by nonlinear amplification of the analog input. We can completely eliminate the noise diode if we are willing to accept less rigid standards of randomness (Golomb, 1967; Marvel, 1969). It is clear that if a pulse sequence repeats itself after such a long time that all calculations are over before recurrence, and if during the calculational time “things look random enough,” such a quasi-random sequence can be substituted for a true random sequence (Korn, 1966). What constitutes “sufficient randomness” must be analyzed statistically for a given operation (e.g., multiplication) and given sequence length. Usually adequate randomness can be obtained by using an n-bit shift register with logical feedback in a quasi-random number generator. If this feedback is correctly chosen, one can obtain up to 2n - 1 different combinations, i.e., all combinations except the “all zero” combination (Tausworth, 1965; Anderson et al., 1967). Such a maximum length sequence has the property that onehalf of all subsequences are of length 1 (Le., “0” or “l”, followed by a change) , one quarter of length 2 (i.e., “00” or ‘‘ll”,followed by a change) , etc. Furthermore, the autocorrelation is low, unless we shift by one complete period! I n Fig. 8 we show the generation of a maximum length sequence for n = 5. If we put the register contents through a D/A (usually with some crossovers) , we obtain a quantized quasi-random analog signal (QQRAS) which can be used as “quasi-noise.” If this is done, the SRPS in D of Fig. 7 can be obtained by direct comparison of the analog input SRPS
1
ANALOG INPUT (DETECTION LEVEL)
A
FIG.7. Simplified SRPS generator.
197
STATISTICAL PROCESSORS
I
VOLl QUA!
f
f
4
8
4
2
16
1
I OF OlSE
i
OPRAS OR PUASI-NOISE
FIG.8. Generation of quasi-noise.
signal with this quasi-noise. Should the input be available in digital form, we can decide whether its binary representation is bigger or smaller than the contents of the shift register in the quasi-random number generator by simple subtraction.
5. Examples of Time Stochastic Machines
POSTCOMP (Portable Stochastic Computer) was our first attempt to realize the principles discussed in Sections 1 4 in hardware form (Esch, 1969a). Actually a “signed absolute value” of a number was mapped onto the probability of appearance of a pulse in a given time slot, and its sign was carried separately by an auxiliary wire. All four operations (+, - , + , and X ) could be done. The output was integrated and read off appropriate dc meters.
198
W. J. POPPELBAUM
RASCEL (Regular Array of Stochastic Computing Elements) (Esch, 1969b), whose block diagram appears in Fig. 9, used a much more sophisticated number representation than POSTCOMY. First of all, the two-wire (numerator wire, dcnominator wire) system was used to facilitate division, but each wire used the remapping system explained above, such that numbers between 0 and 1 had a machine representation between 0 and .5, while those between - 1 and 0 wero represented by probabilities between .5 and 1. Figure 10 shows a photograph of RASCEL. The triangular structure on top represents the tree arrangement of successive layers of stochastic processing elements (each capable of the four fundamental operations). The input is formed of nine stochastic sources (adjusted t o arbitrary values, representing anything from - 1 to 1 ) . Neon bulbs indicate which point on the tree is being sampled by the output counter: Sampling times of 1, 10, 100, and 1000 msec are available. Besides proving that stochastic elements could be cascaded to considerable depth, RASCEL also turned our attention to the attrition problem, which is similar to the fact that in a fixed-point computer successive multiplications make the number processed (and therefore its accuracy) smaller
+
SAMPLING COUNTER
-4
A
ARRAY OF INDICATORS CABLE
B
-1
4
FIG.9. RASCEL block diagram.
POWER
STATISTICAL PROCESSORS
199
FIG.10. Photograph of RASCEL.
and smaller. Different methods of circumventing this difficulty (by random duplication) were implemented. Figure 11 shows the fundamental idea of TRANSFORMATRIX (Poppelbaum, 1968; Wo, 1970; Marvel 1970; Ryan, 1971). It takes an n X n input matrix of points in the ij plane, the intensity in (i,j ) being xij. Using n2 coefficient matrices b ~ i(jk = 1 . . . n, I = 1 . . . n ), it forms on-line the most general linear transform, i.e., YLZ =
C
bklijxij
ij
and displays y k l on an n X n output matrix in ( k , 1). The interesting point is that stochastic arithmetic units allowed us to realize the case n = 32 in
200
W. J. POPPELBAUM
INPUT MATRIX (USED IN PARALLEL) I
COEFFICIENT MATRIX (USED FRAME BY FRAME)
OUTPUT MATRIX ( PRODUCED SEOUENTIALLY)
e
Fro. 11. TRANSFORMATRIX principle.
hardware, i.e., TRANSFORMATRIX has 1024 parallel arithmetic units which form bktijxij at once, and within less than 30 psec. This means that the output (1024 points) is refreshed every 1/30 of a second and is therefore flicker-free. [Improved circuits (Lee,a 300-MHz clock instead of a 10-MHs clock and a 1-sec refresh cycle) would lead to a lo6 point display.] TRANSFORMATRIX is, without any doubt, the world's most parallel processor ! The usefulness of the general linear transform is greater than meets the eye: it contains (1) translation, rotation and magnification; (2) Fourier transforms; (3) mask-correlation for pattern recognition; (4) genera1 convolutions. Operations (1)-(3) can all be performed on-line, using as the input the display on a TV screen. In the case of pattern recognition, a set of potentiometers (in a matrix) allows one to assign weights to a 5 X 5 window; this window is then scanned across the input and correlated for all 32 X 32 positions of its center. TRANSFORMATRIX is also the first machine to use quasi-noise (see Fig. 12) to encode the outputs of the phototransistors in the ij plane. Furthermore, the machine uses on-line calculation of the coefficients bklij, assembled in 1024 matrices b l , l , i j . . . ban,a2,ij. It turns out that this is cheaper than accessing a memory at the high speed that is required, at least for operations (1), (2), and (3). The calculation of the b k l i f ) s in the Fourier transform case is possible because of their periodicity.
cij
STATISTICAL PROCESSORS
201
202
W. J. POPPELBAUM
Figure 13 shows a closeup of TRANSFORMATRIX, while Fig. 14 shows bhc Fourier transform of the letter E . It should be mentioned that the acid test of the system is actually obtained in the coordinate transforms. It is herc that sharp dcfinition proves the absence of noise! The APE (Autonomous Processing Element) system (Poppelbaum, 1972; Wo, 1973) is an array of stochastic computers (each capable of addition, subtraction, multiplication, division, integration, differentiation, and storage) communicating with each other on rf channels and powered by light or microwave energy. The ultimate goal is (1) to produce a “computer in a bag,” i.e., integrated circuit chips connected (under the control of a n operator) in the desired topology; (2) to prove the feasibility of satellite computers which one can reconfigure after an “accident.” It would, in the limit, bc possible to have free-floating APE’S, each one a satellite in its own right. The limited power (<100 mW) one can absorb with a light-panel or a microwavc slot antenna, and the relative complication of a n APE as shown
FIG.13. Overall photograph of TRANSFORMATRIX.
STATISTICAL PROCESSORS
203
FIG.14. Fourier transform of the letter E by TRANSFORMATRIX.
in Fig. 15 (>400 transistor-equivalents) , make the use of CMOS mandatory. The setup of the system is accomplished through a controller, which can send out directives on any of n (in our example n = 14, but n = 1000 is quite feasible) channels. Actually receiver B, upon being hit by a directive (FM at frequency vk), produces a priority interrupt and sends the instruction signals to the function decoder. The instruction contains not only the t,ype of operation to be performed, but also bhe AM frequencies on which the two input numbers are to be received. (Note that each element is uniquely Characterized by the output frequency Yk of its AM transmitter). The system has two interesting properties built in. First of all, it can operate with an arbitrary number of elements characterized by the same AM output frequency. Duty cycle encoding makes all these outputs rise and fall in synchronism, the “begin” signal being furnished by a clock broadcasting at a (systems) frequency vc. Since an element is named after Vk, the parallelism of several elements corresponding to V k extends to the operation performed and the AM frequencies to which the inputs are tuned; nobody will ever know about this multiplicity. Similarly it is trivial
204
W. J. POPPELBAUM DATA INPUT AM ATV,
cLr i
DATA WTPUT AND ALIVE TESTING REPLY AM ATVk CLOCU TRANSMITTER
DATA INPUT AM AT V PROGRAM I);STRUCTIOd INPUT, AM AT
+-4
SEPARATOR
FIG.15. Block diagram of APE. (by reassigning ordinal numbers k ) to operate in the absence of a given k (+ V k f . As built by us, the system has some “input transducers,” which send their outputs to a subset of APE’s; these in turn feed other APE’s. The
FIG.16. Photograph of APE system.
205
STATISTICAL PROCESSORS
final output is made available on the controller or on the teletypewriter. The controller can also remotely check all intermediate numbers and even the functioning of a given APE. An operable APE sends back an “alive” signal on demand. Figure 16 shows a photograph of the system.
6. Bundle Processing and Ergodic Processing
Bundle processing (Ring, 1969) maps the time slots of time stochastic processing onto the wires of a bundle. Now the probability of being “energized” (i.e., a t a “one” level) at a given instant, for any wire in the bundle, is the carrier of numerical information (see Fig. 17). Although most of the thoughts on stochastic processing can be carried over into bundle processing (e.g., multiplication can be obtained by ANDing pairs of wires of the incoming bundles!) , there is a fundamental difference-bundles contain a finite number of wires, perhaps 100-1000, while stochastic sequences have theoretically infinitely many slots. The accuracy of a bundle is therefore limited once and for all. But while 100 slots give about 10% accuracy, 100 wires can give 1% accuracy (referred to the maximum number) if we are clever enough in encoding; things are therefore not quite as bad as one might fear. Furthermore, bundles are intrinsically failsoft (one wire more or less does not matter) , can be processed in a distributed manner (see below) , and furnish a result immediately, i.e., without waiting for some time-averaging process. In order t,o be able to draw bundle processors efficiently, the notation of Fig. 18 is used. Visibly the logic (or other) function performed on the pairs of incoming wires-one wire in each pair from each one of the bundles-is summarized by a double circle around the standard logic symbol, +FIXED
No. H OF TIME SLOTS
0 1 0 0 0 1 1
~
h” 1’s”
X . hH
FIG.17. Relation of time stochastic processing to bundle processing. Mapping of a
SRPS onto a bundle. A SRPS uses serial processing; a bundle uses parallel processing.
206
W. J. POPPELBAUM INPUT BUNDLE 2
INPUT BUNOLE 1 (
H
WIRES, h , " l ' r " )
( t i W I R E S , h,"1':")
OUTPUT BUNDLE
IH WIRES, h
"1'8")
MAPPING
xI: h , / H
xzi h2/H
t = h/H
EXAMPLE
FIG.18. Notation for bundle processing.
while the bundles are indicated by very heavy lines. Figure 19 shows some possible bundle opcrations. We can invert, AND, OR, merge, halve, etc. The results are not always useful. Note that the halving operation is necessary after merging, because the merged bundle has twice as many wires as the original ones. If we cut the merged bundle in half in a random fashion (e.g., sawing through the upper half), a random distribution of "ones" will stay random in the remaining half! Just as in stochastic processing, the introduction of negative numbers leads to remapping,i.e., the bundle's cross-sectional average x (0 5 x I1) is made LL linear function of the number y (- 1 5 y _< +1) to be represented. Of course a signed absolute value system is possible, but certainly not attractive-for the same reasons for which we shunned it in time stochastic processing. Figure 20 shows some operations after the remapping y = 2s - 1 (y = 1 - 2x would be just as acceptable). Note that multiplication now again necessitates a more complicated method than just ANDing d l pairs. We have to merge and halve several times, and the result is scaled by 1/8. Division brings up new problems. Again the two-wirc system of stochastic processing inspires us to use ratio bundles, i.e., t o interpret the number to be represented as the (remapped) denominator, divided by the (remapped) numerator. Figure 21 shows some examples of straight, remapped, and ratio representations. The remapping used is actually the alternatey = 1 - 2x (with y = a a n d x = a') mentionedabove.
207
STATISTICAL PROCESSORS
The ratio bundle method leads to a most attractive idea when we add the additional hypothesis that numerator bundles and denominator bundles are equally vulnerable (Coombes, 1970). This hypothesis can be satisfied if we assume that each numerator wire is paired off with a denominator wire (perhaps by twisting them together), so that damage to one will lead to damage to the other. It is then clear that we have not only a failsoft system (i.e., tolerant of a few broken wires), but a nearly failsafe system (i.e., tolerant of a great number of broken wires). Figure 22 shows this idea, without actually using remapping techniques for numerator and denominator. Figure 23 shows-in the bundle notation discussed above-how two numbers can be added by distributed circuitry. Any wire or semiconductor is nonessential, and as long as a reasonable number of transistors, wires, etc., remain intact, the output can be trusted. We have here a n example of a truly difused computational system. Ergodic processing (Cutler, 1974) simply combines bundle processing with the ideas of time stochastic processing, i.e., it uses the wires in a bundle as carriers of pulses. If the time average of the pulses on any wire is made equal t o the average number of wires energized a t any given time, we shall call such a bundle an ergodic bundle, or more simply a n ergodic. Ergodics, b y their very definition, have the remarkable property of carrying with them their own “confidence indicator,” i.e., if time averOPERATION
INVERT SIGNAL
RESULT
SYMBOL
+x
a?
=l-x
t =
x,x,
OR PAIR OF SIGNALS
t=
x1+x*-x1x2
MERGE BUNDLES AND HALVE
t = $X,+X,)
AND PAIR
OF SIGNALS
x2 Y
MERGE WITH ALL ZEROS AND HALVE X
MERGE WITH ALL ONES AND HALVE
Fro. 19. Some bundle operations. All operations are normalized to result in a standard bundle of H wires. X = h / H , h = number of “1’s.’’
208
W. J. POPPELBAUM
I y = 2~ -1 J OPERATION
SYMBOL
INVERT SIGNAL
Y+W
MERGE AND HALVE
RESULT w=-y
w = +(Yl+Y*) Y2
AND PAIRS
IMPLEMENTATION OF ARITHMETIC ADDITION
y"
MULTIPLICATION Yz
FIQ.20. Remapped bundle proceesing.
age (for any wire) and space average (for the bundle of wires) do not coincide, something must be fundamentally wrong! The interesting property of ergodics is that all the bundle processing operations are still valid: the ergodic property is processed automatically. This really means that the checking feature described above is nearly free of charge. I n order to discuss dc (analog) signals, stochastic sequences, bundles, and ergodics in their mutual relationship, it is useful to introduce the symbolism shown in Fig. 24. It should be noted that thin lines represent wires or electronic devices, while thick lines represent bundles and sets of electronic devices (attached to each wire of a bundle). To continue the
209
STATISTICAL PROCESSORS
-
REMAPPED:
h "oms'
0'
=
+
-
(a' = 0.751
a=
1-20'
(a = - 0 . 5 0 )
hn"mi"
I
(a;=
hd
(ad = -0.~01
0.75)
"onoi1
FIG.21. Straight, remapped, and ratio bundles (all failsoft). Each bundle haa H wires, a' = machine representation, a = actual value.
BREAKS CAN BE TREATED AS
PRINCIPLE
-
PROBABILITY OF NO
i
P(O1.I-n
RATIO PROPERTY
'b"I I'
'0'5"
BREAK=^
GROUND (APPLIED ON BREAK) VOLTAGE # TO GROUND
I
\
THE RAT0 OF "1's" IS NOT AFFECTED BY BREAKS
FIG.22. Failsafe bundles (ratio without remapping).
210
W. J. POPPELBAUM
FIG.23. Failsafe bundle addition.
l o ) Sinqle Wtre
A
l b I Smgle Reslslor
I c 1 Single Capacitor
III Eundlo
-
-c
l k l Div8rmn of Each Wire of o Bundle info Two ond Rebundling
l p l Ovanlized Random Pulse Soqurncc I O R P S I
OUf
+
( 9 1 Resislor m Each Wire of a Bundle
L II
1 h I Capacilar In Each
(ml Rondomizer
( r 1 Bundle Version 01 (q I
( n l Ergodic Bundle
(II
Bundle /Single Wire I n l e r f a c e
l g ) Comparator With Ouf :1 III~@>I~@
wore 01 0 Bundle
D
( d ) Single Difference Amplifier
1 t I Difference Amplifier in Each Wire 01 a Bundle
( e l Loqic Funclton F
II I Loqic Function F an Loch Wire 01 o Bundls
Bundle Synchronous Scanner [Romappsr)
I
4( 0 1 Ergodic Slmbe
II I Logoc Funclion F Performed On A l l Wires Used
FIG.24. Symbols for dc, stochastic, bundle, and ergodic processing.
05
Inpul
21 1
STATISTICAL PROCESSORS
Bundle Average
r----1
out I; 1 Means Danger )
_______J
L
Time Average
FIG.25. Ergodic checking circuit.
usage introduced for bundles, there is an exception to this rule: logic symbols and the symbol for a difference amplifier are simply drawn double to indicate a bundle operation. Special mention should be made of the ergodic strobe (see Fig. 240). Here we simply have a sample-and-hold circuit on each wire. Strobing is done once, a t an arbitrary time. The bundle synchronous scanner can be thought of as a disk carrying a contact (at the circumference) for each wire of the incoming bundle. A similar disk is connected to the outgoing bundle and rotates uniformly with respect to the first one. It is easily seen that this remaps any given input wire into all possible output wires in some sequence. The remarkable thing is that this remapping actually gives, on each output wire, a pulse sequence with a time average equal to the average over all wires at the input. This means that the synchronous scanner transforms a bundle into an ergodic. Obviously practical systems will replace rotating disks by circularly shifting registers. Using the symbolism of Fig. 24, we can now draw an ergodic checking circuit in the form given in Fig. 25.
7. Examples of Bundle and Ergodic Machines
The bundle processing methods discussed in the last sect'ion map (a finite number of) time slots onto the wires of a bundle. Figure 26 shows a photograph of BUM (Bundle Machine) (Ring, 1969), a POSTCOMPlike device, except that the simple and cheap time-stochastic methods are re-
212
W. J. POPPELBAUM
FIG.26. Photograph of BUM.
placed by the relatively expensive ones of bundle processing. The lower meters give the (positive or negative) input numbers A and B, while the light matrix shows the bundle representation. The right-hand meter indicates 0 and the bundle representation is .5 (half the bulbs ON!). The upper B, A X B, and A - B , respectively. meters indicate A The extraordinary property of the machine is, of course, the fact that mat2 numbers of wires and transistors can be removed without materially influencing the end results: we have a failsoft system. SABUMA (Safe Bundle Machine) (Coombes, 1970) carries the failsoft feature of BUM one step further, i.e., SABUMA is nearly failsafe: while
+
213
STATTSTICAL PROCESSORS
RUM can tolerate only a small number of internal disasters, SABUMA survives as long as about 10% of all wircs and transistors are still alive. The fundamental principle of SABUMA was discussed in Section 6, i.e., the idea of representing numbers by ratios. If n1 wires (out of N ) are energized in a “numerator bundle” and dl wires (out of N ) are energized in a ‘denominator bundle,” the ratio x1 = nl/dl will remain constant when the probability of safe transmittal, p , is the same for both bundles. As was also pointed out in the last section, it is primordial t o prove that we can do arithmetic with ratios of bundles in a distribzded arithmetic unit, i.e., one in which no one element is critical. Figure 27 shows the block diagram of SABUMA, and Fig. 28 is a photograph. As shown in the block diagram of Fig. 29, ERGODIC (Cutler, 1974) is made up of three units: the generators, the arithmetic unit, and the processor. A generator is simply a circular 64-bit shift register. Clearly, this method of generation of the signals for the bundle provides the ergodic properties automatically. The number of 1’s and 0’s loadcd into this shift register determines the specific number carried by the bundle, both as a time average and as a cross-sectional average. The arithmetic unit consists of randomizers and 74 arithmetic subunits. The randomizers are needed to assure that the two ergodic bundles are disjoint from each other; that is, if one wire in a bundle is energized, the corresponding wire from the other bundle is not energized and vice versa.
FIG. 27. SABUMA block diagram.
FIG. 28. Photograph of SABUMA.

(This property is needed only for addition and subtraction.) The arithmetic subunits take one wire from each of the bundles and perform arithmetic operations. The circuitry necessary is very simple: we only need an inverter to obtain the inverse of the signal, a multiplexer for obtaining the signal or its inverse, three gates (AND, OR, and EXCLUSIVE OR) to do the actual arithmetic operations (multiply, add, and subtract) and another multiplexer to obtain the answer from the proper gate. The outputs of each arithmetic subunit are then collected to form the resultant bundle. Figure 30 is a photograph of ERGODIC.
FIG. 29. Block diagram of ERGODIC.
FIG. 30. Photograph of ERGODIC.
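The failsoft behavior that BUM and SABUMA demonstrate in hardware can be checked numerically. The sketch below is only a software analogy of the ratio principle (bundle size, value, and survival probabilities are assumptions, not figures from the text): a number is carried as the ratio of energized wires in a numerator and a denominator bundle, and wires are then removed at random with the same probability in both bundles.

    # SABUMA principle: the ratio n1/d1 is essentially unchanged when every
    # wire survives with the same probability p -- a failsoft representation.
    import random

    N = 1000                       # wires per bundle (assumed)
    x = 0.6                        # number to be represented (assumed)
    numerator   = [1] * int(x * N) + [0] * (N - int(x * N))
    denominator = [1] * N          # full-scale reference bundle

    def survive(bundle, p):
        """Keep each energized wire alive with probability p (wire/transistor loss)."""
        return [b if random.random() < p else 0 for b in bundle]

    for p in (1.0, 0.8, 0.5, 0.2):
        n1 = sum(survive(numerator, p))
        d1 = sum(survive(denominator, p))
        print(f"p = {p:.1f}   ratio = {n1 / d1:.3f}")   # stays close to 0.6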
8. An Overview of Burst Processing
If it is desired to overcome the relatively slow gain of precision with the number of pulses in a completely random system like TSP, one can forego the minimum complication of stochastics and go back halfway toward a deterministic system. This leads to a new form of processing called burst processing, which strikes an interesting medium between determinism and a statistical interpretation of the pulse streams. The idea of burst processing (see Fig. 31) is to perform very low precision arithmetic (typically on single decimal digits) and to use appropriate averaging procedures to obtain higher accuracy (Poppelbaum, 1974). Figure 31 shows for instance how one can add 3.4 and 4.2 by decomposing 3.4 into a sequence of four 4's and six 3's, while 4.2 is decomposed into a sequence of two 5's and eight 4's. A one (decimal) digit adder then gives, as successive
sums, two 9's, two 8's, and six 7's. The average of the latter is clearly 7.6, i.e., the sum of 3.4 and 4.2! The fundamental problem is obviously to produce automatically, in some circuit, the required integer sequences; we shall show below how this can be done. Figure 32 indicates some of the nomenclature and some of the conceptual extensions of burst processing. First, it is clear that any averaging procedure gives some error tolerance. This tolerance can be augmented if we decide on a representation of each integer (from 0 to 10, 10 included for reasons which will become clear below) which is unbiased, i.e., if we exclude such systems as classical weighted binary sequences (least significant digit first), as used in serial arithmetic units. In order to avoid difficulties in multiplication we map .n00 . . . onto a "burst" of n pulses in a "block" of ten slots: we obtain an accuracy of ±.1. By averaging over ten blocks (a "superblock") we obtain an accuracy of ±.01. By averaging over ten superblocks (a "hyperblock") we obtain an accuracy of ±.001. For simplicity's sake we shall limit the discussion to superblock averages. The representation of negative numbers can be obtained by either signed bursts or a complementing system.

FIG. 31. Fundamental idea of burst processing: the averages of low precision arithmetic operations can be made extremely precise. The problem is to convert numbers like 3.4 automatically into an appropriate integer sequence.
FIG. 32. Principle of burst representation. The average over a block (= 10 slots) is exact to ±.1; over a superblock (= 10 blocks) is exact to ±.01; and over a hyperblock (= 10 superblocks) is exact to ±.001.
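The block/superblock idea can be illustrated in a few lines of code. The decomposition rule below is simply the obvious one that reproduces the sequences of Fig. 31; it is not the vernier circuit described later in this section.

    # Decompose a number with one decimal of precision into ten one-digit blocks
    # whose average equals the number; add two such sequences block by block and
    # the superblock average of the sums is the exact sum.
    def to_blocks(x, blocks=10):
        """Return `blocks` integers averaging x (x given to .1 precision)."""
        lo = int(x)                             # e.g. 3.4 -> six 3's and four 4's
        n_hi = round((x - lo) * blocks)
        return [lo + 1] * n_hi + [lo] * (blocks - n_hi)

    a, b = to_blocks(3.4), to_blocks(4.2)
    sums = [x + y for x, y in zip(a, b)]        # one-digit adds: two 9's, two 8's, six 7's
    print(a)                                    # [4, 4, 4, 4, 3, 3, 3, 3, 3, 3]
    print(b)                                    # [5, 5, 4, 4, 4, 4, 4, 4, 4, 4]
    print(sum(sums) / len(sums))                # 7.6 = 3.4 + 4.2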
In order to maximize maintainability and ease of training it is proposed to build the whole burst system out of a few standard elements. One of these is the "block sum register" (BSR), which is simply a shift register of length 10, each bit driving a current source of strength V/R, where all R's are the same, but where V may vary from BSR to BSR. The input clock in Fig. 33 should be construed to accept irregular gating pulses, as long as these do not occur at more than the clock frequency. A clear input allows initializing.
FIG. 33. Block sum register. The BSR functions as an integrator over one block. Its indication is independent of the position of the burst.
FIG. 34. Constant block sum property of periodic bursts.
It is now important to note that as long as we use the (summing) output of a BSR, i.e., as long as we essentially agree to 10-state logic (alternatives will be discussed below), we have a constant sum as long as the bursts remain equal (see Fig. 34). If we then design our system to use only the output of BSR's in all arithmetic operations, it is no longer necessary to synchronize information coming in on separate channels, e.g., addend and augend! Thus burst processing transmits numerical data in a PCM mode, is noise tolerant, and does not necessitate synchronization. Of course the efficiency of the encoding in the 10-slot design (which could represent roughly 1000 numbers in weighted binary) is somewhat lower, since a block can only represent 10 numbers. In many applications this inefficiency is completely overshadowed by noise tolerance and nonsynchronization. Note that it is, of course, entirely possible to use blocks of five or four slots, giving higher relative efficiency (4 : 16 ratio for four slots). Note that by using a stairstep encoder and a BSR as a decoder as shown in Fig. 35, it is trivial to transmit audio and video signals by bursts. The interesting point is that the BSR acts like an integrator which loses its memory after ten clock pulses. Furthermore, since burst processing is a form of PCM, multiplexing is relatively easy. As a matter of fact, it is possible to attach to each block appropriate flags (a negative-pulse burst for instance) by which blocks can be sorted, redirected, and reassembled. Again the absence of synchronization problems makes the design much easier.
FIG. 35. Encoding and decoding of bursts: burst processing for low cost PCM. Additional hardware over pure analog: encoder, 1 BSR + 1 comparator; decoder, 1 BSR.
As mentioned above it is of paramount importance to produce the sequence of bursts which represent a number of higher precision than .1 by an automatic device. A simple circuit is a combination of two BSR's successively filled with "ones," in which one BSR (the "vernier register") runs at 1/10 the speed of the main BSR, i.e., the "ramp register." Each time the ramp register is cleared (when the "ones" attain the most significant digit), the vernier register advances by one shift. As shown symbolically in Fig. 36, the vernier register adds to the output current of the ramp register one-tenth of one step of the former. Normalizing the combined output current (or the voltage at P) to 1, the table in Fig. 36 shows how the slow shift upward of the steps produces first the longer bursts, then the shorter ones, and precisely the requisite number! All that is necessary is to compare the voltage in P with the (dc) voltage to be encoded (also normalized to 1) and to switch on the clock pulses by an AND gate as long as the voltage to be encoded (.32 in Fig. 36) is higher than the comparison voltage. It is to be noted that in case of many voltages to be encoded in one location, one vernier encoder is sufficient: since the result of the encoding, i.e.,
the burst sequence, is always assessed by an integrating BSR, no nefarious correlation effects can possibly occur. Figure 37 shows the extremely simple layout of an adder. It consists of two BSR's working into a common bus, the result being reencoded by a vernier encoder. (Closer analysis shows that actually a fixed 10-step encoder is sufficient.) Note that this figure represents a 10-state logic circuit. It is generally felt that the simplifications that can be obtained by going to multistate logic are now within the reach of semiconductor technology. Nevertheless it should be emphasized that one can operate without BSR's in the processing sections. Figure 38 shows a logic burst adder, in which we simply store the coincidences of the sequences A and B in a shift register and proceed to add them at the end of the OR sum formed in the rightmost OR. Overflow must, of course, be prevented by scaling! It is quite trivial to generalize both the BSR design and the logic design to subtraction. In the former case we take the current difference of two BSR's and reencode; in the latter case we subtract the minuend pulses logically.
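The behavior of the vernier encoder can be imitated in software. The sketch below is a rough reconstruction of the principle (ramp register stepped within a block, vernier register lifting the whole ramp by one hundredth per block, clock pulses gated out while the input exceeds the ramp); it works in integer hundredths for exact comparison and is not a circuit description.

    # Vernier burst encoder: one superblock of burst lengths whose average is
    # the encoded value to .01 precision.
    def vernier_encode(v_hundredths, blocks=10, slots=10):
        """Return one superblock of burst lengths for a value given in hundredths (0-99)."""
        bursts = []
        for j in range(blocks):
            # ramp level in hundredths: 10*i from the ramp register, +j from the vernier
            bursts.append(sum(1 for i in range(slots) if v_hundredths > 10 * i + j))
        return bursts

    bursts = vernier_encode(32)                    # encode .32
    print(bursts)                                  # [4, 4, 3, 3, 3, 3, 3, 3, 3, 3]
    print(sum(bursts) / 100)                       # 0.32 recovered as a superblock average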
FIG. 36. Vernier burst encoder.
FIG. 37. Block sum burst adder (subtractor). Remarks: (1) Scaling (overflow) can be obtained by modifying R; (2) the superblock average of the output is equal to the sum of the superblock averages of A and B; (3) because a burst sequence is quasi-periodic, the current output of the block sum register is a constant. Actually, we integrate without an RC circuit; (4) by taking current differences, we obtain a subtractor.
In case the logic version is used for arithmetic processing, the BSR's are still necessary for the encoding of the incoming (transducer) signals. This is, however, no more (nor less) complicated or costly than one would expect in any A/D converter: we always need precision elements at this point! Assuming that a BSR gives a 1% accuracy, it is trivial to form a "cascade multiplier" by using two BSR's in which the output of the first (on top, sequence B) provides the V for the second (on the bottom, sequence A). Reencoding is done by our standard vernier encoder, as shown in Fig. 39. A trick is required, however, if the output of the encoder is to give us .01 accuracy or better: we must make sure that we actually multiply the superblock averages of both sequence A and sequence B.
FIG. 38. Logic burst adder. The logic subtractor is even simpler: we simply cancel the first few pulses of the minuend until the subtrahend pulses are accounted for.
FIG. 39. Time slip cascade multiplier. The burst sequences at P and Q are identical under normal conditions.
It can be seen that if the blocks of sequence A vary slowly, the "time slip register" of Fig. 39 provides the appropriate averaging: we "trap" a whole superblock of B in a series of ten shift registers (a CCD design is possible) and multiply sequence A with all possible block values of sequence B. It is interesting to note that if the value represented by B is constant, the superblock is repetitive (or quasi-periodic for slow variations), so that points P and Q give the same information! It can be seen that if sequence A has highly variable blocks, some provision must be made to average. A possible design would use ten BSR's at the bottom, each one contributing to the total output current. It should also be mentioned that a logic multiplier has been designed as a backup. Division of two burst sequences is obtained by the design in Fig. 40. It consists of a cascade multiplier (middle and bottom BSR), in which a trial quotient X is multiplied by the denominator B and compared to the numerator A. As long as the product is too small, additional "ones" are shifted into X.
FIG. 40. Vernier divider. Principle: Ones are shifted into the X register until the comparator gives B(X + vernier) ≥ A.
More detailed examination shows that we can again improve the precision by averaging. Here we add to X·B the product of B by a vernier number, which increases the sum at the comparator by 1% of B for each successive block. Again time-slipped forms of B should be used. Here too a purely logic design has been prepared as a backup.
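In software the divider principle reduces to a very small loop. The sketch below omits the vernier and time-slip refinements and works directly with block counts; it illustrates only the comparator idea, not the actual circuit.

    # Vernier divider principle (without the vernier): shift ones into the trial
    # quotient X until the product B*X reaches the numerator A.
    def burst_divide(a, b, slots=10):
        """Return the smallest X (in slots) with b * X / slots >= a."""
        x = 0
        while x < slots and b * x / slots < a:     # comparator: B*X still too small
            x += 1                                 # shift one more "one" into X
        return x

    print(burst_divide(4, 8))   # 5, i.e. the burst representation of 4/8 = .5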
9. Preliminary Results in Burst Processing
Since work on BP began a short time ago, only some existence proofs have been established. There is no doubt, however, that the fundamental system concepts are indeed correct, and that both the generation and processing of bursts necessitate little hardware. Figure 41 shows the transmission of an audio signal through bursts; as can be seen, we are sampling at just above the Nyquist rate. The third trace from the top shows the burst sequence and the bottom trace the reconstitution by a 10-bit block sum register. The relatively acceptable decoded signal is due to an appropriate choice of the sampling stairstep amplitude and, of course, the fundamental property of a BSR of losing its memory after ten slots. Note that there is a nearly 90° phase shift between the audio signal and its reconstituted version. If smoothing is desired, it is easy to interpose an RC filter with a time constant equal to the clock-slot period.
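The encode/decode path of Fig. 35 is easy to mimic in a few lines. The sketch below uses an assumed sine test signal and software stand-ins for the stairstep encoder and the block sum register; it is only an illustration of the principle, not of the experimental setup behind Fig. 41.

    # Burst transmission of a signal: each sample becomes a burst of n pulses in
    # a 10-slot block; the receiving BSR counts the ones in the last ten slots,
    # i.e. it is an integrator that loses its memory after one block.
    import math

    def encode(sample, slots=10):
        n = round(sample * slots)                  # stairstep encoder: burst length
        return [1] * n + [0] * (slots - n)

    def decode(pulse_stream, slots=10):
        out = []
        for t in range(len(pulse_stream)):
            window = pulse_stream[max(0, t - slots + 1): t + 1]
            out.append(sum(window) / slots)        # BSR output over the last block
        return out

    samples = [0.5 + 0.4 * math.sin(2 * math.pi * k / 8) for k in range(16)]
    stream = [bit for s in samples for bit in encode(s)]
    reconstructed = decode(stream)[9::10]          # read the BSR once per block
    print([round(s, 2) for s in samples])
    print(reconstructed)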
FIG. 41. Audio encoding and decoding.
FIG. 42. Vernier encoding and addition.
FIG. 43. Burst multiplication.
Figure 42 illustrates simultaneously the encoding of 3.3 and 1.2 by a vernier encoder and the addition of the appropriate bursts. Unfortunately only the first nine (rather than ten) blocks are shown; the remaining block simply repeats block nine. Note that in the top trace the step ramp effectively moves up by one-tenth of a step from one block to the next. The bottom trace is simply obtained by recoding the sum of the two BSR's receiving the two middle traces by a (non-vernier) ten-step encoder, as described in Section 8. Visibly the average of the bottom bursts is 4.5 (adding mentally the tenth block), i.e., the sum of 3.3 and 1.2. Finally, Fig. 43 shows the multiplication of 8 and 5 by a cascade multiplier as shown in Fig. 39. The result (i.e., 40) is scaled by a factor of 10 in the bottom trace. No use is made of the time-slip feature because the two input numbers are kept at exactly 8 and exactly 5, respectively.
10. Outlook in Statistical Processing
Few people realize that because of the atomic (quantized) nature of the universe most information emanating from physical systems occurs in the form of a random sequence of pulses. A typical example would be a photocell: each photon produces an output pulse. Of course we generally only
see the integral of the pulses because of their overlap and the limited resolution (bandwidth) of our measuring equipment. The idea of using the statistical properties of input variables is not new; for instance, fuel flow has been measured by Geiger counters and radioactive admixtures. It now seems that transducers furnishing directly random sequences to measure temperature, speed, etc. are not too hard to design and form perfectly reasonable alternatives to the chain "pulse → average value → encoding into random signals." Ultimately such direct stochastic transducers will also be quite low in cost. At that point many systems will go over to time stochastic processing, the objection to such processing methods being usually that the conversion of input variables is too expensive! Another area of promise for TSP is that of hydrodynamic simulation. Here we map particles onto pulses (with appropriate rules about how closely pulses may follow each other, if compressibility is to be incorporated). Some attempts have already been made to solve two-dimensional flow problems with imposed boundary conditions, the "liquid" being assumed incompressible. Burst processing, too, is a strong contender for a network of calculating nodes in such simulation schemes. The whole field of neurophysiology offers problems in which TSP is a natural; this is, of course, due to the fact that nerves transmit information in the form of random pulse sequences. Research in such areas as decoding nerve signals in view of controlling prostheses, bridging spinal breaks, and "thought control" of vehicles is about to begin. Mention should also be made of the interesting possibilities in TSP for checking the correct functioning of circuits at very low cost, because there are so few circuits to check. An example is the addition of two SRPS's (as explained in Section 2) by the use of a mask and a complementary mask. It is clear that by exchanging the masks and ORing, we must retrieve all pulses not contained in the original mask positions. This is an easily checked property. In SSP, i.e., bundle processing, fascinating designs can be evolved using optical bundles. By using polarized light and quarter-wave plates it becomes possible to design a processor that has no active elements and that can work in the presence of strong electromagnetic pulses and nuclear radiation. Equally interesting is the search for natural bundles, i.e., bundles occurring in nature as a consequence of the laws of physics. A promising direction leads to bundles where each "wire" is a vortex in a type II superconductor. The possibilities of burst processing are also very far-ranging, not only because of the simplicity of encoding and decoding numbers or voice information, but also because of the inherent noise immunity and the ease of multiplexing. It might well turn out that in systems in which bandwidth
reduction is not a primary objective, burst processing will become a strong contender for integrated communication and processing schemes; both ships and aircraft are in line for such investigations. Perhaps it should also be noted that BP is not only compatible with most of the PCM technologies, but that recoding information from, for instance, delta modulation to bursts, or vice versa, is a reasonably trivial exercise. In summary we can say that statistical processing has grown, in less than ten years, from a mere academic curiosity to a serious alternative to classical weighted binary methods. There seems to be little doubt that in the near future TSP, SSP, and BP will take their rightful place among the standard methods of information processing.
ACKNOWLEDGMENTS

The work reported in this paper has been supported by ONR contracts N00014-67-A-0305-0007 and N00014-67-A-0305-0024 and by AEC contract 1469-P. The vast bulk of the research was done by Chuchin Afuso, Dan Coombes, Jim Cutler, John W. Esch, Arlyle Irwin, Orin E. Marvel, Trevor Mudge, David Ring, Larry D. Ryan, and Yu K. Wo. To all of them goes my deepest gratitude. I would also like to thank Professor Brian Gaines of the University of Essex for several interesting discussions.
REFERENCES

Afuso, C. (1964). Quart. Tech. Prog. Rep., Circuit Res. Sec., Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois. (Starting with January)
Afuso, C. (1968). Analog computation with random pulse sequences. Ph.D. Thesis, Rep. 255. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Anderson, G. C., Finnie, B. W., and Roberts, G. T. (1967). Pseudo random and random test signals. Hewlett-Packard J. (Sept.)
Coombes, D. (1970). SABUMA, safe bundle machine. M.S. Thesis, Rep. 142. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Cutler, J. R. (1974). ERGODIC: computing with a combination of stochastic and bundle processing. Rep. 630. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Esch, J. W. (1969a). A display for demonstrating analog computations with random pulse sequences (POSTCOMP). Rep. 312. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Esch, J. W. (1969b). RASCEL, a programmable analog computer based on a regular array of stochastic computing element logic. Ph.D. Thesis, Rep. 332. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Ferrate, G. A., Poigjaner, L., and Agullo, J. (1969). Introduction to multichannel stochastic computation and control. Proc. IFAC Congress, Warsaw.
Gaines, B. R. (1967a). Stochastic computing. AFIPS Proc., Spring Jt. Comput. Conf. 30, 149-156.
Gaines, B. R. (1967b). Techniques of identification with the stochastic computer. Proc. IFAC Symp., Prague.
Gaines, B. R. (1967c). Stochastic computer thrives on noise. Electronics 40, 72-79.
Gaines, B. R. (1968). Stochastic computing. In "Encyclopedia of Information, Linguistics and Control," pp. 766-781. Pergamon Press, Oxford.
Gilstrap, L. O., Cook, H. J., and Armstrong, C. W. (1966). Study of large neuromime networks. Adaptronics Interim Eng. Rep. 1, Air Force Avionics Lab., USAF Wright-Patterson AFB, Ohio.
Golomb, S. W. (1967). "Shift Register Sequences." Holden-Day, San Francisco, California.
Hirsch, J. J., and Zirphile, J. (1971). Implementation of distributed parameter system models using stochastic representation. Proc. IFAC Cong., Banff.
Korn, G. A. (1966). "Random Process Simulation and Measurement." McGraw-Hill, New York.
Marvel, O. E. (1969). "Model T," a demonstration of image multiplication using stochastic sequences. Rep. 349. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Marvel, O. E. (1970). TRANSFORMATRIX, an image processor. Input and stochastic processor sections. Ph.D. Thesis, Rep. 393. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Papoulis, A. (1965). "Probability, Random Variables, and Stochastic Processes," Sections 3-2, 3-3 and 8-5. McGraw-Hill, New York.
Peterson, W. W. (1961). "Error-Correcting Codes." M.I.T. Press, Cambridge, Massachusetts, and Wiley, New York.
Petrovic, R., and Siljak, D. (1962). Multiplication by means of coincidence. Int. Analog Computation Meet., 3rd, ACTES Proc.
Poppelbaum, W. J. (1968). What next in computer technology? Advan. Computers 9, 1.
Poppelbaum, W. J. (1972). "Computer Hardware Theory." Macmillan, New York.
Poppelbaum, W. J. (1973). Record of achievements and plans of the information engineering laboratory. Rep. 568. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Poppelbaum, W. J. (1974). Burst processing. Rep. 670. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Poppelbaum, W. J., Afuso, C., and Esch, J. W. (1967). Stochastic computing elements and systems. Proc. Fall Jt. Comput. Conf.
Ribeiro, S. T. (1964). Comments on pulsed data hybrid computers. IEEE Trans. Electron. Comput. 13.
Ribeiro, S. T. (1967). Random-pulse machines. IEEE Trans. Electron. Comput. 16.
Ring, D. (1969). BUM, bundle processing machine. M.S. Thesis, Rep. 353. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Ryan, L. D. (1971). System and circuit design of the TRANSFORMATRIX: coefficient processor and output data channel. Ph.D. Thesis, Rep. 435. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Schugurensky, C. M., and Olaravria, J. M. (1969). Direct simulation of enzyme systems: first results with a direct simulation basic element. Univ. Nac. Tucuman, Repub. Argentina.
Tausworthe, R. C. (1965). Random numbers generated by linear recurrence modulo two. Math. Computat.
Wo, Y. K. (1970). The output display of TRANSFORMATRIX. M.S. Thesis, Rep. 381. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
Wo, Y. K. (1973). A novel stochastic computer based on a set of autonomous processing elements (APE). Ph.D. Thesis, Rep. 556. Dept. Comput. Sci., Univ. of Illinois, Urbana, Illinois.
von Neumann, J. (1963). Probabilistic logics and the synthesis of reliable organisms from unreliable components. In "Collected Works," Vol. 6. Macmillan, New York.
Information Secure Systems

DAVID K. HSIAO and RICHARD I. BAUM
Department of Computer and Information Science
The Ohio State University
Columbus, Ohio
1. Prologue 231
2. Introduction 234
 2.1 The Jeweler's Problem 235
 2.2 Three Levels of Access Control and Privacy Protection in Information Secure Systems 236
 2.3 A Logical Access Control Model 238
3. Toward an Understanding of Logical Access Control Mechanisms 241
 3.1 Data Base Semantics and Access Control Mechanisms 244
 3.2 Concealment Protection 246
 3.3 Protection of Single-Attribute Data 250
 3.4 Limitations of Concealment Protection 253
4. Some Thoughts on Information-Theoretic Protection 254
 4.1 The Measurement of Knowledge of a Data Base 254
 4.2 Information-Theoretic Protection 255
5. Can We Build Information Secure Systems? 256
 5.1 Information Secure Systems with Single-Attribute Protection 257
 5.2 Information Secure Systems with Context Protection 262
6. Summary and Prospectus 270
References 271
1. Prologue
The increasing awareness of information security during the last few years is phenomenal. Although one can always claim that this has been prompted by the Ellsberg and Watergate cases, there are several direct attributions which instigated current levels of awareness on information security. They came from private industries, government agencies, professional societies, and academic research communities. We shall elaborate some of them as follows. At the Spring Joint Computer Conference in 1972, T. V. Learson, then outgoing Chairman of the Board of IBM, announced that IBM would engage in a multi-million-dollar information security study with academic,
governmental, and industrial communities. Shortly after the announcement, three different sites (MIT's computation center, TRW, and the State of Illinois) were chosen to study an information system with access control capability known as the Resource Secure System (RSS). The Resource Secure System (Hoffman, 1973) was originally designed to meet the requirements of the Department of Defense's World-Wide Military Command and Control System (WWMCCS). Because IBM did not get the contract from the Defense Department, RSS was not adopted for WWMCCS. The MIT computation center was charged with the study of the authorization technique employed in RSS. The study appears to indicate that the authorization technique in RSS is restrictive. For example, the file owner must ask the system administrator to assign access rights to his file for other users. There is no way that the file owner can assign these access rights directly to legitimate users of the system. The authorization technique in RSS is also dated. For example, RSS has the capability to limit a program to be executed only when the input to the program is from a particular file. However, careful examination indicates that this is a special case of a more general one in which two files can be opened at one time for access, where one file is the program file with the execute access and the other file is the data file with the read access. Such general capability has been available in many systems for some time (Organick, 1972). TRW was asked to develop a quantitative or qualitative measure to "scale" the degree of security provided in RSS and a certification method to validate the correctness of RSS. For scaling, TRW proposed something called risk factors, which resemble a check-out list of possible risks when a certain application is to be conducted in RSS. As far as the certification is concerned, TRW suggests that if one wants to certify the correctness of a secure program, one should ask agencies such as the National Security Agency to scrutinize the program. The State of Illinois was asked to test RSS in operation. Because RSS is a subsystem of OS/360, the access control to RSS files can be facilitated only if all users use the same subsystem. In other words, the user should not use other subsystems such as TSO (Time-sharing Option) and IMS (Information Management System). One way to circumvent this problem is to exclude all other subsystems from OS/360. As a result of this exclusion, OS/360-RSS as a stand-alone system is of little use in an information utility environment since the user is prevented from program preparations (say, using TSO to write PL/1 programs) and application development (say, an inventory package using IMS). Thus, the study did not result in the introduction of RSS to the general public and the providing of related findings as evidence of support to
potential IBM customers. Instead, at the close of the study, IBM began a series of symposia on data security, with the first symposium being held in Boston, Massachusetts in April 1973. The symposia were directed into general discussions on computer and data security and specific demonstrations of IBM systems with some security measures, such as the IBM 360 Advanced Administrative System, the System 7's Controlled Access System, and System 370's VS2 Release 2. The Advanced Administrative System (Wimbrow, 1971) is a stand-alone data base system using a password authorization scheme. However, the main concerns of the system are in system reliability and data integrity, which are necessary parts of an information security system but are by no means sufficient parts. Thus, the system is noted for its elaborate back-up and recovery procedures. The Controlled Access System (IBM, 1972a) is an identification system which controls the locking and unlocking of doors on the basis of encoded information on wallet-size plastic cards. The VS2 Release 2 (Scherr, 1973) is an operating system utilizing virtual memory spaces which can isolate one application running in one virtual space from another application in the other space. Because none of these systems represents IBM's answer to future information secure systems, the main emphasis and impact of the symposia is in the area of physical security (IBM, 1973). In particular, the symposia were benefited by the fire of September 1972 at the IBM Program Information Department. IBM not only documented the entire accident and its recovery (IBM, 1972c), but also produced manuals on protective measures which may reduce the risks associated with such hazards (IBM, 1970, 1972b,d). The government agencies and professional societies have also provided the impetus on the awareness of information secure systems. For example, in September 1972 the Office of Naval Research, with the help of the Naval Ship Research and Development Center, sponsored a three-day conference on the protection and sharing of information in ADP systems. In addition to a large number of Department of Defense (DOD) and Navy organizations, universities (MIT, Maryland, Ohio State, Pennsylvania, and Southern California), major computer industries (IBM, Univac, Honeywell, and Control Data) and research and development organizations (MITRE, Rand, SDC, SRI, and TRW) were represented. The proceedings (Department of Navy, 1973) of the conference touched upon the state-of-the-art on information secure systems technology as well as the data security and sharing problems facing the DOD and Navy communities. The National Bureau of Standards, in conjunction with the Association of Computing Machinery and the National Science Foundation, organized a workshop on Controlled Accessibility in December 1972 with 70 participants. The workshop was devoted to generating position papers on
information secure systems related to access controls, audit, management controls, identification, and security measures. Subsequently, a report was issued (Reed and Brainstad, 1974). Computer conferences such as the National Computer Conference of 1974, the Eighth Hawaii International Conference on System Science, the Second USA-Japan Computer Conference, and others all have data security sessions. The growing interest in information secure systems research and development is overwhelming. On the other hand, the academic research community has long been active in information secure system research. As far back as 1966, Dennis and Van Horn suggested access control to memory segments by means of access indicators, known as capabilities, which marked the first important advancement in information secure systems utilizing (hardware) segmented memory organization. In 1968, Graham outlined a program protection scheme, known as the ring mechanism, that enabled executing programs in an information system to have different access control requirements relative to each other. Again in 1968, Hsiao (1968, 1969) demonstrated the use of boolean expressions of data attributes, known as authority-items, to control access to data aggregates such as files, subfiles, and records, which advanced information security from hardware and program protection to data protection. It is in the areas of memory, program, and data protection that academic and other research communities have made advances in information secure systems. We shall articulate these advances by way of examples and analogies. However, we will not present any theory. The lack of a theoretical treatment in these advances is largely due to the pioneering stage of the technology. Furthermore, we shall point out difficulties in making advances in these areas.
2. Introduction
Information secure systems involve security considerations in management procedures, computer hardware, system software, and data base. Thus research and development in information security has been aimed at resolving one or more of these problem areas. A completely information secure system, of course, requires successful resolution of security problems in all four areas. To aid our understanding of information security in general and the security roles of hardware, software, and data base in particular, we introduce the following jeweler's story as an analogy.
2.1 The Jeweler’s Problem
A jeweler not only sells gems but also handles for his customers the jewels which he has sold to them. In addition to repair and service, the jeweler must sometimes keep his customers' jewels in his safe, especially if these are regular and important customers. He knows that the safety of the jewels is only as good as the security of the safe. To this end he must have well-made safes with elaborate protection mechanisms and combinations. Quite often he has to transport the jewels from one place to another. In this situation, he must devise a secure procedure for transportation. Although safes may be included in the transportation, there is still the need for special vehicles (e.g., armored cars) to carry the safes. Furthermore, he must take great precautions when the jewels are being moved from one vehicle to another (say, from a car to a boat). Until he can secure very good safes and devise very reliable means and procedures for the transportation, the jeweler will not be able to safeguard his customers' jewels adequately. The security problem is compounded when one day the jeweler realizes that in an effort to safeguard the jewels, he has kept them hidden, and has thus interfered with one of the most important uses of the jewels, i.e., display. He then asks himself whether it is possible to provide maximum display of jewels on their rightful owners or their designated users, while simultaneously providing maximum security. Being a jeweler, he naturally investigates the possibility of developing security devices right in the bracelets, necklaces, and ornaments. Obviously, the investigation requires a thorough knowledge of the construction of the bracelets, necklaces, and ornaments. Such devices, if effective, can protect the jewels while they are displayed by their rightful users. In this case, both the intended use and the security of jewels are realized.

There is a close correlation between jewels in the jeweler's story and information in information systems. Just as jewels are intended for display, information in information systems is destined to be shared. It is through sharing of information that the users of the systems can benefit each other intellectually. However, the basis of information sharing must be voluntary. In other words, the system must be able to protect private information and to control access to shareable information. Let information in computer systems be represented as data and programs. Protective measures in computer systems can then be viewed in different levels. As in the case of using safes to protect jewels, the use of real memory and virtual space units to protect data and programs is one approach. Such an approach is termed memory protection. Like jewels which
must be transported from one place to another, data must also be moved in computer systems. The active agents for such data movement are referred to as processes or tasks. These entities are composed mainly of programs. Thus, another approach is aimed at developing procedure protection. Memory and procedure protection mechanisms, which deal with physical hardware and software elements, are collectively called physical protection and access control mechanisms. Finally, in an attempt to incorporate protection mechanisms along with the data, there is the need of more subtle protection mechanisms, which we refer to as logical protection and access control mechanisms.

2.2 Three Levels of Access Control and Privacy Protection in Information Secure Systems
The study of access control and protection mechanisms in an information system is therefore concerned with effective means of protecting private information, on the one hand, and of regulating the access to shareable information on the other hand. Effective means for access control may now be considered on three levels: memory, procedure, and logical. At the memory level, access control mechanisms are those which regulate access to memory in terms of units of memory. The main point is that protection applies to the containers, i.e., the memory units, not the contents. As a consequence, everything inside the container is subject to the same access control as the enclosure itself. Furthermore, the contents are safe only as long as they are kept in protected containers. Typically, physical memory protection schemes employ memory bounds registers or storage protection "keys" which control access to bounded memory areas. Other, more sophisticated schemes are possible. The idea of having an m × m matrix of control bits to keep track of access rights to m memory areas has been advanced (LeClerc, 1966). For example, an entry Aij would determine the access rights to the ith area from the jth area. The Aij may correspond to various access rights such as read-only, read/write, execute-only, and privileged mode, and are consulted in the course of hardware decoding of instructions (such as fetch, store, and transfer instructions). In general, one user's access rights to an area may differ considerably from another user's access rights to the same area. In a multiprogramming and shared data base environment, the system must therefore provide dynamically different access matrices for different users. The use of virtual space may, therefore, enhance the implementation of the matrix scheme. Here, page and segment tables are consulted by the hardware at instruction decoding time. Each user is assigned his own tables and therefore
cannot get into a segment of space which does not have an entry in those tables. As a result, elaborate schemes such as the access matrix are more easily implemented with virtual space. Yet, even in virtual space, we note that the protected areas are again units of space. The second level of access control is concerned with procedure access control and protection. A procedure is simply a set of programs and associated data. Thus, unlike memory protection, the notion of procedure protection and control is concerned with access to and protection of programs. To this end, the mechanisms must determine when and under what conditions programs can pass control from one to another. In other words, the mechanisms must be able to monitor the execution of programs in terms of their calls, returns, and transfer of parameters. An elaborate procedure access control and protection mechanism known as the "ring mechanism" has been proposed (Graham, 1968). This concentric ring mechanism allows one program to give control to another without violating any of the access control rights of either program, thereby safeguarding each program's working tables, data, intermediate results, etc. Conceptually, the concentric ring mechanism requires the user to arrange his procedures hierarchically, i.e., procedures at the lower part of the hierarchy (i.e., outer rings) have less privileged access rights. It is a generalization of a simple hierarchy of two rings. In the simple hierarchy of two rings, procedures are run either in the inner ring (i.e., supervisory state) or the outer ring (i.e., the user-program state). To communicate with procedures in the supervisory state, a procedure in the user-program state must go through a gate or a set of gates. The implementation of gates varies from one computer system to another. It may be implemented as hardware supervisory calls (SVC) or as software system macros, with gate names kept in the file system. On the other hand, to gain access to procedures in the user-program state no procedure in the supervisory state is required to go through the gates. With this generalization the ring mechanism allows many program-running states in which each state is realized in a concentric ring. It is therefore possible to have, for example, the system-supervisory state (in the innermost ring), the user-supervisory state (in the next ring), the user-subsystem-monitor state, the user-subsystem state, and the user-subsystem-program components (in the outermost ring). This mechanism can be implemented in a computer whether or not it has virtual space. Therefore, one should not indulge in the misconception that virtual space protection and procedure protection are one and the same. Schroeder and Saltzer (1972) have shown a clever hardware implementation of the ring mechanism in a segmented virtual memory system.
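The gate discipline of the ring mechanism can be summarized in a few lines of illustrative Python. The ring assignments, procedure names, and single gate below are invented for the example and do not reproduce Graham's or any particular system's actual rules; they only show the inward/outward asymmetry described above.

    # Concentric rings: lower number = more privileged.  A call inward (toward a
    # lower-numbered ring) is honoured only through a declared gate; calls
    # outward or within the same ring pass freely.
    gates = {("user_program", "supervisor_call")}   # permitted inward entry points

    rings = {"system_supervisor": 0, "user_supervisor": 1,
             "supervisor_call": 1, "user_program": 3}

    def may_call(caller, callee):
        if rings[caller] <= rings[callee]:          # outward or sideways: allowed
            return True
        return (caller, callee) in gates            # inward: only through a gate

    print(may_call("user_program", "supervisor_call"))    # True  (through the gate)
    print(may_call("user_program", "user_supervisor"))    # False (no gate declared)
    print(may_call("system_supervisor", "user_program"))  # True  (outward call)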
The highest level of access control is logical. It is natural that, in handling information in a computer system, the user will first want his information to be represented as structured data. He will then refer to his structured data in terms of logical entities such as fields, arrays, records, subfiles, and files. The important point is that these entities are logical units of information that may have little resemblance to their physical or virtual storage images. By allowing the user to associate access control requirements and protection measures with the logical units, the access control mechanism can facilitate direct control and protection of the information regardless of the whereabouts of that information. Furthermore, the mechanism does not require the user to be familiar with the physical or virtual storage structure of the computer system. Logical access control mechanisms must therefore have the facility for the user to specify his shareable and private data in terms of logical entities of the data base, to assign access rights and protection requirements to these entities, to determine the collections of these entities and the types of access that other users may have, and to incorporate additional authentication and checking measures.
Let us present a simple model in which some of the important concepts and salient features of these mechanisms can be elaborated, and from which an overall understanding of them may be achieved. Basically, the logical access control model consists of three parts: (1) a shared data base to which access must be controlled; (2) a group of users whose access to the data base must be regulated; (3) a mechanism which governs the accessing of the data base by the users. I n this model, all the time-variant information specifying the types of access to the data base that users have is regarded as constituting the access control information of the mechanism. At a given time, the access control information may be represented by an access matrix A , with users identifying the rows and logical entities of the data base the columns. The entry A ( U , D ) contains access righls held by user U with respect to entity D. Figure 1 shows an example of such an access matrix A , as originated by Lampson ( 1971 ) . I n this matrix, the Ui denote the users and the D ithe entities. The entries A ( Ui, 0,) denote the access rights, for example, R, W, El and P (i-e., Read, Write, Execute, and Print, respectively). I n examining the matrix, we make the following observations : 1. There are usually more entities than users. This is particularly evident in an information system. Thus, the number of columns in the matrix will be much greater than the number of rows.
INFORMATION SECURE SYSTEMS
239
...
D¶
FIG.1. An access matrix.
2. The matrix is sparse, especially when each user has access to a relatively small number of entities in the data base. 3. In the same matrix two or more rows may be identical, indicating that two or more users have identical access rights with respect to the same entities. As far as the access control mechanism is concerned, these users are alike (e.g., the rows identified by U, and Uh in A are identical).
Because the access matrix is sparse and has many columns (i.e., many entities involved) , attempts have been made to organize the access control information into manageable pieces for effective use. Consider the following two approaches. One is to organize for each user a list of access rights of all the entities accessible to that user. In this approach, the user-row approach, there is a list of access rights specified for the entities which the user has been authorized to access, whereas the inaccessible entities do not have access rights associated with the list. Thus, the list is compact and user-oriented. Examples of the use of the user-row approach for managing access control information in contemporary operating systems are the so called capability-2ist systems (Lampson, 1969). On the other hand, in the entity-column approach, for every entity in the system there is a list of users who have been given access with appropriate access rights to the entity. Obviously, users who have no access to an entity do not have their access rights included in the list for the entity. Thus, the list is again compact. However, it is entity-oriented. Examples of the use of the entitycolumn approach for the management of access control information in current operating systems are the so-called access-list systems (Daley and Neumann, 1965; Organick, 1972). I n both the capability and access list systems, the aim is to implement the access matrix as depicted earlier with fewer redundant entries.
By organizing the access matrices into capability lists and linking the entries for the entities in one list with the same entities in another list, it is possible to have a capability list system whose listed entities constitute effectively an access list system. Thus, in this approach both user-row and entity-column are emphasized. Although the linkage requires additional cost and processing, this approach incorporates considerable flexibility. An example of the use of this approach in data base systems can be found in the authority-item systems (Hsiao, 1968, 1969). The access matrix as implemented in the capability-list, access-list, and authority-item systems is well-suited to contain access control information about the users and entities singly. By singly we mean that before a user Ui is granted or denied the access to an entity Di, the access control mechanism simply checks the access rights associated with the entry A(Ui, Di). There is no need for the access control mechanism to consult any other entry, say, A(Ui, Dj), for access to the entity Di by Ui. Furthermore, in this model the access control mechanism is not concerned with the possibility that the permitted access to one entity may violate a denied access to another entity. In other words, conventional access control mechanisms, as characterized in this model, for the most part treat a data base as a collection of independent data items. Security is achieved by denying access to the protected items. We shall point out that this type of information security is very primitive. In a real-world situation, information represented in a data base system should be considered as a collection of semantically interconnected data items. Semantic interconnections in a data base are characterizable through specification of inference structures that indicate how a set of connections implies the existence of other connections. The function of an access control mechanism is therefore to protect data in spite of connections; the mechanism must limit access not only to data and connections protected from the user but also to those connections which in some way allow inference of the protected data and connections. The "intelligence" of the mechanism has a significant effect on the operation of the system and use of the data base. Consider what happens if the mechanism is incapable of taking advantage of the connections present in the data base. In this case, the user must be sufficiently knowledgeable of the data base structure to specify correctly protection of all connections that imply the existence of access-restricted data and connections. This type of mechanism has two distinct disadvantages. The first is that the user must be familiar with the complete structure of the data base if the data base is rich in semantic connections. In other words, to protect a part of a data base the user is required to know the entire data base. This is
because information security must account for all ways that protected items may be derived. For example, if a user wants to protect some structured data whose structural information is reflected in certain schemata, the user must have access to all those schemata, thus precluding the possibility of protecting the schemata themselves. The second disadvantage is that due to the limitations of the mechanism the user's security requirement may cause the mechanism to protect more information than is really necessary. This problem, for example, can easily arise in a record-oriented data base system where the smallest protectable item is the record. A record has many implicit relations among its fields. Thus, protection of just one of the fields requires protection of the entire record. This conceals other fields which are otherwise accessible. In fact, if other records contain information which reveals the presence of the field, then they too must be restricted from access in their entirety. To overcome these problems the data base system must employ a mechanism sufficiently intelligent to take advantage of the connections and automatically protect all necessary information, without being overly restrictive in the course of enforcing a security requirement. Such a system needs an explicit representation of semantic connections among data, together with a suitable access control mechanism that uses this representation to carry out the security requirements. This is in marked contrast to conventional information systems, in which the implicit connections among data require the user to specify explicitly each data item that is to be protected. The burden of complete knowledge of the data base structure and its inherent meaning is thus shifted from the average user to the system and system administrators. In the following sections, we shall expand the model of independent data items to a model of semantically connected data items. Furthermore, some properties of various access control mechanisms are examined by way of the expanded model. The model and the examples are simplified to allow the fundamental properties of such mechanisms to be apparent.
3. Toward an Understanding of Logical Access Control Mechanisms
We shall represent attributes by boldface lower case letters (e.g., salary) and a data item by uppercase letters in parentheses. The expression [x] represents a set containing all data items with attribute x (e.g., [salary] is the set of all data items which represent salary figures). Let a binary mathematical relation express a semantic property that is potentially satisfiable by each of its elements. Such a relation is called a (semantic)
connection. A semantic connection is represented by the form R(x, y) where x and y are attributes and R denotes the semantics of the connection. For example, if salary and name are attributes, then the connection "salary is the annual earnings of name" could be represented as SALARY(name, salary). A connection R(x, y) is therefore a subset of the Cartesian product [x] × [y]. A set of elements (more precisely, data element pairs) that satisfy a connection R(x, y) is represented by R^E(x, y). For our convenience, we denote an element of R(x, y) or R^E(x, y) by (X, Y).

Example 1. A Case of Rich Semantic Connections. Suppose a data base contained the connections "id is the employee identification code of name," "salary is the annual earnings of name," and "salary is the annual earnings of the person whose identification code is id," denoted by ID(name, id), SALARYN(salary, name), and SALARYI(salary, id), respectively. Assume that the connection represented by ID(name, id) is one-to-one. The elements (DOE, 67) from ID(name, id), (10000, DOE) from SALARYN(salary, name), and (10000, 67) from SALARYI(salary, id) contain semantically interconnecting information since (10000, DOE) is intuitively seen to be derivable from the other two elements.
The type of information shown in the above example may be formally represented by composition. The composition of two connections is possible only when the connections have a common attribute and a resultant connection is defined. The composition of R(x, y) and R'(z, y) on the common attribute y yields the relation R"(x, z), which is denoted by

R(x, y)·R'(z, y) → R"(x, z)

and means that each element of the set

{(X, Z) | (X, Y) ∈ R(x, y) ∧ (Z, Y) ∈ R'(z, y)}

satisfies R"(x, z). In this case, the connections R(x, y) and R'(z, y) are said to derive the connection R"(x, z), which is called the derived connection. For completeness we will say that every connection derives itself. A data base schema (DBS) is a set of explicitly specified connections and their valid compositions which, of course, are well-defined connections. The collection of all the schemata and the collection of data items on which the schemata are defined form a data base (DB).
To distinguish an element (X, Y) that appears in more than one connection, we adopt the following triples:

(Ri; X, Y)

indicating that the element (X, Y) is a member of Ri. The set of all such triples of a data base is called the component data base of the data base. Intuitively, the elements, called components, of the component data base are the most elementary data items that are accessible by a user. Unless otherwise noted, we will not distinguish a component data base from its data base and will use the terms components and data items interchangeably.

Example 2. A Sample Data Base. We will present a realistic data base using the above notation. Let the data base consist of connections over the attributes name, title, and salary. Specifically, these three connections have the following meaning: TITLE(name, title) means that title is the title of name; SALARY(name, salary) means that salary is the annual earnings of name; and LINESAL(title, salary) means that salary is a salary payable to a person with title. The data base schema is therefore composed of

{TITLE^E(name, title), SALARY^E(name, salary), LINESAL^E(title, salary)}.
For each connection, we list the data items of the connection as follows:

TITLE^E(name, title)
{(BAUM, GRA), (NEE, GRA), (KAFFEN, GRA), (HSIAO, PROF), (KERR, PROF), (FORD, VP), (NIXON, P)}

LINESAL^E(title, salary)
{(GRA, 5), (GRA, 4), (PROF, 22), (VP, 62)}

SALARY^E(name, salary)
{(BAUM, 5), (NEE, 5), (KAFFEN, 4), (HSIAO, 22), (KERR, 22), (FORD, 62)}
The component data base must be

{(TITLE; BAUM, GRA), (TITLE; NEE, GRA), (TITLE; KAFFEN, GRA), (TITLE; HSIAO, PROF), (TITLE; KERR, PROF), (TITLE; FORD, VP), (TITLE; NIXON, P),
(LINESAL; GRA, 5), (LINESAL; GRA, 4), (LINESAL; PROF, 22), (LINESAL; VP, 62),
(SALARY; BAUM, 5), (SALARY; NEE, 5), (SALARY; KAFFEN, 4), (SALARY; HSIAO, 22), (SALARY; KERR, 22), (SALARY; FORD, 62)}
The semantic interconnections of this data base are indicated by the compositions as follows:

TITLE(name, title) · LINESAL(title, salary) → SALARY(name, salary)
TITLE(name, title) · SALARY(name, salary) → LINESAL(title, salary)
LINESAL(title, salary) · SALARY(name, salary) → TITLE(name, title)

This example will be used later to demonstrate various features of access control mechanisms.
3.1 Data Base Semantics and Access Control Mechanisms
The term "protection of data" in a data base system is usually synonymous with concealment of that data. The rationale for this follows from the fact that in many data base applications the system need only prevent a user from accessing a piece of data to insure that it remains unknown to that user. In a data base with semantically interconnected information this simple concealment scheme cannot provide adequate protection, since access to other data items may establish the existence of the original item.

Example 3. A Case of Semantic Derivation. Consider the data base discussed in Example 1. Suppose we wished to protect data on the annual earnings of named employees, i.e., SALARYN(salary, name). To enforce this security policy access must be denied not only to elements of SALARYN(salary, name), but also to some elements of ID(name, id) and SALARYI(salary, id). For instance, a user should not be able to access both (10000, 67) of SALARYI(salary,
id) and (DOE, 67) of ID(name, id), since from these elements he could derive the protected element (10000, DOE) of SALARYN(salary, name). Such derivation is of course an instance of the composition of SALARYI(salary, id) and ID(name, id). Data bases which are rich in semantic connections among their data items allow a great deal of composition. This in turn makes protection of data more complex.

The above example points out a property of access control mechanisms that operate on a semantically rich data base: in general, protection of a data item which is an element of a connection requires that more than that data item actually be protected. This property may be characterized by defining some "side effect" measurements of enforcement mechanisms. Let E represent an access control mechanism. At any instant the data base DB (see Fig. 2) may be partitioned into three sets: a set DB.A representing all elements that have been revealed to the user, a set DB.P representing elements that may not be accessed, and a set DB.M representing all elements that have not been accessed and are not yet prohibited from access. As will be seen, the dynamic characteristics of these sets are highly dependent on E. All of the elements that appear in DB.A are said to be simultaneously accessed. If DB.M is empty, then DB.P and DB.A are in a final configuration and define a static partitioning of DB. That is, all elements of DB may be classified as accessed or protected, thereby obviating the need for any dynamic decisions by E concerning the status of DB.M. For a given data base DB and enforcement mechanism E there may be many possible static partitionings of DB. Let the notation A(E, DB) [or P(E, DB)] represent a set containing all possible final configurations of DB.A (or DB.P) for the set DB and the access control mechanism represented by E.

Example 4. A Data Base Partitioning. Let DB be the set {X1, X2, X3, X4, X5, X6, X7, X8, X9} where Xi represents a suitable data item. Suppose that it is required that X1, X2, and X3 be protected by the mechanism E. Also assume that the data base DB contains semantic connections. After applying E to DB, we find that in this case the following sets of data items constitute the only possible sets of data items which are accessible to the user:

{X5, X8}   or   {X5, X7, X9}   or   {X4, X6}.
FIG.2. The partitioning of a data base DB.
The final configuration of DB.A is therefore
A(E, DB) = {{X5, X8}, {X5, X7, X9}, {X4, X6}}.

An element of DB is explicitly protected if there is a security requirement directly prohibiting access to that element. Let P̄ represent all elements of DB that are explicitly protected. To protect the elements of P̄, elements of the rest of the data base, DB − P̄, may also have to be protected. The nature of the enforcement mechanism will determine how many additional elements must also be protected. In the above example, data items X1, X2, and X3 are explicitly protected. On the other hand, the set P(E, DB) consists of additional data items which must also be protected, due to the nature of E, on the basis of the security requirements. Intuitively, the "restrictiveness" of an access control mechanism is a measure of its detrimental "side effects," that is, the manner in which it prohibits access to elements of DB that are not explicitly protected. The notion of restrictiveness is formalized by the following measure. Define the protection precision pp(E, DB) of E on DB as the average cardinality of the elements of A(E, DB) divided by the cardinality of DB − P̄. This ratio is a measure of the average number of elements that are not accessible even though they are not explicitly protected. Absolute protection precision (i.e., pp(E, DB) = 1) implies that every inaccessible data item is an explicitly protected data item. In a data base with rich semantic connections, absolute protection is difficult to achieve.

Example 5. The Protection Precision and Access Restrictiveness of an Access Control Mechanism. Here we compute the protection precision of the enforcement mechanism shown in Example 4. P̄ = {X1, X2, X3}.

pp(E, DB) = ((2 + 3 + 2)/3) ÷ (cardinality of {X4, X5, X6, X7, X8, X9}) = (7/3) ÷ 6 = 7/18.
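As a check on this arithmetic, the protection-precision computation can be restated in executable form; the short sketch below is ours and simply re-expresses Examples 4 and 5:

```python
# A small check (not part of the original) of the protection-precision arithmetic
# for Examples 4 and 5. The names X1..X9 stand for the data items of the text.

DB = {f"X{i}" for i in range(1, 10)}
explicitly_protected = {"X1", "X2", "X3"}                                # the explicitly protected set
final_configurations = [{"X5", "X8"}, {"X5", "X7", "X9"}, {"X4", "X6"}]  # A(E, DB)

avg_accessible = sum(len(c) for c in final_configurations) / len(final_configurations)
pp = avg_accessible / len(DB - explicitly_protected)
print(avg_accessible, pp)   # 2.333... and 0.3888... = 7/18
```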
The value 7/18 represents the fact that even though only a third of the elements are explicitly protected the mechanism E must on the average conceal close to two-thirds of the elements to enforce the security requirement. Since protection precision indicates the extent to which elements of DB are necessarily concealed to provide the required protection, it implies, in a sense, the degree of access restrictiveness to concealed data items.

3.2 Concealment Protection
We now examine a class of access control mechanisms for concealment protection. All mechanisms must deny access to any element that is explicitly
protected. They differ in the way in which they prevent access to other elements of the data base.

3.2.1. Derivation-All Protection
A basic function of the mechanism is to prevent derivation of data on the basis of built-in connections. An expedient way to accomplish this is to deny access to any element of any connection that may be used to derive a protected data item.

Example 6. An access control mechanism is used for derivation-all protection to enforce the security requirement that "data on the annual earnings of named employees are not to be revealed" on the data base in Example 2. This mechanism not only denies access to all elements in SALARY^E(name, salary) because these elements must be explicitly protected, but also denies access to all elements of TITLE^E(name, title) and LINESAL^E(title, salary), since TITLE^E(name, title) and LINESAL^E(title, salary) may be composed to yield elements of SALARY^E(name, salary). In this case the mechanism effectively denies access to all data items of the data base, giving a protection precision of 0.
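The derivation-all decision can be sketched as follows (a minimal illustration under our own naming, not the authors' implementation); a whole connection is denied whenever it is explicitly protected or can participate in a composition that derives a protected connection:

```python
# Sketch of the derivation-all decision for the data base of Example 2.
# Connection and function names are ours.

compositions = [            # valid compositions R' . R'' -> R of the schema
    ("TITLE", "LINESAL", "SALARY"),
    ("TITLE", "SALARY", "LINESAL"),
    ("LINESAL", "SALARY", "TITLE"),
]

def denied_connections(explicitly_protected):
    denied = set(explicitly_protected)
    for left, right, result in compositions:
        if result in explicitly_protected:
            denied.update({left, right})
    return denied

# Security requirement of Example 6: protect SALARY(name, salary).
print(denied_connections({"SALARY"}))   # all three connections denied -> protection precision 0
```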
In general, this mechanism operates by denying access to all R'(x, y) and R''(y, z) if R(x, z) is explicitly protected and if R'(x, y) · R''(y, z) → R(x, z) is a connection in the data base under consideration. Is derivation-all protection the only way to control access in Example 6? Certainly not, since revelation of either TITLE^E(name, title) or LINESAL^E(title, salary) exclusively does not allow derivation of (and therefore revelation of) any element in SALARY^E(name, salary).

3.2.2. Derivation-Selective Protection
Another access control mechanism is presented which is less restrictive than derivation-all protection. Derivation-selective protection allows access to data items of a connection if the connection cannot be composed with one or more previously used connections to derive data which are explicitly protected.

Example 7. Let a derivation-selective protection mechanism be used to enforce the same security policy that "data on the annual earnings of named employees are not to be revealed" on the data base of Example 2. This mechanism denies access to all elements of TITLE(name, title) or LINESAL(title, salary) but not both. There are two possible final configurations of DB.A:
{(TITLE; BAUM, GRA), (TITLE; NEE, GRA), (TITLE; KAFFEN, GRA), (TITLE; HSIAO, PROF), (TITLE; KERR, PROF), (TITLE; FORD, VP), (TITLE; NIXON, P)}

or

{(LINESAL; GRA, 5), (LINESAL; GRA, 4), (LINESAL; PROF, 22), (LINESAL; VP, 62)}

The protection precision is ((7 + 4)/2) ÷ 11 = 1/2. In general we have

pp(derivation-all protection, DB) ≤ pp(derivation-selective protection, DB).

Access control mechanisms for derivation-selective protection may deny complete access to data of one or more connections of the data base. This still seems to be rather restrictive, since knowledge of some elements of several composable connections does not imply user knowledge of any element of the derived connection. Thus, in the above example, knowledge of (TITLE; NEE, GRA) and (LINESAL; PROF, 22) does not allow derivation of any element of SALARY(name, salary).
3.2.3. Derivation-Some Protection
If data of a connection R^E(x, y) are explicitly protected, then derivation-some protection prevents simultaneous access to a collection of elements of other connections that allow derivation of any element that satisfies R^E(x, y).

Example 8. Derivation-some protection is used to realize the same security requirement on the same data base as in the previous examples. This mechanism prevents simultaneous access to elements (TITLE; X, Y) and (LINESAL; Y, Z) since they derive the element (SALARY; X, Z). Some of the final configurations of DB.A in this case are

{(TITLE; BAUM, GRA), (TITLE; NEE, GRA), (TITLE; KAFFEN, GRA), (TITLE; HSIAO, PROF), (TITLE; KERR, PROF), (TITLE; NIXON, P), (LINESAL; VP, 62)}

or

{(TITLE; HSIAO, PROF), (TITLE; KERR, PROF), (TITLE; NIXON, P), (LINESAL; GRA, 4), (LINESAL; GRA, 5), (LINESAL; VP, 62)}

or

{(LINESAL; GRA, 4), (LINESAL; GRA, 5), (LINESAL; PROF, 22), (LINESAL; VP, 62)}
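The dynamic check behind derivation-some protection, namely that a component is denied if, together with components already revealed, it would derive an element satisfying a protected connection, can be sketched as follows (our illustration, with hypothetical names and only the TITLE · LINESAL → SALARY composition of Example 2):

```python
# Sketch (ours, not the authors') of the simultaneous-access check used by
# derivation-some protection. Components are (connection, X, Y) triples.

compositions = [("TITLE", "LINESAL", "SALARY")]   # R'(x,y) . R''(y,z) -> R(x,z)

def may_reveal(component, accessed, protected_connections):
    candidate = accessed | {component}
    for left, right, result in compositions:
        if result not in protected_connections:
            continue
        for (r1, x, y) in candidate:
            for (r2, y2, z) in candidate:
                if r1 == left and r2 == right and y == y2:
                    return False        # (result; x, z) would be derivable
    return True

accessed = {("TITLE", "FORD", "VP")}
print(may_reveal(("LINESAL", "VP", 62), accessed, {"SALARY"}))   # False
print(may_reveal(("LINESAL", "GRA", 5), accessed, {"SALARY"}))   # True
```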
In general we have

pp(derivation-selective protection, DB) ≤ pp(derivation-some protection, DB)

and in many cases it is much greater. This indicates that derivation-some protection is usually much less restrictive than derivation-selective protection.

3.2.4. Highly Derivation-Selective Protection
The highly derivation-selective protection is less restrictive in that it allows access to elements if they do not derive an actual element of an explicitly protected connection.

Example 9. A highly derivation-selective protection is used to realize the same security requirement on the same data base as in the previous examples. This mechanism prevents simultaneous access to elements (TITLE; X, Y) and (LINESAL; Y, Z) if the element (SALARY; X, Z) is a member of the component data base. The distinction between protection in this example and protection in Example 8 is that here simultaneous access is prevented to a collection of data items if they derive an existing element of the data base. Some of the final configurations of DB.A in this case are
{(TITLE; BAUM, GRA), (TITLE; NEE, GRA), (TITLE; HSIAO, PROF), (TITLE; KERR, PROF), (TITLE; NIXON, P), (LINESAL; GRA, 4), (LINESAL; VP, 62)}

or

{(TITLE; KAFFEN, GRA), (TITLE; HSIAO, PROF), (TITLE; KERR, PROF), (TITLE; NIXON, P), (LINESAL; GRA, 5), (LINESAL; VP, 62)}
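The only change needed to obtain this behavior from the derivation-some check sketched earlier is a membership test against the component data base; again, this is our illustration rather than the authors' code:

```python
# Sketch (ours) of the highly derivation-selective variant: simultaneous access is
# refused only when the derivable element actually exists in the component data base.

def may_reveal_hds(component, accessed, protected_connections, component_db, compositions):
    candidate = accessed | {component}
    for left, right, result in compositions:
        if result not in protected_connections:
            continue
        for (r1, x, y) in candidate:
            for (r2, y2, z) in candidate:
                if r1 == left and r2 == right and y == y2 and (result, x, z) in component_db:
                    return False
    return True

component_db = {("SALARY", "KAFFEN", 4), ("SALARY", "BAUM", 5)}   # fragment of Example 2
comps = [("TITLE", "LINESAL", "SALARY")]
acc = {("TITLE", "KAFFEN", "GRA")}
print(may_reveal_hds(("LINESAL", "GRA", 5), acc, {"SALARY"}, component_db, comps))  # True: (SALARY; KAFFEN, 5) does not exist
print(may_reveal_hds(("LINESAL", "GRA", 4), acc, {"SALARY"}, component_db, comps))  # False: (SALARY; KAFFEN, 4) exists
```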
In general we have

pp(derivation-some protection, DB) ≤ pp(highly derivation-selective protection, DB).

3.2.5. Elemental-Derivation-Sensitive Protection
We have previously described four mechanisms that provide protection with decreasing restrictiveness on explicitly protected data. However, these explicitly protected data constitute an entire connection. Thus, protection of the data amounts to protection of the connection. We now consider security requirements which require protection of subsets of
connections. The first three protection mechanisms discussed in this section are incapable of providing this kind of protection since they are not dependent on actual occurrences of elements in a connection. The highly derivation-selective access control mechanism may be generalized into the elemental-derivation-sensitive access control mechanism, which provides protection for subsets of connections. This mechanism operates by preventing simultaneous revelation of a collection of elements which derive a protected element of a connection.

Example 10. Elemental-derivation-sensitive protection is used to realize the security requirement on the data base of Example 2, that data on annual earnings of employees NEE and HSIAO, i.e., (SALARY; NEE, 5) and (SALARY; HSIAO, 22), may not be revealed. In this case, the access control mechanism operates by preventing simultaneous access to (TITLE; X, Y) and (LINESAL; Y, Z) if (SALARY; X, Z) is protected. A final configuration of DB.P in this case is
{(SALARY; NEE, 5), (SALARY; HSIAO, 22), (TITLE; HSIAO, PROF), (TITLE; NEE, GRA)}
3.3 Protection of Single-Attribute Data
Some of the access control mechanisms of the previous section that are applicable to protection of connections involving two attributes may be extended to protection of connections having only one attribute. A single-attribute connection is represented by the notational form R(x) where x is an attribute and R is the name of the connection. A connection R(x) is a subset of [x]. A set of existing elements that satisfy a connection R(x) is represented by R^E(x). The notion of projection and a new definition of composition are introduced here. The x projection of R(x, y) is the set
{(X) | (X, Y) ∈ R(x, y)}; and the y projection of R(x, y) is the set {(Y) | (X, Y) ∈ R(x, y)}. Composition in this case represents the situation in which two elements of single-attribute connections are combined to form an element that satisfies a connection. This is represented by the notation R(x) · R'(y) → R''(x, y), and it means that each element of the set R(x) × R'(y) satisfies R''(x, y).
Relating projection to composition, we represent the constraint that a single-attribute connection R(x) is the x projection of R'(x, y) by the notation

R'(x, y) → R(x).
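Projection and the single-attribute composition can be illustrated with the SALARY connection of Example 11 below; the sketch is ours and the function names are hypothetical:

```python
# A minimal sketch (ours) of the x and y projections and of the single-attribute
# composition R(x) . R'(y) -> R''(x, y), using the SALARY data of Example 11.

SALARY = {("BAUM", 5), ("NEE", 5), ("KAFFEN", 4), ("HSIAO", 22), ("KERR", 22), ("FORD", 62)}

def x_projection(r):
    return {x for (x, y) in r}

def y_projection(r):
    return {y for (x, y) in r}

SALARY_NAME = x_projection(SALARY)          # SALARY.NAME(name)
SALARY_EARN = y_projection(SALARY)          # SALARY.EARN(salary)

# Composing the two single-attribute connections: every pair of the Cartesian
# product is a plausible element of SALARY(name, salary).
composed = {(n, s) for n in SALARY_NAME for s in SALARY_EARN}
print(len(composed))    # 24 plausible pairs, of which only the 6 above actually exist
```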
The inclusion of single-attribute connections in the model requires two other additions to the model: (1) data base schemata may contain single-attribute connections; (2) the couple (R; X) is a component of the component data base if (X) is an element of R(x).

Example 11. A Sample Data Base with Two Types of Semantic Connections.

SALARY^E(name, salary)   {(BAUM, 5), (NEE, 5), (KAFFEN, 4), (HSIAO, 22), (KERR, 22), (FORD, 62)}

SALARY.NAME^E(name)   {(BAUM), (NEE), (KAFFEN), (HSIAO), (KERR), (FORD)}

SALARY.EARN^E(salary)   {(5), (4), (22), (62)}
The component data base is
{(SALARY; BAUM, 5), (SALARY; NEE, 5), (SALARY; KAFFEN, 4), (SALARY; HSIAO, 22), (SALARY; KERR, 22), (SALARY; FORD, 62),
(SALARY.NAME; BAUM), (SALARY.NAME; NEE), (SALARY.NAME; KAFFEN), (SALARY.NAME; HSIAO), (SALARY.NAME; KERR), (SALARY.NAME; FORD),
(SALARY.EARN; 5), (SALARY.EARN; 4), (SALARY.EARN; 22), (SALARY.EARN; 62)}
Valid derivations are:

SALARY(name, salary) → SALARY.NAME(name)    (a projection)
SALARY(name, salary) → SALARY.EARN(salary)    (a projection)
SALARY.NAME(name) · SALARY.EARN(salary) → SALARY(name, salary)    (a composition)
The following examples show the applicability of the various access control mechanisms to the extended data model.

Example 12. A derivation-all access control mechanism is used to enforce the security requirement "do not reveal data on the salary of a named employee" on the data base in Example 11. This mechanism requires that access be denied to all elements of the component data base, since SALARY.NAME(name) and SALARY.EARN(salary) derive SALARY(name, salary).

Example 13. A derivation-all access control mechanism is used to enforce the security requirement "do not reveal the names of salaried employees" on the data base in Example 11. The mechanism denies access to all elements of SALARY.NAME(name) and SALARY(name, salary) and allows access to SALARY.EARN(salary).

Example 14. A derivation-selective access control mechanism is used to enforce the security requirement "the data on annual earnings of named employees are not to be revealed" on the data base in Example 11. This mechanism denies access to SALARY(name, salary) and one of the two connections SALARY.NAME(name) and SALARY.EARN(salary). We observe in this example that the finer "granularity" of the component data base (as is possible with single-attribute connections) allows a security requirement to be enforced with less restrictiveness [if SALARY(name, salary) were the only connection in the data base, then the above security requirement would deny access to all elements of the component data base]. This shows that the restrictiveness of a security requirement is ultimately dependent on the type of the enforcement mechanism and the structure of semantic connections of the data base.

Example 15. A highly derivation-selective access control mechanism is used to enforce the same security requirement on the data base in Example 11. In this case simultaneous access is denied to the element (SALARY.NAME; X) and the element (SALARY.EARN; Y) of DB if (SALARY; X, Y) is a member of DB.
For data bases with only single-attribute connections, both the derivation-some access control mechanisms and the derivation-selective access control mechanisms prevent access to the same data items for a given security policy. In other words, the two mechanisms are equally restrictive. The application of elemental-derivation-sensitive access control mechanisms to protect subsets of data items with a single attribute is straightforward. We shall not elaborate here.
3.4 Limitations of Concealment Protection
Suppose the connection SALARY.NAME^E(name) and the connection SALARY.EARN^E(salary) of Example 11 are revealed to the user: what does the user know about SALARY^E(name, salary)? Assuming he has no source of external (to the data base) information, the user can establish that the actual elements of SALARY^E(name, salary) are a subset of the Cartesian product SALARY.NAME^E(name) × SALARY.EARN^E(salary). Nevertheless, he cannot establish which elements actually make up this subset. The access control mechanisms described so far assume that the user is capable of determining which elements of SALARY.NAME^E(name) × SALARY.EARN^E(salary) are actually in SALARY^E(name, salary), and they counter this by preventing derivations of a potential element. This assumption may be overly strong. What the assumption implies is that every meaningful derivation of existing semantic connections must be known to (i.e., included in) the data base; otherwise, the mechanisms do not know how to conceal elements of existing connections which may lead to meaningful information about elements of connections unknown to the data base. If a derivation, for example,

SALARY.NAME(name) · SALARY.EARN(salary) → SALARY(name, salary)

is not included in the data base, then the enforcement mechanisms in this case assume that the user has no knowledge about SALARY(name, salary). Consequently, they allow access to elements of SALARY.NAME^E(name) and SALARY.EARN^E(salary). This seems unreasonable. By receiving these elements, the user effectively has the superset SALARY.NAME^E(name) × SALARY.EARN^E(salary), which does give the user some information about SALARY(name, salary). The following example is used to illustrate the point.

Example 16. The Relation among Semantics, Accessed Data, and Knowledge. Suppose the following elements of the component data base of Example 2 have been accessed by the user:
(TITLE; BAUM, GRA), (TITLE; NEE, GRA), (TITLE; KAFFEN, GRA), (TITLE; HSIAO, PROF), (TITLE; KERR, PROF), (TITLE; FORD, VP), (TITLE; NIXON, P),
(LINESAL; GRA, 5), (LINESAL; GRA, 4), (LINESAL; PROF, 22), (LINESAL; VP, 62)
What information do these elements give the user about SALARY (name, salary)? By way of composition, he can establish that SALARY(name, salary) must be a subset of
{(SALARY; BAUM, 5), (SALARY; BAUM, 4), (SALARY; NEE, 5), (SALARY; NEE, 4), (SALARY; KAFFEN, 5), (SALARY; KAFFEN, 4), (SALARY; HSIAO, 22), (SALARY; KERR, 22), (SALARY; FORD, 62)}
From this composition the user knows the salary of HSIAO, KERR, and FORD and knows that the salary of BAUM, NEE, and KAFFEN is either 4 or 5. The structure of the data base plus the semantics of the data previously accessed allows the user to establish the existence of some elements in the data base.
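The inference of Example 16 can be reproduced directly; the sketch below is ours and uses only the TITLE · LINESAL → SALARY composition of Example 2:

```python
# Sketch (ours) of what the user of Example 16 can infer: from the accessed TITLE
# and LINESAL components he composes a superset of SALARY(name, salary).

accessed_TITLE = {("BAUM", "GRA"), ("NEE", "GRA"), ("KAFFEN", "GRA"),
                  ("HSIAO", "PROF"), ("KERR", "PROF"), ("FORD", "VP"), ("NIXON", "P")}
accessed_LINESAL = {("GRA", 5), ("GRA", 4), ("PROF", 22), ("VP", 62)}

candidates = {(name, sal) for (name, title) in accessed_TITLE
                          for (t, sal) in accessed_LINESAL if t == title}

for name in sorted({n for (n, _) in candidates}):
    print(name, sorted(s for (n, s) in candidates if n == name))
# BAUM, NEE, KAFFEN -> [4, 5] (two possibilities); HSIAO, KERR -> [22]; FORD -> [62]
```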
This problem could be resolved if a way of recording the fact that some information has been gained about a connection (although perhaps not enough to establish its existence in the data base with certainty) could be developed. In the next section a form of "knowledge" recording along with a new type of mechanism is proposed which lays the groundwork for dealing with the problem set forth in this section.
4. Some Thoughts on Information-Theoretic Protection
Information-theoretic protection is based on a probabilistic approach to assessing the user's knowledge of the data base. For the assessment, the mechanism assigns to each element of the data base an "existence probability" on the basis of the component information revealed to the user and the schemata of the data base. The term information-theoretic protection is motivated by the probabilistic nature of the assessment mechanism.

4.1 The Measurement of Knowledge of a Data Base
Given that he has complete knowledge of the structural information of the data base, the assessment mechanism measures the user's knowledge of the elements of the component data base. The data base schemata provide the user with knowledge of the attributes of all of the data items in the data base. In other words, for each connection R^E(x, y) in the data base, he knows that R^E(x, y) is a subset of

{all plausible data items with attribute x} × {all plausible data items with attribute y}.    (1)
For each connection R^E(x) he also knows that R^E(x) is a subset of

{the set of all plausible data items with attribute x}.    (2)
He can therefore establish a superset [i.e., the union of (1) and (2)] of the elements in the data base. This superset of all plausible components of the data base is called the primordial component data base (PrDB). Subsequently, the user interacts with the data base to determine which elements of PrDB are actually in DB. During the user-data base interaction, the mechanism monitors the user's knowledge of an element (R; X, Y) of DB and also his knowledge of an element (R; X) of DB. An element is known with certainty if its existence within the component data base has been established. At any instant the primordial component data base may be partitioned into two sets: a set PrDB.K containing all elements that are known (to the user) with certainty and a set PrDB.U containing all other elements. Obviously, the user wants to establish those elements of PrDB.U which are indeed in the data base. Similarly, the set PrDB.U can be partitioned into PrDB.U.K and PrDB.U.U. Such a partition is termed a tentative configuration of PrDB.U. A tentative configuration of PrDB.U is implausible if some of the elements of PrDB.U.K cannot be in the data base, and is plausible otherwise. The set of all plausible tentative configurations of PrDB.U is called the configuration space. Ideally, there is no a priori reason to assume that any configuration of the configuration space is more likely than any other to be the actual partitioning of PrDB.U into those elements which are in the data base and those which are not. The existence probability of an element is the ratio of the number of plausible tentative configurations of which it is a member to the number of possible tentative configurations. The degree of user knowledge of an item is directly proportional to the existence probability of that item. The structural information known to the user is not static. For example, each access request provides new structural information that must be considered by the mechanism thereafter. In principle, the notion of statistical protection and access control (Hansen, 1971) can be embodied within the framework by properly defining the structural information induced by such accesses.

4.2 Information-Theoretic Protection
Protection of a data base whose access history is characterized by a collection of existence probabilities is achieved by specifying existence probability thresholds, called protection thresholds, for each element of the data base. An access is denied if it causes one or more elements' existence probability to rise over its protection threshold. A data base security requirement in this type of protection determines the protection thresholds.
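A toy rendering (ours, not part of the authors' proposal) of this bookkeeping might look as follows: enumerate the tentative configurations of the unknown elements, keep the plausible ones, and compute each element's existence probability as defined above. The plausibility predicate stands in for whatever structural information the mechanism has accumulated:

```python
# Toy sketch (ours) of the existence-probability bookkeeping described above.

from itertools import combinations

def existence_probabilities(unknown, plausible):
    configs = [set(c) for r in range(len(unknown) + 1)
                      for c in combinations(sorted(unknown), r)]
    plausible_configs = [c for c in configs if plausible(c)]
    # ratio of plausible configurations containing the element to all possible configurations
    return {e: sum(e in c for c in plausible_configs) / len(configs) for e in unknown}

# Hypothetical case: two candidate elements, but at most one of them can be in the data base.
probs = existence_probabilities({"a", "b"}, plausible=lambda c: len(c) <= 1)
print(probs)                       # {'a': 0.25, 'b': 0.25}
```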
5. Can We Build Information Secure Systems?
For implementation of an information secure system based on information-theoretic protection, a simple mechanism that approximates existence probabilities in a consistent manner is necessary. The information-theoretic protection as described above does not provide a viable method for actually implementing a system, since the amount of processing and storage required is astronomical. Furthermore, specification of all structural information known to the user is practically impossible. This protection consideration does introduce a consistent way of allowing an element of the data base to be considered by the system as being less than certainly known but more than definitely unknown. On the other hand, there are solutions to the implementation of mechanisms based on concealment protection. We shall consider briefly all five types of the access control mechanisms. Furthermore, we shall single out two of the mechanisms for elaboration on their implementation. All the mechanisms for concealment protection may induce multiple static partitionings of the data base. In some cases certain partitionings may be considered "better" than others. For example, it may be undesirable to have a final configuration of DB.A which has no elements from one or more connections. The lack of connections indicates that the set of accessible data is poor in semantics. To prevent such undesirable partitionings, a look-ahead scheme may be needed to shape the final configuration into an acceptable form. Except for the derivation-all, all other concealment protection mechanisms require access-history keeping. Derivation-all protection does not require such a history, since all access decisions are independent of previous access operations. Thus, derivation-all mechanisms can be implemented in a memory-less environment. Derivation-selective protection requires that a history of accessed connections be noted. The remaining three types of protection all require that a history of each accessed component of accessed connections be kept. The amount of storage required for this history may be great relative to the data base itself since each user potentially requires an independent history store. Derivation-all protection is the easiest to realize since protection is based only on the names of connections and does not require a history. Derivation-selective protection is expensive since some history information (keeping track of noted connections) must be accessed before protection decisions can be made. Derivation-some protection is much more expensive than derivation-selective protection since the access decision requires that access history on components of connections be used in conjunction with valid derivations to determine all possible derivable components. Highly
derivation-selective protection and elemental-derivation-sensitive protection must also access the component data base to determine the validity of components, further raising their cost over that of derivation-some protection. Elemental-derivation-sensitive protection is more expensive than highly derivation-selective protection since a specified subset of a protected connection (rather than all of its elements) must be accessible to the system. The basic operation of an access control and concealment protection mechanism is as follows: whenever an access request requires that a component be revealed to the user the information secure system must determine whether or not its revelation (and subsequent addition to DB.A) results in derived data items that are not permitted; when this happens the component may not be revealed. The degree of complexity introduced into a mechanism by requiring it to perform in this manner should not be underestimated. The system must retrieve a large amount of data and perform a considerable amount of processing to build up all derivation sequences. Adequate performance of the system may require some form of grouping or partitioning of components to facilitate rapid processing of derivations.

5.1 Information Secure Systems with Single-Attribute Protection
Since derivation-all access control mechanisms require no access history keeping and since semantic connections that are only single-attribute are easier to handle, there have been very good advances in developing information secure systems which can handle single-attribute-based data with effective access control and privacy protection mechanisms. McCauley called the security requirements which demand no history keeping on the part of the system the context-free protection specification (Hsiao et al., 1974a). More specifically, he defined the context-free protection specification as a protection specification which does not depend upon the previous access attempts (permitted or denied) by any of the users governed by the specification. An example of the context-free protection specification, known as TYPE.5 specification in Hsiao et al. (1974a), is given as follows:

TYPE.5(U, Q, deny)

where U is a set of users and Q is a boolean expression of data attributes. Note that this is a logical protection specification, and that the specification is posed completely in terms of the user's view of the data base, i.e.,
a boolean expression of data attributes. It may be that the set of data (say, records) satisfying Q is null, in which case the specification has no effect. Let us illustrate the use of the TYPE.5 protection specification with a simple example. A small data base consists of ten records which are characterized by four different attributes. The record addresses and their attributes are depicted in Fig. 3 and the structure of the data base in Fig. 4. For the structure, we use the numbered circular node to denote the record at the address so numbered. Along with each edge directed to a node there is an attribute indicating that the node (therefore, the record) is characterized by the attribute. If there are several edges directed to a node, then the node is characterized by several attributes. For example, the record at 8 is characterized by attributes A1, A3, and A4. Obviously, Fig. 4 is a graphical representation of Fig. 3. In this discussion the directory is a special node which can only be accessed by the system; thus no attribute leading to the directory is known to the user. In general, the directory may be records; access to and protection of directories can be handled in the same way as records. However, for ease of discussion, we shall not consider the generalization in this example. Now consider the following specification of a protection pattern:

TYPE.5({U1}, Q, deny)    where Q = A2 ∧ ((A1 ∧ ¬A3) ∨ (A3 ∧ A4)).
Addresses of Records Characterized by the Attribute A1: 1, 2, 4, 5, 7, 8, 10
Addresses of Records Characterized by the Attribute A2: 2, 7, 10
Addresses of Records Characterized by the Attribute A3: 1, 4, 5, 8
Addresses of Records Characterized by the Attribute A4: 3, 5, 6, 8, 9
FIG.3. The records of a data base.
FIG.4. The structure of the data base.
This specification indicates that the system must deny user U1 access to any record in the data base for which Q is true. Thus, this user has only a portion of the data base for access and his view of the data base is depicted in Fig. 5. By comparing Figs. 4 and 5, we note that protected data (i.e., those that satisfy the TYPE.5 specification) are "concealed" and do not appear in the user's view of the data base. To improve the system accessing performance and to isolate one data aggregate from another aggregate of different security requirements, McCauley applied the concept of the attribute-atom (Wong and Chiang, 1971). The attribute-atoms are minterms of the data attributes in the system. For example, in the data base as depicted in Fig. 3, there are four such attribute-atoms.
Attribute-atoms and the addresses of records satisfying each atom:

A1 ∧ ¬A2 ∧ A3 ∧ ¬A4:   1, 4
A1 ∧ A2 ∧ ¬A3 ∧ ¬A4:   2, 7, 10
¬A1 ∧ ¬A2 ∧ ¬A3 ∧ A4:   3, 6, 9
A1 ∧ ¬A2 ∧ A3 ∧ A4:   5, 8
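These atoms, and the records denied to U1 by the TYPE.5 specification above, can be recomputed mechanically from the data of Fig. 3; the following sketch is ours and is not part of McCauley's design:

```python
# Sketch (ours) of the attribute-atom computation for the data base of Fig. 3 and
# of the TYPE.5 decision: a record is denied to U1 exactly when Q is true of it.

records = {
    "A1": {1, 2, 4, 5, 7, 8, 10},
    "A2": {2, 7, 10},
    "A3": {1, 4, 5, 8},
    "A4": {3, 5, 6, 8, 9},
}
all_addresses = set().union(*records.values())

# Group records by their minterm (attribute-atom).
atoms = {}
for r in all_addresses:
    key = tuple(sorted(a for a in records if r in records[a]))
    atoms.setdefault(key, set()).add(r)
print(atoms)   # four atoms, grouping the records {1,4}, {2,7,10}, {3,6,9}, and {5,8}

def Q(r):      # Q = A2 and ((A1 and not A3) or (A3 and A4))
    a = {name: r in records[name] for name in records}
    return a["A2"] and ((a["A1"] and not a["A3"]) or (a["A3"] and a["A4"]))

print(sorted(r for r in all_addresses if Q(r)))    # [2, 7, 10] -- the records denied to U1
```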
FIG.5. The user’s view of his data base.
Let us now return to the same TYPE.5 specification. The boolean expression Q can be expanded into disjunctive canonical form as follows:

Q = A2 ∧ ((A1 ∧ ¬A3) ∨ (A3 ∧ A4))
  = (A1 ∧ A2 ∧ ¬A3) ∨ (A2 ∧ A3 ∧ A4)
  = (A1 ∧ A2 ∧ ¬A3 ∧ A4) ∨ (A1 ∧ A2 ∧ ¬A3 ∧ ¬A4) ∨ (A1 ∧ A2 ∧ A3 ∧ A4) ∨ (¬A1 ∧ A2 ∧ A3 ∧ A4)

By comparing the above four conjuncts derived from Q with the atoms of the data base, we learn that only the conjunct (A1 ∧ A2 ∧ ¬A3 ∧ ¬A4) is an atom. Furthermore, we also learn that the records for which the atom is true are the records at 2, 7, and 10. Thus, we can conclude from the TYPE.5 specification that the user U1 is to be denied access to records at 2, 7, and 10. Thus, the user U1's view of his data base, as depicted in Fig. 5, does not include the records at 2, 7, and 10. The use of atoms to partition the data base into mutually exclusive subsets for protection specification is powerful and effective. It is powerful because all retrievable information can be protected. Since any data retrieval is a response to the user's query and a query is a boolean expression of attributes, the same expression can be used for protection specification. It is also effective because the atom is a logical specification which is independent of the structure and implementation of the data. The reliance on attributes is not a restriction, since attributes may be either symbolic names in their most sophisticated form or numeric identifiers in their primitive form. Further, McCauley suggested that the data base is to be organized by security-atoms (Hsiao and McCauley, 1974). In this way, not only is it
never necessary to reorganize the data base no matter how frequently the protection specification is changed, but it is also possible to implement the logical partitions as physical "fire-walls" utilizing boundaries of memory units and storage devices. Let us illustrate these points with the same data base depicted in Fig. 3 and create a structure of the data base in Fig. 6, in which the numbered circular node denotes the record at the address so numbered and the edge directed to a node is associated with the attribute-atom which characterizes the node. We note in Fig. 6 that the list of records satisfying one attribute-atom does not intersect the list of records satisfying another attribute-atom, although they all begin at the directory. Assuming that the TYPE.5 protection specification requires the Q to be (A3 ∧ A4), then the data base can be partitioned into those records which satisfy (A3 ∧ A4) and those which do not. A partition of records is shown with dotted lines as depicted in Fig. 7. In this case, data satisfying (A3 ∧ A4) are said to constitute a security-atom. It is interesting to note that the edges in the security-atom do not lead to records outside of the security-atom. In addition, it is possible to have a security-atom as "fine" as the attribute-atom. In other words, data characterized by different attributes in this data base can be protected with different security requirements. Furthermore, the protected data are logically separated from other data and can be physically stored in separate storage units. If we interpret the edges as access paths, it is apparent that access to shareable data need not
FIG. 6. The attribute-atom-based structure of the data base.
FIG. 7. A security-atom-based data base structure.
pass through the protected data. Not only are protection and access precision improved, but the security restriction is also maintained.

5.2 Information Secure Systems with Context Protection
The work of McCauley essentially resolves the problem of protection of data that have one or more attributes in common. However, there are also advances in the case of protection of semantically connected data whose attributes are different. In other words, using the terminology established in Section 3, we will point out solutions to protect data (X, Y) whose attributes are x and y, where x and y are not the same and whose semantic connection is a certain R(x, y). This is of course a more complex form of protection than the simple attribute one. Furthermore, it requires access history keeping. Nee introduced a semantic connection for protection C(x, y) (Hsiao et al., 1974b), known as context protection, which determines the access rights to the set [x] (i.e., the set of all data items with attribute x) if the set [y] has been accessed. The determination is based on a set of enforcement and resolution rules which we will elaborate in the sequel. Meanwhile, let us familiarize ourselves with the notion of context protection and its terminology. We shall use some graphic representations again. In Fig. 8, the node x represents the set [x] of all data items with attribute x and is
called the text node. The node y is called the context node. These two nodes are connected by a directed edge from context to text. Let A(x) and A(y) denote the sets of access rights to [x] and [y], respectively. We have, in addition, a set S of access rights imposed for the access rights in A(y). The user is denoted with a square node. For example, if A(y) = {a1, a2}, then S = {S(a1), S(a2)}, where S(a1) and S(a2) are sets of access rights associated with the edge and imposed by a1 and a2, respectively. When an access to [y] is granted with access right ai to a user, the new access rights to [x] will be determined jointly by A(x) and S(ai). The ways to determine the new access rights to [x] based on the rights to [x] [i.e., A(x)] and the imposed rights (i.e., S) will be given later in this section. On the other hand, if the data set [y] has never been accessed, the imposed access rights have no effect on the subsequent access to data [x]. Intuitively, A(x) and A(y) are used to determine access requirements of [x] and [y], respectively, when no context protection is involved. However, if there is any need of context protection, a set S of access requirements is imposed which exercises control over the access requirements of the text, in this case, [x]. This control is related to every access control requirement of the context [y]. In other words, for each access requirement ai of A(y) there is a related set of access requirements S(ai) of S. The S(ai) will be used in conjunction with A(x) to form new access control requirements for accessing [x]. With these new access control requirements, the protection of [x] in the context of [y] can be assured. We note that in addition to the edge directed from a context node to a text node, there are request and grant edges. When there are two or more context nodes in a user-directed graph, the S's may be subscripted by the names of the context nodes. For example, if y and z are two context nodes, then we may have S_y and S_z, each of which is associated with an edge leading from the respective nodes. Let us consider a simple example of two data sets.
D1 can be printed (by a user).
D2 can be printed.
D1 cannot be printed if D2 has been printed.
FIG. 8. A semantic connection of [x] and [y].
From the above specifications, we can construct the relation C(D1, D2) with A(D1) = A(D2) = {P} (P for print) and S = {S(P)} = {none}, where "none" means no access at all. Graphically, the edge from the context node D2 to the text node D1 is labeled with the imposed set "none".
Now the new access right of D1 will be "none" if D2 has been printed, since the rule (to be discussed in the sequel) says that the existing access right P to D1 must be replaced by the new right "none" for accessing D1. Two rules are now included which can determine the new access rights to a text node and govern the granting and denying of an access to a context node. The granting and denying of an access to a text node require no rules and are straightforward. Before introducing the rules, we introduce the following conventions. The access rights given in S are represented by lower case letters, e.g., r (read), w (write), p (print), etc. This is to distinguish them from the access rights which are initially granted to a circular node. Another point to be mentioned is that the set A(x), where x is a data set, usually consists of more than one access right. In order to distinguish which access right is granted to a user, the letter representing the granted right is flagged. Although a grant edge is needed each time a new right to the data set is granted to the user, multiple grant edges for the same data set are not necessary. Since granted rights are flagged, we can simply identify the flags. Thus, a single grant edge for the data set will suffice. The operations in the rules are modified set operations because A(x), S, and A(y) are considered as sets of access rights. In the rules, "−", "∧", and "←" are set difference, intersection, and assignment, respectively, which are constrained by the additional considerations. Formally, if A and B are two sets, then
A − B = {z | z ∈ A and z ∉ B},
A ∧ B = {z | z ∈ A and z ∈ B},
A ← B = {z | z ∈ B},
where z is an access right, and whether it is in a capital letter, a small letter, or a flagged letter (written here with an asterisk, e.g., R*), it is used as the same right. Thus, r, R, and R* are treated in the set operations as the same element. For example, if A = {R, P} and B = {r}, then A − B = {P}. That is, R and r are considered as the same element in the set operation. Furthermore, small letters override the corresponding capital letters in the set intersection operation. For instance, if A = {W, P} and B = {w}, then A ∧ B = {w}. Finally, all flags are
retained in the assignment and intersection operations in order to keep track of the granted accesses. Suppose A = {W*, P} and B = {w, e}. Then

A ← B = {w*, e}   and   A ∧ B = {w*}.
Now the rules are given below. In addition to the notations CN for the context node and TN for the text node, we use the symbol "ar" to mean the access right to CN for which the user has just made an access request.

Enforcement Rule 1: If ar ∩ A(CN) is not empty, then the access ar is permitted on CN. If ar ∩ A(CN) is empty, then granting of the user's request will result in a violation of context protection.

Enforcement Rule 2: Whenever an access ar is granted to CN the following resolution algorithm must be applied.
1. Since the access right ar is permitted to the data set x, flag ar in A(x).
2. Determine if there is a nonempty S_x. If none, the algorithm is complete.
3. Since S_x ≠ ∅, replace A(TN) by A(TN) ∧ S(ar).
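The two rules can be rendered as a short procedure (our sketch, with simplified data structures; the flag is modelled as a boolean, the case distinction of letters is dropped, and the D1/D2 data are those of Sample 1 below):

```python
# Sketch (ours) of the two enforcement rules. An access-right set is a dict
# {right_letter: flagged}; the imposed set S maps a right granted on the context
# node to the letters the text node may keep. A request on the text node itself
# needs no rules: it is granted whenever the letter is present in A[tn].

def request(ar, cn, tn, A, S):
    """Rule 1: permit ar on the context node cn only if ar is in A[cn].
       Rule 2: flag ar, then replace A[tn] by A[tn] 'intersected' with S[ar]
       (letters not in S[ar] are dropped; flags of surviving letters are kept)."""
    if ar not in A[cn]:
        return False                       # violation of context protection
    A[cn][ar] = True                       # flag the granted right
    imposed = S.get(ar)
    if imposed is not None:
        A[tn] = {r: f for r, f in A[tn].items() if r in imposed}
    return True

# Sample 1 below: D2 is the context node, D1 the text node.
A = {"D1": {"W": False, "P": False}, "D2": {"W": False, "P": False}}
S_D2 = {"W": {"W", "P"}, "P": {"W"}}       # S(W) = {w, p}, S(P) = {w}

print(request("P", "D2", "D1", A, S_D2))   # Job 4, first request: granted
print("P" in A["D1"])                      # False -- printing D1 is now impossible
```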
With the availability of the two rules, the access control mechanism of an information secure system can grant and deny requests and reveal violations. We now illustrate the application of context protection to job processing.

Sample 1: Enforcement of a more advanced case of context protection.

User data base: user name U; data sets D1 and D2.
Protection specifications:
D1 can be written or printed, i.e., W and P ∈ A(D1);
D2 can be written or printed, i.e., W and P ∈ A(D2); and
D1 cannot be printed if D2 has been printed, i.e., S.

User-directed graph (UDG):

S = {S(W), S(P)} where S(W) = {w, p} and S(P) = {w}
Jobs (For ease of discussion, we suppose that only one job is ever submitted by the user U. This job will be one of the following four jobs.)

Job 1: Write D1; Write D2
According to the security requirements, both requests of the job can be granted. The first request is granted because the proper access right W is in A ( D 1 ) . After the first request is granted, the UDG appears as follows:
where S(W) = {w, p} and S(P) = {w}
We note that access right W to D1 is flagged, indicating that a write request has been granted. After applying the resolution algorithm of rule 2 the UDG remains unchanged since S_D1 is empty. The second request is cleared by enforcement rule 1 with A(D2) = {W, P}. After application of the resolution algorithm the final graph becomes
We note that the access rights to D1 are replaced by {w*, p} as dictated in rule 2, since A(TN) ∧ S(ar) implies A(D1) ∧ S(W) = {w*, p}.

Job 2: Write D1; Print D2

Again, both requests in the job can be granted according to the specifications. The second request is granted because the first request does not prohibit print access to D2. The final UDG is
Job 3: Print D1; Print D2

In this case both requests can be granted. The first request is granted since
P is in A(D1). After applying Rule 2 the UDG becomes
When the second request is received, it too is granted, since P is still in A(D2). The final UDG then becomes
Notice that since S_D2(P) does not include P it will not be possible to print D1 again.

Job 4: Print D2; Print D1

In this case only the first request of the job can be granted. The second is not granted because, once D2 has been printed, D1 may not be printed. More specifically, the UDG after the first request is
It is now clear that the second request of the job for printing data unit D1 cannot be granted because P ∉ A(D1).

Sample 2: Enforcement of a more elaborate case of context protection involving several contexts.
User data base: user name U; data sets D1, D2, and D3.
Protection specifications:
D1 can be written or printed;
D2 can be written or printed;
D3 can be executed;
D1 cannot be executed if D2 has either been printed or written; and
D1 can be executed only if D3 has been executed.
It is worth noting that there are five protection specifications in this data base. The first three specifications dealing with data sets individually are therefore free from context protection considerations. The last two specifications are context-dependent. Furthermore, we note that the data set D1 is the text node of the context nodes D2 and D3. More specifically, the above specifications can be represented symbolically as follows:
User-directed graph:
Jobs (The assumption that only one job is ever submitted by the user U for processing is intended to simplify the discussion. The one job will be either of the following two.)

Job 1: Write D2; Print D1; Execute D3

The first request of the job can be granted since W ∈ A(D2). By employing
the resolution algorithm the UDG becomes:
Note that A(D1) ∧ S_D2(W) is {W, P, E} ∧ {w, p} = {w, p}. Since p ∈ A(D1) the second request is also granted. Since there is no S_D1, the UDG appears as follows:
The third request is also granted since E ∈ A(D3). Since S_D3(E) ∧ A(D1) equals {e, w, p} ∧ {w, p}, which is {w, p}, the final UDG is
Job 2: Write D2; Execute D3; Execute D1
The first two requests can be granted, but the last one must be denied due to context-dependent access control requirements. After D2 has been written, the UDG for this job is the same as the first graph of Job 1. The second request is granted since E ∈ A(D3). For this case we have A(D1) = {w, p} and S_D3(E) = {e, w, p}, and so A(D1) ∧ S_D3(E) = {w, p}. Therefore, after the second request the UDG appears as
Now the third request cannot be granted since E ∩ A(D1) = ∅.
6. Summary and Prospectus
By using a simple model we are able to show that an important part of an information secure system is the access control and privacy protection mechanism. Being represented as data in the data base, the information is rich in semantics. The semantically connected data structure has an overwhelming impact on the restrictiveness and complexity of the mechanisms. Restrictiveness is a measure of the amount of information denied to the user in the course of enforcing a security requirement and can be formalized in terms of protection precision. Complexity here is a reflection of the design and implementation difficulty of a mechanism in computer systems. We have formalized these notions and illustrated them with examples. Five mechanisms of increasing complexity and decreasing restrictiveness (i.e., increasing protection precision) are outlined. Some of their implementation considerations are discussed. Another mechanism, information-theoretic protection, is proposed as a solution to some of the problems of concealment protection. Some thoughts on information-theoretic protection are presented in an attempt to lay the groundwork for future study. Among mechanisms for concealment protection, we have reported the advancement on attribute-based mechanisms which protect data having one or more attributes in common. The advance in context protection is encouraging. Because context protection can enforce new access to data
sets in a context where other data sets have been accessed, there is the need for access history keeping. Not only must original and new access rights to the data sets (i.e., text) be remembered by the system, but the rights to the previously accessed data sets (i.e., context) must also be recorded by the system. Access history keeping is therefore time consuming and space consuming in an information secure system. Nevertheless, this advance represents the first breakthrough in protection of semantically rich data bases. We foresee that highly efficient information secure systems with an attribute-based protection mechanism will soon be realized in computer hardware. Computers with built-in security will likely have single- or multiple-attribute-based protection mechanisms. Software experimentation with context protection will not be far away. With decreasing cost of secondary memories and increasing performance of processors, the task of keeping certain amounts of access history may not be unrealistic. Theoretical work on more exotic access control and privacy protection will be forthcoming. The model presented in this paper can serve as a theoretical framework for the study of protection mechanisms for future information secure systems.

ACKNOWLEDGMENT

The authors gratefully acknowledge the continuous support of the Office of Naval Research. This research was supported under grant N00014-67-0232-0022. We also wish to express appreciation to E. J. McCauley III and C. J. Nee, who collaborated in some of the research. We also thank our colleagues for reading and commenting on the work reported herein: Stockton Gaines of Rand, Giorgio Ingargiola of Caltech, Gerald Popek of UCLA, Michael Stonebraker of the University of California at Berkeley, and D. S. Kerr, H. S. Koch, and M. T. Liu of Ohio State. We acknowledge the influence on the choice of ordered pairs and relations for semantic modeling. The former is due to Hsiao and Harary (1970) and the latter is prompted by Codd (1970).
REFERENCES

Codd, E. F. (1970). A relational model of data for large shared data banks. Commun. Ass. Comput. Mach. 13, No. 6, pp. 377-387.
Daley, R. C., and Neumann, P. G. (1965). A general purpose file system for secondary storage. AFIPS Conf. Proc., 1965 FJCC, Vol. 27, pp. 213-229.
Dennis, J. B., and Van Horn, E. C. (1966). Programming semantics for multiprogrammed computations. Commun. Ass. Comput. Mach. 9, No. 3, pp. 143-155.
Department of Navy. (1973). "ADP Data Security and Privacy: Proceedings of the Conference on Secure Data Sharing." Naval Ship Res. Develop. Cent., Bethesda, Maryland.
Graham, R. M. (1968). Protection in an information processing utility. Commun. Ass. Comput. Mach. 11, No. 5, pp. 365-369.
Hansen, M. H. (1971). Insuring confidentiality of individual records in data storage and
retrieval for statistical purposes. AFIPS Conf. Proc., 1971 FJCC, Vol. 39, pp. 579-585.
Hoffman, L. J., ed. (1973). "Security and Privacy in Computer Systems," IBM's Resource Secure System (RSS). Melville Publ., Los Angeles, California.
Hsiao, D. K. (1968). A file system for a problem solving facility. Ph.D. Dissertation, NTIS No. AD 671 826. University of Pennsylvania, Philadelphia.
Hsiao, D. K. (1969). "Access Control in an On-Line File System," in FILE ORGANIZATION, Selected Papers from FILE 68, An I.A.G. Conf. Swets & Zeitlinger, Amsterdam, pp. 240-257.
Hsiao, D. K., and Harary, F. (1970). A formal system for information retrieval from files. Commun. Ass. Comput. Mach. 13, No. 2, pp. 67-73.
Hsiao, D. K., and McCauley, E. J. (1974). "A Model for Data Secure Systems (Part II)," Tech. Rep. OSU-CISRC-TR-74-7. Computer and Information Science Research Center, Ohio State University, Columbus.
Hsiao, D. K., Kerr, D. S., and McCauley, E. J. (1974a). "A Model for Data Secure Systems (Part I)," Tech. Rep. OSU-CISRC-TR-73-8. Computer and Information Science Research Center, Ohio State University, Columbus.
Hsiao, D. K., Kerr, D. S., and Nee, C. J. (1974b). "Context Protection and Consistent Control in Data Base Systems (Part I)," Tech. Rep. OSU-CISRC-TR-73-9. Computer and Information Science Research Center, Ohio State University, Columbus.
IBM (1970). "The Considerations of Data Security in Data Processing Operations," G520-2169. IBM, White Plains, New York.
IBM (1972a). "Controlled Access System," G520-2540. IBM, White Plains, New York.
IBM (1972b). "The Considerations of Physical Security in a Computer Environment," G520-2700. IBM, White Plains, New York.
IBM (1972c). "The Fire and After the Fire . . . ," G520-2741. IBM, White Plains, New York.
IBM (1972d). "42 Suggestions for Improving Security in Data Processing Operations," G520-2797. IBM, White Plains, New York.
IBM (1973). "Data Security Symposium, April 1973," G520-2838. IBM, White Plains, New York.
Lampson, B. W. (1969). Dynamic protection structures. AFIPS Conf. Proc., 1969 FJCC, Vol. 35, pp. 27-38.
Lampson, B. W. (1971). "Protection," Proc. 5th Annu. Princeton Conf. Inform. Sci. Syst., pp. 437-443. Department of Electrical Engineering, Princeton University, Princeton, New Jersey.
LeClerc, J. Y. (1966). "Memory Structures for Interactive Computers," Tech. Rep. Proj. Genie Doc. 40-10-110. University of California, Berkeley.
Organick, E. I. (1972). "The Multics System: An Examination of Its Structure." MIT Press, Cambridge, Massachusetts.
Reed, S. K., and Brainstad, D. K., eds. (1974). "Controlled Accessibility Workshop Report," NBS Tech. Note 827. Nat. Bur. Stand., Washington, D.C.
Scherr, A. L. (1973). Functional structure of IBM virtual storage operating systems. Part II. OS/VS2-2 concepts and philosophies. IBM Syst. J. 12, No. 4, 382-400.
Schroeder, M. D., and Saltzer, J. H. (1972). A hardware architecture for implementing protection rings. Commun. Ass. Comput. Mach. 15, No. 3, pp. 157-170.
Wimbrow, J. H. (1971). A large-scale interactive administrative system. IBM Syst. J. 10, No. 4, 260-282.
Wong, E., and Chiang, T. C. (1971). Canonical structure in attribute based file organization. Commun. Ass. Comput. Mach. 14, No. 9, pp. 593-597.
Author Index Numbers in italics refer to the pagss on which the complete referenm are listed.
A
E
Afuso, C., 187, 190, 191, 192, 193, 195, 228, 229 Agullo, J., 228 Aho, A. V., 2, 8, 9, 13, 32, 41, 82, 103, 106,116, 127, 132, 166, 184 Anderson, G. C., 196, 228 Armstrong, C. W., 229
Earley, J., 76, 79, 122, 127, 184 Elgot, C. C., 26, 42 Esch, J. W., 187, 190, 191, 193, 197, 198, 228, 229 Even, S., 20, 42
B Baker, F. T., 46, 72, 75 Baker, T., 39, 41 Book, V. R., 29,42 Borodin, A. B., 1, 2, 42 Bouckaert, M., 139,184 Brainstad, D. K., 234, 872 Biichi, J. R., 26, 42
C Cheatham, T. E. Jr., 75 Chiang, T. C., 259, 272 Codd, E. F., 271, 271 Cook, H. J., 229 Cook, S., 3,8, 9, 10,20,32,42 Coombes, D., 207, 212, 228 Cutler, J. R., 207, 213, 228
F Fagin, R., 8, 42 Ferrate, G. A., 228 Finnie, B. W., 196, 228 Fischer, M. J., 26, 42, 140, 141, 142, 145, 166,184
G Gaines, B. R., 190, 228, 229 Gallaire, H., 181, 184 Garey, M. R., 8, 42 Gill, J., 39, 41 Gilstrap, L. O., 229 Golomb, S. W., 196, 229 Gottlieb, C. C., 76 Graham, R. M., 234,237, 271 Graham, S. L., 181, 184 Greibach, S. A., 30,@, 79, 119, 176,186
H D Dahl, O-J., 46, 75 Daley, R. C., 239, 271 Denning, P. J., 75 Dennis, J. B., 234, 271 Dijkstra, E. W., 46, 76
Hansen, M. H., 255, 271 Harary, F., 271, 87.9 Harrison, M. A., 181, 184 Hartmanis, J., 1, 2, 5, 6, 10, 12, 16, 20, 22, 27,29,30,31,42,@, 183,185 Hirsch, J. J., 229 Hoare, C. A. R., 46, 76, 76 273
274
AUTHOR INDEX
Hoffman, L. J., 232, 272 Hopcroft, J. E., 1,2, 5, 8,9, 13,22,28,32, 41, 42, 82, 83, 103, 106, 116, 166, 177, 182, 183, 184, 185 Hotz, G., 139, 186 Hsiao, D. K., 234,240,257,260,271, 272 Hunt, H. B., III., 2, 5, 8,12,20,21,23,24, 27, 29, 42
J Johnson, D. S., 8, 42
K Karp, R., 8, 13, 42 Kasami, T., 79, 107,185 Kerr, D. S., 257, 279 Knuth, D. E., 76 Korfhage, R. R., 76 Korn, G. A., 196, 229 Kuroda, S. Y., 27, 42
L Lampson, B. W., 239, 272 Landweber, P. S., 27, 42 LeClerc, J. Y., 236, 272 Lewis, P. M., 22,4S, 183,185 Low, J. R., 69, 76
0 Olaravria, J. M., 2.29 Oppen, D. C., 26, 43 Organick, E. I., 232, 239, 272
P Pager, D., 139, 185 Papoulis, A., 229 Peterson, W. W., 188, 229 Petrovic, R., 191, 2.99 Pirotte, A., 139, 184 Poigjaner, L., 228 Poppelbaum, W. J., 187, 190, 199, 202, 216, 229 Pratt, V., 6, 31, 39, .@ Pressburger, M., 25, 26, 43 Probert, R. L., 140, 145, 184
R Rabin, M. O., 6, 26, 31, 39, 42, 43 Rangel, J. L., 24, 43 Reed, S. K., 234, 272 Ribeiro, S. T., 190, 229 Ring, D., 187,205,211, 2.99 Roberts, G. T., 196, 228 Rogers, J., Jr., 6, 10, 27, 43 RUZZO, W. L., 181,184 Ryan, L. D., 194, 199,929
S
M McCauley, E. J., 260, 272 McNaughton, R., 19, 43 Marvel, 0. E., 196, 199, 2.99 Meyer, A. R., 2, 4, 22, 26, 43, 141, 142, 166,184 Miller, G. L., 16, 43 Mills, H. D., 46, 76 Minsky, M., 17, 38, 43 Munro, I., 2, 42, 141, 142, 166, 186 Myhill, J., 27, 43
N Nee, C.J., 257, 272 Neumann, P. G., 239, l 7 l
Saltzer, J. H., 237, 272 Savitch, W. J., 4, 27, @I Scherr, A. L., 233, 272 Schroeder, M. D., 237, 27d Schugurensky, C. M., $89 Shank, H., 10, 16,48 Siljak, D., 191, 229 Simon, J., 6, 30, 31, 42, 43 Snelling, M., 139, 184 Solovay, R., 39, 41 Standish, T. A., 53, 76 Stearns, R. E., 22, 43, 183, 186 Stockmeyer, I,. J., 2, 4, 6, 8, 21, 31, 39,
4% 43
Strassen, V., 140, 145, 185 Szymanski, T., 8, 42
275
AUTHOR INDEX
W
T Taft, E.A,, 53,55, 76 Tarjan, R.E.,20,42 Tausworthe, R. C.,196,289 Tompa, F.W., 76 Torii, K., 107,185 Townley, J. A., 75, 128,185
U
Warshall, S., 52, 76 Wegbreit, B.,56,71,75, 76 Wimbrow, J. H., 233,HI Winograd, T.,76 Wirth, N., 76 Wo, Y. K.,199,202, 230 Wong, E.,259, $72
Ullman, J. D.,2, 5,8,9,13,22,28,32,4 1 , Y 42, 82, 83, 103, 106, 116, 127, 132, Yamada, H., 19,43 166, 177,182,183,184,186 Younger, D.H.,79, 107,186
V Valiant, L. G.,79, 140, 141,185 Van Horn, E.C., 234,271 von Neumann, J., 187,230
2 Zirphile, J., 229
Subiect Index
A
Authority items, in information security, 234, 240 Averaging for higher accuracy in burst processing, 216-217 in statistical representation, 187-190
Accepting states, Turing machine and, 103 Access control levels of, 236-238 matrix in, 23%240 Access control information, 238 Access control mechanism B data base semantics and, 245-246 in information security, 236-238 BASIC, for medium-scale programs, 49logical, w e Logical access control 51, 55 mechanism Binary tree Access-list systems, 239 nonassociative products and, 149-158 Access matrix simplest, 153 capability-list systems in, 239 smallest, 156-157 in information security, 239-240 typical, 156 ADDUP programming, for random access Block sum burst adder or subtractor, 222 machines, 36-37 Block sum register, in statistical repreADP systems, information security and, sentation, 218-224 233 Boolean matrix multiplication, Strassen’s ALGOL algorithm and, 145-149 context-free grammar and, 87 Boolean operations, in random access for large-scale programs, 52 machines, 31 ALGOL-60,61 BP, w e Burst processing ALGOL-68, 54,61 BREAK procedure, in programming, 65 Algorithms, deterministic vs nondeter- BSR,see Block sum register ministic, 3 BUM, 8ee Bundle machine nee also Cocke-Kasami-Younger algo- Bundle machine rithm; Earley’s algorithm; Strassen’s examples of, 211-215 algorithm; Valiant’s algorithm safe, 212-213 Ambiguous grammar, 82 Bundle processing failsafe features of, 209-210 APE (autonomous processing element) system, 202-205 notation for, 206 APL system optical bundles in, 227 vs PPL system, 54-55 remapped, 208-209 for small-scale programs, 49-50 in statistical representation, 205-211 Arithmetic Burst multiplication, 226 in burst processing, 216 Burst processing probability theory and, 191 arithmetic in, 188 Association of Computing Machinery, 233 averaging in, 216 ATE (automatic teat equipment), 52-53 future applications of, 227-228 Audio encoding and decoding, in burst low precision arithmetic in, 216 objective of, 216 processing, 225 276
277
SUBJECT INDEX
overview of, 216-224 preliminary results in, 224-226 in statistical representation, 188 Burst representation, principles of, 218
C Capabilities, in information security, 234 Cascade multiplier, 222-223 Case statement, in programming, 6 5 4 6 Central submatrices, matrix reduction and, 158-160 Chomsky normal form for context-free grammar, 93, 97, 140 in parsing of context-free grammar, 108-109 COBOL, for large-scale programs, 51-53 Cocke-Kasami-Younger algorithm, for context-free languages, 107-122, 125126, 140 Cocke-Kasami-Younger recognition matrix, 136, 141 Complex programs, abstract version of, 73 Composition Principle, 154 Computation feasible, see Feasible computations memory bounded, 17-27 Computational complexity global properties of, 2 theory of, 1-6 Computing, quantitative aspects of, 1-2 Computer companies, 231, 234 Concealment protection in information security, 246-247 limitation of, 253-254 Constraints, in programming systems, 74 Context-free grammar ambiguous, 82 Chomsky normal form for, 93, 97, 108109, 140 cycle-free, 98 defined, 79-80 Greibach normal form for, 95-96 linear, 119-120 linear normal form of, 120 recognition problem in, 83 reduced unambiguous, 132 “useless” rules for, 87 Context-free languages
Cocke-Kasami-Younger algorithm and, 107-122 formal languages and, 79-80 “hardest,” 78, 176-180 parsing of, 77-184 Context-free protection specification, 257 Context mode, in information security, 263 Context protection, in information secure systems, 262-263 Control Data Corporation, 233 Controlled accessibility, workshop on, 233 CRAM model, 34-35
D Defense Department, U.S., 232-233 Derivation-all protection, in information security, 247 Derivation-selective protection, 247-248 Derivation-some protection, 248-249 Deterministic lba, 28 DOTBEFORE variable, in Earley’s algorithm, 133-136 Dyck set, nondeterministic version of, 177
E Earley recognition matrix, 122 parsing algorithm and, 136-139 Earley’s algorithm, 181 correctness of, 126 as on-line recognition algorithm, 127 in parsing of context-free languages, 122-139 recognition algorithm and, 122-126 time and space bounds of, 128-136 ECL system, 47 defined, 55 n. extended mode facility in, 67-69 procedures in, 63-65 Effectively computable function, 6 EL1 language, 47, 54-69 basic goals for, 56-58 Elemental-derivation-sensitiveprotection, in information security, 249-250 Ellsberg case, 231 Entity-column approach, in information security, 239 Ergodic checking circuit, 211 ERGODIC machine, 213-216
278
SUBJECT INDEX
Ergodic processing, in statistical representation, 205-21 1 Ergodic strobe, 211 Extended mode facility, in ECI, programming, 6 7 4 8 Extensible languagcs, in small-scale programs, 53-54
F Feasible computations defined, 7 languages as problems in, 2 memory-bounded computations and , 17-27 nondeterminism and, 6-16 nondeterministic tape computaOions in, 27-30 random access machines and, 30-39 structure of, 1-41 FIND function, in random access machines, 33-35 First-order propositional calculus, time sentences for, 21 FORTRAN, 49-50, 55 for medium-scale programs, 51 Fourier transform, by TRANSFORMATRIX, 203 Function, effectively computable, 6-7
G Generalized Riemann hypothesis, 16 Grammar context-free, 79-80, 82, 93-97, 108-109, 119-120, 132, 140 equivalent, 80 language generated by, 80 non-left recursive, 97 unambiguous, 82 Greibach normal form, for context-free grammar, 95-96
Honeywell, Inc., 233 Hydrodynamic simulation, TSI’ and, 227
IBM, see International Business Machines Corp. IMS, see Information Management System Information, derivation-seleetive protection of, 247-248 Information Management System, 232 Information protection, context-free protection specification in, 257 Information secure systems, 231-271 see also Information security building of, 256-270 context mode in, 263 context protection in, 262 coverage of, 234 security-atoms in, 260-261 with single-attribute protection, 257262 Information security see also Information secure systems access control and privacy protection in, 236-238 authority items in, 234, 240 capabilities in, 234 concealment protection in, 246-247 data base partitioning in, 245-246 data base system in, 244-246 derivation-all protection in, 247 derivation-sensitive protection in, 249250 derivation-some protection in, 24&249 highly derivation-selective protection in, 249 information-theoretic protection and, 254-255 jeweler’s problem in, 235-236 limitation of concealment protection in, 253-254 logical access control mechanisms in, 237-238 logical level of access control in, 237-238 memory protection in, 235-236 physical protection and access control mechanisms in, 236 procedure protection in, 236 protection of single-attribute data in, 250-252 protection “keys” in, 236
279
SUBJECT INDEX
protection threhsolds in, 255 ring mechanism in, 234, 237 shared information in, 235 supervisory calls in, 237 Information sharing, security and, 235 Information-theoretic protection, 254-255 Inherently ambiguous language, 82 International Business Machines Corp., 231, 233
J Jeweler’s problem, in information security, 235-236
K Kleene star, in tape-complete problem, 19-20
L Language(s) context-free, see Context-free languages context-sensitive, 3-5 extensible, 53-54 for nonexpert programmer, 69-70 N P , 3-5 NP-complete, 10 in programming, 58-63 PTAPE, 3 4 tape-complete, 18 unambiguous, 82 Lba, see Linearly bounded automaton Left-recursive nonterminal set, 95 Linear grammars linear language generated by, 119 special case of, 119-122 Linearly bounded automaton, 27-30 LISP-2, 54 Logical access control mechanism attributes in, 241-242 data base semantics in, 244-246 in information security, 237-238 protection precision in, 246 sample data base in, 243 semantic connection in, 242 simultaneous access in, 245 understanding of, 241-254
Logical access control mode, in information security, 23&241 Logical theory, second-order, 26 Logic burst adder or subtractor, 222 Logic multiplier, 223
M Maryland, University of, 233 Massachusetts Institute of Technology, 232-233 Matrix, transitively closed, 141 Matrix multiplication, boolean, 145-149 Matrix reduction, central submatrices and, 158-160 Memory, saving of with nondeterministic computations, 27 Memory-bounded complexity classes, defined, 17 Memory-bounded computations, 17-27 Memory protection, in information security, 235-236 MITRE, 233 Mode behavior, user-defined, 67-69 Mode-behavior definition facility, in programming, 74 Modularisation, “tuning” process and, 47 M R A M , 34 see also Random access machines differentiated from R A M , 31-32, 39 simulation of Turing machines by, 38 Multiplexing, in pulse code modulation, 219
N National Bureau of Standards, 233 National Computer Conference, 234 National Science Foundation, 233 Natural languages, advances in study of, 77-79 Naval Ship Research and Development Center, 233 Neurophysiology, TSP applications in, 227 Nonarithmetic objects, aids for, 72 Nonassociative products, binary trees and, 149-158 Nondeterministic lba, 28 Nondeterministic tape computations, 2730
280
8UBJECT INDEX
Nonexpert programmer, aids for, 69-72 Nonterminal set, left or right recursive, 95 N P , problem in, 2-3 NP-complete language, 13-16 defined, 10 universal, 10 NP-TAPE language, 17-18, 27 NP-TIME language, 8-9 N-TAPE languages, 4
0 Office of Naval Research, 233 Off-line algorithm, in parsing of general context-free languages, 111 Off-line computation, for Turing machine, 104 Ohio State University, 233 On-line algorithm, in parsing of general context-free languages, 111 On-line computation, for Turing machine, 104 On-line recognition algorithm, Earley’s algorithm as, 127 Optical bundles, in bundle processing, 227
P Parsing Cocke-Kasami-Younger algorithm in, 107-122 Earley’s algorithm in, 122-129 of general context-free languages, 77-
I84 hardest context-free language and, 76180 linear context-free language and, 19121 recognition algorithm in, 107-112 rules in, 83 time and space bounds in, 181-184 upper bound for, 176 Valiant’s algorithm in, 140-176 Valiant’s lemma and, 149-166 Parsing algorithm for context-free languages, 112-115 in Earley’s algorithm, 136-139 PCM, see Pulse code modulation PENDING variable, in Earley’s algorithm, 133-136 Pennsylvania, University of, 233
Photocell, random pulses in, 226 Physical protection mechanisms, in information security, 236 PL/l programs in information security, 232 in large-scale programs, 52, 54 P = NP? problem, 3, 9, 41 POSTCOMP (portable stochastic computer), examples of, 197-198 PPL system, vs APL system, 54-55 PREDICT function, in Earley’s algorithm, 123, 128, 130, 133-134 Pressburger arithmetic, decision procedures for, 25-26 Privacy protection, in information security, 236-238 Probability theory, arithmetic operations and, 191 Procedure protection, in information security, 236 Programmer aids for nonexpert type of, 69-72 for medium-scale programs, 50-51 Programming, 45-75 aids in, 69-72 basic language in, 5-3 BREAK procedure in, 65 case statement in, 65-66 changes in, 73 classes of programs in, 48-53 closure in, 70-72 constraints in, 74 ECL procedures in, 63-65 extended modes in, 66-69 extensible languages in, 53-54 facilities for small-scale programs in, 53-55 mode in, 58-59 mode-behavior definition facility in, 74 mode-valued expression in, 62 and production of complex programs, 72-75 program reduction in, 70-72 rewrite facility in, 75 software tailoring and packaging in, 69-70 structural, see Structural programming testing aids in, 75 user-defined mode behavior in, 66-69 variables in, 58
28 1
SUBJECT INDEX
Programming languages, spectrum of, 45 Programming system, formation of, 72-75 Program reduction, 7&72 Programs classes of, 48-52 complex, 72-75 large-scale, 51-53 medium-scale, 50-51 small-scale, 49-50 Protection thresholds, in information security, 255 PTAPE languages, 3-4, 17, 27 and first-order propositional calculus, 21 PTZME language family, 5-8 M R A M and, 40 vs PZ'APE, for Turing machines, 30-31 Pulse, appearance of in time stochastic processing, 190 Pulse code modulation multiplexing in, 219 statistical representation in, 188-190 time stochastic and burst processing systems in, 188-190
Q Quantized quasi-random analog signal, 196 Quasi-noise, in phototransistor output encoding, 200 Quasi-random number generator, 196 Quasi-random number sequence defined, 196 generation of, 195-197
R R A M model, 34-35 see also Random access machines Rand Corporation, 233 Random access machines complexity measures for, 32 feasible computations with, 30-39 FIND function in, 33-35 vs Turing machine, 103, 105-106, 115116 Random pulses, examples of, 22f3-227 generation of, 195-197 RASCEL (regular array of stochastic computing elements), &s time stochastic machine, 198-199
Recognition, as transitive closure problem, 141 Recognition algorithm, Chomsky normal form of grammar and, 107-108 Resource Secure System, 232 Right recursive nonterminal set, 95 Ring mechanism, in information security, 234, 237 RSS, see Resource Secure System
S SABUMA (safe bundle machine), 212-214 Second-order logical theory, 26 Security-atoms, in information secure systems, 260-261 Semantic connection, in logical access control mechanisms, 242 Single-attribute data protection, 250-252 information secure systems with, 257262 Small-scale programs, facilities for, 53-55 Software, tailoring and packaging of, 6970 Southern California, University of, 233 Space complexity, Turing machine and, 103-104 Spring Joint Computer Conference, 231 SRPS, see Synchronous random pulse sequence Statistical processing, outlook for, 226-228 Statistical processors, 187-228 bundle processing and, 205-211 ergodic processing of, 205-211 fluctuations and precision of stochastic sequences in, 194-195 generation of random and quasi-random sequences in, 195-197 time stochastic machines and, 197-205 Statistical representation, 187-190 autonomous processing element system in, 202-205 block sum register in, 218 burst processing in, 188, 216-226 defined, 187 pros and cons of, 187-190 time averaging in, 187 time stochastic processing in, 188 transducer problem in, 189-190 weighted vs unweighted systems in, 187
282
SUBJECT INDEX
Stochastic processing, bundle processing and, 205 Stochastics, “fundamental trick” of, 191 Stochastic sequences, fluctuations and precision of, 194-195 Strassen’s algorithm, boolean matrix multiplication and, 145-149 Structurally ambiguous grammar, 82 Structural programming clear programs and, 72 defined, 46-47 modularization in, 46-47 Submatrices, central, 158-160 Supervisory calls, in information security, 237 Synctironous random pulse sequences generation of, 195-197 mask and complementary mask in, 227 in time stochastic processing, 190-194 subtraction and division of, 192-193
T Tape-complete language, 18 construction of, 20 Tape-complete problem, Kleene star in, 19-20 Tape computations, nondeterministic, 2730 Time and space bounds for Earley’s algorithm, 128-136 in parsing of general context-free languages, 181-184 Time-averaging, in statistical representation, 187 Time-sharing Option, 232 Time slip cascade multiplier, 223 Time slip register, 223 Time stochastic machines, examples of, 197-205 Time stochastic processing arithmetic in, 188 hydrodynamic simulation and, 227 in ncurophysiology, 227 for numerical and communication purposes, 189 overview of, 190-194 pulse in, 190 in statistical representation, 188
synchronous random pulse sequence in, 190-194 Tm, see Turing machine TRANS(A) procedure, 167-168 Transducer problem, in statistical representation, 189-190 TRANSFORMATRIX, as time stochastic machine or parallel processor, 199-202 Transitive closure problem, recognition as, 141-145 Trees,binary, see Binary trees TRW-, Inc., 232-233 TSO, see Time-sharing Option TSP, see Time stochastic processing Turing machine unpinput for, 104 accepting states and, 103 algorithmic analysis and, 104 deterministic vs. nondeterministic, 5, 12, 20 effectively computable function and, 6-7 finite state control for, 103 and hardest context-free language, 176 implementation of in Earley’s algorithm, 139 instantaneous description in, 13 linearly bounded automaton and, 27-30 MRAM languages and, 34-35 for NP-complete language, 10 N P problem and, 2-3 off-line computation in, 104 in parsing of general context-free languages, 115-119 in polynomial tape or time, 17 and P = NP? problem, 41 P T I M E vs P T A P E for, 30-31 valid and nonvalid computations for, 13-15, 19, 22 vs random access computer, 115-116, 176 simulation of by MRAM’s, 38 space complexity of, 103 tape length in, 24-25
U Unambiguous grammar, 82 Univac, Inc., 233 Universal N P language, 10
283
SUBJECT INDEX
Unweighted systems, in statistical representation, 187 User-row approach, in information security, 239
Valiant’s lemma, 149-166, 170-172 statement and proof of, 161-166 Vernier divider, principle of, 224 Vernier encoding and addition, in burst processing, 225
V
W
Valiant’s algorithm, 140-176, 181 in computing D+ in less than O(n*) time, 161-176 and recognition as transitive closure problem, 14I- 145 Strassen’s algorithm and, 145-149
Watergate affair, information security and, 23 1 Weighted systems in statistical representation, 187 World-Wide Military Command and Control System, 232
Contents of Previous Volumes Volume 1 General-Purpose Programming for Business Applications CALVINC. GOTLIEB Numerical Weather Prediction NORMAN A. PHILLIPS The Present Status of Automatic Translation of Languages YEHOUSHUA BAR-HILLEL Programming Computers to Play Games ARTHURL. SAMUEL Machine Recognition of Spoken Words RICHARD FATEHCHAND Binary Arithmetic GEORGEW. REITWIESNER Volume 2 A Survey of Numerical Methods for Parabolic Differential Equations JIMDOUGLAS, JR. Advances in Orthonormalizing Computation PHILIPJ. DAVISA N D PHILIPRABINOWITZ Microelectronics Using Electron-Beam-Activated Machining Techniques KENNETH R. SHOULDERS Recent Developments in Linear Programming SAULI. GLASS The Theory of Automata, a Survey ROBERTMCNAUGHTON Volume 3 The Computation of Satellite Orbit Trajectories SAMUEL D. CONTE Mu1tiprogramming E. F. CODD Recent Developments of Nonlinear Programming PHILIPWOLFE Alternating Direction Implicit Methods GARRETBIRKHOFF,RICHARD S. VARGA,AND DAVIDYOUNQ Combined Analog-Digital Techniques in Simulation HAROLD F. SKRAMSTAD Information Technology and the Law REEDC. LAWLOR Volume 4 The Formulation of Data Processing Problems for Computers WILLIAMC. MCGEE All-Magnetic Circuit Techniques DAVIDR. BENNION AND HEWITT D. CRANE Computer Education HOWARD E. TOMPKINS 284
CONTENTS OF PREVIOUS VOLUMES
Digital Fluid Logic Elements H. H. GLAETTLI Multiple Computer Systems WILLIAM A. CURTIN Volume 5 The Role of Computers in Election Night Broadcasting JACK MOSHMAN Some Results of Research on Automatic Programming in Eastern Europe WLADYSLAW TIJRKSI A Discussion of Artificial Intelligence and Self-organization GORDON PASK Automatic Optical Design ORESTESN. STAVROUDIS Computing Problems and Methods in X-Ray Crystallography CHARLES L. COULTER Digital Computers in Nuclear Reactor Design ELIZABETH CUTHILL An Introduction to Procedureoriented Languages HARRY D. HUSKEY Volume 6 Information Retrieval CLAUDE E. WALSTON Speculations Concerning the First Ultraintelligent Machine IRVING JOHN GOOD Digital Training Devices CHARLES R. WICKMAN Number Systems and Arithmetic HARVEY L. GARDER Considerations on Man versus Machine for Space Probing P. L. BARGELLINI Data Collection and Reduction for Nuclear Particle Trace Detectors HERBERT GELERNTER Volume 7 Highly Parallel Information Processing Systems JOHN C. MURTHA Programming Language Processors RUTHM. DAVIS The Man-Machine Combination for Computer-Assisted Copy Editing WAYNE A. DANIELSON Computer-Aided Typesetting WILLIAM R. BOZMAN Programming Languages for Computational Linguistics ARNOLDC. SATTERTHWAIT Computer Driven Displays and Their Use in Man/Machine Interaction ANDRIESVAN DAM
285
286
CONTZINTS OF PREVIOUS VOLUME8
Volume 8 Time-shared Computer Systems THOMAS N. PYEE,JR. Formula Manipuiation by Computer JEAN E. SAMMET Standards for Computers and Information Processing T. B. STEEL,JR. Syntactic Analysis of Natural Language NAOMISAQER Programming Languages and Computers: A Unified Metatheory R. NARASIMHAN Incremental Computation LIONELLO A. LOMBARDI Volume 9 What Next in Computer Technology? W. J. POPPELBAUM Advances in Simulation JOHN MCLEOD Symbol Manipulation Languages PAT-L W. ABRAHAMS Legal Information Retrieval AVIEZRIS. FRAENKEL Large Scale Integration-an Appraisal L. M. SPANDORFER Aerospace Computers A. S. BUCHMAN The Distributed Processor Organization L. J. KOCZELA Volume 10 Humanism, Technology, and Language CHARLES DECARLO Three Computer Cultures: Computer Technology, Computer Mathematics, and Computer Science PETERWEQNER Mathematics in 1984-The Impact of Computers BRYAN THWAITE6 Computing from the Communication Point of View E. E. DAVID,JR. Computer-Man Communication: Using Computer Graphics in the Instructional Process FREDERICK P. BROOKS, JR. Computers and Publishing: Writing, Editing, and Printing ANDRIESV A N DAMAND DAVIDE. RICE A Unified Approach to Pattern Analysis ULF GRENANDER Use of Computers in Biomedical Pat,tern Recognition ROBERTS. LEDLEY
CONTENTS OF PREVIOUS VOLUMES
287
Numerical Methods of Stress Analysis WILLIAMPRAGER Spline Approximation and Computer-Aided Design J. H. AHLBERG Logic per Track Devices D. L. SLOTNICK Volume 11
Automatic Translation of Languages Since 1960: A Linguist’s View HARRY H. JOSSELSON Classification, Relevance, and Information Retrieval D. M. JACKSON Approaches to the Machine Recognition of Conversational Speech KLAUSW. OWEN Man-Machine Interaction Using Speech DAVID R. HILL Balanced Magnetic Circuits for Logic and Memory Devices R. B. KIEBURTZ AND E. E. NEWHALL Command and Control: Technology and Social Impact ANTHONYDEBONS Volume 12
Information Security in a Multi-User Computer Environment JAMES P. ANDERSON Managers, Deterministic Models, and Computers G. M. FERRERO DIROCCAFERRERA Uses of the Computer in Music Composition and Research HARRYB. LINCOLN File Organization Techniques DAVIDC. ROBERTS Systems Programming Languages R. D. BERGERON, J. D. CANNON, D. P. SHECTER, F. W. TOMPA, A N D A. VAN DAM Parametric and Nonparametric Recognition by Computer: An Application to Leukocyte Image Processing JUDITH M. S. PREWITT Volume 13
Programmed Control of Asynchronous Program Interrupts RICHARD L. WEXELBLAT Poetry Generation and Analysis JAMES JOYCE c 8 Mapping and Computers PATRICIA FULTON F 1 Practical Natural Language Processing: The REL System as Prototype G 2 FREDERICK B. THOMPSON AND BOEENA HENISZ THOMPSON Art,ificid Intelligence-The Past Decade J 5 B. CHANDRASEKARAN
ED
This Page Intentionally Left Blank