J. L. Jolley
Data Study
World University Library
McGraw-Hill Book Company, New York Toronto
© J. L. Jolley 1968. Library of Congress Catalog Card Number 67-22980. Phototypeset by BAS Printers Limited, Wallop, Hampshire, England. Printed by Officine Grafiche Arnoldo Mondadori, Verona, Italy.
Contents
0 Introduction
1 Items and features
2 Questions and vehicles
3 Identity and quantity
4 Many disguises
5 Orderly description
6 The semantic continuum
7 The behaviour of subcodes
8 Direct and textual coding
9 The pattern of meaning
10 Separate data vehicles
11 Connected vehicles and computers
12 Relations
13 Setting up a system
Bibliography
Index
Acknowledgments
Acknowledgment is due to the following for kindly supplying the photographs in this book. The number refers to the page on which the photograph appears. 32-33, 62-63, 191 J. L. Jolley and Partners Ltd; 61, 178, 182 ICT Ltd; 171 Copeland-Chatterson Co Ltd; 172, 173 C. W. Cave and Co Ltd; 177 De La Rue Bull Machines Ltd; 180 Friden Ltd; 181 Kalamazoo Ltd; 184 Adrema Ltd; 185, 187, 194 Carter-Parratt Ltd; 189 Synoptic International; 200 Kodak Ltd; 205 Solartron Ltd. The diagrams were designed by Leonard Whiteman, Gordon Cramp Studio.
Information
To begin with Chapter Nought looks like a gimmick to make a dry subject amusing. In the present case, however, it is an attempt to be consistent. In many different parts of our subject nought is the beginning. Public libraries, for example, often arrange the non-fiction in a numerical order along the shelves, the numbers being related to subject matter, and beginning with 000. On the face of most punched feature cards the first position is 0. The decimal system of numerals begins with 0, if its ten characters are arranged in ascending order. Binary, octal, and other systems also begin with 0, for 0 is the origin, zero, the place we start from before some change in the situation gives us some information.
Information, whose handling is the topic of this book, is generated by change, and whatever is our unit of change is our unit of information. In a situation in which only three possible moves can be made, only three fundamental units of information can be found, no matter how complex a structure we may build upon them. Our topic is therefore a special sort of change, the change brought about in acquiring or keeping or transmitting elements of knowledge.
Gaining, storing and transmitting knowledge is a very widespread activity. The following chapters are meant for all who have found the need to record and handle data of any type, and who have discovered a hint of pattern in their work, a feeling that there are rules in operation, making some things easy and other things harder, relating the type of equipment in use to the type of question it is to answer, governing the behaviour of ideas much as the laws of arithmetic govern the behaviour of numbers. The object is to describe something of this pattern, to bring it into the light and to show a little of what we can learn from it. In so fundamental an affair, it would be surprising if much work had not already been done.
Our subject has been trenched on by mathematics, logic, philosophy, semantics, librarianship, taxonomy, business, and indeed almost every other discipline we may consider. For some, data study is part of the material dealt with; for all, it is necessary for the handling of that material. As a result, each has discovered the same design, or part of it, described it in its own words, and treated it from its own point of view. To choose a standard language, even for the purpose of a single book, is no easy matter. To employ the words preferred by one discipline may be to forfeit the goodwill, and even the understanding, of those trained in another; yet to invent new ones is to add complexity to a subject which is already overburdened. In these pages, an attempt is made to choose words carefully and to use each technical name in one sense only.
The scope of the subject
We may think of data study as the background of theory and of pattern-explanation from which we may derive rules for information handling. Information handling, conversely, is the practical aspect of data study. It is a technique based on manipulating things which represent other things; a sort of abstract mimicry. The things represented are usually too large, or too distant, or too intangible, or too numerous, or otherwise unsuited to being dealt with by themselves. We therefore provide a convenient substitute in the form of a card, a piece of paper, a strip of film, or something else which is easier to manage. The problems of information handling therefore include those of choosing a helpful substitute, which demands that we know a good deal about the way the substitute represents the behaviour of the objects or events with which we are concerned. We therefore need an understanding of this behaviour, which requires a knowledge of the possible relations between things and their characteristics. These are the relations we find in what is known as the data field. They are part of the studies of statistics, the calculus of classes, the algebra of sets. The data field, the collection of items (things represented) and features (their attributes), is the subject of the first part of this book.
Armed with a knowledge of its properties we then turn to the way things and their characteristics may be represented. Here we meet the problems of classification and coding, syntax and accidence, notation and translation. Much as the first part of the book revolves round the data field, so this part has a theme - the semantic continuum, which is the range of partial, single and multiple concepts with which data study is concerned. Other problems arise with those topics that are mathematically treated by the established discipline of information theory. These concern the measurement of information, the transmission of the marks which represent choices between alternative conditions, and the difficulties which arise when we encounter noise, the confusing background which garbles the message we send.
Beyond representation, we meet embodiment and handling, the realms inhabited by the systems analyst, the computer programmer, the work study engineer, the organisation and methods officer, and the management consultant. We encounter the detailed design and the techniques of use of the hardware: the simple and not-so-simple devices which select, rearrange, reproduce, read, interpret and otherwise handle the cards, films or other data vehicles on which information is recorded. These are the final problems of our essay in mimicry, whose solution permits us, for example, to sort cards representing packing cases with a needle instead of shifting the cases themselves with the aid of a fork-lift truck.
The way ahead
Dealing with information in large quantities is typical of mid-twentieth-century technology, like ergonomics, bioengineering, cybernetics, or operational research. It is a curious mixture of the abstract and the concrete, jostling binary subcoding against research into the purchase of washing machines, the laws of Boolean algebra against the selection of personnel for job safety courses, and the pattern of meaning against the creation of private indexes of stamps, colour slides, plants, recipes, or ancestors. Some of its aspects (communications theory, computer programming, documentation, for example) command a wide literature and a large body of specialists: each of these last may fairly think it a part of his own subject instead of a separate study.
For those to whom the matter is new, the way ahead may lead into any such related field, whether it is one of the new cross-disciplines or one of the older, more majestic, studies. It may also lead direct into practice, as when one has to set up a specialised filing system for office or factory, an index for individual research, or a set of records for a team enquiry. Aid may be sought from the specialised manuals issued by the makers of equipment, or produced in the normal course of industrial publishing. At times the advice of a consultant may be valuable, even if only to speed up the rate at which discoveries of the way things work are made.
Figure 1. Data study is concerned with: Facts (the entities and attributes of daily life, fixed by the relations within and between them); Words (the names these are given in everyday language); Software (the rules which govern how the names may be recorded on the vehicles and handled by the processing devices); Hardware (the cards, tapes, films and other data vehicles and equipment by which these are imitated).
Within this field, this book seeks to provide a unified framework, to derive some simple rules, and to show the reader with a particular problem where that problem fits into the picture as a whole.
Summary
Data study deals with the relations between things and their characteristics, with how these may be represented upon data vehicles, and with how the vehicles may be handled as a substitute for handling the things themselves. In treating this subject, data study impinges on a wide range of general and special activities, from office paper flow and the design of forms to statistics and logic. It is not concerned with describing these in depth, but with providing a background against which their contribution to the handling of information can be seen in clear relief.
Two types of term
To talk about information, we must agree upon standard names. The clear outlines of the subject have been hidden in a wilderness of words. Every art and science has developed its own vocabulary in which to discuss the way it handles its data. The salesmen of equipment have devised many trade names to make similar things look as different as possible, and the pundits of communications theory have knitted the whole together in a mathematical language which is distant from everyday speech.
None of this is intended as criticism. Each art and science has a right to its own terminology, which may have rendered long years of service before any overlap with that of other studies was discovered. The job of the salesman is to sell, and he will not do so unless he makes his product desirably different from others. The most general, and therefore the most fertile, way of analysing a problem is by means of mathematics.
To make inroads into the tangle, we must have a single word for all the things of which records must be kept. Information is what we know about these things. Let us call them items. In a personnel index the items are the people, in a library the items are the books, in an opinion survey the items are the completed sets of answers. Items may be objects, materials, processes, events, or anything else, real or imaginary. They may be heroes of science fiction, shapes, chemical elements, sales of paving slabs, bacteriophages or countries of the mind. What matters about them is that their features interest us.
The features are the attributes of the items. They are important for many reasons, and we may start by thinking of their value in carrying out a search. When we are looking for something whose name or whose whereabouts is unknown, we describe it by listing the features it possesses, not necessarily all of them, but enough to distinguish it from other things.
We then explore, and on finding it, we proceed to the next activity: we read it, or add it to others, or sell it, or write offering it an increase in salary, or put it on the gramophone, according to our needs and the nature of the item we have discovered.
It will be helpful to possess a word which means items-or-features. Let us therefore call both by the name of terms. In doing so we should recall that this name has sometimes been used to signify items exclusively, and at other times (as when 'term cards' are spoken of in documentation) has been used to mean what we here call features. This is a typical case of terminology trouble. However, in using the word to mean items, or features, or both together, we fall in line with the standard practice of logic.
The data field: extensions
Items and features together make up a data field. We may think of this as a network of rows and columns, the rows representing features and the columns items, though it could well be the other way about. If an item possesses a feature, we may show this by placing a circular blob at the crossing of the two; if it does not, we may place a ring there. If the crossing of item and feature is left empty, this may show either that information is lacking or that it is not relevant. In many cases, emptiness is used as a sign of absence, with the result that absence can be mistaken for true lack of knowledge. Presence and absence are equally positive in feel, and each requires a symbol of its own.
As an immediate practical example, we may think of a set of cards in a personnel index. Each card bears a special mark if the person to whom it refers is male. The card for Pat Jones bears no mark; Pat may stand for Patrick or for Patricia; is Pat female? Or is he or she a newcomer whose name is all we know about him or her? If we also had a special mark for female, the answer (assuming a diligent indexing clerk) would be straightforward. In such an index, Pat is an item and male and female are features. The word 'Pat' and the words 'male' and 'female' are names for the item and features to which they respectively refer. It may be useful to employ the word 'name' to include single letters,
numbers, or code sequences as well as written or spoken words. Then we may note that, in order to speak of an item or a feature, we must give the term concerned a suitable name. Further, we need a word for the horizontal and vertical extensions of the data field. The word 'extension' itself will do for the purpose. We may talk of the item extension and the feature extension of the field, and we may say these are transverse to each other.
Figure 2. [Diagram of a data field: blobs and rings at the crossings of the item extension (horizontal) and the feature extension (vertical).]
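The Pat Jones difficulty can be made concrete with a small sketch. It is only an illustration under stated assumptions: the names `State`, `read_one_mark_card` and `read_two_mark_card` are hypothetical, and treating male and female as complements of each other follows the card example above rather than any rule laid down in the text.

```python
from enum import Enum


class State(Enum):
    PRESENT = "blob"   # the item possesses the feature
    ABSENT = "ring"    # the item is known not to possess the feature
    UNKNOWN = "empty"  # information is lacking, or is not relevant


def read_one_mark_card(marked_male: bool) -> dict:
    """Card with a single 'male' mark: a missing mark settles nothing."""
    return {"male": State.PRESENT if marked_male else State.UNKNOWN}


def read_two_mark_card(marked_male: bool, marked_female: bool) -> dict:
    """Card with a mark for each feature: presence and absence both show."""
    if marked_male and marked_female:
        raise ValueError("both marks set; assumed here to be a clerical error")

    def state(own: bool, other: bool) -> State:
        if own:
            return State.PRESENT
        if other:
            return State.ABSENT
        return State.UNKNOWN

    return {
        "male": state(marked_male, marked_female),
        "female": state(marked_female, marked_male),
    }


pat_old_index = read_one_mark_card(marked_male=False)
pat_new_index = read_two_mark_card(marked_male=False, marked_female=True)
newcomer = read_two_mark_card(marked_male=False, marked_female=False)

print(pat_old_index["male"].value)  # empty
print(pat_new_index["male"].value)  # ring
print(newcomer["female"].value)     # empty
```

With one mark only, an unmarked card cannot separate "not male" from "nothing known"; with two marks the ring and the empty crossing are kept apart.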
The diagram (figure 2) shows five items and five features, and contains fourteen blobs, representing fourteen cases in which one of the items possesses one of the features. It also contains eleven rings, marking the cases in which an item does not possess a feature. Its items are named by the use of numerals, and remembering Chapter Nought, these begin at zero. Its features are named by letters.
Binary terms and unary terms
If we consider row B in this data field, we see that, like the other rows and columns, it is composed of blobs and rings. Let us call such a situation 'binary'. For simplicity, we have said that row B, like the other rows, represents a feature, but it is clear that it really represents two features, one of which is absent and the other present. If we take the present feature to be bashfulness, then the absent one is the complement of bashfulness. In fact, row B represents two features, and we may coin the name 'ambifeature' for this, thus adding to the technical terminology of whose profusion we have already complained. As an excuse we must offer the hope that it makes it possible to talk about the data field with greater ease. The use of the prefix 'ambi' also allows us to use the name 'ambiitem', and the name 'ambiterm' to mean either of these binary terms or both. Thus, ambiterms are binary, while terms are unary (shown in our diagram by blobs only, or by rings only).
Furthermore, all our everyday thought is carried out by the use of present terms, and not by absent terms or by ambiterms. Normally, when we think of Tom (item 4, shall we say) we think of him as he is, not as he is not. Our attention is directed to the set of blobs, the features which he possesses, and not to the set of rings, or to both blobs and rings at once. In accordance with ordinary common sense, we shall use the words 'item' and 'feature' and 'term' henceforth to mean present, unary, items, features and terms.
This means we must add to our notation. Currently, a letter stands against each row of our data field for each ambifeature. Let us place a bar over it (thus: B̄) if we wish to refer to the absent component of the ambifeature, and underline it (thus: B̲) if we wish to refer to the present component. It can remain unmarked if the ambiterm itself is intended. The operation of switching B̲ to B̄, and the reverse, is called complementation, and the two terms are known as complements of each other. Between them, they fill out the entire extension of the data field.
Data units: reciprocal membership
Presence and absence are conditions of a type we may call 'states'. They are the two states which a mark at the crossing of an item and a feature may bear. Let us call such a crossing a data unit. Such a unit appears in a unary form, either as present or as absent; but in a sense it is binary, since it gives us information about both parts of any ambiterm of which it forms an element. For example, if we take A to mean 'artful', then A4 is the binary data unit at the crossing of Tom and artfulness, and also at the crossing of not-Tom and not-artfulness (naivety), since both Tom and not-Tom occupy one column and artfulness and naivety both occupy one row. The unit as displayed in our example of a data field shows (about artfulness) that Tom possesses it, and (about naivety) that he does not. A binary data unit is a special form of what is known as a 'bit' in information theory. A bit is, in effect, a statement of which choice is made when there are only two alternatives, and a data unit in our present sense is a bit restricted to the presence or absence of daily-life attributes in daily-life objects. We use two names to define a data unit - one being that of an ambiitem and the other that of an ambifeature. The reason why items and features, whether unary or binary, need only one name is that the transverse names which might be used are taken for granted; they form the entire set of such names in the transverse extension of the field. We need not write C 01234 when we can just write C.
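Complementation of an ambifeature can be sketched as a flip of every data unit in its row. The sketch uses row A (artfulness, with naivety as its complement). The row is written as a string, "1" for a blob and "0" for a ring; this encoding and the function names are assumptions made for illustration. The cells used are the ones the text states for A (absent for item 0, present for items 1 to 4).

```python
ITEMS = [0, 1, 2, 3, 4]

# Row A as the text describes it: item 0 lacks artfulness, items 1-4 have it.
ARTFUL = "01111"


def complement(row: str) -> str:
    """Switch every unit of an ambiterm to its other state."""
    return "".join("0" if unit == "1" else "1" for unit in row)


def present_items(row: str) -> set:
    """The items whose data unit in this row is present."""
    return {item for item, unit in zip(ITEMS, row) if unit == "1"}


NAIVE = complement(ARTFUL)

print(NAIVE)                       # 10000
print(4 in present_items(ARTFUL))  # True: Tom (item 4) is artful
print(4 in present_items(NAIVE))   # False: Tom is not naive
print(complement(NAIVE) == ARTFUL) # True: complementing twice changes nothing
```

The two present sets share no item and together cover all five, which is one way of reading the remark that complements fill out the entire extension of the field.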
Absent data units act as complements of present units, as we should expect. With C meaning cheerfulness, we may examine the unit at C4, which tells us that Tom is not cheerful. It stands at the crossing of the absent item, not-Tom, with the absent feature, not-cheerful, and therefore assures us that not-Tom is not not-cheerful. By cancelling any two, say the first and last, of these negatives, we achieve the desired result. If we were to change our views about which part of the ambifeature C we wished to treat as present, and to decide to entitle it 'gloomy', then all the data units in that row would change their state, unit C4 would gain a blob instead of retaining a ring, and would tell us 'Tom is gloomy'. Thus each data unit represents a statement: Eric (item [?], we may decide) is energetic (feature E), and so on. A data field is a pattern of statements.
Now let us take a data unit at random, say D3. Let 3 signify Jack and let D represent discernment. Since D3 is present, it tells us that Jack is a member of the class or set of people who are discerning; and at the same time it tells us that discernment is a member of the set or class of qualities which go to make up Jack. We are concerned with a relation between members of different sets - between Jack, from the set of items, and discernment, from the set of features. This is a relation of reciprocal membership. The same relation obtains between Tom and artfulness (A), and between Charlie (1) and cheerfulness. It also holds between not-Jack and not-bashfulness, from which we may conclude that Jack is forward.
There is no commonly accepted symbol for this relation. This arises from a special property of the data field, which we may call symmetry, the property that as items are to features, so features are to items. It does not matter which extension of the data field we take as representing which sort of term; the result is always the same and the arguments do not change.
Consequently, people need consider only one extension of the field. The other extension works in the same way as the one they are studying. This being so, the relation normally described is simple one-way membership, not the reciprocal variety. It is symbolised by the character ∈. We may write A ∈ 4, for instance, to show that artfulness is a member of the set of features which makes up Tom. Transversely, we may write 4 ∈ A. To combine these two, we may introduce a reciprocal membership sign, X, which enables us to write A X 4, … X B, and the like. To be aware of this relation helps us to become familiar with the behaviour of the data field. If we wish to deal with binary terms (ambiterms) instead of with the unary sort, we may invert the symbol. A X 4 and 0 X D are quite unambiguous, the latter implying that 0 and D are in reciprocal non-membership.
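Reciprocal membership can be checked mechanically once a field is written down. The sketch below is an illustration only. Figure 2 is not reproduced here, so the table `FIELD` is a hypothetical completion: the cells the text states (A0 absent and A1 to A4 present; B1 and B2 present, B3 absent; C1 present, C4 absent; D1 to D3 present, D0 and D4 absent) are kept, and every other cell is an assumption chosen so that the field has fourteen blobs and eleven rings. The function names are likewise assumptions.

```python
ITEMS = [0, 1, 2, 3, 4]

# One string per ambifeature, one character per item: "1" = blob, "0" = ring.
# B0, B4, C0, C2, C3 and the whole of row E are ASSUMED, not taken from the book.
FIELD = {
    "A": "01111",
    "B": "01101",
    "C": "01000",
    "D": "01110",
    "E": "10110",
}


def items_of(feature: str) -> set:
    """The feature seen as a set of items (item numbers double as indices)."""
    return {i for i in ITEMS if FIELD[feature][i] == "1"}


def features_of(item: int) -> set:
    """The item seen as a set of features."""
    return {f for f in FIELD if FIELD[f][item] == "1"}


def reciprocal_members(feature: str, item: int) -> bool:
    """True when each term is a member of the set that makes up the other."""
    return item in items_of(feature) and feature in features_of(item)


# Symmetry: one-way membership read from either extension gives the same answer.
symmetric = all(
    (i in items_of(f)) == (f in features_of(i)) for f in FIELD for i in ITEMS
)

print(sorted(items_of("D")))       # [1, 2, 3]
print(reciprocal_members("A", 4))  # True: Tom and artfulness
print(reciprocal_members("D", 0))  # False: reciprocal non-membership
print(symmetric)                   # True
```

Because both lookups read the same data unit, the symmetry check cannot fail; that is the point of the passage, not a property of this particular table.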
The state of the field
We may choose either of the complementary parts of an ambiterm to be that which we take as present; thus the same information may be represented by many different patterns of blobs and rings, depending on the choice. This means that, in practice, every data field is the result of a process of standardisation. It will be useful to extend the meaning of the word 'state' to include complex displays composed of simple unary states of presence and absence. Thus an ambiterm which has two present and three absent data units in its make-up is in a state which is intermediate between entire presence and entire absence. If we think of the present data units only, we think of the present term. If we alter the way we regard a transverse term, by complementation, then the set of present units in the original term will change, and the state of the original ambiterm will alter. On the other hand, what it refers to in the outside world will not have altered in the slightest. Thus a term must be thought of as independent of its set of units, just as it must be thought of as independent of its name.
Runs and subfields; collapse
In discussing a data field, we may wish to mention, not only data units, but specified sets of these, occurring, for example, all in one ambiterm. Examples are B123 and CDE4. These may be named 'runs'. An ambiterm is a run which stretches across the whole extension of the field and so does not need its transverse ambiterms cited as part of its name. A run is thus an ambisubterm - although this latter name is probably too outrageous for common use.
We may also wish to mention a collection of data units so arranged that one occurs at each crossing of a set of ambiitems with a set of ambifeatures. An instance is ABD134, consisting of the data units A1, A3, A4, B1, B3, B4, D1, D3 and D4. Such an arrangement is simply a subfield. In the present case, if we were to drop C and E, 0 and 2 from our data field, the field would shrink to this subfield. A data unit may be thought of as a minimal data field, minimal subfield, minimal ambiterm or minimal run, and a run may be considered a subfield which happens to have only one ambiterm in one of its extensions. An example is A024.
Following this line of thought, we may consider a subfield which happens to consist of data units which are all in the same state. In this circumstance, the subfield can collapse into a data unit. As an example, we may take ABD12 from our data field. Instead of using the name ABD12, we could represent ABD by the signal letter X, which we could apply to all items possessing the three features concerned. We could also lump 1 and 2 together under a single identity number, such as 9. Then X9 would have exactly the same meaning as ABD12.
Such collapse, and its opposite, expansion, has a large part to play in information handling. Expansion consists in representing one data unit by more than one, and collapse in representing many units by a single unit only. The two processes may take place in only one extension of the field or in both. They may move a subfield into a run and the reverse, and a run into a unit or the reverse. All that is required is that all the data units involved be in the same state.
Since we can choose which of two complementary terms we shall treat as present, we have a certain ability to manipulate the field in order to ensure this, but the ability is not complete. There are some patterns of data units which remain obstinately binary however we play about with complementation.
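The condition for collapse can be sketched as a test that every data unit in a subfield is in the same state. The code is an illustration under stated assumptions: figure 2 is not reproduced here, so `FIELD` is a hypothetical completion that keeps the cells the text states and assumes the rest, and all function names are invented for the purpose. The last function, `can_be_made_uniform`, is an added check that is not taken from the text: it relies on the observation that complementing rows and columns can bring a subfield to a single state only when every row equals the first row or its complement.

```python
# "1" = blob, "0" = ring; one character per item 0-4.
# B0, B4, C0, C2, C3 and the whole of row E are ASSUMED cells.
FIELD = {
    "A": "01111",
    "B": "01101",
    "C": "01000",
    "D": "01110",
    "E": "10110",
}


def complement(row: str) -> str:
    return "".join("0" if unit == "1" else "1" for unit in row)


def can_collapse(features: str, items: list) -> bool:
    """A subfield collapses when all of its data units share one state."""
    return len({FIELD[f][i] for f in features for i in items}) == 1


def can_be_made_uniform(features: str, items: list) -> bool:
    """Could complementing some terms bring the subfield to one state?"""
    rows = ["".join(FIELD[f][i] for i in items) for f in features]
    first = rows[0]
    return all(row in (first, complement(first)) for row in rows)


print(can_collapse("ABD", [1, 2]))         # True: ABD12 may be renamed X9
print(can_collapse("ABD", [1, 3, 4]))      # False: ABD134 is mixed
print(can_be_made_uniform("AD", [0, 1]))   # True: complement item 0
print(can_be_made_uniform("AD", [0, 4]))   # False: obstinately binary
```

The two AD examples use only cells stated in the text (A0, A1, A4, D0, D1, D4), so they do not depend on the assumed entries.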
Description
As we have implied in passing, the items and features of a data field describe each other. We recognise, for instance, feature D because it is made up of the collection of items 1, 2 and 3. Such a collection must be known to be unique, otherwise we have no assurance that it is not like some other collection in the field. If this is the case, the two collections will to all appearances be the same, that is, two examples of the same term. This is an entirely practical matter. Two items which are alike in every feature will serve as well as each other, and differences, if any, are outside the field.
However, to be sure a collection is unique we must consider ambiterms, not merely features or items. For if we take D, the mere unary set 1, 2 and 3, we have no assurance that, somewhere in the field, we shall not find a present unit D4, thus making D another case of A. There are two ways in which we may convince ourselves that this is not so, that D is truly, and not just apparently, unique. The first is to find that the unit at D4 is absent. This, however, makes us concerned with a binary term, D in both its states. The other method is to possess a rule whereby features are always described by the same fixed number of items (or the transverse, if we are working in the other extension of the field). In our example, the number might be three, D's three being 1, 2 and 3. In this case, A could not possess four present units, as it does in our example, but three only, and we should see straightaway whether these were different or the same as those possessed by D. We should have a clear answer, but the requirement that a term be made of a fixed number of transverse terms is a demand that every term be binary, for it rules that every data unit in the ambiterm, apart from the number selected as present, must be absent.
Commutation
A further quality of the data field is commutativeness. The order in which the terms occur along its extensions does not affect any arguments based upon it. Thus, if our items ran 31042 along the top, and our features ran CABED down the side, unit A0 would still be absent and unit D3 would be present. Like switching items for features and features for items in our discussions, it makes no difference. The field is invariant under these types of operation. Invariance under commutation, the fact that changes of place within sets of items and features do not change the behaviour of the field, arises from the fact that all the terms of the field are available together, in no sort of order. They only appear to be ordered because we display them in two dimensions on paper, and because we use alphabets or numerals (which are ordered) to name them.
The field as a textile
We have already noted that a term is independent of its name and of the state of its set of data units. In our example, we can produce any pattern of data units we wish, down the 4 column (describing Tom) by a suitable choice of present features, and we can give Tom any alternative title. In handling a data field we should never quite forget that these preliminary standardising decisions have been made. Once made, the decisions make the field what it is.
It may, for those whose imagination runs to pictures, be shown as a set of interwoven strips. We take the strips in one extension to be ambiitems, and those in the other to be ambifeatures. One face of each represents presence, and one represents absence. Complementation is, in effect, taking out a strip and then threading it through in position again but counterways - in where it came out, out where it went in. This corresponds to the fact that the ambiterm is the same free ambiterm (it is the same strip in the field) despite the change thus made.
Every data unit in such a meshwork is shown at the crossing of one strip over another. If we take the front of the horizontal strips to stand for presence, then the front of the vertical strips represents absence. A present data unit appears when a horizontal strip is on top at a crossing, and an absent unit is shown when a vertical strip is on top. We may think of reciprocal membership as the glue which holds the strips together where they cross. Every crossing sees the strips with their absent faces in contact or their present faces in contact.
The collapse effect is now easy to observe. In this interwoven web, a subfield whose data units are all of the same state is represented by a set of strips which all overlie the set which travels in the other extension. There is no weaving in this case, nothing to hold the field together. All the strips in the top layer can be slid into coincidence with each other without interference; so can all those of the bottom layer; we are left with just one crossing, the typical single unit of the collapsed field or subfield. A picture of such a woven field is shown in figure 3.
Figure 3. The data field as a textile. The field of figure 2 is shown here as the green portion of the woven strips representing the ambiterms.
Fields, nets and displays
To qualify as a data field, two attributes are required of the structure concerned: all the terms it contains must be named, and all the relations it holds must be shown. If the terms are named but the relations are not shown, the result can be depicted as a net of empty squares. Each square has a co-ordinate reference, being at the crossing of two terms whose names may be used for the purpose; but we cannot tell what these names mean within the field, because their associations with the names of the transverse terms are not available. On the other hand, if only the relations are shown, the result is the (nameless) display of units of the field. In this case, the memberships and non-memberships are clear, but we do not know how to refer to the terms which each relation joins together. However, since the net of names and the display of units fit together to make the data field, we have a remedy. If we address questions to the field, and not to the net or the display alone, we obtain answers.
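The split between net and display, and the invariance under commutation described earlier, can both be shown in a few lines. This is a sketch under stated assumptions: the display is a hypothetical completion of figure 2 (which is not reproduced here) that keeps the cells stated in the text, including A0 absent and D3 present, and assumes the others; `make_field` and `reorder` are invented names.

```python
# The net: names only, with no relations shown.
FEATURE_NAMES = ["A", "B", "C", "D", "E"]
ITEM_NAMES = [0, 1, 2, 3, 4]

# The display: relations only, nameless. "1" = blob, "0" = ring.
# Several cells (B0, B4, C0, C2, C3, all of the last row) are ASSUMED.
DISPLAY = ["01111", "01101", "01000", "01110", "10110"]


def make_field(feature_names, item_names, display):
    """Lay the net over the display: each named crossing gets a state."""
    return {
        (f, i): row[col] == "1"
        for f, row in zip(feature_names, display)
        for col, i in enumerate(item_names)
    }


def reorder(feature_names, item_names, display, new_features, new_items):
    """Commute the terms along both extensions, names and units together."""
    rows = dict(zip(feature_names, display))
    cols = [item_names.index(i) for i in new_items]
    new_display = ["".join(rows[f][c] for c in cols) for f in new_features]
    return new_features, new_items, new_display


original = make_field(FEATURE_NAMES, ITEM_NAMES, DISPLAY)
shuffled = make_field(
    *reorder(FEATURE_NAMES, ITEM_NAMES, DISPLAY, list("CABED"), [3, 1, 0, 4, 2])
)

print(original == shuffled)                      # True
print(shuffled[("A", 0)], shuffled[("D", 3)])    # False True
```

Neither `DISPLAY` nor the two name lists can answer a question alone; only the dictionary built from both can be asked about A0 or D3, and it gives the same answers whatever the order of rows and columns.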
Here is a display (figure 4), taken from our example of a data field, together with the net which can be applied to it and the subfield formed by the two together. The split between display and net is important in data study, since it is related to the distinction between the problems of representation by language, coding, subcoding, notation and the like, and those of embodiment in cards, sheets, tape and other vehicles handled by the many devices by which a data field may be manipulated. This distinction is reflected in the later chapters of this book.
Figure 4. [Three panels: Display, Net, Subfield.]
We may now show how a data field may be used to answer a question. Suppose we are given a feature name, and that we possess a field of which that name forms a part. The name will direct our attention to that portion of the field which is occupied by the display of data units which makes up the name's feature. These in turn will show the items the feature possesses and, by way of these, the names of the items concerned. Transversely, we may use an item name to enter the data field and to find a set of features.
Types of question
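The kinds of question distinguished in this section (feature, item, unit and field questions) can be sketched as four small lookups. The code is an illustration under stated assumptions. Figure 2 is not reproduced here, so `FIELD` is a hypothetical completion that keeps the cells stated in the text and assumes the rest. The function names are invented, and reading a question that names several terms as asking for all of them together is also an assumption.

```python
ITEMS = [0, 1, 2, 3, 4]

# "1" = blob, "0" = ring; one character per item 0-4.
# B0, B4, C0, C2, C3 and the whole of row E are ASSUMED cells.
FIELD = {
    "A": "01111",
    "B": "01101",
    "C": "01000",
    "D": "01110",
    "E": "10110",
}


def feature_question(features):
    """Features are named; find the items possessing all of them."""
    return [i for i in ITEMS if all(FIELD[f][i] == "1" for f in features)]


def item_question(items):
    """Items are named; find the features possessed by all of them."""
    return [f for f in FIELD if all(FIELD[f][i] == "1" for i in items)]


def unit_question(features, items):
    """Both kinds of term are named; report the state of each data unit."""
    return {(f, i): FIELD[f][i] == "1" for f in features for i in items}


def field_question(count):
    """No term is named; scan the whole field for items with `count` features."""
    return [
        i for i in ITEMS
        if sum(FIELD[f][i] == "1" for f in FIELD) == count
    ]


print(feature_question("D"))     # [1, 2, 3]
print(item_question([1, 2]))     # ['A', 'B', 'D'] (depends on assumed cells)
print(unit_question("C", [4]))   # {('C', 4): False}: Tom is not cheerful
print(field_question(3))         # [3] (depends on assumed cells)
```

A term question scans one extension to find terms in the other; a unit question looks only at the crossings it names; a field question has to visit every data unit.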
In the above example, we mention one type of term, a feature (it could well have been more than one feature), and we have to find the other. Let us call such a question a 'feature question', naming it from the type of term specified in it. Transversely, if a question names an item or items and asks for features, the known type of term is an item and we ask an item question. These two types of question may be known, collectively, as 'term questions'. They call, as a general rule, for one extension of the data field to be scanned, in order to find terms which are in the transverse extension. We could, however, mention both types of term in a question, as when, for instance, we ask 'is Tom cheerful?' or 'are Dick and Eric discerning and energetic?' Here we mention one or more data units and ask about their state. Although we use names which appear to be concerned with the present state, these names in fact
refer us to ambiterms. To ask 'is Tom not-cheerful?' or 'is not-Tom cheerful?' or even 'is not-Tom not-cheerful?' are all ways of directing us to the same data unit. Let us call such questions 'unit questions' no matter how many units occur in them, just as we call term questions by that name whether they mention one term or many. Clearly, unit questions may name data units, runs, or subfields, calling in each case for a scan of all the units named in order to determine their state. It is, of course, possible to ask a question which mentions no names whatever, since the whole of the data field is concerned. An example is 'which items have three present features?' Such a question may be called a 'field' question. Unit, item, feature and field questions all play a part in the handling of data fields. The study of which types of question are asked of collections of information may be named 'question analysis', and on this our next few chapters concentrate.

Summary
The fundamental structure which is dealt with in data study is the data field. This consists of two sets of terms, these being items (the things we wish to represent) and features (the characteristics of these things). Items and features are related to each other by reciprocal membership, an item being a set or class of features, and a feature a set or class of items. Each term divides the set of transverse terms into two parts, one consisting of terms which possess it and the other of those which do not. Items form one extension of the field, and features make up the other. The crossing of two terms forms a data unit, present if the two are members of each other and absent if they are not. Presence and absence are relative, in that we can always choose which of two complementary terms in the same extension of the field to take as present. They are states and every term is made of a set of units, all of which are in one state. Pairs of complementary terms ('ambiterms') exhibit both presence and absence, and are therefore
entities whose state is intermediate between the entirely present and the entirely absent. The state of an ambiterm, and of any other section of the data field, is that of the collection of data units it contains. A data field can be taken apart, into a display of units, and a net of names which is applied to the rows and columns of the display. Subfields which are of one state can collapse into single data units. Questions may be given names according to the type of structure they specify. Thus we may meet with unit, item, feature and field questions. The unit questions may mention single units, or runs of these, or subfields. Item and feature questions merit special attention.
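
The four types of question summarised above may be made concrete with a small sketch. The five-by-five field below is invented for illustration and is not the example shown in the figures; the function names are likewise assumptions. Each item is recorded as the set of features it possesses, and each type of question becomes a different way of entering that record.

```python
# An invented data field: item number -> set of present features.
# Any feature not listed against an item is taken to be absent.
FIELD = {
    0: {"B", "D", "E"},
    1: {"A", "B", "C", "D"},
    2: {"A", "B"},
    3: {"A", "C", "D"},
    4: {"C", "D"},
}


def unit_question(items, features):
    """Both kinds of term are named; report the state of each unit."""
    return {(i, f): f in FIELD[i] for i in items for f in features}


def item_question(item):
    """An item is named; find the features it possesses."""
    return set(FIELD[item])


def feature_question(features):
    """Features are named; find the items possessing all of them."""
    wanted = set(features)
    return {i for i, present in FIELD.items() if wanted <= present}


def field_question(n_present):
    """No term is named: which items have n_present present features?"""
    return {i for i, present in FIELD.items() if len(present) == n_present}
```

A single unit, a run and a subfield are all handled by `unit_question`, which differs only in how many names are passed to it; this mirrors the remark that unit questions keep their name however many units they mention.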
Questions
Imagine, for a moment, an index of documents about chemistry. The documents are items, the features are their qualities, and the data units are the associations between the two. If a research chemist asks whether a given report mentions a given chemical, he asks a unit question, specifying both an item and a feature. If he names several different documents and asks if they all have a particular set of qualities he mentions a net of names which together refer to several data units, which is another unit question, of the subfield type. Suppose, however, he makes a more general enquiry, simply to find out what is known about a given report. In this case, he asks an item question, naming the item, the document, and leaving its features unstated. He cannot mention these, for he does not know them. Similarly, if the chemist names certain characteristics and calls for the documents, unknown to him, which possess them, he asks a feature question. Feature questions are often met in this context of searches through technical literature. To take another example, a marriage bureau is trying to arrange a meeting for one of its clients. Her name is Gloria, and she is looking for a husband who is tall, kind and easy to handle. First, the bureau may ask a feature question: who is male, tall and kind? The answer turns out to be Ted. A unit question follows: is Ted easy to handle? The answer is yes, so as a beginning, the bureau decides that Ted may suit Gloria. But will Gloria suit Ted? Ted wants a slim, blonde and vivacious wife. The bureau asks an item question: what does it know about Gloria? By good fortune, she has these desirable qualities. It remains to arrange an introduction, and so a subfield question is called for: are both Ted and Gloria free on Saturday, and do they live within easy reach of London? These two examples assume that the knowledge sought is already available, having been discovered and recorded.
However, the types of enquiry we may make also include those propounded in the course of research and experiment. When Mendeléeff developed the periodic table of elements, he used it to forecast the properties of
substances then unknown. To these he gave names, such as ekasilicon, ekaboron, ekatantalum. He was then able to frame feature questions, such as, 'look for an element which has atomic weight 72.6, specific gravity 5.5, and is dirty grey in colour; it is ekasilicon'. When this element was found (it is now called germanium) it proved to have properties very like those predicted - atomic weight 72.6, specific gravity 5.47, colour greyish white. Item questions are also asked in the course of research, such as, 'here is an unknown substance - what are its features?' Often such questions are answered unit by unit: for example, is this substance toxic? is it acid? does it contain carbon? will it dissolve in water? Here again we meet the inherent graininess of knowledge. Fundamentally, it comes to us a unit at a time, in quanta of information not unlike the quanta of energy appearing in sub-atomic physics.

Data vehicles
Outside our memories, information is recorded on many different vehicles - card, punched tape, magnetic tape, photographic film, paper, plastic. The vehicle carries the record, and the record is the physical change brought about in the vehicle to represent the information. Thus, in Babylon, the vehicle would often be a tablet of baked clay, the record would be the marks of the stylus upon it, and the information might be the Epic of Gilgamesh. Today, the vehicle might be a packet of soap powder, the record would be the printing upon it, and the information would be its name, its price, and the weight of its contents. The separate elements of a record on a data vehicle may, as with the tablet of clay, be called the marks. Examples are notches, slots and tabs in or on the cards of card indexes. The basic properties of a data vehicle depend on the part of the data field it displays. It may represent a unit, a run, an item, a feature, a subfield or the entire data field; and it may show a binary pattern or one which is unary (normally, in this case, present). If a data vehicle represents a unit, we may call it by that name. A data handling system whose rules arrange for the vehicles to be of this type may be called a unit index. If, for the moment, we think of each vehicle as a card, such an arrangement will form a unit card index. It will normally be unary, making use of one card for each present unit, and showing absence, as well as lack of information, by the lack of a card. If each card bore the name of an item and a feature and also a mark to show whether the item did, or did not, possess the feature, then the index would be binary. A missing card would then uniquely show lack of knowledge.

Figure 5. A typical punched feature card on an illuminated viewing frame. The card represents a characteristic and the positions upon it represent the items which possess it. It is binary: holes indicate presence and blanks indicate absence when it is fully prepared. The card is corner-cut and tabbed for vertical-visible storage, although others may be square-cut. Capacities may be 10,000 items or more, and the numbers may be laid out in squares of a hundred, as shown here, or in rows across the card. The latter is more usual for machine-read indexes.

Figure 6. Detail of the punched feature card.

If a vehicle represents a feature, or an ambifeature, we may call it a feature vehicle. Again we may take it to be a card. The vehicle is then a feature card and shows, if unary, the items possessing the feature or, if binary, both the items which possess the feature and those which do not. A punched feature card is binary. On its face it bears a series of numbered positions, one for each item in the field or subfield, and these are punched if the item concerned possesses the feature which gives the card its name. This set of holes represents the present units in the feature field, and the set of unpunched positions, assuming that there are no spares and no
unknown cases, represents the absent units. As a general rule, the name of the card is the name of the present feature, but the card, clearly, shows an ambifeature. By contrast, if a feature card or other vehicle is unary, it carries a list of the items possessing the feature, but gives no news about those which do not. It is usually plain, a word we may here use to mean 'unpunched'. It could, alternatively, list those which do not possess the feature, omitting those which do. This is usually felt as an equally present arrangement, non-possession of a feature being thought of as possession of the complementary feature. For example, absence of eatability is felt as a very present quality in such a remark as 'this dinner is inedible'. This ability to think of every unary feature as present makes it possible for us to forget absent features for much of the time, and to work only with the present variety. This is not unlike the way in which the symmetry of the field makes it possible for us to work in one dimension only, and to leave the other aside - since it behaves just the same as the one with which we are concerned. To turn to item vehicles, these too may be unary or binary, and represent items or ambiitems. Examples of unary item cards include those from personnel record systems, of the sort which list the details of each employee on a plain card without notching or punching. They also include abstract cards in libraries, case-history cards in hospitals, and similar vehicles. Punched item cards, on the other hand, are generally binary; the unpunched positions show the features the items do not possess, and the cards represent ambiitems. Subfield vehicles list many items against many features; occasionally they may cover the entire field in both its extensions. They occur in many varieties, a typical example in everyday life being a grocer's bill. The items are listed line by line, and their features (date of purchase, quantity bought, product bought, price per unit, total price and so on) appear column by column. A bill of this sort is deceptively simple, for it telescopes the feature extension so that many features (many prices, for instance) occur in one column. It can do so because the prices are mutually exclusive in respect of any given item. We shall deal with this at a later stage. Another use of a subfield vehicle is for the transference of information which reaches us item by item into a form which can be taken off feature by feature, as with the 'transfer sheet' used in connection with many punched feature card indexes.

Figure 7. A transfer sheet, in this case based on the data field taken as an example in chapter one. The items have here been put down the side, as rows, and the features have been allotted to the columns. There is a column for each of the independent features, but features A and E have been taken to be mutually exclusive; they therefore occupy one single column. In use, items are described row by row as they are encountered. In due course, the feature vehicles are prepared by reading off the information about each feature down the appropriate columns. In this example, the names of the terms are not underlined: presence is taken for granted, and absence is shown by the lack of any mark, as cheerfulness is shown as absent in William.

Tags and forms
To convey information, a data vehicle must link an item to a feature. Before the link is made, the vehicle is often a blank form, prepared to accept information but not completed. Examples are questionnaire forms, unused cards prepared for punching, and unfilled petty cash vouchers. At times, a document may carry one name only, for example, name tickets on bedding-out plants, placecards at a dining table, ancient Greek ostraka, raffle tickets, price labels in a shop window, or conference delegates' name badges. Such tags may represent things, or be attached to things. For example, to tie a price ticket to a jar of pickles attaches a feature to an item; the string represents a present data unit. From this point of view, a printed but unused feature card, intended for punching, is a form. With the feature name on it, ready for punching, it is an unattached tag. With the holes in it, it is at last a data vehicle. The actual method of marking a form to turn it into a data vehicle may be of any degree of complexity. Anything which makes a detectable difference in the vehicle will do, from a simple jotting in a diary to a collection of bits in the memory of a computer. Such bits are of two types, normally two different magnetic states. Since a computer only uses two such types while writing uses many, we may say that the jotting is by far the more complicated, even though we may read it with ease. Yet, complicated or not, the jotting does not call for a machine to handle it. Many of the feelings of awe aroused by large data handling installations are due to their machinery, which seems mysterious, voracious and large in scale. It swallows cards or tape with impressive speed, and
its ministers are familiar with microseconds. Experience of its use, and the reduction in its size which results from technical progress, together with simpler rules for its operation, will remove this feeling. Meanwhile we may note that these devices are governed by principles no different from those which control the simplest homemade index of colour slides or ice-cream recipes.

Method and equipment: software and hardware
The data vehicles, each bearing the records, and the equipment for storing and moving, marking and interpreting them, form one important part of a data handling installation. This part, the equipment, is often known as the hardware. By contrast, there is the set of rules whereby the information embodied in the equipment is arranged, collected, processed, extracted, which is known as the software. Hardware and software, equipment and method, compose the installation. Even the smallest index has both, although the equipment may be as simple as a sheet of paper and a pencil, while the method may be so obvious that it is never written down.

Normal and transverse indexes
Making use of our standard words, we may talk of item indexes when the rules call for information to be arranged item by item, and of feature indexes when they call for a feature by feature arrangement. This nomenclature helps in the choice of information-handling methods and is easy to understand, but a clear warning must be given. A feature index is not, repeat not, necessarily embodied in feature vehicles, nor is an item index necessarily embodied in item vehicles. If indexes are, in fact, embodied in this way, we may call them 'normal', but if they are embodied in vehicles representing terms transverse to those which place the vehicles in order, we may call them 'transverse' indexes. Thus if item cards are kept in an order dictated by some of the features on them, they form the index which is transverse. Further, it is a feature index.
The method for the job
Let us now draw some simple conclusions from the analysis we have made so far, restricting ourselves to normal item and feature indexes. To do this, we may suppose that our typical data field has been broken up into ambiitems. Of these there are five, numbered from 0 to 4. Each corresponds to a column in the diagram, and we may think of these columns as pieces of card, bearing holes where the present data units are shown. The columns thus represent punched item cards. One of the most usual ways of embodying a data field is to use a set of these. We may think of such cards with their holes as standing for all those edge-notched, slotted, body-punched, machine-punched, or needle-sorted methods in which the separate data vehicles represent items (binary, because the unpunched positions show the features the items do not possess). If we now ask an item question, for example, 'what is known about item 3?', the answer is easy to find. If card 3 is in its place, we need only pick it out to learn that item 3 possesses features A, C and D. After a short test of this nature we should return the card to its place. This provision is important, for speed of answering a first question may be bought at the price of leaving things in such a mess that the second and subsequent questions can hardly be answered at all. Fair comparisons between indexes and their various embodiments in equipment are hard enough to make at the best of times, and cannot be made at all unless matters are kept in good order. Only if the position of a card does not matter at the start of a search is it reasonable to care nothing about where it is replaced. Now let us ask a feature question, still of our item cards. Let us ask which items possess the feature E. This is a different matter. A glance at the data field shows that item 0 is the only one possessing E, but in practice, in a real search situation, we cannot glance at the field.
We can only look at the individual item cards, making certain we examine every one. In the present example, four-fifths of our effort is wasted, since only one fifth of the items possess the feature. Many methods are available for scanning large numbers of item cards rapidly, or for handling packs of them in such a way that all the cards are interrogated simultaneously. Although they help to make light of this work they do not get rid of the need to contact every data vehicle.

Figure 8. Answering a feature question - a search by scanning a set of cards. The cards are shown as tabbed, but the same principle applies if punched cards are passed one by one under the sensing device of a machine. Some punched cards (slotted and edge-notched) can be interrogated simultaneously, in quantities up to several hundreds. Code shown in the figure: Tall, Dark, Handsome.

Suppose, however, that the data field were split into rows instead of into columns. The punched cards would represent ambifeatures. To find out which items possess feature E we would consult one card only. The question is a feature question, and the minimum number of card handlings required, namely one, appears to arise, in this case, when the data field is embodied in a feature system. On the other hand, this alternative division of the data field makes it hard to answer an item question. Instead of just examining card 3 to find out all we wish to know about the item it records, we must examine every feature card in the collection. Of each we ask, is item 3 recorded on this card? Once more, we must look at five cards instead of at one.
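
The comparison just made can be put as a short sketch which counts card handlings. The field is invented: it agrees with the two facts quoted above (item 3 possesses A, C and D; item 0 alone possesses E), but the remaining units, and all the function names, are assumptions made for illustration.

```python
# Invented item cards: item number -> features punched on that card.
ITEM_CARDS = {
    0: {"B", "D", "E"},
    1: {"A", "B", "C", "D"},
    2: {"A", "B"},
    3: {"A", "C", "D"},
    4: {"C", "D"},
}


def invert(cards):
    """Turn item cards into feature cards (or feature cards into item cards)."""
    inverted = {}
    for name, marks in cards.items():
        for mark in marks:
            inverted.setdefault(mark, set()).add(name)
    return inverted


FEATURE_CARDS = invert(ITEM_CARDS)


def direct(cards, name):
    """The card named in the question exists: pick it out, one handling."""
    return set(cards[name]), 1


def scan(cards, mark):
    """The term named is only a mark on the cards: every card is handled."""
    found = {name for name, marks in cards.items() if mark in marks}
    return found, len(cards)
```

Each function returns the answer together with the number of cards handled. `direct` stands for a question asked of its own kind of card, and `scan` for a transverse question, which is why the second always costs as many handlings as there are cards.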
We conclude that questions are related to indexing methods in the following way: normal feature card indexes are likely to answer feature questions with ease, and item questions with difficulty; normal item card indexes are likely to answer item questions with ease, and to be inefficient when feature questions are propounded. Later, we shall see how this conclusion is affected by the existence of feature indexes made of item cards (transverse feature indexes) and the transverse. This relationship between questions and vehicles can be expressed in another way. If item vehicles are arranged in item order, making a normal item index, or if feature vehicles form a normal feature index, then to ask an item question of an item index or a feature question of a feature index is to work from the known to the unknown, from the type of term mentioned in the question to the other type, which we seek. On the other hand, to ask a feature question of a normal item index, or an item question of a normal feature index, is to work from the unknown to the known. We must search through all the data vehicles to find those which carry the transverse terms we have specified. The conclusion is that it is not, in general, efficient to ask a transverse question of a normal index. We have already imagined term cards (item or feature cards, cards representing terms) as slices of the data field, punched where the data units are in the present state. If this method of embodying a data field in data vehicles is used, then to stack the cards representing any set of terms reveals, in the form of holes passing through the stack, the set of transverse terms which the normal terms possess. Thus to find all items possessing features B, C and D in common, we stack the punched feature cards concerned (as if they were strips from the data field), and scan the top of the resultant stack. It is clear that any hole passing through the stack must represent an item of the type required.
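
Stacking may be sketched as a bitwise operation, assuming that each punched feature card is held as an integer whose bit i is set when position i is punched. Light passes through a position only if every card in the stack is punched there, which is the logical AND of the cards. The card contents are invented, chosen only so that item 1 alone possesses B, C and D together; the names are assumptions.

```python
from functools import reduce

N_ITEMS = 5  # positions 0 to 4 on every card

# Invented feature cards: feature name -> items punched on the card.
FEATURE_ITEMS = {
    "A": {1, 2, 3},
    "B": {0, 1, 2},
    "C": {1, 3, 4},
    "D": {0, 1, 3, 4},
    "E": {0},
}


def punch(items):
    """Make a card as an integer with one bit set per punched position."""
    card = 0
    for i in items:
        card |= 1 << i
    return card


CARDS = {name: punch(items) for name, items in FEATURE_ITEMS.items()}


def stack(names):
    """Superimpose the named cards; a hole survives only if all have it."""
    fully_punched = (1 << N_ITEMS) - 1
    return reduce(lambda a, b: a & b, (CARDS[n] for n in names), fully_punched)


def through_holes(stacked):
    """List the positions at which a hole passes through the whole stack."""
    return [i for i in range(N_ITEMS) if (stacked >> i) & 1]
```

Calling `through_holes(stack("BCD"))` returns the items possessing all three features. The same two functions would serve for stacked item cards if the dictionary were keyed by item instead.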
In the present case, item 1 will be shown. Stacking is the fundamental operation in normal punched feature card indexes, which are almost exclusively designed to answer feature questions. It is obvious that punched item cards can also be stacked, the result being an answer to the question 'which features are common to these named items?' However, this question seldom appears, and so punched item cards are rarely stacked in practice. Instead, the holes in them are used as an aid to serial or simultaneous search and selection, or as a means of instructing printing or other devices, as we shall see later. When the holes are used as an aid to search, they are a slow means of answering feature questions because every card in the index has to be interrogated.

Figure 9. Answering a feature question - a search by stacking punched feature cards (Tall, Dark, Handsome). Item two is shown as possessing all three qualifications.

Unit questions
By contrast with term questions, we have already met unit questions, in which at least one of each of the two types of term or
ambiterm is mentioned. A single unit question specifies a single one of each; a 'run' question mentions one of one type and many of the other, and a subfield question mentions many of each. If our data field is represented on item or feature cards, a rule of economy can be applied to these questions. Since we have named all the terms we need for the answer, and know where to go to find the vehicles, or the positions on vehicles, representing these, it will usually be best to employ the type of card of which we need the smallest number. Thus if we ask whether Jack is bashful, discerning and energetic, to use our example of a data field, we ask about the run B3, D3 and E3. If we employ feature cards for the answer, we must look at three cards, but if we employ item cards we need consider only one. Yet, in making this point, we must not forget the claims of other circumstances. Cards may not be designed for rapid inter-card comparison. The cards of which we need fewest may be difficult to reach, and many other practical difficulties may in a real-life situation make our theoretically best method unusable. It is, of course, possible to represent a data field by a set of single unit or of subfield vehicles, but a field always contains more of these than it does of items or of features, so such methods are prolific of cards or other devices. Thus our five-by-five example of a data field contains twenty-five data units, leading to the use of this number of unit vehicles if all are recorded, and to fourteen if the absent units are treated by omission. An ordinary small index of two hundred headings and one thousand documents would contain two hundred thousand units, and a large index would contain millions, that is, the number of ambiterms in one extension times that in the other. Numerous as units are, they are rare compared with subfields. Our five-by-five field contains one five-by-five 'subfield', the field itself.
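
The arithmetic of this passage can be checked with a few lines, offered as a sketch with assumed function names. The number of units is the product of the two extensions, and the number of subfields of a given shape is the number of ways of choosing the items multiplied by the number of ways of choosing the features. A shape such as five-by-four is counted in both orientations, which appears to be the convention behind the figures quoted for the five-by-five field.

```python
from math import comb


def unit_count(n_items, n_features):
    """Every item crosses every feature once."""
    return n_items * n_features


def shape_count(n, a, b):
    """Count a-by-b subfields of an n-by-n field, in either orientation."""
    one_way = comb(n, a) * comb(n, b)
    return one_way if a == b else 2 * one_way


def cheapest(n_items_named, n_features_named):
    """Rule of economy: use the kind of card of which fewest are needed."""
    options = {"item cards": n_items_named, "feature cards": n_features_named}
    kind = min(options, key=options.get)
    return kind, options[kind]
```

For the run B3, D3 and E3, one item and three features are named, so `cheapest(1, 3)` selects item cards. As the text goes on to say, this is a theoretical best only and takes no account of how easily the cards can be reached or compared.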
It contains ten five-by-fours, twenty-five four-by-fours, one hundred three-by-threes, not to mention the three-by-twos and the four-by-threes and the four-by-twos and the three-by-fives, and so on. Yet it has only five items and five features. It is clearly unreasonable to try to handle subfield questions by means of an index
which carries a vehicle for every possible subfield. It is far better to make the subfields required, as they arise, by bringing items or features together according to whatever mechanism is available. Incidentally, if we stack punched feature cards, the through holes together with the stack show a fully-present subfield: all the items which possess all the features. Our example of a data field has twelve such subfields, not counting 'subfields' of one item or feature only.

Specific and generic terms
It may be helpful here to consider what is meant by a single term. We may consider a single feature. In an index of people, the feature 'boy' is clearly such a term, and yet single or not, it may be represented as made of other features, of 'masculine' and 'child' for example. These, too, appear to be single, and yet one at least of them can be broken down further: 'child' may be represented as a combination of 'human' and 'juvenile'. The common-sense conclusion seems to be that every one of these features is single. At the same time, we need some means of distinguishing between these different types of term. Let us call 'boy' a term which is relatively specific, and 'masculine' a term which is relatively generic. The generic terms combine to make up the specific ones. 'Child' is an idea which is intermediate between 'boy' and 'masculine', while 'schoolboy' is even more specific than 'boy', incorporating the ideas 'masculine', 'human', 'juvenile' and 'pupillary'. In the specific feature 'boy', to substitute 'adult' for 'juvenile' produces 'man'; to switch 'masculine' to 'feminine' produces 'girl'; to exchange 'human' for 'equine' results in 'colt', and so on. A similar effect can be seen if items are considered. We may take Mr MacTavish, Mr Lloyd-Jones, Mr Flanagan and Mr Brown to be members of a village community, in which, as individuals, they are not as specific as they are when they become a group. Mr Brown and Mr Flanagan may constitute the Parish Councillors; add Mr MacTavish and they may become the Trustees of the Village
Memorial Fund; exchange Mr Flanagan for Mr Lloyd-Jones and they may be the Village Hall Committee; take all four, and they may be the Officers of the Cricket Club. Thus, changing the normal composition of a specific term makes a difference to its meaning, yet adding or subtracting transverse terms from those listed against it (as possessing it, or as possessed by it) seems to make no difference whatever. To add or to subtract a number of boys from a group of people who all possess the feature 'boy' makes no change in the significance of the group of terms 'human-masculine-juvenile' which go to make up 'boy'. The question is, can this be reconciled with our earlier statement that features define items and the transverse, so that an alteration in either cannot help but affect the other? As an answer, let us make a distinction between a data field which is 'self-determinate' and one which is representational. All that we know about a self-determinate field is learnt by direct inspection of it. In this case, the items define the features and the features define the items, and no more is to be said on the topic. On the other hand, the terms in a representational field are only part, and often a very small part indeed, of the terms which exist outside it. From childhood we spend a part of our lives disentangling terms from the booming mass of significance with which we are surrounded, giving names to them or adopting for them the names given to them by our fellows. In due course we attach these names to the items and features of our information-handling installations. The terms in these are ready-named, and the names relate the parts of the terms which are within the bounds of our data fields to the larger parts which are not. In this case, the terms have been distinguished from each other outside the data field, and therefore no term is affected by what happens within it.
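
The composition of specific terms from generic ones, as in the 'boy' example, can be sketched by treating each specific feature as a set of generic features. The vocabulary is taken from the text; the table layout and the function names are assumptions made for illustration.

```python
# Specific feature -> the generic features which combine to make it up.
SPECIFIC = {
    "child": frozenset({"human", "juvenile"}),
    "boy": frozenset({"masculine", "human", "juvenile"}),
    "schoolboy": frozenset({"masculine", "human", "juvenile", "pupillary"}),
    "man": frozenset({"masculine", "human", "adult"}),
    "girl": frozenset({"feminine", "human", "juvenile"}),
    "colt": frozenset({"masculine", "equine", "juvenile"}),
}

# Reverse lookup: a combination of generic features -> its specific name.
NAMES = {parts: name for name, parts in SPECIFIC.items()}


def substitute(term, old, new):
    """Exchange one generic component of a specific term for another."""
    parts = SPECIFIC[term]
    if old not in parts:
        raise KeyError(f"{old!r} is not a component of {term!r}")
    return NAMES.get((parts - {old}) | {new})


def more_specific(a, b):
    """True if term a incorporates everything in term b and more besides."""
    return SPECIFIC[a] > SPECIFIC[b]
```

In this sketch, changing a component changes the name returned, whereas nothing here depends on how many boys happen to be recorded in any index, which is the asymmetry the passage describes.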
Terms whose removal makes no difference may be called 'separable', while the others are inseparable. It is to these inseparable terms that the ideas of generic and specific apply. Such terms are those by which the data vehicles are arranged. Changing their constitution forces a change in the transverse terms listed against
them, although the reverse is not the case. The greater pattern of meaning, the collection of all available terms, in the world as a whole, may be called the 'holotheme'. We shall return to a study of it in chapter 9. It includes relations as well as terms, but of that, more later.

Names for cards
We may return to the names of data vehicles. The organisation and methods officer, the information scientist, and others whose regular concern is the handling of data, will know item cards by that name, having already met the subject. The ordinary travailer in the toils of industry, commerce or the professions is more likely to have met them under a trade name, or a name derived from the things they represent or from the type of literary composition they carry. Thus we find them known as personnel cards, stock cards, accident cards, customer cards, patient cards, invoice cards, applicant cards, job cards, supplier cards and material cards. They appear as abstract cards, summary cards, record cards, and bear such names as Keysort, Findex, IBM, Bull, Kardex, Synoptic, Paramount, ICT, Roneodex, Ekaha and Remington. Plain or punched, tabbed, corner-cut, tumble-printed, in one colour or in many, almost all of them are designed to represent objects or occurrences. By contrast, feature cards have few names taken from the types of feature they represent or from the items recorded on them, but they make up for this by possessing a plethora of technical titles. In the literature we find them called aspect cards, peephole cards, peek-a-boo cards, inverted-indexing cards, coincidence cards, keyword cards and term cards (in these pages this last name is used in a wider sense). They go by such trade names as VISIscan, Brisch-Vistem, Termatrex, Keydex, Selecto, Viroc, Delta and K-H cards. If such cards are not punched, but bear ruled columns to make it possible to list the reference numbers of items upon them, they may be known as Uniterm cards.
Figure 10. In a normal index:
to ask an item question of a set of feature cards calls for a search through all the cards;
to ask a feature question of a set of feature cards calls for comparison (usually by stacking) of a few cards only;
to ask an item question of a set of item cards calls for comparison of a few cards only;
to ask a feature question of a set of item cards calls for a search through all the cards.
Cards, indexes and behaviour
To fish in these populous waters, the angler must be able to translate all the above names, and more besides, into one or other of the two simple names 'item card' and 'feature card'. This done, he will know what sort of fish he has caught, and how he may expect it to behave. Thus, if he intends to construct a normal index, he will carry in mind the table in figure 10. By contrast, if he means to make a transverse index, for instance, one in which items are stored according to their features, he will bear in mind a different table. This can be interpreted most easily if we imagine that the cards represent items, one card per item, each card filed according to a code number which represents a particular collection of features. The contrast between normal and transverse indexes is a complicated matter, not made simpler by the possibility of duplicating the cards and filing each copy in a different position. We shall study it in a later chapter. Meanwhile, the table is shown in figure 11. The adjustments to be made to these tables in practice will, in part at least, appear later in these pages. Here, we may simply note that they show how the name of an index is related to the type of question it is best designed to answer. Whether normal or transverse, feature indexes fit with feature questions, and item indexes with item questions.
Figure 11. In a transverse index, to ask ...

                                  ... of a set of feature cards                                                   ... of a set of item cards
... an item question ...          calls for consideration of cards occupying known places in the index           calls for a search through the cards to find one representing the item
... a feature question ...        calls for a search through the cards to find one representing the feature      calls for consideration of cards occupying known places in the index
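The contrast drawn in figure 10 can be sketched in a few lines of Python. This is an illustration only, not a method given in the text. The names `feature_cards`, `item_cards`, `ask_feature_cards` and `ask_item_cards` are hypothetical, and the presences are reconstructed from statements made elsewhere in these pages about features A to D of the running example.

```python
# Presences reconstructed from statements in the text; feature E is left out
# because its items are not fully listed. All names here are hypothetical.
feature_cards = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 4},
    "C": {0, 1, 3},
    "D": {1, 2, 3},
}

# The same field turned round: one card per item, marked with its features.
item_cards = {
    item: {name for name, items in feature_cards.items() if item in items}
    for item in range(5)
}


def ask_feature_cards(features):
    """Feature question on feature cards: stack only the cards named."""
    return set.intersection(*(feature_cards[f] for f in features))


def ask_item_cards(features):
    """Feature question on item cards: every card must be examined."""
    wanted = set(features)
    return {item for item, marks in item_cards.items() if wanted <= marks}


print(ask_feature_cards("BCD"))  # {1}
print(ask_item_cards("BCD"))     # {1}
```

Both functions give the same answer. The difference lies in the work done: the first touches three cards, while the second must look at every item card in the pack.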
Summary
The structures we find in the data field can be used to give their names to the types of question we ask of bodies of information, and to the sorts of data vehicle on which we record this knowledge. Further, they can be used to refer to the order in which these documents may be stored. When this is done, we find that a set of relations is highlighted, connecting the types of data vehicle and of index with the varieties of question they may be expected (all other things being equal) to answer most efficiently. If we choose one extension of the field to be the normal extension, the other is transverse. Normal data vehicles are then those which represent the terms which appear in our normal extension. If they are stored normally, they form normal indexes, but if they are kept in an order which is dictated by transverse terms, then they form transverse indexes. Indexes take their name, not from the type of vehicle used in them, but from their storage order. Thus a transverse feature index is, because it is transverse, composed of item vehicles, but because it is a feature index, these are in feature-by-feature arrangement. As a first approximation, we may say that feature indexes are suited to answering feature questions, and item indexes to answering item questions, whether the index is normal or transverse. The information-handling installations, the indexes, of which these vehicles form a part consist of method and equipment,
software and hardware. The method includes the arrangement chosen for the data field and the rules for manipulating it. The data fields embodied in the equipment may be self-determinate, but are much more likely to represent something in the outside world. The items and features which appear in them may in this case be thought of as taken from the 'holotheme', the collection of all more-or-less different external concepts, the whole encyclopaedia of ideas whose relations form our collection of knowledge. They may be generic or specific in various degrees, but a term is still an individual term, no matter how generic or how specific it may be.
Logic in the data field: complementation
It will now be helpful to concentrate for a while upon features. By the symmetry of the field, the whole of our argument could apply if we concentrated on items, but the results are more familiar if we do not. We think of a feature as a particular collection of items, which we have called a class or a set. We have displayed this as a row of present data units in our diagram of a data field. An ambifeature has been shown as a complete row, containing both present and absent units - two complementary sets, a run across the entire field. More often than not, concentrating upon features (by definition, present) leads us to forget that each one carries with it, as a sort of shadow, the complementary absent feature which completes the row. We could, of course, think especially of the absent features, in which case we might well forget the present shadow which accompanies each. The operation of moving from a present to an absent feature, or the reverse - complementation - is provided with quite a number of different symbols. Here we shall adopt the lemma, $-$; thus we may write $-\underline{A}$ to mean taking, or considering, all the absent data units which together make up feature $\overline{A}$, or (which is effectively the same thing) taking the absent set of the pair which make up A. If we use a double-shafted arrow to mean 'results in', we may say $-\underline{A} \Rightarrow \overline{A}$. In language, the word 'not' and various other negatives are available to stand for this operation (in chapter one we used the word 'non-artfulness' to refer to $\overline{A}$). Sometimes, however, an action and its result are given the same symbols, as if $-\underline{A} = \overline{A}$. This confuses operations with the conditions which precede them or result from them, and is not a safe procedure.
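Complementation can be sketched with Python sets. The sketch assumes the five items of the example field are numbered 0 to 4 and that feature A is present in items 1 to 4, as stated elsewhere in the text. The names `ALL_ITEMS`, `A_present` and `complement` are hypothetical.

```python
# All the items in the example field (assumed to be numbered 0 to 4).
ALL_ITEMS = frozenset(range(5))

# The present set of ambifeature A, as stated elsewhere in the text.
A_present = frozenset({1, 2, 3, 4})


def complement(present, all_items=ALL_ITEMS):
    """Return the absent set that completes the row across the field."""
    return all_items - present


A_absent = complement(A_present)
print(sorted(A_absent))  # [0]

# Complementing the absent set leads back to the present one.
print(complement(A_absent) == A_present)  # True
```

Keeping the operation (`complement`) distinct from its result (`A_absent`) follows the caution above against giving an action and its result the same symbol.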
The universe and the plenitude
It is high time that we had a title for all the items there are in the field, taken as a whole, all together. Many such words exist. The
statistician talks about his population, the (engineering) storekeeper refers to his stock or his inventory, the museum curator discusses his collection, the librarian his library. No doubt the snipe-keeper talks about his wisp. Here, we may reasonably take the word 'universe' to refer to all the items. It is familiar in logic, where the phrase 'the universe of discourse', meaning all the things of which we are speaking, is time-hallowed. It is a curious fact that, although there are so many words to signify the item extension of a data field, there are very few to refer to its feature extension. Let us adopt the word 'plenitude' for this. Then we may say that a data field consists of a universe of items and a plenitude of features. Its capacity in data units is the number of the one multiplied by the number of the other.
Intersection
We have already met the operation of looking for all items possessing a number of features in common. We visualised it as carried out by stacking punched feature cards and finding the reference numbers of the holes passing through the stack. The particular case was a search for items possessing features B, C and D. We could, instead, have thought of a normal item card index, in which we looked at every card for the compresence of marks or holes representing the three features concerned. However, as in many other cases, the logic is more easily seen by the use of the feature cards. Forming the intersect of two classes or sets such as features means directing our attention to the set whose members, in this case items, are members of both the sets which are intersected. The symbol for the operation is $\cap$, and recalling that we found item 1 to be the only item common to B, C and D we could write $B \cap C \cap D \Rightarrow 1$. Intersection is far and away the most frequent logical operation when a collection of items is examined in order to find those with a given collection of features. In passing, we may note a connection between the generic and the intersected. If we intersect the specific feature 'boy' with the specific feature 'woman' the result is the more generic feature 'human', which is common to both. If we continue intersecting specific features in this way we achieve features of increasing genericity, and in the end we are left with features that we cannot intersect at all. Any attempt at this results in emptiness. We can use the word 'fully-intersected' as a synonym for 'fully-generic' in consequence.
Union
Another operation of logic is union, the formation of a set whose members are members of at least one of any number of other sets. Its symbol is $\cup$; we may exemplify it in our data field by taking set A, which is a union of sets B and D. If we look for items possessing either feature D or feature B or both, we find that the set consists of items 1, 2, 3 and 4, those of set A. If we write U to mean the universe, then we may use our notation to write $\underline{A} \cup \overline{A} \Rightarrow U$. Similarly, if we use P to stand for the plenitude, we have such formulae as $\underline{4} \cup \overline{4} \Rightarrow P$. A search for a union of features can sometimes be more complicated than a search for an intersection. Much depends on how the features are represented. As an example, we may take $\underline{A} \cup \underline{C}$. To find the items which possess A or C or both we must first find those possessing A, which gives us those having A by itself, and also those having A-with-C. To these we must add those which possess C alone. These are, of course, the ones defined by $\underline{C} \cap \overline{A}$. We are in for a double search. However, we may observe that the requirement to find the set made by $\underline{A} \cup \underline{C}$ is a requirement to reject only those items which are in the set made by $\overline{A} \cap \overline{C}$. We could therefore carry out the operation $\overline{A} \cap \overline{C}$, reject the resulting items, and accept the rest. This effect, that the complementation of a union of sets is the same as the intersection of the complements of the sets concerned, is known in logic and set theory as one of de Morgan's laws. It can
Figure 12. Two ambiterms, P and Q, thought of as blobs and rings on a transparent background, being conjoined. If the blobs represent presence, we see that the union of $\underline{P}$ with $\underline{Q}$ and the intersection of $\overline{P}$ with $\overline{Q}$ occur at the same time.
be seen easily in our example of a data field, by imagining that we superimpose, say, ambifeature D on ambifeature C. We think of these ambifeatures as shown on strips of transparent paper, so that the blobs and rings on whichever feature is beneath in the stack can show through whichever one is above. In this case, if a blob coincides with a ring the blob will show. Nothing but the coincidence of two rings will show a resultant ring in the completed stack. It is obvious that stacking in this way unites the set of blobs and intersects the set of rings at the same time. This type of effect would also occur if we were to look for the
union of absent features by rejecting the intersection of present ones, but then the blobs would have to stand for absence and the rings for presence. This is the other de Morgan law. Such laws are but one example of many, all arising from what is called the 'principle of duality for the algebra of sets'. This is not the place to pursue them further, but making diagrams of stacks of features taken from a data field is one of the ways of showing how these laws work. Another is the use of Venn diagrams, which are briefly described in the next chapter. At the time of writing, logic is not widely taught, if it is taught at all, by the use of punched feature cards, but their employment often helps us to see without difficulty what happens when sets and classes are manipulated in this way. Just as we may use the word 'intersected' as a synonym for 'generic', so we may use 'united' as a synonym for 'specific'. A specific feature is quite obviously a union of more generic features. However, we often think of a union as a cementing-together of terms whose distinct personalities remain visible in the result, while a specific is a mixture in which these are lost to view (we saw in the last chapter that 'boy' feels like a fusion). From time to time, in these pages, we shall meet these two ways of thinking of the same action. They arise from the contrast between separable and inseparable terms. When collections of terms are united and used to order transverse terms, they can, as collections, be given names. These, as it were, 'fix' the collections. The names are then thought of as 'generic' or 'specific', or somewhere in between, according to the number of terms they represent.
Identity and quantity
We have taken a term's identity to be the collection of transverse terms, out of the universe or the plenitude, which it possesses. This identity is interlocked with a counter-identity, its complement, the two forming an ambiterm, which we have thought of as a pattern of data units. It is possible to look on this pattern as a relation between an identity and the whole collection of data units across
the appropriate extension of the field. Perceptually, we often seem to do this, as when staring at a black-and-white chequer we see a black arrangement against an all-white background, and then a white one upon a whole field of black. Many operations of logic are carried out on identities or patterns. We have seen complementation moving our attention from a present to an absent identity and the reverse; and we have seen union and intersection operating simultaneously, one upon presence and one upon absence. The procedures of arithmetic, on the other hand, deal with numbers, amounts, ratios. Thus, corresponding to identity, we find quantity - how many present units are possessed by a term, for example, as opposed to which ones it possesses. Corresponding to pattern, which relates identity to the whole extension of the data field, we find the ratio of the quantity of the term to the number of transverse terms in the field. This we may call its density. Identity and quantity, pattern and density, are properties of terms in data fields, and therefore depend on the way we have standardised the fields concerned. Thus, if we switch feature B ('bashful') from present to absent, its complement ('brash') becomes present, and the pattern of data units in (for example) items 2 and 3 becomes altered. Instead of there being three present units in each, one will rise to four and the other will fall to two. Yet these two terms will be the same as before and will only be described differently. It is often possible, by a course of complementation in both extensions of the field, and of merging identical meanings, to reduce the number of present data units (the ones which are visually those recorded) very considerably. The task is not often undertaken, for it depends on a knowledge of the field, and this is seldom available at the start of a data handling operation. However, when it is possible, it may be very labour-saving. 
Our example of a field can, for example, be reduced easily to a field of only four present units. The method is shown in chapter 12. The ideas of identity and quantity may be used to subdivide the item and feature questions we have already discussed. We may ask,
for instance, which items possess a given feature (posing an identity question) and we may ask how many possess the feature (posing a quantity question). To ask which items possess a given feature is to ask for the collection of present data units which forms that feature. It is a feature-identity question. To ask how many items possess the feature is a feature-quantity question. We may also ask item-identity and item-quantity questions. This method of naming a question consists in placing the stated type of term first, and the attribute required of it second. This is helpful in question analysis for it relates the types of question to the types of data vehicle, and then states which type of operation (counting or identifying) is to be carried out. With this in mind, we may turn to the problems of quantity.
Statistics in the field: addition and subtraction
The operations of arithmetic upon numbers have similarities to those of logic upon sets. Subtraction, though it differs from complementation in some ways, also has affinities with it, and has still more affinities with 'relative complementation' - forming the set which remains when some set is taken away from some other set. Earlier in this chapter we wrote that $-\underline{A} \Rightarrow \overline{A}$. This was elliptical. We omitted to mention the universe, for which we had no sign at the time. The universe, as a constant background to our thought, could indeed be omitted with no great loss. However, let us put it into the formula, making our complementation relative to it, and writing $U - \underline{A} \Rightarrow \overline{A}$. Now we can compare the logical and the arithmetical formulae, recalling that our universe has five items, and that so far as A is concerned four of them are present and one is absent. Parallel to $U - \underline{A} \Rightarrow \overline{A}$ we have $5 - 4 = 1$. We might even use underlining and bars to remind us we are dealing with present and absent items: $5 - \underline{4} = \overline{1}$. To consider union: here we have affinities with addition, but the comparison is close only when we deal with mutually exclusive sets. However, $\underline{A}$ and $\overline{A}$ are clearly mutually exclusive (possessing no members in common) and so we may write, as parallel, $\underline{A} \cup \overline{A} = U$ and $4 + 1 = 5$. The operation which compares with intersection is multiplication. Here we meet a little complexity. To tackle it, we may start by considering the subfield CD0124 as a field of its own.
Multiplication
In this field of four items and two features, each feature occurs in two of the items and is absent in the remaining two. The ratio of $\underline{C}$ to the universe is therefore $\tfrac{1}{2}$ and so is that of $\underline{D}$. We now consider the chance of an item having both features if the possession of one makes it neither more nor less likely that it will possess the other. Every item has a chance of one in two of possessing either feature. We multiply the chances ($\tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$) to arrive at the chance, one in four, that an item will possess both. The intersected feature we may name $\underline{C}\,\underline{D}$ does, in our example, appear in one of the four items only, namely item 1. For intersection to be analogous to multiplication, we must multiply densities, ratios, rational numbers, not mere integers or cardinal numbers. Further, the intersected sets must be related to a given universe. We should notice that the multiplication of likelihoods, densities, in this way provides us with an answer which assumes that the meanings concerned are not related to each other. If we tried the same trick on the subfield ABCE24, working, for a change, in the item extension, we should find as before that each meaning (2 and 4) occurred in half the possible cases, both being present in A and B and absent in C and E. The intersection of 2 and 4 contains two features, A and B, but the calculation of $\tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$ leads us to expect only one. The fact is that numbers are essentially general; they may show us what to expect, but this may differ from what we observe. A great amount of statistics is concerned with finding when things are different from what we expect. Returning to the feature extension, we see that stacking two punched feature cards not only intersects the present features, the result appearing as a collection of holes passing through the stack, but also compares the chances. We know how many holes to expect, by multiplying together the densities of holes in the two separate cards, and we may obtain a rough estimate of this by any means which stacks the cards randomly to each other. An example is afforded by sliding them out of register by one row or one column, so that a different set of holes from the lower card shows through the upper. If we now slide the cards back into true register we find ourselves returned from a density which is a sample of what may be expected at random to the density of holes which exists in practice. Comparing what we expect with what we find to be the case gives us a measure of the statistical significance of the association between the two features. The principle can be taken a good deal further, and safeguards have to be inserted when gradual density variations occur across the entire width of the cards. This example, however, is sufficient to show the importance of density when interpreted as likelihood or chance. By the symmetry of the field, stacking punched item cards relates the density of the features in one to that of the features in the other; but this is a relationship in which we are seldom interested. Moreover, since the holes in such a card may often be a part of an arbitrary subcode, the result is virtually meaningless. In item systems, embodied in item cards, the type of density variation in which we are interested is usually dispersed throughout the entire pack. This is what is meant when it is said that the statistics in item card indexes are invisible. A final point, arising from the fact that we multiply ratios to obtain the mathematical analogue of intersection, is that we may represent the numbers handled by addition and subtraction as ratios also. Since the denominator in these is always the number of terms in the appropriate dimensions of the field, it is a constant and can be ignored.
However, if it is inserted, we achieve a more homogeneous notation.
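The parallels traced above between logic and arithmetic can be checked with a short Python sketch. It is offered as an illustration under assumptions, not as part of the original argument: the sets A, B, C and D are reconstructed from statements in the text, the helper names `absent` and `density` are hypothetical, and `Fraction` comes from Python's standard `fractions` module.

```python
from fractions import Fraction

# Example field, reconstructed from the text (feature E omitted).
U = {0, 1, 2, 3, 4}
A = {1, 2, 3, 4}
B = {1, 2, 4}
C = {0, 1, 3}
D = {1, 2, 3}


def absent(present, universe=U):
    """Complement of a present set, relative to the universe."""
    return universe - present


def density(terms, extension):
    """Ratio of the terms found in an extension to the size of that extension."""
    return Fraction(len(terms & extension), len(extension))


# Union of a set with its complement restores the universe: 4 + 1 = 5.
assert A | absent(A) == U
assert len(A) + len(absent(A)) == len(U)

# Set A is the union of sets B and D.
assert B | D == A

# De Morgan's laws: union on presence is intersection on absence, and vice versa.
assert absent(A | C) == absent(A) & absent(C)
assert absent(A & C) == absent(A) | absent(C)

# Subfield CD0124: multiply densities to get the expected intersection.
sub = {0, 1, 2, 4}
expected = density(C, sub) * density(D, sub)
observed = density(C & D, sub)
print(expected, observed)  # 1/4 1/4

# Subfield ABCE24, item extension: items 2 and 4 each hold A and B only.
features = {"A", "B", "C", "E"}
item2 = {"A", "B"}
item4 = {"A", "B"}
expected_items = density(item2, features) * density(item4, features)
observed_items = density(item2 & item4, features)
print(expected_items, observed_items)  # 1/4 1/2
```

In the first subfield expectation and observation agree; in the second they differ, which is the kind of discrepancy the text describes statistics as being concerned with.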
Questions in practice
A typical feature-identity question begins with the words 'who' or 'which'. It then, usually, goes on to specify the set of items to be searched. Lastly, it names the features which the items must possess in order to be accepted. 'Which books are about mediaeval French tapestry?', 'Which salesmen are in the office and free to drive the visitor back to the airport?', and 'Which spare part will serve as a replacement for this discontinued component?' are all questions of this type. It is possible to be deceived into thinking they are item questions because they happen to mention books, salesmen, spare parts, all of which are items. But the mention of these is simply a statement of the items in the data field to be studied. The named terms in the questions are features. When we identify these features, we find the items which possess them, and thus we are led to the answer we seek. Thereafter, we may proceed to learn about the items, or to carry out whatever other operation is needed. We may read the books, ask one of the salesmen to act as chauffeur, or supply the customer with a satisfactory replacement. A similar situation is found in the case of item-identity questions, typical examples of these being 'What are the common factors in all these crimes?', 'Which qualities are possessed by all these successful business men?' and 'Which features enable this collection of butterflies to survive in these adverse conditions?'. These mention items and ask for the features forming their identity. Mediaeval battlecries ('Who's for Saint George?'), Parliamentary customs ('Who goes home?'), street calls ('Wha'll buy my caller herrin'?') . . . pose identity questions of the feature sort. Such questions lie at the heart of an entire body of literature - detective fiction.
Feature-quantity questions ask for such information as the number of people who paid for admission, the number of cotton two-piece dresses in stock, the number of new houses available for allocation to suitable tenants. They may seek the frequency of accidents of a given type, the likelihood of patients given a specified treatment recovering health more rapidly than those given other treatments. Item-quantity questions are concerned with the number of features a particular item possesses.
Search
When we scan a stack of punched feature cards, or pass a series of punched item cards under the sensing head of a search device, we ask each position or card in turn whether it represents a present item in a subfield whose features are specified in the question we are answering. This is the case whether the scan is for the purpose of identification or for counting. We assume that the search is for data units in the present state. Thus, if we look through our data field for all items possessing features A and B, we may think of our search as a scan of the runs AB0, AB1, AB2, AB3 and AB4. Such a scan is in fact a series of questions, each question sharing one part (the features) with its peers, and changing one part. The constant part always represents the known factor; the changing part represents the unknown. In each case we wish to find the state of the display in the run mentioned. If this is entirely present, the item is accepted. In the current case, this occurs with 1, 2 and 4, since the nets AB1, AB2 and AB4 all have fully-present displays. 'Does the coffee bar close at midnight?', 'Do these four curves pass through the origin?', 'Is dinner ready?' and 'Are peers and lunatics debarred from voting at Parliamentary elections and from serving on juries?' are questions of the same sort. In every case we mention a net of names and seek the state of the unit, run or subfield of which it is the framework. The state need not be entirely present or entirely absent to be important to us. For example, we might find the question 'Are the guests and the dinner ready?' answered by the news that the guests were ready but the dinner was not yet available. The pattern of this field is yes-no, and its density is a half. The pattern is very significant, for we need to know which half of such a question is in which condition. If the guests are not ready it may be possible to make them hurry up, but if the dinner is not ready we may
Figure 13. The instruction panel of an 80-column card sorting machine. Here the constant part of the search question is set up on the panel, and the variable part is represented by the cards which pass one at a time through the sensing devices, which check whether or not a hole exists in the position specified.
have to rack our brains to find a means to keep them happy until it is. At other times, the identity or pattern does not matter but the quantity or density is important. We may imagine a case in which things are being selected according to the number of desirable qualities they possess. If a thing has more than three quarters of the total possible, it may be accepted, rather like an examination paper in which it does not matter which questions the student gets right, so long as he gets more than fifty per cent of the answers correct.
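A scan of this kind can be sketched in Python as a series of questions with a constant part (the features) and a changing part (the item). The sketch is only an illustration: `FIELD`, `display`, `scan` and `more_than_half` are hypothetical names, and the presences are those reconstructed from the text for features A to D.

```python
# Example field, reconstructed from the text (feature E omitted).
FIELD = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 4},
    "C": {0, 1, 3},
    "D": {1, 2, 3},
}
ITEMS = range(5)


def display(item, features):
    """State of each data unit in a run such as AB2 (True means present)."""
    return [item in FIELD[f] for f in features]


def scan(features, accept):
    """Ask every item in turn; keep those whose display is accepted."""
    return [item for item in ITEMS if accept(display(item, features))]


def more_than_half(units):
    """Accept on quantity alone, whichever units happen to be present."""
    return 2 * sum(units) > len(units)


# Identity question: which items have a fully-present display for A and B?
print(scan("AB", all))               # [1, 2, 4]

# Quantity question: how many such items are there?
print(len(scan("AB", all)))          # 3

# Selection by density rather than pattern, as in the examination analogy.
print(scan("ABCD", more_than_half))  # [1, 2, 3]
```

The same scan serves for identifying and for counting; only the use made of the accepted items differs.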
Figure 14. A punched feature card reader and arithmetic unit. The reader, on the right, carries a medium-capacity (5,000-item) punched feature card, although readers for larger cards are available. The arithmetic unit, on the left, counts the impulses generated by the reader and carries out various operations upon them. Two registers can be seen, displaying totals and other figures.
From such a question as this it is a short step to a field question, such as an enquiry as to how many items have ten features each, or as to which are in this condition. Field questions, as a general rule, call for a complete scan of the whole field, and at times, they call for such a survey to be continuous. Such questions, though we have not placed any weight upon them, are by no means trivial. The heart of every science is the patient gathering of facts about unknown fields whose items and features are not, or are not completely, identified. The centre of
every industrial control process is a surveillance of a field in which change, wherever it is met, must be countered. In neither case can we name, in advance, a special object of study; we must look at the whole, serially or simultaneously. This applies to learning and to control situations even when all the information is fully known to someone else, though not to ourselves. We are faced with a display of units, and we need to know the net into which it fits. In the case of control, for example, we must find out which item or feature has altered as soon as a change in the state of a data unit has been
observed. We must match the display to the names and act accordingly.
Summary
Two main types of operation are possible in a data field - that which acts on identities, and that which acts on quantities. There is a correspondence between the two, and there are also differences between them. Operations of the calculus of sets or classes, fundamental in set theory and logic, deal with identity and with that relation between a set's identity and the whole extension of which it is part, which we call pattern. Operations of arithmetic and simple statistics deal with quantity, and with that relation of quantity to the whole of which it is part which we have here called its density. Identity and quantity are qualities of terms, and we may divide term questions according to whether they ask for a term's identity or for the quantity of it which occurs in the field. Much data handling, especially in information retrieval, is concerned with identity, while much work in statistics is concerned with quantity. The difference between the two marks a division between disciplines which often goes very deep.
Figure 15.
The use of diagrams
Few books on logic, classification, mathematics, and the many related disciplines which contribute to data study present their subject without visual aids. Graphs, tables, charts, and diagrams appear in their pages, representing in spatial layout the abstract relations with which the books are concerned. It is helpful to compare the main types of diagram employed in order to see how different presentations of the same material may benefit different sciences, and to recognise this material in its various disguises. In logic, in the algebra of sets, the diagrams may show the relations between collections of items, the unions, complementations and intersections we have already met. In the field of traditional or hierarchical classification, the diagrams may be concerned with the formation of significant codewords or words of plain language, providing means of naming the items we index, or of finding them without the use of special cards or equipment. In statistics, they take the form of numerical tables, bar charts and other methods of showing the relations between quantities and densities. In many applications of mathematics these relations are shown by graphs of many types. To help to demonstrate these visual patterns, we may take two features from our example of a data field, together with all its items. We may imagine the items to be birds - ostrich, penguin, sparrow, gannet and duck - and the features to be the ability to swim and the ability to fly. Let $\underline{C}$, then, signify aerial, and let $\underline{D}$ signify aquatic. The new data field, a subfield of our principal example, now looks like figure 15.
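The subfield just described can be sketched in Python. This is an assumption-laden illustration rather than a reproduction of figure 15: the presences of C and D are taken from the two-by-two table given later in the chapter, blobs and rings stand for present and absent units, and the names `row` and `regions` are hypothetical.

```python
ITEMS = range(5)

# Presences taken from the two-by-two table later in the chapter.
C = {0, 1, 3}  # aerial: able to fly
D = {1, 2, 3}  # aquatic: able to swim


def row(present):
    """Draw one ambifeature as blobs (present) and rings (absent)."""
    return " ".join("●" if item in present else "○" for item in ITEMS)


print("  " + " ".join(str(item) for item in ITEMS))
print("C " + row(C))
print("D " + row(D))

# Every item falls into exactly one of four present/absent combinations.
universe = set(ITEMS)
regions = {
    "C present, D present": C & D,
    "C present, D absent": C - D,
    "C absent, D present": D - C,
    "C absent, D absent": universe - (C | D),
}
for name, items in regions.items():
    print(name, sorted(items))
```

These four combinations are the raw material of the Venn diagram and the two-by-two table which the text goes on to describe.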
Figure 16.
In this field, item 0 must be the sparrow, which can fly but not swim. Item 2 must be the penguin, which can swim but cannot fly. Item 4 is the ostrich, capable of neither action. Items 1 and 3 are thus the duck and the gannet.
Venn diagrams
The logician John Venn employed a means of showing how sets or classes may be related to each other. The diagrams which bear his name are made of circles, each of which is taken to contain all the items in a given class. In the present case, two circles are required, one for the feature $\underline{C}$ and one for $\underline{D}$, these being the names of the two classes of item with which we are concerned. The result of intersecting these classes, $\underline{C} \cap \underline{D}$, is represented by the juxtaposition $\underline{C}\,\underline{D}$, an intersection of circles, forming a shape we may call a lens. The area outside both circles is named $\overline{C}\,\overline{D}$, being the intersection of the complements of $\underline{C}$ and $\underline{D}$. The remaining two parts of the diagram, two lunes, bear the names $\underline{C}\,\overline{D}$ and $\overline{C}\,\underline{D}$. Each of these contains items which possess one of the features but not the other (figure 16). In the diagram above, the reference numbers of the items concerned are shown, in brackets, after the names of the relevant parts of the picture. It thus appears, for example, that the duck and the gannet occupy the central lens. The ostrich is outside both circles, and the result of the union $\underline{C} \cup \underline{D}$ is shown as the area encompassed by both circles taken together. It pens in the duck, the gannet, the penguin and the sparrow, each of which can either fly, swim, or carry out both of these operations.
Tables
The four parts of this Venn diagram can be shown as four cells in a two-by-two table (figure 17). We may treat C and D as two dimensions in this table, each dimension having two values: entire presence and total absence. In the example which follows, C acts as a title to the columns and D acts as title to the rows. Two versions are given. In one, the names of the items are given in the appropriate cells; in the other, their total number is given. This second case is obviously a typical statistical table. We shall return to dimensions later in this chapter.
Figure 17.
          C        C̄        Unions
D         1,3      2        1,2,3
D̄         0        4        0,4
Unions    0,1,3    2,4      0,1,2,3,4

          C        C̄        Totals
D         2        1        3
D̄         1        1        2
Totals    3        2        5
To relate this to practice, we may note that the top left-hand cell in the table is obtained, in a punched feature card system, by stacking card C on card D, forming the intersect of C and D as a set of through holes. The left-hand table shows which holes these are (1 and 3), and the right-hand table shows the result of counting them (there are two). In a punched item card system the results arrive from selecting cards which carry holes representing both C and D. The left-hand table shows which cards these are, the right-hand one shows how many. The other cells in the tables are filled in by similar methods. For instance, in the case of punched feature cards, the top right-hand cells are concerned with items punched in card D but not in card C.
To move from tables of this variety to many-valued tables is just a step. Instead of taking two cases only of each dimension (all there and none there), we take a larger number. We interpolate intermediate amounts of the quality concerned, such as degrees of ability to swim and degrees of ability to fly. Many sets of features can be treated as dimensions in this way, an idea familiar to physicists, engineers and others, whose disciplines make use, for example, of tables and graphs plotting temperature against pressure, distance against time.
Lattice diagrams; inclusion
Various relations between sets or classes may be shown by means of lattice diagrams. Such a diagram may be drawn, in the present example, with the absence of both features (C̄D̄) at its head. From this point, two lines lead downward to the two situations in which there is one feature but not the other (CD̄ and C̄D). From these, lines continue, meeting in the presence of both features (CD). The result is a single diamond shaped figure (figure 18).
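The diamond can be generated rather than drawn. The following Python sketch is an illustration under our own assumptions: each place in the lattice is held as the set of features present there, a downward line is taken to join two places when the lower has exactly one feature more than the upper, and `~` stands in for the overline marking absence. The names `feature_lattice` and `label` are hypothetical.

```python
from itertools import combinations

FEATURES = ["C", "D"]


def feature_lattice(features):
    """Return (nodes, edges) of the lattice of present features.

    Each node is a frozenset of the features present at that place.
    An edge (upper, lower) is a downward line: lower properly includes
    upper and has exactly one more feature.
    """
    nodes = [frozenset(combo)
             for size in range(len(features) + 1)
             for combo in combinations(features, size)]
    edges = [(upper, lower)
             for upper in nodes
             for lower in nodes
             if upper < lower and len(lower) == len(upper) + 1]
    return nodes, edges


def label(node, features):
    """Name a place in the lattice, writing ~X for an absent feature X."""
    return "".join(f if f in node else "~" + f for f in features)


nodes, edges = feature_lattice(FEATURES)
labels = [label(node, FEATURES) for node in nodes]

for upper, lower in edges:
    print(label(upper, FEATURES), "->", label(lower, FEATURES))
```

For two features this yields four places and four lines, the diamond described above; the two middle places, each with one feature present, are joined to the head and the foot but not to each other.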
Figure 18.
A diagram of this type displays both states, absence and presence. If we prepare a diagram showing presence only, the place at the head of the diamond will be empty; and we may represent it by the symbol ∅. If we set an all-absent lattice beside the all-present one, we see that we have in effect taken our original binary diagram apart. In the all-absent lattice the empty set appears at the foot, and the full set is at its head; in the complementary lattice the opposite is the case, and the full set, consisting of C and D, is at the base. The two together form the lattice already shown. The two unary diagrams are shown in figure 19.
Figure 19.
These diagrams show features ordered by inclusion. We have encountered a serial situation, the empty set occurring at one end of a series, sets of one feature (or, in the absent lattice, of one absent feature) occurring centrally, and sets of two features, present or absent, appearing at the other terminal. We must examine this idea of inclusion. In our lattice of present features, every symbol is included in the more complex symbol or set of symbols to which it is connected by a downward line, while in the case of the absent features the line is upward. Thus D and C are both included in CD. Also, ∅ is included in D and C (and in D̄ and C̄) following the famous rule that the empty set is included in every set, much in the same way that every number is itself plus zero. This sort of inclusion is also known as 'containing' or as 'proper inclusion', the word 'proper' ruling out the possibility of a thing including itself. The logical symbol for proper inclusion is ⊂; we may write C ⊂ CD to mean that C is properly included in CD, or we may turn the sign round, to write CD ⊃ C, meaning that CD includes C. Any union of sets or classes includes all the sets which form it. Continuing our comparison of logical and mathematical symbols, we may compare ⊃ with >, the symbol for the idea 'is greater than'. In the plenitude, there are two features in the set CD, one in the set C. The formula 2 > 1 can be placed beside CD ⊃ C.
Two uses of the symbols CD (and of other pairs of symbols mentioned above, such as C̄D, for example) should be noted. The first is their use as a name for a pair of features. From this point of view, as we have seen, CD is the name of a union of features. It is the result of performing the action C∪D, this result being seen in the plenitude as feature C together with feature D. The second is the use of the symbols as a name for a set of items. We interpret this, not in the plenitude, but in the universe - the items which are both C and D are items 1 and 3. Since the items which are C are items 0, 1 and 3, while those which are D are 1, 2 and 3, in the universe CD is the name of an intersection, the name of the set of items which are in both C and D. To unite present terms in one extension of the field is to intersect present terms in the other. This has a connection with the discussion of generic and specific terms given in chapter 3. Since the intersection of present terms is the union of their absent complements, it appears that the union of present terms in one extension of the field is the union of absent terms in the other. The symmetry of the field sees to it that the intersection of terms in one extension is the intersection of complementary terms in the second.
The lattice for three features (say, X, Y and Z) looks like a transparent cube, each feature forming one dimension, so that, for example, all the occurrences of X (by itself, and as part of XY, XZ and XYZ) appear at the corners of one face (figure 20). The reader may care to work out the attractive transparent solid representing a lattice of four features.
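The two readings of CD can be set side by side in a short Python sketch. It is offered as an illustration only: the table `EXTENSION` records the items given in the text for C and D, and the function name `items_having` is our own.

```python
UNIVERSE = {0, 1, 2, 3, 4}

# Each feature with the items which possess it (taken from the text).
EXTENSION = {"C": {0, 1, 3}, "D": {1, 2, 3}}


def items_having(features):
    """Items possessing every one of the given features."""
    found = set(UNIVERSE)
    for feature in features:
        found &= EXTENSION[feature]  # intersect in the universe
    return found


# In the plenitude, CD is a union of features ...
cd_features = {"C"} | {"D"}
# ... and in the universe it names an intersection of items.
cd_items = items_having(cd_features)

# The same items are reached by way of the absent complements:
# what lies outside the union of not-C and not-D.
absent_union = (UNIVERSE - EXTENSION["C"]) | (UNIVERSE - EXTENSION["D"])
via_complements = UNIVERSE - absent_union

print(sorted(cd_features), sorted(cd_items), sorted(via_complements))
```

Adding a feature to the description can only remove items from the answer: the set of features {C} is properly included in {C, D}, while the items which are CD are properly included in the items which are C.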
Order
In a lattice diagram we have encountered the idea of order, of there being some rule whereby we can place meanings before, or side by side with, or after, each other. In the diamond shaped lattice, for example, D came after ∅, was side by side with C, and preceded CD. Governed by a relation of inclusion, it could do nothing else. Order of this type is known as partial order, and it permits the side-by-side situation to exist. A more stringent form of order is simple order, and in this case, the entities which are fitted into a sequence cannot be put side by side; each, except possibly the first and the last, must go between two others, one of which precedes it and the other of which follows it. An example of this is the series of integers, ordered by their relative magnitude. The result is a special type of lattice we may call a chain. If we know the meaning of the symbols, it is easy to arrange 7, 3, 6, 4 and 5 in ascending order, but even if we know the meaning of the letters of the alphabet, as representing certain sounds, we cannot arrange them in order. They have a conventional order which must be learnt before this is possible. Nevertheless, both numerals and letters possess an order which we know, and this can be used as an aid to finding items and features when these are represented by names or numbers. Many ways of displaying parts of the data field, besides a lattice, make use of the properties of order, either what appears to be a natural sequence, as with the numerals, or one which is conventional.
Figure 20. [Lattice of three features drawn as a cube; one corner is labelled XYZ.]
Names and descriptions
If we encounter a gigantic ruddy-whiskered barbarian it will probably not surprise us to learn that his friends call him Big Red Hairy. The title is significant - it is a description, referred to an item. If we were told to pick out Big Red Hairy in a crowd we would probably succeed. On the other hand, if Hairy's name were Fred, and we were told to pick out Fred, the problem would be greater. We would have to discover which was Frederick, probably by asking until we were told. The name 'Fred' is neither a description nor significant, and is arbitrary.
Let us turn the argument round, and suppose we set out to find the feature which three known people possess in common. Normally there would be no great difficulty in this. The 'Tom-Bob-George' feature would fairly soon be discovered, by the use of the descriptive title of 'Tom-Bob-George'. It might be the possession of blue eyes, of fair hair, or the wearing of spectacles, or the sufferance of hiccoughs. On the other hand, if we were to look for a feature of which we knew nothing but the arbitrary name, the problem would be much more difficult. Someone would have to tell us the meaning of the name, either translating it into a name we knew or describing the quality by relating it to things possessing it, or even by ostensive definition, pointing out examples of it as Fred was pointed out.
In the long run all titles are names, or are made of names, which have to be learnt. When those for one type of term have been memorised, they may be used as descriptions for the other. We have already made use of the words 'normal' and 'transverse' to refer to different types of index, and we can use them also to refer to different types of name. Then a normal name is arbitrary and a transverse name is descriptive, using the arbitrary names of the transverse extension of the data field to make meaningful names for the extension we take as normal. In a transverse feature index, for example, the vehicles represent items, but they are stored in an order determined by their features, and so these features act as descriptive names for them. Thus 'transverse' and 'descriptive' are related ideas, and so are 'normal' and 'arbitrary'.
Arrangement
We may take the words 'aerial', 'aquatic', and so on, and list them in simple alphabetic order:
  aerial
  aquatic
  non-aerial
  non-aquatic
Such a use of the alphabet to generate order amongst words is so common as to merit scarcely a passing thought. It has its problems, however, which we shall meet in due course. In particular, it gives immediate order to words, not terms. We may contrast it with another form of arrangement which, so far as the terms are concerned, is equally high-handed. In this second case, we do not use the chance orthography of the fortuitous names we conventionally give to the terms we handle. Instead, we take the terms themselves and place them in the order of our choice. We begin by deciding to put capacity to swim before capacity to fly. There is no reason for this in the present instance; it merely shows the absolute power we have over the situation. In practice, however, there might be a good purpose to be served by choosing a particular order for what we may now call the arrays of features. Within each array, we decide to put absent features before present ones. Again, this is quite arbitrary, although in practice it may be useful to choose one particular arrangement of features within an array, and less useful to select another. Remembering to insert headings to make the list easier to consult, we arrive at the following:
ability to swim:
  non-aquatic
  aquatic
ability to fly:
  non-aerial
  aerial
To this arrangement we may now apply a positional notation. Let us take the symbols 0 and 1, using them in two different ways; first, as referring to the headings, and second, as referring to the features beneath each heading. These symbols have an order derived from the numbers they may also be used to represent: 0 first, 1 second. These numerals may be placed on the page in an order which reflects our arrangement of headings and features. When two numerals appear side by side, the first represents the heading, and the second the individual feature, grouped with its fellows under that heading. We then have:
0: ability to swim
  00: non-aquatic
  01: aquatic
1: ability to fly
  10: non-aerial
  11: aerial
This is a typical, though very simple, coded hierarchy of features, which we may study before converting it to a classification of items. In doing this, we must look further into the matter of coding.
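The numbering just described is mechanical enough to be written as a short Python sketch. It is an illustration only; the structure `HIERARCHY` and the function `positional_codes` are names of our own, and the order of headings and features is the arbitrary one chosen above.

```python
# Headings in the chosen order, each with its array of features,
# absent feature first and present feature second.
HIERARCHY = [
    ("ability to swim", ["non-aquatic", "aquatic"]),
    ("ability to fly", ["non-aerial", "aerial"]),
]


def positional_codes(hierarchy):
    """Allot one numeral to each heading and two to each feature.

    The first numeral gives the rank of the heading; the second gives
    the rank of the feature beneath that heading.
    """
    codes = {}
    for heading_rank, (heading, features) in enumerate(hierarchy):
        codes[str(heading_rank)] = heading
        for feature_rank, feature in enumerate(features):
            codes[f"{heading_rank}{feature_rank}"] = feature
    return codes


codes = positional_codes(HIERARCHY)
for codeword, meaning in codes.items():
    print(f"{codeword}: {meaning}")
```

Changing the order of the headings, or of the features within an array, changes every codeword allotted, which is why the order must be fixed before the notation is applied.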
Coding
We have already distinguished the names we apply to terms according to whether they are normal or transverse. We may also distinguish them according to whether they are words of plain language, or codewords. In general, a codeword is made of one or more characters, of letters, numerals or other symbols, which together represent the term concerned but do not form a plain-language word. Occasionally, however, the title 'codeword' is used to mean a word of ordinary language used to signify some quite unrelated idea, as 'Operation Corkscrew' may refer to some secret, projected military manoeuvre. This usage is consistent with the above definition if we say that a word of plain language, to qualify as such, must mean what the dictionary says that it means, so that a word with a hidden meaning is (in relation to that meaning) just a set of characters, or letters, which happens to be pronounceable. In our example of a data field, 'aquatic' is a word and 'D' is a codeword. Both, of course, are names.
A code is a list of the characters available for forming codewords, together with a set of rules for forming them and often also with a list of the plain-language names of terms, related to the codewords thus formed and allotted. Coding is thus a form of translation, from plain language to code language; decoding is the reverse. Coding consists of allotting a codeword to a term, or of recording such a codeword against or in place of a term or a word representing that term. In the sphere of information handling, it often consists of representing terms by means of symbols, or characters, which can be sensed by a machine or simpler device, or can be translated, in their turn, into slots, notches and the like, for this purpose. Like words, codewords may represent terms which are generic or specific, and may represent unions of generic terms, intersections of specific terms.
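Coding and decoding, taken as translation in the two directions, may be sketched in Python as follows. The list `CODE_LIST` holds only the two pairs given in the text ('aerial' is C, 'aquatic' is D); the function names are our own, and the rule that no codeword may be allotted twice is an assumption made so that decoding is unambiguous.

```python
# Plain-language word -> codeword, as allotted in our example.
CODE_LIST = {"aerial": "C", "aquatic": "D"}


def make_decoder(code_list):
    """Reverse the code list, refusing a codeword allotted to two words."""
    decoder = {}
    for word, codeword in code_list.items():
        if codeword in decoder:
            raise ValueError(f"codeword {codeword!r} allotted twice")
        decoder[codeword] = word
    return decoder


DECODER = make_decoder(CODE_LIST)


def encode(words):
    """Translate plain-language words into codewords."""
    return [CODE_LIST[word] for word in words]


def decode(codewords):
    """Translate codewords back into plain language."""
    return [DECODER[codeword] for codeword in codewords]


coded = encode(["aquatic", "aerial"])
plain = decode(coded)
print(coded, plain)
```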
As an example we may take from our data field's set of names the codeword CD, referring to the union of ability to swim with ability to fly. This codeword is composed of elements, C and D, each of which has a meaning of its own. In general, we may use the word 'element', in respect of a codeword which represents several terms, to signify such a unit of meaning, even if, in its turn, it is broken down into characters. Thus C and D are elements of CD, but if we now treat CD not just as a union but as a more specific term then matters alter. Let us celebrate the change by coining a word for the purpose. A thing which can both swim and fly is to be called an ichthyopter, or fish-wing. If we continue to represent this term by CD then C and D are merely characters. We may put brackets round the codeword, thus (CD), to show they are now a single element. In passing, we may note that a card representing a duck is an item card if ability to swim and ability to fly are united features, but it is a unit card if they are fused into a single specific feature, which is the only feature recorded against the item.
Commutative and positional codewords
Since CD means exactly the same as DC, we may call this codeword commutative; we may change round the positions of its elements without altering its meaning. This arises directly from the commutative quality of the data field. To see the same effect in the case of words, we may examine again the problem of finding Fred in a crowd. Our problem is not made easier if we know that he comes of the family Nerk. We must still find Fred Nerk, out of all the Freds and out of all the Nerks, none of whom we know. Clearly, Fred and Nerk are more generic or intersected, and Fred Nerk is more specific or united. Both are normal. The phrase, if we may so call it, Fred Nerk is also commutative, since it means the same as 'Nerk, Fred', as the person concerned might be called in a telephone directory.
Let us now arrange terms from our list (aquatic, aerial and the rest) in pairs, so that the first term of any pair is concerned with ability to swim, and the second with ability to fly. In each case, there are only two possibilities - absence of the feature concerned, which we may symbolise by the zero, 0, and its presence, which we may symbolise by 1. We may thus generate four codewords which may be normal, referring to features, or transverse, referring to items.
Each codeword is made by choosing a feature from each of two arrays of features, whose order of appearance is important to us. We may, for example, know all the features which the symbol 1 may mean, but if we do not know which 1 we contemplate this is of little value to us. Order ensures that we know. Here are the codewords:
normal
  00: non-aquatic, non-aerial
  01: non-aquatic, aerial
  10: aquatic, non-aerial
  11: aquatic, aerial
transverse
  00: ostrich
  01: sparrow
  10: penguin
  11: duck, gannet
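The difference between a commutative and a positional codeword, and the two lists just given, can be reproduced with a short Python sketch. The table `ABILITIES` restates the birds' features from the text as (swim, fly) pairs; the function names are our own, and treating a commutative codeword as an unordered set of elements is our assumption about how best to model it.

```python
# Each bird as (ability to swim, ability to fly): 0 absent, 1 present.
ABILITIES = {
    "ostrich": (0, 0),
    "sparrow": (0, 1),
    "penguin": (1, 0),
    "duck": (1, 1),
    "gannet": (1, 1),
}
SWIM = ("non-aquatic", "aquatic")
FLY = ("non-aerial", "aerial")


def normal_codewords():
    """Positional codewords read as names of pairs of features."""
    return {f"{s}{f}": (SWIM[s], FLY[f]) for s in (0, 1) for f in (0, 1)}


def transverse_codewords(abilities):
    """The same codewords read as names of the items possessing them."""
    table = {}
    for bird, (s, f) in abilities.items():
        table.setdefault(f"{s}{f}", []).append(bird)
    return table


normal = normal_codewords()
transverse = transverse_codewords(ABILITIES)

# A commutative codeword is unchanged when its elements change places;
# a positional codeword is not.
commutative_same = frozenset("CD") == frozenset("DC")
positional_same = "01" == "10"

print(normal)
print(transverse)
print(commutative_same, positional_same)
```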
Classification : cascade diagrams
'Classification' is another word of many meanings, a word with which one can become so familiar that the switch from one of these to another goes hardly noticed. Let us, however, try to standardise a meaning for it. Let us take a classification to be an arrangement of items according to their features (or the transverse) such that transverse positional codewords can be generated to stand for the terms thus placed in order. If the codewords are not both positional and transverse, we do not have to deal with a classification. An alternative way of expressing this is to say that a classification is based transversely on a hierarchy.
A hierarchy which generates a classification can be represented by a cascade diagram. To make such a diagram for our collection of birds, we start with the universe of all five of these, and place the letter U, to symbolise this, at the top of the display. Beneath it, we draw two lines, descending to places imagined to be occupied by those birds which cannot, and those which can, swim. At the lower stage occupied by these places, we write the ambifeature D, the places themselves being a division of U, by means of D, into an array of features D̄ and D. Many special words are in use in connection with such a cascade.
Figure 21. A relation between a cascade diagram and a Venn diagram: the shadow forming the Venn diagram is shown projected on to a plane at the bottom of the drawing. To show this effect, the usual order of the symbols at the bottom of the cascade has been altered: instead of AB, AB, AB, ÄB we have AB, AB, AB, AB.
In librarianship, for instance, D may be known as 'the divider of the first order'. Here, with our eyes on a positional notation, we may call it the (binary) divider of the first position. If we now divide both D̄ and D by C, then C is the divider of the second position. In our diagram, we may show C and D at the side, as the names of the steps, positions, in the cascade, and we may use 0 and 1 to show the routes through the cascade. The codewords resulting then appear at the foot of the diagram, from left to right in numeric sequence (figure 22). Thus we find yet another pattern which can be made from the part of the data field we have studied in this chapter. It appears very different from a Venn diagram, a statistical table, or a lattice, but it is really no more than the same basic structure in a new disguise.
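[Figure 22: cascade diagram. The steps are named D: ability to swim (first position) and C: ability to fly (second position); each route is marked 0 or 1. The codewords at the foot are:]
  00: non-aquatic, flightless (ostrich)
  01: non-aquatic, aerial (sparrow)
  10: aquatic, flightless (penguin)
  11: aquatic, aerial (duck, gannet)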
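The cascade for the birds can be built step by step in Python. The sketch below is illustrative: the function `cascade` is our own, items are shown by their reference numbers (4 ostrich, 0 sparrow, 2 penguin, 1 and 3 the duck and the gannet), and each divider is represented simply by the set of items possessing the feature.

```python
UNIVERSE = {0, 1, 2, 3, 4}
C = {0, 1, 3}  # ability to fly
D = {1, 2, 3}  # ability to swim


def cascade(universe, dividers):
    """Divide the universe by each divider in turn.

    Returns the places at the foot of the cascade, keyed by the
    positional codeword which records the route taken: 0 where the
    divider's feature is absent, 1 where it is present.
    """
    places = {"": set(universe)}
    for divider in dividers:
        next_places = {}
        for codeword, members in places.items():
            next_places[codeword + "0"] = members - divider
            next_places[codeword + "1"] = members & divider
        places = next_places
    return places


# D is the divider of the first position, C of the second.
foot = cascade(UNIVERSE, [D, C])
for codeword, members in foot.items():
    print(codeword, sorted(members))
```

Because each place is divided into its 0 branch before its 1 branch, the codewords arrive at the foot from left to right in numeric sequence, as in the diagram.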
Arrays
Most classifications of items use more than two features at each stage of the division of their universe. Their dividers are not binary, but ternary, quaternary and so on, and often they are decimal. Such sets of features, mutually exclusive and collectively exhaustive, we have begun to call 'arrays'. Mutual exclusion between their members and collective exhaustion of the possibilities are important qualities of arrays. The first ensures that none of the transverse terms being handled can be possessed by more than one normal term, and the second ensures that every transverse term nevertheless has a home in the array concerned. The arrays of features used in classifications of items make 'partitions' of the entire set of items, all items which possess the same feature being equivalent to each other in this respect. Thus a partition in one extension of the data field corresponds to an array in the other. In our example of a data field, features A and E form an array whose corresponding partition of items is 1234/0; and items 0 and 2 form an array whose partition of features is CE/ABD. When features are mutually exclusive, the items possessing them form a disjoint set, and transversely.
To show an item array, we may imagine a translation agency which needs a set of translators who, between them, can handle ten different languages. Len is proficient in three of the tongues, Jack knows another two, and Peter can handle the remainder. Between them, these three people form an array which makes a partition of the languages. All languages are spoken, and no language is spoken by more than one person. One is reminded of the Tower of Babel.
Within such an array of items, and in many arrays of features, no order can be detected. We could obviously invent an order, such as by arranging the linguists by age, by length of experience, or in alphabetic order of their surnames, but in themselves as items they carry no order. They are a mere aggregate; in other cases an order is manifest. For example, in a dimension such as length or in some other scale such as wavelength, mass, hunger, honesty, number of corners or resourcefulness, the features (measurements or groups of these) fall into appropriate positions.
Rank and sequence
It is useful to employ the word 'rank' to refer to the place a term occupies in an array, whether that place is allotted by an identitive rule or by an (apparently less arbitrary) quantitative one. Also, we may use the word 'sequence', to mean a series of names or of elements of codewords, chosen so as to make a positional codeword or phrase. A codeword like 01 for the sparrow is a sequence, whereas CD̄, its equivalent, is not, for D̄C means the same thing. We may think of these as 'indifferences' if they are commutative.
Every possible way of arranging the terms in an index can be the basis of a hierarchy, a cascade, forming a collection of positional codewords. For example, using our example of a data field, and treating every feature as binary, symbolising presence by 1 and absence by 0, we may allot the sequence 11010 to item 2. It so happens that here there are two features which are complementary, as we already know, A and E. To omit E from the codeword makes no difference to our information. Item 2 is well enough described as 1101. Whenever features which are part of an array are scattered it is helpful to bring them together in this way. A good deal of space can be saved in the codewords if this is done. We may note that, if every feature in an index can combine freely with every other then all can be thought of as present, being members of binary arrays whose absent members are not admitted. Our example of a data field is of this type if E is omitted, but with E in place we find that the absent member of the binary feature A (namely Ā) has been let in, being christened E, with E = Ā.
Graphs
We may conclude this chapter by returning to statistics and taking up again the concept of a dimension. Many dimensions occur in statistical work. Anyone who produces lists of the possible answers to questions in social surveys encounters them continually. All the possible dates of birth of the members of a population form such an array. The dates may at times be 'grouped' (an example is 'everyone born from 1900 to 1904 inclusive'), and in this case the dates falling within the extremes given are all treated alike. For calculations, they may all be given the value of the central measurement of the group, in the present case, 1902. Treated as 'forcibly orderless', they form an aggregate, but the force must be exerted, so they are an aggregate of a rather phoney variety.
A very simple two-dimensional graph may be made of our two binary features C and D (figure 23). It has only two points on each axis: 0, which becomes the origin, and 1.
[Figure 23: two-dimensional graph; axis label D: ability to swim.]
Here, if we quote the D axis before the C axis, then we have the ostrich occupying the point 0,0; the sparrow is at 0,1; the penguin is at 1,0, and the duck and the gannet are at the point 1,1. We have entered the world of ordered pairs, and are able, if we wish, to proceed from here into the vastness of number theory. Before we return, let us note that we have met these pairs before, without the commas between them, as positional codewords.
If ability to fly varies and is not associated in the least with varied ability to swim, we shall find that the addition of more birds to our population results in our graph becoming dotted and spotted all over. Birds able to swim, but not able to swim very well, will occupy appropriate positions in the D dimension, and birds (like the hen) not exactly flightless but hardly as brilliant in the art as the eagle, will occupy places in the C dimension. If the two abilities are more or less related, the dots representing birds will crowd into a density pattern. And if we could always tell, from knowing how well a bird swims, how well it flies, the graph would be in the form of a line, the typical textbook graph.
Figure 24. [Bar chart; labels: C: aerial; D: ability to swim (D = 0, D = 1).]
Bar charts
As a last example of a representation of our part of a data field, we may look at a bar chart. Here, too, several dimensions may be displayed, but, as before, we consider two, C and D. We also take just two values of each dimension, 1 and 0, as before. On an appropriate base line we raise two rectangles (bars), whose height corresponds to the number of instances (birds) of each value. Since two birds cannot swim, the rectangle above the value 0 is two units high. To tackle the other dimension, the separate bars are divided proportionately. Two of the swimmers can also fly, and one cannot, so the bar which is three units high above the value 1 on the D axis is split two-to-one. Charts of this type are frequently encountered in statistical descriptions. For example, the national income over a period of years may be shown as a set of bars whose varying heights show its growth, and each bar may be split according to the purposes - consumption, capital investment and so on - to which the income was devoted. Meanwhile, here is the bar chart for the birds (figure 24).
With this, we may leave the exploration of the ways in which the data field may be shown. The reader who wishes to pursue the bar chart further will find that it turns, with more bars, as a general rule, into a histogram, and that as the bars become thinner and greater in numbers the top of the histogram becomes a smooth curve, and we are back again to a graph, the sort which displays a single line. Anyone who is interested in calculus will find that many textbooks explain integration, which is the finding of the area under a curve, by way of summing the areas of rectangles such as the bars we have here described, allowing the rectangles to become narrower and narrower and allowing their numbers to increase until their tops follow the curve required.
Summary
There are many different representations of a data field. Besides the blob-and-ring pattern with which we have become familiar, we find Venn diagrams, cascade diagrams, lists, bar charts, lattices, graphs and statistical tables. In some of these the terms in the field are treated from a quantitative viewpoint. Such representations are useful in statistics. In others the viewpoint is identitive. The names of terms may be words of plain language, or codewords, made of characters such as letters or numerals whose meaning is not clear unless the user is in possession of the code - the rules for translating the codewords into plain language. The codewords are positional if the meaning of their characters depends on the position of these characters, otherwise they are commutative. The smallest part of a codeword which represents a term may be called an element, although an element may consist of more than one character. Sets of terms which are mutually exclusive and collectively exhaustive form arrays, which may be called aggregates if they cannot be placed in any order which does not seem arbitrary, and dimensions if an order of magnitude seems immanent in them.
Exploring the net
In chapter 1, the display of units was shown assembled with a net, the net consisting of the names of terms and the display of the sets of relations of membership which comprise the terms. In that chapter and in those which followed it, the data field made by this assembly of display and net was described and its behaviour was discussed. It will now be helpful to concentrate for a while on the net of names which enables us to refer to any term or terms with which we may have to deal.
Information-handling equipment acts on the marks it encounters (the record on the data vehicle): the holes, blanks, letters, magnetism, or whatever else shows the terms to be dealt with. The equipment is set to react to particular signs, and reacts in the same way to the same sign no matter what significance we attach to the sign. This appears clearly when we look at trial or sample sets of punched cards to which various alternative meanings have been attached, as may well be the case when the cards are meant to show several different possible uses for themselves and their handling devices. Thus we are often concerned with a series of translations - our ideas into plain language, this into codewords, and these in their turn into marks on data vehicles. The marks may then be translated yet again in many ways by data handling equipment. Some of these translations become so familiar that they go unnoticed. We think in language, and when we become sufficiently used to a code we may think in codewords, which become a private language for us and the few who also know the code.
In much of what follows we shall be dealing with the interplay of the written word, the spoken word, and methods other than sound or script of representing the meanings referred to by speech and writing. On the whole, speech will take a back place, and methods of representing the written word will take a more prominent one. But we should always recall that the principles employed in data handling could have been developed by beings whose language used neither speech nor writing, and that the most important element of all is the term, whatever it is, to which our symbol, whatever it is, refers.
Retrieval and translation
To retrieve an item (or a feature, or any card or other vehicle representing one), we must know the place it holds and it must be in its place. In saying this, we interpret 'retrieval' to mean picking out something when its whereabouts is known. If we know nothing but the something's name, then it must be in a place determined by that name. For example, if we know how the name is spelt, it must be kept with its peers in dictionary order. Otherwise, we must be provided with a means of translation. In effect, the place a thing holds is always one of its names. Typical aids to translation are name-to-number lists. Others are lists of words against each of which a preferred synonym may be placed, and yet others are the language-to-language dictionaries we meet at school. The names of items and of features may both need translation into the preferred codewords of a data handling system.
Description, search and identification
However, we may be in the sad case in which we do not know any name whatever for the term or terms we need. In these circumstances, we must seek it by way of a description. Typically, the problem is to describe items by way of their features, but of course the symmetry of the field makes the transverse also possible. Description is the act of bringing together transverse terms in the appropriate logical relationship (complemented, united, or intersected), to permit search. Search is the utilisation of the assembled terms to scan a data field, or the appropriate part of one. It terminates trivially with the act of finding, and more importantly with that of identification, discovering a unique name for the term, or one of the terms, required. From this point on, we may once more need translation and retrieval, or retrieval may be direct.
However, the data field may be so organised that the terms we may seek are placed in positions defined by collections of transverse terms, so that forming the collection (performing the description, the logic) is enough, search is obviated, and retrieval follows description. We have met such a transverse index, in our example of the birds in chapter 4, where 01 could mean the sparrow, and where going to position 01 was retrieval by the use of a transverse codeword. The study of such indexes is a main theme in this chapter.
A succession of activities
A great deal of information handling is concerned with the succession of operations mentioned above: identify, translate, retrieve, describe, search, identify, translate, retrieve - the succession can continue repeating itself in this way. We may exemplify these activities by imagining an indexer using a pack of punched feature cards to find a number of individuals in a personnel records system. He or she may think and act as follows:
identify    I need the ideas 'masculine' and 'personnel' . . .
translate   but they are known as 'male' and 'staff' . . .
retrieve    so I find the cards bearing these titles . . . (we may take them to be in alphabetical order)
describe    and stack them . . .
search      and look for the coincident holes . . .
identify    which are in positions 32 and 457 . . .
translate   which refer to Jim Robinson and Patrick Lucas . . . (we take it that these are listed against the numbers in a register)
retrieve    so I find their record cards . . . (we again assume these to be in alphabetical order)
describe    one at a time . . . (a single card is a stack of one)
search      and look for their features individually . . .
identify    which are as follows . . .
translate   from which I deduce . . .
The succession may terminate in many places, may branch, may turn to statistics if we count instead of identifying, may be abridged, but does not change its order. A transverse index makes it possible to step directly from the 'describe' stage to that of retrieval, cutting out the stages of search, identification and translation. Learning a system's preferred language and always working within it reduces or cuts out translation, even when other economies cannot be achieved. The switch from one extension of the data field to the other occurs at the 'describe' stage. There are two such stages in our example, so we may deduce that we shall end in the same extension as that in which we began. This is so: we began with features 'masculine' and 'personnel' - and we ended with them - the characteristics of the people in both these categories at once.
Six indexes
We may now display six indexes, each recording the same information in a different way. The field they handle consists of four items and four features, and each item has a different set of three of the features. This is a very unusual situation to meet in practice, but it is a useful example. As before, we use letters for the features:
1          2          3          4          5          6
A X 123    012 X D    AB X 13    01 X BD    ABC X 3    0 X BCD
B X 013    013 X B    AC X 23    02 X CD    ABD X 1    1 X ABD
C X 023    023 X C    AD X 12    03 X BC    ACD X 2    2 X ACD
D X 012    123 X A    BC X 03    12 X AD    BCD X 0    3 X ABC
                      BD X 01    13 X AB
                      CD X 02    23 X AC
The use of the reciprocal membership relation enables us to do without underlining to show presence: AB X 13, for example, tells us that there is a fully-present subfield AB13. If we were to treat the two features AB as a specific feature and the two items as a specific item, then (AB)(13) would be a unit.
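The six indexes can all be derived from the item-by-item record alone, which may make their relationship easier to see. The following is a minimal sketch, not a description of any real equipment; the names `ITEMS`, `invert` and `build_index` are hypothetical, and each index is modelled simply as a sorted list of (left-hand term, right-hand term) pairs.

```python
from itertools import combinations

# Index 6 as given in the text: each item against the features it possesses.
ITEMS = {"0": "BCD", "1": "ABD", "2": "ACD", "3": "ABC"}


def invert(membership):
    """Turn an item-to-features record into a feature-to-items record (or back)."""
    inverted = {}
    for term, others in membership.items():
        for other in others:
            inverted.setdefault(other, set()).add(term)
    return {key: "".join(sorted(value)) for key, value in inverted.items()}


def build_index(membership, size):
    """List each set of `size` left-hand terms against the transverse terms
    shared by all of them, in order of the left-hand side."""
    rows = []
    for combo in combinations(sorted(membership), size):
        shared = set.intersection(*(set(membership[term]) for term in combo))
        if shared:
            rows.append(("".join(combo), "".join(sorted(shared))))
    return rows


FEATURES = invert(ITEMS)  # the content of index 1

for number, (membership, size) in enumerate(
    [(FEATURES, 1), (ITEMS, 3), (FEATURES, 2), (ITEMS, 2), (FEATURES, 3), (ITEMS, 1)],
    start=1,
):
    lines = [f"{left} X {right}" for left, right in build_index(membership, size)]
    print(f"index {number}:", ", ".join(lines))
```

Indexes 1, 3 and 5 take one, two and three features on the left; indexes 6, 4 and 2 do the same for items, so the only parameters are which extension is placed on the left and how many terms are united there.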
We recall from chapter 4 that, if any feature in a set of features may combine with any other, then we are dealing with those which we have chosen to regard as present, from a set of binary arrays. This is the case here. We may also recall that the terms whereby data vehicles are ordered are inseparable, however specific or generic they may be. Altering them forces us to change the set of transverse terms indexed against them. In our six indexes, the left-hand columns are those which are in order, and therefore represent the terms which are inseparable. Each of these terms may be a name for a vehicle in a normal or in a transverse index, depending on the type of term represented by the data vehicles which carry the information given in each line of each index. Thus, in index 3, the inseparable terms are features. We have a feature index, normal if the vehicles (such as that which records AD X 12) represent features and transverse if they represent items. The terms in the right-hand column are separable. The features become more united, more potentially specific, as we move from indexes 1 and 2 to indexes 5 and 6, while the items become more intersected. Each index answers one type of question immediately. Each gives retrieval without search, identification or translation in respect of one special form of enquiry. That appropriate to index 1 employs a single feature, that appropriate to index 5 employs a set of three features, a much more specific feature, and that appropriate to index 4 uses a set of two items. Since the left-hand sides have been arranged in alphabetical or numerical order, some other questions are also rapidly answered. Thus in index 5 a two-feature question is quickly taken care of if, and this is important, the features it names are the chief ones in its alphabetical order - features A and B. This is because, by virtue of the arrangement, the two unions of features which contain A and B come next door to each other.
The same arrangement ensures that questions which mention A alone are quickly answered. Suppose, on the other hand, we ask index 5 for all the items which possess features B and C. The unions which contain B and C as elements are separated, and we must go to the different places concerned and see which items are recorded there. In effect, we have to intersect ABC and BCD to obtain BC, and since an intersecting device for features is a uniting device for items, the result is items 0 and 3. This is the same result as we would obtain if we were to go to index 1 and to unite features B and C. The more united items 013 and 023 would be intersected to yield 0 and 3 as before.
Pre- and post-co-ordination
In order to produce an index such as 1, we simply list the items against the features as they arrive. The last item, item 3, has features A, B and C, and so we place it against these. To find any item possessing a pair or a triad of the available features, we adopt a description and search mechanism. Index 1, indeed, could represent a punched feature card system, with one line per card, the top line corresponding to card A. Index 6 could be interpreted as an item card system, with its latest arrival given the number 3 and placed in the last position. In the first case, the mechanism is to stack the feature cards, and in the second it is to search through the item cards. Such indexes as 1 and 6 are often called 'post-co-ordinate', since the co-ordination of the features is carried out by a mechanism when the items have been put into store and are to be found again. By contrast, an index such as 5 puts the items in a storage position dependent upon the complete set of features they possess. These must be ascertained, as in the other case, when the item arrives in the index, but they must then also be co-ordinated, brought together in an appropriate order. Such an index is known as pre-co-ordinate. A classification of items according to their features is a typical instance of this, and indeed it can be seen that index 5 is a part of a larger classification. This larger classification uses the features A and not-A, B and not-B, C and not-C, and D and not-D, in its hierarchy. In index 5 we see simply that part of the result which consists of items with three, and only three, present features taken from these four arrays. Index 5 thus imitates the behaviour of a hierarchy used to form a classification. The access to the answers is immediate if the questions contain the right number of features. It is common experience that classifications call for an intersecting mechanism if the items required happen to possess features which are separated in the hierarchy, appearing in more than one branch of it, and the mechanism is usually an optical scan or a physical walk along the passages between the shelves on which the classified objects are kept. But what about indexes 3 and 4? These interesting half-way houses appear to be neither fully pre- nor fully post-co-ordinate. Should we call them medi-co-ordinate? The proponents of each of the extreme types tend to drift towards the other when modifying their indexes in order to remove any disadvantages found in them; there are, obviously, many varieties of index occupying the wide range between the two limits. The principle of the golden mean (or of ignoble compromise) may lead us at times to cry long live medi-co-ordination.
Schedules and manifolds: co-ordinate indexing
We have a word for the arrangement of features which forms a classification of items - a hierarchy; but we have as yet no word for the list of features which governs a post-co-ordinate index of items. Let us adopt the widely-used name, schedule. This leaves us with a gap still to fill. A name is required for what corresponds to a classification in a post-co-ordinate index. Let us call this a manifold. Then we may say, a hierarchy of features forms a classification of items, and a schedule of features can be used to organise items into a manifold. When a schedule of features is used to make a manifold of items, we have what is known as a co-ordinate index. The name is not a good one, since it is usually meant to refer only to this method of organising information, and yet it is clear that hierarchies are also means of co-ordinating terms.
Manifold indexing might be better, but 'co-ordinate' has a strong hold. Here we may return for a brief look at the table in chapter 2, showing the relations of questions to cards in a transverse index. Clearly, a classification of items is a transverse index, and the individual things classified are kept in feature order. If we think of them as having one data vehicle each, then a feature question asked of them calls for answer by studying the card representing the required set of features, or by intersecting several cards bearing the required set in common. These occupy known places in the file. An item question, on the other hand, calls for a search for the item, unless the index is accompanied by another, giving, item-by-item, the places in which the items will be found.
Transformation and dependence
Figure 25. Universe (vegetables), divided first into 0: carrots and 1: turnips; each of these is divided in turn into 0: costly and 1: cheap.
In a hierarchy, the order in which the terms appear is entirely a matter of how the net is arranged, of how we build up the codewords or phrases which refer to the united or specific features or items it organises. Making hierarchies of features, classifying items, is a specialised art with many rules, of which a number are of particular value in making indexes compact. To illustrate this, we may contrast two simple classifications of vegetables. These could have many features in each array, but once again we may choose a binary case, which has the advantage of taking up little space on the printed page (figure 25).
Figure 26. Universe (vegetables), divided first into 0: carrots and 1: turnips; carrots are divided into 0: costly and 1: cheap, turnips into 0: edible and 1: inedible.
Figure 25 shows a transformable hierarchy, in that every item possesses one feature from each array. If we changed the positions of the arrays, so that cost appeared first and the type of vegetable appeared second, the index would still be coherent. This, however, is not always the case (figure 26). In this new example, the arrays of the second position differ, depending on the features of the first position. We cannot transform the hierarchy because the arrays are not independent. In any order other than this, the arrangement falls into two. For example, if we divide the universe first according to edibility, there is no place for carrots. Dependence makes hierarchical systems compact. At each step in the division of our universe, we choose for each branch of the cascade that array which appears to be the most useful. In this example, we decided that the cost of carrots affects our action but that their edibility is relatively unimportant, since we only use them to make noses for snowmen. On the other hand, as keen turnip-eaters, edibility in this vegetable is all-important to us. Without using the trick of dependence, we should have had to use three positions in our hierarchy, making the resultant codewords half as long again as they need be.
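The saving made by dependence can be checked with a small sketch. It assumes the two-position dependent hierarchy described above (cost under carrots, edibility under turnips) and compares it with a hypothetical three-position hierarchy in which type, cost and edibility are all independent arrays; the names `DEPENDENT`, `INDEPENDENT_ARRAYS` and the two functions are illustrative only.

```python
from itertools import product

# Dependent hierarchy: the array used in the second position depends on
# the feature chosen in the first position.
DEPENDENT = {
    "carrots": ("cost", ["costly", "cheap"]),
    "turnips": ("edibility", ["edible", "inedible"]),
}

# Independent alternative (an assumption for comparison): every item takes
# one feature from each of three binary arrays.
INDEPENDENT_ARRAYS = [
    ["carrots", "turnips"],
    ["costly", "cheap"],
    ["edible", "inedible"],
]


def dependent_codewords():
    """Two-position codewords: vegetable, then the array chosen for it."""
    codes = {}
    for first, (vegetable, (_array, features)) in enumerate(DEPENDENT.items()):
        for second, feature in enumerate(features):
            codes[f"{first}{second}"] = f"{feature} {vegetable}"
    return codes


def independent_codewords():
    """Three-position codewords: one character for each independent array."""
    codes = {}
    numbered = [list(enumerate(array)) for array in INDEPENDENT_ARRAYS]
    for choice in product(*numbered):
        code = "".join(str(index) for index, _name in choice)
        codes[code] = " ".join(name for _index, name in reversed(choice))
    return codes


short = dependent_codewords()
long_ = independent_codewords()
print(short)
print(len(next(iter(long_))) / len(next(iter(short))))  # ratio of codeword lengths
```

The dependent scheme yields four two-character codewords, while the independent one needs three characters for each of its eight codewords, which is the 'half as long again' of the text.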
Collapse
A further method of making hierarchies compact is to unite features in advance in each array. Thus we could have one single array of four united features, bearing the names 'costly carrots', 'cheap carrots', 'edible turnips' and 'inedible turnips'. In this case we should have to use four symbols, 0, 1, 2 and 3, say, instead of only two. On the other hand, there would be only one array. So long as we have a sufficiency of characters, this method has its advantages. Such union of features is an instance of collapse, which we met in chapter 1, the result being a more specific feature, shown by a single character in place of the original two. To show an instance of it in more realistic circumstances we may think of a manufacturing company coding its products, such as jugs, in respect of their qualities. There may be two styles of handle, three types of decoration, four types of spout, three sizes, two colours, two different bases, and so on. If a decimal notation is employed, we may unite appropriate features of the handles, colours and bases, thus gaining 2 x 2 x 2, namely 8, more specific features. In this case, two characters (the remaining ones of the available ten) will go unused in this array. Another array could carry a union of sizes and types of decoration, 3 x 3 in this case, or 9 in all, leaving only one character unused. An array of twenty-four letters of the alphabet would enable us to unite, say, handle, colour, base and size. When the length of codewords is important, too long a codeword being too cumbersome to handle, such methods as those of dependence and of collapse are important. But we must recall that if we represent the characters in their turn by a binary code (consisting of no more marks than 0 and 1, in various combinations) we may gain little, for the collapse has been followed, in this case, by expansion again, each character being represented by several marks. This is an effect we shall meet again.
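The arithmetic of collapse in the jug example can be sketched as follows. The array sizes are those given in the text; the function names `collapse` and `binary_marks` are hypothetical, and the binary comparison assumes the simplest fixed-length binary code for each character.

```python
from math import prod

# Number of features in each array of the jug example.
ARRAYS = {"handle": 2, "decoration": 3, "spout": 4, "size": 3, "colour": 2, "base": 2}


def collapse(names, characters):
    """Unite the named arrays into one array drawn from `characters` symbols.

    Returns (number of united features, number of characters left unused).
    """
    united = prod(ARRAYS[name] for name in names)
    if united > characters:
        raise ValueError("not enough characters for this union")
    return united, characters - united


def binary_marks(values):
    """Marks needed to show one of `values` alternatives in fixed-length binary."""
    return (values - 1).bit_length()


print(collapse(["handle", "colour", "base"], 10))          # decimal array
print(collapse(["size", "decoration"], 10))                # decimal array
print(collapse(["handle", "colour", "base", "size"], 24))  # twenty-four letters

# Expansion again: one decimal character in binary, against three separate
# binary arrays for handle, colour and base.
print(binary_marks(10), sum(binary_marks(ARRAYS[n]) for n in ["handle", "colour", "base"]))
```

Under these assumptions the collapsed decimal character costs four binary marks where the three uncollapsed arrays cost three, which illustrates why the text warns that little may be gained once the characters are themselves coded in binary.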
Figure 27. The blazon of a shield, displayed as a hierarchy of features. In general, sections of the hierarchy which do not apply to the shield may be omitted. Thus if the field (in this case, the background) is plain, then the treatment (gutté, semée, poudré, and the rest) is just omitted, and the blazon proceeds to the charges (the fesses, piles and so on) placed upon it. The rules of blazon are complex and meticulous, but they spring from a straightforward hierarchical pattern.
Labels shown in the figure: or, argent, azure, gules, vert, sable, purpure, ermine; gutté d'huile, de sang, de poix, de larmes, d'eau, de vin; semée, poudré, crusilly, fretty; (on) a fess, a pile, a saltire, a chevron, a chief, a pale, a cross; wavy, dancy, nebuly, dovetailed, embattled, raguly; two, four, five, six, seven, eight; plates, bezants.
Obviating a search mechanism
Hierarchies of features have been used with conspicuous success in the sphere of general librarianship. The items handled in this case are books; the features are those of their subject matter. In the most wide-ranging cases, this subject matter is no less than the entire field of human knowledge, the holotheme itself. Well-known examples are the related Dewey and Universal Decimal Classifications. Here, the holotheme is broken down into increasingly united or specific sections, the codewords generated in the course of this activity being used as titles for places on library shelves open to the public. No special logical or search devices are employed, apart from a general guide to the hierarchy itself, a set of labels on the appropriate shelves, and often, two sets of abstract cards, one forming a normal item index (the books arranged according to their titles) and the other a transverse feature index (the books arranged according to their descriptive code numbers). An 'author' catalogue may also exist, which could be a normal index of authors, their features being the books they have written, but, since the cards in such an index may well be duplicates of the code and title cards, it is more likely to be yet another transverse index of features, the books in this case being arranged according to their writers. These card indexes are usually of the 'blind-filed' variety. Where the public is concerned, there are advantages in doing without a search mechanism, and the same sometimes applies in other spheres. Classifications of engineering stores have been developed, with advantages ranging from variety reduction, space saving and the bringing together of similar parts to the provision of effective numbering for engineering drawings and effective description for accountancy and similar purposes. In a totally different field, the College of Arms employs a hierarchical arrangement, and has done so for centuries, in the blazon of a shield bearing a coat of arms.
The pattern on the shield is first divided according to its field, or background, and then according to certain major shapes called ordinaries, and so on to the most specific detail. Codewords are not generated in this case, for the blazon itself is a long single sentence in the approved order without punctuation; but the principle is the same.
Acting as a model
In the biological sciences, much work has gone into the study of classification, and the word 'taxonomy' is encountered, referring to the arrangement of living organisms into groups. Here, the attempt to place items in places which reflect those they occupy in the tree of evolution is prominent. This is not the only purpose for which the classifications of plants and animals are used. Retrieval of information, once the descriptive name is known, is clearly important. There is a special fascination in the way in which an evolutionary pattern, with species branching off it and developing on their own, and branching again, is like a cascade. Incidentally, the fact that it may be shown as a tree, with its roots in the past and its highest branches breaking the surface of the present, makes no difference to the argument; it is just another disguise which we could have added, had we wished, to chapter 4. The problem has been to formalise things, for nature has not provided us with neatly-marked positions or stages in the cascade and neat divisions into arrays. In general, the evolutionary sequence occurs within the arrays of the biological classification, only the major developments being shown as jumps from position to position. There is no ranking order in the arrays since no code numbers are generated by the classifications. There is thus considerable freedom to add, to alter, and occasionally to subtract. The names of the positions (steps, stages, divisions or what-have-you) are familiar: kingdoms, subkingdoms, phyla, subphyla, classes, orders, families, genera, species. Here is one of the few cases where a special name exists for each position in the hierarchy. It is advantageous to possess this facility. In the decimal system we find names for positions also, these being known as thousands, hundreds, tens, units, tenths, and so on.
The result, at the foot of the cascade, at the specific end of the hierarchy, is a codeword for a number, which we can manipulate by the rules of arithmetic.
Limitations and rigidities
As we may expect from our analysis, there are many limitations to a classification. These reduce the number of occasions on which it is the best answer to a data handling problem to a fairly small number. First, there is a restriction resulting from the need to keep the number of positions in a hierarchy within reasonable bounds. A seven-position hierarchy results in a seven-position codeword, and this is often taken as the longest which can be handled with ease. In the biological sciences this problem is avoided by starting the full use of descriptive phrases far down the cascade. Words of plain, though specialised, language are employed, and very large numbers of these are used, each assuming in itself all the more generic features which are relevant. Thus to say Taraxacum is to mention a particular genus of the family Compositae of the subclass Dicotyledoneae of the class Angiospermae of the subphylum Pteropsida of the phylum Tracheophyta of the kingdom of plants. Taraxacum, that is to say, is already a seven-position descriptor, a union of many features. Since no set of characters is large enough to supply a different symbol for all such unions in the system, a word would have been needed even if the attempt to use a character had been made. Below the level of a genus the rules change. Words which occur nowhere else may be employed at the level of species, to refer to a single type of plant, and in this case the name of the genus could be omitted, but there are many instances of dependence. Taraxacum officinale is the dandelion, and the word 'officinale' does not subsume all previous meanings in itself. There are other officinales: comfrey, for instance (Symphytum officinale), and watercress (Nasturtium officinale).
Another rigidity arises from the need to place the arrays employed in a hierarchy in a fixed order. As an example, let us imagine a library of the arts, so arranged that all the works it contains are divided first according to art, then according to country, and lastly according to period. Imagine twenty arts, twenty countries, twenty periods. The student of a given art has little difficulty. Although his material is in many different places, these are, at least, all close together. All the books about the seventeenth art come before any about the eighteenth and after all about the arts labelled with lower numbers. The student of a given art in a given country has an even easier time, for he has only twenty places in which to look, these being divisions of his subject into periods. The student of a given art in a given country in a given period has the simplest job of all. He has asked the immediate retrieval question, and has only one place in which to find his material. On the other hand, the student of all art in a particular period has four hundred places in which to look, none of them next to each other. He is unfortunate, for his order of importance is not that chosen by the librarian.
Order is one thing, choice is another. The designer of a classification is lucky if time's only effect upon it is to change the order of importance of its arrays. Some features may become useless, and others, not even admitted to the hierarchy, may become all-important. In these cases the classification breaks down. This is often the case with product classifications in industry, where changes are frequent. New things are made, old ones are discontinued, mergers bring in new companies whose output must be added to the index, and so on. Product classifications do not have an easy life.
A data field for a hierarchy
When codewords are to be made, and a hierarchy is adopted for the purpose, the data field which results takes on a distinctively patterned appearance. Figure 28 shows the field for the vegetables, assuming that we have encountered ten batches of these, and that they have been placed in a storage order corresponding to their codenames (figure 28). We can see from the blank spaces in the field that the hierarchy is not transformable, that the arrays are dependent. We note also that the same number of data units occurs in each column, reflecting the fact that all the positional codewords are of the same length. We have been helpful, arranging the field so that the features appear in order of importance in the hierarchy, but we could have used the commutative quality of a data field to disguise this. We may also see that the set of six data units in columns 6, 2 and 7 and rows A0 and B0 form a fully-present subfield. There are four such subfields in the field, as many as there are different code sequences. The item numbers across the top of the field show how the order of arrival of the batches, which we may take to be the order in which they are numbered, is not their storage order. Finally, the blank areas are so disposed that we could push the lowest of the filled areas, as it were, up into the gap. This reflects the space-saving in codewords made by means of a dependent system.
Summary
Many of the operations of information-handling fall into a simple sequence, moving from one extension of the data field to the other and back again. The steps in the sequence are: identify, translate, retrieve, describe, search; the change of extension occurs at the stage of description. All indexes are descriptive in a sense - features and items describe each other - but in some cases the terms of one sort are arranged in hierarchies so as to produce positional codewords which describe the terms of the other. The positional effect is latent the moment a hierarchy is made, and becomes overt when a single set of ordered characters is used time and time again, once in each position available in the codewords it makes. When items are arranged in order by means of a hierarchy of features we have a classification of the items concerned. If the features which are used in an index are not arranged in a hierarchy, but listed in some useful order for easy finding, with the items they possess indexed against them, we have a schedule of features used to make a manifold of items. Items may appear in many different places in a manifold, since they possess more than one of the features which appear in its governing schedule. In a classification, however, the items each possess only one fully united or specific feature, no matter how many more intersected or generic features may comprise it.
Figure 28. The data field for the vegetables. Item numbers across the top, in storage order: 6, 2, 7, 0, 1, 3, 9, 8, 4, 5. Rows: array A, type of vegetable; array B, price of vegetable (0: costly, 1: cheap); array C, edibility of vegetable (0: edible, 1: inedible); codewords.
Hierarchical indexes form an important part of data study. They can be of use when no mechanism for logical description is available to aid in searches, when positional codewords are needed, and when an index is called upon to reflect the structure of a subject which is itself hierarchical. On the other hand, they have limitations. When indexing items, they handle relatively united or specific features, and thus make it difficult at times to search for those which are more generic. They do not permit many features to be indexed against a given item without becoming unwieldy, and they enforce an arbitrary choice and an arbitrary order of features, with little opportunity to change these once they have been fixed.
Direct coding
Let us imagine a punched item card, representing a gramophone record. The holes represent features of the record, generic or specific, one hole for each feature. This is a case of direct coding. Indeed, it is direct in both extensions of the field. In general, so long as we have one vehicle for each case of one type of term (continuation vehicles being allowed if one is not enough) and one mark for each instance of the other type, then we have direct coding in each extension. Thus if we consider codewords of the sort usually made from a hierarchy or a schedule, we find that each element of these is a character acting as a direct representation of a term. As an example, we may class gramophone records according to whether they are (A) or are not (not-A) twelve inches in diameter, and we may also class them according to whether they do (B) or do not (not-B) carry clarinet quintets. Then the commutative codeword made from the schedule just described by taking not-A together with B is direct, a phrase composed of two direct elements, referring to records of clarinet quintets which are not twelve inches in diameter. Such a codeword could be shown on the punched item card representing the record by a hole in position B and a blank in position A.
Graphic coding
Let us now suppose that we have only one hundred places on the punched item cards in which a hole may be placed, and let us imagine that we have many records to be indexed, containing, between them, music played by three hundred different bands. Further, to make things awkward, we may say we want to be able to find all the records featuring any given band. How are we to represent the bands on the cards? There is not enough room to use one position for each band, so direct representation is clearly impossible. Let us therefore take the twenty-six letters of the alphabet, and unite them in pairs. There are three hundred and twenty-five such pairs. If we represent each band, not by a single hole (named by a single letter) but by a pair of holes chosen out of our collection of twenty-six positions, we shall use up only twenty-six of the hundred spaces available on each item card. The remaining seventy-four will be available for the representation of other features. Moreover, we shall be able to give unions of letters to twenty-five more bands than occur in our collection. This is all gain. Duly, we choose twenty-six positions on the cards, allot letters to them, and punch them in pairs, one pair per band. Now suppose we look at the marks, the letters. We can no longer say that each represents a single feature, in this case a band. The hole in position Q, for example, may be part of the union (we may call it a digraph) AQ or BQ or CQ or DQ; indeed, by itself, no hole in the set of twenty-six positions has a direct meaning. It may be considered together with any of twenty-five other holes. Together with any one of these it represents a term, and so it is possible, from this point of view, to think of it as in itself representing only a half of a term. There are twenty-five different terms of which it forms a half in this way. If we were using a tetragraph code, with four characters to be assembled before a meaning could be symbolised, then any three of these would form three-quarters of a term. Such parts of terms may be thought of as subterms and when they are united, the specific resulting is an entire term. Thus we may speak of subitems and subfeatures as occasion may require. An arrangement of this type may be called a graphic code, the word 'graphic' being taken to cover letters, numerals or any other characters. When we use a graphic code, the holes or other marks in or on item vehicles, or the cards or other vehicles in feature indexes, symbolise characters which are part of the names of features, and the transverse.
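The counting behind the digraph code is easy to verify. The sketch below is an illustration only: the band names are placeholders, the allotment of digraphs in alphabetical order is an assumption, and `assign_digraphs`, `punch` and `features_band` are hypothetical names.

```python
from itertools import combinations
from string import ascii_uppercase

# Every unordered pair of distinct letters: the available digraphs.
DIGRAPHS = ["".join(pair) for pair in combinations(ascii_uppercase, 2)]


def assign_digraphs(bands):
    """Allot one digraph to each band, in order (an arbitrary choice)."""
    if len(bands) > len(DIGRAPHS):
        raise ValueError("more bands than digraphs")
    return dict(zip(bands, DIGRAPHS))


def punch(card, digraph):
    """Punch the two letter positions of a digraph into an item card."""
    card.update(digraph)


def features_band(card, band, codes):
    """True if both holes of the band's digraph are present in the card."""
    return set(codes[band]) <= card


bands = [f"band {number}" for number in range(300)]   # placeholder names
codes = assign_digraphs(bands)

print(len(DIGRAPHS))                        # pairs available
print(len(DIGRAPHS) - len(codes))           # pairs left over for further bands
print(sum("Q" in digraph for digraph in DIGRAPHS))  # terms of which Q forms a half

record = set()                              # one item card, letter positions only
punch(record, codes["band 0"])
print(features_band(record, "band 0", codes), features_band(record, "band 1", codes))
```

A card punched for a single band answers correctly. The sketch does not attempt cards carrying several bands, where the holes of different digraphs begin to combine with each other.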
The marks represent, as we have noted, only parts of terms. Thus if a single mark (we may think of it as a hole) means 'boy', the coding is direct; but if it means 'B', and other marks have to be added (say for 'O' and 'Y'), then the coding is graphic. Graphic coding may indeed spell the word we use as a term's name, but it may not: if 'V' and 'L' and 'O' and 'B' in that order meant 'boy' the code would be graphic, and pronounceable, but would not spell a word of accepted English. If the term were represented by the four marks 4DP8 the coding would still be graphic but would be pronounced only as four separate words, one for each character. Graphic codes, like others, may be commutative or positional.
Subcodes and supercodes
By reference to any direct code, a graphic code is a subcode; its marks represent parts of terms only. However, there is a family likeness between the way in which a graphic subcode builds terms and the way in which the terms of a direct code build more complex terms. Against 'A ∪ B ∪ C = human' and 'A ∪ C ∪ D = unkempt' we may lay 'human ∪ masculine ∪ juvenile = boy' and 'human ∪ juvenile ∪ unkempt = urchin'. We can say that the more generic terms of a direct code are used as a supercode with which to make those which are more specific.

If we index more than one subcoded term, each being represented by characters from the same subcode, or more than one specific term, each being represented by more generic terms from the same set of direct terms, against a single transverse term, we may obtain wrong answers. For example, if we imagine that B ∪ C ∪ D means masculine, and that we have recorded a certain ragamuffin young lady as being A ∪ B ∪ C, namely human, and also as A ∪ C ∪ D, or unkempt, then we have a problem on our hands. Automatically she has been recorded as B ∪ C ∪ D, as masculine. The trouble is that a set contains all of its subsets. In listing all four characters against the young lady concerned we have also listed every term any selection of them may represent. Unfortunately, one of these terms happens to be wrong. This effect is known by many names, of which 'crossover', 'false drop' and 'ghost' are examples.

With direct coding, the same applies. Our graphic subcode was used to form features which were then related to an item. The wrong information would not have been generated if we had related each trigraph to a different item, an effect we could have simulated by giving the young lady two reference numbers and recording one subcoded feature against each of them. Nor would it have arrived if we had taken the trigraphs from different alphabets of characters so that they could not have interfered with each other.

Let us pursue the matter in the case of supercodes. We use, not an alphabet, but a whole vocabulary, and we apply this vocabulary, taken in one extension of the data field, to describe terms in the other. The terms must all be compatible in the sense that the complete set of descriptions must be such that every subset is a true description of the term which it describes. Thus, if we return to our gramophone records, a given record may carry a violin concerto by Mozart together with another violin concerto played by Menuhin. Then 'violin', 'concerto', 'composed by Mozart' and 'played by Menuhin' are all fair descriptions of the contents of the record. Yet the violin concerto played by Menuhin might have been composed by Elgar - so 'Menuhin playing a concerto by Mozart' would be a false account of the record. Again, we should not have generated wrong information if we had related the correct three features to each of two items, perhaps one upon each side of the disc, or if we had used two similar vocabularies, choosing three features from each.
The semantic continuum
In practice, we need not step immediately from a graphic subcode utilising single characters to a code of the direct type. For example, it is possible to utilise characters to make syllables. We may form such parts of words as oxy and di and ase and yl and laevo and beta and ene, and still possess only small portions out of which complete names may be made, in this case, names of chemicals. We may make subcodes out of subcodes out of subcodes, or we may put a code between any two codes we may think of, to be made out of the one and to form the elements of the other. We are faced with a continuum. This has been given several names, of which the best-known may be 'the descriptive continuum', a title which gains power because the terms and parts of terms it contains may be used transversely, for description. There is, however, a danger in naming anything according to what it does or according to its use, since it may turn out to have other functions and purposes also. We may therefore decide to call the continuum by the name 'semantic', since it is composed of all the words and codewords and parts of these to which meaning or partial meaning is attached.

Figure 29. The semantic continuum. (Diagram labels: decision point '0'; graphic range (subcodes); generic point '1'; direct range (supercodes).)

The semantic continuum has what appears to be a unit point, occupied by the most generic terms, above which occur more specific terms of two, three, four, and so on to infinity, terms of the fully generic sort. Below this generic point we encounter subcodes, whose elements represent parts of terms. These can be infinitely dense. Further, we may think of a place in the continuum corresponding to, say, three and a quarter, occupied by a specific term which is a phrase made of three generic terms and a character from a tetragraph subcode. As usual, we must not press analogies too far, but there is something here which teases us into further inquiry. For instance, is there something not unlike zero in the continuum? If so, is it something to do with binary subcoding? We can hardly reach any lower position than binary occupies, for it makes use of one character only, which is either present or absent. This type of code has a representation as a pair of characters, say A and B, or 0 and 1, such that one or the other must always be chosen. There are only two possibilities, which may (respectively) represent any subterm or term and its complement, anywhere along the entire semantic continuum. When they represent the most abstract matter of all, not an element of a phrase, not a generic term, not an element of a subcode, but an element of an element of an element, then we may feel we have reached the spot where the continuum begins. Here presence and absence are equally important, separated, if we like, by the place we may call zero, the place of no information at all. Either direction, from here, gives us at least one piece of news - it tells us which way we have chosen to move from our origin.

We have widened the meaning applied to the word 'element', which earlier meant an element of meaning, a term, in a codeword which represented a phrase or more specific term. This, clearly, applies in the direct range of the continuum. In the graphic range, an element is one of the characters (or unitary sets of characters) available for building subcode words.
Textual matter
We noted in chapter 1 that a data field is a set of statements. The information provided in a document, as textual matter, is such a set, often a very involved set indeed, and much of the skill of abstracting lies in reducing it to simple terms. In the long run, these simple terms are items and features whose relations are the subject of the text, and the abstract is designed to tell us briefly what these are. An index to the text, however, is not always (and perhaps not even usually) concerned to tell us what they are. Instead, it is concerned to direct us to the places in the text where the information is given, or if the items are not pages but whole books, to the relevant books in the library. However, in the course of listing all the relevant items and features of the data field of the text, and recording them against the book which contains it, many opportunities for crossovers arise. This is because every textual feature can combine with every textual item. It is as if the items were treated as features, recorded in the feature extension, and as if the result were then compressed into a single set of data units against the document concerned.
Figure 30. The data field of Bob, Tom, Big and Tall: the four data units (left), the field with each item entered as a feature of itself (centre), and the whole compressed into a single column (right).
Thus we may think of a Toddler's First Reader which tells us that Bob is Big and Tom is Tall. The data field consists of the four data units shown in figure 30, which may be represented as in the centre, and then compressed into the single column as on the right (figure 30). If we feel the need to justify our treatment of an item as a feature of itself, which seems to happen at the top of the centre field, we may argue that being itself is the one unique feature every item possesses. Bob is Bob, besides being Big, and Tom is Tom. Thus the new one-column field gives us every possible crossover. We appear to have indexed the fact that Tom is Big as well as Tall - and so on.
Compression
This brings us to the idea of a compressed data field. If we return for a moment to the example of chapter 1, where the idea of collapse was encountered, we may compare the two operations. Thus, if we collapse items 2 and 3 in our original example of a data field, the result is a more specific item that we may call (23) and that possesses the features which 2 and 3 have in common, namely A and D. We have in effect collapsed the subfield 23AD in one extension, which is the same as intersecting items 2 and 3 and at the same time giving them a single more specific name. We could, however, have united these items, with the result that our new set of features would be A, B, C and D. To unite them, and apply a single name to the result, is compression. It is a powerful source of crossovers, which often seem to arise because, having produced a situation in which all the features (and the items acting as such) relate immediately to the new specific item, we continue to act as if the new item's parts were still distinct.

Librarians may well be amused at this. No relevant book is ever missed in a search as a result of a crossover; the only problem is the occasional retrieval of one which is not relevant. This is easy to recognise and to replace; difficulty arises only when the works are not examined, but are passed on directly to an enquirer as themselves, or as a list of titles. On the other hand, statisticians are much concerned, for instead of retrieving items they merely count them. They have no chance of finding out whether an item admitted to the count is rightly there or not, and so they must be quite certain that wrong ones cannot appear. Means of dealing with this and related problems are mentioned later in these pages as we explore the semantic continuum upwards from decision point to infinity.
Summary
In each extension of the data field we may represent parts of terms, terms, or phrases made of collections of terms. This entire range may be called the 'semantic continuum'. It has some similarities, which must not be pushed too far, with the continuum of rational numbers. It has a zero, of no information, which can be thought of as a decision point. Above this, there is the graphic range, of subcodes, then the generic point of individual fully-generic terms, and including this, the direct range of more and more complicated descriptive phrases. The behaviour of the terms and parts of terms in the continuum is the same wherever they are set in it. In particular, the way crossovers are generated is the same wherever they occur in the continuum.
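The claim that crossovers arise in the same way throughout the continuum can be tried out in a small sketch. It is an illustration only: the function names are invented, and the three phrase labels given to the gramophone record are assumed wordings for the descriptions discussed in the chapter.

```python
def implied_terms(recorded, vocabulary):
    """Every term whose elements all lie in the union of the recorded terms."""
    marks = set().union(*(vocabulary[term] for term in recorded))
    return {term for term, elements in vocabulary.items() if elements <= marks}

def crossovers(recorded, vocabulary):
    """Terms that appear to be recorded although they were never entered."""
    return implied_terms(recorded, vocabulary) - set(recorded)

# Graphic range: trigraphs built from the characters A to D.
graphic = {
    "human": frozenset("ABC"),
    "unkempt": frozenset("ACD"),
    "masculine": frozenset("BCD"),
}

# Direct range: phrases built from more generic terms (labels are assumptions).
direct = {
    "violin concerto by Mozart": frozenset(
        {"violin", "concerto", "composed by Mozart"}),
    "violin concerto played by Menuhin": frozenset(
        {"violin", "concerto", "played by Menuhin"}),
    "Menuhin playing a concerto by Mozart": frozenset(
        {"concerto", "composed by Mozart", "played by Menuhin"}),
}

print(crossovers(["human", "unkempt"], graphic))
print(crossovers(["violin concerto by Mozart",
                  "violin concerto played by Menuhin"], direct))
```

The same function serves both ranges because in each case the union of what was recorded contains subsets that were never meant.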
Commutative subcodes
Let us take a series of six characters, from A to F, and assemble them in pairs, writing these in a triangular formation thus:

AB  AC  AD  AE  AF
    BC  BD  BE  BF
        CD  CE  CF
            DE  DF
                EF
We can make, as we see, fifteen different pairs out of these six letters. From an alphabet of twenty-six letters, chosen in pairs, we can make 325 pairs; and if we assemble the letters in tetragraphs, sets of four, we can make 14,950 such sets. It is useful to cite the characters which form digraphs, trigraphs, tetragraphs and so on in alphabetical order, even though a pair is a pair (and a triad is a triad . . .) no matter which order its elements hold. This is because the conventional order of the alphabet makes it possible to give each polygraph a conventional position, and enables us to see when apparently different polygraphs are really the same. Thus DPAQWK is the same as WAPDKQ, a fact which becomes obvious when both are permuted into the alphabetical ADKPQW. A further and very useful convention is to take the underlining (for presence) as understood in discussing codewords. We shall do this, as required, from now on.
Crossovers
Our triangle of fifteen digraphs gives us the symbols we need for a commutative digraph subcode to base six - that is, made of an alphabet of six characters. Each digraph differs from every other, and collectively they form all the digraphs it is possible to make out of six separate elements. We can use them to represent any set of fifteen terms, or less, if we do not mind leaving some digraphs unallotted, but for safety the terms should be mutually exclusive.

We met this problem in the previous chapter when we saw how the recording of A and B and C, and also A and C and D, against a single item, led to our recording B and C and D against it, willy-nilly. If recording ABC had, in that case, made it impossible for us to record anything else, then this difficulty would not have arisen. From our present triangle we can choose any two digraphs and obtain, by way of crossover, a third and perhaps a fourth, fifth and sixth. We have already met this effect, and named it a 'crossover'. Another of its names, mentioned in chapter 6, arises because it leads to irrelevant item cards falling out of packs of such cards during selection by means of needles or other sensing devices. This is the name 'false drop'.

If crossovers are to be prevented, special action must be taken. We must deny ourselves the luxury of indexing more than one feature from any subcoded set of features against any given item, and the transverse. If we cannot or will not do this, palliative action may be necessary. A well-trained indexer can always tell when crossovers are put into the index being created. It occurs whenever the rule about terms in subcodes being mutually exclusive is broken. He may then take steps to warn the searchers who come after him. Thus, in a punched feature card index, he may employ a card entitled 'warning', punching it in the positions occupied by the reference numbers of items which may possibly appear incorrectly. In the case of punched item cards, he may add a 'warning' punch position, or notch, which will separate the cards which had better be separately examined for correctness from those which are safe.
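How two digraphs recorded against one item bring others in their train can be shown in a few lines. This is a minimal sketch, not a description of any real card system; the function name `digraph_crossovers` is invented for the purpose.

```python
from itertools import combinations

def digraph_crossovers(recorded):
    """Digraphs that read as recorded although they were never entered.

    In a commutative digraph subcode each recorded digraph marks its two
    letters against the item; every pair of marked letters then reads as
    a digraph, whether or not it was intended.
    """
    recorded = {"".join(sorted(d)) for d in recorded}
    marked = sorted(set("".join(recorded)))
    readable = {"".join(pair) for pair in combinations(marked, 2)}
    return sorted(readable - recorded)

print(digraph_crossovers(["AB", "AC"]))  # two digraphs sharing a letter
print(digraph_crossovers(["AB", "CD"]))  # two digraphs with no letter in common
```

Two digraphs with a letter in common yield one unwanted digraph; two with no letter in common yield four, making six readable digraphs in all.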
Another measure which may be useful is to refuse to allot a meaning to any subcode polygraph which is known to be generated by existing multiple coding. Thus, if in a commutative digraph subcode, we have registered DE and FG against an item, we know that DF, DG, EF and EG will also appear to be registered against it. But if none of these occur in the list of digraphs to which meanings are attached, we shall never search for them, and so the problem will not arise. On the other hand, if this method is carried grimly to its conclusion, the advantage a subcode provides - namely, indexing many terms on few cards or positions - will vanish.

Other palliative methods of use against crossovers include schemes of superimposed coding, which are mentioned later, that reduce the chance of their occurring to an arbitrarily low figure, and searches which include terms known to be highly correlated with those which may wrongly appear. Thus if, in an index of gramophone records, Manuel's Caribbean Band is known to occur as a crossover, and also to play nothing but calypsos, then to search on Manuel's Band plus calypso may well exclude most wrong answers because the remaining subcoded orchestras, whose occurrence together causes the crossover to arise, are likely to play some other type of music.
Capacities of commutative subcodes
The capacity of a subcode is the number of different codewords which can be made from it. In the example of the digraph triangle above, six letters formed fifteen codewords, and the capacity of the subcode was fifteen. The general formula for calculating the capacity of a commutative subcode is:

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$$

In this formula, ${}^{n}C_{r}$ means the number of combinations of n things (in the present case, of characters forming the base alphabet) taking r things at a time. The exclamation mark, written after a positive integer, means that the number concerned is to be multiplied successively by all the lesser integers down to one. Thus, in the case of the commutative digraph subcode of six characters, we have:

$${}^{6}C_{2} = \frac{6!}{2!\,4!} = \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1)(4 \times 3 \times 2 \times 1)} = \frac{6 \times 5}{1 \times 2} = 15$$

The formula can be made easier by noticing that there are as many numbers above the line after the cancellation as there are letters in each codeword, and that these numbers work downward from the highest. Also, there are the same number again, below the line, these working upwards from 1. Thus we can write down immediately, say, the formula for choosing threes out of seventeen. It must be 17 x 16 x 15 over 1 x 2 x 3.
Positional subcodes
Suppose we now split our collection of six letters into two series: A, B, C and D, E, F. We may make a subcode by arranging for pairs of letters to be chosen so that the first always comes from the first set and the second comes from the second. BD is such a digraph. So is AF. The result can be shown as a network of rows and columns, with the columns containing the letters of the first set and the rows those of the second, thus:

      A    B    C
D     AD   BD   CD
E     AE   BE   CE
F     AF   BF   CF
In this case, we have only nine digraphs, and the capacity of this new subcode is less than that of the other. It is also calculated in a different way: we simply multiply the number of characters in the first series by the number in the second: 3 x 3 = 9. We may note, too, that each set of characters is what we have called in chapter 4 an array. To make the codewords we must choose one letter from each set, and may not choose more than one; the letters in the sets are collectively exhaustive and mutually exclusive. If we compare the two types of subcode, we find that this new type is hidden within the commutative one, a square within a triangle (figure 38).
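The two capacities, and the way the positional square sits inside the commutative triangle, can be verified directly. The sketch below assumes Python 3.8 or later for `math.comb`; the variable names are chosen here for illustration.

```python
from itertools import combinations, product
from math import comb

first, second = "ABC", "DEF"

# Commutative digraphs: any two different letters out of all six.
commutative = {"".join(pair) for pair in combinations(first + second, 2)}
# Positional digraphs: one letter from the first array, one from the second.
positional = {a + b for a, b in product(first, second)}

print(len(commutative), comb(6, 2))               # capacity of the triangle
print(len(positional), len(first) * len(second))  # capacity of the square
print(positional <= commutative)                  # the square lies within the triangle
print(sorted(commutative - positional))           # digraphs left in the two tips

# The short form of the formula: 17 x 16 x 15 over 1 x 2 x 3.
print((17 * 16 * 15) // (1 * 2 * 3), comb(17, 3))
print(comb(26, 2), comb(26, 4))
```

The digraphs left over are exactly those made of two letters from the same array, which a positional subcode does not allow.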
The two tips of this triangle contain the digraphs which are not allowed in the new subcode, and are made of letters chosen from within arrays, instead of chosen one from each array. We may note that, in this new subcode, the letters D, E and F occur only in the second place in any digraph, while the letters A, B and C appear in the first place alone. This gives us the opportunity of introducing a second A, B and C in place of the D, E and F, distinguishing this second array from the first by the position its characters hold - the second position in any pair of letters. The square then becomes that shown here:

      A    B    C
A     AA   BA   CA
B     AB   BB   CB
C     AC   BC   CC

This emphasises the positional character of the method, and gives this type of code its name. We have met it before, reaching it by means of a cascade diagram. First-place-A is not the same as second-place-A. In a punched feature card system it cannot be represented by the same card as that which stands for second-place-A, and in a punched item card system it cannot be represented by the same hole-position. This is like the construction of the decimal system, which we noted in chapter 5. A decimal square running from 00 to 99 is clearly a case of a positional digraph code.
Crossovers in positional systems
In a positional subcode, capacity per number of different characters utilised (remembering that A-in-the-first-position is a different character from A-in-the-second) is less than is the case in a commutative subcode. However, some sets of digraphs can now be indexed against the same transverse meaning without crossovers resulting. It is clear that AC and CB will yield AB and CC, that is, two digraphs which are diagonally placed with reference to each other will produce as crossovers the digraphs at the other corners of the rectangle. On the other hand, we may index, as subcoded features, the three digraphs AA, BA and CA against the same item and have no trouble at all. The first-position characters cannot, by the rules, combine with each other. A first-position character can only combine with a second-position character, and there is only one such character available - second-position A, the correct one. Here, then, the diagonals are dangerous, but the lines and columns by themselves are safe. In other words, no matter how many arrays of characters are used in a positional subcode, crossovers are obviated if all the subcode words allotted to any transverse meaning are the same, except in one position only.
Further disguises
Complicated subcodes can be made of many mixtures of positional and commutative codes, but these can generally be reduced to their simpler components without a great deal of trouble. As an example, we may take a form of codeword which consists of a letter-numeral-letter trigraph, such as A3B or F5X. This codeword is represented by two series of cards in a punched feature card system, or by two series of hole positions on punched item cards. Of these, one consists of a set entitled A3, F5, and the like, and the other of a set entitled 3B, 5X and so on. To enter such a codeword as N7Q on the cards, we punch cards, or positions, to symbolise N7 and 7Q. Despite its apparent complexity, this is obviously no more than a set of ten positional digraph subcodes, each known by a number from 0 to 9, and each consisting of two alphabets, one for first position and one for second. The codeword N7Q instructs us to go to subcode 7 and to punch NQ from it. Clearly, crossovers can occur: N7Q and Q7N would give Q7Q and N7N as such, but crossovers cannot occur between subcodes (there is no trouble with N7Q and A3B), or within subcodes providing that, as we have seen, one of the two letters is held constant.
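The rules for positional crossovers, and their extension to the letter-numeral-letter trigraph, can be sketched as follows. The function names are invented for illustration, and the treatment of the numeral as the name of an independent subcode follows the reading given in the text.

```python
from itertools import product

def positional_crossovers(recorded):
    """Unintended digraphs in a positional digraph subcode.

    Each codeword marks its first letter in the first-position array and
    its second letter in the second-position array; any first-position
    mark then pairs with any second-position mark.
    """
    recorded = set(recorded)
    firsts = {word[0] for word in recorded}
    seconds = {word[1] for word in recorded}
    readable = {a + b for a, b in product(firsts, seconds)}
    return sorted(readable - recorded)

def trigraph_crossovers(recorded):
    """Unintended codewords of the letter-numeral-letter form, such as N7Q."""
    by_subcode = {}
    for word in recorded:
        # The numeral names one of ten separate positional digraph subcodes.
        by_subcode.setdefault(word[1], []).append(word[0] + word[2])
    found = []
    for numeral, digraphs in by_subcode.items():
        found += [d[0] + numeral + d[1] for d in positional_crossovers(digraphs)]
    return sorted(found)

print(positional_crossovers(["AC", "CB"]))        # diagonal pair
print(positional_crossovers(["AA", "BA", "CA"]))  # one position held constant
print(trigraph_crossovers(["N7Q", "Q7N"]))
print(trigraph_crossovers(["N7Q", "A3B"]))
```

Holding one position constant leaves only one character in that array, so nothing unintended can be read.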
Instances of subcode use
In many administrative and statistical information handling systems, positional subcodes are employed. Examples are the representation of dates by decade, year, month and tens and units of days, representation of sums of money by hundreds, tens and units of dollars, pounds, francs, marks and so on, decimal representation of many other measurements such as height, distance, weight, temperature, and the spelling out of names of authors, countries, customers, suppliers, cities. Commutative subcodes are much less commonly used for measurable features, and much more often used for the remainder, than are subcodes of the positional type.
Binary subcodes
Suppose we now take four characters in order to make various feature subcodes. The symbols A, B, C and D will suit, being used to form commutative polygraphs.

monographs    A     B     C     D
digraphs      AB    AC    AD    BC    BD    CD
trigraphs     ABC   ABD   ACD   BCD
tetragraphs   ABCD
Every polygraph shown here can be thought of as the letters which are present out of a total collection of four, some of which are absent. If we know the number of characters to be expected in any polygraph, we know whether that polygraph is complete or not: for instance, if we have AB and we know the code is of the digraph type, we know that AB is entire, and not a part of ABC or ABD. This requirement, for exact definition, was mentioned in chapter 1. For us, in this case, AB is short for ABC̅D̅, where an overbar marks an absent character. If we look at the possible codes laid out above, we see that there are only four codewords in the monograph code, but six in the digraph; and in fact the digraph code is the one of greatest capacity - so long as we restrict our operations to present subfeatures only. However, we may produce a subcode of capacity sixteen by also admitting absent subfeatures, making up the total of every polygraph's characters to four. In doing this, we make a binary subcode of four arrays; it is binary because each position may be occupied by one of two characters only, representing presence and absence:

present subfeatures   codewords                                  total
none                  A̅B̅C̅D̅                                       1
one                   AB̅C̅D̅  A̅BC̅D̅  A̅B̅CD̅  A̅B̅C̅D                     4
two                   ABC̅D̅  AB̅CD̅  AB̅C̅D  A̅BCD̅  A̅BC̅D  A̅B̅CD         6
three                 ABCD̅  ABC̅D  AB̅CD  A̅BCD                     4
four                  ABCD                                       1
grand total                                                      16
Incidentally, we have had here to add a codeword corresponding, in our earlier list, to the case of no characters at all. We have met this sort of thing before. For example, in chapter 4, we used C to mean 'aerial' in respect of a universe of birds, and D to mean 'aquatic'. C̅ and D̅ meant the corresponding absent features, and we were able to produce four types of bird from this arrangement:

number of present features   codewords    different main types of bird
none                         C̅D̅           1
one                          CD̅  C̅D       2
two                          CD           1
grand total                               4

We may note that this looks like yet another disguise of the data field we displayed in so many ways in chapter 4. In particular, we only have to shift the codewords of the middle line a little to the left, and we are back at a simple lattice diagram:

[Lattice diagram of the four codewords: CD and C̅D̅ at the two extremes, with CD̅ and C̅D between them.]
From this we conclude that the reader working out a lattice for four features may obtain a clue from the sixteen-codeword binary subcode displayed a little earlier in this section. In chapter 4, we placed the features C and D in an arbitrary order, in fact we placed D first, and then we used the symbols 0 and 1 to represent absence and presence, putting absence first. The result was a set of codewords 00, 01, 10 and 11, positional and normal, describing routes through (or the end-points reached by these routes through) a hierarchy. Using the characters A, B, C and D in alphabetical order, which though we are very well used to it by long practice is equally arbitrary, we can use this cascade method to make sixteen codewords comparable to the sixteen already displayed.
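The cascade can be imitated by choosing absence or presence for each letter in turn. The following is a sketch under that reading only; `LETTERS`, `present` and `by_count` are names introduced here.

```python
from itertools import product

LETTERS = "ABCD"

# Cascade: for each letter in turn choose absence (0) or presence (1),
# absence first, giving the codewords 0000, 0001, ... up to 1111.
codewords = ["".join(bits) for bits in product("01", repeat=len(LETTERS))]

def present(codeword):
    """Letters marked present (1) in a positional binary codeword."""
    return "".join(l for l, bit in zip(LETTERS, codeword) if bit == "1")

# Group the codewords by the number of present subfeatures.
by_count = {}
for word in codewords:
    by_count.setdefault(word.count("1"), []).append(present(word) or "(none)")

print(len(codewords))
for count in sorted(by_count):
    print(count, len(by_count[count]), by_count[count])
```

Grouped in this way the sixteen codewords fall into sets of one, four, six, four and one, the same totals as in the table of present subfeatures.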
There are many orders in which these codewords may be shown. We may lay two of them side by side in figure 31. The first, on the left, is the order obtained by following the rule that codewords with single present features precede those with more than one such feature, and so on, while at the same time feature A precedes B, and so on. The second, on the right, is the order obtained by following a hierarchy which creates the symbols for the binary numbers from zero to fifteen, sixteen of them in all. The arrangement on the left appears appropriate in many cases in which we are concerned with identity. The set (ABCD) and its subsets are shown as ordered by inclusion. The arrangement on the right appears more closely related to quantity, and its binary symbols may represent numbers, as shown. Both arrangements are symmetric about a centre line drawn half-way down the table, if presence and absence are interchanged, and on either side of this line the codewords complement those on the other side. The entire sixteen, thought of as present identities (∅, A, B, C, D, AB, and so on) form what is known as the power set of the set A, B, C, D; there are two to the power four of them. On the left, the commutative subcodes are shown in descent; on the right, and at right angles to this, is shown the hierarchy of binary numbers, quantity seeming in some way orthogonal to identity. The numbers run from two octads through four tetrads and eight dyads to sixteen units - a subcode which is positional.

With our eye on these two arrangements of binary symbols, we may distinguish between two major forms of binary subcoding. On the right we have a positional subcode for numbers, one which seems to be natural. We apply the usual rules for addition, for example, and the notation works as it should. Thus 0 + 0 = 0, and 1 + 0 = 1, and 1 + 1 = 0, together with a carry-over of 1 to the next column to the left. Result: to add two and three we write:

    0010   (two)
  + 0011   (three)
    0101   (five)

By contrast, on the left we have several commutative subcodes for sets and subsets, and the rules are different. It is questionable whether we should use + at all; perhaps ∪ would be better. Still, using + for the moment, we have: 0 + 0 = 0, and 1 + 0 = 1, as before. However, the carrying rule goes into reverse, and we have 1 + 1 = 1, carry 0. Then we may add 0011 (shown as meaning CD) and 0101 (BD) to obtain 0111 (BCD). This is obviously what goes on in the case of union of subsets, and in the generation of crossovers.
[Figure 31. The sixteen codewords of a four-position binary subcode under the headings A, B, C, D, in two orders: on the left, ordered by the number of present features; on the right, as the binary numbers 0 to 15.]
In many cases in information handling, a binary code is used in this entitive way. Particular patterns of marks carry particular meanings, as in our example of a data field in chapter 1. The pattern we show as 0111, or alternatively name BCD, may well have an arbitrary significance. Punched across the channels of a length of punched paper tape, it might represent a character of an alphabet, a numeral, an instruction (such as to change the print to lower-case) or a punctuation mark. In the columns of a standard 80-column tabulating machine card the holes are subsets of a set of characters
Opposite: A typical code for punched tape. This is clearly a binary code of five arrays, but it is a little more complex than appears on the surface. In many cases, the patterns of holes across the channels of the tape represent two characters ('figure' or 'letter'), the distinction being made by the use of a 'figure shift' or 'letter shift' pattern. Each of these controls all the following characters until it is countermanded by the appearance of the other, as the tape is scanned by the tape-reading device. This effect would otherwise have to be achieved by the use of a sixth binary channel, where hole and no-hole could show which of the two alternative shifts was meant. In the layout, the 'decimal points' indicate the sprocket holes in the tape.
often shown, running from top to bottom, as X, Y, 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. A combination digraph subcode chosen from these forms a means of representing letters of the alphabet; numerals may be represented by a monograph code omitting the X and Y. Here, the X, Y, 0, 1, 2 and so on, do the duty carried out by the A, B, C and D at the tops of the columns of our display. It is important to note that this is not a use of the binary symbols to represent numbers immediately, as 0010 represented number two, but to represent numerals. The fact that these in their turn stand for numbers is another matter.
Identity and quantity again
Here we may pause for a moment and look at the way in which identity and quantity seem to define each other much as features describe items and the transverse. Suppose we have a set of something - we should perhaps call it a subset - chosen from a larger set. How can we tell which subset is in question? In the most abstract case, it seems we can give only arbitrary numbers or places, which we must define numerically, to the members of the larger set, and then say which ones are chosen to make the subset. It comes down to number in the end, since even if we try, to start with, to use some other method of naming, such as the letters of the alphabet, we are still faced with the problem of defining the elements of the other method. To do this, we must put them in order, so we define sets by means of numbers. To give a practical example, we may define a set of holes across a punched tape by numbering its channels and saying in which positions the holes occur. So far, so good, but how do we define a number? The accepted
definition is that a number is a set of sets. Thus the number four is the set of all tetrads. Let us give an example of a tetrad: the tetrad composed of the first, sixth, eleventh and seventeenth element in our set of elements, which we may take it, for the moment, contains seventeen at least. We are back to numbers again. Numbers define sets and sets define numbers. Once more, as with items and features, we are in a self-sustaining situation. With so many places in which matters seem to have no basis, but merely to support each other in thin air, it is a marvel that anyone involved in data study remains sane. Perhaps information-handlers only have a conviction of their sanity which is not borne out by the facts.
To return to binary, when a binary sequence immediately represents a number, the positions represent powers of two. Thus, in the case of 0110 the positions represent $2^3$, $2^2$, $2^1$ and $2^0$ - namely, 8, 4, 2 and 1 - and the code sequence can be interpreted as (0 × 8) + (1 × 4) + (1 × 2) + (0 × 1), or 6, as shown in the table. This, of course, is typical of systems of numerals: the positions in the decimal system represent powers of ten. As a representative of an entity, however, 0110 signifies a subset, the subset of $2^2$ and $2^1$ chosen from the four powers of two available. There is no question of order about this and the members of the subset all occur at once. Again, entity and quantity appear, like items and features (diagrammatically at least), at right angles.
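The two readings of one codeword can be shown in a short Python sketch. The function names are assumptions made for the illustration; the leftmost position is taken as the highest power of two, as in the example of 0110.

```python
def as_number(codeword: str) -> int:
    """Read the codeword positionally: each position is a power of two."""
    width = len(codeword)
    return sum(int(bit) * 2 ** (width - 1 - i) for i, bit in enumerate(codeword))


def as_subset(codeword: str) -> set:
    """Read the codeword as an entity: the subset of powers of two present."""
    width = len(codeword)
    return {2 ** (width - 1 - i) for i, bit in enumerate(codeword) if bit == "1"}


print(as_number("0110"))   # the quantity
print(as_subset("0110"))   # the identity: an unordered set
```

The subset carries no order, whereas the number depends on which position each 1 occupies.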
Pascal's triangle and binomials
Subcodes have an interesting relationship to the display of numbers known as 'Pascal's triangle', which goes on for ever and begins as in the table opposite. Apart from the unit at its apex, every number in this triangle is obtained by adding together the number, if any, immediately above and the number, if any, to the left of this. Reading off the triangle, we find that row 4 gives, as a sum, the capacity (16) of a binary subcode of four characters, and that the numbers appearing along the row give the capacities (1, 4, 6, 4 and 1 respectively) of agraph, monograph, digraph, trigraph and tetragraph subcodes made by taking different subsets of none, one, two, three and four characters from the total number available. These may all be thought of as commutative within themselves, even though this effect only occurs in practice when we reach the digraph subcode.
For those with an interest in mathematics we may use the symbols p and a to symbolise presence and absence, writing $\frac{1}{2}p$ and $\frac{1}{2}a$ because there is the same amount, one half, of presence as there is of absence in the entire binary code when it is fully written out. We may use b to signify the base of the code, that is, the number of separate positions, represented by characters, it contains. Then we may write the formula for a binomial: $(\frac{1}{2}p + \frac{1}{2}a)^b$, which we can expand. If the result is displayed in a column, term by term, we can show its interpretation (taking b equal to 2, for the example):
formula: $(\frac{1}{2}p + \frac{1}{2}a)^b = \frac{1}{4}p^2 + \frac{1}{2}pa + \frac{1}{4}a^2$; subcode capacity: $2^b = 2^2 = 4$
$\frac{1}{4}p^2$: there is a present graphic subcode of two characters (therefore, a present digraph) whose capacity is a quarter of the total capacity of the binary subcode (therefore, one). If we take X and Y as the characters in the base alphabet of two, this minimal subcode consists of the single codeword XY.
$\frac{1}{2}pa$: there is a present graphic subcode of one character and also an absent graphic subcode of one character; both of these have a capacity which is a half of the total capacity of the binary subcode, and therefore of two. These are (present) X and Y and (absent) $\bar{X}$ and $\bar{Y}$. (Note: $pa = p^1a^1$.)
$\frac{1}{4}a^2$: there is an absent graphic subcode of two characters (therefore an absent digraph) whose capacity is a quarter of the total capacity of the binary subcode (therefore one). This is the digraph $\bar{X}\bar{Y}$.
In the above, the capacities one, two and one are the numbers in the appropriate line in the Pascal triangle; and if we had expanded $(p + a)^b$ instead of putting the proportions into the formula we would have obtained these directly. Thus with b = 3 we have $(p + a)^3 = 1p^3 + 3p^2a + 3pa^2 + 1a^3$. This gives the 1 - 3 - 3 - 1 pattern. The sign 1 is usually omitted in writing out the expansion.
If it is so much more direct to omit the proportions, why put them in? The answer is that it gives us a chance to tie up our subject with statistics once more. There are times when a statistician knows sufficiently well the proportion of items possessing a given feature in a large population of items, and wishes to calculate the chances of obtaining a certain number of items possessing this feature if he takes a small sample. Suppose he sets b equal to the number of items in his sample, and writes the proportion of present-featured items into the formula. Say that one quarter of the items in the large population have the feature, and that the sample takes ten. The result is the formula $(\frac{1}{4}p + \frac{3}{4}a)^{10}$. The proportion attributed to a must be $\frac{3}{4}$ because it is the remainder of the population after the present $\frac{1}{4}$ has been taken care of. Now if the statistician looks for the mathematical term in the expansion which contains $p^6a^4$, he will find that its coefficient gives him the chance of getting six present (and therefore four absent) cases of his feature in the ten items of his sample. He does not need to work out the whole expansion, however, because there is a formula for the job.
Even we can do it, in a very slightly roundabout way, for we are obviously concerned again with the number of ways of choosing (in this case) six things out of ten things, as a proportion of all the possible ways of choosing any number of things out of the ten, of which there are $2^b$ (in this case $2^{10}$).
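As a hedged illustration, the row of Pascal's triangle and the statistician's coefficient can be computed with Python's standard library. The function names are assumptions of ours. The simple ratio of ways to $2^b$ applies when presence and absence are equally likely; with unequal proportions, such as one quarter and three quarters, each way of choosing is weighted by the proportions, which is what the coefficient of the term in the expansion amounts to.

```python
from fractions import Fraction
from math import comb


def pascal_row(b: int) -> list:
    """Capacities of the agraph, monograph, digraph ... subcodes to base b."""
    return [comb(b, k) for k in range(b + 1)]


def chance(present: int, sample: int, proportion: Fraction) -> Fraction:
    """Coefficient of p^present * a^(sample - present) in the expansion of
    (proportion*p + (1 - proportion)*a)^sample."""
    absent = sample - present
    return comb(sample, present) * proportion ** present * (1 - proportion) ** absent


print(pascal_row(4), sum(pascal_row(4)))      # capacities and their total
print(chance(6, 10, Fraction(1, 2)))          # equal proportions: ways / 2**10
print(chance(6, 10, Fraction(1, 4)))          # one quarter present
```

Exact fractions are used so that the result can be compared directly with a hand expansion.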
Additive subcodes: 1, 2, 4, 7
We have seen that the positions in our four-position binary subcode can be treated as representing powers of two, which add to the numbers written at the side in our earlier example. Thus, if we write 8, 4, 2, 1 in place of A, B, C, D, we have:
And so on. This is a commonplace of the binary number system, and it has been used as a basis for a method of representing the numbers from 1 to 9 by additive pairs. Using 0 as well as 4, 2 and 1, we can reach the representation of the number six in a straightforward way:
However, we cannot achieve a representation of seven using only two characters unless we introduce the appropriate numeral, when we can achieve 7 + 0 = 7, 7 + 1 = 8, and 7 + 2 = 9. This, the 0, 1, 2, 4, 7 subcode, is rather forcibly changed from the normal binary, to achieve the advantage of every codeword (such as 2 + 1, for instance) consisting of two characters only. It is one of the standard methods of representing numbers on edge-notched cards. The problem of representing nought is not solved by it as it stands, but another position (or character), often shown as S, may be used, to form S + 0 = 0, which completes the subcode.
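The subcode can be tabulated by a short Python sketch. This is an illustration under stated assumptions: the names are ours, each number from 1 to 9 is taken to be the one pair of distinct characters from 0, 1, 2, 4, 7 that adds up to it, and nought is the pair S with 0. The last function checks the additive property, namely that counting how often each character is used, and weighting each count by the character's value, recovers the sum of the numbers recorded.

```python
from collections import Counter
from itertools import combinations


def build_subcode() -> dict:
    """Map each number 0..9 to its pair of characters."""
    table = {0: ("S", 0)}                     # the extra position S completes the subcode
    for a, b in combinations([0, 1, 2, 4, 7], 2):
        if 1 <= a + b <= 9:                   # 4 + 7 = 11 is not needed
            table[a + b] = (a, b)
    return table


def total_by_counting(values, table) -> int:
    """Sum the values by counting characters instead of adding the values."""
    counts = Counter()
    for value in values:
        for character in table[value]:
            if character != "S":
                counts[character] += 1
    return sum(character * n for character, n in counts.items())


table = build_subcode()
print(table[6], table[8], table[0])
print(total_by_counting([3, 6, 9], table))
```

Each number from 1 to 9 has exactly one such pair, so the table is unambiguous.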
The method does not give a very great reduction in numbers of cards or positions employed in order to represent the numerals from 0 to 9, but its additive value is useful. Thus, if we have indexed a set of articles which have been sold at various prices from £1 to £9, we may count all those indexed by means of a 1, add to this total twice the number of those indexed by a 2, add to this four times those indexed by a 4, and finally add seven times the number of those indexed by a 7. The result is the total income from sales. In this particular case, if a punched feature card system is in use, the code may give a result by counting and multiplying the numbers of holes in four cards only, instead of operating on as many cards as there are items whose values are to be added, which would be the case if an item card system were employed.
Superimposed coding
All crossovers are generated as a result of superimposed coding or subcoding, although as a general rule this is not intentional. Either it happens by oversight or it is accepted as necessary but unfortunate. However, it is possible to make a virtue out of necessity and use a code or subcode with a very large base - a subcode, for example, possessing as many characters as there are cards in a feature card system or positions upon an item card. This may be more than a hundred or even more than a thousand. Both commutative and positional codes can be superimposed, and in each case the object is to use a system of such a large capacity that the crossovers are reduced to negligible proportions. Imagine, for example, a tetragraph subcode of features to base four hundred. Four hundred punched feature cards, or a set of punched item cards with four hundred positions upon each, will be needed for this, and every feature will be represented by four feature cards or by four holes in the appropriate item card. There are more than a thousand million possible sets of four; to employ a system using one thousand features is only to use one millionth of the subcode's capacity. There is, in fact, only one chance in a million that a particular set of four characters has any meaning, so on average, only one millionth of the crossovers generated will be meaningful and therefore likely to cause trouble. The remaining crossovers, having no feature attached to them, will never be sought for, and so will have no chance to appear in response to an enquiry. This, of course, is the case of the commutative code. The positional code differs in that its characters are split into different arrays. Each array may, however, be very large. A further element of safety in such a code arises because we seldom seek for one feature only. The whole point of most information handling systems is that they permit the co-ordination of several features at a time. Suppose we return to the tetragraph subcode to base four hundred and assume we are looking for a set of three features, twelve characters of the subcode being involved, if we assume no overlapping. To encounter crossover trouble in this situation, we must assume that some other item has been indexed with a set of characters which includes all twelve of those which refer to the features of our search. This likelihood is so remote that it may be discounted completely.
It is possible to calculate the number of crossovers to be expected in various conditions, but the calculation has to take many variables into account: the type of code (positional or commutative), the number of characters per codeword, the number of characters in the base alphabet, the number of subcoded features indexed against each item, and the likelihood of these features' subcodes containing one, two, three (and so on) characters in common, all play their part. Most of these calculations are belied in practice because information handling systems change their characteristics as time goes on, and in any case can seldom be specified with full accuracy even at the start.
This is no criticism; as a general rule it is advantageous for the system to conform to the changing circumstances in which it operates. It means, however, that the deliberate adoption of a superimposed coding system is usually preceded by the defenestration of arithmetic and by the bold choice of a base alphabet of rather large proportions.
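The capacity quoted for the tetragraph subcode to base four hundred can be checked with a few lines of Python. The sketch covers only the commutative case, in which a codeword is an unordered set of four characters; the function names are assumptions made for the illustration.

```python
from math import comb


def capacity(base: int, characters_per_codeword: int) -> int:
    """Number of distinct unordered codewords in a commutative subcode."""
    return comb(base, characters_per_codeword)


def fraction_used(features: int, base: int, characters_per_codeword: int) -> float:
    """Share of the subcode's capacity taken up by the features in use."""
    return features / capacity(base, characters_per_codeword)


print(capacity(400, 4))              # sets of four from four hundred
print(fraction_used(1000, 400, 4))   # one thousand features in use
```

The first figure is a little over a thousand million, and the second a little under one millionth, in agreement with the rounded values in the text.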
Summary
Subcoding from the viewpoint of direct indexing is a means of reducing the number of cards or holes, or other symbols, required for the representation of meanings. The fundamental subcode is binary, which can be arranged to show the qualities of a positional system and also those of a commutative system. Graphic, and other more complex codes, can be shown to arise from binary codes by appropriate selection and arrangement. If subcodes are to be used safely, the meanings represented by the codewords they make must be mutually exclusive, otherwise crossovers, resulting in extra and wrong answers to searches, may arise. There are various methods of reducing crossovers or of warning against their existence, and one form of subcoding - the superimposition of subcode words - defies them boldly. When numbers are to be recorded, it is possible to employ additive codes, which are also derived from binary in the simplest instances. Subcoding can be applied to items as well as to features, although this is seldom done. By the symmetry of the data field, however, the behaviour of item subcodes can be derived from that of subcodes of features.
Monadic terms
A data field does not make use of one semantic continuum only, but of two, one in each of its extensions. Both the items and the features can be subcoded or represented directly, being as generic or as specific as we please. As we may expect from our study of the field, terms become more generic in one extension as they become more specific in the other. This is the transverse relationship between union and intersection in a slightly different guise. If we set a level of genericity in one extension of the field, that of the other is fixed, and departure from it in the direction of the more generic leads to crossovers. We know that the effect appeared, for example, in the Case of the Gramophone Record and the Case of the Toddler's First Reader. There is no point in labouring over the matter, but it is helpful to introduce the idea of a monadic term, a term which, in the context of a given level of genericity in one extension of the field, causes no crossovers. Such terms are those which a librarian, setting up a manifold or co-ordinate index, prefers to all others for his schedules. Thus, if we have a work on economics which refers to a fall-in-demand-for-mangonels and a rise-in-supply-of-arbalests, the only monadic features are this highly specific pair. However, if we split the book for indexing purposes mentally into two, and index fall, demand, and mangonels as three distinct features under part one, with rise, supply, and arbalests following under part two, then the monadic features have become more generic. Each of the two items, however, is more specific than was the original. The danger of recording 'rise' and 'fall' separately from the 'supply' and 'demand' to which they refer is, of course, removed because in the cases chosen there are no other subjects indexed, to which they could wrongly attach themselves. The way the terms in the two extensions of the field determine each other in this manner is clear, but in practice the effect is often defied.
A decision is taken as to which are the items of an index, for example, and the features are then decided upon with only the most glancing reference to this. Alternatively, the schedule of features is decided upon, and the items are then indexed as they stand, whether they should be split into more specific items or not. A typical example arises in the case of technical documentation, when the items may be reports or articles, and when the author's or editor's decision, on presentation or stylistic grounds, causes a paper which is really, say, three items to be printed as a textual unity. The result deceives the indexer into thinking it is one single indexable item as well as a single wodge of text. Incidentally, the use of the word 'really' above can be called into question: it means that the schedule of features in use calls for the text to be broken up. A different schedule might permit the text to remain in one piece.
Links and roles
An indexing technique which has been developed in connection with this problem consists of allotting 'link numbers' to features, each link number referring to a monadic item which is treated as a subitem of a larger whole. To apply links to our current example, we should use the following list of features against the sales report, treating this as monadic:
    fall         1
    demand       1
    mangonels    1
    rise         2
    supply       2
    arbalests    2
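The effect of the link numbers on a search can be sketched in Python. This is an illustration only, with names of our own choosing: the six features and their links are held as pairs, and a search is taken to succeed under links only when all the features sought share one link number.

```python
ENTRIES = [
    ("fall", 1), ("demand", 1), ("mangonels", 1),
    ("rise", 2), ("supply", 2), ("arbalests", 2),
]


def matches_without_links(entries, query) -> bool:
    """All features lumped together: any combination appears to fit."""
    features = {feature for feature, _ in entries}
    return set(query) <= features


def matches_with_links(entries, query) -> bool:
    """Features grouped by link number: the query must fit one group."""
    groups = {}
    for feature, link in entries:
        groups.setdefault(link, set()).add(feature)
    return any(set(query) <= group for group in groups.values())


print(matches_without_links(ENTRIES, ["rise", "mangonels"]))   # a crossover
print(matches_with_links(ENTRIES, ["rise", "mangonels"]))      # rejected
print(matches_with_links(ENTRIES, ["fall", "mangonels"]))      # a true match
```

Without links the report answers a search for a rise in mangonels, which it does not discuss; with links the false combination is excluded.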
It is clear how the numerals link together the related ideas. However, we need an extra array of positions on item vehicles, and an extra array of vehicles in the case of feature systems, to accommodate them, and so the device does not save us any space. In feature systems, mangonels 1, mangonels 2, mangonels 3 and so on may be called for, as separate vehicles; in item systems, 'mangonels' may occupy one place on the vehicles, but an array of links must be placed against it and a similar array must go against every other concept. Links, it appears, are a way of duplicating vocabularies, and can be thought of as showing to which supercode a given direct term belongs.
A role is the part played by a given word in the construction of a descriptive sentence, or of a given term in a more specific term of which it forms a part. Thus the term 'buildings' plays a different role in the phrase 'the effect of wind on buildings' from that which it plays in 'the effect of buildings on wind'. In the first place we may be concerned with the wind rocking the buildings; in the second we may deal with turbulence caused by obstructions to a smooth airstream. A good deal of work has been done in separating the different roles which a term may play in the description of subject matter, and the roles may be given numbers in the same way as links were given them. In this example, we may think of the buildings acting on the wind, or of the wind acting on the buildings. We then have buildings-role-1 (active) and buildings-role-2 (passive), and we have an active and a passive role for the wind also. Again we have a multiplicity of features, wind and buildings in their various relationships being features of subject matter. However, we may care little for either the grammar or the structure of our subject matter, and do without both links and roles as a consequence. After all, we may argue, there are only certain ways in which a set of features may combine, and on the whole the more features are combined the fewer ways remain in which they may all make sense as a whole. The result is that they form a correct description of the item sought sufficiently often for comfort. Since tolerance of unrequired items is a function of personality as well as of occupation, it may be as well to take this matter no further here.
Paired and matching indexes
Returning to the straightforward, or relatively so, indexing of objects without subject matter, and staying with manifolds a little longer before considering classifications, we meet cases in which some at least of the items of one system form the basis for features in a related system. This often occurs when one of the systems indexes things, and the other deals with the occurrences which may affect the things. Examples are people and accidents, goods and sales, machines and breakdowns. Thus we may at one time wish to find all the accidents in which Jane was involved, and at another we may wish to find everything about a given accident; these are feature and item questions about accidents. Further, we may wish to know all about Jane, or find everyone possessing a given set of characteristics: these are item and feature questions about people. We may work from one to the other. Suppose we ask how many accidents of a particular type there are, using a punched feature card system, and that, as mentioned briefly in chapter 3, we find that there are more of these than we would expect at random. We may then look at the individual accidents, comparing them with each other, perhaps using item cards, one for each accident. If these were punched also, the comparison could be by stacking. We may discover that certain people are involved in several accidents, or that certain types of people carrying out a certain process are involved. Then we move from the accident to the personnel index, seeking all people of that type, or looking for everything we can discover about the accident-prone individuals. The case of machines and breakdowns is similar to this - a certain fault becomes common and we find this from the breakdown index. It occurs on a given component of the various machines which are indexed. We turn to the index of machines and ask: which of them include this component? Of these, which have had this sort of breakdown? Obviously, a given type of breakdown is a feature of a machine, and a given type of machine is a feature of a breakdown.
The systems indexing machines and breakdowns are paired, and are reciprocal. A matching system differs from this. Examples are indexes used for matching applicants to jobs, tenants to houses, prospective husbands to prospective wives. Here, two different types of item share a set of features. Thus a job may demand someone aged 45, and a person may in fact be of this age; the job may offer a certain salary, and this may be the salary the applicant requires; the job may call for fluent Spanish, and the applicant may possess this language. In certain punched feature card installations, a search is made by stacking the cards which describe the job, and then topping the stack by a card which means, in effect, 'the item is an applicant'. In reverse, the stack may describe an applicant, and may be topped by a card stating 'the item is a job'. If jobs and applicants are recorded as they come to hand, intermingled, matching can take place in either direction, and the item becomes a curiously abstract thing which has the extra features that it may be a person seeking an appointment, or it may be an appointment awaiting a person. The jobs and applicants can, however, be represented by suitably designed item cards, acting as the item side of the entire index. By symmetry, a similar principle can be applied to punched or other item card systems.
Conjugate or complete indexes
In the cases of the accident record, the index of machines and breakdowns, and the matching of jobs to applicants described above, it has been assumed that both item and feature cards are in use - feature cards for the feature questions and item cards for the others. This, of course, need not be the case. A set of punched item cards, or item cards treated in some other way to permit searching, may be used, although they will be slow at answering feature questions. When both types of cards, or other vehicles, are in use, the system may be known as 'conjugate' or 'complete'. The latter name contrasts with the word 'partial', a complete index being one which answers both types of question (item and feature) by direct reference to an appropriate vehicle, and a partial index being one which calls for searching through all vehicles in order to answer one of the two, if not both, types of question. The word 'total' has also been
Figure 32. A typical layout of a conjugate index for a personnel record. [Diagram labels: Input ('raw material'); Feature cards; Strip index; Item cards; Output of retrievals and statistics; Name to number conversion; Output of individual details.]
used in the same sense as 'complete', to imply that both item and feature vehicles exist in an installation, as in the phrase 'total data processing'. However, difficulties attend both these names. If we ask whether an index is complete or not, we may well be thought to ask if all the available information has been entered into it; and if we ask whether the processing is total or not, we may be thought to ask if all the available information has been handled, to produce some appropriate output. The name 'conjugate' may well be the best of the three. It is not in frequent use, though not unknown, and it implies that two or more rather similar things are working together for a common purpose. It is not easy to find a word to mean that only one type of vehicle is in use in an index. Most names, such as 'one-sided' or 'incomplete', carry with them a vaguely disparaging aura, which may well be quite unmerited. Perhaps the simple negative, 'inconjugate', is best.
Translation and interpretation
Taking a conjugate index made of punched feature and plain item cards, we can see that the reference numbers of the holes in the feature cards could be used as names for the item cards, and the transverse. In practice, however, such reference numbers are not always enough. If we think of a personnel record, the names of the people indexed may be more important than their reference numbers, and if item cards are used, carrying details of personnel, these may therefore be kept in an alphabetical order of surnames. This was shown in the 'succession of activities' early in chapter 5. Items, like features, may have many synonyms, and the schedules of items and of features which together make up the extensions of the data field in manifold systems have to take care of this. Often, to make it possible to add new terms between any pair of existing terms, strip indexes, described in a later chapter, are used for these schedules. They may, however, often be written down on sheets of paper or the pages of a book, with sufficient space for the occasional insertion. In the case of features, these may be listed in alphabetical order; but for administrative or statistical purposes, they are almost always arranged in arrays and given code names or numbers. This split between the use of plain-language names and code names is commonly found in indexing work. Thus the personnel records officer will hardly be likely to store his feature vehicles, if he uses these, in alphabetical order of their titles. This will separate male from female, juvenile from adult, foundry from toolroom, when these clearly form arrays or parts of arrays which should be kept together. On the other hand, the plain item cards representing the staff he indexes are quite likely to be in alphabetical order of surnames, so that a number-to-name translation is necessary to gain access to the item information from the feature side. By contrast, a librarian may well be tempted, if he uses a manifold (co-ordinate) index embodied in punched feature cards, to keep the cards representing subject matter in alphabetical order of preferred terms, and may have to arrange for his schedules to embody some translation of the 'orb - see sphere' type in consequence. If he uses a coded arrangement of features, similar features being kept together by the possession of similar code names, then his translations are more likely to take the form 'orb - code 125 . . . sphere - code 125 . . .' and so on. In their storage unit, the feature cards will then be kept in code number order. But in either case, he or she is not likely to keep the books in an alphabetical order of their titles. Reference number order, generated as the works arrive, one at a time, will usually suffice. Punched item cards often call for a translation which is placed on the cards themselves. In this case, the cards are said to be 'interpreted'. Thus the holes in the columns of an 80-column tabulating machine card may be interpreted by a machine which prints the characters they signify (if this is the case) at the head of the columns concerned.
Edge-notched item cards may, in places, be ready-interpreted, the meanings of the notch positions being printed against them when the cards are manufactured. The same applies to slotted cards. Punched tape may also be interpreted, during the punching operation.
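The translations described above can be sketched in Python for a small conjugate index. Everything in the sketch is hypothetical except the 'orb - see sphere' entry, which comes from the text: the feature cards are modelled as sets of item reference numbers, stacking as the intersection of those sets, and the surnames and features are invented for the illustration.

```python
SYNONYMS = {"orb": "sphere"}                    # 'orb - see sphere'

FEATURE_CARDS = {                               # feature -> holes (item numbers)
    "male": {1, 3},
    "foundry": {1, 2},
    "toolroom": {3},
}

NAMES_BY_NUMBER = {1: "Archer", 2: "Baker", 3: "Cooper"}   # number-to-name table


def preferred_term(term: str) -> str:
    """Translate a synonym to the preferred term of the schedule."""
    return SYNONYMS.get(term, term)


def feature_search(*terms: str) -> set:
    """Stack the feature cards (at least one) and keep the common holes."""
    stacks = [FEATURE_CARDS[preferred_term(term)] for term in terms]
    return set.intersection(*stacks)


def names_for(numbers) -> list:
    """Translate reference numbers into surnames, in item-card order."""
    return sorted(NAMES_BY_NUMBER[number] for number in numbers)


found = feature_search("male", "foundry")
print(found, names_for(found))
print(preferred_term("orb"))
```

The feature side answers in reference numbers; the surnames are reached only through the translation table, which is the step the text calls number-to-name conversion.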
Direct positional indexes
We have already met with direct positional indexes. Their behaviour, as we may expect, is similar to that of positional subcodes. Thus, if two codewords whose elements represent direct terms are superimposed in the course of indexing an item, the expected crossovers will arise, and this can be prevented by any effective means of separating them. Most direct positional systems of features guard against this by insisting that the items indexed may be allotted one sequence of features only, and therefore, if this is represented by a codeword, no more than a single sequence of characters. The existence of mixed positional codes must not be forgotten. In providing positional direct codes for industrial products, for example, some of the sub-sequences of elements in the code names may be dependent, and others may not. If the codewords are transformed in any way, then the dependent portions must be moved as blocks. For example, a product made in ten sizes (from 0 to 9) and in eight colours (from A to H) and of a shape which is one of a number of shapes, each represented by a dependent sequence of three small letters such as dfg, may have such codewords as dfg/3/D and D/dfg/3; but d/3/f/D/g is difficult to comprehend.
Faceted classifications
It is possible to use the code sequences of the Universa1 Decimal and similar classifications to represent rather specific features, and to use them as titles for punched feature cards which, united, define even more specific items. Such sequences rnay als0 be joined together to form codenames of many characters, to describe complex subject matter. An example is 622.33:622.86 (410)"1945/61":31 which is interpreted as a statistica1summary of accidents in British coal mines from 1945 to 1961. Another form of representing sets of features, usually of subject matter, by related hierarchies, is the faceted classification.In such a
system, each facet consists of one hierarchy, not necessarily very long, and the main subject of the system - the type of item discussed - is the universe for all the hierarchies. In his book on the method, Vickery chooses soils as the universe, the facets being soils according to their constitution, according to their origin, according to their structure, and the like. Examples are peat soil, granitic soil, and so on, and the hierarchies also contain such categories as the organisms a soil may contain, the operations which may be performed upon it and the tools which may be used to effect these. As a brief example, laterite comes in its hierarchy below subtropical-and-tropical, which is a main division of soils by climate. The code words which represent paths through each hierarchy begin with a capital letter and proceed with small letters. Thus, if we encounter GhKovPr, we know we are in the presence of an idea made up of three hierarchies put together in an agreed order: facet G first, then facet K, then facet P. Faceted classifications make use of a hierarchy of hierarchies. Often, in producing positional code words to stand for subject matter, the same grammatical need which leads to the use of roles in commutative systems arises, and is met by the use of symbols such as colons, dashes and dots between characters. These carry such ideas as 'used for', 'acted on by' and the more general 'related to'. They are a valuable means of tackling the problem when methods of mechanisation (post-co-ordination or post-de-co-ordination) are not to be used, the system being transversive. Such colons, brackets, strokes and other symbols appear in the example of a Universal Decimal Classification code number given above.
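The rule that each path begins with a capital letter and proceeds with small letters can be illustrated by a short Python sketch. It is only an illustration: the function names split_facets and facet_paths are invented here, and the sketch assumes a code word containing nothing but letters, with none of the linking symbols just mentioned.

```python
import re

def split_facets(codeword):
    """Split a faceted code word into segments, one per facet.

    Each segment is one capital letter (the facet) followed by zero or
    more small letters (the path down that facet's hierarchy).
    """
    segments = re.findall(r"[A-Z][a-z]*", codeword)
    # Reject anything that is not wholly made of such segments.
    if "".join(segments) != codeword:
        raise ValueError("not a well-formed faceted code word: %r" % codeword)
    return segments

def facet_paths(codeword):
    """Map each facet letter to its path, keeping the order in the code word."""
    return {segment[0]: segment[1:] for segment in split_facets(codeword)}

print(split_facets("GhKovPr"))
print(facet_paths("GhKovPr"))
```

Applied to GhKovPr, the first function separates the three facets G, K and P in the order in which they were written; whether that order is the agreed one is a matter for the scheme itself, which the sketch does not check.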
In respect of each other, items and features are monadic when every combination of the features truly describes the item or items and when every combination of the items, if there are more than one, truly describes the features. The more complex the items, the
more parts they have which call for individual description, and the more specific the features required for the purpose. When sets of features hang together as describing a part of an item, various devices exist to link them in indexing systems; and when grammar has a part to play there are devices for showing the role a given idea plays in a description. Paired and matching systems occur, the first often dealing with things and the events which affect these things, as two connected indexes, and the second dealing with items of different types which, for the purpose in hand, possess the same set of features. A conjugate system is one with both feature and item provision, usually punched feature and plain item vehicles being used. It is of the normal variety; transversive systems provide both types of approach by giving their items descriptive names under which they are filed. Direct positional codes may be used to represent features which are then combined on item or on feature cards in the well-known ways, by compresence of sets of holes or marks in the one case, and of cards in the other. They may also be combined in sequences of hierarchies, faceted classifications being examples of this technique.
Generic point
In examining the semantic continuum, we noted the existence of a point, rather like unity in the continuum of rational numbers, below which we encountered subcodes, and above which we met with increasingly specific terms. This we called the generic point. To conclude our examination of the continuum, we may see whether we can come a little nearer to finding which sorts of term can be taken as generic. It would seem that a fully generic term cannot be split into even more generic elements, as the idea 'boy' can be split. We took the elements of 'boy' to be 'masculine', 'human' and 'juvenile'. We may check this by seeing how a well-known dictionary deals with the matter. The Concise Oxford Dictionary defines 'boy' as 'male child', thus providing us with one of our elements in the form of a synonym for 'masculine'; and it defines 'child' as 'newborn human being', thus coming close to our remaining elements, although 'newborn' covers a smaller range of ages than does our own chosen idea. So far, so good, but how does the dictionary deal with 'human', 'juvenile' and 'masculine' in their turn? It defines them, if this word may be used, by means of synonyms. 'Masculine' is given as meaning 'male'; 'juvenile' is said to mean 'youthful'; 'human' is referred to 'man' (which we may presume embraces woman). Thus we encounter no more splitting into general ideas. In one case, however, we find a brief description: 'male' is noted as 'of the sex that begets offspring'.
If we work out strings of definitions in this way, we encounter, in the end, a word which is explained by means of a synonym that we are supposed already to understand, or by means of a description in terms of structure or function, which itself often depends upon structure. In picture dictionaries, we find 'ostensive' or pointing-out definition, the pictures acting as means of pointing out the meanings of the words written against them. This is kindergarten stuff in many cases ('apple - bun - cat - dog' etc), but a complex and properly labelled diagram carries out the same function in many specialised sciences.
Let us therefore begin by saying a term is generic when its meaning can be explained only by words which describe a structure, immediately or by way of an effect of that structure. At this generic point, the other ways of conveying the meaning of the term are by giving an example, and by using a synonym we already understand.
Structural definition
Here we appeal to what we know about the construction of things. Before examining other types of term, it may be best to consider the naturally-occurring objects which form the subject-matter of many major sciences, such as atoms, cells, plants, animals, communities. We may break down a community into elements which are individual people, these into organs, these into tissues, these into cells, and the cells into organelles, molecules, atoms, and subatomic particles. If we place objects of these types in order, choosing a position for each which is below any objects of which it forms a part, and above any objects which go to compose it, we shall achieve a reproducible series. So far as our knowledge goes, anyone who follows this rule will produce a list which is closely similar to that made by anyone else following the same principle. Unlike such arrangements as the Universal Decimal Classification, this arrangement will not depend on any one person's, or any committee's, decision as to where terms ought to fit. By following the clue afforded by structural definition as the way to describe a fully-generic term, we can find a place for leaves, onions, polyvinyl chloride molecules, stationmasters, protons and co-ordinating committees. This is rather an area than a specific spot for each, as yet, but it is an area which falls into place amongst others. The idea can be followed further. Obviously the digestive system is more complex than the stomach but less so than the animal which possesses it; an atomic nucleus comes between a neutron and an atom. How far can we go? We need some idea of the complete range, and some idea of the stages between each of its major levels. Indeed, we also need some way of deciding which
these levels are. Molecules, cells and living beings may feel as if they are amongst the major steps in the ascent, but it would be nice to have some reason for it.
Integrative levels
Let us start by seeing how far we may go. Below the subatomic particle, what do we find? Certainly there seems to be nothing which we can call physical matter, and yet, in dealing with such things, the physicist calls upon concepts which are highly sophisticated, obviously drawn from the higher reaches of a sub-subatomic series of ideas. These are the abstract concepts of tensor mathematics, of probability theory, of group algebras. If we drop the idea of saying that things are made out of things, and say instead that things may be described in terms of things, this situation can be catered for. Below the subatomic we meet the concepts of higher mathematics, especially those which can be given an interpretation in terms of geometry, space, time. How do we deal with the ideas of geometry? Plane figures, lines, volumes and the like have equations which define them, showing the structures they must possess if they are to be the sort of figures they are. The types of idea which fit into these equations are those of number and set theory. As we proceed downwards, towards simpler and simpler concepts, we find at last mere members of sets, single instances, something suspiciously like a bit or a data unit, single and totally abstract. And after that, nothing - the empty set. This is where we stop. It is possible to arrange the ideas thus encountered in pairs, as follows:
members of sets - whole sets
lines - geometric figures
photons - sub-atomic particles with rest-mass
atoms - molecules
organelles - living cells
organs of the body - plants and animals
departments, sections of communities - whole communities
parts of nations - nations
The first two lines of this list concern the ideas of mathematics. Next come two lines dealing with the physical and chemical sciences; the life sciences, biology and medicine, follow. The last two lines contain the subject matter of sociology, politics, economics and business management. Each line may be taken as the briefest possible statement of the contents of an integrative level, a degree of complexity in the build-up of things as we know them. The second idea mentioned in each line, for instance, a set, solid figure, or particle, is in some way the main type of object found at each level. The first idea of each line is a part or organ of that type of object. These integrative levels may be named, for convenience, thus:
0 : set-theoretic
1 : geometric
2 : subatomic
3 : molecular
4 : cytomechanic
5 : biomorphic
6 : communal
7 : national
The two unusual names in this list, cytomechanic and biomorphic, are chosen to remind us that artefacts must be given a place in any arrangement of this type. Mostly, these occur in the levels between the molecular, containing the substances of which they are made, and the communal, containing the communities which use them.
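For readers who wish to hold the eight levels in a machine, the two lists above can be combined into one small table. The following Python sketch is offered only as one possible arrangement; the names Level, INTEGRATIVE_LEVELS and level_of are invented for the illustration, while the numbers, level names and pairs of ideas are those given in the text.

```python
from collections import namedtuple

# One row per integrative level: its number, its name, the part (or organ)
# found at that level, and the main unit found at that level.
Level = namedtuple("Level", "number name part unit")

INTEGRATIVE_LEVELS = [
    Level(0, "set-theoretic", "members of sets", "whole sets"),
    Level(1, "geometric", "lines", "geometric figures"),
    Level(2, "subatomic", "photons", "sub-atomic particles with rest-mass"),
    Level(3, "molecular", "atoms", "molecules"),
    Level(4, "cytomechanic", "organelles", "living cells"),
    Level(5, "biomorphic", "organs of the body", "plants and animals"),
    Level(6, "communal", "departments, sections of communities", "whole communities"),
    Level(7, "national", "parts of nations", "nations"),
]

def level_of(idea):
    """Return the level whose part or main unit is `idea`, or None if absent."""
    for level in INTEGRATIVE_LEVELS:
        if idea in (level.part, level.unit):
            return level
    return None

print(level_of("molecules").name)
```

The lookup works only on the sixteen ideas listed; placing other terms, such as the digestive system, needs the intermediate stages discussed next.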
Formative stages
Many types of object fit between those which have been listed above; the example of the digestive system has already been given. Other examples are production lines in factories, polyribosomes in cells, the ignition system in a motor car, a subset of a set. If we take it that the main independent units of the levels have been correctly chosen, it is clear that there are still many stages between each one of these and the next, and that these must be considered. Further, there still seems to be no clinching reason why the main units of the levels should be taken as rightly selected. An examination of the stages between each might help with this problem as well as provide a place for objects without a home. Let us start by giving a name to the major type of object found at each level. We have already begun to call them units, giving yet another meaning to this word. This action may, perhaps, be justified by reference to the set-theoretic level, where a basic entity has been given this name. We are very familiar with this object. In its capacity as an instance of a set, we have called it a data unit. Such units may be brought together to form assemblies in which an equal part is played by each; the units in the assemblies are commutative, and order does not matter since they all act together. Assemblies can be detected at higher levels, too. An example is a group of people conversing, another is the constellation Orion. Another is the set of subscribers to an international treaty which affects all in the same way. However, as we know, assemblies may be set in a context in which order is important. This happens in cascade diagrams and in positional code words. Suppose we name such arrangements by the general name, systems. Immediately we have a clue to the placing of production lines, the ignition system of a car, and other ideas which refer to series of objects.
In any set of stages between levels, they may be expected to come above assemblies, which themselves come above units. The question arises, what comes after systems? When systems
are brought together so that they interact, we achieve something which appears to have a degree of feedback or control. It may be called a combine; it is not quite complete, but it is nearly so. A subfield in a data field seems to be of this type. A roadworthy chassis of an omnibus, without the bodywork added, may also be an example. When enough systems have been added to a combine and when every one of its interacting parts is at work, a full unit emerges. This sequence of unit-assembly-system-combine appears to fit twice in each integrative level, once between the lowest object of each level and the main unit - between the department of a company and the entire company for instance - and once again between the main unit and the lowest object of the next level. To show which of these two series is intended, we may talk of subunits, subassemblies, subsystems and subcombines when the ideas of the lower part of an integrative level are concerned. These steps, from subunit to unit and so upwards again to subunit, may be named formative stages. In this sense, a set member appears as the lowest of all subunits, a subunit at the level of set theory. The stage after a subcombine is a main unit, and the effect of a subcombine, as a set of interacting systems, now suggests a reason for such a unit possessing its typical properties. It has the complete set of systems needed in order to remain in balance with its surroundings. There is a quality the biologist knows as homeostasis, an ability to heal if not too seriously injured, a capacity for keeping the internal environment constant in the face of changing external circumstances in order to ensure continuing existence. Nations have it, communities such as Trade Unions and industrial companies have it, and individuals have it, all in their own separate ways. The way in which a community retains its corporate identity even though every one of its original members may leave is well known.
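The ladder of formative stages described above, with the four stages appearing once in the lower part of a level and once again in the upper part, can be set out mechanically. The Python sketch below is merely one reading of that description; the names STAGES and formative_ladder are invented for the purpose, and the 'sub' names are formed by simple prefixing.

```python
STAGES = ["unit", "assembly", "system", "combine"]

def formative_ladder(level_names):
    """List (level, stage) pairs from the bottom of the first level upwards.

    Within each level the four stages occur twice: first as subunit,
    subassembly, subsystem and subcombine (the lower part of the level),
    then as unit, assembly, system and combine (from the main unit up to
    the lowest object of the next level).
    """
    ladder = []
    for name in level_names:
        for stage in STAGES:
            ladder.append((name, "sub" + stage))   # lower part of the level
        for stage in STAGES:
            ladder.append((name, stage))           # main unit and above
    return ladder

for step in formative_ladder(["set-theoretic", "geometric"]):
    print(step)
```

On this reading the step after a subcombine is always the main unit of the same level, and the step after a combine is the subunit of the level above, which agrees with the sequence given in the text.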
Homeostasis is a property we may expect of a main unit at a high integrative level, and at the lower levels we may expect something like it, adjusted for non-living matter. Atoms, except those with complete outer shells, do not seem to possess it, which is
why they assemble themselves with others, sometimes to make very complex molecules. At a lower level still, a cube lacking one side seems to be incomplete, and cries out for the remaining side to be added. Our whole ability to fill in the missing places in a pattern, to recognise shadow letters and so on, seems connected with the effect. In contrast with units, subunits do not have the homeostatic property, or at least, they do not have it to the same degree. On the other hand, they show a greater functional effect: like flowers, leaves, personnel departments, livers, kidneys, departments of state and mitochondria, they carry out a job for the unit of which they are a part. At the levels of life this often appears purposive; at the lower levels it merely shows as the provision of a special quality or effect. Design engineers spend much of their hardest thought on finding the shape which will produce the effect they seek.
Objects and substances
So far, we have considered only such objects as possess some sort of recognisable outline or structure, the structure being more prominent in the case of the abstractions of set theoretic level and of the communities and nations above the level of living beings. We know, however, that many objects become aggregated into substances. There is, to be sure, no word for a substance composed of teacups, but copper atoms make copper, cells make up an organism, and we are familiar with such substances as sand, nitric acid, flour, grain, cough medicine, bone, brick, coal, air, glass and plywood. These are collections of large numbers of objects which are considered without any particular boundaries being attached to them, although they may possess quite complicated internal structures. It is possible to start each integrative level with an object and a substance made of many examples of it, these ideas lying side by side. Later, when we study the behaviour of relations, a reason for this will appear. This is not surprising, for all structure is concerned with the relations that things have to each other, and we may
expect, in the long run, that the pattern of the holotheme will be based on the behaviour of sets of relations. Relations, indeed, will prove to be closely concerned with the ideas of units, assemblies, systems and combines. However, let us leave that part of our study for later consideration, and return to the terms which we know to embody the relations. When the objects and substances are considered together, it seems that, where objects are built up into more complex objects, substances are broken down. They first become diluted: corresponding to assemblies of objects we find mixtures of substances. Then they become separated, in one dimension to begin with, as with plywood, or the laminations of sedimentary rock, and then in two. Finally, separated from other substances in all three dimensions, they appear as a different type of whole unit, one which has been made by a breaking-down process instead of by construction. This type of unit is the starting-point of the engineer, the toolmaker, and it brings artefacts into the pattern without difficulty.
Semantic types
Objects and substances may now be grouped under the name: things. Obviously, we possess words representing many other types of term besides these. For example, there are those ideas we call qualities. These include all those assessibles and measurables listed in chapter 4 as forming arrays of the dimensional type. Things, of one sort or another, may be blue, honest, wrinkled, rectangular, or hot. Then, there are the types of term we may call occurrences. Examples are wind, fire, dance, earthquakes, meiosis, declarations of intent, performances of Il Trovatore. Qualities, too, may change, and these changes include movement, fading, deformation and the like, which are alterations of position, colour and shape respectively. These four semantic types (there are others) appear to be divisible into those which are passive, such as things and qualities, and those which are active, such as occurrences and changes. They are also divisible into those which are entitive, such as things and occurrences,
and those which are attributive, such as qualities and changes. If we add the distinction between terms and relations, we may come up with eight semantic types all told:
passive entitive terms (things) and passive entitive relations
passive attributive terms (qualities) and passive attributive relations
active entitive terms (occurrences) and active entitive relations
active attributive terms (changes) and active attributive relations
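The eight types listed above arise from three two-way distinctions taken together, and this can be checked by enumerating them. The sketch below is an illustration in Python; the names NAMED_TERMS and semantic_types are invented here, and only the four terms carry the everyday names the text has given them.

```python
from itertools import product

# Everyday names given in the text to the four kinds of term.
NAMED_TERMS = {
    ("passive", "entitive"): "things",
    ("passive", "attributive"): "qualities",
    ("active", "entitive"): "occurrences",
    ("active", "attributive"): "changes",
}

def semantic_types():
    """Return the eight semantic types as (label, everyday name or None)."""
    types = []
    for activity, kind, category in product(
        ("passive", "active"), ("entitive", "attributive"), ("term", "relation")
    ):
        label = "%s %s %s" % (activity, kind, category)
        # Relations have not been given everyday names at this point.
        name = NAMED_TERMS[(activity, kind)] if category == "term" else None
        types.append((label, name))
    return types

for label, name in semantic_types():
    print(label, "-", name)
```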
As we shall later see, the entitive relations, passive and active, are those which obtain between sets, and the attributive ones are those which obtain between numbers. The question arises, how does the distinction between object and substance, as major types of thing, appear when transferred to the realm of qualities? The answer is that it appears as the difference between qualities, such as position, which are not divisible, and those, like distance, which are. We cannot, for instance, be at half the North Pole, but we can be at half-way from our starting point to the North Pole, granted good weather and a suitable means of transport. Heat and temperature are a similar pair, so are angle and orientation. Indivisible qualities are properties of objects; divisible ones are properties of substances, including the substances of which the objects are composed. A similar result is found when occurrences and changes are considered. Occurrences may be events, single and indivisible, or they may be processes, continuous and capable of division. Changes may also be discrete or continuous. Things, qualities, occurrences and changes appear throughout the set of integrative levels. Further, the qualities of the lower levels persist into those which are higher. An example is mass: no set-theoretical or geometrical concept has this, but once it appears at subatomic level, it remains. Things of all higher levels possess it. Belligerence, on the other hand, arises later. People may be fond of a fight, and so may communities and nations, but molecules are
not. Another interesting effect of qualities is the way in which they arise, quite often, from occurrences. Thus colour, in a substance, arises from the wavelength of the photons its molecules emit, and temperature is a result of the vibrations of molecules. The honesty of a community may be measured as the frequency with which its members tell the truth when given the opportunity.
Mass, length, time and energy
One of the more interesting relationships in the holotheme concerns the dimensions used in physical equations, that is, mass, length, time and energy. Any one of these can be defined in terms of the other three. For example, energy is mass times length-squared divided by time-squared. The same effect appears in the case of things, qualities, occurrences and changes. Thus the occurrence we know as wind is air (the thing) moving (the change) its position (the quality). The pattern is, that any three of these types of term can be used to define the fourth. In passing, we may note a relationship between mass and things (at least at subatomic level and above), a relationship between length and qualities, which may be measured on scales, a relationship between time and change, and a relationship between energy and occurrences. Analogies must not be pressed too far (for instance, we can talk of a time-scale, referring time to a length) but nevertheless they may often suggest new lines of enquiry.
Collectives
Many names in daily use collect terms of one sort by the use of terms of a different sort. For example, such names as 'antiseptics', 'surfactants', 'bakers', 'managers', 'reptiles' and 'refrigerators' collect things according to an occurrence in which they take an active part. 'Plastics' collect things according to a quality, 'bipeds' according to a thing, or perhaps two things. But enough has been said to show the countryside through which we travel when we
explore the holotheme. There are at least two good reasons for the journey: the pure interest of it, and its value in application. If we can find out with which type of term we deal, we shall have a good idea of how it may be expected to behave, how it may join with others, which neighbours it may have; and if we are making a hierarchy or a schedule this may give us a better idea of where to leave gaps for terms which we have not yet encountered. When dealing with subject matter (for example, 'dimensions of steel wire for cables for suspension bridges') it is possible to place each idea at its appropriate level and then to indicate which level is most important, so that, in a manifold or co-ordinate index, we do not find this article about steel wire when we are looking for information about suspension bridges as a whole.
Relations
In our study of the holotheme, we have concentrated upon terms. The relations, whether passive or active, and whether between entities or attributes, have been mentioned only in passing. Every level of the pattern of meaning has its own appropriate types of relation, simple or complex. We have met union, intersection, overlap, and at higher levels we find, for instance, parenthood (a relation between biomorphs), monopoly (a relation of communal level) and economic federation (a relation between countries). The set of semantic types includes relations as well as terms. As a rule, an active relation may take effect only when the appropriate passive relation, known as the condition for it, obtains. The active relations may be called 'operations'. For example: to gain a result other than the production of the empty set from the operation of intersection, this operation must be carried out on sets which are in the condition of overlap. The data field, as an interlocking collection of sets, is one of the objects of the lowest integrative level, the most abstract. As a result of its abstraction it achieves a generality which makes it a valuable study. Its behaviour is repeated at the higher levels, in which the
terms are very different from those of the lowest, but in which the relations, however complex, follow a pattern set in the field itself. For this reason it can be used to represent the higher terms, those of everyday life. Data study is concerned with its use in this way.
Summary
The semantic continuum runs from parts of terms through generic terms to highly specific terms. It is helpful to be able to tell which terms are as generic as it is possible to be, and this can be done with the aid of a study of the holotheme or pattern of meaning. This is the set of all the terms we may ever be called on to consider, from which, as a general rule, we select the contents of our data fields. Amongst the ways in which the concepts which appear in the holotheme may be arranged is one in which they are divided into terms and relations. Each of these may be passive or active, and entitive or attributive. Thus a quality is a passive attributive term. The main types of term made clear in this way may be named things, qualities, occurrences and changes. Terms occupy a range of integrative levels, degrees of structural complexity, of which two are found in the abstract realm of logic and mathematics, two in that of physics and chemistry, two in that of the life sciences, and two in that of the social sciences. Terms of lower levels build up into terms of higher levels by a series of well-defined formative stages, which appear in a repetitive pattern. The main stages are those of unit, assembly, system and combine, a sequence which is repeated twice in each level. In this pattern, objects and substances, events and processes, collective concepts of all types, and many other special varieties of term all find a place. The study of the holotheme is the study of these, and of how they may be expected to behave. Its application to data study is immediate, since the way a term is represented on data vehicles must take account of its type if the index concerned is to operate at its full efficiency.
Embodiment of the state of a data field
It is time to turn to the way in which the relations between terms admitted to the data field are shown. A typical method is to punch a hole in a card, representing a present data unit. Tabs, coloured signals, handwriting, printing and typing, and many other methods may also be employed. An important difference between types of data handling equipment is that between methods which represent the items or features of the field by means of separate data vehicles, and methods which record them on connected vehicles, such as a single length of microfilm, magnetic tape, or punched paper tape. The difference affects the way in which searching and comparison are carried out. Connected vehicles cannot be shuffled; they can only be sequentially scanned.
Separate and connected vehicles
As an example of the effect of this difference, we may consider a feature question asked of a connected feature vehicle. On this the features appear in order (it might be a length of magnetic film or tape) and each length representing a feature bears a pattern of bits which tell us the items possessing that characteristic. Various ways of tackling the problem may be found, but clearly they must all of them consist of running along the vehicle until all the features in the question have been encountered, so that some means of comparing them may then be employed. Suppose, for instance, we travel to the first feature, record its items on some suitable device, and carry the resulting pattern forward to the next feature. The next feature is then used to modify the pattern, taking away from it those items which do not appear against this second feature also. The modified pattern then carries on till the third feature is encountered. In a case such as this, the list of items which suit our needs does not become available until the last feature, usually well along the
vehicle, has been studied. We begin with an unknown, the pattern of items to be placed on the search device. By contrast, if the vehicle were arranged item by item, we should be able to load the search device with a known pattern, the features required, which would have the additional advantage of being smaller, possibly much smaller, than the pattern of items required in a feature system. Further, information would begin to come off the system almost immediately, in fact as soon as the first item bearing the requisite pattern of features was found. In essence, the problem would be no different from that of a serial search through separate item vehicles. It is, perhaps, not surprising that when computers have been loaded with information in connected feature order the results have been disappointing, and a reversion to item order has led to an improvement in the speed and ease of working.
Random access
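Before leaving the serial arrangements just discussed, the difference between searching a connected vehicle in feature order and in item order can be sketched in Python. This is a simplified model and not a description of any actual equipment: the vehicle is taken to be a plain sequence of records read from front to back, and the names search_feature_order, search_item_order, FEATURE_TAPE and ITEM_TAPE, together with the sample data, are invented for the illustration.

```python
def search_feature_order(vehicle, wanted):
    """Scan records of (feature, items) once, from front to back.

    The pattern of items starts unknown and is whittled down at each wanted
    feature; no answer exists until the last wanted feature has been read.
    """
    remaining = set(wanted)
    pattern = None
    for feature, items in vehicle:
        if feature not in remaining:
            continue
        pattern = set(items) if pattern is None else pattern & set(items)
        remaining.discard(feature)
        if not remaining:
            break
    if remaining or pattern is None:
        return set()          # some wanted feature never appeared
    return pattern

def search_item_order(vehicle, wanted):
    """Scan records of (item, features); yield each match as soon as it is met."""
    wanted = set(wanted)      # the known pattern loaded on the search device
    for item, features in vehicle:
        if wanted <= set(features):
            yield item

FEATURE_TAPE = [("red", {1, 2, 3}), ("round", {2, 3, 4}), ("small", {3, 5})]
ITEM_TAPE = [(1, {"red"}), (2, {"red", "round"}), (3, {"red", "round", "small"}),
             (4, {"round"}), (5, {"small"})]

print(search_feature_order(FEATURE_TAPE, {"red", "round"}))
print(list(search_item_order(ITEM_TAPE, {"red", "round"})))
```

The second function is written as a generator so that the first matching item is delivered before the rest of the vehicle has been read, which is the advantage claimed above for item order.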
Information is by no means always stored in computers in this connected or serial form. Random access devices, bringing any portion of a large collection of information to hand as easily as any other portion, are widely used. This corresponds to the existence of separate vehicles in manual systems. We can pull any card out of a pack in much the same time as we can pull any other, providing we know its name and it is kept in the right place in accordance with that name. Separate cards permit random access, as a general rule, and to such devices the rest of this chapter turns its attention.
Cards
Millions of cards are made yearly out of pasteboard and other card stock, often of material specially formulated for the purpose, to give good qualities of dimensional stability, resistance to scuffing, and ink reception, to choose a few out of many. The name arose from the material (khartes meant papyrus leaf in ancient Greek),
but it does not seem unusual to talk about plastic or even metal cards. Here we shall use the term in its widest meaning, to refer to any thin but not floppy sheet of material which can be punched or marked in other ways in order to carry information, and we shall think of such a sheet as rectangular and not too large for ease of handling, although neither of these requirements is absolute. We shall distinguish cards by the type of meaning they represent, and by whether they are punched or plain, this last term meaning that any method of marking them, other than holes or notches, is in use. One exception will be made to this: opaque material bearing transparent areas in place of holes will count as punched. This may not be exactly fair, but it is handy.
Edge-notched cards
Edge-notched cards are specially designed to represent items. Round their edges they carry one or more rows of holes, each representing a feature or subfeature, and these are converted into notches by means of a clipper if the item concerned possesses the feature represented. To find all items possessing a given feature, the user passes a needle through the entire pack, or, if it is too large for this, through several hundred cards taken from the pack, and then through further sub-packs as necessary. The needle is inserted in the position occupied by the appropriate feature, and is then lifted. As a result, cards in which the hole has been converted into a notch drop from it, while the remainder stay in place. Taking the cards which fall corresponds to choosing the items possessing the present feature; taking those which stay on the needle corresponds to choosing those with the absent feature, which may be viewed as choosing a 'negative' feature. If two or more features are represented by holes along the same edge of the card, two or more needles may be used, and only the cards bearing notches in all positions interrogated will fall. Thus, simultaneous selection of items, from collections of several hundred at a time, is possible. Other advantages of the method are
Figure 33. The method of searching through a small pack of edge-notched item cards in order to find those representing items possessing a particular set of features. Those possessing the required features are shown dropping from the needle.
its extreme simplicity, cheapness and portability, and the fact that drawings, diagrams, and other artwork, and extensive written material as well, can, if required, be put in the centre of the cards. Immediately the search is completed, the items concerned, or cards giving details of them, are to hand. On the other hand, the method has certain disadvantages. Simultaneous interrogation of more than one feature is not possible if these are sited on different edges of the card. Once the cards have fallen off the needle, they may well become out of order, and must be replaced in position by hand, unless order does not matter. The replacing may consist of physically parting the remainder of the pack at the right point and
Figure 34. Below An example of a slotted card lying on its side. The presence of a feature is shown by the existence of a slot in the appropriate position. Figure 35. Right A pack of slotted cards in their cradle after selection. The cards representing items possessing the features of the search are shown upstanding.
replacing the appropriate fallen card. It is also possible, however, to sort the cards back into order by a series of needlings-and-fallings, provided that the names or reference numbers of the items the cards represent have been notched into them for the purpose. The maximum number of items which can be tackled at a time is in the region of seven to eight hundred. One corner of cards of this type is usually cut, to ensure that the cards in the pack are all the right way round before a search begins. A glance at the appropriate corner is enough to tell their user whether they are correctly orientated or not. If more than one row of holes is placed round the edges of the cards, both deep and shallow notching is possible, giving a greater capacity in features. The deep notches obliterate any shallow notches, and the coding must take this into account. Often the cards are interpreted in advance, the meanings of the holes being printed against them at an angle, usually a right angle to the edge in question.
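The needling procedure, and the re-sorting by successive needlings, can be sketched in a few lines of Python. This is an illustration only, not a description of any particular make of card: the hole positions (`RED`, `METAL`, `NUMBER_BITS`), the three-hole binary reference number, and the rule that a notch stands for 1 are all assumptions made for the example.

```python
from typing import List, Set, Tuple

# A card is modelled as (item name, set of hole positions clipped into notches).
Card = Tuple[str, Set[int]]

RED, METAL = 10, 11        # hypothetical feature positions along one edge
NUMBER_BITS = [0, 1, 2]    # hypothetical positions for a 3-bit reference number


def make_card(number: int, features: Set[int]) -> Card:
    """Notch the binary reference number (notch = 1) and the features."""
    bits = {pos for i, pos in enumerate(NUMBER_BITS) if (number >> i) & 1}
    return (f"item-{number}", bits | features)


def needle(pack: List[Card], positions: Set[int]) -> Tuple[List[Card], List[Card]]:
    """Pass needles through `positions` and lift.

    Returns (fallen, retained). A card falls only if it is notched at every
    needled position; both lists keep the cards' relative order.
    """
    fallen = [card for card in pack if positions <= card[1]]
    retained = [card for card in pack if not positions <= card[1]]
    return fallen, retained


def needle_sort(pack: List[Card], number_positions: List[int]) -> List[Card]:
    """Restore numerical order by needling each number position in turn,
    least significant first, putting the fallen cards behind the rest."""
    for position in number_positions:
        fallen, retained = needle(pack, {position})
        pack = retained + fallen
    return pack


pack = [
    make_card(5, {RED}),
    make_card(2, {RED, METAL}),
    make_card(7, {METAL}),
    make_card(1, {RED, METAL}),
    make_card(4, set()),
]

# Two needles on the same edge: only cards notched in both positions fall.
fallen, retained = needle(pack, {RED, METAL})
selected = [name for name, _ in fallen]

# The pack is now out of order; three needlings put it back.
restored = [name for name, _ in needle_sort(retained + fallen, NUMBER_BITS)]
print(selected)
print(restored)
```

Each needling keeps the relative order of the cards within the fallen and the retained groups, which is why one pass per hole of the reference number is enough to restore the order.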
Slotted cards
Slotted cards, again, are used as item cards. They are frequently of a stiffer board than most other cards. They bear a rectangular pattern of holes which takes up most of their central space, leaving an area at the top in which details of the item each represents may be written. Such cards are usually manipulated in a cradle in which, if there are not too many, they may also be stored; otherwise they are kept in trays. Instead of them being notched, a slot is made between two of the holes, connecting these, to signify that a feature is possessed by the item concerned. The slots all run in the same direction, almost always vertically, and may at times run into each other to form quite a long channel in the card. The name of each feature is often printed between the two holes whose connection signifies its possession. To carry out a search, the cards are placed in the cradle and held in place by rods which pass through its front plate, through the
pack, and through its back plate. There are normally two such rods, one at each side, and they pass through slots, rather than holes, so that they do not interfere with vertical card movement during selection. When the cards are thus secure, the needles for the selection are passed through them so that each goes through the upper of one of the pairs of holes which correspond to the features of the search. A copy of the card is usually placed on the front plate, to make it easy to find the correct position for the needles. The cradle, with its cards, is now inverted. Were it not for the needles, the cards would drop out. As it is, they fall only a little way - the length of a slot - and only those cards which are slotted in every position in which there is a needle do this. The operation is completed by slipping yet another needle into place to prevent the cards from returning when the cradle is brought right way up again. The return of the cradle to its original position then leaves the required cards upstanding from amongst the rest. By this means, cards cannot fall fully out of the pack and so become out of order, and because all the slots run in one direction it is possible to select the cards by means of any of the indexed features. There is no problem caused by features being represented on different edges. Against these advantages over edge-notched cards we may set the more complicated equipment required, although fairly complicated equipment may be helpful also for handling edge-notched cards, notably electrically-powered vibration boxes to make the cards fall more easily.
Tabulating machine cards
If numbers used are any guide, then tabulating machine cards, tab cards, 80-column cards (they have many names) are more important than any other type of card used for information handling. Not all tab cards have 80 columns in which punching may be done. There are half-size, 40-column, cards, and also others with rather larger numbers of columns than eighty. The 80-column variety, however, is by far the most widespread. It is used for accountancy
purposes, for many large-scale sorting, counting and tabulating jobs, and as a means of feeding information into computers. The standard 80-column card begins life as a plain, that is, unpunched, rectangle, usually corner-cut and printed with an array of numbers. To prepare it for use, it is punched, normally by a key-punch, but other methods are available. For example, it is possible to obtain pre-scored cards, in which the positions which may be occupied by holes are already cut almost through, so that a simple push with a blunt rod will remove the chads and produce the holes. Cards of this type are almost always item cards, but they have been adapted to act as feature cards with a rather small field, usually 960 positions or less, corresponding to the 80 columns and to twelve rows or positions in each column. These twelve rows are numbered, as are the columns, and may be used to signify numerals from 0 to 9, with the remaining two rows acting for ten and eleven in cases in which this is helpful, as with inches, pence and hours - a facility less useful when the metric system is employed. Letters of the alphabet are usually digraph-coded, two holes per letter, one letter per column. In terms of the semantic continuum, the letters, numerals and other characters are shown by means of a binary subcode which occupies the columns; sets of columns, often known as 'card fields', may then represent the codewords of graphic subcodes, or the characters themselves may be direct codenames for terms, so that, in effect, the terms are binary-subcoded, one per column. Direct coding, one term per hole-position, is also possible. The equipment available for handling such cards includes punches of many kinds, verifiers, interpreters, sorters, tabulators, collators, reproducers, calculators, computer interfaces and printers, and there are also devices for photocopying information written upon the cards, and for inserting microfilm into apertures made in them. 
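The column coding just described (one hole for a numeral, two for a letter) can be illustrated with a short Python sketch. It assumes the customary commercial punching of the period, in which the letters A to I combine a zone hole with digit rows 1 to 9, J to R use a second zone hole, and S to Z use row 0 as zone with digit rows 2 to 9. The row labels 12 and 11 for the two extra rows are the usual trade names and are an assumption here, as are the function names.

```python
from typing import Dict, FrozenSet, List


def build_column_code() -> Dict[str, FrozenSet[int]]:
    """Map each numeral and capital letter to the rows punched in one column."""
    # Numerals are monographs: a single hole in the row of the same number.
    code: Dict[str, FrozenSet[int]] = {str(d): frozenset({d}) for d in range(10)}
    # Letters are digraphs: one zone hole plus one digit hole.
    zones = [(12, "ABCDEFGHI", 1), (11, "JKLMNOPQR", 1), (0, "STUVWXYZ", 2)]
    for zone_row, letters, first_digit_row in zones:
        for offset, letter in enumerate(letters):
            code[letter] = frozenset({zone_row, first_digit_row + offset})
    return code


COLUMN_CODE = build_column_code()
BLANK: FrozenSet[int] = frozenset()


def punch(text: str, columns: int = 80) -> List[FrozenSet[int]]:
    """Return one set of punched rows per column; unused columns stay blank."""
    if len(text) > columns:
        raise ValueError("text is longer than the card")
    card = [BLANK if ch == " " else COLUMN_CODE[ch] for ch in text.upper()]
    return card + [BLANK] * (columns - len(card))


def read(card: List[FrozenSet[int]]) -> str:
    """Interpret a card: translate each column's holes back to a character."""
    decode = {holes: ch for ch, holes in COLUMN_CODE.items()}
    decode[BLANK] = " "
    return "".join(decode[holes] for holes in card).rstrip()


card = punch("RED 195")
print(sorted(COLUMN_CODE["R"]))  # the two rows punched for the letter R
print(read(card))                # the card read back as text
print(80 * 12)                   # hole positions on the card
```

With 80 columns of twelve rows, the card offers 960 hole positions in all, which is the field size quoted above for its use as a feature card.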
They are indeed extremely versatile, although, as an item card means of answering feature questions, they call for serial searching none the less. Eighty-column cards, like other cards, may be interpreted in
Figure 36. An 80-column card (shown enlarged) punched with the alphabet from A to Z and the numerals from 0 to 9. The first is a digraph and the second a monograph, and both are binary.
advance, at least inasmuch as the names of the arrays of features represented on them may be printed at the heads of the appropriate card fields. The interpreters mentioned above, however, are not for this purpose, but to show which specific feature in a given array is the one possessed by the item concerned. Thus, if columns 23, 24 and 25 form a card field entitled 'coloured', and if 195 means 'red', then these columns will bear a 1, a 9 and a 5 respectively in the case of any red items, and an interpreter, if used, will print '195' at their collective head. Verification consists of passing a card through a device known as a verifier, and depressing keys which sense the holes already punched. The machine gives a warning when the user of the verifier strikes a key to which no hole corresponds, or omits to strike in the cases in which a hole is available to be sensed. When the warning is given, either the verification puncher or the original operative has made a mistake. Recourse may then be had to the original data, so as to choose between them. Since the chance of two people making the same mistake is low, verification is a valuable safeguard against error. In passing, we may note that the verification techniques
Figure 37. An 80-column card sorting machine. The cards are fed into the machine from the hopper, long-edge on, and fall into receptacles beneath the sensing heads, according to the holes appearing in them. At the end is a 'reject' hopper for unselected cards. It is possible to count the number of cards falling into each hopper, and thus to obtain a total.
available for use with punched feature cards are very different, although this one may also be used. In the case of the punched feature card, a 'self-verifying' quality is apparent within the arrays of features. This arises because any stack of such cards representing mutually exclusive features ought to show no holes. A rapid check on accuracy is therefore afforded by stacking instead of by repeating a punching operation. As we would expect, cards of this type are of especial value when item questions are asked or item-operations are performed. An example of the latter is the totalling of monetary values, for example the values of sales, attributed to specific items, for example purchasers, followed by the preparation of invoices or other documents directed to the item concerned. Accounts and wages departments are especially familiar with 80-column cards.
Coded microfilm cards
Recalling that we decided to treat cards bearing transparent places instead of holes as if they were punched, we may turn to these, miniature cards made of photographic film. The written or printed information upon them is photographically reduced. They may bear small replicas of many pages of a document, for example, or of a number of engineering drawings. Such pieces of microfilm can, as we have already seen, be inserted in other cards, and sometimes under the name of microfiches they may be treated as item cards into which no search mechanism is built. The card of which we speak here carries such a mechanism, however: a pattern of black (opaque) and transparent squares or rectangles, which represents their features. Sensing devices for such coded cards use light, which passes through the transparent places. Besides a serial search device, such an installation calls for a reader (an enlarger) and perhaps for a copier, to enable prints to be taken from the enlarged image. These cards share with slotted and edge-notched cards the quality of being specially designed for search and selection.
Here they contrast with 80-column cards, whose uses are wider.
Figure 38. Edge-tracked cards. Here the track is clearly shown, and the punching (like that of punched tape) is visible on either side of this track.
Edge-tracked cards
As a last example of punched item cards we may take edge-tracked cards, which are designed for use with an automatic typewriter. These bear a line of sprocket-holes, acting as guide holes, punched parallel to one, and perhaps to both, of their longer edges, permitting a punch of the sort which also produces punched paper tape to work upon them. The cards are fed into such a punch, and holes representing characters are punched into them in 'channels' parallel to the sprocket holes. A five-channel card has three channels on one side of the sprocket holes and two on the other, and the pattern of holes which represents a character appears across the five. The punch is usually operated by a typewriter keyboard and, in reverse, the cards can operate a typewriter fitted with a reading head. Usually both punch and reading head are fitted to the same machine. Instructions as to which portions of the punching are to be printed out can be fed into the machine by means of punched tape or by instructions incorporated in the punching on the cards. In contrast to edge-notched and slotted cards, cards of this type are not designed for searching or selection. They carry item information, which can be reproduced in various ways, and the punching permits instructions to be given. Microfilm can be inserted in them. They can form a very effective item side to a system whose feature questions are answered by means of punched feature cards.
Figure 39. A clipper for edge-notched cards. The picture shows an edge-notched card being prepared for use. The clipper converts holes into notches, thus indicating the presence of the features or subfeatures allotted to the positions of the holes concerned.
Other punched item cards
Cards which are essentially punched item cards occur in many another context. For instance, those which are punched in order to instruct a Jacquard loom in its duties, producing a pattern in the cloth it weaves, are item cards. Even a punched tram or bus ticket may be thought of as such. This takes us well away from the kernel of our subject, however, and it may be best to return to it and to consider item cards of the plain variety.
Figure 40. A keypunch for 80-column cards. The keys can be seen on top of the punch, and below them is the carriage on which the card is fed beneath them. The card is punched column by column, being fed under the keys short-edge on.
Blind card indexes
Perhaps the simplest plain item cards are of the blind-filed type, which are rectangular and feint-ruled or blank, kept with many others in a drawer of a card-cabinet or in a box. Subject, author and similar indexes in public libraries make extensive use of this type of card, and the number of research students and others who have entered their notes on such data vehicles must be very large. Dividers of a stiffer material, with projecting tabs, blank or bearing
letters or numerals, are available, to separate the various sections of a set of such cards. Cards cut to the continental 'A' sizes can be obtained; other standard sizes are 5 inches by 3 inches, 6 inches by 4 inches, and 8 inches by 5 inches. Such cards normally stand free in their boxes or cabinets, although fixed cards are also found. These are cut, at the foot or along one edge, with a dovetail, tee-notch, or other anchorage, by which they may be attached to a rod or other holding device. Typically, in a cabinet drawer, such a rod runs from the front to the back along the bottom of the drawer. In these circumstances, cards cannot be removed from their position accidentally. Cardwheels may also be obtained, acting instead of drawers; their axes may be horizontal or vertical, and in either case the required cards are brought into view by a rotation of the wheel. Since the hub of a wheel has a smaller circumference than does the perimeter, space is automatically available for the cards to fall apart into a vee, making them easy to read even though they are fixed in place. The normal blind-filed card can be so closely packed between its neighbours that it is extremely hard to consult. The idea of bringing a card from its storage position to a position close to its user can be applied on a much larger scale than that of the card wheel. Devices moving many drawers of cards by means of electric motors can be bought. These handle very large numbers of blind-filed cards without calling for the operator to leave his or her workplace.
Flat-tray visible cards
A number of devices have been developed to make it possible to read the title and some other information carried on a card without removing that card from its storage position. Mostly, these depend on overlapping the cards, so that the details in question may be read from the visible edge. In the case of flat-tray visible cards, the edge is at the foot of the cards. These are kept in shallow drawers, in a cabinet of such, and
Figure 41. A punched feature card drill. In this device, the cards are stacked and the drill is then set accurately in place in two dimensions. Drills of this type can be instructed automatically to move from position to position. They are at their most efficient when large numbers of identical feature cards are required.
each drawer can be pulled out and hinged downwards to rest on a baseplate, which is drawn forward as required. It lies at an angle suitable for reading and for entering further information on the cards. In each shallow tray, each card, except for the top one, projects a little below its higher neighbour. On this projecting edge, names may be typed or written, and coloured and shaped signals may be placed, representing various features of the item represented by the card. Combinations of shape, colour and position of signals are available, and a search for all items of a given sort consists of a
Figure 42. Vertical visible cards in a tub. Both item and feature cards may be stored in this way.
Figure 43. A flat-tray visible card cabinet. Each drawer or tray contains many overlapping cards, on whose visible edges features are recorded by means of coloured tabs. Searching is serial down the sets of cards. The cards may also be kept in classified order.
serial, visual, run down the column of overlapped cards to find the appropriate collection of tabs. Provision is made for the cards which overlie any card required to be lifted up, to allow the retrieved card to be inspected. In equipment of this type, the cards are fixed in their positions in the trays. The cards are often stored according to a simple hierarchy. For example they may be used in a set of personnel records, arranged in name order within occupations, within departments. Block searching is thus made easier.
Vertical visible cards
Another form of visible-edge card has its visibility down the side. It is known as 'vertical visible' because it stands, almost upright, leaning against a movable flap in company with its peers. Such cards overlap each other from left to right or from right to left, and together with the flaps or dividers against which they lean, stand in a well, tub, or tray which has rods running across its floor from front to back. Tabs cut along the foot of the cards mesh with these rods, and hold the cards in place, but otherwise the cards stand freely and may be lifted out very easily. If one of the flaps is tilted forward, another flap, supporting another row of cards, is revealed behind it. The cards are usually corner-cut, and their names are written diagonally on the corners. They do not act as a search mechanism, since it is not easy to see to the bottom of the open vee, to which their visible edges descend. But for other purposes, such as conveying straightforward item information, they are very useful. They may be edge-tracked, thus putting them in the realm of punched cards, they may be given pockets to carry ancillary documents, they may be filed in pairs, and they may be given many other forms of special treatment. This type of storage may be noted in passing as one of the major ways in which punched feature cards may be kept, the other being blind filing, assisted by position tabs. Vertical visible cards carry many types of information, for example, they may act as ledger cards in accountancy, as patient cards in medical record offices, and as customer records for car sales.
Hanging cards
A card may also be given a visible edge at the top, the only place not yet considered. One method of doing this is to hang the card from a pair of rails, much as folders may be hung in a filing cabinet. The top of the card, or the device which connects it to the rails and to which it is fastened, may be bent at an angle or even to a completely horizontal position, and may bear appropriate names and feature signals. The technique is very similar to that of the hanging folder which has a visible strip bearing such names and signals and which itself can be thought of as an item vehicle.
Other plain item cards
Amongst the many other plain item cards which may be used for indexing and information handling we may note mark-sensed cards, which may be the same size and shape as 80-column cards and may be fed through machines which detect pencil or other marks upon their surfaces. Mark-sensing can, of course, be combined with punching. A special type of mark-sensing, character recognition, is carried out when the figures printed on cheques by means of magnetic ink are read by appropriate machinery. Here, in contrast to our more general use of the word 'mark', the idea of sensing a hole, as a form of mark, is excluded. Magnetic inks and similar substances may be used on plain item cards in other ways. It is possible to obtain cards, for accountancy and other purposes, on which a magnetic strip is set, and this may be read in much the same way that magnetic tape is read. Such magnetic cards carry one or more channels or tracks, and may also provide room for typed or written interpretation of the information recorded on these, and for other item material. Microfiches, already mentioned in passing, have their place here: each is a card, on photographic film stock, which may bear the images of very many pages of a book or other document. Many other vehicles are plain item cards. As an example we may
Figure 44. Hanging cards. Here the cards (plain item cards) are suspended from a frame. They may be provided with many forms of signal to show the features of the items concerned.
take addressing-machine plates or stencils, which may be tabbed so that a machine may select which addresses to print according to the features possessed by the owner of the address concerned. If we turn our attention to sheets of paper, we find plain item vehicles in profusion - invoices, works orders, petty cash vouchers; the game of recognising them can be played for days. One which deserves mention is the type which may be bound into the pages of a book, so that opening the pages displays sets of small sheets, overlapping each other vertically, like flat-tray visible cards in book form.
Figure 45. Top right A desk punch for feature cards. This simple device consists of a sleeve which comes down when the lever is depressed, and grips the card or stack of cards to be punched. If the operative sees the cards to be in the correct position, further pressure is put on the lever, and the striker then comes down through the sleeve. On retraction, the sleeve remains down, stripping the card from the striker. Figure 46. Bottom right A small office preparing punched feature cards.
Punched feature cards
We have already mentioned that many types of punched item card may be used as punched feature cards, but on the whole this is not satisfactory, since the cards have been designed for use as the representatives of items. A feature card is designed for the transverse purpose. If it is to be punched, it bears upon its face a network of numbered positions, normally clearly printed, but sometimes invisible, revealed only when a network printed on a transparency is placed upon it. By and large, the first of these types is best and saves an operation when the user seeks to find the reference numbers of any set of holes in which he or she is interested. It calls, however, for printing of great exactitude. Punched feature cards include cards printed in black on transparent sheets with the transparent portions acting as the holes. Two forms of reading punched feature cards by eye are available. When the item capacity of the cards is large, the holes are usually small, and the stacks of cards may be placed on a light box or viewing frame. When this is done, points of light show the positions of the through holes, and dim glows indicate the items which possess some, but not all, of the features in the stack. With cards of smaller capacities, the holes are larger and may be read by being placed against a dark background. In this case, they are read by reflected, as opposed to transmitted, light, and holes which pass through one, but not a second, card may be shown by inserting a coloured transparency between the two, with the second card below the colour. The items possessing the top but not the lower feature then appear as coloured discs. Other changes of colour can be obtained by using more than one tinted transparency, providing the answers to more complicated questions. Feature cards may be punched, or the holes in them may be produced by means of a drill. 
When a drill is used, it is more usual to stack all the cards which describe an item, and to drill through the entire stack in that item's position. When a punch is employed, there is a tendency to accrete the information about which holes should be
placed in each card, and then to punch each card separately and fully. In the first case, there is considerable stacking time; one card may be put away and taken out again to be placed in the next stack many times, but there is less expenditure of time in producing the holes, since many cards can be drilled at once. In the second case, stack and put-away time is reduced greatly, but each hole is separately made. The advantages of the punch are greatest when a system is compact, or high-density (many items per feature); they become less as the system becomes more diffuse, or low-density (fewer items per feature). Punches and drills are available at all levels of complexity from that of the simplest hand-held device to that of the electrically-powered machine with two-dimensional movement, click stops, illuminated reference to position punched, and so on. By and large, only two types of storage are used for punched feature cards: vertical visible, and blind filing. The blind-filed cards, particularly those supplied with the most complex equipment, are provided with coloured and numbered tabs in many positions, so that they may be put away quickly after drilling in any other, and may be found again as rapidly as possible. Vertical visible storage depends for its efficiency on maintaining a proper storage order, and is more frequently used with systems in which the card is punched over a large portion of its surface at one sitting. In this latter case, the punching instructions may be taken from a transfer sheet, which is, in effect, a portion of the data field itself, with the items along one edge, usually the side, and the features along the other. Ticks or characters where the items and features intersect show the data units to be recorded. Capacities of feature cards run from as low as 200 items to as high as 10,000 per card, and higher. Card-reading machines exist, which can identify or count the holes in such cards, or both. 
These can then perform calculations on the numbers they obtain, or feed the patterns of holes encountered into a computer, or take other action which puts feature cards into the mainstream of electronic data processing, to which they are the newest addition.
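The reading of stacked feature cards lends itself to a small illustration in Python, in which each card is simply the set of item positions punched in it. The sketch below is offered under stated assumptions: the transfer sheet, the feature names and the function names are invented for the example, and `self_verify` checks every pair of cards in an array, which is somewhat stricter than stacking the whole array once.

```python
from typing import Dict, Iterable, List, Set

FeatureCard = Set[int]  # the item positions punched in one feature card


def punch_from_transfer_sheet(sheet: Dict[int, Iterable[str]]) -> Dict[str, FeatureCard]:
    """Turn a transfer sheet (item -> features ticked) into feature cards."""
    cards: Dict[str, FeatureCard] = {}
    for item, features in sheet.items():
        for feature in features:
            cards.setdefault(feature, set()).add(item)
    return cards


def through_holes(stack: List[FeatureCard]) -> Set[int]:
    """Positions where light passes through every card in the stack."""
    if not stack:
        return set()
    return set.intersection(*stack)


def coloured_discs(top: FeatureCard, lower: FeatureCard) -> Set[int]:
    """Items having the top feature but not the lower one: holes in the top
    card that show the tinted sheet because the lower card is unpunched."""
    return top - lower


def self_verify(exclusive_array: Dict[str, FeatureCard]) -> Set[int]:
    """Positions punched in more than one card of a mutually exclusive
    array. An empty result means the array is consistent."""
    seen: Set[int] = set()
    clashes: Set[int] = set()
    for holes in exclusive_array.values():
        clashes |= seen & holes
        seen |= holes
    return clashes


# A hypothetical transfer sheet: items down the side, features ticked.
sheet = {
    1: ["red", "metal"],
    2: ["blue", "metal"],
    3: ["red", "plastic"],
    4: ["red", "metal"],
}
cards = punch_from_transfer_sheet(sheet)

red_and_metal = through_holes([cards["red"], cards["metal"]])
red_not_metal = coloured_discs(cards["red"], cards["metal"])
colour_clashes = self_verify({"red": cards["red"], "blue": cards["blue"]})
print(sorted(red_and_metal), sorted(red_not_metal), sorted(colour_clashes))
```

Stacking corresponds to set intersection and the coloured transparency to set difference, so a more complicated question is answered by combining the two.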
Plain feature cards
Plain feature cards are uncommon. They generally consist of lists of reference numbers to items possessing the feature they represent, and the comparison of features is not easy if they are employed for this purpose, since the cards must be placed side by side and the eye must continually switch from one to another. The reference numbers are often printed in ten columns, according to the last digit of the numbers rather than the first, this being felt to be an aid to searching. Cards so treated are often known as uniterm cards. Two advantages are claimed for the method. The cards can index large collections in a small space because no room on them is dedicated to items which do not possess the feature they represent (in a punched feature card a blank position shows this, except in any part of the card which is not yet used). No equipment is needed in order to prepare or to employ the cards, beyond the cards themselves and a writing instrument.
Other feature vehicles
Many other feature vehicles exist, often as separate lists: people to be invited to the Christmas Party, people who are members of the Tennis Club, contents of the best bedroom, contents of grandfather's writing desk, list of locomotives spotted at Cambridge Station. As with item vehicles, the game of finding new examples can go on for a very long time.
Strip index
Any vehicle can represent any type of meaning, although most are designed with one or other of the main varieties in view. A form of vehicle which is widespread in use, but which we have not yet considered, is the strip from a strip index. Typically, this may be four, six or eight inches wide, and one-quarter to one-sixth of an inch high, being made of thin wood with a paper surface. Looking
Figure 47. A strip index, showing strips on hanging wall panels. Other forms of strip index are desk-standing.
rather like small roller-blinds or the covers of old-fashioned roll-top desks, these are supplied assembled on a backing sheet from which they can be taken. The set, on its backing, is put into a typewriter, and the relevant information is typed upon each strip. The strips can then be separated from each other and placed into a frame which holds them in place in such a way that they can be rearranged easily, permitting new strips to be inserted at any time, and old ones to be taken away. Sets of such frames, on stands or wall fixings, and usually double-sided, hold lists of current customers, current telephone numbers, current members. They may list items against their reference numbers, features against their code numbers, words of any type against any of their synonyms, and are thus a typical means of translation. Other forms of strip index use paper, cut with ears or other means of fixture to neighbouring strips or to a backing such as a page of a book. Like plain blind index cards and other simple tags and vehicles, they are undifferentiated, appropriate to a wide variety of uses. Hierarchies may be shown by means of them, each stage being shown by an additional indentation, the writing starting a few positions further to the right. They can be of considerable use in the work of arranging things into classes because the names of items or features can be typed on them, and experimentally grouped in many different ways.
Summary
Items and features may be represented by many types of data vehicle, and these may be separate, one per term, or connected. Separate vehicles have the advantage that they can be arranged in any order. They include edge-notched cards, slotted cards, 80-column and other tabulating machine cards, microcards, edge-tracked cards, blind index cards, cards for cardwheels, cards designed for flat-tray visible or vertical visible filing, hanging cards, microfiches, cards carrying magnetic strips, and many others, all of which may be used to represent items. Most can also be used to
represent features, but usually feature cards are specially designed for this purpose. Like item cards, feature cards may be punched or plain. Strips for strip indexes form yet another means of representing meanings, and may often be used for giving cross-references or for translation.
A sectional data vehicle represents some, but not all, of what is recorded about an item or a feature. As an example we may take an index of items represented by 80-column cards, assuming that there is not sufficient room on a single card for all the information to be dealt with. In this case, several cards must be used to represent a single item, and each must record some standard data, even if only a name or a reference number, to show to which item it refers. Each must also possess an individual reference number to show whether it is the first, second, or third (and so on) of the cards representing the item. Such 'continuation vehicles' were mentioned at the start of chapter 6. Situations like this make comparisons between punched item and punched feature card systems difficult. We may have a data field of 200 features and 1,000 items, but this does not mean that our choice lies between 200 punched feature cards and 1,000 punched item cards for selection purposes. Let us suppose that the capacity of the item cards is 120 features, and that we may ask item questions as well as feature questions, so that if we adopt punched feature cards for searches, we shall still need 1,000 plain item cards to tackle the questions about specified items. Then our comparison must be between 1,000 plain item plus 200 punched feature cards, on the one hand, and 2,000 punched item cards on the other, assuming these to be interpreted so that we may read them easily to answer item questions. Things become even more complex when the features must be subcoded on the item cards but may remain directly represented in the feature system, or the transverse. Feature cards may be sectional, also, though the further step mentioned above, subcoding the transverse meanings (the items) upon them, is seldom taken. The point to be made here is that if we run sectional vehicles together, we have not made a connected vehicle.
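The card counts in the comparison above follow from rounding up: 200 features at 120 per card need two sectional cards per item. A brief Python sketch of the arithmetic is given here; the function name and the assumed capacity of 1,000 items per feature card are ours, chosen so that one feature card holds the whole collection.

```python
import math
from typing import Dict


def cards_needed(items: int, features: int, item_card_capacity: int,
                 feature_card_capacity: int = 1000) -> Dict[str, int]:
    """Count the cards in each system, allowing for sectional vehicles."""
    # Sectional item cards: each item needs enough cards to hold every feature.
    sections_per_item = math.ceil(features / item_card_capacity)
    # Sectional feature cards: each feature needs enough cards for every item.
    sections_per_feature = math.ceil(items / feature_card_capacity)
    punched_feature_cards = features * sections_per_feature
    return {
        "punched_item_system": items * sections_per_item,
        # The feature system still needs one plain item card per item
        # to answer item questions.
        "feature_system": items + punched_feature_cards,
    }


totals = cards_needed(items=1000, features=200, item_card_capacity=120)
print(totals)
```

On the figures of the text this gives 2,000 cards for the punched item system against 1,200 for the feature system with its plain item cards.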
We have merely united into a whole the sections which were previously distinct. The essential point about a connected vehicle is that it bears
information about several entirely different items or features, and that the details about the different meanings cannot be rearranged, term by term.
Simple connected vehicles
Thus a transfer sheet, of the sort used in punched feature card applications, in which item information appears column by column and feature details occur line by line, is a connected vehicle. Many standard lists are simple transfer sheets in type, although they do not bear that name. In chapter 2 we noted that a grocer's bill is of this sort. Ledgers and similar registers of members, clients, or purchasers and the like are such. When literature ceased being written on scrolls and became presented in book form (although it kept the old name of 'volume') the connected vehicle which was the scroll became, as it were, two-dimensional. Bits of it were cut up and bound one behind the other. A measure of what might now be called 'random access' was thus achieved. However, the books were not of the loose-leaf variety, so the vehicles were still connected in our current sense of the word.
Coded microfilm
Coded cards of microfilm may be brought together in the form of a length of film, so that to gain access to a particular frame of the film we must run the entire length up to that point, and to carry out a search we must pass the whole of the film under an appropriate sensing head, seeking the required features. Microfilm search devices are arranged so that the film stops its travel when a frame which meets our requirements is encountered. The image can then be inspected and enlarged upon a screen. Provision is often made for a print to be taken of the image concerned, if we wish it; this is generally known as the 'hard copy'. Very clearly, coded microfilm is a connected data vehicle.
Figure 48. Some frames from a reel of coded microfilm.
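The serial search just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reel is modelled as a list of frames in the order they lie on the film, each frame carrying a set of coded features, and the names `find_first_frame` and `reel`, together with the sample frames, are hypothetical.

```python
def find_first_frame(reel, required):
    """Run the reel from its start; stop at the first frame meeting the requirements."""
    required = set(required)
    for position, (label, features) in enumerate(reel):
        # A frame qualifies when it carries every required feature.
        if required <= set(features):
            return position, label
    return None  # the whole reel has passed the sensing head without a match


# Hypothetical reel: (frame label, features coded beside the image).
reel = [
    ("frame A", {"in France", "a ballroom"}),
    ("frame B", {"in Spain", "a swimming pool"}),
    ("frame C", {"in Spain", "a ballroom", "a golf course"}),
]

print(find_first_frame(reel, {"in Spain", "a ballroom"}))  # (2, 'frame C')
```

The loop cannot reach frame C without passing frames A and B, which is the sense in which the film is a connected vehicle.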
Punched paper tape
Punched paper tape was noted, very briefly, in chapter 10, when edge-tracked cards were mentioned. Historically, the punched paper tape came first, and the cards followed, adapting some of the devices already available for handling the tape. Unused tape of this type shows only a track of sprocket holes, not quite centrally disposed, running along its entire length. On either side of these run the channels, where the bits, the binary units of information, are recorded. Five-channel and eight-channel tapes are widely used, and other numbers of channels are also available. There is a great similarity between the layout of punched paper tape and that of magnetic tape. It is a matter of choice where the dividing line between tape and roll occurs, for tape with more than fifty channels exists, being fed to some forms of automatic typewriter, and for the special use of instructing Monotype typesetting machines a tape of 31 channels is used. On the whole, however, we think of punched paper tape as the fairly narrow sort with the central sprocket-holes; the wider sorts have these in two tracks, usually down their edges. Tape, fed into computers, teleprint devices, or automatic typewriters and the like may well carry lengths of connected text. Equally it may carry coded information about items, one item at a time, or about features, one feature at a time, being then punched item tape, or punched feature tape. As a means of instructing computers, punched tape may carry much programming data together with numbers and other material for the calculations to be carried out. It is generally a means of transfer of information rather than a
vehicle which stores it, but it can be used for storage also, and many types of card-holding cabinet have been modified to provide for keeping it in appropriate lengths.
Magnetic tape and film
A typical reel, roll or spool of magnetic tape may contain over 2,000 feet of the material, half an inch wide, formed of a magnetisable substance deposited evenly on a plastic base. It has several, often five or eight, tracks or channels running along its length, and its layout is generally, except for the lack of sprocket holes, that described in respect of punched paper tape. Instead of holes and blanks it carries areas of magnetisation, in two different senses (south-north and north-south), which can be distinguished by an array of pick-up heads. One track (this may be the case with punched tape also) may carry a so-called 'parity bit' as a check on the accuracy with which the information upon the tape has been recorded. Since the operation of parity check is of interest, it may be helpful to describe it. To do so, we may consider a character, as normally recorded on punched paper tape by means of a pattern of holes and blanks across its various channels, or on magnetic tape by a similar pattern of directions of magnetism. Taking 1 to stand for a present hole, or for the magnetic direction which is taken to mean presence, and 0 to stand for absence, we may imagine the sequence 0110110 to represent a character. In this case, the total number of 1's is four, which is even. The mark for track eight, the parity channel, is therefore 0, if even parity is to be maintained. A character bearing, say, 0100000 as its symbol would call for 1 in the parity track. In a check procedure, disparity between the parity bit and the actual total of present marks will show that there is a mistake, which may, of course, be in the parity bit itself. With magnetic tape, and with magnetic film, which is a wider variety of the same, we enter the realm of the computer, for which it is often used as a store of information. It is essentially a connected store. Other forms of memory are also available to computers, and to these we may now turn. One special quality is essential to all of these: the marks on them must be capable of rapid change.
Computer memories
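Before the memories themselves are described, the even-parity rule set out above for tape may be made concrete. What follows is a minimal sketch in Python, not a description of any particular machine: a character is taken to be a string of seven marks, and the function names `even_parity_bit` and `passes_check` are hypothetical.

```python
def even_parity_bit(character):
    """Return the mark for the parity track that keeps the count of 1's even."""
    ones = character.count("1")
    return "0" if ones % 2 == 0 else "1"


def passes_check(character, parity_bit):
    """True when the recorded parity bit agrees with the marks actually present."""
    return even_parity_bit(character) == parity_bit


print(even_parity_bit("0110110"))    # 0  (four 1's, already even)
print(even_parity_bit("0100000"))    # 1  (one 1, so the parity track adds another)
print(passes_check("0110100", "0"))  # False: a mark has been lost somewhere
```

A failed check shows only that the eight marks disagree among themselves; as the text notes, the fault may lie in the parity bit itself.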
The safest remark to make about data vehicles used as computer memories is that there will be new ones tomorrow. However, three may be mentioned here, besides the magnetic tape or film already noted. These three are magnetic disc, drum, and core memories. The shapes of the first two of these are obvious from their names, for both give a two-dimensional access to information - much as a book does in comparison with a scroll. This is done by a combination of linear and rotary motion. The third, the core, is a ring of magnetic material, the two directions of whose magnetism are the two ways round the ring. Such a core is threaded on several wires, which carry the current used for changing its direction of magnetisation as required, and for sensing which direction is in force at a given time. Two-dimensional matrices of these cores can be set up, which can, in turn, be arranged one behind the other to make a three-dimensional block. Such an arrangement makes it possible to find the state of any particular core at any time and is therefore a random access device. The ease with which marks on magnetic vehicles can be changed makes it possible to move blocks of information from position to position on the static vehicle (the memory) instead of moving vehicles from place to place, each bearing its load of static information. When used for its original purpose, computing, the patterns of bits in the computer's memory alter constantly, in response to the various commands to add, subtract, and perform more complex work which can be reduced to these and to similar operations. To use the language of these pages, the state of its data field is constantly changing. The changes are governed by a series of commands collectively known as the programme. Computers of this
type are known as 'digital'. They work by discrete steps, one data unit or bit at a time. We had better talk of bits, because a data unit may imply that each of the terms which make it up is at or above generic point. Another type of computer is known by the name 'analogue', and is much less widely used than the digital computer. Its main stronghold is the physical sciences, where it carries out its function of simulating 'real-life' situations by accepting input in the form of a curve or graph (or the equivalent) and producing output which is also plotted on a graph or may be shown on an oscilloscope. To return to the digital computer, the sort which has a unit-by-unit, bit-by-bit memory, more than one type of data vehicle, memory or information store may be required by such a machine. There may, and there usually must, be a store which can be used in the shortest possible time, a random-access store, a case of 'going straight to the page and line'. Core stores, and even more sophisticated devices, provide this facility. They are, in effect, what we have called separate data vehicles. A larger-capacity 'backing store' which takes longer to enter but carries much more information, accompanies this. Magnetic tape, with its serial access but with countervailing advantages, is often used for this type of work. Information from this larger store can be brought forward to the immediate-access store as needed, worked on, and returned. As in the case of punched cards, the sets of positions occupied by present or absent data units or bits must be given reference numbers. In computer language these are usually known as 'addresses'. The bits are grouped into sets known as words, and each word has its own address. On tape, film, disc and drum the words occupy specified parts of the magnetic tracks. In a core memory they each occupy a set of positions through the stack of planes or matrices in which the cores are placed.
A set of forty such matrices permits the use of forty-bit words. If each plane of this type is a matrix of 64 rows and 64 columns, a core at each intersection, then it has a total of 4,096 words. It is like a data field cut up and shuffled into a stack of 64 subfields, each with 40 units in one extension and 64 in the other.
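The arithmetic of such a core store can be tried out in a short Python sketch. The figures of 64 rows, 64 columns and forty planes come from the text; everything else is an assumption made for illustration, in particular the numbering of addresses row by row and the names `core_position`, `write_word` and `read_word`.

```python
ROWS, COLUMNS, PLANES = 64, 64, 40  # figures taken from the text


def core_position(address):
    """Map a word address to (row, column); row-by-row numbering is an assumption."""
    if not 0 <= address < ROWS * COLUMNS:
        raise ValueError("address outside the store")
    return divmod(address, COLUMNS)


def write_word(stack, address, bits):
    """Set one core in each plane, so that the word runs through the stack."""
    if len(bits) != PLANES:
        raise ValueError("a word needs one bit per plane")
    row, column = core_position(address)
    for plane, bit in zip(stack, bits):
        plane[row][column] = bit


def read_word(stack, address):
    """Collect the bit held at the same row and column of every plane."""
    row, column = core_position(address)
    return [plane[row][column] for plane in stack]


# An empty store: forty planes, each a 64 x 64 matrix of cores set to 0.
stack = [[[0] * COLUMNS for _ in range(ROWS)] for _ in range(PLANES)]

word = [1, 0] * 20  # an arbitrary forty-bit pattern
write_word(stack, 65, word)

print(ROWS * COLUMNS)                # 4096 addressable words
print(core_position(65))             # (1, 1)
print(read_word(stack, 65) == word)  # True
```

Any address can be reached directly from its row and column, with no need to pass over the words before it, which is what makes the store a random-access device.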
Computers and questions
Calculation, the manipulation of quantities, is carried out in a digital computer by a device called an arithmetic unit, which operates on a data field whose words represent numbers in binary notation under the control of its programme. The programme is also a data field. The machine scans it and reacts to the patterns it finds there. These patterns may dictate changes in the programme itself. When a computer is not used for computing, but as a search or selection device, followed by print-out of the information it has found, it generally makes use of the connected data vehicles which form its backing store, such as magnetic tape. It carries out a serial operation on this, just as such an operation may be carried out on cards or other data vehicles. If the information is recorded item-by-item along its length, a feature question is answered in the usual way by means of an item-by-item search. The speed with which this is done is very great, however. The battles which rage on the subject of whether this type of work should be done by a computer seldom call into question the speed of the affair. They are concerned with whether the machine can be made available when it is wanted, with the cost of using it, and with other administrative and financial problems.
Figures 49 and 50. The computer (left) is an IBM System/360 Model 40. This is of the digital type, by far the most widely used. A computer need not appear in the form of one single unit: several different free-standing units may comprise it, linked by cables carrying electrical impulses. The particular set of input, output and processing devices and memory stores which is employed in a given installation is often called the 'configuration'. Below, an analogue computer.
Summary
Connected data vehicles do not necessarily force serial access to information, although this is often the case when they are linear, the form taken by punched paper tape, magnetic tape and coded lengths of microfilm. In other cases, more rapid access is possible,
and devices exist for completely random access, an example being a core store in a computer. In computers in general, the parts of vehicles representing terms cannot be changed about physically. Instead, facilities exist for changing the marks on the vehicles so that these change the terms they represent. The facilities are derived from the qualities of magnetisable data vehicles, which permit remarkably fast interrogation and change.
Relations - passive and active
In chapter 9 we met passive and active relations, which we called, respectively, conditions and operations. In a data field, mutual exclusiveness, overlap, inclusion and identical equality are conditions which may obtain between any two terms, any two sets of data units. Operations include union of sets, intersection of sets, choosing the most, or least, inclusive of a number of sets, and - also an operation - leaving well alone. These, of course, all apply to the set-theoretic level of the holotheme. At higher levels, the conditions consist of all the relations of construction, and the operations of all those of alteration.
Conditions in classed schedules
We recall that a schedule is an arrangement of terms used in the formation of a manifold or co-ordinate index. A classed schedule has these terms arranged in sets, which keep similar terms together. The terms in such a class of terms may well be mutually exclusive, in which case they could be used as an array in a hierarchy. On the other hand they may overlap, or be so chosen that some of them include others. As an example, we show a number of sets of features which can be used to describe hotels:
a mutually exclusive set
hotel is:
    in France
    in Spain
    in the United States
    in Germany
    in Switzerland
an overlapping set
hotel possesses:
    a ballroom
    tennis courts
    a swimming pool
    a bowling green
    a skittle alley
    a golf course
a cumulative set
hotel is:
    more than 50 years old
    more than 100 years old
    more than 200 years old
an equivalent (identical) set
hotel has:
    first class food
    excellent cuisine
    top quality refreshments
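The four conditions illustrated by these sets can be told apart mechanically once each feature is treated as the set of items possessing it. The Python sketch below is offered only as an illustration: the hotel numbers are invented, and the function name `condition` is hypothetical.

```python
def condition(a, b):
    """Name the condition which obtains between two features, taken as sets of items."""
    a, b = set(a), set(b)
    if a == b:
        return "identical"
    if a > b:                      # b is a proper subset of a
        return "includes"
    if a < b:                      # a is a proper subset of b
        return "is included in"
    if a & b:                      # something shared, something left over on each side
        return "overlaps"
    return "mutually exclusive"    # nothing shared at all


# Invented data: each feature is the set of hotel numbers which possess it.
in_france, in_spain = {1, 2}, {3, 4}
ballroom, swimming_pool = {1, 3, 5}, {3, 4}
over_50_years, over_100_years = {1, 2, 3}, {1, 2}
first_class_food, excellent_cuisine = {2, 5}, {2, 5}

print(condition(in_france, in_spain))                   # mutually exclusive
print(condition(ballroom, swimming_pool))               # overlaps
print(condition(over_50_years, over_100_years))         # includes
print(condition(first_class_food, excellent_cuisine))   # identical
```

Applied to every pair of features in a proposed list, a test of this kind shows which of the four types of set is in hand before the schedule is drawn up.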
It is helpful, when preparing lists of features such as these, for
statistical, administrative or other purposes, to recognise the type of relation with which we have to deal. We may consider them in turn.
Disjunction, mutual exclusion
Three important advantages are possessed by a mutually exclusive set. It can be used, if necessary, as part of a hierarchy, or it can be the basis of a helpful verification procedure, or it can permit the safe use of subcoding. All of these have already been mentioned but since the verification method was treated only briefly, in chapter 10, it may be described in greater detail here. The technique is applicable to punched feature cards. To demonstrate it, we may take the mutually exclusive set of features of hotels, their country of location, which we shall also take to be collectively exhaustive, the entire list of countries appearing in the index. Thus every one of the hotels must possess one of these features and one only. If we now stack two unused punched feature cards, one entitled 'in France', and make holes through both at once, in every relevant position, we shall end with a card showing all hotels which possess this feature, and also an exact copy of this card. If we take the copy, and place it on top of a feature card entitled 'in Spain', we shall see the holes representing hotels in France piercing the top card and allowing the lower ('in Spain') card to show through. This can be made easier if the copy, which we shall call a 'check' card, is a different colour from the card below it, or if a coloured transparency is slipped in between the two. These already-punched positions are, obviously, places in which no further hole should be made. If we find from our instructions that we should punch the new stack in a position in which the check card is already punched, then either these instructions or our original punching on the 'in France' card must be mistaken. Let us imagine that all is well, that we punch the stack of check-card-on-'in Spain'-card, and then take the check card and place it on the 'in the United States' card. It now shows both the 'in France' and the 'in Spain' holes, and performs the same
checking function as before. After further similar operations the check card, bearing the set of holes punched in all the other cards, will be placed on the 'in Switzerland' card. Every unpunched position on it is clearly a hotel in Switzerland, as the instructions for punching ought to show. After 'in Switzerland' too has been punched, the check card will have prevented any errors of commission, but errors of omission are still to be considered. These are shown immediately, for they are positions on the check card which have been missed. This example of 'self-verification' is, of course, a special case, used only when the data vehicles are punched feature cards, or very rarely, when punched item cards are mutually exclusive in respect of their features. To make it effective, the operator must be able to see the cards at the moment of drilling or punching, so as to check that the place to be punched is safe. Further, the method of stacking all the cards which describe a given item, and then punching through them all at once, cannot be used with such a check as this. On the other hand, the technique is very well adapted to the use of transfer sheets. Yet, though a special case, the example shows the sort of practical application which can be made of a knowledge of the behaviour of the relations in the data field.
Overlapping
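Before overlap is taken up, the self-verification procedure just described may be made concrete. The Python sketch below is an illustration under stated assumptions: each card is modelled as the set of positions punched in it, the punching instructions are an invented mapping from card title to positions, and the name `punch_with_check` is hypothetical.

```python
def punch_with_check(instructions, all_positions):
    """Punch each feature card in turn beneath an accumulating check card.

    Returns (errors of commission, errors of omission).
    """
    check_card = set()   # holes already punched in any earlier card
    commission = {}      # card title -> positions which were already taken
    for title, positions in instructions.items():
        clash = set(positions) & check_card
        if clash:
            commission[title] = clash
        # The check card now bears the holes of every card punched so far.
        check_card |= set(positions)
    omission = set(all_positions) - check_card   # positions never punched
    return commission, omission


# Invented instructions for six hotels; hotel 2 is wrongly claimed twice,
# and hotels 4 and 6 have been given no country at all.
instructions = {
    "in France": {1, 2},
    "in Spain": {2, 3},
    "in Switzerland": {5},
}

commission, omission = punch_with_check(instructions, range(1, 7))
print(commission)  # {'in Spain': {2}}
print(omission)    # {4, 6}
```

The check depends on the set being both mutually exclusive and collectively exhaustive: the first property is what makes a clash an error, the second what makes an unpunched position one.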
Features in the same array cannot overlap each other, and any set of overlapping features must, therefore, come from different arrays. If there are very many arrays, as we have seen, any hierarchy based on them will have many positions, or collapse of terms, or both. When a schedule is in use, however, this problem is insignificant. The overlapping features, as in the collection of sporting facilities available in the hotels, are simply listed, each represented by a suitable feature vehicle or position upon an item vehicle. We may note that a mutex (mutually exclusive) check procedure, as described above, is not possible in this case, but although the features themselves are not mutually exclusive the total numbers of features possessed by the items are. It is therefore possible to apply a 'mutex check' to the system by handling all the one-of-the-features items first, followed by all the two-of-the-features items, and so on.
Inclusion
The features listed as 'more than 50 years old', 'more than 100 years old' and so on are often known as 'cumulative'. Each of the later features includes all of the earlier ones, a condition often to be found in a dimension, in this case, the time dimension, shown as age. If features from such a cumulative set appear as notches in series round an edge-notched card, then the notching will be continuous up to and including the most specific relevant feature; and then it will cease. If punched feature cards are in use, and are stacked with the most specific feature beneath all the others, and so in order with the most generic on top, the holes will be seen to travel different distances into the stack. A coloured transparency slipped between any two cards from such a set will show as coloured discs all the items possessing a value at or above the more generic, but less than the more specific of the pair. Reverse cumulation is possible. In our example, this would be shown by the use of such features as '50 years old or less', '100 years old or less' and so on. These are clearly the complements of the features actually employed. A set of features which contains both cumulative features and their complements ('decumulative' features) makes it possible, in the case of feature cards, to search for all items within any range whose end-points appear in the set, without using a coloured transparency to show these, and in the case of item vehicles, also, it permits a search to be made on present features only. Cumulation occurs when every feature either includes, or is included in, any other feature chosen from the set. The features in such series form a simple order, such as we met in chapter 4. In that chapter we also met partial order, which is weaker. To show this, we may imagine that our feature list contains the ideas:
in classical style
with Doric columns
with Ionic columns
Here the cumulative effect is modified. Doric and Ionic columns are clearly varieties of classical style, but they themselves do not fit into a single series. In fact they overlap because a building might possess both, and they do not exhaust the possibilities, for a building might have neither. A side-by-side relationship is permitted here. In all cases of inclusion, we may expect to find our indexing instructions well besprinkled with 'punch also' and 'index also under' and similar phrases. The effect is seen, too, when collective terms or terms of a generic type are recorded. If Pekingese, punch also dog, if boy, punch masculine, if peanuts, punch victuals. At times, this is assisted by the type of coding used. Thus if an index of buildings includes the codeword 351, meaning 'hotels - in mediaeval style - with courtyard', then feature cards, or hole positions on item cards, may exist bearing the codewords 3, 35 and 351. Instruction to record 351 is then taken, automatically, as instruction to record also 35 and 3.
Equivalence
In schedules, equivalence often makes its appearance in the unwelcome form of synonymity. In a subject-matter index, for example, with the terms kept in alphabetic order, the existence of one rather rare term may be forgotten. A new word for it may be introduced, perhaps after an unsuccessful search for any word which is already in the list and which means the same as the new arrival. Searching by means of either of the synonyms then becomes incomplete. This is a reason for preferring classed schedules to those which are alphabetical, though, strictly, alphabetic arrangements are not unclassed. They are classed by means of irrelevant qualities, the ways their names are spelt.
In the case of punched cards, equivalence is the relation used when testing the accuracy of copy-punching by stacking the card and its copy. Often, a coloured transparency is placed between these, and coloured discs are sought by viewing the stack from one side (A on B) and then the other (B on A). No discs: equivalence: the punching is correct. In our example of the gastronomic advantages of the hotels, the features are equivalent, being synonymous, and, beyond synonymous, if every Swiss hotel provides first-class food, then 'in Switzerland' is also equivalent to any of these features.
Syllogisms
A traditional form of logical argument is the syllogism. An example is 'all men are mortal, all Greeks are men, therefore all Greeks are mortal'. It is possible to display arguments of this type by using punched feature cards, the holes corresponding to the items which are men, mortals, Greeks, and so on. It is an effective way of showing logical behaviour. In terms of our data field, the syllogism just quoted tells us that, out of a universe of many things, those which have the feature of being men have also that of being mortal. If L signifies the set of men, and K means the set of mortals, we have here a case of $K \supset L$. Further, if M means being a Greek (from the item viewpoint, M is the set of all Greeks), then we have $L \supset M$. The whole syllogism tells us that, from $K \supset L$ and $L \supset M$ we have $K \supset L \supset M$ and may deduce that $K \supset M$. If, in this type of syllogism, we wish to show that our deductions hold even if 'man', 'mortal' and 'Greek' are synonymous, we must use the symbol and the idea of mere inclusion, instead of that of 'containing' or proper inclusion. This widest form of inclusion is given a symbol which has a suggestion of the sign for equality about it. We write $\supseteq$ instead of $\supset$. In the realm of numbers the parallel sign is $\geq$, for 'is greater than or equal to'. With this additional idea, we may write, if $K \supseteq L$ and $L \supseteq M$, then $K \supseteq M$. The relation of overlap may also appear in a syllogism. Overlap is the case when two sets have at least one common member but also have each at least one member which is not a member of the other. We may choose here to symbolise overlap by %. Then if Q stands for being a hat and R stands for being blue, and hats may be of other colours, and other things may be blue, we may write with truth the relationship Q % R: some hats are blue. Such a sentence as 'no elephant can use a slide rule' exemplifies the relation of mutual exclusiveness, which we may symbolise by //. Here, with S for the feature of being an elephant and T for capacity to use a slide rule we have S // T.
A sequence of relations
Relations have been classed for many years according to whether or not they are reflexive, symmetric, and transitive. A reflexive relation is one which a term may bear in respect of itself, and it is unary, in that a single term may bear it on its own. A symmetric relation links two terms in such a way that each bears that relation to the other. A transitive relation affects three terms, and is (in this sense) ternary. It passes on, as it were, through the middle one to the last, being such that if the first bears the relation to the second, and the second bears it to the third, then the first bears it to the third. We can say, loosely, that order matters with transitive relations. These three properties of relations are all concerned with those which we have called passive, with conditions. A similar set of three is concerned with active relations. These, properties of operations, have been studied in the realm of mathematics. Corresponding to reflexiveness, there is the property of 'idempotency'. An active relation is idempotent if, carried out on a thing, it leaves that thing unchanged. Thus intersection is idempotent. If we intersect a set with itself we are left with the set and with nothing new: $A \cap A = A$. Corresponding to symmetry, we have commutation. Again, intersection will suit as an example. To intersect set A with set B is the same as intersecting set B with set A. A third property of active relations is association. The result of $A \cap B$, if intersected with C, is the same as the result of A intersected with the result of $B \cap C$. This
Figure 51. Some relations and their codes. Each relation is coded by four binary digits (1 = yes) under the logical names transversive, transitive, symmetric and reflexive. Entitive relations (data field) are divided into passive (structures) and active (actions); attributive relations (number field) into passive (properties) and active (reckonings). The relations shown include: is absent, is present, is distinct from (//), overlaps, complementation, identity, addition (+), symmetric difference, intersection, includes, is greater than (>), is identical with, is equal to (=), subtraction, equation, is a member of, is a reciprocal member of, pairing, multiplication, division, is divisible by, averaging, and so on. Some of the symbols are here adopted as appropriate to the concepts concerned but are not widespread in the literature of our subject.
is expressed with brackets, thus: $(A \cap B) \cap C = A \cap (B \cap C)$. The order of performing the intersections is immaterial. If an operation is not associative, then the order matters, so non-association corresponds to transitivity. All these properties can be found in the relations which can affect subsets of a single set, subsets of the set of features in a data field, for example, or subsets of the set of items. Other relations obtain between sets, as opposed to within them. Reciprocal membership is an example of this. In the data field, such relations are transversive, and relations within a set may therefore be thought of, with respect to the data field, as intransversive. Further, besides these entitive relations, between sets, we find attributive relations, between numbers. These, too, possess all the properties mentioned, and these, too, may be conditions or operations. If we use a binary tetragraph code to symbolise relations, with transversiveness in the left-hand position, followed by transitivity, symmetry and reflexiveness in that order, every relation to be found in the data field can be allotted a code number. Thus 0010 is the code number for a relation which is intransversive (that is, occurs within a set), intransitive, symmetric, and irreflexive. An example is disjunction. Subsets within a set can be disjoint, so the relation is intransversive. If subset A and subset B are disjoint, and subset B is also different in every member from subset C, it does not necessarily follow that subset A and subset C are disjoint. They might have common elements, so the relation is intransitive. On the other hand, it is symmetric, for if A is totally different from B, B is totally different from A. Finally, it is not reflexive, for nothing can be totally different from itself. Another relation with this code is addition. Unlike disjunction, which is a condition, addition is an operation.
Subsets of a set can be added to each other, if they are disjoint, and it is therefore intransversive - within a set. The operation is associative, the equivalent of being intransitive. Thus $2 + (3 + 4)$ is the same as $(2 + 3) + 4$. It is commutative, the equivalent of symmetric: $2 + 3 = 3 + 2$. Lastly, it is not idempotent (the equivalent of being not reflexive): $2 + 2$ is not the same as 2. So addition, like disjunction, is a relation with code 0010. We may note that, for objects to be added, they must be different from each other, that is, disjoint. The condition for the operation, and the operation itself, have the same relation code. It seems to be a general rule that every code number of this type is possessed by at least four relations: an operation upon numbers or quantity, an operation upon sets or identity, a condition obtaining between numbers, and a condition obtaining between sets. Further, another four relations, also with the same code number, can be found by taking the conditions or operations acting as 'shadows' to those already considered. These four act on the complements of the subsets affected by the first four. We have already seen that intersection and union shadow or complement each other in this way. They are called 'dual' in set theory.
Relations and the holotheme
In chapter 9, we mentioned that data fields were occupants of the lowest level of the holotheme, and within each level we distinguished between subunits, subassemblies, subsystems and subcombines, which lived below the main units, whatever these were, of that level. After subcombines came the main units, followed by assemblies, systems and combines, after which the next level began, with subunits of its own. If we relate this pattern to that of the relations given above, we can take the relations in pairs and label them as in the following table.

transversive  transitive  symmetric  reflexive
     0            0           0        0 1      relations appropriate to subunits (data units)
     0            0           1        0 1      relations concerned with sub-assemblies (subterms)
     0            1           0        0 1      relations concerned with sub-systems (series of subterms)
     0            1           1        0 1      relations concerned with sub-combines (sets of such series)
     1            0           0        0 1      relations appropriate to units of the level (entire terms)
     1            0           1        0 1      relations concerned with assemblies of terms
     1            1           0        0 1      relations concerned with systems of terms
     1            1           1        0 1      relations concerned with combines (subfields)

The two types of relation at each formative stage, reflexive and irreflexive, appear to correspond to the difference between objects and substances, the relations between objects being irreflexive and those concerned with substances being of the reflexive variety. This connection between substances and reflexiveness is of some interest. It seems to be connected with infinity by way of the fact that a set is reflexive if it can be put into a one-one correspondence with a proper subset of itself. Any infinite set is reflexive, but it does not follow that any reflexive operation must be concerned with an infinite set. A substance, as we saw in chapter 9, is conceptually unbounded and so may be thought of as infinite. Certainly if we take copper away from unbounded copper we still have copper. On the other hand, the subsets of a set may well be finite, as those of our data field are, although reflexive operations such as union and intersection may be carried out upon them. Despite this, something of the infinite quality remains. Unite or intersect two subsets and the result is still a subset.
Figure 52. The set of eight semantic types displayed as the corners of a cube. The faces of the cube form three pairs: terms and relations, passive and active concepts, entitive and attributive concepts. (Labels in the figure include: Structures, Changes.)
Relation codes at higher levels
With 1111 as the code for a subfield's typical relations, it may seem that the next step upward, to an entire data field, should produce the code 10000, the data field then becoming a subunit at the next higher level, the one we have called geometric. It is interesting to see what happens if we continue this method of progress. The addition of the new digit gives us a further sixteen relation codes, from 10000 to 11111, and therefore takes us to the top of the geometric level, beyond which the subatomic has been said to begin. At this point a new position must be used, preceding the others in the relation code, and providing 100000 as the code number for the subunits of subatomic level. This new unit provides thirty-two new relation codes, running from 100000 to 111111, and thus covering two integrative levels, the subatomic and the molecular, the range of physics and chemistry. After this, yet another unit must be added, producing a seven-digit code number, 1000000, for the lowest object at the cytomechanic level. Since this provides us with sixty-four new relation code numbers, it takes us upwards through all the higher levels to the top of the national level, and so finishes the sequence as we know it. Long code names and code numbers of this sort are not easy to read. There are advantages in breaking them up, and in this case we may divide the relation code numbers three-four, thus: 000/0000. In this case, the first three digits give integrative level and the remainder give formative stage together with the object-substance distinction. It is interesting to note where the new units - the ones - in the triplets in the code numbers come in. The first appears at the level of geometric space, the next at the level at which mass-energy appears and the last where life arises. These are critical points in the holotheme and the fact that they seem to arise naturally from extending the sequence of relation code numbers is somehow both unexpected and satisfying.
Thus the triplet code for the integrative levels runs:
000: set-theoretic
001: geometric
010: subatomic
011: molecular
100: cytomechanic
101: biomorphic
110: communal
111: national
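The three-four division of the relation code can be sketched in a few lines of Python. This is only an illustration, not part of the original scheme: the function name and the dictionary layout are hypothetical, and the mapping of the four right-hand digits onto formative stages assumes that the eight stages run in plain binary order from subunits to combines, with the last digit marking reflexiveness.

```python
# Integrative levels, indexed by the three-digit triplet read as a binary number.
LEVELS = ["set-theoretic", "geometric", "subatomic", "molecular",
          "cytomechanic", "biomorphic", "communal", "national"]

# Formative stages in ascending order (assumed to follow binary order).
STAGES = ["subunits", "sub-assemblies", "sub-systems", "sub-combines",
          "units", "assemblies", "systems", "combines"]


def decode_relation_code(code: str) -> dict:
    """Split a relation code such as '0010' or '100/0000' into its parts."""
    digits = code.replace("/", "")
    if not digits or len(digits) > 7 or set(digits) - {"0", "1"}:
        raise ValueError("expected up to seven binary digits")
    digits = digits.zfill(7)          # short codes belong to the lowest level
    level_bits, stage_bits = digits[:3], digits[3:]
    return {
        "level": LEVELS[int(level_bits, 2)],
        "stage": STAGES[int(stage_bits[:3], 2)],
        "transversive": stage_bits[0] == "1",
        "transitive": stage_bits[1] == "1",
        "symmetric": stage_bits[2] == "1",
        "reflexive": stage_bits[3] == "1",
    }


if __name__ == "__main__":
    for example in ("0010", "1111", "10000", "100/0000"):
        print(example, decode_relation_code(example))
```

Padding short codes with leading zeros reflects the way the text extends the sequence: a four-digit code is simply a seven-digit code whose level triplet is 000.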
Equipment and structure
The structure of the data fields with which we are concerned is, we have seen, a pattern of relations, and in choosing equipment to represent it we gain advantages if we select devices which mimic the pattern closely. If we are concerned with manipulating the field by changing its state, then methods such as those used in magnetic and similar stores in computers are appropriate. In simpler and earlier calculating machines trains of gearwheels and other devices carried out this function. If we are concerned with searches and other operations on a static field, then devices which are not adapted to change may be preferable. Thus we may conclude our brief study of the state of a data field by noting that this is what is embodied in the hardware. The network of terms is related to the embodiment and determines the pattern of presence and absence on the vehicles employed. Wherever we have an ambiterm there we have a choice. We may treat either part of it as the present part, with the result that the same net may appear in very different disguises. Let us suppose we wish to reduce the number of blobs, and to increase the number of rings in our original example of a data field by changing the present terms to their absent complements or the reverse, as necessary. We could achieve a remarkable simplification, which would be even greater if we were to allow but one example of each different pattern in any row or column to occur. In figure 53 the rings are omitted to make the effect more clear. This reduces fourteen blobs to four, and twenty-five data units, counting both states of unit, to sixteen. In an index embodied in equipment which could handle absence as easily as presence, and in which the work of entering information consisted of making marks for present units, such an exercise could save considerable effort.
Figure 53. A data field reduced in size and complexity by a course of complementation. (Steps shown in the figure: complement A and B; remove E, as being the same as A; remove 0, as being the same as 2.)
Provided that we recall that A = E and that 0 = 2, we have lost no information. The example raises the question: how many apparently different displays of data units refer to the same situation, differing only by reason of complementations in the net? By a judicious use of complementation, there are many patterns we can make out of our data field, yet there are also many patterns to which we cannot attain. These represent genuinely different nets. The answer, with x standing for the number of ambiterms in one extension of the field and y standing for the number in the other, is:
$2^{(x-1)(y-1)}$
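The count of genuinely different nets can be checked by brute force on small fields. The sketch below is an illustration under a stated assumption: a display is modelled as a grid of 0s and 1s (absent and present units), and complementing an ambiterm is modelled as flipping every unit in one row or one column. The function names are hypothetical.

```python
from itertools import product


def formula_count(x: int, y: int) -> int:
    """Number of fundamentally different fields, 2^((x-1)(y-1))."""
    return 2 ** ((x - 1) * (y - 1))


def count_distinct_nets(rows: int, cols: int) -> int:
    """Count displays that cannot be turned into one another by complementation.

    Assumption: complementing a term flips a whole row or a whole column.
    """
    seen = set()
    classes = 0
    for bits in product((0, 1), repeat=rows * cols):
        if bits in seen:
            continue
        classes += 1
        # Mark every display reachable from this one by any set of flips.
        for row_flips in product((0, 1), repeat=rows):
            for col_flips in product((0, 1), repeat=cols):
                variant = tuple(
                    bits[r * cols + c] ^ row_flips[r] ^ col_flips[c]
                    for r in range(rows)
                    for c in range(cols)
                )
                seen.add(variant)
    return classes


if __name__ == "__main__":
    for x, y in ((2, 2), (2, 3), (3, 3)):
        print(x, y, count_distinct_nets(x, y), formula_count(x, y))
    print(formula_count(5, 5))
```

The enumeration is only practical for small grids; for larger fields the closed formula is the sensible route.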
In the case of our example of a five-by-five field, this amounts to 2^16 = 65,536 fields, each fundamentally different from every other.
Summary
In a classed schedule, a knowledge of the types of relation which hold between features (or, transversely, items) in any collection of such, is helpful. The relations available may be passive or active, that is, conditions of the field or operations possible within it. If we take the data field as a thing, an entity, they are relations of the entitive sort. Such relations are similar to relations between attributes, or quantities. Relations can be placed in order according to whether or not they are reflexive, symmetrical, transitive and transversive, and this order is connected with their position in the holotheme. The order is binary, and we may deduce that new qualities of relations may appear at the integrative levels concerned with space (geometric level), energy (subatomic level) and life (cytomechanic level), thus providing relations with code numbers ranging from 0000000 to 1111111. The hardware of a system embodies the relations to be found in the data field, and methods of information handling are most efficient when the equipment chosen mimics the behaviour of the relevant relations closely.
Primary and secondary handling
Information comes in quanta, like light or energy, and the more complex things we do with it, like differentiation or linear programming, are in the long run based on counting the units, or on identifying the patterns they form. Calculation, interpretation and other activities are valid parts of data processing, and are therefore part of the subject matter of data study. But the core of the subject, as we have seen, is the data field and its behaviour, the relations between normal and transverse terms, and the properties of the terms themselves. So let us distinguish between installations designed to handle numbers or patterns if supplied with them and installations concerned with recording and handling the raw material in which these may be found. In many cases the two types are united. However, we may give examples of the difference. A simple example is that of a punched feature card system used for a social survey. To form statistical tables, the cards are stacked and the holes are counted. If these totals are now given further mathematical treatment, turned into percentages for example, or used as a basis for forming a correlation coefficient or a measure of significance, then we are concerned with a system designed to handle numbers which are supplied to it. The feature cards do not play a part in this type of extra work, any more than they would play a part if the result of their use were a set of reference numbers instead of a total, and the work of learning more about the items thus discovered was undertaken. In a computer, the calculation can proceed as soon as the search for items with the chosen features in common has been completed and the items have been totalled. Alternatively, on the identity side, the items concerned can be printed out so that details of them are available on appropriate forms for all to read. Let us give these two stages in data processing two names, almost the last special names we shall need to use in these pages.
We may call them the primary handling of the information and the secondary handling. Secondary handling need not be the result of immediately precedent primary work. Many mathematical enquiries begin by supposing that certain quantities stand in certain relationships to each other, and proceed from there, while others take numerical data out of cold storage from time to time and update it, as in the case of pay or pension calculations. A similar thing happens in the case of identity, for example when a set of references to books is collected and then put aside until the opportunity to study the works concerned arises. Primary handling may be so bound up with secondary handling that we are not aware of the difference, or so distant from it that we forget the primary work was ever undertaken, or it may be carried out by a different person, or as we have seen, it may even be omitted, the imaginary results of an imaginary first stage being processed in order to see what the outcome would have been if the imagining had been the reality. At other times, the distinction is clear. For example, we may obtain totals in one way and immediately work on them in a different way, as by using a slide rule, a nomogram, a calculating machine, or a graph. Primary and secondary handling come third and fifth in a sequence of operations which can themselves be broken down into smaller operations. This sequence is: collect information, record information in a way suited to primary handling, carry out the primary handling, record in a way suited to secondary handling, carry out the secondary handling. In setting up a data handling system the whole sequence is relevant, but here we may concentrate on the second operation. In doing so, we should bear in mind that the way information is collected affects the recording, and that the recording can be performed, at times, in a way suited to both the succeeding stages of handling.
Initial information
A useful starting point is the definition of the questions we wish to answer. This, which is given the name 'question analysis', seeks to discover whether item or feature questions (or both) are asked, whether these concern identity or quantity (or both), and which types of item or feature, how generic, how specific, are the subject of the questions. Incidentally, it also shows at which level of the holotheme the system is to operate. Thus the statement: 'I want to index the labour force of the factory so as to find information about individual people and also so as to be able to make searches for people of various sorts, and to prepare tables for statistical purposes' shows that both item and feature questions are to be asked, the latter being both of the quantity and the identity varieties, and that the level of interest is that of (what the holotheme-language rather impersonally calls) whole biomorphs. If the statement goes on to list features of interest, features of age, sex, occupation and the like, then the various degrees of genericity and specificity required will become apparent. Further statements may reveal that accidents, or absenteeism, illnesses, or other matters, are of interest, and may lead to the discovery that more than one type of item is of importance. In a library, or an office filing system, it may appear that quantity questions are never asked, although both item-identity and feature-identity questions are important. In the case of a social survey or other enquiry into the behaviour of a population, item questions ('what did the first person who answered say?') and feature questions ('how many people answered yes to the first question?') may both be important, the first type being identitive and the second quantitative, but feature-identity questions may be rare or nonexistent. In some statistical work, the only questions asked are of the feature-quantity type. When we encounter a statement such as 'I want to find out how age is related to salary in salesmen' or 'I want to discover how running speed is related to fatigue in back axle shafts' we enter the realm of secondary handling.
These are relations between arrays of features and may call for extensive calculation. Other secondary handling may call for the optimum arrangement of items to achieve some purpose. In such cases, we must ask whether the primary information, the result of the primary handling, is already available, for, if it is not, there is a prior decision to make, of how to deal with the primary work.
Early decisions
It may well appear that the secondary handling is irrelevant. It may not be required, or it may already be well provided for. This leaves the case in which secondary handling is needed and has not been provided, and here the main question is whether to integrate it into a single system with the primary, or whether to treat the two separately. It is apparent that to put the results of an original survey into a computer's memory permits the searching and identification or counting processes to be carried out, the results being put into a different part of the memory and taken out when the calculations are to be performed. It is not always certain that this is the best method. It may be better to keep the primary work outside the computer and to feed in the results, for processing, when this is necessary. The primary handling may then be on a vehicle which is compatible with computer infeed: possibilities are punched or magnetic tape, which is perhaps most valuable as a transferring or holding device, and punched 80-column item cards or punched feature cards, both of which can give good primary, and some secondary, facilities. Administrative and economic reasons also play a part in the choice here. It may be judged better to store primary data in a computer and to carry out primary handling with it to ensure that it is fully loaded with work, particularly if special costs may be incurred in setting up an external primary handling system. On the other hand, in default of immediate access to the computer, say from keyboards in various offices, there may be good reason to establish, say, a punched feature card system for simple statistics and for rapid searches. Finally, we may note that events may be caught on the wing, totals or other information being fed into a computer direct, as generated by events outside, in which case immediate secondary processing may take place, often in nanoseconds, for immediate control of the same events.
Primary handling
Let us suppose that, either for comparison before making a final choice, or because the separation of primary from secondary processing is necessary, or because secondary handling is not wanted, we concentrate on the primary work. In this case, we look at the major types of question relating the two extensions of the data field and draw some conclusions. For a start, we may consider the possibility that feature questions are never asked of our information. In this case, no feature provision need be made. Item vehicles are all that is needed, and the features of the items may be recorded on these for use as required. The consideration of type of vehicle then usually enters the field of work study or organisation-and-methods, and concerns the item vehicles described in chapters 10 and 11.
Next, we may consider the occasions upon which feature questions are asked. Here we have more choice. They may be quantity or identity questions, and the items sought may be important because of features of generic or specific type, or of both, or of any intermediate stage of genericity or specificity. Suppose that feature quantity questions are absent, so that we always identify and retrieve items by the use of features, but (not being in the statistics business) never count them. Suppose also that we are content to allow each item to possess one very specific feature, made up of a number of more generic features occurring in, for ease in making a storage order, agreed hierarchical positions. In this case, an index which is transversive of items may well suit us: a classification based on a hierarchy of features. Such an arrangement was described in chapter 5. It may be thought of as a single-access system, since each item or each item vehicle occurs once, in the one place which is right for it and which bears its descriptive name.
If the rigidities of such a system are troublesome, we may consider using an alternative method, which is also transversive. In this case we duplicate the items or the item vehicles, once for each feature however generic or however specific. An example of such an arrangement is a file in which item cards occur under many different feature headings. Like the preceding system, this is inconjugate; there are item, but no feature vehicles. This multiple-access system gets rid of the problem of making a hierarchy to control it, permits easy expansion, and puts an end to hard decisions about where to place an item when it might, as with a document containing complex subject matter, be described by several different hierarchical codenames. On the other hand, it does not make it much easier to combine any number of features in any required order. The only provision it can make for this is to provide every possible combination of feature headings, which is a very large undertaking.
To tackle this situation, we may turn to separate feature vehicles, such as punched feature cards, which can easily be stacked in any order and in any numbers, and can represent features of any degree of genericity. Such cards provide a simple and rapid answer to feature questions, and we may note that they provide a swift method of counting as well as of identification. Hole-counts can be fiddling things if there are many holes close together, but machine assistance can be provided if the work is extensive. Counting holes by machine is much quicker than counting cards by machine, although this rule does not always hold good when counting is manual.
However, we should stay for a moment with identity questions. Feature vehicles, punched or plain, provide a means of tackling feature questions, the punched variety being much the easier to use, but the result is normally no more than a set of reference numbers to the items which meet the requirements of the search. If this is all that is required, then this method of handling information will suit.
It, too, is inconjugate, however, and in this case the effect is to cause difficulty if item questions are also asked of the information. In the case of a transversive item index this problem only arises if the item cards bear nothing but item names or numbers, a rare event, for they almost always carry a good deal of information about the items they represent. Thus, if we ask feature questions in order to find items, of which we then propose to ask item questions, and if we forego a transversive system of items because of its restrictions, then a normal inconjugate feature index will not by itself meet our needs. It will give the most effective handling of the features, but it must be provided with a set of item vehicles also, becoming a conjugate index, if it is to be helpful on the item side of our work. If this is not done, then to answer an item question we must find a means of serially searching through all the feature vehicles for those features which are possessed by a given item.
Curiously enough, this long-winded operation is often undertaken when answering feature questions by means of an inconjugate item system, the item vehicles bearing tabs, slots, notches or other devices to make it possible to search for those items possessing a given feature. This is the last method of answering feature questions which may be mentioned here. It is the one which takes longest in itself, although many mechanical aids for it exist and its adoption may be justified by other considerations. Of these, one is that, in the world as we know it, information normally arrives item by item, so that the use of item vehicles, normal as in the case now discussed, or transversive, is a simple and easy means of recording. Another is that, after a search, no reference-onward is called for; the item vehicles which answer the item question following the search are already to hand. The force of this latter argument varies according to the way in which the item vehicles, or the items themselves, are stored, for it can be as fast to take these from a series of clearly shown storage positions as to take them from a pile which has resulted from a search through a set of item vehicles. We border on work study again.
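The difference between an inconjugate and a conjugate index can be sketched with two dictionaries, one keyed by item and one keyed by feature. This is a minimal illustration only: the class and method names are hypothetical, and Python sets stand in for the physical item and feature vehicles.

```python
from collections import defaultdict


class ConjugateIndex:
    """Holds both item vehicles and feature vehicles for one data field."""

    def __init__(self):
        self.item_vehicles = {}                   # item -> set of features
        self.feature_vehicles = defaultdict(set)  # feature -> set of items

    def enter(self, item, features):
        """Record one item on its own vehicle and on each feature vehicle."""
        self.item_vehicles[item] = set(features)
        for feature in features:
            self.feature_vehicles[feature].add(item)

    def feature_question(self, *features):
        """Which items possess all the given features? (stacking the cards)"""
        stacks = [self.feature_vehicles.get(f, set()) for f in features]
        return set.intersection(*stacks) if stacks else set()

    def item_question(self, item):
        """What are the features of this item? Answered from the item vehicle."""
        return self.item_vehicles[item]

    def item_question_by_serial_search(self, item):
        """The same answer from feature vehicles alone: every one is scanned."""
        return {f for f, items in self.feature_vehicles.items() if item in items}


if __name__ == "__main__":
    index = ConjugateIndex()
    index.enter(1, ["red", "large", "square"])
    index.enter(2, ["red", "small", "square"])
    index.enter(3, ["blue", "small", "octagonal"])
    print(index.feature_question("red", "small"))
    print(index.item_question(2))
    print(index.item_question_by_serial_search(2))
```

Dropping `item_vehicles` leaves an inconjugate feature index, in which item questions can only be answered by the serial search; dropping `feature_vehicles` gives the converse arrangement.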
Let us turn for a moment to the possibility that we may ask feature quantity questions. Here, we may never wish to answer item questions at all, in which case a set of feature vehicles may be all that we need. We may count the holes or carry out the other
appropriate operations on the feature-organised information, and obtain our results. We may, at times, be able to obtain an approximation to secondary handling also, from our primary system, as, for instance, when the slipping of punched feature cards out of register randomises the holes and produces a sample of expectation, as opposed to the observed frequencies which are shown when they are in register.
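A feature quantity question, and the comparison of an observed count with an expected one, can be sketched as follows. The feature names and population are invented for illustration, and the expected figure is computed on the ordinary assumption that the two features are independent, which is taken here as a stand-in for what the out-of-register sample approximates.

```python
def observed_count(card_a: set, card_b: set) -> int:
    """Primary handling: stack two feature cards and count the coincident holes."""
    return len(card_a & card_b)


def expected_count(card_a: set, card_b: set, population: int) -> float:
    """Count to be expected if the two features were unrelated (independence)."""
    return len(card_a) * len(card_b) / population


if __name__ == "__main__":
    population = 20                  # hypothetical survey of twenty items
    salesmen = set(range(0, 10))     # items 0-9 carry the first feature
    over_forty = set(range(5, 13))   # items 5-12 carry the second feature
    print(observed_count(salesmen, over_forty))
    print(expected_count(salesmen, over_forty, population))
```

Counting the coincident holes is primary handling; setting the observed figure against the expected one is a first step into secondary handling.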
Storage order
In any index, both the items and the features must be kept in an appropriate order, which need not necessarily be designed for ease in finding them immediately but is very often designed for this end. In an index using punched feature cards, for instance, the cards may be kept in an order which makes it easy to discover those which are required, but the items may be held in the order of their arrival or even in a quite random arrangement. In a classification, the features occur in a standard order, which is learned by the user, and the items hold an order which depends on this. We have already dealt with the situation which arises when we decide to place terms according to a hierarchy; here it may be helpful to look at the possibilities available to us when we decide to form a schedule. This is often a feature schedule, to be used in titling feature cards and placing them in order. Three schools of thought appear in this case. The first, connected with documentation, calls for the features to be classified according to the order of the letters of the alphabet composing the words of their preferred names. The argument is that any user can easily think of a word and reach out to pick up the card corresponding to it, which will be in the correct alphabetical position. The second school of thought opposes this, arguing that we may not think of the right word (we may think instead of a synonym) or that, although we find a card corresponding to a word of which we have thought, that card may in fact refer to a different meaning (the problem of the homonym). Further, it is said, if we wish to add a new word to our alphabetical schedule, or 'keyword list', we have no means, other than extensive scanning of the list, of knowing that the idea represented by the new word is not already there, under a different name. Consequently, the second school argues, we should class the features we use so that similar concepts come together. The difference can be shown by two schedules, thus:

blue            blue
large           orange
medium          red
octagonal         small
orange            medium
red               large
small               triangular
square              square
triangular          octagonal
In the second of these lists, the indentation is only to show a difference between main types of feature, those of colour, those of size, and those of shape. If we now wish to add 'tawny' to the list, thinking of the same colour as that we name 'orange', the classed list will show that the word 'orange' is already available, while the other will give us no news unless we either read it right through, or think of the synonym 'orange' and look for it. Further, the classed list will show us where the gaps lie. In the classed arrangement, we can see whether we wish to add, say, green to our colours, or other shapes to our shape collection. On the other hand, we must learn our schedule, at least in outline (as if we did not once have to learn the alphabet!). This school of thought usually seeks to make this learning easier by arranging the features according to some grand design. The third school has affiliations with the second, since it holds that features should be classed according to their content as ideas, and not according to the words which represent them. It differs, however, in that it is not interested in classing ideas according to any grand design above the mere individual groupings of shape,
size and so on. This is because its concern is with fitting the features to the shape and capacity of some specific data vehicle. It is entirely pragmatic. A classed schedule derived from such a viewpoint is likely to have its features in the order in which they may be encountered by an indexing clerk running down a list of information available about a given item. For instance, in a medical case history it is likely to start with features of name, sex, age, date, surgeon or doctor concerned, and so on, proceeding then to the history of the case, and concluding with the result and such summary features as may be helpful when the whole story is known. This business of how to arrange the terms in the net runs parallel to that of how to embody the display of units of the field in appropriate vehicles, and it is clear that the two interact. In this chapter we placed the choice of vehicle first because, with exceptions, this affects the efficiency with which a particular type of question is answered more closely than does the arrangement of the net. Yet the arrangement of the net is a matter we dare not overlook. This is because an 'efficient' index may turn out a wrong answer as a result of being asked a wrong question, or may turn out no answer at all because, not being able to find the terms we wish to use, we cannot approach it.
Three parts to a system
Many, if not most, information handling systems are governed by three documents: the arrangement of features - often known as the schedule, the hierarchy, the classification, the keyword list or the code - the indexing instructions, and the administrative routines. These separate parts of the software may not always be written out in full, and in a private index to a stamp collection, or to the information gathered for a thesis or the like, they are unlikely to be written out at all. Nevertheless, in some sense or other they are usually there. The schedule or hierarchy arranges the features, the items being known, and may allot codewords to them. The indexing instructions are concerned with how the equipment which has
been chosen to embody the data field is to be handled. The administrative routines gear the installation as a whole into the remainder of the work of the organisation it serves. Computer programmes are a type of indexing instruction in this sense. So is a document such as the British Standards Institution's guide to the Universal Decimal Classification, which by our definition is a hierarchy. So are the working manuals issued with punched card handling machinery.
Stages in set-up
Let us leave aside the installation of large-scale data processing systems, which may well be preceded by extensive feasibility studies, systems analysis and comparisons between competing automatic data processing equipment. This done, we may take the main stages in developing a small or moderate-sized index or data handling system to be the following:
Analyse the questions to be asked. From this, draw preliminary conclusions about the type of index best suited to answer them, about which type of item is to be indexed and about which types of feature are to be used.
If the starting collection of features can be found, list these. If, as in the case of subject matter, much of the feature extension of the field is unknown, set up a means of listing them as they are found.
On the basis of the above, estimate the numbers of items and features to be handled, and note rates of growth or change or removal, or any or all of these as may be appropriate.
Compare the best answer, as to the equipment required, with economic and other considerations external to the plain question analysis, in case modification of the equipment choice must be made.
Make a final selection of equipment, order it, and prepare such ancillary documentation as may be necessary - card designs, transfer sheets, forms for the collection of original information and the like.
Select staff, if the index is not to be a one-man or a one-woman show, and collect any starting information not already available.
Carry out such preliminary staff training as is possible and draft the indexing instructions and administrative routines, perhaps with the aid of those who supply the equipment.
On arrival of the equipment, enter the backlog of starting information. This operation can be used to provide additional staff training and to adjust indexing instructions and administrative routines as required.
Proceed to use the installation thus created, as may be needed, for the continuing index or the once-off survey or the series of studies, the generation and use of information for secondary handling, and the like.
Maintenance
And so, having set up our data system, we may set to and operate it. It may well provide us with a few surprises. But these we have built into it ourselves, for no system can behave other than in accordance with the behaviour of the data field, as embodied in the equipment we have chosen and as represented in the way we have decided. Experience of using it will show which features of the items indexed are of value, and which may be left out. As a result, we may congratulate ourselves on choosing a method whereby this is easy to do, or feel annoyed because we chose a method which makes alterations difficult. In an industrial context, staff may change and new staff may need to be trained. In any context there is a possibility that some of the features in use may alter as a result of the words we apply to them changing their meaning by imperceptible degrees. A fully effective system may have to be duplicated for use elsewhere. Many alterations may take place, and the index is likely to be static only if it is intended to analyse a once-for-all survey. All other indexes or data handling systems should be designed from the start to change with the times, and a proper care for their continuance will see that they do.
Many measures of the cost, size and efficiency of information handling methods have been devised. The f.p.i. (features-per-item) ratio deals with the amount of work which may be needed in order to set an index up. The recall ratio measures, so far as may be, the efficiency of a system in finding items it is known to contain, and the relevance ratio measures how many of the items found are those which are relevant to the finder's needs. Graphs may be drawn showing the rate of arrival of new features. On the statistical side, the cost per count or per cell in a table may be calculated. On the side of retrieval, and particularly of retrieval of documents, however, there is an element which can hardly be costed. This is the amount of effort, money, time - measure it how you will - which may be saved by finding the one piece of information which is needed, and without which a costly research programme, duplicating other work, may be mounted, or some other necessary step may be omitted in carrying out a project, or some great mistake may be made. This is where investment in effective information handling takes on the qualities of an insurance policy, and where we begin to ask, not what we will gain by an effective system, but what we will lose without one.
Summary
After collecting information and recording it, we encounter primary and secondary handling. The first of these is the main topic of this book and is concerned with the production of answers to quantity and identity questions. The second takes over where these leave off, and consists of the mathematical calculations which may be based on the counts, or of the logical or other work which may be based on the identifications. Data handling systems may deal with the primary processing only, or with the secondary, or with both. The data field plays a prominent part on the primary side.
Figure 54. The number of relevant items retrieved, related to all items retrieved, is known as the precision ratio, pertinency factor or acceptance ratio. [...] is known as the noise factor. The number of irrelevant items rejected by the index, related to all irrelevant [...]
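The recall ratio and the relevance (precision) ratio mentioned in this chapter can be illustrated with a small sketch. It assumes the usual reading of the two measures: recall is the share of the relevant items in the collection that the search actually found, and relevance is the share of the items found that were relevant. The function name and the sample item labels are hypothetical, chosen only for illustration.

```python
def retrieval_ratios(retrieved, relevant):
    """Return (recall_ratio, relevance_ratio) for a single search.

    retrieved: items the index produced in answer to the question
    relevant:  items in the collection known to meet the finder's need
    A ratio is None when its denominator would be zero.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant items that were actually found
    recall = len(hits) / len(relevant) if relevant else None
    relevance = len(hits) / len(retrieved) if retrieved else None
    return recall, relevance


if __name__ == "__main__":
    found = {"item1", "item2", "item3", "item4"}   # hypothetical search result
    wanted = {"item2", "item3", "item5"}           # hypothetical relevant set
    recall, relevance = retrieval_ratios(found, wanted)
    print(f"recall ratio: {recall:.2f}, relevance ratio: {relevance:.2f}")
```

Treating both collections as sets mirrors the way the ratios are stated: each is a count of an intersection related to the count of one of the two sets.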
Question analysis, matched with a knowledge of the behaviour of the field and of the available hardware, makes it possible to take preliminary decisions about the best type of index to set up. A study of the ways in which the net of names may be represented and arranged helps to ensure that the most efficient method of embodiment yields the most relevant answers. Three interlinked documents govern a system: the arrangement of the meanings, the set of indexing instructions, and the set of administrative routines.
Bibliography
The subject we have here called data study is very unevenly documented. Some of the sciences on which it draws are widely taught and highly developed, venerable and at the same time in a state of exciting growth. Logic and mathematics are such. Other parts of it are hardly touched upon in the literature. Such is the section concerned with the holotheme. To deal with that portion first: the idea of an integrative level is found in Joseph Needham's Time, the Refreshing River, published in 1943 but containing a lecture on the subject delivered in Oxford in 1937. It underlies Stephen Toulmin and June Goodfield's absorbing The Architecture of Matter and much of the philosophy of science, and is treated specifically in an article by J. K. Feibleman in Focus on Information, edited by Barbara Kyle. The integrative levels mentioned in the article are not so clearly patterned as those described in these pages, and are not so plainly based on set theory and on the behaviour of relations. The idea of formative stages does not seem to have been dealt with elsewhere, although there are many works which state, in passing, the properties of active and passive relations. Among such works are those arising from logic and mathematics. A simple but thorough treatment, for the newcomer, of number theory, is Irving Adler's The New Mathematics. Robert R. Stoll's Sets, Logic and Axiomatic Theories is tough but rewarding, on what we have here thought of as the entitive side. The behaviour of relations can be seen at the foundation of such books as Basic Mathematics by R. G. D. Allen, and the clear, formal A Survey of Modern Algebra by Birkhoff and MacLane. These background sciences concerned with the data field as a whole and the pattern of meaning of which it forms a part accompany the advanced technologies of statistics, data processing, systems analysis and the like as they are carried on from day to day.
Among works on these subjects, mention should be made of Becker and Hayes' Information Storage and Retrieval. It gives a wide and useful survey of the types of equipment available and much information about methods of arranging data; if it has a limitation, this is the small space it gives to feature card methods in comparison with others. Historically, however, these took their place very late in the range of techniques and equipment now in normal use. Two books by Brian Vickery - Classification and Indexing in Science and On Retrieval System Theory - are especially relevant to the arranging of subject matter. The Guide to the Universal Decimal Classification issued by the British Standards Institution and written by J.Mills offers a first-class account of this important hierarchical system. Ranganathan's Prolegomena
to Library Classification shows an alternative approach to the problem of tackling textual matter. The library of works on statistics both general and special is extensive. Three books worth reading, in order of difficulty (but none of them very difficult) are McIntosh's Statistics for the Teacher, Moroney's Facts from Figures and Quenouille's Introductory Statistics. For those interested in computers, Electronic Computers by Hollingdale and Tootill is good reading.
Bibliographical details for the books mentioned above are as follows. (If a book has been published both in Britain and in the United States both publishers are listed, the British one being named first. Dates are of first publication.)
Adler, I., The New Mathematics, Dobson, London, 1964.
Allen, R. G. D., Basic Mathematics, Macmillan/St Martins, 1962.
Becker, J. and Hayes, R. M., Information Storage and Retrieval, Wiley, London and New York, 1962.
Birkhoff, G. and MacLane, S., A Survey of Modern Algebra (3rd ed.), Macmillan, London and New York, 1965.
Guide to the Universal Decimal Classification, British Standards Institution, London, 1965.
Hollingdale, S. H. and Tootill, G. C., Electronic Computers, Penguin, London and New York, 1965.
Kyle, B. (ed.), Focus on Information, The Association of Special Libraries and Information Bureaux, London, 1965.
McIntosh, D. M., Statistics for the Teacher, Pergamon Press, Oxford and New York, 1963.
Moroney, M. J., Facts from Figures, Penguin, London and New York, 1951.
Needham, J., Time, the Refreshing River, Allen and Unwin/Macmillan, 1943.
Quenouille, M. H., Introductory Statistics, Pergamon Press, Oxford and New York, 1950.
Ranganathan, S. R., Prolegomena to Library Classification (2nd ed.), The Library Association, London, 1957.
Stoll, R., Sets, Logic and Axiomatic Theories, Freeman, San Francisco and London, 1964.
Toulmin, S. and Goodfield, J., The Architecture of Matter, Hutchinson/Harper, 1962 (Penguin Books, London, 1963).
Vickery, B. C., Classification and Indexing in Science (2nd ed.), Butterworths/Academic Press, 1959.
Vickery, B. C., On Retrieval System Theory (2nd ed.), Butterworths/Academic Press, 1965.
Index
Producing an index to a book on indexing is a dangerous task. The following pages are not presented as a perfect example of the art. They are essentially the working index created as the book was written, to allow the author to find his way about the pages as they came from the typewriter. Each main heading is followed by a brief note of what is said on the page concerned, and then follow the page references. Synonyms have been given a brief set of references of their own. The main headings may be thought of as a list of keywords, with the pages - the items - noted against them. Each item is given a phrase which contains other applicable keywords, which in most cases appear as main headings in their turn. The reader may like to make a co-ordinate index by taking each heading and recording the items which possess it on any suitable set of data vehicles.
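The exercise suggested above, recording against each heading the pages that possess it, can be sketched in a few lines. This is only an illustration under stated assumptions: each heading is treated as a feature, each page as an item, and a Python set stands in for a feature card. The page lists are abbreviated from three headings of this index; the function names are hypothetical.

```python
from collections import defaultdict

# Heading (feature) -> pages (items), abbreviated from this index.
ENTRIES = {
    "bits": [18, 36, 156, 158],
    "data units": [18, 19, 20, 21, 26, 27, 42, 156, 158],
    "complementation": [19, 23, 50, 55, 56],
}


def build_feature_cards(entries):
    """One 'card' per heading, holding the set of pages that possess it."""
    return {heading: set(pages) for heading, pages in entries.items()}


def build_item_cards(feature_cards):
    """The transverse arrangement: one 'card' per page, listing its headings."""
    item_cards = defaultdict(set)
    for heading, pages in feature_cards.items():
        for page in pages:
            item_cards[page].add(heading)
    return dict(item_cards)


def coordinate(feature_cards, *headings):
    """Pages possessing every named heading (the intersection of the cards)."""
    cards = [feature_cards[h] for h in headings]
    return sorted(set.intersection(*cards)) if cards else []


if __name__ == "__main__":
    cards = build_feature_cards(ENTRIES)
    print(coordinate(cards, "bits", "data units"))
    print(sorted(build_item_cards(cards)[19]))
```

Intersecting the sets plays the part that comparing a stack of feature cards plays in a manual system: only the items marked on every card in the stack remain.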
Absence: non-possession of a feature by an item; symbolised by ring in data field diagram; positive in feel 15; a state; symbol for 18; in the field as a textile 23; shown by lack of data vehicle, or by lack of mark (implied) 33; and the semantic continuum; and binary codes 115; symmetry with presence in subcodes 129; and binomials 135-6
Abstracting: reducing information to simple terms 115
Accuracy: and verification of tab cards 178; and verification of punched feature cards 179; and parity checks, punched tape and magnetic tape 201
Active relations: operations 164, 208
Addition: its affinities with union 56-7; example of a relation 216-17
Additive subcodes: and binary coding 137-8
Addressing machines: mentioned; their plates or stencils, as item vehicles 188-9
Aggregates: orderless arrays 82; and dimensions 82, 83
Alphabet: and conventional order 72-4; a means of imposing or representing order 74, 75
Alphabetic order: introduced 74; of words, does not give immediate order to terms 75
Ambifeatures: always binary; composed of two complementary features 17; notation for 18; in the field as a textile 23; as dividers in a classification 80
Ambiitems: always binary; composed of two complementary items 17; notation for 18; in the field as a textile 23
Ambisubterms: sets of data units, better known as runs 21
Ambiterms: always binary 17; composed of two complementary terms 17, 18; in the field as a textile 23; as dividers in a classification 80
Arrays: of features 76; mutually exclusive and collectively exhaustive; binary and other 81; and partitions 81-82; of items; and dimensions; and aggregates 82
Assemblies: of units; typical inhabitants of a formative stage 158-9; relations concerning, in the holotheme 217-18
Association, associativity: of active relations 214-15; related to intransitivity 216
Association (statistical): between features in punched feature card indexes 58
Automatic typewriter: and edge-tracked cards 180; and punched paper tape 200
Bar charts: and the data field; showing dimensions 85
Base: number of characters or elements in a code alphabet 120
Binary coding: implied 17; and the semantic continuum 114-15. See also binary subcodes.
Binary numbers: coded 129, 130-1
Binary terms: and unary terms 127, 130-1. See also ambiterms
Binary subcodes: displayed 127, 130-1; and lattice diagrams 128; and the carrying rule 129-31; positional; commutative 130-1; additive 137
Binomial theorem: and subcodes 135-6; and chance or expectation 136
Biology: and hierarchies; and classification; and its nomenclature 103-4
Bits: and data units 18; binary units of information 18, 36; and the holotheme 156, 158
Blind card indexes: mentioned (libraries) 102; described 182-3; and feature cards 186, 192
Books: as connected data vehicles 199
Capacity (of subcodes): formula for (commutative) 122-3; formula for (positional) 124; and Pascal's triangle 134-5; and the binomial theorem 135-6
Cards: as data vehicles 33; punched and plain 33-4; edge-notched 137, 147, 149, 170-2; 80-column (tab cards or tabulating machine cards) 132, 149, 174-9; as separate vehicles 168-96; defined 169-70; slotted 173-4; pre-scored 175; edge-tracked (sometimes known as edge-punched) 180-1, 186. See also punched cards, feature cards, item cards, edge-notched cards and the like for further indexing
Card indexes (blind): mentioned (libraries) 102; described 182-3
Cardwheels: described 183
Carrying rule: in binary subcoding 129-31
Cascade diagrams: representing a classification and a hierarchy 79
Chain: and simple order 72
Chance (likelihood, expectation): and density of punching on feature cards 58
Changes: a semantic type in the holotheme 161, 162; and time 163
Characters: letters, numerals or other symbols 77; contrasted with elements of codewords 77-8; and graphic codes 111-12
Character recognition: mentioned 188
Checks: and verification of tab cards 178; and verification of punched feature cards (mutual exclusion) 179; and parity, punched tape and magnetic tape 201. See accuracy checks and verification.
Check cards: in punched feature card indexes 209-10
Classed schedules: introduced 208; conditions in 208-13
Classification: based transversely on a hierarchy; permits generation of positional transverse codewords; represented by cascade diagrams 79; dividers in; and ambiterms 80; and pre- and post-coordination 95, 96; and manifolds 96; and transformation 97-8; and dependence 98; and collapse 99; and immediate retrieval 105; faceted 150-1; and question analysis 230
Classing: and storage order 233-5; and schedules 234
Codes: means of representation; lists of characters with rules for their use 77; direct 110; graphic 110-12; for numbers 134; commutative and positional, in superimposed coding 139; for relations 216-18
Codewords: usually formed of characters; act as names; contrasted with words of plain language 77; commutative 78; positional; normal and transverse 78-9; arising from a classification 79; direct 110; graphic 110-12; and faceted classifications; positional, and grammar 151
Coding: a form of translation 77; allotting codewords to terms 77-8; direct 110; graphic 110-12; superimposed 139; and the Universal Decimal Classification 150; and faceted classifications 151; on tab or 80-column cards 175
Collapse: of subfields into runs, and into units 21; in the field as a textile 24; in hierarchies and classifications; as a means of compaction 99; and compressed fields 116-17
Collectives: in the holotheme 163-4
Collective exhaustiveness: within arrays 81, 82; and positional codewords 124
College of Arms: heraldic classification 102
Combines: of systems, in formative stages 159; relations concerning, in the holotheme 217-18
Commutation, commutativeness: a property of the extensions of the data field 22; arising from compresence of terms 23; in codewords 78; in subcodes 120; of relations: related to symmetry in relations 214
Commutative subcodes: in general 78, 120-3; and crossovers 120-2; capacities 122-3; and aggregates; related to positional subcodes 126; superimposed 138-9
Comparison: by stacking 40; of terms represented by connected data vehicles 168
Complementation: switching from absence to presence and the reverse; of data units 19; represented in the field as a textile 23;
and the logic of the data field; symbol for; an active relation between sets 50; altering identity, quantity, pattern and density of terms 55; its affinities with subtraction 56
Compressed data fields: and crossovers 115-17; related to collapse 116-17; and union; and textual matter 117
Computers: and connected data vehicles 169; and 80-column tab card input 175; and punched feature card input 192; and punched paper tape input 200; and their memories; and data fields 202-3; and calculation 204; and questions 204-5; and primary and secondary data handling 226, 229; and their programmes, as indexing instructions 236
Conditions: passive relations; related to operations 164; in classed schedules 208; entitive and attributive (between sets and between numbers) 216
Conjugate indexes: with both item and feature vehicles 146-7; and inconjugate indexes 147
Connected data vehicles: in general 198-206
Containing (proper inclusion): as a relation 70-1; and syllogisms 213. See also inclusion
Continuation vehicles: mentioned 110; as sectional vehicles 198
Control: control situations and field questions 63
Co-ordinate indexing: the use of schedules to form manifolds 96
Counting: in operation sequence 93; as primary handling 226; cards and holes 231
Crossovers (false drops): and commutative subcodes 112-13, 120-2; examples, method of generation 112-13; obviation by duplicating transverse terms; obviation by duplicating sets of normal terms 113; and textual matter 115-17; and compressed fields 117; palliation by warning cards 121; other methods of palliation 122, 125; in positional subcodes 125; calculation of expected numbers 139; and monadic terms 142
Cumulation: and inclusion; and transparencies; and punched feature cards; and edge-notched cards 211
Cumulative features: mentioned 208; and inclusion; and transparencies; and punched feature cards; and edge-notched cards 211
Data field: collection of items and features 9; as network of rows and columns 15, 16; its extensions 16; its symmetry; as a pattern of statements 19; as a display of states (the state of the field); the result of standardisation 20; its commutativeness 23; considered as a textile 23-4; as a net of names 24; as a basis for questions 26, 27; capacity, in data units and subfields 42; self-determinate and representational 44; its universe and plenitude 51; for a hierarchy 105-6; compressed 116-7; its position in the holotheme 164; in a computer memory 202-3; simplification of 221; number of different forms 223
Data handling: primary 226, 230-8; secondary 226. See also data study
Data study: related to other disciplines 8, 9, 10; scope of 9, 10; theory behind information handling 9; and information theory 10
Data units: and bits; binary and unary; named by two names; crossings of features and items (or ambifeatures and ambiitems) 18; and complementation; representing statements 19; forming the state of a term 20; as minimal fields, subfields, terms, subterms . . .; formed by collapse of subfields; expanded into runs and subfields 21; as a basis for questions 26, 27; number of, in data field 42; and the holotheme 156, 158
Data vehicles: carry records of information; marks on; properties depend on part of data field represented 31; unit vehicles 31-3; feature vehicles 33-4; item vehicles 34; subfield vehicles 34-6; related to questions and indexes 38-40, 46, 47; continuation 110; separate 168-96; connected 168-9, 198-206; sectional 198
Decimal classifications: and arrays 81; Dewey; Universal 102; and codes for numbers 103-4, 125, 134; and code sequences for specific, complex subject matter 150
Decimal system: and codewords for numbers 103-4, 125; and power series 134
Decision point: the zero of the semantic continuum 115 (implied), 117
Definition: ostensive 154; structural 155
De Morgan's laws: example 53, 54
Density: of a term: the relation of its quantity to that of the relevant extension of the data field; alteration by means of complementation 55; of holes in punched feature cards and in punched item cards 58; pattern on graphs 84-5; and the binomial theorem 136
Dependence: in classifications and hierarchies 97-8; makes hierarchies compact 98-9;
displayed in the data field 105, 106; and mixed codes 150
Description: of items by features and the transverse; dependent on ambiterms 22; and significant names 74, 75; brings together transverse terms 91; and search 91, 92; in operation sequence 94
Descriptive continuum: introduced 114. See semantic continuum
Dewey Decimal Classification: mentioned 102
Digraphs: introduced 111; number in alphabet 120; and 80-column (tab) card coding 175. See also polygraphs
Dimensions: in statistical tables 68, 69; and quantities; positional arrays 82; and aggregates 82, 83; and graphs 84-5; and bar charts 85-6
Direct coding: introduced 110; as a supercode 112; and crossovers (false drops) 112-13; and 80-column (tab) cards 175
Direct range: part of semantic continuum above generic point 115, 117
Direct terms: noted 110
Disjunction: and arrays 62; related to mutual exclusion 82, 209; and classed schedules 208, 209; examples of an intransversive intransitive symmetric irreflexive relation 216
Display: composed of data units 20, 24; forms a data field when fitted into a net of terms 24-5
Distances: in the holotheme; contrasted with positions; a type of quality 162
Dividers: in classifications 80
Documentation: introduced (as textual matter) 115. See textual matter
Drills: for feature cards, compared with punches 190-2
Duality: principle of, in the algebra of sets 54
Edge-notched cards: and '1247' coding 137; and interpretation 149; generally represent items 170; described; advantages and disadvantages 170-2; and cumulative features 211
Edge-tracked (edge-punched) cards: and automatic typewriters 180; described 180-1; and vertical-visible storage 186
Elements: of codewords representing several terms 77, 78; contrasted with characters 78; of subcode words 115
Emptiness: lack of information 15
Empty set: symbol for 70; included in every set, similarity to zero 71
Energy: related to occurrences in the holotheme 163
Engineering stores: and classifications 102
Equality (identical): a condition 208; and classed schedules 208, 212
Equipment: 'hardware' 37; related to codewords and to the marks representing them 77, 90; for edge-notched cards 170; for slotted cards 173-4; for 80-column (tab) cards 175-9; for microcards (microfilm cards) 179; for edge-tracked (edge-punched) cards 180; for blind-filed cards 183; flat-tray visible 183-6; vertical-visible 186; for hanging cards 188; for addressing machines 189; for punched feature cards 190-2; and the structure of the data field 221-2
Equivalence: and classed schedules; and synonyms 208, 212; and copy checking 213
Events: in the holotheme; contrasted with processes; a type of occurrence 162
Expansion (binomial): and proportions of presence and absence 135-6
Expansion (in data field): as reverse of collapse; of units into runs and subfields 21
Expectation (chance, likelihood): and density of punching on feature cards 58
Extensions: of the data field, transverse to each other 16; commutativeness within 23; of features - the plenitude; of items - the universe 51
Faceted classifications: of many hierarchies 150-1; and their codewords 151
False drops: introduced 112-13. See crossovers
Features: characteristics of things represented in an index 9, 14; binary and unary; and ambifeatures; treated as unary; treated as present 17; notation for (present and absent) 18; in the field as a textile 23; as a basis for questions 26; as classes of items 27; specific and generic 43; their identity - the items possessing them 54; their quantity - the number of such items; their pattern - a relation of identities; their density - a relation of quantities 55; direct 110; and subfeatures 111
Feature cards: represent features or ambifeatures 33; punched 33, 39; plain 34, 193; related to questions and indexes 37-40, 46, 47; names for 45; and logic teaching 54, 213; and comparison of chances 58; and search 60; and '1247' coding 137-8; and matching systems 145-6; in general; equipment for 190-3. See also punched feature cards
Feature-card readers: mentioned 192
Feature-identity questions: introduced 56; in practice 59
Feature indexes: information arranged feature by feature, not necessarily composed of feature vehicles; normal and transverse 37; related to vehicles and to questions 46, 47
Feature-quantity questions: introduced 56; in practice 59
Feature questions: mention one type of term or ambiterm only 26; related to indexes and types of data vehicle 39-41, 46, 47; subdivided into feature-identity and feature-quantity questions 55-6; and connected data vehicles 168; and question analysis 230-3
Feature vehicles: represent features or ambifeatures; feature cards 33; related to questions and to indexes 39, 46, 47. See also feature cards
Field questions: mention no terms or ambiterms 27; call for a scan of both extensions of the data field 27, 62; in control and study situations 62-3
Flat-tray visible indexes: described 183-6; and signals 184; and serial search 184-6; and hierarchies; and personnel records 186
Forms: documents bearing no term names, examples of 36
Formative stages: degrees of complexity within integrative levels 158-9; listed 159; and relations 217-8
Full set: opposite of empty set 70
Generic point: as unity in the semantic continuum 114, 154; its identification 154
Generic terms: introduced 43; and inseparability 44; equivalent to intersected terms 51-2; and generic names 54; and codewords; as intersections of more specific terms 77; related to transversiveness 77-8; in a demonstration index 94; and the semantic continuum 114; and monadic terms 142; their identification 154-5
Grammar: links and roles 143, 144; and positional codewords 151
Graphs: related to a data field 83; displaying dimensions 83, 84; and positional codewords 84
Graphic coding: as a subcode 110-12; as spelling and otherwise; and crossovers (false drops) 112
Graphic range: part of semantic continuum below generic point 115, 117
Greater/less than (relative magnitude): discussed 71-2
Hanging cards: described 188
Hard copy: of microfilm 199
Hardware: equipment for information handling 37. See equipment
Heraldry: and classification 100-3
Hierarchies: a basis for classification 79; and codewords 79-80, 128; and dependence 98; and collapse 99; and codewords for numbers 103-4; shown in data field 105-6; and faceted classifications 150-1; and flat-tray visible cards 186
Holotheme: also 'pattern of meaning'; complete collection of all available terms and relations 45; and the Universal and Dewey decimal classifications 102; in general 154-65; and its integrative levels 156-7; and its formative stages 158-60; and its semantic types 161-3; and mass, length, time, energy 163; repetitive patterns in 165; relations in 164, 217-20; and question analysis 228
Homeostasis: property of units in the holotheme 159
Idempotency: of relations, related to reflexiveness 214
Identification: in operation sequence 91-2; as primary data handling 226-7
Identity: of a term: the collection of transverse terms it possesses 54; alteration by means of complementation 55; used to subdivide term questions 55-6; orthogonal to quantity 129
Identity questions: a subdivision of term questions 55-6; examples 59, 60
Immediate order: and the alphabet, words and terms 75
Immediate retrieval questions: introduced 94; of classifications 105
Inclusion, proper inclusion: distinguished from each other 70, 71; compared with relative magnitude 71; and order 72; a condition 208; and classed schedules 208, 211-2; and cumulation; and punched feature cards; and edge-notched cards 211; and syllogisms 213
Inconjugate indexes: introduced 148; and question analysis 232
Independence: in classifications and hierarchies 98; and mixed codes 150
Indexes: item, feature 40; normal and transverse 40, 46, 47; and storage order 47; manifold, co-ordinate 96; and libraries 102; paired 144-5; matching 145-6; conjugate 146-8; strip (mentioned) 148; direct positional 150. See Chapters 4 and 5 generally, and under special subjects
Indifferences: commutative; opposed to sequences; as non-positional codewords 82
Infinity: and reflexiveness 217-8
Information: generated by change 8; unit of 8, 18; 'what we know' 14
Information handling: practical aspect of data study; as mimicry 9; as a succession of activities 92
Information theory: mentioned 10; and bits 18
Inseparable terms: introduced 44; and indexes 94
Installations: software plus hardware; rules and methods plus equipment 37
Integrative levels: levels of complexity of structure; named; typical concepts in 156-7
Interpretation: a type of translation; and edge-notched cards 149; and 80-column (tab) cards 149, 178
Intersected terms: the equivalent of generic terms 51-2; related to transversiveness and to united terms 77
Intersection: symbol for 51; an active relation between sets 51-2; related to union 51-4; its affinities with multiplication 57; in a Venn diagram 68; in a statistical table 68, 69; related to transverse terms 71, 72, 94, 95, 97; and collapse 116; an operation 208; as an example of a symmetric, associative relation 214-6
Items: things represented in an index 9, 14; and ambiitems; binary and unary; treated as unary; treated as present 17; notation for (present and absent) 18; in the field as a textile 23; as a basis for questions 26; as classes of features 27; specific and generic 43-4; their identity - the features they possess 54; their quantity - the number of features they possess; their pattern - a relation of identities; their density - a relation of quantities 55; direct 110; subitems 111; as features 116
Item cards: represent items or ambiitems; plain and punched 34; related to questions and to indexes 37-40, 46, 47; names for 45; and search 60; edge-notched 137, 149, 170-2; slotted 173-4; 80-column (tab) 132, 149, 174-9; microcards (coded microfilm cards) 179; edge-tracked (edge-punched) 180, 186; blind card indexes 182-3; in flat-tray visible indexes 183-6; in vertical-visible indexes 186; hanging cards 188; addressing machine plates or stencils 189. See also punched item cards
Item-identity questions: introduced 56; in practice 59
Item-quantity questions: introduced 56, 60
Item indexes: information arranged item by item, not necessarily composed of item vehicles; normal and transverse 37; related to vehicles and to questions 47
Item questions: mention one type of term or ambiterm only 26; related to indexes and types of data vehicle 39-41, 46, 47; subdivided into item-identity and item-quantity questions 55-6; and question analysis 230-3
Item vehicles: represent items or ambiitems; item cards 34; related to questions and to indexes 37-40, 46, 47. See also item cards
Keywords: preferred words for terms (features) in schedules 233-4
Lattice diagrams: described 69-70; showing inclusion 70; for two terms 70-1; and chains; and order 72; for three terms 73; for four terms 127-8; and binary codes 128
Length: related to scales and to qualities in the holotheme 163
Less/greater than (relative magnitude): discussed 71-2
Libraries: and hierarchies 80; and classifications; and indexes 102; and crossovers 115, 117; and order of storage of data vehicles 149; and question analysis 228. See also textual matter
Life sciences: and classification 103; and nomenclature 104
Light boxes: and punched feature cards 190
Likelihood (chance, expectation): and density of punching on feature cards 58
Links: and monadic terms; and subitems 143; and textual matter 143-4; and supercodes; and roles; as a means of duplicating vocabularies 144
Logic: and feature cards 54
Magnetic cards: mentioned 188
Magnetic core memories: in computers, described 202
Magnetic tape and film: described, and parity checks 201
Manifolds: and classifications and schedules 96
Mark: a separate part of a record 31; read by data handling equipment 90
Mark-sensing: mentioned 188
Mass: related to things, in the holotheme 163
Matching indexes: described 145-6
Meaning, pattern of: mentioned 45; in general 154-65. See holotheme
Medi-co-ordination: intermediate between pre- and post-co-ordination 96
Membership: a relation between terms and transverse terms; symbol for 19-20
Microcards (coded microfilm cards): described 179
Microfiches: mentioned 179, 188
Microfilm: described briefly 199
Mixed codes: and subcodes; with dependent and independent elements 150
Monadic terms: and crossovers 142
Multiplication: related to intersection 57
'Mutex': mutually exclusive 210
Mutual exclusion: introduced 36; and arrays 81-2; and disjunction 82, 209; and crossovers 121; and positional codewords 124; and verification or checking 179, 209-10; a condition 208; and classed schedules 208-10; and punched feature cards 209; and syllogisms 214
Names: spoken or written symbols for terms 15-6; not to be confused with their terms 20; network of (net) 24; generic and specific 43, 54; of a union of features, of a set (an intersect of sets) of items 71-2; arbitrary; descriptive or significant 74, 75; normal and transverse 75; words and codewords 77; and storage order 91; and translation 91-2
Net: of names, and the display of states, forming data field 24
Noise: that which garbles information 10
Nomenclature: varied and non-standard 9, 14. See terminology
Normal indexes: use vehicles representing terms of the type used for arranging the information 37; use vehicles carrying arbitrary names 75; in libraries 102
Normal names: arbitrary; put normal indexes in order; contrasted with transverse names 75
Not: and complementation 50
Notation: for presence, for absence, for binary data units, for terms and ambiterms 18; for membership 19; for reciprocal membership and reciprocal non-membership; for runs 20; for subfields 21; for complementation 50; for intersection 51; for resultancy; for union 52; omission of denominator in addition and subtraction 58; for the empty set 70; positional 75-6; and grammar in positional codewords 150; and faceted classifications
Notching: deep and shallow, on edge-notched cards 172
Number, numbers: and decimal system 103-4, 125; orthogonal to identity 129; defined as a set of sets; and set-reciprocally defining; and power series 134; codes on edge-notched cards, and the '1247' method 137; relations concerning; and conditions and operations 216. See also quantity
Numerals: and natural order 72, 74; used to impose and to represent order 74
Objects: a variety of thing, contrasted with substances 160; in the holotheme 160-1
Observation: contrasted with expectation 58
Occurrences: a semantic type in the holotheme 161; consist of events and processes 162; and energy 163
Operations: active relations; related to conditions 164; entitive and attributive (between sets and between numbers) 216
Order: partial; simple; and lattice diagrams 72; natural: numbers and numerals 73-4; conventional: the alphabet; represented or imposed by alphabets and numerals 74; immediate 75; and codeword elements 79; and positional arrays 82; storage 91; arrangement of polygraphs 120. See also storage order
Ostensive definition: mentioned 74, 154
Overlap: a condition; and classed schedules 208; and check procedures 210-1; and syllogisms 213-4
Paired indexes: items of one the features of the other, and the transverse 144-5
Parity checks: described 201
Partial order: and inclusion 72, 211, 212
Partition: of a set of items by an array of features and the transverse 81-2
Pascal's triangle: displayed 135; and subcode capacities; and the binomial theorem 135-6
Passive relations: conditions 164, 208
Pattern: of a term: the relation of its identity to the appropriate extension of the data field 54-5; alteration by means of complementation 55
Pattern recognition (character recognition): mentioned 188
Pattern of Meaning: complete collection of all available terms and relations 45; also 'holotheme' 45, 154-65. See holotheme
Personnel records: and data handling operations sequence 92; and translation in conjugate indexes 149; and flat-tray visible indexes (classification and hierarchy) 186
Phrases: as sets of distinct terms 110
Plain cards: unpunched; item, feature 34; various 182-9, 193-5
Plain feature cards: normally unary 34; described; uniterm cards 45, 193
Plain item cards: normally unary 34; blind card indexes 182-3; cardwheels 183; in flat-tray visible indexes 183-6; in vertical-visible indexes 186; hanging cards 188; addressing machine plates and stencils 189
Plenitude: the feature extension of a data field; all the features 51
Polygraphs: introduced 111; numbers in alphabet; order of citation or arrangement 120; and capacities of subcodes 122-3; related to binary subcodes 127
Positions: in the holotheme; contrasted with distances; a variety of quality 162
Positional indexes: direct; and crossovers; and mixed indexes 150
Positional notation: introduced 75-76; in codewords 78, 79; in transverse codewords, derived from a classification; position of symbol affects its meaning; dependence 79; and graphs 84; in data field 106; superimposed 139; mixed 150. See also positional subcodes
Positional subcodes: mentioned 78-9; structure 123; capacity, and mutual exclusion 124; and decimal systems 125; related to commutative subcodes; weights, money values, dates; dimensions 126; and binary notation 129-31; superimposed 139
Post-co-ordination: and pre-co-ordination; co-ordinates terms after storage 95
Power series: and codes for numbers 134
Pre-co-ordination: and post-co-ordination; co-ordinates terms before storage 95
Pre-scored cards: mentioned 175
Presence: possession of a feature by an item; symbolised by blob on data field diagram; positive in feel 15; a state; symbol for 18; in the field as a textile 23; and binary codes 114; and the semantic continuum 115; symmetry with absence in subcodes 129; and binomials 135-6
Primary handling (of data): counting, identifying 226; in general 226-7, 230-6
Processes: in the holotheme; contrasted with events; a variety of occurrence 162
Proper inclusion:
and mclus~on71, and svllogsms 213. See ~nclusion
Punched cards: feature 33; item 34; in data field 38-40; various 170-81, 190-2. See also punched feature cards, punched item cards, edge-notched cards and the like
Punched feature cards: represent ambifeatures, generally binary 33; and transfer sheets 36; related to feature questions and the data field 38-40, 46, 47; related to item questions 46, 47; and logic teaching 53, 54, 213; and positional codes 125; and additive subcodes 137; and paired indexes 145; and 80-column (tab) cards 175; and self-verification 179; and transparencies 190, 209; in general; equipment for 190-2; capacities of 192; and accuracy checks 209, 210; and cumulative features 211; and logic (syllogisms) 213
Punched item cards: represent ambiitems, generally binary 34; include edge-notched, slotted, body-punched, edge-tracked (edge-punched) types 38; related to item questions and the data field 38-40, 46, 47; related to feature questions 40, 41; and positional codes 125; edge-notched 137, 149, 170-2; slotted 173-4; 80-column (tab) 174-9; microcards (coded microfilm cards) (included) 179; edge-tracked (edge-punched) cards 180
Punched paper tape: mentioned 132, 180; described 133, 200; and paper roll; and automatic typewriters 200; and parity checks 201
Punches: for 80-column (tab) cards, mentioned 175; for feature cards, compared with drills 190-2
Quantities: a semantic type in the holotheme 161-2; consist of positions and distances, persistence of 162; and length 163
Quantity of a term: the number of transverse terms it possesses, its reduction by means of complementation 55; used to subdivide term questions 55-6; arrays of dimensions 82; orthogonal to identity 129
Quantity questions: subdivision of term questions 55-6
Questions: item, feature, term 26, 30; field 27; unit 27, 30; subfield 30; related to indexing methods 38-43; related to types of data vehicle 38-43, 46, 47; inefficiency of asking transverse questions of a normal index 41, 46, 47; identity and quantity 55-6; immediate retrieval 94, 105; and computers 204-5
Question analysis: introduced 27; discussed 227-8; and primary data handling 230-3
Random access: and computers, and separate data vehicles 169
Rank: position in an array 82
Reciprocal membership: symmetric relation between items and features 19-20; symbol for 20; in the field as a textile 24; in a demonstration index 93; as an example of a transversive relation 216
Reciprocal non-membership: symmetric relation between items and features 19-20; symbol for 20; in the field as a textile 24
Record: the change brought about in a data vehicle to represent information 31
Reflexiveness: of relations, related to idempotency 214; related to infinity 217-18
Relations: transversiveness 16; complementation 18; membership 19; reciprocal membership 19, 215; reciprocal non-membership 20; collapse 20, 21, 24, 99; mutual exclusion 36; resultancy 50; intersection 51-2, 210-6; union 52-4; subtraction 56; addition 56-7, 216; multiplication 57-8; inclusion 70; proper inclusion (containing); relative magnitude 71; collective exhaustiveness 81; transformation 97-8; dependence 98; compression 115, 116, 117; contrasted with terms, in the holotheme, consist of conditions and operations 164; passive and active 164, 208; in general 208-23; equality; equivalence 208, 212-3; reflexive, symmetric, idempotent, commutative 214; transitive; associative 214-5; sequence of 2144, 218; transversive 216; codes for; disjunction 216-7; reflexive, and the infinite substances; irreflexive, and the finite structures, objects 217-8; in the holotheme 217-20. See also the various relations individually
Relation codes: their formation 216-7; and formative stages 217; and integrative levels 220
Relative magnitude: symbol for, compared with inclusion 71; and simple order 72
Representational fields: noted 44
Result: notation for 50
Retrieval: picking out something whose address is known, and translation, and storage order 91; in operation sequence 92; and immediate retrieval questions 94, 105
Roles: and links, and textual matter, parts played by term in sentence 144
Runs: sets of data units 20; and ambisubterms 20-1
Run questions: introduced 27; a type of unit question 28; mention one of one type of term and many of the other 42; series of; constant part and changing part 60
Schedules: and manifolds, and hierarchies 96; and monadic terms 142; and conditions 208; classed 208-13; and storage order 233-5
Search: simultaneous or serial, of punched cards 40, 41; a fundamental operation of normal punched item card indexes 41; as a scan of the data field, as a series of unit questions 60; ends with finding 91; and description; uses transverse to find normal terms; followed by identification or counting 91-2; in operation sequence 92; obviated in special cases 94; may be obviated in hierarchies or classifications 102; and connected and separate data vehicles 168-9; and edge-notched cards (simultaneous) 170-1; and slotted cards (simultaneous) 173-4; and 80-column (tab) cards (serial) 175; and microcards (serial) 179; and flat-tray visible indexes (serial) 184-6; and question analysis 231-2
Secondary handling (of data): calculation, interpretation 226-7
Sectional data vehicles: exemplified by 80-column (tab) cards 198
Self-determinate: of data fields 44
Self-verification: in punched feature card indexes 209-10
Semantic continuum: composed of terms and parts of terms 113-4; and the continuum of rational numbers 114
Semantic types: things, qualities, occurrences, changes, entitive and attributive; passive and active 161-2; terms and relations 164
Separable terms: introduced 44; and indexes 94
Separate data vehicles: in general 168-96
Sequences: as positional subcodes, opposed to indifferences 82
Set: or class 19, 54; orthogonal to number 129; defined by number 132, 134; reciprocally, defines number 134; relations concerning (conditions and operations) 214-16. See also identity
Significance (statistical): in punched feature card indexes 58
Similarity: of items and features within the data field 22
Simple order: and relative magnitude 72; and inclusion 72, 211
Slotted cards: described, equipment for 173-4
Software: rules for information handling 37
Specific terms: introduced 43; and inseparability 44; the equivalent of united terms; difference from united terms; and specific names 54; unions of more generic terms 54, 77; and codewords 77; in a demonstration index 94; in the semantic continuum 114, 154; and monadic terms 142-3
Stacking: fundamental operation of normal punched feature card indexes 40; of punched cards, to seek transverse terms 40-1
Standardisation: of data field 20
Statements: data field a set of 19
States: presence and absence 18; properties of data units; change of 19; intermediate; of terms and of the data field 20; and unit questions 21
Statistics: and the data field 56-8; tables 68-9; graphs 83-4; bar charts 85; and crossovers 117; and subcodes (the binomial theorem) 136
Storage order: and terms, and types of data vehicle 37; normal and transverse 37, 47; and vehicles and questions 47; and names 91; and order of arrival of items in indexes 106; general discussion of, in schedules, and classing 233-5
Strip indexes: mentioned (translation) 148; described 193-5
Structural definition: introduced 135
Subassemblies: within integrative levels 159; relations concerning, in the holotheme 217-8
Subcodes: as graphic codes, and generic and specific terms, and crossovers (false drops) 112; syllabic 113; and the semantic continuum 113-4; below generic point 114; commutative 120-3, 126; capacities of commutative 122-3; positional 123-5, 126; capacities of positional 123-4; binary 127-32; and the binomial theorem 134-6; additive 137-8
Subcombines: within integrative levels 159; relations concerning, in the holotheme 217-8
Subfeatures: union of, forms features 111
Subfields: as data units; as runs, collapse of 21; and questions 30; most numerous structure in the data field 42; in the data field for a hierarchy 106
Subfield questions: mentioned 28; a type of unit question, examples of 30
Subfield vehicles: represent many features or ambifeatures, shown against many items or ambiitems 34; and transfer sheets 36
Subitems: union of, forms items 111
Subject matter: discussed 115, 116, 117, 142-4. See textual matter
Substances: contrasted with objects, a variety of thing 160; in the holotheme 160-1
Subterms: union of, forms terms 111
Subtraction: its affinities with complementation 56
Subsystems: within integrative levels 159; relations concerning, in the holotheme 217-8
Subunits: within integrative levels 159; relations concerning, in the holotheme 217-8
Supercodes: relation of generic to specific terms in 112; and crossovers (false drops)
Superimposed coding: in use 138-9; positional and commutative 139
Syllabic subcodes: mentioned 113
Syllogisms: described, in terms of the data field, and punched feature cards, and inclusion 213; and overlap 213, 214; and mutual exclusion 214
Symbols: for various relations 18, 19, 50, 51, 52, 215. See the various concepts symbolised, also see notation
Symmetry: of the data field, arising from reciprocal membership 19; effect of, on union and intersection 72; of presence and absence 129; of relations, related to commutativeness 214
Synonyms: and equivalence 212-3
Systems: of units and assemblies, typical inhabitants of a formative stage 158-9; relations concerning, in the holotheme 217-8
Tab cards (80-column cards): in general 174-9. See 80-column cards
Tags: documents bearing a name only, examples of 36
Tape: punched paper 132-3, 180, 200; magnetic 201
Taxonomy: study of classification in the life sciences 103
Terms: items or features 15; unary and binary, and ambiterms 17; notation for (present and absent) 17-8; state of, not to be confused with their names or states 20; in the data field as a textile 23; as a basis for questions 26; specific and generic 43; separable and inseparable 44; intersected (generic) 51-2; united (specific) 54; their identity 54-5; their quantity, their pattern, their density 55; and co-ordination 95-6; direct 110; and subterms 111; represented by graphic codes or subcodes 111, 112; generic and specific, related to subcodes 112; monadic 142; as a semantic type 162
Term cards: name sometimes means feature cards only 15; here name signifies item cards and feature cards 40
Term questions: item questions or feature questions, call for scan of one extension of the data field 26; and vehicles and indexes 39-41, 46, 47; subdivided into identity and quantity questions 55-6
Terminology: varied and non-standard 9, 14; the word 'term' 15; 'term cards' 40; names for cards 45; retrieval 91; pre-, medi- and post-co-ordination 95, 96; co-ordinate indexing 96; 'descriptive continuum' 114; total and complete indexing 146-8
Tetragraphs: introduced 111; number in alphabet 120; as example of superimposed coding 138-9. See also polygraphs
Textual matter: and crossovers (false drops) 115-6; and compressed data fields 116-7; and monadic terms 142; and links 143-4; and roles 144; and alphabetic order of terms 149
Things: consist of objects and substances 160-1; a semantic type in the holotheme 161-2; and mass 163
Time: related to changes in the holotheme 163
Transfer sheets: as subfield vehicles 36; in punched feature card indexes 36, 210; and self-verification 210
Transformation: of classifications and hierarchies 97, 98
Transitivity: of relations 214; related to non-associativity 214-6
Translation: coding a form of 77; series of 90; and retrieval 91; in operation sequence 92-3; and strip indexes 148; in conjugate and other indexes 148-9
Transparencies: and punched feature cards 190, 209, 211; and self-verification 209; and cumulative features 211; and copy checking (equivalence) 213
Transverse indexes: use vehicles representing terms transverse to the type used for arranging the information 37; and retrieval 92, 93, 102; obviate search, identification, translation 93; and libraries 102
Transverse names: descriptive, significant, order transverse indexes; contrasted with normal 75
Transversiveness: relation of the extensions of the data field to each other 16; of relations; between sets 216
Typewriter (automatic): and edge-tracked cards 180
Unary terms: and binary terms 17. See also terms
Union: an active relation between sets; symbol for 52; related to intersection 52-4; its affinities with addition 57; in a Venn diagram 68; in a statistical table 68-9; related to transverse terms 71, 72, 93, 94, 95; and digraphs, polygraphs; and graphic coding 111; and direct coding 113; and compression 116-7; an operation 208
Units: as typical inhabitants of a formative stage 158; relations concerning, in the holotheme 217-8. See also data unit, bit
Unit cards: discussed 31-3
Unit questions: introduced 27; related to indexes and to data vehicles 42
Unit indexes: made of unit vehicles 32
Unit vehicles: represent data units 31-3
United terms: equivalent to specific terms 54; related to transversiveness and to intersected terms 77
Uniterm cards: mentioned 45, 193
Universal Decimal Classification: mentioned 102; code sequence; for complex subject matter 150
Universe: the item extension of the data field; all the items 51
Venn diagrams: described 67-8; showing unions and intersections of sets 68
Verification: of 80-column (tab) cards 178; of punched feature cards 178-9; and parity checks 201
Verifiers: described 178
Vertical-visible indexes: described; and plain item cards 186; and punched feature cards 186, 192
Viewing frames (light boxes): and punched feature cards 190
Visible indexing: flat-tray visible 183-6; vertical visible 186; hanging (top-edge) visible 188
Warning cards: and crossovers (false drops) 121
Words: of plain language, and dictionary meanings, contrasted with codewords 77; in computer memories or stores 203
Zero: the origin, the starting point 8; similarity to the empty set 71; and the semantic continuum 115
40-column cards (tab cards): mentioned 174
80-column cards (tab cards): and coding 132; and interpretation 149; described 174-9; acting as feature cards; their coding, and computers 175; equipment for 175-9; as sectional vehicles 198; and primary and secondary data handling 229