Mathematical Biosciences 5 (1969), 327-340

Copyright © 1969 by American Elsevier Publishing Company, Inc.

 

The Theory of the TAXIR Accessioner *

 

GEORGE F. ESTABROOK and ROBERT C. BRILL

University of Colorado, Boulder, Colorado

Communicated by S. M. Ulam

* Paper No. 18 from the Taximetrics Laboratory, University of Colorado, Boulder Colorado; Supported by the National Science Foundation under grant no. GN 656.

 

ABSTRACT

TAXIR is a recently developed and currently operative information-retrieval system for biology. The "accessioner" module of TAXIR automatically abstracts from a collection of items those that satisfy a given locating criterion. The TAXIR accessioner stores a large number of descriptors in relatively little storage space, in such a way that any Boolean combination of those descriptors may be used as a locating criterion. Retrieval is effected by calculation as opposed to comparison.

 

I. INTRODUCTION

TAXIR (the name stands for taxonomic information retrieval), a product of the Taximetrics Laboratory of the University of Colorado, Boulder, Colorado, is a powerful automatic information-retrieval system recently developed for biology (related work may be found in [1-4]). It embraces several independent but related modules, each specifically designed to perform some function basic to information retrieval. These modules may be used separately or together in accordance with the specific information-retrieval requirements of any given TAXIR user.

The TAXIR Accessioner is one of the TAXIR modules. As a locating device, it serves to discover which items in some large collection of items satisfy a given locating criterion. As an abstracting device, it provides requested information associated with items that satisfy a given descriptive criterion. As an identifying device, it provides identifications in response to descriptions of unknowns. How the TAXIR Accessioner works to make an efficient use of time and space while it performs these functions is the topic of this article.

 

II. BASIC CONCEPTS

TAXIR is designed to aid in the management of the information associated with a collection of items (specimens, lots, jars, populations, etc.) such as might be found in a museum or in a collection assembled for monographic, floristic, or faunistic work. The term item is used consistently throughout this article to refer to one of these basic elements of the collection.

Information for use in the TAXIR system is structured into descriptors. There are many ways of envisioning this concept. Let us discuss two.

1. A descriptor is a basis for comparing any two items in the collection.

With respect to this basis for comparison, it must always be possible to decide if two items are similar or different. An example of a descriptor is the family to which an item belongs. Two items are similar if they belong to the same family and different if they belong to different families. Likewise, country of collection, collector, genus, species, storage location in museum, month of collection, and many more bases for comparison could be considered as useful descriptors.

2. A descriptor is a partition, or a dividing into exclusive and exhaustive subsets, of the collection of items. The notion of similar and different discussed in the preceding paragraph is derived from this partition in a natural way: items that belong to the same subdivision of the partition are similar; those that belong to different subdivisions are different. Any partition can be a descriptor; to be useful, however, the notion of similar and different associated with the partition must be interpretable in a meaningful way.

This notion of similar and different is known mathematically as an equivalence relation. A theorem in mathematics teaches us that there is a bi-unique correspondence between equivalence relations and partitions. A moment's reflection, however, should convince us that concept (1) and concept (2) are indeed the same thing.

In analogue with the two equivalent concepts of descriptor, there are two concepts of descriptor state. If we think of a descriptor as a partition, then we may think of a descriptor state as one of the "classes" or subdivisions of the partition. If we think of a descriptor as a basis for comparison, then we may think of a descriptor state as all the items characterized by one of the possible conditions of that basis. In this way, a description of the common property shared by all the items which belong to the same state of a descriptor may be considered as a name for that descriptor state. To continue our earlier example, the descriptor "family" would have for its state names the names of the families represented by the items in the collection.

To design an information-retrieval system using the TAXIR modules, it is necessary to have a specific collection of items in mind. It is also necessary to designate some number of descriptors for this collection. The data bank is the totality of items, each respectively associated with the descriptor states to which it belongs.

The totality of all descriptor names, together with all descriptor state names, is called the control vocabulary. With this vocabulary, enriched slightly by the addition of a few special words, expressed through the grammar and syntax of the TAXIR language, we communicate with the TAXIR Accessioner.

The remaining basic concept required to describe the class of data banks with which the TAXIR Accessioner can be meaningfully employed is that of item identification number. Each item in the collection must be assigned an item identification number (positive integer). It is through this item identification number that the vital correspondence between the item in the collection and its associated information in the TAXIR Accessioner is maintained. The TAXIR user is free to assign item identification numbers in accordance with any scheme he chooses to employ.

 

III. THE TAXIR ACCESSIONER QUERY LANGUAGE

A. Boolean Expressions

Boolean expressions are not only important for understanding the query language of the TAXIR Accessioner, but are also central to understanding the design theory of the accessioner module itself. The accessioner module of the TAXIR system accepts locating criteria as Boolean expressions, and responds by providing lists of item identification numbers for items that satisfy these criteria. The query language of the TAXIR Accessioner thus embraces the algebra of Boolean expressions.

In general, whenever a set of operands is specified, Boolean combinations of operands (Boolean expressions) may be inductively defined as follows.

i. If A is an operand, then A is a Boolean expression.

ii. If A and B are Boolean expressions, then so are: NOT (A); (A) AND (B); (A) OR (B).

iii. If A is a Boolean expression, then so is any logical equivalent of A.

A good, relatively unsophisticated, and very readable discussion of logical equivalency is given in Chapters 1 and 2 of [7].

In the specific case of the TAXIR Accessioner query language, the set of operands may be defined with the concept "descriptor range." A range for a descriptor is a combination of some of the states for that descriptor, which can take the forms.

1. a descriptor state: abbreviated DS;

2. FROM DS TO DS;

3. X1 OR X2 OR ... OR Xn where Xi is either DS or FROM DS TO DS, i = 1, 2,..., n, and n is any integer greater than 1.

An operand for the Boolean expressions implicated in the TAXIR Accessioner query language may now be defined as a pair: descriptor name, any range for that descriptor. (The one other form that a TAXIR operand can take is discussed at a more appropriate point, the end of Section IV.) The function of the TAXIR Accessioner may now be more precisely stated: (a) to accept a Boolean expression built as defined, and (b) to respond with the list of item identification numbers for exactly those items which satisfy that Boolean expression.

B. Example

In order to illustrate the concepts presented in this development, an example is provided here and carried throughout. The example data bank is described by Table I. In this data bank there are 10 items whose item identification numbers are shown in the first column. There are four descriptors: flower color, leaf venation, geographical location, and month of flowering. The states of one of these descriptors, month of flowering, have a natural order, April, May, June, July, and August, of which the TAXIR Accessioner is apprised. The appropriate state names are given in the body of the table.

TABLE I

Item I.D.

number

Flower

color

Leaf

venation

Geographical

location

Month of

flowering

100 Blue Palmate Vermont April
102 Blue Pinnate Delaware August
120 Red Unknown Maryland May
116 White Parallel Virginia June
103 Violet Palmate Maryland June
114 Red Palmate Maryland July
115 Pink Parallel New Jersey April
106 Yellow Pinnate Virginia May
110 Pink Pinnate Maryland June
112 Blue Pinnate New York August

A query that could be posed to the TAXIR Accessioner concerning this small example data bank might be:

Query--List items with (flower color, red OR pink OR NOT leaf venation, AND month of flowering, FROM April TO June AND geographical location, Maryland*.

An easy scan of Table I reveals that the items that constitute the response are those with identification numbers 120, 103, and 110.

 

IV. BASIC DESIGN OF THE TAXIR ACCESSIONER

The information represented by a data bank is processed and stored by the TAXIR Accessioner in such a way that the list of items that satisfy any given Boolean expression may be readily discovered. The concept of a characteristic function is central to the understanding of this process.

A. Characteristic Functions

Let us denote the collection of items implicated in a data bank with the letter S. To each subset A of S there corresponds a characteristic function, denoted XA.   XA is the function with domain of arguments, S, defined as follows.

XA(x) = 1 if x ∈ A;

XA(x) = 0 if x ∉ A.

Each state of each descriptor may be thought of as a subset of S. Let A be such a subset. The information concerning which items of S belong to A (i.e., are described by the descriptor state in question) is embodied in XA.

Characteristic functions are a particularly convenient means of storing information in computing machines, as they may be represented as strings of bits. If a fixed but arbitrary linear order of the items in S is determined (automatically by the TAXIR Accessioner itself), the bit string corresponding to XA may be constructed to have a 1 in the ith place if the ith item in the linear order of the items of S belongs to A; otherwise 0.

Characteristic functions are also convenient for executing any Boolean expression, for there is a natural analogue between characteristic function operations and Boolean operations. If A is the subset of S of items with flower color, red, then S - A is the subset of S of items with NOT flower color, red.

‾XA(x) = XS-A(x) = 1 if XA(x) = 0;

‾XA(x) = XS-A(x) = 0 if XA(x) = 1;

If B is the subset of S of items with leaf venation, pinnate, then AB is the subset of S of items with flower color, red AND leaf venation, pinnate.

(XA AND XB)(x) = XAB (x) = XA(x) · XB(x).

Similarly, AB is the subset of S of items with flower color, red OR leaf venation, pinnate.

(XA OR XB)(x) = XAB(x) = max [XA(x), XB(x)].

If characteristic functions for the operands in a Boolean expression are available, the characteristic function for the subset of S that satisfies which Boolean expression (result characteristic function) may be calculated as illustrated above. These calculations, often referred to as logical arithmetic, are easily and quickly performed by modern computers.

B. Example Continued

The 10 items in the example data bank are placed in the order in which they are described to the system (Section III, Table I). Assume that this order is 100, 102,..., 112, the order in which they occur in Table I.

The operand characteristic functions necessary to compute the result characteristic function for the example query are shown in Table II.

TABLE II

Operand

Characteristic function

Flower color, red OR pink

0

0

1

0

0

1

1

0

1

0

leaf venation, pinnate

0

1

0

0

0

0

0

1

1

1

Month of flowering, FROM April TO June

1

0

1

1

1

0

1

1

1

0

Geographical location, Maryland

0

0

1

0

1

1

0

0

1

0

Executions

 

leaf venation, pinnate

0

1

0

0

0

0

0

1

1

1

 

NOT

NOT leaf venation, pinnate

1

0

1

1

1

1

1

0

0

0

OR

OR

Flower color, red OR pink

0

0

1

0

0

1

1

0

1

0

 

1

0

1

1

1

1

1

0

1

0

AND

AND

Month of flowering, FROM April TO June

1

0

1

1

1

0

1

1

1

0

 

1

0

1

1

1

0

1

0

1

0

AND

AND

Geographical location, Maryland

0

0

1

0

1

1

0

0

1

0

Result characteristic function

0

0

1

0

1

0

0

0

1

0

The result characteristic function has 1's in positions 3, 5, and 9 corresponding to items 120, 103, and 110 as before.

C. Base Characteristic Functions

To store a characteristic function for every conceivable operand (descriptor, descriptor range) for a data bank would be convenient for computation but would require a great deal of space. The TAXIR Accessioner stores instead base characteristic functions. These are significantly fewer in number and have the property that any descriptor state characteristic functions (and thus any operand) can be calculated from them. We are assured of the existence of base characteristic functions by the following theorem.

Let T be the set of characteristic functions for the subsets of S. A choice function for T will be a function with domain T and range T. If p is a choice function for T, then for CT, p(C) = C or p(C) = ‾C. Choice functions are similar to characteristic functions of characteristic functions.

 

THEOREM 1. Let X0, X1, X2,..., Xm-1 be the characteristic functions for the m states of a descriptor for the set S of items. Then (1) there exists a subset UT, U = [C1, C2,..., Cn] (the base characteristic functions) with n < ln2(m) + 1, and (2) for each i, i = 0, 1, 2,..., m - 1, there exists a corresponding sequence of choice functions pi1, pi2, ..., pin such that for all s ∈ S, Xi(s) =j=1 n [pij (Cs)](s), where denotes arithmetic product (logical AND).

 

Proof. Let Cj(s) = 1 iff for that value of i such that Xi(s) = 1, i divided by 2j leaves h as a remainder, with 2j-1h < 2j. In this way the set U is constructed.

Let bnbn-1bn-2· · · b1 be the binary representation of i. Choose a sequence of choice functions pi1, pi2, ..., pin for i with the following values on U.

pij(Cj) = Cj   if bj = 1

and

pij(Cj) = ‾Cj   if bj = 0,   j = 1, 2, ..., n.

Now, Xi(s) = 1 iff ∀ j[Cj(s) = 1 iff bj = 1] iff [pi1(C1)](s) · [pi2(C2)](s) · ... · [pin(Cn)](s) = 1, as was to be shown.

The members of the set U are the base characteristic functions for the descriptor in question. When m is large, n will be significantly smaller (n < ln2(m) + 1). In order to reconstruct Xi from U, the binary representation of the ordinal code i provides the appropriate choices in U, for an item s belonging to state i has Cj(s) = bj.

D. Example Continued

Consider the descriptor geographical location, whose seven states, their ordinal codes, and their characteristic functions are shown in Table III. (The state "unknown" is tacitly considered to be a state of every descriptor; it is assigned the ordinal code zero in all cases.)

TABLE III

 

 

 

Characteristic functions

State name

Ordinal code

 

100

102

120

116

103

114

115

106

110

112

Unknown

0

X0 =

0

0

0

0

0

0

0

0

0

0

Vermont

1

X1 =

1

0

0

0

0

0

0

0

0

0

Delaware

2

X2 =

0

1

0

0

0

0

0

0

0

0

Maryland

3

X3 =

0

0

1

0

1

1

0

0

1

0

Virginia

4

X4 =

0

0

0

1

0

0

0

1

0

0

New Jersey

5

X5 =

0

0

0

0

0

0

1

0

0

0

New York

6

X6 =

0

0

0

0

0

0

0

0

0

1

The three base functions for the descriptor geographical location are presented in Table IV.

TABLE IV

Characteristic Functions

Ordinal code

 

100

102

120

116

103

114

115

106

110

112

1

C1 =

1

0

1

0

1

1

1

0

1

0

2

C2 =

0

1

1

0

1

1

0

0

1

1

3

C3 =

0

0

0

1

0

0

1

1

0

1

 

To illustrate the construction of characteristic functions for descriptor states from base characteristic functions, consider X3, the characteristic function for the state Maryland. The ordinal code for Maryland is 3.

310 = 0112

Thus by proof of Theorem 1

X3 = NOT C3 AND C2 AND C1

C3

= 0001001101

NOT C3

= 1110110010

C2

= 0110110011

NOT C3 AND C2

= 0110110010

C1

= 1010111010

NOT C3 AND C2 AND C1

= 0010110010

 

= X3

 

E. Reconstruction of Operands

A locating criterion is presented to the TAXIR Accessioner as a Boolean expression. Recall the definition, given in Section III, A, of the operands for these Boolean expressions. In order for the TAXIR Accessioner to execute a Boolean expression it is necessary to reconstruct, from the base characteristic functions, characteristic functions for the operands from which this Boolean expression is made.

We have already illustrated how the characteristic functions for operands of the first form listed in Section III, A, "descriptor, descriptor state", can be reconstructed from the base characteristic functions (see proof of Theorem 1). Characteristic functions for operands of the second form listed earlier, FROM DS TO DS, could be constructed by reconstructing the characteristic functions for each state occurring in the FROM-TO range, and then connecting them by logical or operations. This procedure, however, could give rise to extremely long logical expressions, which would be inefficient to store or execute. Fortunately, there exists a relatively concise logical combination of the base characteristic functions themselves which is logically equivalent to the characteristic function for that subset of the items which fall into a given range of descriptor state values for a descriptor. Thus, characteristic functions for operands of the form FROM DS TO DS may be constructed from base characteristic functions even more readily and simply than from descriptor state characteristic functions.

 

(a) The FROM-TO Algorithm

As in the proof of Theorem 1, let C1, C2,...,Cn be the n base characteristic functions for the m states of a descriptor whose states have the natural order 1, 2, 3,..., m-1. Suppose we want to construct the characteristic function for the descriptor range from X to Y where 1 ≤ ‾X < ‾Ym-1, where ‾X is the ordinal value of the state name X and ‾Y is the ordinal value of the state name Y, in the respective ordered sequence of states. Let an an-l · · · al be the binary representation of ‾X and bn bn-l · · · bl be the binary representation of ‾Y. Let k be that index for which akbk but ai = bi for all i > k (k may equal n in some cases). We now have, as a necessary condition for items to belong to the FROM X TO Y range, the characteristic function

N = pn(Cn) AND pn-1(Cn-1) AND · · · AND pk+1(Ck+1)

where

pi(Ci)

= Ci if ai = 1

 

= ‾Ci if ai = 0.

Unless ai = 0 for all ik and bi = 1 for all ik, this condition is not sufficient.

In the nonsufficient case (by far the more common) we can find a number I with binary representation an an-1 · · ·  ak 1 1 · · · 1 and ‾XI ≤ ‾Y with not both equalities holding. It is now convenient to construct L, the characteristic function for the interval [‾X, I] and, whenever I < ‾Y, M, the characteristic function for the interval [I + 1, ‾Y]. The characteristic function for the descriptor range FROM X TO Y is now of the form

N AND (L OR M)

To construct L, consider the numbers ak ak-1 · · · a1 (ak = 0 as ‾X < ‾Y). These digits are scanned from right to left. The scan stops with aq = 1, k > q, ai = 0 for all i < q or with ak if ai = 0 for all i < k. In correspondence with ak is "NOT Ck." In correspondence with each subsequent digit ai scanned is "AND Ci" if ai = 1, but "AND (Ci OR (NOT Ci" if ai = 0. When the scan ends, the parentheses are closed and we have the expression for L.

The Boolean expression M is constructed in a similar manner. There, we must consider bk bk-1 · · · b1. In this case bk is known to be 1, and we scan from left to right, stopping with bk if all previous bi are 1, else stopping with bq = 0 and bi = 1 for all i < q. In correspondence with each digit bi scanned is "AND NOT Ci" if bi = 0, but "AND (NOT Ci OR (Ci" if bi = 1. When the scan ends, the parentheses are closed and we have the expression for M.

As an arbitrary example, let us suppose a data bank concerns the twentieth-century collections of some plant genus. One of the descriptors might be year of collection and its states might be 1900, 1901, 1902,..., 1968, present, in that order. In this case state 1 is 1900, state 2 is 1901, and so forth, up to state 70 is present.

A query to the TAXIR Accessioner on this bank might be

Query--List specimens with year of collection FROM 1925 TO 1942*

The descriptor year of collection has 71 states; thus it gives rise to seven base characteristic functions, which we call C1, C2,..., C7. The year 1925 is state 26 with binary representation 0011010 and 1942 is state 43 with binary representation 0101011. In this case k = 6, I = 31, N = (NOT C7), L corresponds to the range 26-31, and M corresponds to the range 32-43.

L = (NOT C6 AND C5 AND C4 AND (C3 OR (NOT C3 AND C2)))

M = (C6 AND NOT C5 AND (NOT C4 OR (C4 AND NOT C3)))

Thus the characteristic function for the specimens with year of collection FROM 1925 TO 1942 is

N AND (L OR M) = (NOT C7) AND ((NOT C6 AND C5 AND C4 AND (C3 OR (NOT C3 AND C2))) OR (C6 AND NOT C5 AND (NOT C4 OR (C4 AND NOT C3)))).

 

(b) The General Descriptor Range

The most general form of a descriptor range (cf. Section III, A) is type 3. Now that we can reconstruct characteristic functions for single states as well as for FROM-TO intervals, the general descriptor range is built by joining the components Xi with the logical or operation exactly as shown in Section III, A. The name of the descriptor specifies which collection of base characteristic functions will be implicated in the descriptor range and the following "operations"; OR or FROM-TO specify how these base characteristic functions should be combined to construct the operand.

 

(c) The TAXIR Operand

Recall a query language operand as previously defined to be of the form "descriptor name, any range for that descriptor." It is now known in the light of the execution theory for the TAXIR Accessioner that operands are characteristic functions (Section IV, Table II). When a query is executed, a result characteristic function is produced, which is then interpreted into item identification numbers for the convenience of the user. This result characteristic function is not destroyed, however, and may be implicated as an operand in the query immediately following. The word RESULT is the query language operand that stands for the result characteristic function of the preceding query. It may be used in a query in the same way as any previously defined operand.

Now that the definition of operand has been completed, the inductive definition of Section III, A completes the exposition of the TAXIR Accessioner query language and execution design.

 

F. Example Concluded

The example data bank (Table I) is processed as described into base characteristic functions and a dictionary of the control vocabulary as shown in Tables V and VI.

TABLE V

DICTIONARY

1. Flower color

2. Leaf venation

3. Geographical location

4. Month of flowering

1. Blue

l. Palmate

1. Vermont

1. April

2. Red

2. Pinnate

2. Delaware

2. May

3. White

3. Parallel

3. Maryland

3. June

4. Violet

 

4. Virginia

4. July

5. Pink

 

5. New Jersey

5. August

6. Yellow

 

6. New York

 

 

 

TABLE VI

BASE CHARACTERISTIC FUNCTIONS

 

100

102

120

116

103

114

115

106

110

112

C11

1

1

0

1

0

0

1

0

1

1

C12

0

0

1

1

0

1

0

1

0

0

C13

0

0

0

0

1

0

1

1

1

0

C21

1

0

0

1

1

1

1

0

0

0

C22

0

1

0

1

0

0

1

1

1

1

C31

1

0

1

0

1

1

1

0

1

0

C32

0

1

1

0

1

1

0

0

1

1

C33

0

0

0

1

0

0

1

1

0

1

C41

1

1

0

1

1

0

1

0

1

1

C42

0

0

1

1

1

0

0

1

1

0

C43

0

1

0

0

0

1

0

0

0

1

Unless an order for the states of a descriptor is specified, as in the case of month of flowering, they are given their ordinal codes in the order in which they are encountered in the TAXIR "define-item" statements, which assert state membership for each item. Here Cij is base function j for character i. Recall the sample query

     Query--List items with (flower color, red OR pink OR NOT leaf venation, pinnate) AND month of flowering, FROM April TO June AND geographical location, Maryland*.

This is the form in which the user poses the query. The TAXIR Accessioner automatically and internally translates this to its equivalent form in base characteristic functions:

(((NOT C11 AND 12 AND NOT C13) OR (C11 AND NOT C12 AND C13)) OR NOT (NOT C21 AND C22)) AND (NOT C43 AND ((NOT C42 AND C41) OR C42)) AND (C31 AND C32 AND NOT C33)

A result characteristic function (such as that in Section IV, E, (c)) may now be calculated and the appropriate response provided.

 

V. CONCLUDING REMARKS

The TAXIR System is a currently operative, computer-assisted information storage and retrieval system comprised of several related but independent modules, each specifically designed to serve a particular function in information retrieval. The TAXIR Accessioner is one such module. Its task is to accept any Boolean expression (logical combination of search criteria) for a collection of items and respond with the list of items that meet the conditions of that Boolean expression.

The system stores the data bank (records of descriptor state membership for each item) as base characteristic functions. This technique enables vast amounts of information to be stored in relatively little space. Boolean operands can be reconstructed by a simple calculation with these base characteristic functions. Further, the ease and flexibility with which any logical combination of search criteria may be implicated, through execution with base characteristic functions, makes feasible the presently uncommon practice of actively using any conceivable criterion instead of just the three or four most critical ones.

For queries with moderately sophisticated logical combinations of search criteria (operands), such as the query in the example in this exposition, the TAXIR Accessioner executes with considerable efficiency, as calculation largely replaces traditional file searching.

 

APPENDIX

If an equiprobable measure w is assumed for S, it is possible to make a brief information theoretic analysis [6] of the TAXIR Accessioner. To the jth of n descriptors, Kj, may be associated a probability distribution, Pj(x), over the states x; Pj(x) = s∈xw(s).

The information content of Pj(x) may be computed as

-x Pj(x)ln2(Pj(x)) ≡ H(Kj).

If stochastic independence is not assumed among the descriptors, the information content of the entire data bank is

H(j=1n Kj) = I,

which measures the information theoretic minimum of bits per item required for its storage. To realize this minimum, analysis of the stochastic dependence between descriptors must be made. Although this is possible [5] it is not practical at this point, for each time new items are added to S by users, the distributions Pj(x) would, in general, be changed and the data bank would have to be restructured for storage. Thus the TAXIR Accessioner treats descriptors for storage as if they were independent.

With this constraint, the information-theoretic minimum of storage would be

j=1n H(Kj) = JI bits per item of storage.

The change in Pj(x) with the addition (or deletion) of items in S also precipitates, in general, changes in H(Kj) that would require restructuring the data bank. Thus the TAXIR Accessioner treats the descriptor as if the distributions Pj(x) were themselves equiprobable. With this constraint, the information-theoretic minimum of storage would be

j=1n ln2(mj) = KJI bits per item of storage

(where mj is the number of states of Kj).

For algorithmic considerations in the execution of queries, it is further desirable to store for each item a whole number of bits for each state. This renders possible the efficient process for the reconstruction of operands described in the article. Thus the storage requirements of the TAXIR Accessioner data bank (excluding, of course, the program itself) are exactly

j=1n ln2(mj) = AKJI,

where [x] may be read "least integer not less than x," bits per item. These losses, minor in most cases, are more than made up for by storage and computation convenience.

 

REFERENCES

1 C. J. Austin, The MEDLARS system, Datamation 10(1964), 28-31.

2 C. R. Blunt, 1966. An information retrieval system model. HRB-Singer, Millersville, Pennsylvania, April, 1966.

3 C. R. Blunt, R. Duquet, and P. Luckie, 1967. Simulation of information systems, in Proc. Amer. Documentation Inst., 5th Ann. Meeting (New York, 1967), Vol. IV, pp. 75-79. Washington, D.C.

4 D. L. Childs, Feasibility of a set-theoretic data structure, Tech. Rept. 6 (1968), CONCOMP, Univ. of Michigan.

5 G. Estabrook, 1967. An information theory model for character analysis, Taxon 16(1967), 86-97.

6 Feinstein, Foundations of information theory. McGraw-Hill, New York, 1958.

7 J. G. Kemeny, H. Milkil, J. L. Snell, and G. L. Thompson, Finite mathematical structures. Prentice-Hall, Englewood Cliffs, New Jersey, 1960.