BIT 18 (1978), 425-435
2-3 BROTHER TREES H. P. KRIEGEL, V. K. VAISHNAVI and D. WOOD
Abstract. 2-3 brother trees form a dense class of search trees having O(log N) insertion and deletion algorithms. In this paper we provide an O(log N) insertion algorithm and show that these trees have much better density and storage utilization than 2-3 trees. Thus we demonstrate that the "brother property", which has so far been used only for binary trees, can be usefully applied to 2-3 trees.
1. Introduction.
Knuth (Problem 26, page 471 [2]) poses the problem: Investigate the properties of balanced multiway trees. Vaishnavi, Kriegel and Wood [5] demonstrate that the concept of "height balancing" can be usefully applied to 2-3 trees by providing O(log_2 N) insertion and deletion algorithms for height balanced 2-3 trees. The class of height balanced 2-3 trees properly contains the class of 2-3 trees. This fact is reflected in better restructuring behaviour, but worse density and therefore worse search behaviour, for height balanced 2-3 trees as compared to 2-3 trees [5]. Thus height balanced 2-3 trees are suitable for applications with relatively more insertions or deletions than retrievals.
Now the question arises whether other balancing concepts can be similarly applied. In particular, we are interested in a balancing concept which leads to a restricted class of 2-3 trees having O(log_2 N) insertion and deletion algorithms. Such a class of trees is expected to have better density and therefore better search behaviour, at the cost of possibly worse restructuring behaviour, compared to 2-3 trees. This is of interest for applications with relatively more retrievals than insertions or deletions. With this motivation we investigate the 2-3 brother trees obtained by applying the balancing concept "brother property" to 2-3 trees; the brother property being that, except for the sons of a binary root, each binary node has a ternary brother. This paper summarizes the results given in detail in [3].
In particular, an O(log N) deletion algorithm and an estimation of average storage utilization may be found therein.
In section 2 we provide bounds for the height of an N-key 2-3 brother tree. This gives a quantitative idea of the improvement in the worst case height and hence retrieval time for 2-3 brother trees as compared to 2-3 trees. In section 3 we design an O(log N) insertion algorithm for 2-3 brother trees. Because of the brother property we expect 2-3 brother trees to have better storage utilization than 2-3 trees. In order to confirm these expectations we analyze the worst case storage utilization for 2-3 brother trees in section 4. Finally, in section 5 the results obtained in this paper are summarized and compared with those for 2-3 trees and 1-2 brother trees [4].
Work supported partially by a Natural Sciences and Engineering Research Council of Canada Grant No. A-7700 and partially by the German Academic Exchange Service under NATO Research Grant No. 430/402/584/8. Received April 3, 1978. Revised August 22, 1978.
2. 2-3 brother trees.
We assume familiarity with trees and their related notions. A 2-3 tree is a tree in which every internal (non-leaf) node has either 2 or 3 sons, with all external (leaf) nodes at the same level. A node with 2 sons is called a binary node and a node with 3 sons a ternary node. A 2-3 tree is used as a search tree by placing keys in the nodes in the following way: a node with s subtrees (s = 2, 3) accommodates s - 1 keys. All keys in the left (resp., right) subtree of a given node are smaller (resp., larger) in magnitude than the key(s) in the node; should s = 3, the keys in the middle subtree are strictly intermediate in magnitude between the two keys in the node. (Note that this definition of 2-3 trees, storing keys in internal nodes, follows Knuth [2] rather than Hopcroft [1], who introduced 2-3 trees.)
Figure 1.1. Example of a 2-3 tree of height 3. [Diagram not reproduced.]
The above definition implies an algorithm for searching for a given key in a 2-3 search tree. In a successful search (the tree contains the given key) the algorithm will return the node containing the given key; in an unsuccessful search the algorithm will terminate at the leaf in which the key may be inserted. Clearly, the time complexity of the search algorithm is in the worst case O(h), where h is the height of the tree. A 2-3 brother tree is a 2-3 tree satisfying the additional brother property: except for the sons of a binary root, each binary node has a ternary brother.
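The definitions above can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the names `Node`, `search` and `has_brother_property` are hypothetical, external nodes are represented by `None`, an unsuccessful search returns the last internal node visited (the semileaf) rather than the external node itself, and the root, which has no brother, is treated as exempt from the brother property. The checker assumes its input is already a valid 2-3 tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Internal node of a 2-3 tree: s sons and s - 1 keys, s in {2, 3}.

    A son equal to None stands for an external (leaf) node.
    """
    keys: List[int]
    sons: List[Optional["Node"]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.sons:  # semileaf: all sons are external nodes
            self.sons = [None] * (len(self.keys) + 1)
        assert len(self.keys) in (1, 2)
        assert len(self.sons) == len(self.keys) + 1

    @property
    def is_binary(self) -> bool:
        return len(self.sons) == 2


def search(root: Optional[Node], key: int) -> Tuple[Optional[Node], bool]:
    """Return (node, True) if key is found, else (last node visited, False).

    Visits one node per level, so the cost is O(h).
    """
    node, last = root, None
    while node is not None:
        last = node
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and key == node.keys[i]:
            return node, True
        node = node.sons[i]
    return last, False


def has_brother_property(root: Optional[Node]) -> bool:
    """Except for the sons of a binary root, each binary node needs a ternary brother."""
    def check(node: Optional[Node], is_root: bool) -> bool:
        if node is None:
            return True
        sons = [s for s in node.sons if s is not None]
        exempt = is_root and node.is_binary
        if not exempt:
            has_binary = any(s.is_binary for s in sons)
            has_ternary = any(not s.is_binary for s in sons)
            if has_binary and not has_ternary:
                return False
        return all(check(s, False) for s in sons)
    return check(root, True)


# A ternary root whose three sons are all binary violates the brother property.
good = Node([10, 20], [Node([5]), Node([15]), Node([25, 30])])
bad = Node([10, 20], [Node([5]), Node([15]), Node([25])])
print(has_brother_property(good), has_brother_property(bad))
```

In this sketch a binary root with two binary sons is accepted, which corresponds to the exception stated in the brother property.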
Obviously, the class of 2-3 brother trees is properly contained in the class of 2-3 trees. Observe that the 2-3 tree given in Figure 1.1 is a 2-3 brother tree. Since in the worst case the time complexity for searching a key in a 2-3 brother tree is O(h), we are interested in the lower and upper bounds for the height of a 2-3 brother tree with a given number of keys. For this purpose we investigate the minimum and maximum number of keys that can be stored in a 2-3 brother tree of a given height h, denoted by $k_h$ and $K_h$, respectively. Let $k_h^B$ denote the number of keys in a 2-3 brother tree of height h with a binary root, which is a proper subtree of a minimal 2-3 brother tree (observe that in such a tree each binary node has a ternary brother), and let $k_h^T$ denote the number of keys in a minimal 2-3 brother tree of height h with a ternary root. Then the following recurrence equations may be obtained:
\[ k_h^B = 1 + k_{h-1}^B + k_{h-1}^T, \qquad k_0^B = 0, \]
\[ k_h^T = 2 + 2k_{h-1}^B + k_{h-1}^T, \qquad k_0^T = 0, \]
and
\[ k_h = 1 + 2k_{h-1}^B, \qquad k_0 = 0. \]
Using standard methods we obtain:
\[ k_h = \tfrac{1}{2}(1+\sqrt{2})^{h-1} + \tfrac{1}{2}(1-\sqrt{2})^{h-1} - 1 \approx \tfrac{1}{2}(1+\sqrt{2})^{h-1}. \]
Thus
\[ h = \frac{1}{\log_2(1+\sqrt{2})} \log_2 k_h + O(1) \approx 0.7864 \log_2 k_h. \]
As far as "maximal" 2-3 brother trees are concerned, it is clear that they are complete ternary trees. Therefore:
\[ K_h = 2 \sum_{i=0}^{h-1} 3^i = 3^h - 1, \qquad h \ge 1. \]
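The key counts above can be checked numerically. The following sketch evaluates the recurrences for $k_h^B$, $k_h^T$ and $k_h$ directly (it does not rely on a closed form) and computes the two constants that govern the height bounds. The function names are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List


def minimal_key_counts(h_max: int) -> List[int]:
    """Return [k_0, ..., k_hmax] from the recurrences for k^B, k^T and k."""
    kB, kT, k = [0], [0], [0]
    for h in range(1, h_max + 1):
        k.append(1 + 2 * kB[h - 1])                # binary root, two binary-rooted sons
        kB.append(1 + kB[h - 1] + kT[h - 1])       # binary root: one binary, one ternary son
        kT.append(2 + 2 * kB[h - 1] + kT[h - 1])   # ternary root: two binary, one ternary son
    return k


def maximal_key_count(h: int) -> int:
    """K_h for a complete ternary tree of height h: 2 keys per node."""
    return 2 * sum(3 ** i for i in range(h))


k = minimal_key_counts(40)
growth = k[40] / k[39]                           # tends to 1 + sqrt(2)
upper_const = 1 / math.log2(1 + math.sqrt(2))    # coefficient of log2 k_h
lower_const = 1 / math.log2(3)                   # coefficient of log2 (K_h + 1)

print(f"growth factor of k_h : {growth:.6f}")
print(f"1 / log2(1 + sqrt 2) : {upper_const:.4f}")
print(f"1 / log2 3           : {lower_const:.4f}")
```

The growth factor of the minimal key count per level is what fixes the constant in the upper height bound, while the factor 3 of complete ternary trees fixes the constant in the lower bound.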
Thus, for the height of an arbitrary N-key 2-3 brother tree we have
\[ 0.6309 \log_2(N+1) \le h \le 0.7864 \log_2 N. \]
Observe that the height of a minimal 2-3 brother tree is less than that of a minimal 2-3 tree (which is $\log_2(N+1)$). Therefore, the worst case retrieval time in 2-3 brother trees is significantly less than that in 2-3 trees.
For describing the insertion algorithm for 2-3 brother trees we use the following notation:
1. Small letters inside nodes denote keys. [Example diagram not reproduced.]
2. A triangle indicates that it is irrelevant whether the specified node is binary or ternary.
3. A capital letter to the left of or under a node denotes its name. [Example diagram not reproduced.]
In order to shorten the description of the insertion algorithm we will not consider mirror images of the cases.
3. Insertion algorithm.
We assume that, given a key k, the tree is searched to determine the semileaf P at which the key k should be inserted. The key k is inserted in P at the appropriate place and then INSERT(k) is called. In the description of the insertion algorithm, a * on top of a node indicates that it contains 3 keys (and has 4 sons). This implies that the tree needs restructuring.
INSERT(k): On entry, k has been inserted in the node P. As a result:
Case 1: P contains two keys: FINISH.
Case 2: P contains three keys: one of the alternative configurations [diagrams not reproduced] applies; in each of them, CALL UP(P).
We now give the procedure UP.
UP(S): On entry we have one configuration for the first call of UP and another for the recursive calls of UP. [Diagrams not reproduced.]
The procedure UP is given for the latter situation. The description of the procedure UP for the former situation can be obtained simply by replacing all leaf nodes in all the relevant diagrams by square (external) nodes. In each case below the restructuring itself is specified by a diagram [not reproduced], followed by the action listed.
Case 1: S is the root. FINISH.
Case 2: Father P of S is binary.
2.1: S is the right son of P.
2.1.1: Brother of S is binary. FINISH.
2.1.2: Brother of S is ternary. FINISH.
Case 3: Father P of S is ternary.
3.1: S is the right son of P.
3.1.1: One of the brothers of S is binary. FINISH.
3.1.2: Both brothers of S are ternary.
3.1.2.1: At least one of A and B is ternary. CALL UP(P).
3.1.2.2: Both A and B are binary (thus C is ternary). FINISH. If D or E is of the form [diagram not reproduced], then replace the ... [diagram not reproduced]. (Binarizing the ternary roots of the subtrees creates no problems in all the considered cases.)
3.2: S is the middle son of P.
3.2.1: At least one of the brothers of S is binary.
3.2.1.1: Left brother of S is binary. FINISH.
3.2.2: Both brothers of S are ternary. CALL UP(P).
Clearly, in the worst case, the procedure UP may be called for all nodes on the path from a semileaf up to the root. Therefore the time complexity of the insertion algorithm is in the worst case O(h), where h is the height of the tree, and thus it is O(log_2 N), where N is the number of keys in the tree.
4. Storage utilization.
We are interested in the number of internal nodes n(N) needed for storing N keys in a 2-3 brother tree. We derive an upper bound for the value of n(N), denoted by $\bar{n}(N)$. For this purpose, the storage utilization factor f(T) for a 2-3 brother tree T is defined as
\[ f(T) = \frac{\text{number of internal nodes in } T}{\text{number of keys in } T}. \]
Let $\bar{f}$ be the maximum storage utilization factor for all 2-3 brother trees. Then $\bar{n}(N)$ can be expressed as $\bar{f} N$. Let $\bar{f}(h)$ be the maximum storage utilization factor for all 2-3 brother trees of height h. Clearly, $\bar{f}(h)$ is the storage utilization factor of a 2-3 brother tree of height h having the smallest possible number of ternary nodes. Such a tree is a minimal tree of height h (with the minimum number of keys, see Section 2), denoted by $t_h$. Thus
\[ \bar{f}(h) = \frac{\text{number of internal nodes in } t_h}{\text{number of keys in } t_h}, \qquad h \ge 1. \]
In order to compute $\bar{f}(h)$, we define
\[ f_l(h) = \frac{\text{number of internal nodes in } t_h \text{ at level } l}{\text{number of keys in } t_h \text{ at level } l}, \qquad 1 \le l \le h, \quad h \ge 1. \]
Denoting the total number of keys in $t_h$ by $n_h$ (that is, $n_h = k_h$ of Section 2) we have
\[ n_h = n_{h-1} + \frac{1}{f_h(h)} (n_{h-1} + 1). \tag{1} \]
In the above equation we use the fact that the number of internal nodes in level h is equal to the number of keys in $t_{h-1}$ plus 1. (1) can be written as
\[ f_h(h) = \frac{n_{h-1} + 1}{n_h - n_{h-1}}. \]
Substituting for $n_h$ and $n_{h-1}$ (see Section 2) we obtain
\[ f_h(h) = \frac{\tfrac{1}{2}(1+\sqrt{2})^{h-2} + \tfrac{1}{2}(1-\sqrt{2})^{h-2} - 1 + 1}{\tfrac{1}{2}(1+\sqrt{2})^{h-1} + \tfrac{1}{2}(1-\sqrt{2})^{h-1} - 1 - \tfrac{1}{2}(1+\sqrt{2})^{h-2} - \tfrac{1}{2}(1-\sqrt{2})^{h-2} + 1} = \frac{(1+\sqrt{2})^{h-2} + (1-\sqrt{2})^{h-2}}{\sqrt{2}\left((1+\sqrt{2})^{h-2} - (1-\sqrt{2})^{h-2}\right)} \approx \frac{1}{\sqrt{2}} \approx 0.7071. \]
In fact, it can be verified that $f_h(h) = 0.7071$ for h > 8. Thus $f_h(h)$ is in effect independent of h (for h > 8), and for large h
\[ \bar{f}(h) = f_h(h) = \frac{1}{\sqrt{2}} \approx 0.7071. \]
Thus, for large N, 0.7071N is an upper bound for the number of internal nodes needed to store N keys in a 2-3 brother tree. Observe that this bound cannot be improved, because 0.7071 is the storage utilization factor for all minimal 2-3 brother trees of large height. Compared to 2-3 trees, where in the worst case N internal nodes are needed for storing N keys, 2-3 brother trees are significantly better as far as worst case storage utilization is concerned. The lower bound for the number of internal nodes in an N-key 2-3 brother tree is 0.5N, the same as for an N-key 2-3 tree. Thus we have
\[ 0.5N \le n(N) \le 0.7071N. \]
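The limiting factor 0.7071 can be observed numerically. The sketch below counts keys and internal nodes of the minimal trees $t_h$ level by level. The key recurrences are those of Section 2. The node-count recurrences are an assumption introduced here for illustration: they mirror the key recurrences, with each root contributing one node instead of one or two keys. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def minimal_tree_profile(h_max: int) -> Tuple[List[int], List[int]]:
    """Return (keys, nodes), where keys[h] and nodes[h] describe the minimal tree t_h."""
    kB, kT = [0], [0]   # keys in minimal subtrees with binary / ternary root
    mB, mT = [0], [0]   # internal nodes in the same subtrees (assumed recurrences)
    keys, nodes = [0], [0]
    for h in range(1, h_max + 1):
        keys.append(1 + 2 * kB[h - 1])
        nodes.append(1 + 2 * mB[h - 1])
        kB.append(1 + kB[h - 1] + kT[h - 1])
        mB.append(1 + mB[h - 1] + mT[h - 1])
        kT.append(2 + 2 * kB[h - 1] + kT[h - 1])
        mT.append(1 + 2 * mB[h - 1] + mT[h - 1])
    return keys, nodes


def overall_factor(h: int, keys: List[int], nodes: List[int]) -> float:
    """Internal nodes per key in the whole minimal tree t_h."""
    return nodes[h] / keys[h]


def bottom_level_factor(h: int, keys: List[int]) -> float:
    """f_h(h) = (n_{h-1} + 1) / (n_h - n_{h-1}), with n_h the key count of t_h."""
    return (keys[h - 1] + 1) / (keys[h] - keys[h - 1])


keys, nodes = minimal_tree_profile(30)
for h in (3, 5, 10, 30):
    print(h, round(overall_factor(h, keys, nodes), 4),
          round(bottom_level_factor(h, keys), 4))

# Complete ternary trees give the other extreme: (3^h - 1)/2 nodes for 3^h - 1 keys.
print("complete ternary:", ((3 ** 5 - 1) // 2) / (3 ** 5 - 1))
```

Both ratios settle near $1/\sqrt{2}$ as h grows, while the complete ternary tree attains exactly 0.5 internal nodes per key.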
5. Conclusions.
In this paper we have studied the class of 2-3 brother trees and provided an O(log N) insertion algorithm. (An O(log N) deletion algorithm is also available, see [3].) Moreover, we have investigated the following quantities for N-key 2-3 brother trees:
1. Bounds for the height.
2. Bounds for the number of internal nodes (storage utilization).
The results obtained, as well as the corresponding results for 2-3 trees, are summarized in Table 5.1 (in each case the lower bound is in the first line and the upper bound in the second line).
Table 5.1.
                       2-3 brother tree       2-3 tree
height                 0.6309 log_2 (N+1)     0.6309 log_2 (N+1)
                       0.7864 log_2 N         1.0 log_2 (N+1)
storage utilization    0.5 N                  0.5 N
                       0.7071 N               1.0 N
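To get a feel for the size of the gap, the entries of Table 5.1 can be evaluated for a concrete N. The helper below simply plugs N into the tabulated expressions. The function name and the returned dictionary layout are illustrative assumptions, and the upper bounds are the large-N bounds stated in the table.

```python
import math
from typing import Dict, Tuple


def table_5_1(n: int) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Evaluate the (lower, upper) bounds of Table 5.1 for an n-key tree."""
    lg_n, lg_n1 = math.log2(n), math.log2(n + 1)
    return {
        "2-3 brother tree": {
            "height": (0.6309 * lg_n1, 0.7864 * lg_n),
            "internal nodes": (0.5 * n, 0.7071 * n),
        },
        "2-3 tree": {
            "height": (0.6309 * lg_n1, 1.0 * lg_n1),
            "internal nodes": (0.5 * n, 1.0 * n),
        },
    }


bounds = table_5_1(10 ** 6)
for tree, rows in bounds.items():
    for quantity, (low, high) in rows.items():
        print(f"{tree:17s} {quantity:15s} {low:12.1f} {high:12.1f}")
```

For a million keys this gives a height range of roughly 12.6 to 15.7 for 2-3 brother trees, against 12.6 to 19.9 for 2-3 trees.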
We observe that in both of the quantities investigated 2-3 brother trees are significantly better than 2-3 trees. Yao [6] has provided an analysis of the average storage utilization of 2-3 trees. In this case, too, 2-3 brother trees are better than 2-3 trees [3]. Ottmann and Wood [4] study 1-2 brother trees, give bounds for storage utilization for these trees, and compare their results with those for 2-3 trees under the premise that each internal node of a 2-3 tree requires 2 units of storage as compared to 1 in the case of a 1-2 brother tree. Under the same premise we summarize the storage utilization results for 2-3 brother trees, 1-2 brother trees and 2-3 trees in Table 5.2.
Table 5.2.
                       2-3 brother trees    1-2 brother trees    2-3 trees
storage utilization    1.0N                 1.0N                 1.0N
                       1.4142N              1.618N               2.0N
Note that as far as storage utilization is concerned, while 1-2 brother trees are superior to 2-3 trees, 2-3 brother trees are superior to 1-2 brother trees.
Acknowledgement. The authors would like to thank Th. Ottmann and an anonymous referee for helpful comments on an earlier version of this paper, which have hopefully led to an improvement in presentation.
REFERENCES
1. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis of computer algorithms, Addison-Wesley, Reading (1974).
2. D. E. Knuth, The art of computer programming, Vol. III: Sorting and searching, Addison-Wesley, Reading (1973).
3. H. P. Kriegel, V. K. Vaishnavi, and D. Wood, 2-3 brother trees, Computer Science Technical Report 78-CS-6, Department of Applied Mathematics, McMaster University, Hamilton (1978).
4. Th. Ottmann and D. Wood, 1-2 brother trees, Computer Journal (1978), to appear.
5. V. K. Vaishnavi, H. P. Kriegel, and D. Wood, Height balanced 2-3 trees, Computing (1978), to appear.
6. A. C.-C. Yao, On random 2-3 trees, Acta Informatica 9 (1978), 159-170.
COMPUTER SCIENCE GROUP
DEPARTMENT OF APPLIED MATHEMATICS
McMASTER UNIVERSITY
HAMILTON, ONTARIO L8S 4K1
CANADA