McGill University:
School of Computer Science
Winter 1999 Projects for 308-251B

DATA STRUCTURES AND ALGORITHMS

Project #18: de la Briandais Tree

Last update: February 25, 1999

http://www.CS.McGill.CA/~cs251/OldCourses/1997/topic7 Rene de la Briandais proposed his tree in the context of file searching during a 1959 Western Joint Computer Conference (WJCC). A de la Briandais tree is a positional tree in which there are paths from root to a leaf corresponding only to actual keys in a keyset. This is to be distinguished from another type of positional tree, called a trie, in which the number of paths is intermediate to the number of possible or actual keys. A de la Briandais tree is a positional tree with character nodes in which keys correspond to paths from root to leafs and every leaf represents a key. ------------------------------------------------------------------------------------------ Fig1.: A Node of a de la Briandais tree: node = { *child pointer, key, *sibling pointer } __________________________________ | | | | |Child | |Sybling | |Pointer |KEY |Pointer |-------------------> | [] |[letter] | [] | |__________|___________|__________| | | | V ------------------------------------------------------------------------------------------ Fig2.: A de la Briandais tree for the keys: hal, hul, hem, tan, tin, tim, ted | | V [][h][] ---------->[][t][]---->(N) | | | | V V [][a][]-->(N) [][a][]------------>[][i][]-------------------------->[][e][]---->(N) | | | | | | | | V V V V [][l][]-->(N) [][n][]---->(N) [][n][]-->[][m][]-->(N) [][d][]-->(N) | | | | | | | | | | V V V V V [][$][] [][$][] [][$][] [][$][] [][$][] where: (N) = Null Pointer [][$][] = end of word marker, whose sibling and child_pointer are both Null. ------------------------------------------------------------------------------------------- ALGORITHMS/IMPLEMENTATION: The following are algorithms that I wrote for INSERTION, DELETION, and MEMBER: INSERTION (tree pointer TREE, word WORD) { out = 0; i = 0; letter = word [at position i]; current = TREE; //we start with the first node DO WHILE (( i <= length of WORD) and (out=0)) { IF (letter equals (current->key)) //if we have a match { i = i +1; letter = word[ at position i]; //get next letter current = current --> child_pointer; //follow the child } ELSE //we have to check siblings {IF (current --> sibling_pointer does not equal Null) //if there is a sibling current = current->sibling //check it ELSE out=1; //no sibling, we add new node } } (end of DO WHILE) // we're at the point of insertion of new node unless the word was already there: IF (out = 0) EXIT //if the word was already there, exit // otherwise add a new node with the current letter current-->sibling_pointer = CREATE new NODE (letter); i = i + 1; // and move on to append the rest of the letters FOR (m=i ; m<= length of WORD; m++) { current.child_pointer = CREATE new NODE ( WORD[at position m]); current = current --> child_pointer; } // now we just add the $ marker for end of word current.child_pointer = CREATE new TERMINAL $ MARKER NODE; }end of INSERTION ---------------------------------------------------------------------------------------- DELETE (tree pointer TREE, word WORD) { current = TREE; // we start with the first node again out=0; letter = WORD [at position i]; i=0; DO WHILE (( i <= length of WORD) and (out=0)) { IF (letter is equal to current --> key) current = current -> child_pointer; //we move down one level i = i + 1; letter = WORD [at position i]; // we move onto next letter ELSE ( IF there is a sibling node) {last_sibling = current; //store the last sibling current = current --> sibling // move right in the tree to the sibling node } (IF there is no sibling node) out =1; // EXIT with ERROR because the word to be deleted is not there } DO WHILE PUSH last_sybling --> sybling_pointer onto STACK PUSH all the children of last_sybling -->sybling_pointer onto STACK one by one until the $ marker Set all of the pointers on STACK to Null.as you're POPPING them off the STACK } end of DELETE ----------------------------------------------------------------------------------------------- MEMBER (tree pointer TREE, word WORD) { out = 0; i=0; letter = WORD [at position i]; //start with the first letter of the WORD current = TREE; //and the first node of the TREE WHILE ((i <= length of WORD) and (out =0)) DO { IF (letter is equal to current --> key) //if we have a match { current = current --> child_pointer // go down one level in tree i = i + 1; letter = WORD [at position i] // move on to the next letter ELSE (IF there is a sybling, ie., current --> sybling_pointer is not equal to Null) { current = current --> sybling_pointer // move right along same level} (ELSE if there is no sybling EXIT, out = 1) }(WHILE) IF ( out is equal to 0) { IF (current --> child_pointer points to a TERMINAL $ marker) RETURN (True); } ELSE RETURN (False); }end of MEMBER ---------------------------------------------------------------------------------------------- ANALYSIS: The MEMBER, INSERT and DELETE times for these algorithms is O(n log n) Let n be the average number of nodes on each level i if we make the assumption that the nodes are equally likely to occur, the expected number of comparisons for the ith level is = (1 + n)/2 if the average number of levels is h, the expected number of comparisons is E() = Sigma (from i=1 to h) ( (1 + n)/2 ) which is approximately equal to = h * ( 1 + na ) / 2 where na is the size of the filial set averaged over all levels. na = (1/h) * Sigma (from i=1 to h) ( n ) Thus, de la Briandais search time is O ( na * h). but h is approximately equal to log (base na) N ie., h = log (base na) N where N is the total number of words in de la Briandais tree. Thus, we get E() = O ( na * h) = O ( na * log (base na) N ) = O (n log n) ------------------------------------------------------------------------------------------------ de la Briandais trees have been implemented successfully as a way of storing dictionnaries. What makes them advantageous over an ordinary trie is that an ordinary trie uses an array of pointers for all possible letters (26 letters in case of English alphabet) for every node. Thus, each level in the tree has at least that one array of pointers, one for each letter. In my example (see Fig2) there would have to be one array at level 1 with only two pointers used (h and t) and 25 not used. At the second level we'd have two arrays, first with only one pointer used (l) and second with three pointers used (a, i, e). One can clearly see, that there is a lot of wasted space happening with tries, compared to de la Briandais. Same goes with PATRICIA, except that with PATRICIA we have some compaction and the need for array-nodes with only one pointer used is practically eliminated. Although this saves on space, it still uses up more of it than de la Briandais, which is clearly the most economical for space. The only time de la Briandais doesn't do as well as tries on space is when the filial set at a level is very high, (that is, when there is a word for every letter in the node array). In this case, de la Briandais actually uses up more space because of it's two pointers per letter. An ordinary trie would hold only an array of pointers, each member of the array being used. Thus, de la Briandais is not convenient for levels with a high filial set. A further disadvantage of de la Briandais is that it takes longer to insert, search and delete. With ordinary tries or PAT, insertion and deletion is proportional only to the height of the tree, not to the filial set. Thus, one would want to use de la Briandais when one wants to minimize on the usage of space for storage while still holding on to reasonable time ( O(n log n)) of the operations such as delete, insert and member. The important thing to notice is that the time needed to list all of the elements of a de la Briandais, an ordinary trie or PATRICIA is the same. Thus, if we were to implement a database where updates and changes are relatively infrequent compared to listing all the members then de la Briandais seems ideal, since its slower performance on updates becomes less important due to their infrequency. Such is the case of a file directory system, where on systems like UNIX, the command 'ls' is used much more often than save or remove. In cases such as this, de la Briandais seems like an ideal option. The filial set at each level remains fairly low as long as the number of files is not too large, and there is some repetition in the prefixes of filenames. This can often be the case, however, since filenames are often sequential, eg., tomasz1, tomasz2, tomasz3, tomasz4, etc. Storing such a sequence of keys would be much more economical using de la Briandais than an ordinary trie or PAT. ------------------------------------------------------------------------ Bibliography: Renee de la Briandais, "File Searching Using Variable Length Keys", Proceedings of the Western Joint Computer Conference. "Data Types and Structures" C. C. Gotlieb and L. R. Gotlieb University of Toronto. Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1978 "A Practical Introduction to Data Structures and Algorithm Analysis" Clifford A. Shaffer. Prentice Hall, Upper Saddle River, New Jersey, 1998. "Data Structures and Algorithms" Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman. Bell Telephone Laboratories, Incorporated, 1983 "Data Structures and Algorithm Analysis" Mark Allen Weiss. Florida International University, 1992 The creator of this page is: Tomasz Neugebauer http://tom.biodome.org/