Binary Search Trees
CMPT 225
ADTs related to Sets
- Set: unordered collection of values/objects.
- Operations:
- insert(x) // add x to set
- member(x) // check if x in set. a.k.a. find(x), search(x), lookup(x), …
- remove(x) // remove x from set
- size() // get size of set
- empty() // is set empty
- clear() // remove all elements (i.e., make set empty)
- We call the values we store keys,
- We assume the keys are from some totally ordered set S.
- i.e. for any two keys $x, y \in S$, we have exactly one of $x < y$, $x = y$, $x > y$.
- We want implementations where all operations are efficient/fast.
- Q: What will count as “fast”?
ADTs related to Sets
Consider the time complexity of operations for simple list and array implementations:
type | insert | find | remove |
---|---|---|---|
un-ordered array | (green) O(1) | O(n) | O(n) |
ordered array | (red) O(n) | (purple outline) O(log n) | (red) O(n) |
un-ordered linked list | (green) O(1) | (red) O(n) | (red) O(n) |
ordered linked list | (red) O(n) | (red) O(n) | (red) O(n) |
Q: What will count as “fast”?
A: Time O(log n) //n is size of set
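As a rough illustration of the "ordered array" row of the table, here is a minimal Python sketch of a set kept in a sorted list. The class name `SortedArraySet` is hypothetical (it is not from the slides). It relies on the standard `bisect` module, so `member` needs only O(log n) comparisons, while `insert` and `remove` still pay O(n) to shift elements.

```python
import bisect


class SortedArraySet:
    """Set stored as a sorted Python list (illustrative sketch only)."""

    def __init__(self):
        self._keys = []

    def member(self, x):
        # Binary search: O(log n) comparisons.
        i = bisect.bisect_left(self._keys, x)
        return i < len(self._keys) and self._keys[i] == x

    def insert(self, x):
        # The search is O(log n), but list.insert shifts elements: O(n).
        i = bisect.bisect_left(self._keys, x)
        if i == len(self._keys) or self._keys[i] != x:
            self._keys.insert(i, x)

    def remove(self, x):
        # O(log n) search, then an O(n) shift to close the gap.
        i = bisect.bisect_left(self._keys, x)
        if i < len(self._keys) and self._keys[i] == x:
            del self._keys[i]

    def size(self):
        return len(self._keys)


s = SortedArraySet()
for x in [5, 2, 8, 2]:
    s.insert(x)
print(s.size(), s.member(8), s.member(3))  # 3 True False
```

Only `member` reaches the O(log n) target here, which is why the slides go on to look for a different structure.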
Some Related Container ADTs
- Multiset: like set, but with multiplicities (aka bag)
- count(x)
- Map: unordered collection of <key,value> pairs, associating at most one value with each key. (e.g. partial function keys -> values)
- put(key,val) // in place of insert(x)
- get(key) // return value associated with key
- Dictionary: like map, but associates a collection of values with each key.
Implementations of these are simple adaptations of implementations of sets, which we focus on.
Binary Search Tree (B.S.T.)
A BST is:
- a binary tree // a structure invariant
- with nodes labeled by keys
- satisfying the following order invariant: for every two nodes u,v:
- if u is in the left subtree of v, then $key(u) < key(v)$
- if u is in the right subtree of v, then $key(u) > key(v)$
Ex.
Example 1 (checked):
- 5
  - 3
  - 6
Example 2 (X):
- 5
  - 3
    - 2
    - 6
  - 8
Example 3 (check) (right and left are written explicitly when there are not two nodes to write. Otherwise, left is written first and right is listed second.):
- 5
  - 10 (right)
    - 20 (right)
      - 30 (right)
        - 25 (left)
Example 4 (check):
- 5
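The order invariant can be checked mechanically. The following is a minimal Python sketch, not part of the slides: `Node` and `is_bst` are hypothetical names, and Example 2 is read as 5 with children 3 and 8, where 3 has children 2 and 6. Each recursive call carries the open interval of keys allowed in the current subtree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def is_bst(v, lo=None, hi=None):
    """True iff the subtree rooted at v satisfies the order invariant and
    all of its keys lie strictly between lo and hi (None means no bound)."""
    if v is None:
        return True
    if lo is not None and v.key <= lo:
        return False
    if hi is not None and v.key >= hi:
        return False
    # Keys on the left must stay below key(v); keys on the right above it.
    return is_bst(v.left, lo, v.key) and is_bst(v.right, v.key, hi)


example1 = Node(5, Node(3), Node(6))
example2 = Node(5, Node(3, Node(2), Node(6)), Node(8))
print(is_bst(example1))  # True
print(is_bst(example2))  # False: 6 sits in the left subtree of 5
```

Comparing each node only with its own children would wrongly accept Example 2, which is why the bounds are passed down.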
Every sub-tree of a BST is a BST
- 500
  - 200
    - 100
    - 300
      - 250
      - 350
  - 700
    - 600
      - 560
      - 650 (keys in this subtree would be >600 , <700)
    - 800
This makes recursive algorithms very natural.
Fact:
In-order traversal of a BST visits keys in non-decreasing order.
Proof sketch:
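- Basis: $h = 0$, so one node.
- I.H.: the claim holds for all trees of height $< h$.
- I.S.: T is: v with left subtree A and right subtree B. (A, B may be empty)
- We:
- traverse A, visiting keys in sequence: $a_1, a_2, \ldots, a_j$
- visit v
- traverse B, visiting keys in sequence: $b_1, b_2, \ldots, b_k$
- Overall, we visit: $a_1, \ldots, a_j, key(v), b_1, \ldots, b_k$
- By I.H.: $a_1 \le \cdots \le a_j$ and $b_1 \le \cdots \le b_k$
- Because T is a BST: $a_j < key(v) < b_1$, so the whole sequence is in non-decreasing order.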
BST Find/Search: examples
(Transcriber’s note: the links are the search path for the algorithms)
find(5):
find(1):
Find 6:
- 5
  - 8 (right, X)
    - 10 (right)
Some notation:
Suppose v is a node of a BST. We write:
- left(v) = left child of v
- right(v) = right child of v
- key(v) = key labelling v
- node(x) = the node v s.t. key(v)=x
BST find(t) Pseudo-code
find(t){// return true iff t is in the tree.
return find(t,root)
}
find(t,v)// return true iff t appears in the subtree rooted at v.
{
if t < key(v) & v has a left subtree
return find(t, left(v))
if t > key(v) & v has a right subtree
return find(t, right(v))
if key(v) = t
return true
return false // the subtree of v that could contain t is empty
}
BST find(t,v) pseudo-code – alternate version
find(t,v) // return true if t appears in subtree rooted at v
{
if key(v)=t
return true
if t < key(v) & v has a left subtree
return find(t,left(v))
if t > key(v) & v has a right subtree
return find(t,right(v))
return false
}
Q: Which version is better?
A: key(v)=t will almost always be false, so the first version, which tests for equality last, usually does fewer comparisons per node visited.
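To make the comparison between the two versions concrete, here is a hedged Python sketch that counts key comparisons. The names `Node`, `Counter`, `find_v1`, `find_v2` are hypothetical; the two functions follow the two pseudo-code versions above line by line, with an extra counter argument that is not in the slides. The test tree is a chain of right children holding 1..8.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class Counter:
    """Counts every key comparison made through it."""
    def __init__(self):
        self.n = 0

    def less(self, a, b):
        self.n += 1
        return a < b

    def equal(self, a, b):
        self.n += 1
        return a == b


def find_v1(t, v, c):
    """First version: equality is tested last."""
    if c.less(t, v.key) and v.left is not None:
        return find_v1(t, v.left, c)
    if c.less(v.key, t) and v.right is not None:
        return find_v1(t, v.right, c)
    if c.equal(v.key, t):
        return True
    return False


def find_v2(t, v, c):
    """Alternate version: equality is tested first."""
    if c.equal(v.key, t):
        return True
    if c.less(t, v.key) and v.left is not None:
        return find_v2(t, v.left, c)
    if c.less(v.key, t) and v.right is not None:
        return find_v2(t, v.right, c)
    return False


# Chain 1 -> 2 -> ... -> 8, each node the right child of the previous one.
root = Node(1)
cur = root
for k in range(2, 9):
    cur.right = Node(k)
    cur = cur.right

c1, c2 = Counter(), Counter()
print(find_v1(8, root, c1), c1.n)  # True 17
print(find_v2(8, root, c2), c2.n)  # True 22
```

On very short search paths the two counts can tie, since the first version pays its extra comparisons only at the last node; the gap grows with the length of the path.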
BST insert(x) Pseudo-code
insert(t){
// adds t to the tree
// assumes t is not in the tree already*
u <- node at which find(t,root) terminates**
if t<key(u)
give u a new left child with key t.
else
give u a new right child with key t.
}
* Exercise: Write the version that does not make this assumption.
** Exercise: Write the version where the search is explicit.
BST Insert Examples
insert(1):
insert(7):
BST insert(x) Pseudo-code – explicit search version…
insert(t){ //adds t to the tree if it is not already there
insert(t, root)
}
insert(t,v) //insert t in the subtree rooted at v, if it is not there
{
if t < key(v) & v has a left subtree
insert(t, left(v))
else if t > key(v) & v has a right subtree
insert(t, right(v))
else if t < key(v) //here v has no left child
give v a new left child with key t
else if t > key(v) //here v has no right child
give v a new right child with key t.
// otherwise t=key(v), so do nothing.
}
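The explicit-search insertion can be sketched in Python as follows. This is an illustration under assumptions, not the course's reference code: `Node` and `BST` are hypothetical names, and the empty-tree case (which the pseudo-code does not cover) is handled in the wrapper. The branches are mutually exclusive, so a key is added at most once.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(t, v):
    """Insert t into the non-empty subtree rooted at v, if it is not there."""
    if t < v.key:
        if v.left is not None:
            insert(t, v.left)
        else:
            v.left = Node(t)      # v has no left child
    elif t > v.key:
        if v.right is not None:
            insert(t, v.right)
        else:
            v.right = Node(t)     # v has no right child
    # otherwise t == key(v): do nothing


class BST:
    def __init__(self):
        self.root = None

    def insert(self, t):
        if self.root is None:     # assumption: an empty tree gets t as root
            self.root = Node(t)
        else:
            insert(t, self.root)


tree = BST()
for k in [5, 2, 3, 7, 8, 1, 6]:
    tree.insert(k)
print(tree.root.key, tree.root.left.key, tree.root.right.key)  # 5 2 7
```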
Insertion Order for BSTs: Examples
1)
- start with an empty BST
- insert 5,2,3,7,8,1,6 in the given order
- 5
  - 2
    - 1
    - 3
  - 7
    - 6
    - 8
2)
- start with an empty BST
- insert 1,2,3,5,6,7,8 in the order given
- 1
  - 2 (right)
    - 3 (right)
      - 5 (right)
        - 6 (right)
          - 7 (right)
            - 8 (right)
Notes
- Insertion order affects the shape of a BST.
- Removal order can too.
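The effect of insertion order on shape can be measured by height. Below is a small Python sketch with assumed helper names (`Node`, `build`, `height`), using the two insertion orders from the examples; height is counted in edges, so a single node has height 0.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def build(keys):
    """Insert keys one by one into an initially empty BST; return the root."""
    root = None
    for k in keys:
        if root is None:
            root = Node(k)
            continue
        v = root
        while True:
            if k < v.key:
                if v.left is None:
                    v.left = Node(k)
                    break
                v = v.left
            elif k > v.key:
                if v.right is None:
                    v.right = Node(k)
                    break
                v = v.right
            else:
                break  # duplicate key: nothing to do
    return root


def height(v):
    """Height in edges; the empty tree has height -1."""
    return -1 if v is None else 1 + max(height(v.left), height(v.right))


print(height(build([5, 2, 3, 7, 8, 1, 6])))  # 2
print(height(build([1, 2, 3, 5, 6, 7, 8])))  # 6
```

The same seven keys give a height of 2 in one order and a chain of height 6 in sorted order.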
BST remove(t)
We consider 3 cases, increasing difficulty.
- Case 1: t is at a leaf (example figure #1):
- find the node v with key(v)=t
- delete v
- Case 2: t is at a node with 1 child (example figure #2 and example figure #3)
- find the node v with key(v)=t
- let u be the child of v
- replace v with the subtree rooted at u
- For case 3, see the next section
Example Figure #1
remove(7):
Example Figure #2
remove(3)
step 1 (original)
- 5
  - 3 (Xed out)
    - 1 (left)
  - 10
step 2
- 5
  - 10 (right)
- 1
step 3
- 5
  - 1
  - 10
Example Figure #3
remove(10)
step 1 (original)
- 4
  - 2
  - 10 (Xed out)
    - 7 (left)
      - 6
        - 5 (left)
      - 8
step 2
- 4
  - 2 (left)
- 7
  - 6
    - 5 (left)
  - 8
step 3
- 4
  - 2
  - 7
    - 6
      - 5 (left)
    - 8
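BST remove: Case 3 Preparation: Successors
- In an ordered collection $\langle x_1, x_2, \ldots, x_n \rangle$:
- $x_i$ is the predecessor of $x_{i+1}$
- $x_{i+1}$ is the successor of $x_i$
- Write: $pred(x_{i+1}) = x_i$ and $succ(x_i) = x_{i+1}$
- Let $\langle v_1, v_2, \ldots, v_n \rangle$ be the nodes of the tree ordered as per an in-order traversal.
- Let $\langle k_1, k_2, \ldots, k_n \rangle$ be the keys, in non-decreasing order.
- Then: $key(v_i) = k_i$ and $key(v_{i+1}) = k_{i+1}$, i.e., the next node has the next key.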
BST remove: Case 3 Preparation: Successors in BSTs
- If S is a set of keys, and $x \in S$, then the successor of x in S is the smallest value $y \in S$ s.t. $y > x$. Ex. in $S = \{2, 3, 5, 7, 8, 9\}$, the successor of 5 is 7.
- In a BST, in-order traversal visits keys in order.
- Let S be the set of keys in BST T.
- If $key(v) = x$, the successor of x in S is $key(u)$, where u is the node of T that an in-order traversal of T visits next after v.
- 5
  - 3
    - 2 (left)
  - 8
    - 7
    - 9
- If v is a node of BST T, then we can say the successor of v in T is the node of T visited just after v by an in-order traversal of T. Then: the successor of $key(v)$ in S is $key(succ(v))$.
- Or: if $key(v) = x$, we can find the successor of x by finding the successor node of v, and getting its key: $succ(x) = key(succ(v))$
BST remove: Case 3 Preperation: Successors
If node v has a right child, it is easy to find its successor: succ(v) is the first node visited by an in-order traversal of the right subtree of v.
Ex. 6 diagrams. All of which give v a right subtree, one of one node, one of one node with a left child, one with a left leaf and right subtree of its own, and three variations on arbitrary numbers of children attached to the left node of v.
To find the successor of node v that has a right child, use:
succ(v){
u<-right(v)
while(left(u) exists){
u<-left(u)
}
return u
}
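The same loop in Python, as a minimal sketch: `Node` is a hypothetical helper class, and the tree used is 5 with children 3 and 8, where 3 has left child 2 and 8 has children 7 and 9. As in the pseudo-code, `succ` assumes that v has a right child.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def succ(v):
    """Successor of v, assuming v has a right child:
    the leftmost node of v's right subtree."""
    u = v.right
    while u.left is not None:
        u = u.left
    return u


root = Node(5, Node(3, Node(2)), Node(8, Node(7), Node(9)))
print(succ(root).key)        # 7
print(succ(root.right).key)  # 9
```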
BST remove(t)
Case 3: t is at a node with 2 children:
- find the node v with key(v)=t
- find the successor of v – call it u.
- key(v)<-key(u) //replace t with succ(t) at v.
- delete u:
- if u is a leaf, delete it.
- if u is not a leaf, it has one child w, replace u with the subtree rooted at w.
Notice: deleting u when it is a leaf is like case 1; replacing u with the subtree rooted at w is like case 2.
BST remove(k) when node(k) has two children
Ex. to remove 5:
- Find 5
- Find successor of 5
- Replace 5 with its succ.
- In this example, succ(5) has no children so just delete the node where it was.
Example tree:
- 20 (link starts step 1.)
After switching 5 and succ(5):
(transcriber’s note: may be incorrect, but I’m writing what’s there)
- 20
  - 6
    - 2
    - 10
      - 7
        - 8 (right)
      - 12
  - ...
Example tree 2:
To remove 6:
- Find 6
- Find successor of 6
- Replace 6 with its successor
- Replace succ(6) with its non-empty subtree
Tree:
Becomes, by step 4:
- 30
  - 2
    - 1
    - 7
      - ...
      - 14
        - 11
          - 9
            - 8
            - 10
          - 12
        - ...
  - ...
Complexity of BST Operations
- Measure as a function of: height (h) or size/# of keys (n).
- All operations essentially involve traversing a path from the root to a node v, where in the worst case v is a leaf of maximum depth.
- So:
- find: O(h), O(n)
- insert: O(h), O(n)
- remove: O(h), O(n)
- For “short bushy” trees (e.g. T1) h is small relative to n.
- For “tall skinny” trees (e.g. T2) h is proportional to n.
Q: Can we always have short bushy BSTs?
T1
- node
  - node
    - node
    - node
  - node
    - node
    - node
T2
- node
  - node
    - node
      - ...
        - node
          - node
Perfect Binary Tree
- A perfect binary tree of height h is a binary tree of height h with the max number of nodes:
1 (yes):
- node
2 (no):
- node
  - node (left)
3 (yes):
- node
  - node
  - node
4 (yes):
- node
  - node
    - node
    - node
  - node
    - node
    - node
5 (no):
- node
  - node
    - node
      - node
      - node
    - node
      - node
      - node
  - node
    - node
      - node (right)
    - node
      - node
      - node
6 (no):
- node
  - node (right)
    - node (right)
      - node (right)
        - node (right)
- Claim: Every perfect binary tree of height h has $2^{h+1}-1$ nodes.
- Pf: By induction on h, or on the structure of the tree.
- Basis: If h=0, there is one node (the root). We have $2^{h+1}-1 = 2^{1}-1 = 1$, as required.
- I.H.: Let $k \ge 0$, and assume that every perfect binary tree of height k has $2^{k+1}-1$ nodes.
- I.S.: (Need to show that a perfect binary tree of height k+1 has $2^{(k+1)+1}-1 = 2^{k+2}-1$ nodes.) A perfect binary tree of height k+1 is constructed as a root with left subtree A and right subtree B, where A, B are perfect binary trees of height k; the root adds one to the height. By I.H. A and B each have $2^{k+1}-1$ nodes. So, the tree has $1 + 2(2^{k+1}-1) = 2^{k+2}-1$ nodes, as required.
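The node count can be checked numerically for small heights. This Python sketch uses hypothetical helpers (`Node`, `perfect`, `count`) and mirrors the inductive construction: a perfect tree of height h is a root over two perfect trees of height h-1.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def perfect(h):
    """Build a perfect binary tree of height h (h = -1 gives the empty tree)."""
    if h < 0:
        return None
    return Node(perfect(h - 1), perfect(h - 1))


def count(v):
    """Number of nodes in the subtree rooted at v."""
    return 0 if v is None else 1 + count(v.left) + count(v.right)


for h in range(5):
    print(h, count(perfect(h)), 2 ** (h + 1) - 1)
# Each line shows h, the counted nodes, and the formula; the last two agree.
```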
Existence of Optimal BSTs
Claim: For every set S of n keys, there exists a BST for S with height at most $\lfloor \log_2 n \rfloor$.
Proof: Let h be the smallest integer s.t. $2^{h+1}-1 \ge n$, and let $m = 2^{h+1}-1$. So, $2^{h}-1 < n \le m$, i.e. $2^h \le n$ and $h \le \log_2 n$.
Let T be the perfect binary tree of height h (it has m nodes).
Label the first n nodes of T (as visited by an in-order traversal) with the keys of S, and delete the remaining nodes (to get $T'$).
$T'$ is a BST for S with height at most $h \le \log_2 n$.
So, there is always a BST with height $O(\log n)$.
Optimal BST Insertion Order
Given a set of keys, we can insert them so as to get a minimum height BST:
Consider:
Graph of a perfect binary tree with 4 levels (height 3). Every node has two children, except for the 8 leaves.
What can we say about the key at the root? It is the median key.
Observe: the first key inserted into a BST is at the root forever (unless we remove it from the BST).
Given a set of keys, we can insert them to get a minimum height BST:
(transcriber’s note: I may have done this wrong, the drawing of the following tree is very weird.)
-
1
- 2
- 2
* apply the “root is the median key” principle to each subtree.
So, there is always a BST with height $O(\log n)$.
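The "root is the median key" principle translates into a short recursive procedure. The sketch below is illustrative only; `median_first_order`, `Node`, `build`, and `height` are assumed names. It takes the keys in sorted order and returns an insertion order in which the median of every sub-range is inserted before the rest of that range.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def median_first_order(sorted_keys):
    """Insertion order that puts the median at the root of every subtree."""
    if not sorted_keys:
        return []
    mid = len(sorted_keys) // 2
    return ([sorted_keys[mid]]
            + median_first_order(sorted_keys[:mid])
            + median_first_order(sorted_keys[mid + 1:]))


def insert(v, key):
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    elif key > v.key:
        v.right = insert(v.right, key)
    return v


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def height(v):
    """Height in edges; the empty tree has height -1."""
    return -1 if v is None else 1 + max(height(v.left), height(v.right))


order = median_first_order([1, 2, 3, 4, 5, 6, 7])
print(order)                 # [4, 2, 1, 3, 6, 5, 7]
print(height(build(order)))  # 2
```

For even-sized ranges this sketch picks the upper of the two middle keys; choosing the lower one would work equally well.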
Can we maintain min. height, with $O(\log n)$ time operations, as we insert and remove keys?
Consider A:
- 5
  - 3
    - 2
    - 4
  - 7
    - 6 (left)
insert(1) would make it become B:
- 4
  - 2
    - 1
    - 3
  - 6
    - 5
    - 7
- B is the only min height BST for 1..7.
- A -> B requires “moving every node”
- To get $O(\log n)$ operations, we need another kind of search tree, other than plain BSTs.
- To get efficient search trees, give up at least one of:
- binary
- min height
- Next: self-balancing search trees.
End (transcriber’s note: not the end)
(some repeated slides and graphics)
Notice:
Because a perfect binary tree of height h has:
- height h
- $2^{h+1}-1$ nodes
- $2^{h}-1$ internal nodes (nodes with children)
- $2^{h}$ leaves
Then: