Binary Search Trees

CMPT 225

Set: unordered collection of values/objects.
Operations:
- insert(x) // add x to set
- member(x) // check if x in set. a.k.a. find(x), search(x), lookup(x), …
- remove(x) // remove x from set
- size() // get size of set
- empty() // is set empty
- clear() // remove all elements (i.e., make set empty)
We call the values we store keys.
We assume the keys are from some totally ordered set S.
- i.e. for any two keys $x,y\in S$ , we have exactly one of $x< y,x=y,y< x$
We want implementations where all operations are efficient/fast.
- Q: What will count as “fast”?

Consider the time complexity of the operations for simple list and array implementations:

type	insert	find	remove
un-ordered array	(green) O(1)	O(n)	O(n)
ordered array	(red) O(n)	(purple outline) O(log n)	(red) O(n)
un-ordered linked list	(green) O(1)	(red) O(n)	(red) O(n)
ordered linked list	(red) O(n)	(red) O(n)	(red) O(n)

Q: What will count as “fast”?

A: Time O(log n) //n is size of set

Multiset: like set, but with multiplicities (aka bag)
- count(x)
Map: unordered collection of <key,value> pairs, associating at most one value with each key. (e.g. partial function keys -> values)
- put(key,val) // in place of insert(x)
- get(key) // return value associated with key
Dictionary: like map, but associates a collection of values with each key.

Implementations of these are simple adaptations of implementations of sets, so we focus on sets.

Binary Search Tree (B.S.T.)

A BST is:

a binary tree // a structure invariant
with nodes labelled by keys
satisfying the following order invariant: for every two nodes u,v:
- if u is in left subtree of v, then $\text{key}(u) < \text{key}(v)$
- if u is in the right subtree of v, then $\text{key}(u) > \text{key}(v)$

Ex.

Example 1 (checked):

5
- 3
- 6

Example 2 (X):

5
- 3
  - 2
  - 6
- 8

Example 3 (check) (right and left are written explicitly when there are not two nodes to write. Otherwise, left is written first and right is listed second.):

5
- 10 (right)
  - 20 (right)
    - 30 (right)
      - 25 (left)

Example 4 (check):

Every sub-tree of a BST is a BST

500
- 200
  - 100
  - 300
    - 250
    - 350
- 700
  - 600
    - 560
    - 650 (keys in this subtree would be >600 , <700)
  - 800

This makes recursive algorithms very natural.

Fact:

In-order traversal of a BST visits keys in non-decreasing order.

Proof sketch:

Basis: $h=0$ , so there is one node.
I.H.: the claim holds for all trees of $\text{height} \leq h$
I.S.: T has height $h+1$ and is: v with left subtree A and right subtree B. (A, B may be empty)
We:
1. traverse A, visiting keys in sequence: $a_{1}, a_{2}, \dots a_{k}$
2. visit v
3. traverse B, visiting keys in sequence $b_{1}, b_{2}, \dots b_{m}$
Overall, we visit: $a_1, a_2, \dots a_k, \text{key}(v), b_1, b_2, \dots b_m$
By I.H. $a_1 \leq a_2 \leq \dots \leq a_k$ and $b_1 \leq b_2 \leq \dots \leq b_m$
Because T is a BST:

$a_k < \text{key}(v) < b_1\\ \therefore a_1 \leq a_2 \leq \dots \leq a_k \leq \text{key}(v) \leq b_1 \leq b_2 \leq \dots \leq b_m$
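The fact can be checked on a small tree with a minimal Python sketch. The `Node` class and the `in_order` function are illustrative names (they are not part of the slides), and an absent child is assumed to be represented by `None`.

```python
class Node:
    """A BST node; left/right are None when the child is absent."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def in_order(v):
    """Return the keys of the subtree rooted at v in in-order sequence:
    left subtree, then v itself, then right subtree."""
    if v is None:
        return []
    return in_order(v.left) + [v.key] + in_order(v.right)

# The tree from Example 4 (root 500).
t = Node(500,
         Node(200, Node(100), Node(300, Node(250), Node(350))),
         Node(700, Node(600, Node(560), Node(650)), Node(800)))

keys = in_order(t)
print(keys)
```

For this tree the printed list is already sorted, as the fact predicts.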

BST Find/Search: examples

(Transcriber’s note: the links are the search path for the algorithms)

find(5):

3
- 2
- 8
  - 5 (check)
    - 4
    - 6
  - 9

find(1):

3
- 2
- 8
  - 5
    - 4
    - 6 (X)
  - 9

find(6):

5
- 8 (right, X)
  - 10 (right)

Some notation:

Suppose v is a node of BST. We write:

left(v) = left child of v
right(v) = right child of v
key(v) = key labelling v
node(x) = node v s.t. key(v)=x

BST find(x) Pseudo-code

find(t){ // return true iff t is in the tree.
  return find(t,root)
}

find(t,v) // return true iff t appears in subtree rooted at v.
{
  if t < key(v) & v has a left subtree
    return find(t, left(v))
  if t > key(v) & v has a right subtree
    return find(t, right(v))
  if key(v) = t
    return true
  return false //v is a leaf, does not have t
}

BST find(t,v) pseudo-code – alternate version

find(t,v) // return true if t appears in subtree rooted at v
{
  if key(v)=t
    return true
  if t < key(v) & v has a left subtree
    return find(t,left(v))
  if t > key(v) & v has a right subtree
    return find(t,right(v))
  return false
}

Q: Which version is better?

A: The test key(v)=t is false at almost every node on the search path, so the first version, which makes that test last, usually does fewer comparisons per node.
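A minimal Python sketch of the first version of find. The `Node` class is an illustrative assumption, and unlike the pseudo-code this sketch treats a missing child as `None`, so it also works on an empty tree.

```python
class Node:
    """A BST node; left/right are None when the child is absent."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def find(t, v):
    """Return True iff key t appears in the subtree rooted at v."""
    if v is None:            # fell off the tree: t is not present
        return False
    if t < v.key:            # t can only be in the left subtree
        return find(t, v.left)
    if t > v.key:            # t can only be in the right subtree
        return find(t, v.right)
    return True              # neither smaller nor larger, so key(v) = t

# The tree used in the find examples above.
root = Node(3, Node(2), Node(8, Node(5, Node(4), Node(6)), Node(9)))
print(find(5, root), find(1, root))
```

The equality case is reached only after both inequality tests fail, matching the order of tests in the first version.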

BST insert(x) Pseudo-code

insert(t){
  // adds t to the tree
  // assumes t is not in the tree already*
  u <- node at which find(t,root) terminates**
  if t<key(u)
    give u a new left child with key t.
  else
    give u a new right child with key t.
}

* Exercise: Write the version that does not make this assumption.

** Exercise: Write the version where the search is explicit.

BST Insert Examples

insert(1):

3
- 2
  - 1 (inserted)
- 8
  - 5
    - 4
    - 6
  - 9

insert(7):

3
- 2
- 8
  - 5
    - 4
    - 6
      - 7 (right)
  - 9

BST insert(x) Pseudo-code – explicit search version…

insert(t){ //adds t to the tree if it is not already there
  insert(t, root)
}
insert(t,v) //insert t in the subtree rooted at v, if it is not there
{
  if t < key(v) & v has a left subtree
    insert(t, left(v))
  else if t > key(v) & v has a right subtree
    insert(t, right(v))
  else if t < key(v) //here v has no left child
    give v a new left child with key t
  else if t > key(v) //here v has no right child
    give v a new right child with key t.
  // otherwise t=key(v), so do nothing.
}
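The explicit-search insert can be sketched in Python as follows. This is one possible rendering, not the course's reference code: the `Node` class is an illustrative assumption, and the function returns the (possibly new) subtree root so that inserting into an empty tree needs no special case.

```python
class Node:
    """A BST node; left/right are None when the child is absent."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(t, v):
    """Insert key t into the subtree rooted at v, if it is not there.
    Returns the root of the resulting subtree."""
    if v is None:                 # empty spot found: t becomes a new leaf
        return Node(t)
    if t < v.key:
        v.left = insert(t, v.left)
    elif t > v.key:
        v.right = insert(t, v.right)
    # otherwise t == v.key, so do nothing
    return v

# Build the tree from the insert examples, then insert 1 and 7.
root = None
for k in [3, 2, 8, 5, 9, 4, 6]:
    root = insert(k, root)
root = insert(1, root)   # becomes the left child of 2
root = insert(7, root)   # becomes the right child of 6
root = insert(7, root)   # duplicate: the tree is unchanged
```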

Insertion Order for BSTs: Examples

1)

start with an empty BST
insert 5,2,3,7,8,1,6 in the given order

5
- 2
  - 1
  - 3
- 7
  - 6
  - 8

2)

start with an empty BST
insert 1,2,3,5,6,7,8 in the order given

1
- 2 (right)
  - 3 (right)
    - 5 (right)
      - 6 (right)
        - 7 (right)
          - 8 (right)

Notes

Insertion order affects the shape of a BST.
Removal order can too.
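The effect of insertion order on shape can be checked with a short Python sketch. The `Node`, `insert`, `height`, and `build` names are illustrative assumptions; height follows the convention that a single node has height 0 (an empty tree is given height -1 here).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(t, v):
    """Insert key t into the subtree rooted at v; return the subtree root."""
    if v is None:
        return Node(t)
    if t < v.key:
        v.left = insert(t, v.left)
    elif t > v.key:
        v.right = insert(t, v.right)
    return v

def height(v):
    """Number of edges on the longest root-to-leaf path (-1 if empty)."""
    if v is None:
        return -1
    return 1 + max(height(v.left), height(v.right))

def build(keys):
    """Insert keys into an initially empty BST, in the order given."""
    root = None
    for k in keys:
        root = insert(k, root)
    return root

h1 = height(build([5, 2, 3, 7, 8, 1, 6]))  # order from example 1
h2 = height(build([1, 2, 3, 5, 6, 7, 8]))  # sorted order from example 2
print(h1, h2)
```

The same seven keys give height 2 in the first order and height 6 (a chain) in sorted order.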

BST remove(t)

We consider 3 cases, increasing difficulty.

Case 1: t is at a leaf (example figure #1):
1. find the node v with key(v)=t
2. delete v
Case 2: t is at a node with 1 child (example figure #2 and example figure #3)
1. find the node v with key(v)=t
2. let u be the child of v
3. replace v with the subtree rooted at u
For case 3, see the next section

Example Figure #1

remove(7):

5
- 3
  - 2 (left)
- 8
  - 7 (Xed out)
  - 9

Example Figure #2

remove(3)

step 1 (original)

5
- 3 (Xed out)
  - 1 (left)
- 10

step 2

5
- 10 (right)

step 3

5
- 1
- 10

Example Figure #3

remove(10)

step 1 (original)

4
- 2
- 10 (Xed out)
  - 7 (left)
    - 6
      - 5 (left)
    - 8

step 2

4
- 2 (left)

7
- 6
  - 5 (left)
- 8

step 3

4
- 2
- 7
  - 6
    - 5 (left)
  - 8

BST remove: Case 3 Preparation: Successors

In an ordered collection $X=\langle \cdots s_{i-1}, s_{i}, s_{i+1}, s_{i+2} \cdots \rangle$
- $s_{i-1}$ is the predecessor of $s_{i}$
- $s_{i+1}$ is the successor of $s_{i}$
- Write: $\text{succ}_{X}(s_{i}) = s_{i+1}$
Let $V=\langle v_{1},\cdots ,v_{n}\rangle$ be the nodes of the tree ordered as per an in-order traversal.
Let $K=\langle k_{1},\cdots ,k_{n}\rangle$ be the keys, in non-decreasing order.
Then: $y=\text{key}(u) \implies \text{succ}_{K}(y) = \text{key}(\text{succ}_{V}(u))$ i.e., the next node has the next key.

BST remove: Case 3 Preparation: Successors in BSTs

If S is a set of keys, and $x\in S$ , then the successor of x in S is the smallest value $y\in S \text{ s.t. } x< y$ .
Ex. $S=\{ 19, 27, 8, 3, 12 \}$ (in sorted order, $S=\{3,8,12,19,27\}$ ): $\text{succ}(8)=12, \text{succ}(12)=19, \cdots$
In a BST, in-order traversal visits keys in order.
- Let S be the set of keys in BST T.
- the successor of x in S is $\text{key}(u)$ where u is the node of T that an in-order traversal of T visits next after the node v with $\text{key}(v)=x$ .

5
- 3
  - 2 (left)
- 8
  - 7
  - 9

If v is a node of BST T, then we can say the successor of v in T is the node of T visited just after v by an in-order traversal of T. Then: $\text{succ}(x)=\text{key}(\text{succ}(\text{node}(x)))$
Or: if $\text{key}(v)=x$ , we can find the successor of x by finding the successor node of v, and getting its key: $\text{succ}(\text{key}(v)) = \text{key}(\text{succ}(v))$

BST remove: Case 3 Preparation: Successors

If node v has a right child, it is easy to find its successor: $\text{succ}(v)$ is the first node visited by an in-order traversal of the right subtree of v.

Ex. 6 diagrams. All of which give v a right subtree, one of one node, one of one node with a left child, one with a left leaf and right subtree of its own, and three variations on arbitrary numbers of children attached to the left node of v.

To find the successor of node v that has a right child, use:

succ(v){
  u<-right(v)
  while(left(u) exists){
    u<-left(u)
  }
  return u
}
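The succ(v) pseudo-code translates almost line for line into Python. In this sketch the `Node` class is an illustrative assumption, a missing child is `None`, and, as in the pseudo-code, v is assumed to have a right child.

```python
class Node:
    """A BST node; left/right are None when the child is absent."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def succ(v):
    """Return the successor node of v. Assumes v has a right child."""
    u = v.right                  # step into the right subtree
    while u.left is not None:    # then go left as far as possible
        u = u.left
    return u

# The small tree above: 5 with children 3 (left child 2) and 8 (children 7, 9).
root = Node(5, Node(3, Node(2)), Node(8, Node(7), Node(9)))
print(succ(root).key)        # successor of 5
print(succ(root.right).key)  # successor of 8
```

The leftmost node of the right subtree is exactly the first node an in-order traversal of that subtree visits.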

BST remove(t)

Case 3: t is at a node with 2 children:

1. find the node v with key(v)=t
2. find the successor of v; call it u.
3. key(v)<-key(u) //replace t with succ(t) at v.
4. delete u:
   4.1. if u is a leaf, delete it.
   4.2. if u is not a leaf, it has exactly one child w (a right child, since succ(v) has no left child); replace u with the subtree rooted at w.

Notice: 4.1 is like case 1; 4.2 is like case 2.

BST remove(k) when node(k) has two children

Ex. to remove 5:

1. Find 5
2. Find successor of 5
3. Replace 5 with its succ.
4. In this example, succ(5) has no children so just delete the node where it was.

Example tree:

20 (link starts step 1.)
- 15
  - 5 (left; Xed out; link starts step 2.)
    - 2
    - 10
      - 7
        - 6 (successor of 5)
        - 8
      - 12
- 25
  - 22
  - 26

After switching 5 and succ(5):

(transcriber’s note: may be incorrect, but I’m writing what’s there)

20
- 6
  - 2
  - 10
    - 7
      - 8 (right)
    - 12
- ...

Example tree 2:

To remove 6:

1. Find 6
2. Find successor of 6
3. Replace 6 with its successor
4. Replace succ(6) with its non-empty subtree

Tree:

30
- 2
  - 1
  - 6 (crossed out; link starts step 2.)
    - ...
    - 14
      - 11
        - 7 (succ of 6)
          - 9 (right; subtree of succ(6))
            - 8 (subtree of succ(6))
            - 10 (subtree of succ(6))
        - 12
      - ...
- ...

Becomes, by step 4:

30
- 2
  - 1
  - 7
    - ...
    - 14
      - 11
        - 9
          - 8
          - 10
        - 12
      - ...
- ...

Complexity of BST Operations

Measure as a function of: height (h) or size/# of keys (n).
All operations essentially involve traversing a path from the root to a node v, where in the worst case v is a leaf of maximum depth.
So:
- find: O(h), O(n)
- insert: O(h), O(n)
- remove: O(h), O(n)
For “short bushy” trees (e.g. T1) h is small relative to n.
For “tall skinny” trees (e.g. T2) h is proportional to n.

Q: Can we always have short bushy BSTs?

T1 $h = ?$

node
- node
  - node
  - node
- node
  - node
  - node

T2 $h \cong n$

node
- node
  - node
    - ...
      - node
        - node

Perfect Binary Tree

A perfect binary tree of height h is a binary tree of height h with the max number of nodes:

1 (yes):

node

2 (no):

node
- node (left)

3 (yes):

node
- node
- node

4 (yes):

node
- node
  - node
  - node
- node
  - node
  - node

5 (no):

node
- node
  - node
    - node
    - node
  - node
    - node
    - node
- node
  - node
    - node (right)
  - node
    - node
    - node

6 (no):

node
- node (right)
  - node (right)
    - node (right)
      - node (right)

Claim: Every perfect binary tree of height h has $2^{h+1}-1$ nodes.
Pf: By induction on h, or on the structure of the tree.
Basis: If h=0, there is one node (the root). We have $2^{h+1}-1 = 2^{1}-1=1$ as required.
I.H.: Let $k \geq 0$ , and assume that every perfect binary tree of height k has $2^{k+1}-1$ nodes.
I.S.: (Need to show a perfect binary tree of height k+1 has $2^{(k+1)+1}-1$ nodes). A perfect binary tree of height k+1 is constructed as a root with left subtree A and right subtree B, where A,B are perfect binary trees of height k. By I.H. they each have $2^{k+1}-1$ nodes. So, the tree has $(2^{k+1}-1) + (2^{k+1}-1) + 1 = 2\times 2^{k+1} -1 = 2^{(k+1)+1}-1$ nodes, as required.
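The recurrence used in the inductive step (two perfect subtrees plus a root) can be checked numerically against the closed form. In this Python sketch `perfect_size` is an illustrative name, not something from the slides.

```python
def perfect_size(h):
    """Number of nodes in a perfect binary tree of height h,
    computed from the structure: root + two perfect subtrees of height h-1."""
    if h == 0:
        return 1                         # basis: a single root
    return 2 * perfect_size(h - 1) + 1   # subtree A + subtree B + root

# Compare the structural count with the closed form 2^(h+1) - 1.
sizes = [perfect_size(h) for h in range(6)]
closed = [2 ** (h + 1) - 1 for h in range(6)]
print(sizes)
```

Both lists agree for the heights tried, which is what the induction proves for every h.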

Existence of Optimal BSTs

Claim: For every set S of n keys, there exists a BST for S with height at most $1+\log_{2} n$

Proof: Let h be the smallest integer s.t. $2^{h} \geq n$ , and let $m=2^{h}$ . So,

$2^{h} \geq n > 2^{h-1}\\ \log_{2} 2^{h} \geq \log_{2} n > \log_{2} 2^{h-1}\\ h \geq \log_{2} n > h-1\\ h < 1+\log_{2} n$

let T be the perfect binary tree of height h

Label the first n nodes of T (as visited by an in-order traversal) with the keys of S, and delete the remaining nodes (to get $T'$ ).

$T'$ is a BST for S with height at most $h< 1+\log_{2} n$

So, there is always a BST with height $O(\log n)$ .

Optimal BST Insertion Order

Given a set of keys, we can insert them so as to get a minimum height BST:

Consider:

Graph of a perfect tree, with height of 4. Every node has two children, except for the 8 leaves.

What can we say about the key at the root? It is the median key.

Observe: the first key inserted into a BST is at the root forever (unless we remove it from the BST).

Given a set of keys, we can insert them to get a minimum height BST:

(transcriber’s note: I may have done this wrong, the drawing of the following tree is very weird.)

1
- 2
- 2

* apply the “root is the median key” principle to each subtree.

So, there is always a BST with height $\cong\log n$
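The “root is the median key” principle can be sketched in Python as a function that produces an insertion order. All names here (`median_first`, `Node`, `insert`, `height`, `build`) are illustrative assumptions; for an even number of keys the upper of the two middle keys is used, which is one arbitrary choice.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(t, v):
    """Insert key t into the subtree rooted at v; return the subtree root."""
    if v is None:
        return Node(t)
    if t < v.key:
        v.left = insert(t, v.left)
    elif t > v.key:
        v.right = insert(t, v.right)
    return v

def height(v):
    return -1 if v is None else 1 + max(height(v.left), height(v.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(k, root)
    return root

def median_first(keys):
    """Given keys in sorted order, return an insertion order that puts the
    median first, then applies the same rule to the smaller and larger halves."""
    if not keys:
        return []
    mid = len(keys) // 2
    return [keys[mid]] + median_first(keys[:mid]) + median_first(keys[mid + 1:])

order = median_first([1, 2, 3, 4, 5, 6, 7])
print(order, height(build(order)))
```

For the keys 1..7 this yields the order 4, 2, 1, 3, 6, 5, 7 and a tree of height 2.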

Can we maintain min. height with $O(\log n)$ time per operation as we insert and remove keys?

Consider A:

5
- 3
  - 2
  - 4
- 7
  - 6 (left)

insert(1) would make it become B:

4
- 2
  - 1
  - 3
- 6
  - 5
  - 7

B is the only min height BST for 1..7.
A -> B requires “moving every node”
To get $O(\log n)$ operations, we need another kind of search tree, other than plain BSTs.
To get efficient search trees, give up at least one of:
- binary
- min height
Next: self-balancing search trees.

End (transcriber’s note: not the end)

(some repeated slides and graphics)

Notice:

A perfect binary tree of height h has:

- $2^{h+1}-1$ nodes
- $2^{h}-1$ internal nodes (nodes with children)
- $2^h$ leaves

Check (leaves plus internal nodes): $2^{h} + 2^{h}-1 = 2\times 2^{h}-1 = 2^{h+1}-1$
