Sorting
Sorting
- re-arranging elements of a sequence
- We will look at 5 sorting algorithms:
- 3 iterative
- 2 recursive
The iterative algorithms
- maintain a partition: “unsorted part” & “srtoed part”
- sort a sequence of n elements in n-1 stages
- at each stage, move 1 element from the unsorted part to the sorted part:
- (Diagram of a generic array with unseen “sorted” items on the left and “unsorted” elements on the right. Caption: “1 stage moves 1 element”)
sort(A){
* initialize
* repeat n-1 times
* move 1 element from unsorted and sorted part
}
- The algorithms differ in how they:
- select an element to remove from the unsorted part
- insert it into the sorted part
Insertion Sort
- Initially sorted part is just A[0]
- Repeat n-1 times
- remove the first element from the unsorted part
- insert it into the sorted part (shifting elements to the right as needed)
Diagram of array as it gets sorted in three stages:
- Stage 1: sorted is leftmost (0th) element; n-1 elements are unsorted on the right.
- Stage 2: approximately half of the array is sorted; an arrow points from the leftmost value inside the unsorted side to an arbitrary position inside the sorted side.
- Stage 3: just over half of the array is sorted now.
Code:
insertion sort(A){
for(i=1 to n-1){
pivot = A[i] // first element in unsorted part
j=i-1
// The following loop shifts all elements in sorted parts that are larger than pivot 1 "to the right"
while(j>=0 AND A[i] > pivot){
A[j+i] = A[j] // shift jth
j = j-1
}
A[j+i] = pivot // move pivot into position.
}
}
Insertion Sort Example
Stages:
- Stage 0: Original
-
5 4 2 6 1 3
-
- Stage 1: (label: 4)
-
4 5 2 6 1 3
-
- Stage 2: (label: 2)
-
2 4 5 6 1 3
-
- Stage 3: (label: 6)
-
2 4 5 6 1 3
-
- Stage 4: (label: 1)
-
1 2 4 5 6 3
-
- Stage 5: (label: 3)
-
1 2 3 4 5 6
-
Selection Sort
- initially sorted part is empty
- repeat n-1 times
- find the smallest element in the unsorted part
- make it the first position which becomes the now last position of sorted part.
Diagram of parts:
- Initially, the entire array is all unsorted.
- Over time the sorted elements stack up on the left.
- Every time an element is moved, it is moved from the unsorted part (lowest element) and swapped with the element just after the end of the sorted part, making the sorted part one element bigger.
- Eventually all elements are sorted in descending order.
Code:
selection_sort(A){
for(i=1 to n-1){
// find min element of unsorted
j=i-1 // j is index of min found so far.
k=i
while(k<n){
if(A[k]<A[j]) j=k;
k=k+1
}
swap A[i-1] and A[j]
}
}
Process of Selection Sort:
- Original: all unsorted
-
5 4 2 6 1 3
-
- Stage 1: [0] is sorted; 1 and 5 swap
-
1 4 2 6 5 3
-
- Stage 2: [0..1] is sorted; 2 and 4 swap
-
1 2 4 6 5 3
-
- Stage 3: [0..2] is sorted; 3 and 4 swap
-
1 2 3 6 5 4
-
- Stage 4: [0..3] is sorted; 4 and 6 swap
-
1 2 3 4 5 6
-
- Stage 5: [0..4] is sorted; annotation: s.t. s (final stage)
-
1 2 3 4 5 6
-
Heapsort (Selection Sort is crossed out)
- Initially sorted part empty
- (highlighted) make unsorted part into a heap
- repeat n-1 times
- find the smallest element in the unsorted part (Note: heap extract takes log(n) time vs. Θ(n) for the scan in selection sort)
- move it to the first position which becomes the new last position of the started part.
Consider the organization of array contents:
- (Diagram of array with sorted half on the right and the unsorted half on the left.) A purple arrow points to the leftmost element in the unsorted portion. The note reads: “if this is the root of the heap, then it is also the smallest element in the unsorted part, so is in its correct final position. To use this arrangement, the root of the heap keeps moving, so we have lots of shifting to do.”
- (A diagram showing the same array with sorted and unsorted halves.) A purple arrow points to the last element in the array; it points to a purple circle. A purple square is at the leftmost element of the unsorted half (the one discussed in the last item). The note reads: “If this is the root of the, then everything works:
- We extract the final element (purple circle); move the last leaf (purple square) to the root + do a percolate-down; store the final element (purple circle) where the last element of the unsorted list (purple square) was, which is now free, and is the correct final location for the previously final element (purple circle); after which we have:
- (Diagram of array with the “sorted” half extended one cell over to encompass the purple circle) * But: we must re-code our heap implementation s.t. the root is at A[n-1], with the result that the indexing is now less intuitive.
- Instead, we use a max-heap, and this arrangement:
- (Diagram showcasing, as previously, a sorted half to the right and an unsorted half on the left. An orange circle labeled “root of heap” is the very first element of the list and the unsorted half; an orange square labeled “last leaf” sits at the end (rightmost side) of the unsorted half.)
- The heap root is at A[0]
- Heap Extraction remove the root of the heap (orange circle), moves the last leaf (orange square) to A[0], freeing up the spot where the root of the heap (orange circle) belongs.
- This leaves us with: (Diagram of the orange circle near the middle of the array, at the leftmost portion of the sorted half. The orange square is in the center of the unsorted half.)
- Re-coding a min heap into a max heap is just replacing < with > and vice versa.
Heapsort (Selectioon Sort is crossed out)
- initially sorted part empty
- (highlighted) make unsorted part into a max heap
- repeat n-1 times:
- find the largest (smallest is crossed out) element in the unsorted part
- move it to the last (first is crossed out) position which becomes the new first (last is crossed out) position of the sorted part.
Code:
heapsort(A){
buildMaxHeap(A)
for(i=1 to n-1){
A[n-1] extractMax()
}
}
Stages of sorting:
- (Diagram of unsorted array with first element labeled as “heap with max here”.)
- (Diagram of a half-sorted array showing the swap between the first and last elements of the unsorted portion of the array. Labeled as “take max element from root…” and “take last leaf from end of heap” with arrows pointing to one another.)
- (Diagram of a half+1 sorted array, displaying the new sorted element that has been swapped from the root element of the heap. Labeled as “newest element of sorted part” and “this is the final location” (the new element just swapped), and “y new root of heap (which then gets percolated down)” (what is now the first element of the array, which was also just swapped).)
Unsorted heap of size 1 has smallest element.
Heapsort with in-line percolate-down
Code:
heapsort(A){
makeMaxHeap(A)
for(i=1 to n-1){
swap A[0] and A[n-1] // move last leaf to root and old root to where last leaf was
size <- n-i+1 // size of heap = size of unsorted part
// start of percolate down
j <- 0
while(2j+1 < size){
child <- 2j+1
if(2j+2 < size AND A[2j+2] < A[2j+1]){
child <- 2j+2
}
if(A[child]<A[j]){
swap A[child] and A[j]
j <- child
} else {
j <- size // termite the while
}
} // end of percolate down
}
}
Heapsort Example
- Original:
-
5 4 2 6 1 3
-
- Turn into heap:
-
6 5 3 4 1 2
-
- Swap root (6) and last unsorted element (2):
-
2 5 3 4 1 6
-
- Re-heap the unsorted portion: [0..4]
-
5 4 3 2 1 6
-
- Swap root (5) and the last unsorted element (2):
-
1 4 3 2 5 6
-
- Re-heap the unsorted portion: [0..3]
-
4 2 3 1 5 6
-
- Swap root (4) and the last unsorted element (1):
-
1 2 3 4 5 6
-
- Re-heap unsorted portion: [0..2]
-
3 2 1 4 5 6
-
- Swap root (3) and last unsorted element (1):
-
1 2 3 4 5 6
-
- Re-heap unsorted portion: [0..1]
-
2 1 3 4 5 6
-
- Swap root (2) and last unsorted element (1):
-
1 2 3 4 5 6
-
- Array is sorted because unsorted portion is only 1 element.
Tree version of above (heap):
Original:
-
5
-
4
- 6
- 1
-
2
- 3 (left)
-
4
After re-heap and one removal:
-
2
-
5
- 2
- 1
- 3
-
5
After a second re-heap and removal:
-
1
-
4
- 2 (left)
- 3
-
4
After a third:
-
1
- 2
- 3
Examples stop here.
Heapsort Example (2)
(Repeat same as above, except with different trees.)
Trees (Transcriber’s note: these trees don’t seem relavant to me…. but maybe I’m wrong):
-
2 (crossed out with orange 5)
-
5 (crossed out next to orange 2 which is also crossed out; an orange 4 is not crossed out)
- 4 (crossed out with orange 2)
- 1
- 3
-
5 (crossed out next to orange 2 which is also crossed out; an orange 4 is not crossed out)
-
1 (crossed out with an orange 4)
-
4 (crossed out with orange 1, which is also crossed out; an orange 2, not crossed out is next to it)
- 2 (left; crossed out with orange 1)
- 3
-
4 (crossed out with orange 1, which is also crossed out; an orange 2, not crossed out is next to it)
-
1 (orange 2)
- 2 (left; orange 1)
Time Complexity of Iterative Sorting Algorithms
- each algorithm does exactly n-1 stages
- the work done at the ith stage varies with the algorithm (& input).
- we take # from item comparisons as a measure of work/time*.
- Selection Sort
- exactly n-i comparisons to find num element in unsorted part
- Insertion Sort
- between 1 and i comparisons to find location for pivot
- HeapSort
- between 1 and comparisons for percolate-down
* Number of comparisons
- We must verify # comparisons (or some constant times # comparisons) is an upper bound on work done by each algorithm.
-
of assignments (& swaps) also matters in actual run time.
Selection Sort
On input of size n, # of comparisons is always (regardless of input):
Insertion Sort – Worst Case
Upper Bound:
Lower Bound:
- Worst case initial sequence is in reverse order. e.g.:
-
n n-1 n-2 … 1
-
- In the ith stage we have:
-
n-i+1 n-1+2 … n n-1 n-1-1 … 2 1 -
n-i n-i+1 … n-1 n n-i-1 … 2 1
-
- This takes i comparisons, because the sorted part is of size i.
- So,
So, insertion sort worst case is
(Transcriber’s note: I’m fairly certain you can only use big-O notation when talking about worst case scenario, not Theta. But I’m leaving it as written.)
Insertion Sort Best Case
Best case: initial sequence is fully ordered.
Then: In each stage, exactly 1 comparison is made.
So, .
Heapsort Worst Case
Upper bound:
Lower Bound? (empty space)
Base Case? (What input would lead to no movement during percolate-down? What if we exclude this case?)
Recursive Divide & Conquer Sorting
TODO