Sorting

The iterative algorithms

sort(A){
  * initialize
  * repeat n-1 times
    * move 1 element from the unsorted part to the sorted part
}

Insertion Sort

Diagram of array as it gets sorted in three stages:

Code:

insertion_sort(A){
  for(i=1 to n-1){
    pivot = A[i] // first element in unsorted part
    j=i-1
    // The following loop shifts all elements in the sorted part that are larger than pivot one place "to the right"
    while(j>=0 AND A[j] > pivot){
      A[j+1] = A[j] // shift jth element right
      j = j-1
    }
    A[j+1] = pivot // move pivot into position
  }
}
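
A runnable Python version of the same loop, as a sanity check (the function name and in-place convention are mine, not from the slides):

def insertion_sort(a):
    for i in range(1, len(a)):
        pivot = a[i]                # first element of the unsorted part
        j = i - 1
        # Shift sorted elements larger than pivot one slot to the right.
        while j >= 0 and a[j] > pivot:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = pivot            # drop pivot into the gap it left

For example, insertion_sort on [20, 30, 1, 6] rearranges it in place to [1, 6, 20, 30].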

Insertion Sort Example

Stages:

Selection Sort

Diagram of parts:

Code:

selection_sort(A){
  for(i=1 to n-1){
    // find min element of unsorted
    j=i-1 // j is index of min found so far.
    k=i
    while(k<n){
      if(A[k]<A[j]) j=k;
      k=k+1
    }
    swap A[i-1] and A[j]
  }
}
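
The same algorithm as runnable Python, mirroring the 1-based loop above (a sketch; the names are mine):

def selection_sort(a):
    n = len(a)
    for i in range(1, n):
        j = i - 1                   # index of the min found so far
        for k in range(i, n):
            if a[k] < a[j]:
                j = k
        a[i - 1], a[j] = a[j], a[i - 1]  # move the min to the boundary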

Process of Selection Sort:

Heapsort (Selection Sort is crossed out)

Consider the organization of array contents:

  1. (Diagram of array with sorted half on the right and the unsorted half on the left.) A purple arrow points to the leftmost element in the unsorted portion. The note reads: “if this is the root of the heap, then it is also the smallest element in the unsorted part, so is in its correct final position. To use this arrangement, the root of the heap keeps moving, so we have lots of shifting to do.”
  2. (A diagram showing the same array with sorted and unsorted halves.) A purple arrow points to the last element in the array; it points to a purple circle. A purple square is at the leftmost element of the unsorted half (the one discussed in the last item). The note reads: “If this is the root of the heap, then everything works:
    • We extract the final element (purple circle); move the last leaf (purple square) to the root + do a percolate-down; store the final element (purple circle) where the last element of the unsorted list (purple square) was, which is now free, and is the correct final location for the previously final element (purple circle); after which we have:
    • (Diagram of array with the “sorted” half extended one cell over to encompass the purple circle.)
    • But: we must re-code our heap implementation so that the root is at A[n-1], with the result that the indexing is now less intuitive.
  3. Instead, we use a max-heap, and this arrangement:
    • (Diagram showcasing, as previously, a sorted half to the right and an unsorted half on the left. An orange circle labeled “root of heap” is the very first element of the list and the unsorted half; an orange square labeled “last leaf” sits at the end (rightmost side) of the unsorted half.)
    • The heap root is at A[0]
    • Heap extraction removes the root of the heap (orange circle) and moves the last leaf (orange square) to A[0], freeing up the last slot of the unsorted part, which is exactly where the extracted root (orange circle) belongs.
    • This leaves us with: (Diagram of the orange circle near the middle of the array, at the leftmost portion of the sorted half. The orange square is in the center of the unsorted half.)
    • Re-coding a min heap into a max heap is just replacing < with > and vice versa.

Heapsort (Selection Sort is crossed out)

Code:

heapsort(A){
  buildMaxHeap(A)
  for(i=1 to n-1){
    A[n-i] <- extractMax() // slot A[n-i] is freed as the heap shrinks
  }
}
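
For reference, a minimal Python sketch of this “selection via a heap” idea using the standard library’s heapq. Note the assumption swap: heapq is a min-heap, so this pops minima into a fresh list in ascending order rather than filling A from the back with a max-heap as above:

import heapq

def heapsort_via_extract(a):
    heap = list(a)
    heapq.heapify(heap)         # buildHeap in O(n)
    # n successive extract-mins yield the sorted order.
    return [heapq.heappop(heap) for _ in range(len(heap))]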

Stages of sorting:

Once the unsorted heap has size 1, it holds the smallest element, which is already in its final position.

Heapsort with in-line percolate-down

Code:

heapsort(A){
  buildMaxHeap(A)
  for(i=1 to n-1){
    swap A[0] and A[n-i] // move last leaf to the root and the old root (the max) to where the last leaf was
    size <- n-i // size of heap = size of unsorted part
    // start of percolate down
    j <- 0
    while(2j+1 < size){
      child <- 2j+1
      if(2j+2 < size AND A[2j+2] > A[2j+1]){
        child <- 2j+2 // pick the larger child
      }
      if(A[child]>A[j]){
        swap A[child] and A[j]
        j <- child
      } else {
        j <- size // terminate the while loop
      }
    } // end of percolate down
  }
}
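
Here is the same in-place algorithm as runnable Python, including a buildMaxHeap pass (the slides assume one exists); a sketch under those assumptions:

def heapsort(a):
    n = len(a)

    def percolate_down(j, size):
        # Sift a[j] down within a[0:size] to restore the max-heap property.
        while 2 * j + 1 < size:
            child = 2 * j + 1
            if 2 * j + 2 < size and a[2 * j + 2] > a[2 * j + 1]:
                child = 2 * j + 2            # pick the larger child
            if a[child] > a[j]:
                a[child], a[j] = a[j], a[child]
                j = child
            else:
                break

    # buildMaxHeap: percolate down every internal node, last to first.
    for j in range(n // 2 - 1, -1, -1):
        percolate_down(j, n)

    for i in range(1, n):
        a[0], a[n - i] = a[n - i], a[0]      # move current max to its final slot
        percolate_down(0, n - i)             # re-heapify the shrunken heap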

Heapsort Example

Tree version of above (heap):

Original:

After re-heap and one removal:

After a second re-heap and removal:

After a third:

Examples stop here.

Heapsort Example (2)

(Repeat same as above, except with different trees.)

Trees (Transcriber’s note: these trees don’t seem relevant to me… but maybe I’m wrong):

Time Complexity of Iterative Sorting Algorithms

Selection Sort: exactly n-i comparisons to find the min element of the unsorted part
Insertion Sort: between 1 and i comparisons to find the location for the pivot
HeapSort: between 1 and $2\log_{2}(n-i-1)$ comparisons for percolate-down

The cost measure in each case is the number of comparisons.

Selection Sort

On an input of size n, the number of comparisons is always the same, regardless of input order:

$$\begin{aligned} \sum_{i=1}^{n-1} (n-i) &= \sum_{i=1}^{n-1} i \\ &= S(n-1) \\ &= \frac{(n-1)n}{2} \\ &= \frac{n^2 - n}{2} \\ &= \Theta(n^2) \end{aligned}$$
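
As a quick sanity check of the closed form (my example, not from the slides), take n = 4:

$$\sum_{i=1}^{3}(4-i) = 3 + 2 + 1 = 6 = \frac{4^{2}-4}{2}$$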

Insertion Sort – Worst Case

Upper Bound: $\text{\# comparisons} \leq \sum_{i=1}^{n-1} i = \frac{n^{2}-n}{2} = O(n^{2})$

Lower Bound: on a reverse-sorted input, stage i makes exactly i comparisons, so $\text{\# comparisons} = \sum_{i=1}^{n-1} i = \Omega(n^{2})$.

So, insertion sort's worst case is $\Theta(n^{2})$.

(Transcriber’s note: Θ is in fact valid here. “Worst case” fixes the quantity being bounded, namely the maximum number of comparisons over all inputs of size n, and the matching upper and lower bounds above show that this quantity is Θ(n²).)

Insertion Sort Best Case

Best case: initial sequence is fully ordered.

Then: In each stage, exactly 1 comparison is made.

So, $\text{\# comparisons} = n-1 = \Theta(n)$.

Heapsort Worst Case

Upper bound:

$$\begin{aligned} \text{\# comparisons} &\leq \sum_{i=1}^{n-1} 2\log_{2}(n-i+1) \\ &= 2\sum_{i=1}^{n-1} \log_{2}(i+1) \\ &\leq 2\sum_{i=1}^{n-1} \log_{2} n \\ &\leq 2n\log_{2} n \\ &= O(n \log n) \end{aligned}$$

Lower Bound? (empty space)

Best Case? (What input would lead to no movement during percolate-down? What if we exclude this case?)

Recursive Divide & Conquer Sorting

Diagram showing A: (Back to second occurrence of the diagram)

  1. Original
    • 20 30 1 6
  2. Partition
    • Two separate arrays with no items shown.
  3. Combine
    • 1 6 20 30

The algorithms differ in how they choose the partition, and how they combine the sorted parts.

Mergesort

Merging two sorted lists takes O(n) time, where n is the total size of the two lists.

Mergesort

Diagram showing the sorting of a list:

Mergesort

Code:

mergesort(A,lo,hi){
  if(lo<hi){// there are >=2 items, so work to do
    mid <- floor((lo+hi)/2)
    mergesort(A,lo,mid)
    mergesort(A,mid+1,hi)
    merge(A,lo,mid,hi)
  }
}

Merge for Merge sort

Code:

merge(A,lo,mid,hi){
  l <- lo
  r <- mid+1
  n <- lo // next free slot in the output array B
  while(l<=mid AND r<=hi){
    if(A[l]<A[r]){
      B[n] <- A[l]
      l++
    } else {
      B[n] <- A[r]
      r++
    }
    n++
  }
  while(l<=mid){ // copy leftovers from the left run
    B[n] <- A[l]
    l++;n++
  }
  while(r<=hi){ // copy leftovers from the right run
    B[n] <- A[r]
    r++;n++
  }
}// *

After *, the sorted sequence is in B[lo]…B[hi]; it still has to be copied back into A[lo]…A[hi] so that the recursion sees a sorted A.
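
A runnable Python version of mergesort with this merge (a sketch; here B is built locally and copied back into A, so the recursion always sees a sorted A):

def merge(a, lo, mid, hi):
    b = []                        # auxiliary output array
    l, r = lo, mid + 1
    while l <= mid and r <= hi:
        if a[l] < a[r]:
            b.append(a[l]); l += 1
        else:
            b.append(a[r]); r += 1
    b.extend(a[l:mid + 1])        # leftovers from the left run
    b.extend(a[r:hi + 1])         # leftovers from the right run
    a[lo:hi + 1] = b              # copy back into A[lo..hi]

def mergesort(a, lo, hi):
    if lo < hi:                   # >= 2 items, so work to do
        mid = (lo + hi) // 2
        mergesort(a, lo, mid)
        mergesort(a, mid + 1, hi)
        merge(a, lo, mid, hi)

Usage: xs = [20, 30, 1, 6]; mergesort(xs, 0, len(xs) - 1) leaves xs as [1, 6, 20, 30].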

Time Complexity of MergeSort: via tree of recursive calls

Each level takes O(n), and there are $\log_{2} n$ levels, so the total time is $O(n \log n)$.
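
The same bound as a recurrence (my formulation, assuming n is a power of two and each merge costs at most cn):

$$T(n) = 2\,T\!\left(\frac{n}{2}\right) + cn,\quad T(1) = c \;\Longrightarrow\; T(n) = cn\log_{2} n + cn = O(n \log n)$$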

Recursive Divide & Conquer Sorting

Quicksort

Diagram:

Quicksort

Code:

quicksort(A,lo,hi){
  if(lo<hi){// there are >= 2 items
    pivotposition <- partition(A,lo,hi)// partition
    quicksort(A,lo,pivotposition-1)
    quicksort(A,pivotposition+1,hi)
  }
}

Quicksort is correct as long as every call to partition() returns and leaves the variables satisfying the following:

  1. lo <= pivotposition <= hi
  2. for every i,j with $\text{lo} \leq i \leq \text{pivotposition} \leq j \leq \text{hi}$: $A[i] \leq A[\text{pivotposition}] \leq A[j]$

However, efficiency relies critically on the choice of pivot.

Ex: Perfect Pivots

Ex: Worst Case Path

Quicksort takes time $\Theta(n^{2})$ in the worst case.

Partition

Partition must choose a pivot p and efficiently re-arrange the elements so that everything <= p comes before p and everything > p comes after it.

Code:

partition(A,lo,hi){
  pivotindex <- choosePivot(A,lo,hi) // choose pivot
  swap A[pivotindex] and A[hi] // move pivot out of the way
  p <- A[hi] // p is the pivot
  i <- lo // known "small" values will be at indices < i
  for(j=lo;j<hi;j++){ // "already inspected" values will be at indices < j
    if(A[j]<=p){ // if we are inspecting a "small"
      swap A[i] and A[j] // swap it with first "non small"
      i <- i+1 // increase size of "smalls" part
    }
  }
  swap A[i] and A[hi] // move pivot where it belongs
  return i // this is pivot position
}
Range     Description
lo … i-1  known to be small (<= p)
i … j-1   known to be large (> p)
j … hi-1  not yet inspected; j itself is the value currently being inspected
hi        pivot
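
Putting partition and quicksort together as runnable Python (a sketch: choosePivot is simplified here to "take the last element", which the slides do not prescribe):

def partition(a, lo, hi):
    p = a[hi]                     # pivot: here simply the last element
    i = lo                        # values at indices < i are known "small"
    for j in range(lo, hi):       # values at indices < j have been inspected
        if a[j] <= p:
            a[i], a[j] = a[j], a[i]   # grow the "smalls" region
            i += 1
    a[i], a[hi] = a[hi], a[i]     # move the pivot where it belongs
    return i                      # the pivot position

def quicksort(a, lo, hi):
    if lo < hi:                   # >= 2 items
        pp = partition(a, lo, hi)
        quicksort(a, lo, pp - 1)
        quicksort(a, pp + 1, hi)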

Partition Example

(Start of a diagram of how a quick sort happens over time.)

Legend:

-: known small
+: known large

The modifier is displayed after the value so that it is not confused with a negative or positive sign.

If a variable (e.g., i or j) is set to an index of the array, it is shown in parentheses after the value.

NOTE: Every table below has unseen elements to the left and right like so:

… array values …

However, because all the action is happening inside the visible part of the array, the left and right ellipses will not be shown below.

(End of diagram)

Partition

Consider

Some “simple” choosePivot options

Complexity of Quicksort

In practice

Code:

quicksort(A,lo,hi){
  if(lo<hi){// there are >=2 items
    if(lo+15>hi){// at most 15 items: cheaper to sort directly
      selectionSort(A,lo,hi)
    }
    else {
      pivotposition <- partition(A,lo,hi) // partition
      quicksort(A,lo,pivotposition-1)
      quicksort(A,pivotposition+1,hi)
    }
  }
}
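
The same hybrid as Python, reusing the partition sketch from above (the cutoff of 15 is the slide's; whether selection sort is the best small-range sort is a tuning question):

CUTOFF = 15

def hybrid_quicksort(a, lo, hi):
    if lo < hi:
        if hi - lo + 1 <= CUTOFF:          # small range: sort it directly
            for i in range(lo, hi):        # selection sort on a[lo..hi]
                m = i
                for k in range(i + 1, hi + 1):
                    if a[k] < a[m]:
                        m = k
                a[i], a[m] = a[m], a[i]
        else:
            pp = partition(a, lo, hi)      # partition() from the earlier sketch
            hybrid_quicksort(a, lo, pp - 1)
            hybrid_quicksort(a, pp + 1, hi)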

End