
5. Merge Sort#

Lecture:

Lecture 3.5 (slides)

Objectives:

Understand how merge sort works, and why it sorts fast

Concepts:

Merge sort using split and merge, master theorem

We now look at another well-known recursive sorting algorithm: Merge sort (yet another great contribution of John von Neumann), which runs, among other places, behind Collections.sort in Java. Merge sort runs in \(O(n \log n)\) in the worst case, and thus beats quick sort, which degrades to \(O(n^2)\) in the worst case.

5.1. Merge Sort#

5.1.1. Principle: Split & Merge#

Like many recursive algorithms, merge sort is very succinct. It goes as follows:

  1. Split the sequence into two halves, namely left and right

  2. Sort both left and right using merge sort

  3. Merge back left and right into a single, sorted sequence
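This recipe translates almost directly into code. Below is a minimal Ruby sketch (the name merge_sort is ours; it relies on a merge helper such as the one defined in Listing 5.4 below):

def merge_sort(sequence)
  # A sequence with at most one item is sorted by definition
  return sequence if sequence.length <= 1
  # 1. Split the sequence into two halves
  middle = sequence.length / 2
  # 2. Sort both halves using merge sort
  left = merge_sort(sequence[0...middle])
  right = merge_sort(sequence[middle..])
  # 3. Merge them back into a single, sorted sequence
  merge(left, right)
end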

Why does this work? Conceptually, we recursively split the given sequence until we get single-item sequences, which are sorted by definition. Then we merge them two by two until we have sorted the complete sequence. Fig. 5.8 illustrates this behaviour.

../_images/unfolding.svg

Fig. 5.8 Applying merge sort to the sequence \(s=(23,2,7,18,11,10,5)\).#

5.1.2. The Merge Algorithm#

To merge two sorted sequences, we proceed as follows:

  1. We create a new sequence, which will hold the merge

  2. We keep track of our position (i.e., an index) in each sequence, say left and right. We start by pointing at the first item of each.

  3. As long as we can move in both sequences:

    1. We compare the current left and right values, and we copy the smaller one into our result sequence.

    2. We increment the position of the sequence that yielded that minimum.

  4. As soon as we have exhausted the items in one sequence, we copy the remaining items of the other into the result sequence.

Fig. 5.9 illustrates this procedure. We are merging two sequences, namely \(\textrm{left} = (2,7,11,23)\) and \(\textrm{right}=(5,10,18)\). We are comparing the 2nd item of left (11) with the 3rd item of right (10). As 10 is the smaller, we insert it at the end of the result sequence, and we increment the current position in right.

../_images/merging.svg

Fig. 5.9 Merging two sorted sequences (left & right) into a new sequence#

Listing 5.4 Merging two sorted sequences in Ruby#
def merge(left, right)
  result = []
  left_index, right_index = 0, 0
  # As long as neither sequence is exhausted, copy the smaller
  # of the two current items into the result
  while left_index < left.length && right_index < right.length
    if left[left_index] < right[right_index]
      result.append(left[left_index])
      left_index += 1
    else
      result.append(right[right_index])
      right_index += 1
    end
  end
  # One sequence is exhausted: copy the remaining items of the other
  result.push(*left[left_index..])
  result.push(*right[right_index..])
  return result
end
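To check our understanding, we can run merge on the two sequences from Fig. 5.9, and the merge_sort sketch from above on the sequence from Fig. 5.8:

merge([2, 7, 11, 23], [5, 10, 18])     # => [2, 5, 7, 10, 11, 18, 23]
merge_sort([23, 2, 7, 18, 11, 10, 5])  # => [2, 5, 7, 10, 11, 18, 23]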

5.2. Efficiency#

Is there a worst case and a best case? No. We can see that merge sort always does the same thing, splitting the sequence into halves, regardless of how the items are initially ordered.

5.2.1. Memory#

How much memory does merge sort consume? Let’s break it down. When the given sequence contains only one item, it is already sorted and there is nothing to do, so there is no need for any memory. However, when the given sequence contains multiple items (say \(n\) for instance), we first split it into two halves: That requires \(2 \cdot \frac{n}{2} = n\) extra cells. We then sort these two halves, and finally, we merge the results into another sequence of, again, \(n\) items. Note that as we sort these two halves one after the other, the memory used to sort the first one is released before we sort the second one, so that memory can be reused.

Again, as for many recursive algorithms, we can formulate this using a recurrence relationship \(m(n)\) as follows:

\[\begin{split}m(n) = \begin{cases} 0 & \textrm{if } n = 1 \\ 2n + m(\frac{n}{2}) & \textrm{otherwise} \end{cases}\end{split}\]

Assuming \(n\) is a power of two, we can simplify this as follows:

\[\begin{split}m(n) &= 2n + m(\frac{n}{2}) \\ &= 2n + \left[2 \cdot \frac{n}{2} + m(\frac{n}{4}) \right] \\ &= 2n + n + \left[ 2 \cdot \frac{n}{4} + m(\frac{n}{8}) \right] \\ &= 2n + n + \frac{n}{2} + \left[ 2 \cdot \frac{n}{8} + m(\frac{n}{16}) \right] \\ &= 2n + n + \frac{n}{2} + \frac{n}{4} + \dots + 4 + m(1) \\ &= \sum_{i=0}^{\log_2 n - 1} \frac{2n}{2^i} \\ &= 4n - 4\end{split}\]
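As a quick sanity check, unfolding the recurrence on a small instance, say \(n = 8\), confirms this closed form:

\[m(8) = 16 + \left[ 8 + \left[ 4 + m(1) \right] \right] = 28 = 4 \cdot 8 - 4\]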

Important

We see that merge sort consumes an amount of memory that is proportional to the length of the given sequence, that is, \(m(n) \in \Theta(n)\).

5.2.2. Runtime#

Let’s now turn to the time it takes for merge sort to process a sequence of length \(n\). Recall that there are three steps:

  1. Split the sequence in two halves

  2. Sort these two halves, using merge sort

  3. Merge these two sorted sequences

The overall time spent by merge sort is the sum of the time spent in each of these steps. Let’s look at each of these:

How much time do we spend splitting?

If we assume we allocate two new, smaller sequences for the left and the right parts, then we spend time copying items into them. Overall, we copy all \(n\) items.

How much time do we spend sorting the two halves?

This is our recursive case. The time we spend is the time it takes to sort a sequence half as long, but we do it twice, once for the left part and once for the right part, that is \(2 \cdot t(\frac{n}{2})\).

How much time do we spend merging?

As for splitting, merging is first and foremost about copying the items from both sides into a new sequence. If left and right contain \(n_L\) and \(n_R\) items, respectively, we shall copy all of them, regardless of which comes first. In total, we spend \(n_L + n_R = n\).

On top of that, remember that when the sequence contains only one item, we return the sequence itself, so we do not spend any extra time.

We can formulate that using the following recurrence relationship \(t(n)\):

\[\begin{split}t(n) = \begin{cases} 0 & \textrm{if } n = 1 \\ 2n + 2 \cdot t(\frac{n}{2}) & \textrm{otherwise} \end{cases}\end{split}\]
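To build intuition before reaching for any theorem, we can unfold this recurrence by hand, assuming \(n\) is a power of two: every level of the recursion copies \(2n\) items in total, and there are \(\log_2 n\) levels before we reach single-item sequences:

\[t(n) = 2n + 2 \cdot t(\tfrac{n}{2}) = 2n + 2n + 4 \cdot t(\tfrac{n}{4}) = \dots = 2n \log_2 n\]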

This recurrence has a specific form to which the master theorem applies, which shortcuts such calculations:

Important

The master theorem simplifies many calculations about recursive algorithms. It goes as follows:

If a recurrence \(T(n)\) has the following form, where \(a\), \(b\) and \(c\) are constants and \(f(n)\) is linear (i.e., \(f(n) \in \Theta(n)\)):

\[\begin{split}T(n) = \begin{cases} a \cdot T(\frac{n}{b}) + f(n) & \textrm{if } n \gt 1 \\ c & \textrm{if } n = 1 \end{cases}\end{split}\]

Then, one can conclude that:

\[\begin{split}T(n) \in \begin{cases} \Theta(n^{\log_b a}) & \textrm{if } a \gt b \\ \Theta(n \log n) & \textrm{if } a = b \\ \Theta(n) & \textrm{if } a \lt b \end{cases}\end{split}\]

In this case, we can see that \(a = b = 2\), \(c = 0\) and \(f(n) = 2n\). Since \(a = b\), the master theorem tells us that \(t(n) \in \Theta(n \log n)\).
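As a final sanity check, plugging \(n = 8\) into the recurrence agrees with this bound:

\[t(8) = 16 + 2 \cdot t(4) = 16 + 2 \cdot \left[ 8 + 2 \cdot t(2) \right] = 16 + 16 + 16 = 48 = 2 \cdot 8 \cdot \log_2 8\]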