David Kariuki

The Median of Medians Algorithm With Ruby

I stumbled across the median of medians algorithm when I was looking for an efficient way to calculate the nearest 20 cities to any city, given their latitude and longitude.

It is a selection algorithm that has a worst-case O(n) complexity for selecting the kth order statistic (kth smallest number) in an unsorted array with length n.

Say you have an unsorted array and you would like to pick the fourth smallest element. One approach would be to just sort the array and pick the fourth element. However, the best case performance for this approach is O(nlogn).

The algorithm

  1. If n is negligible, e.g. n <= 5, sort the array and return the kth element. (The complexity of sorting small arrays is linear).
  2. Partition the array into sub-arrays of length 5.
  3. Sort each sub-array and select the median of each.
  4. Recursively find the median of the medians you found in 3.
  5. Loop through all n-1 elements, comparing them with the median of medians m and create two arrays left and right, where left contains all elements < m and right contains all elements > m
  6. From 5, we can infer that the position of m, p = |L| + 1, where |L| is the length of left. In other words, m is the (|L| + 1)th smallest element.
  7. If k == p, return m.
  8. If k < p, return the kth smallest element of left.
  9. If k > p, then return the (k – p)th smallest element of right.

An example

Given an array of length n = 10: [10, 6, 8, 3, 7, 1, 2, 4, 9, 5], we would like to find the kth smallest element, where k = 3.

  • Split the array into two sub-arrays of size 5: [10, 6, 8, 3, 7] and [1, 2, 4, 9, 5].
  • Sort each sub-array and select the medians: [3, 6, 7, 8, 10] and [1, 2, 4, 5, 9], medians 7 and 4.
  • Pick the median of the two medians m = 4.
  • Loop over the initial array splitting it into two arrays:
    left = [3, 1, 2] and right = [10, 6, 8, 7, 9, 5].
  • The position of m, p = left.length + 1
  • Since k < p we recursively find the kth smallest element of left.
  • Since left.size <= 5, we follow step 1: sort the array and return the kth element, 3.

The code

Since I tagged this post with ‘TDD’ We’ll need to write a few test cases first. My favorite testing framework for ruby is currently rspec with guard.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
it 'returns the 3rd smallest number' do
  expect(subject.new(array1).kth_smallest(3)).to eq(3)
end

it 'returns the 2nd smallest number' do
  expect(subject.new(array1).kth_smallest(2)).to eq(2)
end

it 'returns the 5th smallest number' do
  expect(subject.new(array1).kth_smallest(5)).to eq(5)
end

it 'computes the same result as a number obtained by sorting' do
  k = 53
  array2_sorted = array2.sort
  expect(subject.new(array2).kth_smallest(k)).to eq(array2_sorted[k-1])
end

Then we fix the breaking specs by writing the lib:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
class KthSmallest
  attr_reader :array

  def initialize(array)
    @array = array
  end

  def kth_smallest(array = array, k)
    return select(array, k) if (array.length <= 5)
    medians = []
    left,right = [],[]
    array.each_slice(5) do |array_5|
      medians << select(array_5, array_5.length/2)
    end
    m = kth_smallest(medians, medians.length/2)
    array.each { |n| n < m ? left << n : (n > m ? right << n : nil) }
    m_posn = left.length + 1
    if k == m_posn
      m
    elsif k < m_posn
      kth_smallest(left, k)
    else
      kth_smallest(right, k - m_posn)
    end
  end

  private

  def select(array, k)
    array.sort[k - 1]
  end
end

You can find the complete code on github

Comments