The median of medians algorithm with ruby

Wed Apr 2014

I stumbled across the median of medians algorithm when I was looking for an efficient way to calculate the nearest 20 cities to any city, given their latitude and longitude.

It is a selection algorithm that has a worst-case O(n) complexity for selecting the kth order statistic (kth smallest number) in an unsorted array with length n.

Say you have an unsorted array and you would like to pick the fourth smallest element. One approach would be to just sort the array and pick the fourth element. However, the best case performance for this approach is O(nlogn).

The algorithm

  1. If n is negligible, e.g. n <= 5, sort the array and return the kth element. (The complexity of sorting small arrays is linear).
  2. Partition the array into sub-arrays of length 5.
  3. Sort each sub-array and select the median of each.
  4. Recursively find the median of the medians you found in 3.
  5. Loop through all n-1 elements, comparing them with the median of medians m and create two arrays left and right, where left contains all elements < m and right contains all elements > m
  6. From 5, we can infer that the position of m, p = |L| + 1, where |L| is the length of left

. In other words, m is the (|L| + 1) th smallest element. 7. If k == p, return m. 8. If k < p, return the kth smallest element of left. 9. If k > p, then return the (k - p)th smallest element of right.

An example

Given an array of length n = 10: [10, 6, 8, 3, 7, 1, 2, 4, 9, 5], we would like to find the kth smallest element, where k = 3.

The code

Since I tagged this post with 'TDD' We'll need to write a few test cases first. My favorite testing framework for ruby is currently rspec with guard.

it 'returns the 3rd smallest number' do
  expect(subject.new(array1).kth_smallest(3)).to eq(3)
end
 
it 'returns the 2nd smallest number' do
  expect(subject.new(array1).kth_smallest(2)).to eq(2)
end
 
it 'returns the 5th smallest number' do
  expect(subject.new(array1).kth_smallest(5)).to eq(5)
end
 
it 'computes the same result as a number obtained by sorting' do
  k = 53
  array2_sorted = array2.sort
  expect(subject.new(array2).kth_smallest(k)).to eq(array2_sorted[k-1])
end
it 'returns the 3rd smallest number' do
  expect(subject.new(array1).kth_smallest(3)).to eq(3)
end
 
it 'returns the 2nd smallest number' do
  expect(subject.new(array1).kth_smallest(2)).to eq(2)
end
 
it 'returns the 5th smallest number' do
  expect(subject.new(array1).kth_smallest(5)).to eq(5)
end
 
it 'computes the same result as a number obtained by sorting' do
  k = 53
  array2_sorted = array2.sort
  expect(subject.new(array2).kth_smallest(k)).to eq(array2_sorted[k-1])
end

Then we fix the breaking specs by writing the lib:

class KthSmallest
  attr_reader :array
 
  def initialize(array)
    @array = array
  end
 
  def kth_smallest(array = array, k)
    return select(array, k) if (array.length <= 5)
    medians = []
    left,right = [],[]
    array.each_slice(5) do |array_5|
      medians << select(array_5, array_5.length/2)
    end
    m = kth_smallest(medians, medians.length/2)
    array.each { |n| n < m ? left << n : (n > m ? right << n : nil) }
    m_posn = left.length + 1
    if k == m_posn
      m
    elsif k < m_posn
      kth_smallest(left, k)
    else
      kth_smallest(right, k - m_posn)
    end
  end
 
  private
 
  def select(array, k)
    array.sort[k - 1]
  end
end
class KthSmallest
  attr_reader :array
 
  def initialize(array)
    @array = array
  end
 
  def kth_smallest(array = array, k)
    return select(array, k) if (array.length <= 5)
    medians = []
    left,right = [],[]
    array.each_slice(5) do |array_5|
      medians << select(array_5, array_5.length/2)
    end
    m = kth_smallest(medians, medians.length/2)
    array.each { |n| n < m ? left << n : (n > m ? right << n : nil) }
    m_posn = left.length + 1
    if k == m_posn
      m
    elsif k < m_posn
      kth_smallest(left, k)
    else
      kth_smallest(right, k - m_posn)
    end
  end
 
  private
 
  def select(array, k)
    array.sort[k - 1]
  end
end

You can find the complete code on github