David Kariuki

Computing Rankings in MySQL

Say we wanted to calculate individual subject rankings and overall rankings for a group of students, given their Math, English and Biology scores. We can accomplish this with a single query if we leverage MySQL’s user-defined variables and derived tables. Derived tables are basically subqueries inside a from clause of a SQL statement. User-defined variables allow you to store a value in one statement and access it later in another statement, which makes them perfect for our use case. Let’s walk through an example.

K-means Clustering With Ruby

Let’s say we have a collection of businesses in three distinct regions: the East coast, West coast and fly-over country, and we would like to classify these businesses by region. Clustering algorithms, which aim to partition a set of data points into a small number of clusters, are some of the most commonly used solutions for data classification problems.

The k-means algorithm is what we will use to cluster our data. It is an unsupervised machine learning algorithm that groups data points into k clusters, minimizing the distance from each data point to the centroid of the cluster it belongs to.

The Algorithm

  1. Choose k points to represent the initial centroids for the data. These may be assigned randomly or specified by the user.
  2. Assign each data point to the closest centroid (by euclidean distance or other distance metric), creating k clusters.
  3. Recalculate each centroid’s position such that each data point in the centroid’s cluster is equidistant to the centroid.
  4. Repeat steps 2 and 3 until convergence (no data points change cluster) or up to a maximum number of iterations.

Playing With Ruby Hashes

Ruby’s Hash#new method accepts an optional parameter which is returned when you try to access a key that does not exist in the hash (the default is nil). For example:

h = Hash.new('bar') #=> {}
puts h[:foo]        #=> 'bar'

While the first example may not very useful, Hash#new also accepts a block that will be called with the hash object and the key whenever you try to access a key that does not exist in the hash, for example:

Quicksort (In-place) With Ruby

Quicksort is a popular sorting algorithm with O(nlogn) best and average-case performance, and O(n2) worst case performance when sorting n elements.

The Algorithm

  1. If the list only has a single element return it.
  2. Pick a pivot.
  3. Move elements around until bigger elements are to the right and smaller elements are to the left of the pivot.
  4. Recursively apply the above steps to the left and right groups of elements.

The Median of Medians Algorithm With Ruby

I stumbled across the median of medians algorithm when I was looking for an efficient way to calculate the nearest 20 cities to any city, given their latitude and longitude.

It is a selection algorithm that has a worst-case O(n) complexity for selecting the kth order statistic (kth smallest number) in an unsorted array with length n.

Say you have an unsorted array and you would like to pick the fourth smallest element. One approach would be to just sort the array and pick the fourth element. However, the best case performance for this approach is O(nlogn).