August 2012

My job recently had a code golf challenge: the shortest code wins. The task was to write k-means: take a set of data and reduce it to k points. There are many algorithms to do this, but the specific algorithm for this challenge was to, until you had few enough data points, find the two closest points and merge them. First, my winning 227-character entry:

(def d(p q)(apply +(map(fn(m n)(expt(- m n)2))p q)))(def m(k s)(if(len> s k)(m k(let(r t)(best(fn(p q)(<(d p.0 p.1)(d q.0 q.1)))(apply join(map[map(fn(l)`(,_,l))(cdr(mem _ s))]s)))(cons(map(fn e avg.e)r t)(rem r(rem t s)))))s))

Ok, that's unreadable. I get it. The code, as I started with it:

(def distance (p1 p2)
     (apply +
            (map (fn (d1 d2)
                     (expt (- d1 d2)
                           2))
                 p1
                 p2)))


(def k-means (k points)
     (if (len> points
               k)
         (k-means k
                  (collapse-nearest points))
       points))

(def collapse-nearest (points)
     (let (rem1 rem2) (closest points)
          (cons (map (fn dimensions
                         avg.dimensions)
                     rem1
                     rem2)
                (rem rem1
                     (rem rem2
                          points)))))

(def closest (points)
     (best (fn (p1 p2)
               (< (distance p1.0 p1.1)
                  (distance p2.0 p2.1)))
           (cartesian points)))

(def cartesian (elements)
     (apply join
            (map [map (fn (ele) (cons _ ele))
                      (cdr (mem _
                                elements))]
                 elements)))

To get from the readable code to the golfed code, I did some standard minification techniques: making all names one-character, inlining functions, and removing extra whitespace. Nothing too fancy.

The good:

The bad:

notes: