It all started when, one day, I was thinking about the little I knew about sorting. What I knew was that sorting was usually done by comparing all items and switching them around, hitting the magical O(nlogn) barrier.
Thinking about how I would sort a deck of cards, sorting seemed more like a O(n) operation. With that in mind, I started implementing what turned out to be called Pigeonhole Sort. I also learned that sorting algorithms can be classified as being a Comparison sort or not.
(ns pigeonhole-sort)
(defn int-sort [s]
(let [listmap (reduce #(update-in
(update-in %1 [%2] (fnil inc 0))
[:max] max %2) {:max 0} s)]
(mapcat #(repeat (get listmap % 0) %)
(range (inc (:max listmap))))))
With that out of the way, I started to compare my sort to the sort used by Java and Clojure, which turned out to be using Timsort, originating in Python. In my quest to understand Timsort - and why it was faster than my Pigeonhole Sort - I found this great website that graphs how different sort algorithms work.
Simply put, I was intrigued by the pattern displayed by Heap Sort. So next I found myself reading the Wikipedia pages for Binary Heaps, and shortly after that, I had started implementing that in Clojure.
This is, ladies and gentleman, a persistent heap, in all its heapyness, with a heapsort implemented on top of it.
(ns persistent-heap)
(defn swap [heap idx idy]
(assoc heap idx (get heap idy) idy (get heap idx)))
(defn children [idx]
(let [idx (inc (* idx 2))
idy (inc idx)]
[idx idy]))
(defn parent [idx]
(if (not= 0 idx)
(/ (- idx (if (odd? idx) 1 2)) 2)
nil))
(defn tree
([heap] (tree heap 0))
([heap idx]
(let [[left right] (children idx)
node (get heap idx nil)]
(when node
[node (tree heap left) (tree heap right)]))))
(defn heap-up
([heap] (heap-up heap >= (dec (count heap))))
([heap compfn] (heap-up heap compfn (dec (count heap))))
([heap compfn idx]
(if-let [par (parent idx)]
(if (compfn (get heap idx) (get heap par))
(recur (swap heap idx par) compfn par)
heap)
heap)))
(defn heap-down
([heap] (heap-down (pop (swap heap 0 (dec (count heap)))) >= 0))
([heap compfn] (heap-down (pop (swap heap 0 (dec (count heap)))) compfn 0))
([heap compfn idx]
(let [[left right] (children idx)
leftv (get heap left nil)
rightv (get heap right nil)
node (get heap idx nil)]
(if (and node leftv rightv)
(cond
(compfn leftv (max rightv node))
(recur (swap heap idx left) compfn left)
(compfn rightv (max leftv node))
(recur (swap heap idx right) compfn right)
:else heap)
heap))))
(deftype PersistentHeap [heap]
clojure.lang.ISeq
(first [this] (first heap))
(next [this] (PersistentHeap. (heap-down heap)))
(more [this] (.next this))
(cons [this obj] (PersistentHeap. (heap-up (conj heap obj))))
(seq [this] (seq heap)))
(defn persistent-heap [coll]
(into (PersistentHeap. []) coll))
(defn heapsort [coll]
(->> (persistent-heap coll)
(iterate rest)
(take (count coll))
(map first)))
Some 'benchmark' results:
user=> (time (dorun (heapsort (shuffle (range 1e5)))))
"Elapsed time: 3148.041 msecs"
nil
user=> (time (dorun (int-sort (shuffle (range 1e5)))))
"Elapsed time: 477.781 msecs"
nil
user=> (time (dorun (sort (shuffle (range 1e5)))))
"Elapsed time: 105.354 msecs"
Note that heapsort is lazy, insofar that it does realize but does not sort the sequence untill requested. There is also a bug that allows for the last 2 items to be out of order.