Wishful Coding

Didn't you ever wish your computer understood you?

How much Clojure do you know?

user=> (count (ns-publics 'clojure.core))
546

That is a lot of functions. Unless you wrote half of them, I wager you don’t know them all. Don’t worry, neither do I. But it doesn’t hurt to learn some more. That is why I wrote a Clojure quiz.

Simply do the standard stuff (make sure you have git and lein or cake; the commands below use cake, but lein deps and lein repl work the same way):

git clone git://github.com/pepijndevos/clojure-quiz.git
cd clojure-quiz
cake deps
cake repl
user=> (use 'clojure-quiz.core)

Now you can play three types of quizzes.

  • (doc-quiz) Find the correct doc string for the given function.
  • (name-quiz-easy) Find the correct function for the given doc.
  • (name-quiz) Same as above, but without multiple choice.

(name-quiz) is by far the hardest: I generally score around 70% on it, while I can usually get 100% on the others.

You can tweak the quiz by using binding to set target (the namespace to draw functions from) and number (the number of options to give):

(binding [target 'clojure.set number 5] (doc-quiz))
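The mechanics of a multiple-choice doc quiz are simple enough to sketch. What follows is a minimal Python sketch of the idea, not the code in clojure-quiz: it assumes a question is built by picking several public functions from a namespace (here, a Python module) and shuffling their doc strings. The names public_docs and doc_quiz_question are hypothetical; the number parameter plays the role of the number var bound above.

```python
import inspect
import random


def public_docs(module):
    """Map each public, documented callable in `module` to its doc string.

    Rough analogue of walking (ns-publics 'clojure.core) and reading each :doc.
    """
    docs = {}
    for name in dir(module):
        if name.startswith("_"):
            continue  # skip private names, like ns-publics does
        obj = getattr(module, name)
        if not callable(obj):
            continue  # constants such as math.pi are not quiz material
        doc = inspect.getdoc(obj)
        if doc:  # skip functions without documentation
            docs[name] = doc
    return docs


def doc_quiz_question(docs, number=4, rng=random):
    """Pick `number` distinct names; the first one picked is the answer.

    Returns (answer_name, candidate_docs, correct_index), with the candidates
    shuffled so the answer's position is random.
    """
    names = rng.sample(sorted(docs), number)
    answer = names[0]
    rng.shuffle(names)
    return answer, [docs[n] for n in names], names.index(answer)
```

For example, doc_quiz_question(public_docs(math), number=5) builds one five-option question from Python's math module, much like binding target and number does for the Clojure quiz.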

Have fun!

Crowd sourced news with Clojure

This snippet gets tweets, pulls out their links, then resolves, counts and ranks them in about 30 lines of Clojure, showing off the power of its Java interoperability and concurrent data structures.

(ns news
  (:refer-clojure :exclude [resolve])
  (:use clojure.contrib.json)
  (:import [java.io BufferedReader InputStreamReader]
           [java.net URL]))

(def cred "") ; username:password

(def json (let [con (.openConnection (URL. "http://stream.twitter.com/1/statuses/sample.json"))]
            (.setRequestProperty con "Authorization" (str "Basic "
                                                          (.encodeToString (java.util.Base64/getEncoder) (.getBytes cred))))
            (BufferedReader. (InputStreamReader. (.getInputStream con)))))

(def urls (agent (list)))

(def resolve (memoize (fn [url]
  (try
    (let [con (doto (.openConnection (URL. url))
                (.setInstanceFollowRedirects false)
                (.connect))
          loc (.getHeaderField con "Location")]
      (.close (.getInputStream con))
      (if loc loc url))
    (catch Exception _ url)))))

(defn top [urls]
  (reduce #(if (> (val %1) (val %2))
             %1 %2)
          (frequencies @urls)))

(future (doseq [tweet (repeatedly #(read-json json))
                url (:urls (:entities tweet))]
          (send-off urls #(conj % (resolve (:url url))))))
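The two interesting pieces of that snippet, memoized redirect resolution and picking the most frequent URL, translate directly to other languages. Below is a Python sketch under stated assumptions: get_location is a hypothetical stand-in for the HTTP request (it should return the Location header of a request made without following redirects, or None), so the example runs without network access. functools.lru_cache plays the role of memoize, and collections.Counter the role of frequencies.

```python
from collections import Counter
from functools import lru_cache


def make_resolver(get_location):
    """Build a memoized URL resolver around a hypothetical `get_location`.

    get_location(url) should return the Location header of a request made
    without following redirects, or None. Errors fall back to the original
    URL, mirroring (catch Exception _ url).
    """
    @lru_cache(maxsize=None)  # cache every result, like Clojure's memoize
    def resolve(url):
        try:
            loc = get_location(url)
        except Exception:
            return url
        return loc if loc else url

    return resolve


def top(urls):
    """Return (url, count) for the most frequent URL.

    Counter plays the role of frequencies; ties may be broken differently
    than by the reduce in the Clojure version.
    """
    return Counter(urls).most_common(1)[0]
```

Like the Clojure top, which calls reduce without an initial value, this top fails on an empty collection (here with an IndexError).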

And this is just the start. I want to see if I can use Aleph and ClojureQL to kickstart my own little news service with it.

Clojure versus Python

or

Mian versus Clomian

or

How Clojure sat in a corner converting and boxing while Python did the work

or

A plot about Minecraft (pun intended)

Update: The Clojure version is now a lot faster, thanks to the people on the Clojure mailing list. I also uploaded the map I used, in case you want to compare the results.

Okay, back to business. While writing the original Python hack, I had to pull a few tricks that I thought would be easy in Clojure. So I started to wonder what the Clojure code would look like and how fast it’d be.

[Image: plot]

My original hack was kind of slow, but it’s greatly improved and now renders a whole map in under 10 seconds.

  • 4s for reading all files
  • 3s for calculating the graph
  • 8s total

The code to read all the files:

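from glob import glob
from os.path import join
from nbt.nbt import NBTFile  # NBTFile from the NBT library (import path assumed)

# world_dir points at the Minecraft world folder (defined elsewhere)
paths = glob(join(world_dir, '*/*/*.dat'))

raw_blocks = ''
for path in paths:
    nbtfile = NBTFile(path, 'rb')
    # Blocks holds one byte per block in this chunk
    raw_blocks += nbtfile['Level']['Blocks'].value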

The code to calculate the graph:

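# Blocks are stored column by column, 128 bytes per column, so taking
# every 128th byte starting at offset i gives layer i.
layers = [raw_blocks[i::128] for i in xrange(128)]

# bt_hexes (defined elsewhere) holds the one-byte block IDs to count;
# counts[b][i] is how often block type b occurs in layer i.
counts = [[layer.count(bt_hex) for layer in layers] for bt_hex in bt_hexes]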

Nice, eh? Now the Clojure version. Clojure doesn’t have a nice glob module, so I’ll spare you the code that gives me the data. Suffice it to say that it also runs in about 4 seconds.

My initial version of the calculation was short and sweet and looked like this:

(defn freqs [blocks]
  (->> blocks
    (partition 128)
    (apply map vector)
    (pmap frequencies)))

Now, this is twice as fast as what I currently have, but it has a problem. While Python operates on bytes the whole time, these lines of Clojure operate on a sequence of objects. These objects are just a tad bigger than the bytes in a string, and (apply map vector) has to realize every column before it can hand out the first layer, so all 99844096 of them would have to sit in memory at once. That is impossible.
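The size difference between compact and object-based storage is easy to see in Python too. This is a rough analogy, not a measurement of the JVM: a bytes object spends one byte per element, while a list spends a pointer (8 bytes on a 64-bit build) per element on top of the objects it refers to. The sample size is a hypothetical choice.

```python
import sys

n = 128 * 1000            # hypothetical sample: 1000 columns of 128 blocks
compact = bytes(n)        # one byte per block, like the string Python works on
boxed = list(compact)     # one reference per block, each pointing at an int object

# getsizeof counts the list's own pointer array but not the int objects it
# points to (CPython shares the small ints 0..255), so with freshly allocated
# objects the real gap would be even larger.
print("bytes:", sys.getsizeof(compact))
print("list: ", sys.getsizeof(boxed))
```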

So I either had to make Clojure throw away the objects it had already processed, or make it store them more compactly. I tried both. I ended up with a function to concatenate Java arrays, but working with those is a real pain, so my function uses them wrapped in Clojure goodness and makes sure the Java GC can throw them away as soon as I’m done with them.

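(defn freqs [blocks]
  (->> blocks
    (partition 128)                       ; one column of 128 blocks at a time
    (reduce (fn [counts col]
              ;; bump each layer's tally for the block at that height;
              ;; doall forces the lazy map so finished columns can be freed
              (doall (map #(assoc! %1 %2 (inc (get %1 %2 0))) counts col)))
            (repeatedly 128 #(transient {})))  ; one mutable map per layer
    (map persistent!)))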

This is not parallel like the previous example, but it works. Everything I tried to make it use all my cores either ate more and more memory or was slower than the single-threaded version. Most attempts were both.
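For comparison, the single-pass idea behind the transient version can be sketched in Python (freqs_by_layer is a hypothetical name): walk the data one column at a time and bump one mutable counter per layer, without ever building a transposed copy. Like partition, it ignores a trailing incomplete column. Being pure Python, it would be much slower than the str.count approach; it only illustrates the shape of the algorithm.

```python
from collections import Counter


def freqs_by_layer(blocks, height=128):
    """Count block types per layer in a single pass over the columns.

    `blocks` is a bytes object laid out column by column, `height` bytes per
    column, so byte i of every column belongs to layer i.
    """
    counts = [Counter() for _ in range(height)]  # one tally per layer
    # Step through full columns only, dropping a trailing partial one like (partition 128).
    for start in range(0, len(blocks) - height + 1, height):
        column = blocks[start:start + height]
        for layer, block in enumerate(column):
            counts[layer][block] += 1  # update in place, like assoc! on a transient map
    return counts
```

When the length is a multiple of height, each counts[i] equals Counter(blocks[i::height]), which is exactly the layer slice the Python version counts with str.count.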

So, how fast is it?

  • 5s file reading
  • Over a minute of processing
  • Over a minute + 5s total

Wait, what? Python did this in 3 seconds, right? Yeah… So even if I had used the faster function (twice as fast, so still around 30 seconds) and had 10GB of RAM, it’d be 10 times slower.

Why? I don’t know. All I can come up with is that Python just acts on a string, while Clojure does boxing and converting 99844096 times. If you happen to know what’s wrong, or how to make it faster, be sure to tell me!
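The boxing explanation can at least be illustrated from the Python side. The sketch below uses hypothetical data of about 1 MB and counts one byte value twice: once with bytes.count, which scans the raw buffer in C, and once with a Python-level loop in which every byte becomes an int that is compared in the interpreter, which is closer to walking a sequence of boxed values. It does not diagnose the Clojure code; it only shows how large the gap between native scanning and per-element processing can be. Timings depend on the machine, so none are given here.

```python
import timeit

layer = bytes(range(256)) * 4000  # hypothetical layer data: ~1 MB, each value 4000 times
target = 1                        # hypothetical block id to count


def count_native():
    # one C-level scan over the raw bytes, like str.count in the Python version
    return layer.count(bytes([target]))


def count_per_element():
    # every byte is handled as a Python int and compared one at a time
    return sum(1 for b in layer if b == target)


native = timeit.timeit(count_native, number=5)
looped = timeit.timeit(count_per_element, number=5)
print(f"bytes.count: {native:.4f}s  per-element loop: {looped:.4f}s")
```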