Wishful Coding

Didn't you ever wish your
computer understood you?

C persistent hash map with Python bindings

After using Redis for a bit, and wanting to learn C, I set out to write a database that uses persistent collections and allows arbitrary keys.

It turns out you need to reinvent a lot of wheels to write a big application in C. By the time I realized that, I had reinvented object orientated programming, reference counting and a peristent hash map.

I’m not sure the database will ever see the light of day, but my persistent hash map works well enough that I made it into a C library with a Python extension.

The hash map uses the same algorithm as Clojure. A good explaination of how it works can be found here.

I implement the different node types as what are best described as polymorphic structs. Structs that share the same header as their superclass, so they can be casted back and forth. Example:

struct foo {
	int head;
};

struct bar {
	int head;
	int tail;
};

struct bar *test = malloc(sizeof(struct bar));
test->head = 3;
test->tail = 6;
((foo*)test)->head; // 3

Now I define different node types, that share a header of function pointers to insert, delete and find functions.

Currently, putting a node in a node as a key or value would segfault, but this is only because the function pointers for equality and hashing point into empty space(or kernel space even).

After I had the C code working, and played around with it for a bit, I realized I needed an intelligent way to free memory, so refcounting was reinvented.

Basically every polymorphic struct has a refcount and a free function pointer. I simply define 2 functions to increment and decrement it, the latter calling obj->free if it becomes zero.

The next important discoveries where that C is verbose and Python does not have persistent hash maps. This led me to develop a Python extenson to cure both problems.

First I looked at ctypes, which would be the easiest and most portable. Unfortunately, my datastructure needs to store Python objects, which can’t be done in ctypes.

Next up was a real C extension. This is not too hard for functions, but defining a class is verbose and hard. That is when the nice folks in #python suggested Cython.

I ended up writing an PyObject wrapper in C, to allow Python objects to be nodes in the trie, but wrote the Python interface in Cython. Except for some casting issues, it’s pretty sweet.

I encourage you to check out the Github page, it has silly benchmarks. Clojure is still faster than my C code, but my C code is a lot faster than other Python code.

NXT Forth compiler in Clojure

My first project at Hacker School was to write a compiler for Forth for the LEGO NXT. There are a few real programming languages for the NXT, but before I found Mirah, I never really enjoyed them.

Initially, I wanted to write a Lisp, but the NXT bytecode is so static that even the concept of a cons cell is not viable.

Forth is a very simple stack based language that uses just that, a stack and subroutines(called words). Surpisingly, the only dynamic feature in the NXT are resizable arrays. I think it’s a good fit.

This is the complete stack implementation in NBC.

#define push(stack, val) \
  replace stack.data stack.data stack.offset val \
  add stack.offset stack.offset 1

#define pop(stack, val) \
  sub stack.offset stack.offset 1 \
  index val stack.data stack.offset

My implementation follows the Boostrapping a Forth in 40 lines of Lua code approach of defining a Forth word that evals the host language.

The only difference is that this Forth is compiled, so there is a word for Clojure and one for NBC.

The main Clojure file just defines an atom to store Forth words, and defines the clj word, which evaluates a Clojure expression.

As soon as that is in place, you’re in Forth land.

First thing we do in Forth is some more Clojure defining “:*” to mean “define clojure word”, and then using “:*” to define 3 more words.

  • ”:” start of a Forth word
  • ”;” end of a Forth word
  • “nbc” write a line of assembly

What follows are a few forth words defined in assembly. At this point, the following works

: square
  dup *;

2 square dot

Get the code

Chord Progression Suggester

I did this project together with Sarah, the goal is to suggest good chord progressions to musicians.

GUI screenshot

We both know some music theory, so we started out by reading about what makes a good chord progression. The longer you look at it, the more complicated it gets. I quickly concluded that music is art, not science, so you can do whatever you want.

Instead, we wrote a scraper for guitar tabs. Just download a ton of them and look for anything that looks roughly like

([A-Ga-g][b#]?)(m)?((?:maj)?[0-9])?(sus[0-9]|add[0-9])?(/[A-Ga-g][b#]?)

We then try to figure out which key the original song was in, and convert the chords to intervals, called universal notation.

Now that we have these sequences of legit chord progressions, we need to persist them in a smart way.

We went through markov chains, tries and some other crazyness, but fnally settled on just a flat list. It turns out just scanning the whole list for the given pattern takes an acceptable amount of time.

The final step was to write a gui that could suggest chords for you. It’s written in Tkinter and uses Mingus for some playing and parsing.

Get my new pop hit

Get the source