Wishful Coding

Didn't you ever wish your
computer understood you?

The Multiple Index Problem

I would like to name this problem in functional programming. There has been some talk aout how Clojure solves the Expression Problem, but th Multiple Index Problem has deserved so little attention it doesn’t even have a name.

A simple example of multiple indices can be seen in relational databases, where you can querry a table like SELECT * FROM foo WHERE id=1 but you can add an index to the table and querry on other fields like SELECT * FROM foo WHERE name="Pepijn".

If you would translate this to hash maps, in imperative languages, you could do the following

>>> val1 = []
>>> val2 = []
>>> index1 = {1:val1, 5:val2}
>>> index2 = {"pepijn":val1, "ben":val2}
>>> index2["pepijn"].append("foo")
>>> index1[1]
['foo']

If you want to do this in a functional language, you’re out of luck. In the most basic case, you’d have to add some indirection.

user=> (def index [[], []])
#'user/index
user=> (def index1 {1 0, 5 1})
#'user/index1
user=> (def index2 {"pepijn" 0, "ben" 1})
#'user/index2
user=> (def new-index (update-in index [(get index2 "pepijn")] conj "foo"))
#'user/new-index
user=> (get new-index (get index1 1))
["foo"]

Next up, I’ll show 2 cases where this is a real pain, rather than just anoying.

Dijkstras Algorithm

Dijkstras algorithm is used to find the shortest path in a graph. It is fairly easy to implement using a vector, but also very slow.

Every itteration you need to find the closest unexplored node. With he flat list, you need to walk them all.

The way to solve this is to use a Fibonacci Heap, which has constant time and logarithmic time opperations for getting and decrementing the lowest key.

However, it has no methods for lookingg up other keys, other than… walking the tree? The way this is done in imparative languages is to just have pointers to the nodes, but in a functional language, youre stuck walking the tree.

This defeats the whole purpose of using a Fibonacci Heap. Is it at all possible to implement Dijkstra in a pure functional way in an optiomal time complexity?

I ended up using a mutable heap and just not decrementing keys. Yuk!

You can see the code me and Mary wrote at Hacker School at her github(fibbonaccy and flat array), and mine(Using PriorityBlockingQueue)

Collision Detection

I struggled with the same problem in my Clojure game engine, before I realised this was a repeating pattern.

So I started out modeling my world as a list of objects, but quickly found that the bat needed to know where the ball was, so I turned the world into a hash map of IDs to objects, and all was good.

Until I needed to do collision detection on more than a dozen objects. If you just compare all objects to al others to see if they collide, you’re doing O(n2), which is no good.

Lukily there are datastructures that do spatial indexing. Indexing… see my problem?

I wrote a Quadtree in Clojure, but as soon as I stuck all my objects in the tree, there was no way to look them up by ID.

I think I ended up doing some black magic to sorted maps to have lookup by ID and okayish collision detection.

That is all I have to say about the problem. I don’t have a soltuion or a satisfying workaround. You can try adding intermediate indices, or introducing mutability, but it’s painfull.

Pygments cross-referencing

I recently discovered the convenience of using ctags in VIM, to quickly check the definition of a function and go back to reading or writing.

While talking to some friends at Hacker School, we thought it would be super convenient to be abple to do the exact same thing while reading code on Github or other websites.

A few days and 45 LOC later, it became a reality in Pygments. It’s like LXR you can actually use, and on nearly any language.

The patch is not yet accepted, so you’ll have to use my fork for now.

You can see the cross-referenced source of the patch itself here.

Pygments just handles files line-by-line, so it has no understanding itself of multiple files.

Rather, you generate the ctags file yourself(use the -n parameter!), tell pygments what the links should look like, and loop over your files with a few lines of bash.

for f in $@
do
mkdir -p `dirname "output/$f.html"`
~/hg/pygments/pygmentize -f html -O anchorlinenos,linenos,full,urlformat=/{path}/{fname}{fext}.html,lineanchors=L,tagsfile=tags -o "output/$f.html" "$f"
done
Published on

C persistent hash map with Python bindings

After using Redis for a bit, and wanting to learn C, I set out to write a database that uses persistent collections and allows arbitrary keys.

It turns out you need to reinvent a lot of wheels to write a big application in C. By the time I realized that, I had reinvented object orientated programming, reference counting and a peristent hash map.

I’m not sure the database will ever see the light of day, but my persistent hash map works well enough that I made it into a C library with a Python extension.

The hash map uses the same algorithm as Clojure. A good explaination of how it works can be found here.

I implement the different node types as what are best described as polymorphic structs. Structs that share the same header as their superclass, so they can be casted back and forth. Example:

struct foo {
	int head;
};

struct bar {
	int head;
	int tail;
};

struct bar *test = malloc(sizeof(struct bar));
test->head = 3;
test->tail = 6;
((foo*)test)->head; // 3

Now I define different node types, that share a header of function pointers to insert, delete and find functions.

Currently, putting a node in a node as a key or value would segfault, but this is only because the function pointers for equality and hashing point into empty space(or kernel space even).

After I had the C code working, and played around with it for a bit, I realized I needed an intelligent way to free memory, so refcounting was reinvented.

Basically every polymorphic struct has a refcount and a free function pointer. I simply define 2 functions to increment and decrement it, the latter calling obj->free if it becomes zero.

The next important discoveries where that C is verbose and Python does not have persistent hash maps. This led me to develop a Python extenson to cure both problems.

First I looked at ctypes, which would be the easiest and most portable. Unfortunately, my datastructure needs to store Python objects, which can’t be done in ctypes.

Next up was a real C extension. This is not too hard for functions, but defining a class is verbose and hard. That is when the nice folks in #python suggested Cython.

I ended up writing an PyObject wrapper in C, to allow Python objects to be nodes in the trie, but wrote the Python interface in Cython. Except for some casting issues, it’s pretty sweet.

I encourage you to check out the Github page, it has silly benchmarks. Clojure is still faster than my C code, but my C code is a lot faster than other Python code.