Wishful Coding

Didn't you ever wish your
computer understood you?

Pygments cross-referencing

I recently discovered the convenience of using ctags in VIM, to quickly check the definition of a function and go back to reading or writing.

While talking to some friends at Hacker School, we thought it would be super convenient to be abple to do the exact same thing while reading code on Github or other websites.

A few days and 45 LOC later, it became a reality in Pygments. It’s like LXR you can actually use, and on nearly any language.

The patch is not yet accepted, so you’ll have to use my fork for now.

You can see the cross-referenced source of the patch itself here.

Pygments just handles files line-by-line, so it has no understanding itself of multiple files.

Rather, you generate the ctags file yourself(use the -n parameter!), tell pygments what the links should look like, and loop over your files with a few lines of bash.

for f in $@
do
mkdir -p `dirname "output/$f.html"`
~/hg/pygments/pygmentize -f html -O anchorlinenos,linenos,full,urlformat=/{path}/{fname}{fext}.html,lineanchors=L,tagsfile=tags -o "output/$f.html" "$f"
done
Published on

C persistent hash map with Python bindings

After using Redis for a bit, and wanting to learn C, I set out to write a database that uses persistent collections and allows arbitrary keys.

It turns out you need to reinvent a lot of wheels to write a big application in C. By the time I realized that, I had reinvented object orientated programming, reference counting and a peristent hash map.

I’m not sure the database will ever see the light of day, but my persistent hash map works well enough that I made it into a C library with a Python extension.

The hash map uses the same algorithm as Clojure. A good explaination of how it works can be found here.

I implement the different node types as what are best described as polymorphic structs. Structs that share the same header as their superclass, so they can be casted back and forth. Example:

struct foo {
	int head;
};

struct bar {
	int head;
	int tail;
};

struct bar *test = malloc(sizeof(struct bar));
test->head = 3;
test->tail = 6;
((foo*)test)->head; // 3

Now I define different node types, that share a header of function pointers to insert, delete and find functions.

Currently, putting a node in a node as a key or value would segfault, but this is only because the function pointers for equality and hashing point into empty space(or kernel space even).

After I had the C code working, and played around with it for a bit, I realized I needed an intelligent way to free memory, so refcounting was reinvented.

Basically every polymorphic struct has a refcount and a free function pointer. I simply define 2 functions to increment and decrement it, the latter calling obj->free if it becomes zero.

The next important discoveries where that C is verbose and Python does not have persistent hash maps. This led me to develop a Python extenson to cure both problems.

First I looked at ctypes, which would be the easiest and most portable. Unfortunately, my datastructure needs to store Python objects, which can’t be done in ctypes.

Next up was a real C extension. This is not too hard for functions, but defining a class is verbose and hard. That is when the nice folks in #python suggested Cython.

I ended up writing an PyObject wrapper in C, to allow Python objects to be nodes in the trie, but wrote the Python interface in Cython. Except for some casting issues, it’s pretty sweet.

I encourage you to check out the Github page, it has silly benchmarks. Clojure is still faster than my C code, but my C code is a lot faster than other Python code.

NXT Forth compiler in Clojure

My first project at Hacker School was to write a compiler for Forth for the LEGO NXT. There are a few real programming languages for the NXT, but before I found Mirah, I never really enjoyed them.

Initially, I wanted to write a Lisp, but the NXT bytecode is so static that even the concept of a cons cell is not viable.

Forth is a very simple stack based language that uses just that, a stack and subroutines(called words). Surpisingly, the only dynamic feature in the NXT are resizable arrays. I think it’s a good fit.

This is the complete stack implementation in NBC.

#define push(stack, val) \
  replace stack.data stack.data stack.offset val \
  add stack.offset stack.offset 1

#define pop(stack, val) \
  sub stack.offset stack.offset 1 \
  index val stack.data stack.offset

My implementation follows the Boostrapping a Forth in 40 lines of Lua code approach of defining a Forth word that evals the host language.

The only difference is that this Forth is compiled, so there is a word for Clojure and one for NBC.

The main Clojure file just defines an atom to store Forth words, and defines the clj word, which evaluates a Clojure expression.

As soon as that is in place, you’re in Forth land.

First thing we do in Forth is some more Clojure defining “:*” to mean “define clojure word”, and then using “:*” to define 3 more words.

  • ”:” start of a Forth word
  • ”;” end of a Forth word
  • “nbc” write a line of assembly

What follows are a few forth words defined in assembly. At this point, the following works

: square
  dup *;

2 square dot

Get the code