Wishful Coding

Didn't you ever wish your computer understood you?

Redis Pipelining

I’d like to announce Pypredis, a Python client for Redis that tries to answer the question

How fast can I pump data into Redis?

There are many answers to that question, depending on what your goal and constraints are. The answer that Pypredis is exploring is pipelining and sharding. Let me explain.

The best use case is a few slow and independent commands. For example, a couple of big SINTER commands.

The naive way to do it using redis-py is to just execute the commands one after the other and wait for a reply.

import redis

r = redis.StrictRedis()
r.sinter('set1', 'set2')
r.sinter('set3', 'set4')
r.sinter('set5', 'set6')
r.sinter('set7', 'set8')

In addition to the CPU time, you add a lot of latency by waiting for the response every time, so a better solution would be to use a pipeline.

import redis

r = redis.StrictRedis()
pl = r.pipeline()
pl.sinter('set1', 'set2')
pl.sinter('set3', 'set4')
pl.sinter('set5', 'set6')
pl.sinter('set7', 'set8')
pl.execute()

That is pretty good, but we can do better in two ways.

First of all, redis-py does not start sending commands until you call execute, wasting valuable time while the pipeline is being built up, especially if other work is done between Redis commands.

Secondly, Redis is — for better or worse — single-threaded. So while the above pipeline might use 100% CPU on one core, the remaining cores might not be doing very much.

To utilise a multicore machine, sharding might be employed. However, sequentially executing pipelines on multiple Redis servers using redis-py actually performs worse.

pl1.execute()  # blocks
pl2.execute()  # blocks

The approach that Pypredis takes is to return a Future and send the command in another thread using an event loop.

Thus, pipelining commands in parallel to multiple Redis servers is a matter of not waiting for the result.

eventloop.send_command(conn1, "SINTER", "set1", "set2")
eventloop.send_command(conn2, "SINTER", "set3", "set4")
eventloop.send_command(conn1, "SINTER", "set5", "set6")
eventloop.send_command(conn2, "SINTER", "set7", "set8")
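Pypredis's own event loop API is only sketched above, but the future-based pattern it relies on can be illustrated with Python's standard concurrent.futures. Here slow_command is a hypothetical stand-in for a slow Redis command sent over one connection:

```python
from concurrent.futures import ThreadPoolExecutor

def slow_command(name):
    # Stand-in for a slow Redis command sent over one connection.
    return "result of " + name

executor = ThreadPoolExecutor(max_workers=2)
# Both commands are now in flight at the same time...
f1 = executor.submit(slow_command, "SINTER set1 set2")
f2 = executor.submit(slow_command, "SINTER set3 set4")
# ...and we only block here, when we actually need the results.
print(f1.result())
print(f2.result())
```

Because submit returns immediately with a Future, two connections to two Redis servers can both be kept busy, and the caller pays the waiting cost only once, at the end.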

A very simple benchmark shows that indeed Pypredis is a lot faster on a few big and slow commands, but the extra overhead makes it slower for many small and fast commands.

pypredis ping      1.083333
redis-py ping      0.933333
pypredis sunion    0.42
redis-py sunion   11.736665

Writing a web server

A colleague asked what would be an interesting exercise to learn more about Perl. I think an HTTP server is a good thing to build because it’s a small project that helps you understand web development a lot better.

This post serves as a broad outline of how an HTTP server works, and as a collection of resources to get started.

There is of course the HTTP specification itself. It’s good for looking up specific things, but otherwise not very easy reading.

HTTP is a relatively simple text-based protocol on top of TCP. It consists of a request and a response, both of which are made up of a status line, a number of headers, a blank line, and the request/response body.

What I recommend doing is playing with a simple working server to see what happens.

Let’s create a file and start a simple server.

$ echo 'Hello, world!' > test
$ python -m SimpleHTTPServer

This will serve the current directory at port 8000. We can now use curl to request the file we created. Use the -v flag to see the HTTP request and response.

$ curl -v http://localhost:8000/test
> GET /test HTTP/1.1
> User-Agent: curl/7.30.0
> Host: localhost:8000
> Accept: */*
> 
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/2.7.6
< Date: Wed, 12 Mar 2014 17:51:26 GMT
< Content-type: application/octet-stream
< Content-Length: 14
< Last-Modified: Wed, 12 Mar 2014 17:51:06 GMT
< 
Hello, world!

Take a while to look up all the headers to see what each one does. Explain what happens to a friend, cat or plant.

Now you can in turn take the role of the client or server. Can you get Python to return you the file using netcat?

$ nc localhost 8000
<enter request here>
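A minimal request to type in might look like this. The blank line that ends the headers is required, and using HTTP/1.0 tells the server it can close the connection after responding:

```http
GET /test HTTP/1.0
Host: localhost:8000

```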

Now can you get curl to talk to you? Start listening with

$ nc -l 1234

Now in another terminal run

$ curl http://localhost:1234/test

You’ll see the request in the netcat window. Try writing a response. Remember to set Content-Length correctly.
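If you are playing the server, a minimal response to type back could look like this, assuming the body is Hello, world! followed by a newline (14 bytes):

```http
HTTP/1.0 200 OK
Content-Type: text/plain
Content-Length: 14

Hello, world!
```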

Now it is time to actually write the server in your language of choice. Whichever one you use, its socket API is probably loosely based on the Unix C API. To find out more about that, run

man socket

You’re looking for a PF_INET (IPv4) socket of the SOCK_STREAM (TCP) type, but other types exist.

Be sure to check out the SEE ALSO section for functions for working with the socket.

The basic flow for the web server is as follows.

  1. Create the socket.
  2. bind it to a port.
  3. Start to listen.
  4. accept an incoming connection. (will block)
  5. read the request.
  6. write the response.
  7. close the connection.
  8. Go back to accept.
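The flow above can be sketched in Python. This is a minimal, single-request sketch; make_server and serve_one are names made up for this example, and Python spells the address family AF_INET rather than C's PF_INET:

```python
import socket

def make_server(port):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # 1. create the socket
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))                            # 2. bind it to a port
    srv.listen(1)                                            # 3. start to listen
    return srv

def serve_one(srv):
    conn, addr = srv.accept()                                # 4. accept (blocks)
    request = conn.recv(4096)                                # 5. read the request
    body = b"Hello, world!\n"
    response = b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body)
    conn.sendall(response + body)                            # 6. write the response
    conn.close()                                             # 7. close the connection
    # 8. a real server would loop back to accept here
```

Passing port 0 lets the OS pick a free port, which you can read back with srv.getsockname().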

Note that what you do after accept is subject to much debate. The simple case outlined above handles only one request at a time. A few other options:

  • Start a new thread to handle the request.
  • Use a queue and a fixed pool of threads or processes to handle the requests. Apache does this.
  • Handle many requests asynchronously with select, epoll (Linux), or kqueue (BSD). Node.js does this.

After you have a basic request/response working, there are many things you could explore.

  • Serve static files.
  • Add compression with gzip.
  • Support streaming requests and responses.
  • Run a CGI script.
  • Implement most of HTTP 1.0.
  • Implement some HTTP 1.1 parts.
  • Look into pipelining and Keep-Alive.
  • Look into caching.

The end of Team Relay Chat

It’s time to end this experiment. It’s as easy as disabling the sign-up button, as there are no users to notify.

How it started

During my batch at Hacker School we used IRC to communicate. I still have the channel open in my IRC bouncer, but after my batch they started using some web application which I never really used.

At about the same time I worked for a company where they used Campfire internally, which I also never really used or liked.

So the idea was born to build a collaboration platform based on IRC. I started hacking and it worked. The rest was just an exercise in finishing and shipping.

Users!

After I posted my first working version to Hacker News I had maybe 5 trial users and a lot of people chatting on the demo server. I got some useful feedback.

This was all very exciting, but it didn’t last. I dropped off the front page, trial periods ended, and things became quiet. I’d get maybe a signup every week, but no one stayed.

I tried Google AdWords, but that stuff is hard.

Doing things that don’t scale

The sign-up process worked like this:

  1. An email arrived.
  2. I sent a reply to verify they were not bots.
  3. I rented a VPS.
  4. I ran my deployment script.
  5. I sent another email telling the user their server was ready.

This was actually very funny, because I could tailor the email to the user. They still thought I was a machine most of the time, but I also had some nice exchanges.

Some people also submitted multiple times because they did not get a reply within 5 minutes. After I changed “Sign up” to “Request invite”, this got better.

Technical problems

The system I made worked pretty well, but deployment could have been better.

I used a Pallet script that would break every other deploy. The authors were very helpful in fixing all the problems, but I would nevertheless pick another system next time I need to automate deployments.

Every user had their own VPS. This was before Docker, so multi-tenancy and isolating IRC servers was hard. IRC doesn’t have virtual hosts, you know. In practice this meant signing up new users was slow and expensive.

Bigger problems

The only real problem with this project is that I can’t sell it. I can’t even explain to you why this is better than email or Facebook.

It basically comes down to “I like IRC”. Other than that, HipChat has more features and better UX.

The project was born as a hack and something I would use. I had and still have no idea why anyone else would use it.

There are developers who don’t use IRC, and non-developers who don’t even know what IRC is. Who would have thought…

Conclusion

It was fun, and I learned. Next time I build something, I should figure out if and why people want it.

TRC is on GitHub, so if you do care, you can run your own server. The deploy script is probably broken, though.

There is one instance of TRC still running as a bouncer that actually has users, including myself. If you are looking for a bouncer with a fancy web interface, you can have it for 3 Euro per month.

It’s all very small-scale, so if you want some random plugin installed, you can probably have it. Chances are I won’t renew the wildcard SSL cert though, these things are expensive.