Wishful Coding

Didn't you ever wish your computer understood you?

Creating a Gigabyte of NOPs

At the university we’re learning about computer architecture. The professor said that registers are used because reading from main memory is too slow. But then he also said the program is stored in main memory. How can the CPU ever execute an instruction every cycle if memory is so slow?

The answer appears to be caching. The CPU can store parts of the program in the cache and apparently fetch it fast enough from there. But what if your program is bigger than the cache, or even bigger than memory? I assume it will then be limited by the throughput of the RAM or disk. Let’s find out.

As a baseline I created a loop that executes a few NOPs: enough to make the loop overhead negligible, but not so many that they spill out of the cache. I was very surprised to find that on my 2.2GHz i7 it executed 11.5 billion NOPs per second, an effective 11.5GHz. Wat? I thought I had made an error in my math or that my NOPs had been optimized away, but this was not the case.

It turns out that my CPU has a turbo frequency of 3GHz and 4 execution units that can execute an (independent) instruction each. 3×4=12GHz of single-core performance. Not bad.
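The arithmetic is easy to check. The snippet below is only a back-of-the-envelope sketch using the numbers quoted above; the function names are made up for illustration.

```python
# Numbers taken from the post; the helper names are illustrative only.
TURBO_HZ = 3.0e9           # turbo clock of the CPU
ISSUE_WIDTH = 4            # independent instructions per cycle
MEASURED_PER_SEC = 11.5e9  # measured NOP rate of the baseline loop


def peak_rate(clock_hz, per_cycle):
    """Theoretical peak instruction rate for one core."""
    return clock_hz * per_cycle


def instructions_per_cycle(measured_per_sec, clock_hz):
    """Instructions retired per clock cycle at the measured rate."""
    return measured_per_sec / clock_hz


print(peak_rate(TURBO_HZ, ISSUE_WIDTH))                    # peak instructions per second
print(instructions_per_cycle(MEASURED_PER_SEC, TURBO_HZ))  # a bit under 4 per cycle
```

So the measured loop sits within a few percent of the theoretical peak.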

Now what if it does not fit in cache? Let’s create a GB of NOPs. This was not so easy, and I used several “amplification” steps. First I compiled a C file to assembly with a CPP macro that generated 100 NOPs. Then I saved the NOPs in a separate file and used vim (100dd10000p) to create a million NOPs. Then I used cat to concatenate 10 of those files, then 10 of the results, and included that file 10 times in the original assembly file. Finally I compiled with gcc -pipe -Wall -O0 -o bin10 wrap.S, which took a good number of minutes.
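To see that these steps really add up to a billion, here is a small sketch that multiplies out the amplification factors. It is not the procedure used above (that was CPP, vim and cat); the helper names are invented, and `nop_lines` only illustrates what one chunk of the generated assembly text looks like.

```python
def amplified_count(base, factors):
    """Multiply a starting count by each amplification factor in turn."""
    total = base
    for factor in factors:
        total *= factor
    return total


def nop_lines(count):
    """Assembly text containing `count` NOPs, one per line (small counts only)."""
    return "nop\n" * count


# 100 NOPs from the macro, pasted 10000 times in vim,
# then 10x with cat, 10x with cat again, and 10 includes.
FACTORS = [10_000, 10, 10, 10]

print(amplified_count(100, FACTORS))  # 1000000000
print(nop_lines(3), end="")
```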

The resulting binary still runs at a respectable 7GHz. I was expecting it to be much slower, but in retrospect this makes sense, since the throughput of my DDR3 RAM is apparently 10GB/s, much higher than I thought it would be.
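A quick sanity check of that comparison, as a sketch. It assumes every NOP is the one-byte x86 encoding (0x90), which is what a plain `nop` normally assembles to, and it takes "1GB" to mean 10^9 bytes; the helper names are made up.

```python
NOP_BYTES = 1          # ASSUMPTION: single-byte x86 NOP (0x90)
MEASURED_RATE = 7e9    # NOPs per second, from the post
RAM_THROUGHPUT = 10e9  # bytes per second, from the post


def fetch_bandwidth(instr_per_sec, bytes_per_instr=NOP_BYTES):
    """Bytes of code that must be fetched per second at a given rate."""
    return instr_per_sec * bytes_per_instr


def seconds_to_stream(size_bytes, bandwidth):
    """Time needed to stream a binary of the given size once."""
    return size_bytes / bandwidth


needed = fetch_bandwidth(MEASURED_RATE)
print(needed / RAM_THROUGHPUT)         # fraction of the RAM throughput in use
print(seconds_to_stream(1e9, needed))  # seconds for one pass over 1GB of NOPs
```

Under these assumptions the NOP stream needs about 7GB/s, which fits below the quoted 10GB/s.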

To really slow down the program, I would need to make it so big that it doesn’t fit in RAM. Seeing how hard it was and how long it took to make a 1GB binary, a 20GB binary would require a new technique.

The other option is to generate a lot of jumps so that the speed becomes limited by the latency of RAM rather than the throughput. But again, generating a GB of jumps requires a completely new technique.

I’ll leave it at this for now, as I’ve already spent quite some time and learned a few interesting things. Now it doesn’t seem so odd anymore that for large codebases -Os apparently tends to perform better than -O3: smaller code puts less pressure on the cache and memory.

Pepijn de Vos

Pokemon GO Old

Pokemon GO for old ’90s kids, based on Pokemon Gold¹.

When I did Pokemon Gringo I used Pokemon Red with an Arduino, GPS, and external battery. It worked, but it was kind of unwieldy and unreliable.

This hack solves all those problems. It’s based on Pokemon Crystal, and relies on a pedometer that is directly connected to and powered by the Game Boy.

Unlike Pokemon Gringo, Pokemon GO Old features the complete Pokemon game-play in full sound and color, except you have to be physically walking to walk in the game.

This means you get to hatch eggs by walking! Assuming a game step equals one meter², it’d take 1 km (Magikarp) to 10 km (Dratini) to hatch an egg.

I made one other small change to bring character selection up to date with current standards.

[Image: choose your style]

In my previous post I explained how I managed to connect the accelerometer directly to the Game Boy. From there it is relatively easy to use the accelerometer as a pedometer to control your movement in Pokemon GO Old.

I configured the LIS3DH to generate an interrupt when acceleration exceeds 64 mg (milli-g). Then I added a function that checks the interrupt register in the code that moves the character. Finally I made the buttons sticky, so that your character keeps walking for as long as you do.
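The logic can be modelled in a few lines. This is only a rough sketch of the idea, not the code from the ROM (which is Game Boy assembly): `step_detected` stands in for the sensor's interrupt flag, and the `StickyPad` class and its behaviour are my assumptions about what "sticky" means here.

```python
THRESHOLD_MG = 64  # interrupt threshold from the post, in milli-g


def step_detected(accel_mg, threshold_mg=THRESHOLD_MG):
    """Stand-in for the interrupt flag: set when acceleration exceeds the threshold."""
    return abs(accel_mg) > threshold_mg


class StickyPad:
    """ASSUMED model: remembers the last direction pressed and only
    reports it while the player is physically walking."""

    def __init__(self):
        self.direction = None

    def update(self, pressed, walking):
        # A new press replaces the remembered direction.
        if pressed is not None:
            self.direction = pressed
        # The character only moves while steps are being detected.
        return self.direction if walking else None


pad = StickyPad()
print(pad.update("UP", step_detected(100)))  # UP
print(pad.update(None, step_detected(100)))  # UP (the button is sticky)
print(pad.update(None, step_detected(10)))   # None (standing still)
```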

In case anyone wants to try it out, the complete ROM and code are on GitHub. The hardware is super simple; except for the flash cart it’s probably around $15 of components.

[Image: ball of wires]

The Game Link cable connects directly to the LIS3DH, but a small circuit is required to drive the CS line.

[Image: buffer]

  1. Actually Crystal, but anything for the pun.

  2. You take about 2 game steps per real step, but none during battles.

Pepijn de Vos

Connect SPI sensors to a Game Boy

In all my previous Game Boy hardware hacks, I always used an Arduino to talk to the Game Boy over the Game Link cable. But the Game Link protocol is essentially just SPI, so I was thinking it would be trivial to talk directly to SPI sensors.

So I ordered an LIS3DH breakout from Adafruit and connected it to the Game Link port of my Game Boy. I wrote some code to read the WHO_AM_I register of the sensor, which should return 0x33. Nothing happened. I double-checked all the code and wiring, but nothing worked.
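For reference, this is roughly what such a read looks like on the wire, sketched in Python rather than Game Boy assembly. The register address (0x0F) and the read and auto-increment bits follow the LIS3DH datasheet; `transfer` is a hypothetical callable that exchanges one byte over the link, and `FakeSensor` is a toy stand-in so the sketch can run without hardware.

```python
WHO_AM_I_ADDR = 0x0F       # WHO_AM_I register address (LIS3DH datasheet)
WHO_AM_I_EXPECTED = 0x33   # value the register should return
READ_BIT = 0x80            # first bit on the wire: 1 = read
AUTO_INCREMENT_BIT = 0x40  # second bit: 1 = auto-increment the address


def spi_command(addr, read=True, auto_increment=False):
    """Build the first byte of an LIS3DH SPI transaction."""
    if not 0 <= addr <= 0x3F:
        raise ValueError("register address must fit in 6 bits")
    cmd = addr
    if read:
        cmd |= READ_BIT
    if auto_increment:
        cmd |= AUTO_INCREMENT_BIT
    return cmd


def read_register(transfer, addr):
    """Send the command byte, then clock out a dummy byte to receive the value."""
    transfer(spi_command(addr))
    return transfer(0x00)


class FakeSensor:
    """Toy model of the sensor: the first byte selects a register, the second returns it."""

    def __init__(self):
        self.pending = None

    def transfer(self, byte_out):
        if self.pending is None:
            self.pending = byte_out & 0x3F
            return 0x00
        addr, self.pending = self.pending, None
        return WHO_AM_I_EXPECTED if addr == WHO_AM_I_ADDR else 0x00


print(hex(spi_command(WHO_AM_I_ADDR)))                              # 0x8f
print(hex(read_register(FakeSensor().transfer, WHO_AM_I_ADDR)))     # 0x33
```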

On day two I probed around with my oscilloscope and dug around in the datasheet. The problem turned out to be that I had tied Chip Select to ground, while the sensor actually needs it to go low to indicate the start of a transaction¹. But the Game Boy does not have a Chip Select pin, so that seemed like it would be the end of it.

On day three I figured I might be able to emulate the CS line somehow. I tried various things with low-pass filters, binary counters and even an ATtiny25, but nothing quite worked. The highlight of the day was that the Game Boy would display 0x24 if the stars aligned, and random junk otherwise. I gave up again.

On day four I woke up with the winning solution. It was glorious.

[Image: 33]

A MOSFET buffer connected to the clock line pulls the CS line low, and a capacitor combined with the 10k pull-up on the breakout keeps the line low between clock pulses and releases it after transmission.

[Image: 33]

I have never before been so excited about the number 0x33. After 4 days of despair, trial and error, it finally worked.

[Image: 33]

Note that a long delay between transactions is required to let the CS line rise again, so transmission will be slow.
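To get a feel for the timing, here is a simplified RC sketch. Only the 10k pull-up comes from the circuit described above. The capacitor value and the voltage fraction that counts as "high" are assumptions picked for illustration, the 8192 Hz figure is the Game Boy's normal internal link clock, and the model simply treats the capacitor as fully discharged at the moment the MOSFET lets go.

```python
import math

R_PULLUP = 10e3       # 10k pull-up on the breakout (from the post)
C_HOLD = 100e-9       # ASSUMED capacitor value, not given in the post
LINK_CLOCK_HZ = 8192  # Game Boy serial clock when it drives the link itself
HIGH_FRACTION = 0.8   # ASSUMED fraction of the supply that reads as "high"


def rise_fraction(t, r=R_PULLUP, c=C_HOLD):
    """Fraction of the supply voltage reached t seconds after release, from 0 V."""
    return 1.0 - math.exp(-t / (r * c))


def time_to_fraction(frac, r=R_PULLUP, c=C_HOLD):
    """Time for the released line to charge up to the given fraction."""
    return -r * c * math.log(1.0 - frac)


half_period = 1.0 / (2 * LINK_CLOCK_HZ)
print(rise_fraction(half_period))       # how far CS creeps up between clock pulses
print(time_to_fraction(HIGH_FRACTION))  # how long CS needs to rise after a transfer
```

With these assumed values the line barely moves during half a clock period, but needs well over a millisecond to rise afterwards, which is the trade-off behind the slow transmission.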

After setting up the control registers somewhat correctly, I was able to read accelerometer data from the sensor. I made a quick demo that moves a dot around by rotating the sensor, as seen in the video. It’s not hard to imagine you could easily make a tilting maze game out of this.
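A sketch of what the demo has to do with each reading, in Python instead of Game Boy assembly. The two-byte, two's-complement output format follows the LIS3DH datasheet and 160×144 is the Game Boy screen size; the value used for 1 g and the mapping from tilt to dot position are my assumptions, not the demo's actual code.

```python
SCREEN_W, SCREEN_H = 160, 144  # Game Boy screen size in pixels
ONE_G = 16384                  # ASSUMED raw reading for 1 g (left-justified, +/-2 g range)


def to_signed16(lo, hi):
    """Combine the low and high output bytes into a signed 16-bit value."""
    value = (hi << 8) | lo
    return value - 0x10000 if value & 0x8000 else value


def clamp(value, lowest, highest):
    """Keep value within [lowest, highest]."""
    return max(lowest, min(highest, value))


def dot_position(x_raw, y_raw, one_g=ONE_G):
    """Map a tilt reading to a pixel, with the dot centred when the sensor is level."""
    cx, cy = SCREEN_W // 2, SCREEN_H // 2
    x = cx + x_raw * cx // one_g
    y = cy + y_raw * cy // one_g
    return clamp(x, 0, SCREEN_W - 1), clamp(y, 0, SCREEN_H - 1)


print(to_signed16(0xFF, 0xFF))      # -1
print(dot_position(0, 0))           # (80, 72)
print(dot_position(ONE_G, -ONE_G))  # (159, 0)
```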

But I have other plans. To be continued.

  1. I checked various sensors, and every single accelerometer I found was dual I2C/SPI and used a protocol bit-for-bit identical to that of the LIS3DH, so using another sensor was not an option.

Pepijn de Vos