Behavior Trees are Concatenative Programs

I have previously talked about how I got pulled into building soccer robots at Roboteam Twente. During the summer holiday I spent a lot of time thinking about their code and about ways to improve it. In that time I wrote three different behavior tree implementations, and stumbled upon something that I’ve not seen anywhere in the literature on behavior trees. (Behavior trees are basically state machines on steroids.)

The main problem I was trying to solve is that, over time, the leaf nodes that should contain the “how” in a tree of “what” had grown increasingly large and complex, and we wanted to split them up into smaller tasks.

[Image: the behavior tree / C++ language barrier]

Another related problem was that these leaf nodes, or skills as we call them, are basically untestable because they touch a lot of global mutable state and do a bunch of IO.

[Image: mutable state in a behavior tree]

I believe these skills became so large and complex because the smaller skills that live inside them have complex interdependencies that cannot be easily expressed in behavior trees. There is not really a good way for these nodes to pass parameters around.

The conventional method for sharing information between skills is using “blackboards”, which are more like “inverse blackboards”, where rather than one person sharing information with the whole room, everyone is running around chalking random stuff all over the board.

So my first implementation applied some basic functional programming. All our skills look basically the same: get the world state, do some logic, and send messages. So I simply made all the nodes take the world state and return a list of messages.

[Image: a functional behavior tree]
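In C terms, the idea looks something like the sketch below. The names are hypothetical, not the actual Roboteam code; the point is that a skill becomes a pure function from the world state to a list of messages, which makes it trivially testable.

#include <stddef.h>

typedef struct World World;     /* everything the robot can observe */
typedef struct Message Message; /* a command to send out */

typedef struct {
    Message *items;
    size_t   count;
} MessageList;

/* a skill: no global state, no IO, just world in, messages out */
typedef MessageList (*Skill)(const World *world);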

However, this seemed way too specific, and did not really solve the problem of passing parameters around between nodes. Once I had generalized the list of messages to a parameter stack, the realization struck me that I was doing concatenative programming! I repeat:

Behavior Trees are Concatenative Programs

The idea that I was basically building a domain-specific stack-based concatenative programming language led to a boatload of new ideas and implementation methods.

I particularly liked the way Forth has this hybrid between compiling and interpreting that allows very efficient yet very interactive programming. I also took a long, hard look at how Joy formalized statically typed concatenative programming in a purely functional way.

My second implementation was in Clojure, and I was delighted by how short and expressive it is. Below is my FizzBuzz implementation as a behavior tree, where you pass in a number and it returns that number or a string.

(defn fizzbuzz [n]
  (-> {}
    (spush n)                  ; push n onto the parameter stack
    (selector                  ; try children until one succeeds
      #(sequential %           ; succeeds only if all children succeed
          (modn 15)            ; succeeds when the number is divisible by 15
          (message "FizzBuzz"))
      #(sequential %
          (modn 3)
          (message "Fizz"))
      #(sequential %
          (modn 5)
          (message "Buzz")))
    speek))                    ; return the top of the stack
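So (fizzbuzz 15) yields “FizzBuzz”, while something like (fizzbuzz 7) falls through all three branches of the selector and just returns the number itself.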

Seeing how the Clojure implementation played out, I doubled down on the concatenative programming language, and started work on an actual compiler for my DSL.

The third implementation is an actual compiler that fulfills most of my Joy aspirations. It compiles a statically typed Forth-like language to C code, where you can easily implement new words in C.

There is no explicit stack or API for defining words. I parse your C headers for compatible functions, and compile your Bobcat code to a series of function calls on regular C stack variables.

There is no conventional control flow in Bobcat. It has quotations (which compile to function pointers), and special forms for function definition and – importantly – interpreting a quotation as a behavior tree node, which adds the necessary control flow to implement all the usual behavior tree nodes (sequential, selector, etc.) as C functions.
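As a hedged illustration (the names and signatures here are my own sketch, not the actual Bobcat runtime): if a quotation compiles to a function pointer, then a node like sequential is just an ordinary C function that combines the results of calling those pointers.

typedef enum { BT_SUCCESS, BT_FAILURE, BT_RUNNING } bt_status;
typedef bt_status (*quotation)(void);

/* run children in order; the first failing (or running) child short-circuits */
bt_status sequential2(quotation first, quotation second) {
    bt_status s = first();
    if (s != BT_SUCCESS)
        return s;
    return second();
}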

One noteworthy feature is that there is actually a separate stack per type. I think this is a good trade-off for a DSL, compared to the homogeneous value stack seen in most stack-based languages. This limits your ability to have generic words like swap and dup, but allows dealing with many types without too much stack juggling. Those details are mostly handled in the host language.
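For example, a word can freely mix types. The word below is hypothetical, following the convention that appears in the dupi definition further down: const parameters are inputs popped from their type’s stack, and pointer parameters are outputs pushed onto theirs.

/* hypothetical word `scalei`: pops a float and an int, pushes an int */
void scalei(float const factor, int const a, int *out) {
    *out = (int)(factor * (float)a);
}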

A funny detail I implemented is the comma operator, which is like executing words in parallel, much like Clojure’s juxt.

As a small example, something simple like 2 dupi addi 5 muli will compile to the following (which GCC can basically optimize to *out_0 = 20):

void test(int *out_0)
{
  int var_1 = 2;             /* 2 */
  int b_1;
  int c_1;
  dupi(var_1, &b_1, &c_1);   /* dupi */
  int c_3;
  addi(c_1, b_1, &c_3);      /* addi */
  int var_2 = 5;             /* 5 */
  int c_5;
  muli(var_2, c_3, &c_5);    /* muli */
  *out_0 = c_5;
}

With something like dupi simply defined as

void dupi (int const a, int* b, int* c) {
    *b = a;
    *c = a;
}

What is still missing is a more powerful way to work with quotations, which are at the moment useless and broken outside of function definitions and behavior tree nodes. And a really big omission in this design is interactive programming. I want this quite badly, so maybe some day there will be a fo(u)rth implementation, which will be more like Forth.

But for now, I’ve started my master’s and a new job, so I had to put a halt to this development. However, I still think this is a powerful concept that will allow writing smaller, more expressive, testable behavior trees.

Qt+GStreamer+DDS+Android

[Image: QML app with DDS and GStreamer on Android]

Greetings, web wanderer. I’m sorry to see you here, because I know your pain. Be brave: though it is almost certain that this post will not solve your problem, it may lead you to the next one.

What you see in the picture is an Android app, written in Qt, streaming video with GStreamer over DDS. It is the result of days and days of wrestling with build tools. If you have any chance to drop any of these dependencies, I wholeheartedly suggest you do. Any two of them are enough to ruin your day, any three are likely to take a few more days, and all four may not be possible at all with any sort of reproducibility.

Before getting into details, I’d like to propose “de Vos’ third law of compilation”, which states that the probability of a successful build scales with 1/N^2, where N is the number of build tools and compilation steps. This is a conservative estimate. It is suggested to take 2N when cross-compiling.

So let’s have a look at what we’re dealing with. Qt has its own qmake system, which generates makefiles, which invoke moc for an extra code-generation step: 3 steps. RTI Connext DDS has rtiddsgen, which generates makefiles and code for your custom data types: another 3 steps. Android uses Gradle, with CMake for the native parts, which of course generates makefiles. GStreamer seems like a pretty normal C library, but once Android comes into play you find out you were wrong. I lost count. Let’s say N = 12, or 24 because we’re cross-compiling, for a chance of success of 1/24^2 * 100% ≈ 0.17%.

To reduce N, and tie all these incompatible build systems together, I chose to use CMake. Let’s break down the problem, and dive in. Some customized build scripts can be found in this gist.

DDS+Qt

Probably the least interesting, and the easiest. There is a Qt CMake manual, and the DDS part is as simple as copying some defines and running rtiddsgen. For that I found a handy CMake file, which I adapted from OpenSplice to RTI Connext. See the gist above.

Qt+GStreamer

This one took a bit more work. Of course, on your average Linux box you can just apt-get all the things, so the main challenge here is rendering the GStreamer sink inside a QtQuick2 app. I will present two ways: one that works, and one that works properly but is more painful.

The Qt widgets way is to take a widget, get its window ID, and re-parent the sink to the widget. The problem is that QtQuick Controls 2 are no longer based on Qt widgets, so they don’t have a window ID. Instead you can re-parent to the top-level window and set the bounding box.

QQmlApplicationEngine *engine = new QQmlApplicationEngine;
engine->load(QUrl(QStringLiteral("qrc:/main.qml")));
// ...
QObject* m_rootObject = engine->rootObjects().first();
WId wid = -1;
if(m_rootObject) {
    QWindow *window = qobject_cast<QWindow *>(m_rootObject);
    if(window) {
        wid = window->winId();
    }
}
// ...
video_sink = gst_element_factory_make("glimagesink", "data.video_sink");
// ...
gst_video_overlay_set_window_handle (GST_VIDEO_OVERLAY (video_sink), wid);
gst_video_overlay_set_render_rectangle(GST_VIDEO_OVERLAY (video_sink), x, y, width, height);
gst_video_overlay_expose(GST_VIDEO_OVERLAY (video_sink));

However, this only really works on Linux. On Windows the render rectangle is ignored and then it segfaults. On Android the window handle expects an ANativeWindow*, which is hard to come by in Qt land. The solution is to use qmlglsink, which sadly does not come precompiled with any GStreamer installation.

Luckily, if you download gst-plugins-good from a GitHub release rather than the official download, you’ll find ext/qt/qtplugin.pro, which allows you to compile the plugin. At least, once you change the DEFINES to HAVE_QT_X11, HAVE_QT_WAYLAND and/or HAVE_QT_EGLFS (Android). After copying the resulting .so to your plugin folder, you can verify with gst-inspect-1.0 qmlglsink that there is a chance it might work. There is even some useful example code. The key parts are as follows.

QQuickItem* videoItem; // from somewhere
/* the plugin must be loaded before
 * loading the qml file to register the
 * GstGLVideoItem qml item */
video_sink = gst_element_factory_make("qmlglsink", "data.video_sink");
g_object_set(video_sink, "widget", videoItem, NULL);

And on the QML side:

import org.freedesktop.gstreamer.GLVideoItem 1.0

GstGLVideoItem {
  id: videoView
  onItemInitializedChanged: {
    if (itemInitialized) {
      // play the video
    }
  }
}
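To make the “play the video” part concrete, here is a hedged sketch of what the receiving side could look like. The pipeline string is an assumption on my part (an RTP/H.264 stream, consistent with the plugins registered later on), not the exact Roboteam setup.

#include <gst/gst.h>

/* hypothetical helper: build the pipeline and hand the sink its QML item */
static void start_stream(QQuickItem *videoItem)
{
    GstElement *pipeline = gst_parse_launch(
        "udpsrc port=5000 caps=\"application/x-rtp,media=video,"
        "encoding-name=H264,payload=96\" ! rtph264depay ! avdec_h264 "
        "! glupload ! glcolorconvert ! qmlglsink name=sink", NULL);
    GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
    g_object_set(sink, "widget", videoItem, NULL); /* as shown above */
    gst_element_set_state(pipeline, GST_STATE_PLAYING);
}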

Android

This is where everything gets ten times harder. If it wasn’t for qt-android-cmake, I’d have rewritten the whole thing for qmake. That would have reclaimed some sanity in some places, and lost some in others. As its author put it:

When using Qt for Android development, QMake & QtCreator is the only sane option for compiling and deploying.

Take extreme care which compilers, C++ STL, API versions, NDK versions, Qt versions, etc. you’re using, because nothing works if you pick the wrong one. Both DDS and Qt are built against the ancient r10e NDK, so I suggest using that. However, the default API version is 16, which lacks things needed to compile GStreamer, so you have to use API version 21, the most modern one that ships with r10e. This NDK uses gnustl with GCC, rather than the more modern LLVM-based NDK, which is supported by none of these libraries.

DDS

This is again relatively easy. It just takes a lot of defines to point it to the correct stuff.

export ANDROID_SDK=/home/rove/Android/Sdk/
export ANDROID_NDK=/home/rove/Android/android-ndk-r10e/
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export NDDSHOME=/home/rove/rti_connext_dds-5.3.1/
cmake .. -DRTI_TARGET_ARCH=armv7aAndroid5.0gcc4.9ndkr10e -DCMAKE_TOOLCHAIN_FILE=../../qt-android-cmake/toolchain/android.toolchain.cmake -DCMAKE_MAKE_PROGRAM=$(which make) -DANDROID_NATIVE_API_LEVEL=21 -DPKG_CONFIG_EXECUTABLE=$(which pkg-config)

Qt

Qt supports Android, so this should be relatively easy too. It turns out there is a bug in CMake that can be worked around by editing lib/cmake/Qt5Core/Qt5CoreConfigExtras.cmake and deleting set_property(TARGET Qt5::Core PROPERTY INTERFACE_COMPILE_FEATURES cxx_decltype).

I had some problems linking the DDS libraries from earlier, so I copied everything into NDDSHOME and put that on the CMAKE_FIND_ROOT_PATH inside the qt-android-cmake toolchain. It’s also worth noting I modified the manifest XML in the toolchain to give my app the permissions it needs. Maybe there is a neater way, but this works.

With all those issues out of the way, you can get a basic Qt app going with a ton of extra defines.

export ANDROID_SDK=/home/rove/Android/Sdk/
export ANDROID_NDK=/home/rove/Android/android-ndk-r10e/
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export NDDSHOME=/home/rove/rti_connext_dds-5.3.1/
export CMAKE_PREFIX_PATH=/home/rove/Qt/5.11.1/android_armv7/
export GSTREAMER_ROOT=/home/rove/bin/gst/armv7
export PKG_CONFIG_PATH=/home/rove/bin/gst/armv7/lib/pkgconfig/
cmake .. -DRTI_TARGET_ARCH=armv7aAndroid5.0gcc4.9ndkr10e -DCMAKE_TOOLCHAIN_FILE=../../qt-android-cmake/toolchain/android.toolchain.cmake -DCMAKE_MAKE_PROGRAM=$(which make) -DANDROID_NATIVE_API_LEVEL=21 -DPKG_CONFIG_EXECUTABLE=$(which pkg-config)

GStreamer

They provide prebuilt Android binaries, but still…

This is really the hardest one to get working with Qt and Android. It needs some special setup that is normally taken care of by an ndk-build step tied into the Gradle build system of vanilla Android apps, but since we’re on Qt, we have to do this ourselves. Better get some more tea/coffee/hot chocolate.

Life-saving example code from ystreet.

Step one is some manual template expansion. Take gstreamer_android-1.0.c.in and copy it into your project. If you don’t have any fonts and such, you can ignore GStreamer.java and comment out everything that refers to it at the bottom of gstreamer_android-1.0.c.

Next you need to replace @PLUGINS_DECLARATION@ and @PLUGINS_REGISTRATION@ with all the plugins you are using. Mine looks like this:

/* Declaration of static plugins */
GST_PLUGIN_STATIC_DECLARE(app);
GST_PLUGIN_STATIC_DECLARE(rtpmanager);
GST_PLUGIN_STATIC_DECLARE(rtp);
GST_PLUGIN_STATIC_DECLARE(libav);
GST_PLUGIN_STATIC_DECLARE(videoscale);
GST_PLUGIN_STATIC_DECLARE(opengl);
GST_PLUGIN_STATIC_DECLARE(videoconvert);
GST_PLUGIN_STATIC_DECLARE(qmlgl);
GST_PLUGIN_STATIC_DECLARE(videofilter);

/* Declaration of static gio modules */

/* Call this function to register static plugins */
void
gst_android_register_static_plugins (void)
{
  GST_PLUGIN_STATIC_REGISTER(app);
  GST_PLUGIN_STATIC_REGISTER(rtpmanager);
  GST_PLUGIN_STATIC_REGISTER(rtp);
  GST_PLUGIN_STATIC_REGISTER(libav);
  GST_PLUGIN_STATIC_REGISTER(videoscale);
  GST_PLUGIN_STATIC_REGISTER(opengl);
  GST_PLUGIN_STATIC_REGISTER(videoconvert);
  GST_PLUGIN_STATIC_REGISTER(qmlgl);
  GST_PLUGIN_STATIC_REGISTER(videofilter);
}

/* Call this function to load GIO modules */
void
gst_android_load_gio_modules (void)
{
}

Next you’ll need some initialization code, which you can copy from ystreet or from the gist I linked to at the top.

The GStreamer Android libs come with pkg-config files, but they have the wrong prefix. Nothing some sed magic can’t fix. Or you might as well just hard-code all the flags and paths.

Once you set up your include and library paths and run make, you’ll get a ton of link errors. That’s good. First you’ll get a small number that directly mention the plugins you registered. Add the plugin to LIBS (qmake) or target_link_libraries (cmake) and try again.

Now you’ll get tons and tons of undefined-reference errors. Progress! Find the libraries the missing symbols come from and add those too. Note that order matters, so add them at the bottom. Quick tip: you can use ndk/path/bin/android-eabi-blah-ar to list all the symbols in a library. Or Google them. Or use trial and error.

qmlglsink

The above will get you a working GStreamer+Qt app. But as mentioned, getting the sink to work is not fun. Due to the Qt dependency, qmlglsink is not precompiled as part of gst-plugins-good, so you’re left to do that yourself.

The provided qmake file in ext/qt is written for Windows, so we’ll need to change some things and mess around. First of all, we’d like a static library, so add CONFIG += staticlib. Next, we’re not on Windows, so add HAVE_QT_EGLFS to DEFINES. Then a bunch more defines will let qmake run to completion.

export ANDROID_NDK_ROOT=/home/rove/Android/android-ndk-r10e/
export PKG_CONFIG_DIR=
export PKG_CONFIG_LIBDIR=/home/rove/bin/gst/armv7/lib/pkgconfig/
export PKG_CONFIG_SYSROOT_DIR=/
/home/rove/Qt/5.11.1/android_armv7/bin/qmake

However, this uses API version 16, which won’t work due to missing OpenGL headers. I’m sure there is a proper way to change it, but you can also just vim the Makefile and :%s/16/21/gc. If you get link errors, add them to LIBS and try again.

Finally, copy the .a to your other Android plugins, add the declaration and registration, resolve the link errors, and DONE.

Conclusion

It works, but don’t do it if you have other options.

I’m sure none of this works for you, and I can’t help you. You’ll just have to muddle through. All I can tell you is that it is a giant relief when it finally works, and you see a live video stream on your Android device. The best of luck.

Partial Decoding of 360° HD Virtual Reality Video

I’m doing some mental cleaning, putting some ideas out there that I had saved up for a master thesis, startup, or other ambition. Starting with this VR-related idea.

I got this idea from a post by John Carmack about 5k video decoding on VR headsets, where he talks about the challenges of 360° HD video. Basically, it’s a lot of data, and the user is only looking at about 1/6 of it. The problem with partial decoding is that conventional video codecs use key frames and motion prediction. John’s solution is to slice the video up into tiles with many extra key frames and decode only the visible ones, with an extra low-resolution backdrop for quick head motions.

I thought there must be a better way, so I made a new video codec that allows efficient partial decoding. It’s based on the 3D discrete cosine transform, which I implemented on the GPU in Futhark. It’s the same transform used in JPEG, with the third dimension being time.

Think of it like this: If you’d put all the video frames behind each other, you basically get a cube of pixels. So similar to how you compress areas of the same color in JPEG, now you can compress volumes of the same color over time.

The way compression like this works is that you take blocks of 8x8(x8) pixels and transform them to the frequency domain. (The cosine transform is part of the Fourier transform family.) A property of the cosine transform is that most of the important information ends up at low frequencies, so you can basically set the high-frequency parts to zero. Then you apply lossless compression, which is great at compressing long runs of zeros.
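For illustration, here is a minimal C sketch of the transform. This is my own naive version with an unnormalized convention, not the author’s GPU Futhark code: a 1D DCT-II applied separably along each axis of the 8x8x8 block.

#include <math.h>

#define N 8

/* unnormalized 1D DCT-II over N samples */
static void dct1d(const double in[N], double out[N]) {
    for (int k = 0; k < N; k++) {
        double sum = 0.0;
        for (int n = 0; n < N; n++)
            sum += in[n] * cos(M_PI / N * (n + 0.5) * k);
        out[k] = sum;
    }
}

/* 3D DCT of one block, indexed [t][y][x]: because the transform is
 * separable, just run the 1D version along each axis in turn */
void dct3d(double block[N][N][N]) {
    double line[N], freq[N];
    for (int t = 0; t < N; t++)          /* along x */
        for (int y = 0; y < N; y++) {
            for (int x = 0; x < N; x++) line[x] = block[t][y][x];
            dct1d(line, freq);
            for (int x = 0; x < N; x++) block[t][y][x] = freq[x];
        }
    for (int t = 0; t < N; t++)          /* along y */
        for (int x = 0; x < N; x++) {
            for (int y = 0; y < N; y++) line[y] = block[t][y][x];
            dct1d(line, freq);
            for (int y = 0; y < N; y++) block[t][y][x] = freq[y];
        }
    for (int y = 0; y < N; y++)          /* along t */
        for (int x = 0; x < N; x++) {
            for (int t = 0; t < N; t++) line[t] = block[t][y][x];
            dct1d(line, freq);
            for (int t = 0; t < N; t++) block[t][y][x] = freq[t];
        }
}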

Well, that’s how JPEG and 3D-DCT video compression work, which has been written about a lot; that’s not a new thing. But what’s really great about 3D-DCT compared to motion prediction is that you can decode an arbitrary 8x8x8 cube without any extra data. This makes it great for VR video, I think.

What’s even more cool: the DC component of the DCT is the average of the whole cube, so without any decoding you can take the DC components to get your low-resolution backdrop. This backdrop is also at 1/8th the frame rate, so it may be desirable to partially decode along the time axis, which is totally possible. You just apply the 1D inverse DCT to the time dimension and take the DC components of the resulting 2D frames.
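Continuing the sketch above (again my own illustration, same unnormalized convention): recovering one backdrop pixel per block at the full frame rate only touches the coeffs[*][0][0] line, where index 0 along y and x is the 2D DC of each time-frequency slice.

/* inverse of dct1d above (a scaled DCT-III) */
static void idct1d(const double in[N], double out[N]) {
    for (int n = 0; n < N; n++) {
        double sum = in[0] / 2.0;
        for (int k = 1; k < N; k++)
            sum += in[k] * cos(M_PI / N * (n + 0.5) * k);
        out[n] = sum * 2.0 / N;
    }
}

/* backdrop[t] = average color of frame t inside this block, obtained by
 * inverting only the time axis and reading each frame's 2D DC component */
void block_backdrop(double coeffs[N][N][N], double backdrop[N]) {
    double line[N];
    for (int t = 0; t < N; t++)
        line[t] = coeffs[t][0][0];
    idct1d(line, backdrop);
    for (int t = 0; t < N; t++)
        backdrop[t] /= N * N; /* the spatial DC is the sum of N*N pixels */
}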

After implementing a proof of concept in Futhark (for the DCT) and Python (for the IO and interface), I sent an email to John Carmack with the video above. His reply:

There are at least three companies working full time on schemes for partial video decode in VR. I have been in communication with Visbit and TiledMedia, and I know there are a couple others. An algorithm isn’t going to be worth much of anything, but a functioning service, like they are trying to do, may have some kind of acquisition exit strategy, but it isn’t looking great for them right now.

Long ago, I did some investigation of 3D DCT for video compression, and it wasn’t as competitive as I hoped – 2D motion prediction winds up being rather more flexible than the DCT basis functions, and video frames are actually aliased in time due to shutter exposures being a fraction of the time duration, so it isn’t as smooth as the spatial dimensions.

Though the main reason I shelved this idea is that there is not really a viable path to get this onto VR headsets. A mobile video codec pretty much has to be implemented in hardware, but for such a niche market, it’s hard to imagine a way to realize this hardware. If there were a CPU manufacturer interested in licensing my 3D-DCT IP block into their products, I’d be more than happy to finish the thing.
