# Wishful Coding

Didn't you ever wish your computer understood you?

## Qt+GStreamer+DDS+Android

Greetings, web wanderer. I’m sorry to see you here, because I know your pain. Be brave: though it is almost certain that this post will not solve your problem, it may lead you to the next one.

What you see in the picture is an Android app, written in Qt, streaming video with GStreamer over DDS. It is the result of days and days of wrestling with build tools. If you have any chance to drop any of these dependencies, I wholeheartedly suggest you do. Any two of these are enough to ruin your day, any three are likely to take a few more days, and all four may not be possible at all with any sort of reproducibility.

Before getting into details, I’d like to propose “de Vos’ third law of compilation”, which states that the probability of a successful build scales with 1/N^2, where N is the number of build tools and compilation steps. This is a conservative estimate. When cross-compiling, it is suggested to use 2N in place of N.

So let’s have a look at what we’re dealing with. Qt has its own qmake system, which generates makefiles, which invoke moc for an extra code generation step, resulting in 3 steps. RTI Connext DDS has rtiddsgen, which generates makefiles and code for your custom data types, for another 3 steps. Android uses Gradle, with CMake for the native parts, which of course generates makefiles. GStreamer seems like a pretty normal C library, but if you think it stays that way once Android comes into play, you’d be wrong. I lost count. Let’s say 12, or 24 because we’re cross-compiling, resulting in a chance of success of 1/24^2*100=0.17%.
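For the skeptics, the arithmetic can be checked with a few lines of Python. This is only a tongue-in-cheek sketch of the "law" as stated above; the function name and its parameters are made up for illustration.

```python
def build_success_probability(n_steps, cross_compiling=False):
    """Chance of a successful build according to the 1/N^2 'law'.

    n_steps is the number of build tools and compilation steps.
    When cross-compiling, 2N is used in place of N.
    """
    effective_steps = 2 * n_steps if cross_compiling else n_steps
    return 1.0 / effective_steps ** 2


# 12 steps, cross-compiled for Android: 1 / 24^2
p = build_success_probability(12, cross_compiling=True)
print(f"{p * 100:.2f}%")  # prints 0.17%
```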

To reduce N, and tie all these incompatible build systems together, I chose to use CMake. Let’s break down the problem, and dive in. Some customized build scripts can be found in this gist.

### DDS+Qt

Probably the least interesting combination, and the easiest. There is a Qt CMake manual, and the DDS part is as simple as copying some defines and running rtiddsgen. To do that, I found a handy CMake file, which I adapted from OpenSplice to RTI Connext. See the gist above.

### Qt+GStreamer

This one took a bit more work. Of course, on your average Linux box you can just apt-get all the things, so the main challenge here is rendering the GStreamer sink inside a QtQuick2 app. I will present two ways, one that works, and one that works properly, but is more painful.

The Qt widgets way is to take a widget, get its window ID, and re-parent the sink to the widget. The problem is that QtQuick Controls 2 are no longer based on Qt widgets, so they don’t have a window ID. Instead you can re-parent to the top-level window and set the bounding box.

However, this only really works on Linux. On Windows the render rectangle is ignored and then it segfaults. On Android window_handle expects an ANativeActivity*, which is hard to come by in Qt land. The solution is to use qmlglsink, which sadly does not come precompiled with any GStreamer installation.

Luckily, if you download gst-plugins-good from a GitHub release rather than the official download, you’ll find ext/qt/qtplugin.pro, which allows you to compile the plugin. At least, once you change the DEFINES to HAVE_QT_X11, HAVE_QT_WAYLAND and/or HAVE_QT_EGLFS (Android). After copying the resulting .so to your plugin folder, you can verify with gst-inspect-1.0 qmlglsink that there is a chance that it might work. There is even some useful example code. The key parts are as follows.

### Android

This is where everything gets ten times harder. If it weren’t for qt-android-cmake, I’d have rewritten the whole thing for qmake. That would have reclaimed some sanity in some places, and lost some in others. As its author put it:

> When using Qt for Android development, QMake & QtCreator is the only sane option for compiling and deploying.

Take extreme care with which compilers, C++ standard library, API versions, NDK versions, Qt versions, etc. you’re using, because nothing works if you pick the wrong one. Both DDS and Qt are built against the ancient 10e NDK, so I suggest using that. However, the default API version is 16, which does not have things needed to compile GStreamer, so you have to use API version 21, the most modern that ships with 10e. This NDK uses gnustl with GCC, rather than the more modern LLVM toolchain, which is supported by none of these libraries.

#### DDS

This is again relatively easy. It just takes a lot of defines to point it to the correct stuff.

#### Qt

Qt supports Android, so this should be relatively easy too. It turns out there is a bug in CMake that can be worked around by editing lib/cmake/Qt5Core/Qt5CoreConfigExtras.cmake and deleting set_property(TARGET Qt5::Core PROPERTY INTERFACE_COMPILE_FEATURES cxx_decltype).

I had some problems linking the DDS libraries from earlier, so I copied all the stuff into NDDSHOME and put that on the CMAKE_FIND_ROOT_PATH inside the qt-android-cmake toolchain. It’s also worth noting I modified the manifest XML in the toolchain to give my app the permissions it needs. Maybe there is a neater way, but this works.

With all those issues out of the way, you can get a basic Qt app going with a ton of extra defines.

#### GStreamer

They provide prebuilt Android binaries, but still…

This is really the hardest one to get working with Qt and Android. It needs some special setup, which is normally taken care of by an ndk-build thing that ties into the Gradle build system of vanilla Android apps, but since we’re on Qt, we have to do this ourselves. Better get some more tea/coffee/hot chocolate.

Step one is some manual template expansion. Take gstreamer_android-1.0.c.in, and copy it into your project. If you don’t have any fonts and stuff, you can ignore GStreamer.java and comment out everything that refers to it at the bottom of gstreamer_android-1.0.c.

Next you need to replace @PLUGINS_DECLARATION@ and @PLUGINS_REGISTRATION@ with all the plugins you are using. Mine looks like this
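The exact list depends on your pipeline. If you would rather script the expansion than edit by hand, a rough Python sketch could look like the following. It assumes the two placeholders expand to GStreamer's `GST_PLUGIN_STATIC_DECLARE` and `GST_PLUGIN_STATIC_REGISTER` macros, one per plugin; the plugin names, the helper name and the toy template are only illustrative stand-ins, not my actual setup.

```python
def expand_plugins(template, plugins):
    """Fill in the two plugin placeholders of the template text."""
    declaration = "\n".join(
        f"GST_PLUGIN_STATIC_DECLARE({name});" for name in plugins
    )
    registration = "\n".join(
        f"  GST_PLUGIN_STATIC_REGISTER({name});" for name in plugins
    )
    return (
        template
        .replace("@PLUGINS_DECLARATION@", declaration)
        .replace("@PLUGINS_REGISTRATION@", registration)
    )


# Toy stand-in for the real gstreamer_android-1.0.c.in contents.
template = (
    "@PLUGINS_DECLARATION@\n"
    "\n"
    "void register_static_plugins(void) {\n"
    "@PLUGINS_REGISTRATION@\n"
    "}\n"
)

# Hypothetical plugin list; use whatever your pipeline actually needs.
expanded = expand_plugins(template, ["coreelements", "videotestsrc", "app"])
print(expanded)
```

In a real project you would read the `.c.in` file from disk and write the result next to your other sources.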

Next you’ll need some initialization code, which you can copy from ystreet or from the gist I linked to at the top.

The GStreamer Android libs come with pkg-config files, but they have the wrong prefix. Nothing some sed magic can’t fix. Or you might as well just hard-code all the flags and paths.
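If sed is not your thing, the same fix can be sketched in Python. This assumes the `.pc` files carry a single `prefix=` line that the other variables refer to via `${prefix}`, which is the usual pkg-config layout; the function names and all paths below are placeholders.

```python
import re
from pathlib import Path


def rewrite_prefix(pc_text, new_prefix):
    """Replace the `prefix=` line of a pkg-config file, leaving other lines alone."""
    return re.sub(
        r"^prefix=.*$",
        # A callable avoids surprises with backslashes in the new path.
        lambda _match: "prefix=" + new_prefix,
        pc_text,
        flags=re.MULTILINE,
    )


def fix_directory(pc_dir, new_prefix):
    """Rewrite every .pc file in pc_dir in place."""
    for pc_file in Path(pc_dir).glob("*.pc"):
        pc_file.write_text(rewrite_prefix(pc_file.read_text(), new_prefix))


# Small in-memory demo with a made-up build-machine prefix.
sample = (
    "prefix=/some/build/machine/path\n"
    "exec_prefix=${prefix}\n"
    "libdir=${prefix}/lib\n"
)
fixed = rewrite_prefix(sample, "/opt/gstreamer-android/armv7")
print(fixed)
```

Note that `exec_prefix=${prefix}` is left untouched, because the pattern only matches lines that start with `prefix=`.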

Once you set up your include and library paths and run make, you’ll get a ton of link errors. That’s good. First you’ll get a small number that directly mention the plugins you registered. Add the plugin to LIBS (qmake) or target_link_libraries (cmake) and try again.

Now you’ll get tons and tons of reference errors. Progress! Find their source and add them. Note that order matters, so add them at the bottom. Quick tip: you can use ndk/path/bin/android-eabi-blah-ar to list all the symbols in a library. Or Google them. Or use trial-and-error.

##### qmlglsink

The above will get you a working GStreamer+Qt app. But as mentioned, getting the sink to render inside the app is not fun, and qmlglsink is the way out. Due to its Qt dependency, qmlglsink is not precompiled as part of gst-plugins-good, so you’re left to build it yourself.

The provided qmake file in ext/qt is written for Windows, so we’ll need to change some things and mess around. First of all, we’d like a static library, so add CONFIG += staticlib. Next, we’re not on Windows, so add HAVE_QT_EGLFS to DEFINES. Then a bunch more defines will let qmake run to completion.

However, this uses API version 16, which won’t work because some OpenGL headers are missing. I’m sure there is a good way to do it, but you can also just vim Makefile and :%s/16/21/gc. If you get link errors, add the missing libraries to LIBS and try again.

Finally, copy the .a to your other Android plugins, add the declaration and registration, resolve the link errors, and DONE.

### Conclusion

It works, but don’t do it if you have other options.

I’m sure none of this works for you, and I can’t help you. You’ll just have to muddle through. All I can tell you is that it is a giant relief when it finally works, and you see a live video stream on your Android device. The best of luck.

## Partial Decoding of 360° HD Virtual Reality Video

I’m doing some mental cleaning, putting some ideas out there that I had saved up for a master’s thesis, startup, or other ambition. Starting with this VR-related idea.

I got this idea from a post by John Carmack about 5k video decoding on VR headsets, where he talks about the challenges of 360° HD video. Basically, it’s a lot of data, and the user is only looking at about 1/6 of it. The problem with partial decoding is that conventional video codecs use key frames and motion prediction. John’s solution is to slice the video into tiles with extra key frames and decode those, with an extra low-resolution backdrop for quick head motions.

I thought there must be better ways, so I made a new video codec to do efficient partial decoding. It’s based on the 3D discrete cosine transform, which I implemented on the GPU in Futhark. It’s the same transform used in JPEG, with the third dimension being time.

Think of it like this: If you’d put all the video frames behind each other, you basically get a cube of pixels. So similar to how you compress areas of the same color in JPEG, now you can compress volumes of the same color over time.

The way compression like this works is that you take blocks of 8x8(x8) pixels and transform them to the frequency domain. (The cosine transform is a relative of the Fourier transform.) A property of the cosine transform is that most of the important information is at low frequencies, so you can basically set the high-frequency parts to zero. Then you do lossless compression, which is great at compressing long runs of zeros.
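To make that concrete, here is a minimal NumPy sketch of the idea on a single 8x8x8 cube. It is not my Futhark implementation: it uses an orthonormal DCT-II matrix applied along each axis, and the rule for which coefficients to drop (`keep`) is a crude assumption standing in for a real quantization step.

```python
import numpy as np

N = 8  # block size along x, y and time


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis function."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


C = dct_matrix()


def dct3(block):
    """Forward 3D DCT: apply the 1D transform along each of the three axes."""
    return np.einsum("ai,bj,ck,ijk->abc", C, C, C, block)


def idct3(coeffs):
    """Inverse 3D DCT (the transpose, since C is orthogonal)."""
    return np.einsum("ai,bj,ck,abc->ijk", C, C, C, coeffs)


def compress(block, keep=4):
    """Zero every coefficient whose frequency indices sum to `keep` or more."""
    coeffs = dct3(block)
    a, b, c = np.indices(coeffs.shape)
    coeffs[a + b + c >= keep] = 0.0
    return coeffs


# A smooth test cube: a gentle gradient in time, y and x.
t, y, x = np.indices((N, N, N))
block = 100.0 + 1.0 * t + 3.0 * y + 2.0 * x

coeffs = compress(block)
restored = idct3(coeffs)
print("nonzero coefficients:", np.count_nonzero(coeffs), "of", coeffs.size)
print("max abs error:", np.abs(restored - block).max())
```

The long runs of zeros in `coeffs` are what the lossless stage then squeezes out.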

Well, that’s how JPEG and 3D-DCT video compression work, which has been written about a lot. That’s not a new thing. But what’s really great about 3D-DCT compared to motion prediction is that you can decode an arbitrary 8x8x8 cube without any extra data. This makes it great for VR video, I think.

What’s even cooler: the DC component of the DCT is the average of the whole cube (up to a constant scale factor), so without any decoding, you can take the DC component to get your low-resolution backdrop. That backdrop also runs at 1/8th the frame rate, so it may be desirable to partially decode the cube along time, which is totally possible. You just apply the 1D inverse DCT to the time dimension and take the DC components of the 2D frames from there.
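A small NumPy sketch of that trick, under the assumption of an orthonormal DCT-II with time as the first axis of the cube (the scale factors below depend on that normalization, and the function names are made up here):

```python
import numpy as np

N = 8  # cube size; axis 0 is time, axes 1 and 2 are the image

# Orthonormal DCT-II matrix; row k is the k-th cosine basis function.
k = np.arange(N)[:, None]
i = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * i + 1) * k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)


def dct3(cube):
    """Forward 3D DCT over (time, y, x)."""
    return np.einsum("ai,bj,ck,ijk->abc", C, C, C, cube)


def cube_mean_from_dc(coeffs):
    """Average of the whole cube, read straight from the DC coefficient."""
    return coeffs[0, 0, 0] / N ** 1.5


def frame_means_from_dc(coeffs):
    """Average of each of the 8 frames, using only 8 coefficients.

    Inverse 1D DCT along time on the spatial-DC column gives the
    2D DC component of every frame, which is 8 times the frame mean.
    """
    temporal = coeffs[:, 0, 0]
    per_frame_dc = C.T @ temporal
    return per_frame_dc / N


rng = np.random.default_rng(0)
cube = rng.uniform(0, 255, size=(N, N, N))
coeffs = dct3(cube)

print(cube_mean_from_dc(coeffs), cube.mean())
print(frame_means_from_dc(coeffs))
print(cube.mean(axis=(1, 2)))
```

The two approaches should agree with the directly computed means, while touching 1 and 8 coefficients out of 512 respectively.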

After implementing a proof of concept in Futhark (for the DCT) and Python (for the IO and interface), I sent an email to John Carmack with the video above. His reply:

> There are at least three companies working full time on schemes for partial video decode in VR. I have been in communication with Visbit and TiledMedia, and I know there are a couple others. An algorithm isn’t going to be worth much of anything, but a functioning service, like they are trying to do, may have some kind of acquisition exit strategy, but it isn’t looking great for them right now.
>
> Long ago, I did some investigation of 3D DCT for video compression, and it wasn’t as competitive as I hoped – 2D motion prediction winds up being rather more flexible than the DCT basis functions, and video frames are actually aliased in time due to shutter exposures being a fraction of the time duration, so it isn’t as smooth as the spatial dimensions.

Though the main reason I shelved this idea is that there is not really a viable path to get this onto VR headsets. A mobile video codec pretty much has to be implemented in hardware, but for such a niche market, it’s hard to imagine a way to realize this hardware. If there were a CPU manufacturer interested in licensing my 3D-DCT IP block into their products, I’d be more than happy to finish the thing.

## Futhark: Python gotta go faster

While discussing the disappointing performance of my Futhark DCT on my “retro GPU” (Nvidia NVS 4200M) with Troels Henriksen, it came up that the Python backend has quite some calling overhead.

Futhark can compile high-level functional code to very fast OpenCL, but Futhark is meant to be embedded in larger programs. So it provides host libraries in C and Python that set up the GPU, transfer the memory, and run the code. It turns out that the Python backend, based on PyOpenCL, is quite a bit slower at this than the C backend.

I wondered why the Python backend did not use the C one via FFI, and Troels mentioned that someone had done this for a specific program and saw modest performance gains. However, this does require a working compiler and OpenCL installation, rather than just a pip install PyOpenCL, so he argued that PyOpenCL is the easiest solution for the average data scientist.

I figured I might be able to write a generic wrapper for the generated C code by feeding the generated header directly to CFFI. That worked on the first try, so that was nice. The hard part was writing a generic, yet efficient and Pythonic wrapper around the CFFI module.

The first proof of concept required quite a few fragile hacks (pattern matching on function names and relying on the type and number of arguments to infer other things). But it worked! My DCT ran over twice as fast. Then, Troels, helpful as always, modified the generated code to reduce the number of required hacks. He then proceeded to port some of the demos and benchmarks, request some features, and contribute Python 2 support.

futhark-ffi now supports all Futhark types on both Python 2 and 3, resulting in speedups of anywhere between 20% and 100% compared to the PyOpenCL backend. Programs that make many short calls benefit a lot, while programs that call large, long-running code benefit very little. The OpenCL code that runs is the same, only the calling overhead is reduced.

One interesting change suggested by Troels is to not automatically convert Futhark to Python types. For my use case I just wanted to take a Numpy array, pass it to Futhark, and get a Numpy array back. But for a lot of other programs, the Futhark types are passed between functions unchanged, so not copying them between the GPU and CPU saves a lot of time. There is even a compatibility shim that lets you use futhark-ffi with existing PyOpenCL code by merely changing the imports. An example of this can be seen here

After installing Futhark, you can simply get my library with pip (a working OpenCL installation is required).

```sh
pip install futhark-ffi
```


Usage is as follows. First generate a C library, and build a Python binding for it:

```sh
futhark-opencl --library test.fut
build_futhark_ffi test
```


From there you can import both the CFFI-generated module and the library to run your Futhark code even faster!

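```python
import numpy as np
import _test  # the CFFI module generated by build_futhark_ffi
from futhark_ffi import Futhark

test = Futhark(_test)
res = test.test3(np.arange(10))  # call the test3 entry point from test.fut
test.from_futhark(res)  # convert the Futhark result back to a Python/NumPy value
```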