My first challenge was getting the Xbox Kinect to interact with a piece of software that would allow me to control Ableton or other programs using just my body. I looked at several programs including:
OpenNI (Open Natural Interaction) is an open-source software project started in 2010 that aimed to make it easier to pass information between various pieces of hardware and software. It was discontinued by Apple in 2014 following the release of Microsoft’s Kinect for Windows SDK. Although the software is still available today, it is outdated and there are faster solutions available.
TouchDesigner is a node-based visual programming language for real-time interactive multimedia content. Compatible with the Xbox Kinect, it looked like a suitable piece of software to use; however, getting it to interact with Ableton appeared to be more difficult than with the alternatives.
Nuitrack is middleware designed to replace the Kinect SDK with a newer, faster and more feature-rich alternative. However, its $99 price proved to be prohibitive.
I found several other options, but most of them led to dead links and discontinued projects. Unfortunately, although all these options had their own strengths, none of them would let me easily send inputs into Ableton Live, which was what I was looking for.
That was when I found dp.kinect, a plug-in (an external extension) for Max MSP, which meant I could run it in Max4Live and use it with Ableton. dp.kinect lets you use a Microsoft Kinect v1 sensor on a Windows PC and has a range of features including:
Colour image, depth, and IR sensor output in many pixel formats
User identification, location, and occlusion
Skeleton joint tracking with orientations
Point clouds, accelerometer, and gravity
Sound location and strength; speech recognition
Face tracking with pose, rotation, translation, bounding boxes, key 2D and 3D points; face 3D modelling with animation/shape units
Output data in native Max messages or OSC; compatible with the output of dp.kinect and jit.openni to aid in migration
Using different objects (Max’s modules, each with a specific function, such as add or subtract), you can select the data outputs that you want to use from the Xbox Kinect. The main one I used was hand position data on XYZ axes, where X is horizontal, Y is vertical and Z is depth.
The plug-in has seven possible outputs, each with a different purpose: “depthmap”, “colormap”, “irmap”, “playermap”, “pointcloud”, skeleton (and other data) messages, and a “dumpout”. Since there are many more streams of data than physical outputs, you have to specify exactly which stream you want. If, for example, you want the left-hand position data, you would first split the incoming message with “zl slice 1”, which separates the first element of the list from the rest, then pull out the hand data with “route L_hand”. From there you can unpack it into three separate outputs for X, Y and Z.
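To make that routing concrete, here is a minimal Python sketch of the same three steps. The message layout (a joint name followed by coordinates) and the function name are assumptions for illustration, not dp.kinect’s exact format.

```python
def route_left_hand(message):
    """Return (x, y, z) if this message carries left-hand data, else None."""
    head, rest = message[0], message[1:]   # like "zl slice 1": split off the head
    if head != "L_hand":                   # like "route L_hand": match the stream
        return None
    x, y, z = rest[:3]                     # like "unpack": one value per outlet
    return (x, y, z)

# Made-up messages with positions in metres (the layout is assumed):
print(route_left_hand(["L_hand", 0.21, -0.05, 1.8]))  # -> (0.21, -0.05, 1.8)
print(route_left_hand(["R_hand", 0.40, 0.10, 1.7]))   # -> None
```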
However, a rather significant problem I encountered was that you can only run one instance of the plug-in per Kinect at a time, which would mean that only one track and instrument within Ableton could be controlled at once, limiting how freely I could compose. To overcome this, I built a patch that acts as a hub and sends out any stream of data I want, such as X, Y and Z hand position, through a network port, allowing any number of patches in Max4Live to receive a simple number between -1. and 1. This turned out to be a rather positive solution, as it significantly cuts down on the objects in each patch: the receiving patches only require one object per network port instead of the whole dp.kinect plug-in.
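The hub also has to squash the raw joint coordinates into that -1. to 1. range. The sketch below shows the arithmetic in Python; the input ranges (roughly ±2 m across and 0.8–4 m in depth) are assumptions about a Kinect v1’s working volume, not measured values.

```python
def to_unit_range(value, in_min, in_max):
    """Linearly map value from [in_min, in_max] to [-1, 1], clamped."""
    t = (value - in_min) / (in_max - in_min)    # 0..1 across the input range
    return max(-1.0, min(1.0, 2.0 * t - 1.0))   # remap to -1..1 and clamp

# Assumed Kinect v1 working volume, in metres (illustrative only):
x_norm = to_unit_range(0.5, -2.0, 2.0)    # horizontal
y_norm = to_unit_range(-0.3, -2.0, 2.0)   # vertical
z_norm = to_unit_range(1.8, 0.8, 4.0)     # depth
print(x_norm, y_norm, z_norm)
```

In Max itself this mapping is a single “scale” object; the sketch just spells out the arithmetic.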
There are several ways to send numbers over a network from Max. The simplest is to send Max messages as UDP (User Datagram Protocol) packets with the “udpsend” and “udpreceive” objects, but each sender can only target one receiving port, meaning I would need another port for each parameter I wanted to change. TCP works slightly differently from UDP: it sends confirmation packets back to the sender as packets are received, making it better suited to large amounts of media data, such as video. However, it was not suitable for my use either, as again each send can only connect to one receiver, and latency is higher than with UDP. My final choice was Maxhole, a Java class built into Max MSP that is designed to send data from Max on one computer to Max on another. This was perfect for me, as each Maxhole broadcasts to every Maxhole listening, meaning all my patches on each track can receive my hand position.
There are also Jitter objects, “jit.net.send” and “jit.net.recv”, used for transmitting matrices for video. I used these in the second half of my project when experimenting with sending video from the Kinect, or other videos, to other patches. However, I realised at the end that I could have done all this with the basic send object instead, as I was only using one PC, but it was still useful to learn in case I ever want to use multiple computers.
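For comparison, here is a minimal Python sketch of the two transports: a point-to-point UDP send, as “udpsend”/“udpreceive” work, and a broadcast send in the spirit of Maxhole, where every listener bound to the port receives the same value. The port number and the text payload are arbitrary choices for the example, and the broadcast is a simplification of what Maxhole actually does.

```python
import socket

PORT = 7400  # arbitrary port chosen for this example

# Receiver, like "udpreceive": one per patch/process that wants the value.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("", PORT))

# Point-to-point send, like "udpsend": addressed to one receiver only.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"0.42", ("127.0.0.1", PORT))
print(float(listener.recvfrom(1024)[0].decode()))  # -> 0.42

# Broadcast send, Maxhole-style: one packet reaches every listener bound
# to the port on the local network, so many patches share one hand value.
sender.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sender.sendto(b"0.42", ("255.255.255.255", PORT))
```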