And, here’s the live edge detection version. Again, pretty straightforward. Same setup as before, but with the fragment shader changed to an edge detector instead of the cross-hatch effect. Really, you can do anything GLSL will support.
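For reference, an edge-detect fragment shader typically looks something like the following. This is my own Sobel sketch, not necessarily the exact shader in the video; it assumes a `texelSize` uniform set to (1/width, 1/height), and is kept as a Lua string ready to hand to the shader compiler:

```lua
-- A sketch of a Sobel edge-detection fragment shader (GLSL 1.20 style),
-- stored as a Lua long string.  'tex0' is the live video texture.
local edgeFrag = [[
#version 120
uniform sampler2D tex0;
uniform vec2 texelSize;

float lum(vec2 offset)
{
    vec3 c = texture2D(tex0, gl_TexCoord[0].st + offset * texelSize).rgb;
    return dot(c, vec3(0.299, 0.587, 0.114));
}

void main()
{
    // Horizontal and vertical Sobel gradients from the 3x3 neighborhood
    float gx = lum(vec2(-1.0,-1.0)) + 2.0*lum(vec2(-1.0, 0.0)) + lum(vec2(-1.0, 1.0))
             - lum(vec2( 1.0,-1.0)) - 2.0*lum(vec2( 1.0, 0.0)) - lum(vec2( 1.0, 1.0));
    float gy = lum(vec2(-1.0,-1.0)) + 2.0*lum(vec2( 0.0,-1.0)) + lum(vec2( 1.0,-1.0))
             - lum(vec2(-1.0, 1.0)) - 2.0*lum(vec2( 0.0, 1.0)) - lum(vec2( 1.0, 1.0));
    float edge = clamp(length(vec2(gx, gy)), 0.0, 1.0);
    gl_FragColor = vec4(vec3(edge), 1.0);
}
]]
```

Swapping this in for the cross-hatch shader is the only change; everything else in the pipeline stays the same.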
Once it’s supported by OpenCL kernels, it gets even more interesting, because you’re not quite as constrained by the language constructs of GLSL. As I just explained to my daughter.
I must say, at this point, it really has nothing to do with the Kinect, other than the fact that I’m using the Kinect as a cheap source of video input. Actually, it’s not a particularly cheap source of video input. The Kinect for PC costs $250, requires a dedicated USB 2.0 bus, and is very finicky when it comes to performance.
For the same $250, you could purchase a couple of HD-quality webcams, plus enough hardware to mount them on a piece of aluminum at a sufficient distance from each other to do some stereo correspondence work. That might be nice for some close-up situations, which the Kinect for PC is supposed to handle. The Kinect probably works better at larger distances from the screen, but does it…
Now that I have the Kinect behaving reliably, and the GLSL Shader thing going on fairly reasonably, it’s time to combine the two.
In the video, I am using the Kinect to capture a movie playing on the screen of my monitor. Any live video source will do really, but I might as well use the Kinect as a WebCam for this purpose.
The setup for the Kinect is the same as before. Just get the sensor, and for every frame, grab the color image and display it on a quad scaled to the full window size.
Second, I just bring in the shader, just like before. The shader doesn’t know it’s modifying a live video feed. All it knows is that it’s sampling from a texture object, and doing its thing. Similarly, the Kinect code doesn’t know anything about the shader. It’s just doing its best to supply frames of color images, and leaving it at that.
Pulling it together just means instantiating the shader program and letting nature run its course. The implications are fairly dramatic, though. Through good composition, you can easily perform simple tricks like this. The next step would be to do some interesting processing on the frames as they come from the camera.
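The composition point is worth making concrete. Here is a toy illustration (not the actual rig code) of why it works: the frame source knows nothing about the filter, and the filter knows nothing about the source, yet they snap together with one function:

```lua
-- Compose a frame source with a filter; neither knows about the other.
local function compose(source, filter)
    return function()
        return filter(source())
    end
end

-- A fake "camera" handing out a frame of gray values...
local function fakeCamera()
    return {10, 128, 250}
end

-- ...and a fake "shader" that inverts them.
local function invert(frame)
    local out = {}
    for i, v in ipairs(frame) do
        out[i] = 255 - v
    end
    return out
end

local pipeline = compose(fakeCamera, invert)
local result = pipeline()    -- {245, 127, 5}
```

In the real program, the "camera" is the Kinect color stream and the "shader" is the GLSL program, but the decoupling is exactly the same.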
This brings up an interesting point in my mind. The Kinect is a highly specialized piece of optical equipment, promoted, and made cheap, because of Microsoft and the XBox. But, is it the best and only way to go? The quality of the 3D imaging is kind of limited by the resolution and technique of the IR projector/camera. Through some powerful software, you can get good stereo correspondence to match up the depth pixels with the color pixels.
But, why not go with just plain old ordinary WebCams and stereo correspondence? Really, it’s just a challenging math problem. Well, using OpenCL, and the power of the GPU, doing stereo correspondence calculations shouldn’t be much of a problem, should it? Then, the visualization and 3D model generation could happen with a reasonable stereo camera rig. WebCam drivers are well-trod ground, so cross-platform would be instantaneous. Are there limitations to this approach? Probably, but not any based on the physical nature of the sensors; rather, they’re based on the ability to do good-quality number crunching in realtime, which is what the GPU excels at.
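To show that it really is "just math", here is a minimal sketch of the innermost piece of block-matching stereo: finding the disparity that minimizes the sum of absolute differences (SAD) along one scanline. A GPU version would run this per pixel in an OpenCL kernel; this pure-Lua form is just for illustration:

```lua
-- For pixel x on a scanline, find the disparity d (0..maxDisp) that
-- minimizes the SAD cost over a window of +/- 'window' samples.
-- left and right are 1-based arrays of grayscale values.
local function bestDisparity(left, right, x, window, maxDisp)
    local bestD, bestCost = 0, math.huge
    for d = 0, maxDisp do
        local cost = 0
        for dx = -window, window do
            local l = left[x + dx]
            local r = right[x + dx - d]
            if l and r then
                cost = cost + math.abs(l - r)
            else
                cost = cost + 255   -- penalize samples falling off the line
            end
        end
        if cost < bestCost then
            bestCost, bestD = cost, d
        end
    end
    return bestD
end

-- A bright feature in 'left' appears shifted left by 2 pixels in 'right',
-- so the recovered disparity at that feature should be 2.
local left  = {10, 20, 200, 20, 10, 10, 10, 10}
local right = {200, 20, 10, 10, 10, 10, 10, 10}
local d = bestDisparity(left, right, 5, 2, 3)   -- d == 2
```

Real implementations add sub-pixel refinement, left-right consistency checks, and smarter costs than SAD, but the core is exactly this kind of embarrassingly parallel arithmetic.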
I did the initial interop FFI work for the Kinect a couple weeks back. That initial effort was sufficient to allow me to view an upside down image on my screen. The video was not very reliable, and quite a few frames were dropped, and eventually the thing would just kind of hang. Same as on the XBox actually. Mostly it works, but occasionally, it just kind of loses its mind.
Well, I’ve updated the code a bit. But, one of the most noticeable changes in my rig is that I’ve purchased the Kinect for PC. It does in fact make a difference. I don’t know if it’s primarily the slightly shorter USB cable, or because I plugged into a better USB port on my machine, or that MS has actually done something to improve it, but it is better.
In this picture, I’m capturing the color video stream at 640×480 while simultaneously capturing the depth at 320×240. It’s good enough that I can display both in realtime (~30 fps). It does get a bit jerky when there is a lot of movement in the scene, but it will run on and on and on.
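One wrinkle worth knowing about the depth stream: each 16-bit pixel is packed. Assuming the depth-with-player-index format from the Kinect SDK, the distance in millimeters sits in the high 13 bits and the player index in the low 3 bits, so unpacking with LuaJIT’s bit library looks like this (a sketch; the helper name is my own):

```lua
local bit = require("bit")

-- Unpack one 16-bit Kinect depth pixel, assuming the
-- depth-with-player-index format: millimeters in the high 13 bits,
-- player index in the low 3 bits.
local function unpackDepthPixel(packed)
    local depthMM     = bit.rshift(packed, 3)
    local playerIndex = bit.band(packed, 0x7)
    return depthMM, playerIndex
end

-- Round-trip a known value: 1234 mm, player 2.
local packed = bit.bor(bit.lshift(1234, 3), 2)
local mm, player = unpackDepthPixel(packed)     -- 1234, 2
```

Displaying the raw packed values straight into a luminance texture still works as a quick visualization, which is what the code below does; the unpacking only matters once you start reasoning about actual distances.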
I’ve done a couple of things to the code itself. First, I put everything into a single file (Kinect.lua). It’s a bit beefy at about 1070 lines, but compared to various other files, it’s fairly tame.
Second, I abstracted the video and depth streams, so there is a KinectImageStream object. Getting that from the sensor looks like this now:
local flags = bor(NUI_INITIALIZE_FLAG_USES_COLOR, NUI_INITIALIZE_FLAG_USES_DEPTH)
local sensor0 = Kinect.GetSensorByIndex(0, flags)
local colorStream = sensor0.ColorStream
local depthStream = sensor0.DepthStream
Pretty simple. And what do you get from these streams? Well, in my case, I want to copy the data from the stream into a texture object so that I can display it using a quad, or do other interesting processing. The setup of the textures looks like this:
local colorWidth = 640
local colorHeight = 480
local colorImage = GLTexture.Create(colorWidth, colorHeight)

local depthWidth = 320
local depthHeight = 240
local depthImage = GLTexture.Create(depthWidth, depthHeight, GL_LUMINANCE, nil, GL_LUMINANCE, GL_SHORT, 2)
Then, at every clock tick, the display routine does the following:
function displaycolorimage()
    local success = colorStream:GetCurrentFrame(300)
    if success then
        colorImage:CopyPixelData(colorWidth, colorHeight, colorStream.LockedRect.pBits, GL_BGRA)
        glViewport(windowWidth/2, windowHeight/2, windowWidth/2, windowHeight/2)
        displayfullquad(colorImage)
        colorStream:ReleaseCurrentFrame()
    end
end

function displaydepthimage()
    local success = depthStream:GetCurrentFrame(300)
    if success then
        depthImage:CopyPixelData(depthWidth, depthHeight, depthStream.LockedRect.pBits, GL_LUMINANCE, GL_SHORT)
        glViewport(0, windowHeight/2, windowWidth/2, windowHeight/2)
        displayfullquad(depthImage, windowWidth, windowHeight)
        depthStream:ReleaseCurrentFrame()
    end
end

function display(canvas)
    glClear(GL_COLOR_BUFFER_BIT)
    displaycolorimage()
    displaydepthimage()
end
That’s about as simple as it gets. To go further, I could have had the stream automatically copy into a texture object, but it’s fine the way it is right now. This gets very interesting when you consider the possible collaboration with OpenCL and OpenGL shaders. First of all, the memory can be a memory Buffer, or an Image2D in OpenCL terms. That can be shared with OpenGL, for zero copying when going between the two environments.
So, imagine if you will: you can now capture data from the Kinect with very little glue code. Next, you can process that data with OpenCL kernels if you feel like it, taking advantage of GPU processing without having to deal with GLSL. Then, in the end, you can utilize the results in OpenGL, for doing whatever it is that you’ll be doing.
That’s quite a few different pieces of technology coming together there. I think LuaJIT and Lua in general makes this task a bit easier. Once the basic interfaces are done up in Lua, stitching things together gets progressively easier.
I imagine I’ll be able to shave off at least a couple hundred lines of code from that Kinect.lua file. There’s some cruft in there that I just did not clean up yet. With this one file, a developer is able to easily incorporate Kinect data into their programming flow. So, now things get really fun.