Value Per Typed Line (VPTL)
Posted: April 20, 2012
These days, as the pace of technology dispersal increases, I’m very conscious of how much value I get from each line of code that I type. Recent examples come from dealing with OpenGL and various Windows interfaces.
OpenGL, including various extensions, is a whopping 15,000 lines of code. And that’s just declarations, constants, and the like. Of those 15,000 lines, most were either generic search/replace or auto-generated from parsing the .spec files. Overall, I’d say it’s probably a week’s worth of work to go from scratch to having the fullness of OpenGL available in something like a Lua environment. This, of course, is a one-time cost, for the most part.
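To give a feel for the search/replace part, here is a minimal sketch in Python (chosen only for brevity) that turns C-style `#define` constants into a Lua table. The helper name `defines_to_lua` and the sample input are made up for illustration; this is not the actual generator, and real headers and .spec files need far more care than one regular expression.

```python
import re

# Matches simple numeric constants such as: #define GL_TRIANGLES 0x0004
DEFINE_RE = re.compile(r"^\s*#define\s+(GL_\w+)\s+(0[xX][0-9A-Fa-f]+|\d+)\s*$")

def defines_to_lua(header_text):
    """Convert C '#define GL_NAME value' lines into a Lua table literal.

    Hypothetical helper for illustration; any line that is not a simple
    numeric GL_ constant is skipped.
    """
    entries = []
    for line in header_text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            name, value = match.groups()
            entries.append("    %s = %s," % (name, value))
    return "local GL = {\n" + "\n".join(entries) + "\n}\nreturn GL\n"

# Illustrative input: two constants and one line that should be ignored.
sample = """
#define GL_TRIANGLES 0x0004
#define GL_DEPTH_TEST 0x0B71
#define GLAPI extern
"""

print(defines_to_lua(sample))
```

The output is a small Lua module that can be loaded with `require`, which is roughly the shape most of those 15,000 lines end up taking.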
The great benefit is that once you have the core of OpenGL available, it’s actually cross platform (for the most part). There are differences between the various extensions, and the vendor specific host environment. If you stick to the core profile, and isolate the platform specific stuff, it’s not much work to take the exact same Lua/OpenGL code and have it run on any platform that supports the two.
Now, how about some Windows specific stuff? I did the Kinect interfaces, based on the Microsoft supplied Kinect SDK 1.0. All told, it’s not more than a few hundred lines of code, including some convenience classes to make things simpler. What do I get in the end? Well, since it’s based on the Microsoft Windows Kinect SDK, I only get the Kinect on Windows; I can’t use the same code on Linux or Mac. On top of that, it’s based on some COM interface paradigms, so there’s some funkiness in getting interfaces and representing them in a reasonable way. In the end, I have to know a lot more about Windows and events than I do about the Kinect itself.
What’s the value here? Well, it’s good because I can use the Kinect; it’s bad because I’ve spent a lot of time locking into a single platform for no particular added value. It would be a better use of time to code wrappers for libfreenect or some such, and be cross platform. In addition, even though I might lose access to the skeleton library, I gain the freedom of a simple C interface, without having to deal with COM nonsense.
Now, I’m contemplating what’s the best parallel processing interface to use. There are contenders such as CUDA, OpenCL, MPI/OpenMP, and DirectCompute. DirectCompute is eliminated almost right off the bat. Why? Because it tries to provide value by sitting atop CUDA and OpenCL, and it doesn’t bring that much to the table. So, not only would I have to implement interop to OpenCL, for example, but I’d once again have to delve deep into the COM voodoo, for very little added value. The return on investment seems fairly low.
CUDA seems fairly nice, at least if you read the slick literature from nVidia. The problem with it from my perspective is that although they provide tons of tools and whatnot, it all comes from one vendor. I doubt nVidia will disappear any time soon, but still, single vendor lock-in has not proven to be the path to success over an extended period of time. That leaves OpenCL and OpenMP. OpenMP and MPI have been around for quite some time, but OpenMP requires your compiler to have support built in. That may or may not be the case depending on what compiler I choose to use. Since I’m using Lua, it’s really a non-starter. There are libraries to use, but… it seems like OpenCL is the clear winner.
There are a couple of benefits to OpenCL right off the bat. First of all, it’s a straightforward C API, just like the OpenGL stuff. Those interfaces are a natural fit for Lua, and LuaJIT makes interfacing to them a snap. Second, OpenCL has a stream-processing-like interface. You create a bit of code, you bind variables to it, then you send that down the processing queue to be executed by whatever happens to be executing things. This fits in nicely with a broader queue and command architecture which might have nodes spread across the network, just as easily as cores within a GPU. The interface is fairly small, for the core stuff. There’s not a ton of structures and functions to worry about just to get the fundamentals.
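The create, bind, enqueue flow can be sketched as a toy model. This is plain Python and not the OpenCL API: `Kernel`, `CommandQueue`, and `scale` are hypothetical names that only mimic the roles played by calls such as `clSetKernelArg`, `clEnqueueNDRangeKernel`, and `clFinish`, and the "device" here is just a loop on the CPU.

```python
from collections import deque

class Kernel:
    """A piece of code plus the arguments bound to it (toy stand-in, not OpenCL)."""
    def __init__(self, func):
        self.func = func
        self.args = {}

    def set_arg(self, index, value):
        # Bind a value to an argument slot, by position.
        self.args[index] = value

class CommandQueue:
    """Holds commands until something executes them, in order."""
    def __init__(self):
        self.pending = deque()

    def enqueue(self, kernel, global_size):
        # Snapshot the bindings so rebinding later does not change queued work.
        self.pending.append((kernel.func, dict(kernel.args), global_size))

    def finish(self):
        # Drain the queue, running each command once per work item.
        while self.pending:
            func, args, global_size = self.pending.popleft()
            ordered = [args[i] for i in sorted(args)]
            for gid in range(global_size):
                func(gid, *ordered)

def scale(gid, data, factor):
    # The "bit of code": each work item handles one element.
    data[gid] = data[gid] * factor

values = [1.0, 2.0, 3.0, 4.0]
kernel = Kernel(scale)
kernel.set_arg(0, values)   # bind the buffer
kernel.set_arg(1, 2.0)      # bind a scalar
queue = CommandQueue()
queue.enqueue(kernel, len(values))
queue.finish()
print(values)  # [2.0, 4.0, 6.0, 8.0]
```

Nothing in the caller cares what sits behind `finish`, which is why the same shape could just as well drain to GPU cores or to nodes across a network.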
One of the dramatically good things about OpenCL is that it integrates nicely with OpenGL. That is, I can allocate something like a PixelBuffer, and use that as the storage for OpenCL data. That’s great because once I perform some operation on the data (image blur, or whatever), I don’t have to copy it to the graphics card, because it’s already there. No CPU cycles are spent on the transfer, which leaves them free for other things.
Going with OpenCL lets me reuse other things as well. I’ve already created vec[2,3,4] data structures, and similar matrix constructs. As these are core data types for both OpenGL and OpenCL, I get great reuse, without having to introduce a lot more stuff. The final kicker is that OpenCL is available on multiple platforms. As the kernels are intended to be compiled by the host environment, compatibility is built into the Khronos OpenCL spec itself. It’s not all fun and roses, and portable code always has platform specific gotchas, but, going this route, the code will be at least as portable as the equivalent C code, if not more so, and I don’t have to worry about a separate build environment to deal with it.
So, since I’m going for VPTL, and since I’m a lazy programmer, I’m going with OpenCL.