LuaJIT to Khronos

I’ve been sitting on some code for quite some time, hoarding it, as it were, for my own devices. As my current desktop machine seems to be failing more frequently, I thought it would be a good time to do some spring cleaning and put up some more code.

The current round has to do with things related to the Khronos Group of APIs. The Khronos Group is one of those industry bodies set up for collaboration across multiple companies. Probably the most famous of the APIs they’ve dealt with to date is the OpenGL API. The follow-on to that was the OpenGL ES API. Then along came one of their own, OpenVG, which deals with vector graphics. Rounding out the set, you have OpenCL for parallel computing, and OpenMAX for audio/video stuff.

Since the group was originally founded by companies interested in GPUs, those are the APIs that are the most mature. Well, recently, playing with the Raspberry Pi, I found that OpenGL ES/EGL, OpenVG, and OpenMAX are the only way to get at hardware acceleration on the device. There are already a couple of examples of OpenVG running on the Pi. One example of using OpenVG on the Pi was done by Anthony Starks. If for no other reason, you’ve got to check out the code because of the author’s name…

Anthony Starks has several examples of how to use OpenVG, and how to bind and use the Go language to do some really nice stuff. Well, no reason for those Go programmers to have all the fun, so I decided to make a really convenient binding to LuaJIT for OpenVG. But why stop there? Why not all the APIs? Well, I already had the OpenCL and OpenGL bindings lying about, so I’ve put them all together into a single repository, the LJIT2Khronos project.

One of the first things that I realized, way back, was that these things can only be useful if they can be demonstrated. Well, one of the first things you need to do to use any of these APIs is establish a connection to the windowing system. In this case, I had to include some bindings to Windows APIs as well. There is a Win32 directory, which contains some basic bindings for User32, GDI, Kernel32, Windows types and the like. More than enough to get a basic window up on the screen, and certainly enough to get the window handle and device context that are required for the various APIs.

What do you get for your troubles?

Let’s say you want to create a window, which has a ‘frame rate’ of 3 frames per second, and a routine you specify will be called, and rendering will occur…

local NativeWindow = require "User32Window"
local EGL = require "egl_utils"

local OpenVG = require "OpenVG"
local OpenVGUtils = require "OpenVG_Utils"
local ogm = require "OglMan"
local RenderClass = require"Drawing"

-- Setup the "display" object
local dpy =, EGL.EGL_OPENVG_API);

local screenWidth = 640;
local screenHeight = 480;

-- Create the renderer class which
-- will handle drawing tasks
local Renderer =, screenWidth, screenHeight);

local tick = function(ticker, tickCount)
	print("Tick: ", tickCount);

	Renderer:Background(0, 0, 0);	  -- Black background
end


-- Create a window
local winParams = {
	ClassName = "EGLWindow",
	Title = "EGL Window",
	Origin = {10,10},
	Extent = {screenWidth, screenHeight},
	FrameRate = 3,

	OnTickDelegate = tick;
};

-- create an EGL window surface
local win =
assert(win, "Window not created");

local surf = dpy:CreateWindowSurface(win:GetHandle())

-- Make the context current


local ratio = screenWidth/screenHeight;
glFrustum(-ratio, ratio, -1, 1, 1, 10);

-- Now, finally do some drawing

-- free up the display

I’ll be the first to admit, this is still quite a lot to type to do some very basic rendering, but it is way less typing than you’d have to do on your own. I’ll write up a separate article that goes into more depth on how to use this OpenVG stuff. For now, suffice it to say that it will work in whatever environment has the OpenVG and EGL libraries available (at least Windows, Raspberry Pi, and Linux in general).

But of course, there’s more. This repository also includes the OpenGL bindings, and the OpenCL stuff as well. Those bindings are fairly mature; at least, I’ve written a couple of apps using them.

So, there you have it. Some fairly complete bindings to these various Khronos Group APIs. Getting them off my machine, and into the interwebs gives me some relief. As well, I expect to make full use of them across the multiple environments in which they are available.

The Power of Leveraged Frameworks that work

I was browsing the web site recently, when I ran across a talk by Mark Kilgard. The talk was at a recent GPU Technology Conference (GTC 2012 – San Jose). The talk was entitled “NVIDIA OpenGL in 2012”.

It was a general overview talk, covering the history of OpenGL, its present, and near future. There are lots of little details related to OpenGL of course, but there were a couple that stood out for me. At roughly 49:15 into the talk, there’s a slide entitled “What is path rendering?”, with a bunch of 2D path rendered images on it. Now this gets really interesting.

Basically, if you’ve been doing 2D graphics for the past few years, you realize that the GPU revolution has largely left you behind. Yes, you can render millions of triangles per second, but just try to render some nicely kerned text, say for a web page, and you’re pretty much on your own. OpenGL has got nothing for you, or rather, what it does have for you will leave you dealing with the rasterization process yourself, for the most part.

What this means is that if you want to render high quality path driven stuff, like what you find in PostScript, OpenVG, or HTML text, you’re going to have to do a whole bunch of work. But wait! Seeing the sad state of affairs, perhaps feeling guilty for their sins, nVidia has decided to tackle the problem space of path based rendering, using the GPU to accelerate it. What a novel idea! I think it stems from the fact that their growth market is underpowered mobile devices, which have GPUs. The more you can offload to the GPU the better, as it’s more energy efficient for certain things than a CPU would be.

During the presentation, he talks about the various 2D APIs such as Quartz, OpenVG, Direct2D, Cairo, Skia, Qt::QPainter, Anti-grain… All APIs I’ve touched one way or another over the years.  He goes on about the greatness of these new extensions, which apparently have been in the nVidia drivers for a while.  Then I get to thinking.

I want 2D graphics.  I want it to work across multiple platforms, I want it to be fast and efficient.  At first I thought, maybe I should get Cairo and use that as my cross platform 2D graphics system.  Eventually Cairo will likely utilize this new path rendering stuff, and I’ll eventually benefit.  So, I looked at Cairo, took one look at the build system, and turned green.  Then I had another thought.

OpenGL is already THE cross platform graphics API. And, since I have Lua, and more specifically LuaJIT with FFI, and I’ve already coded up my OpenGL interfaces, I can just use that, and it should work across multiple platforms.

So, sounds good. I went off to the nVidia site to see what I could see with regard to using this newfangled path rendering stuff. Right now it’s only in the nVidia driver, so AMD, not so much. I took one of the whitepapers that has examples in it, and just started coding what was there. After a few turns of the crank, I was finally able to generate the image seen above.

Here’s one sequence of calls that I used:

local ogl = require "OglMan"

ogl.glMatrixOrthoEXT(GL_PROJECTION, 0, 500, 0, 400, -1, 1);
ogl.glPathStringNV(pathObj, GL_PATH_FORMAT_SVG_NV, #svgPathString, svgPathString);
ogl.glStencilFillPathNV(pathObj, GL_COUNT_UP_NV, 0x1F);

OglMan is my OpenGL Manager. It’s effectively the same thing as using the familiar GLEW (GL Extension Wrangler), but done up in Lua, not as an interop thing.

I was not familiar with any of these calls before I wrote this code. But, just putting ‘ogl.’ at the front of each one of them, I assumed they would just work, and they did! I was actually amazed at how simple it was to code up this example.

This speaks volumes to the ease of use of Lua as a rapid prototyping tool. To do the same in C would take me a lot more scaffolding, compiling, sweating and praying. In my little HeadsUp harness, I can just code and go, trying things out, with zero “compile”.

At any rate, it’s nice to know that seeing the world through a Lua lens is not a bad thing. I am just as capable as anyone on any other platform. I am asking myself this question now… If I could have a high quality text renderer done using nothing more than the GPU, and whatever text rendering library I write in Lua, could I write a nicely specialized HTML viewer?


Unchaining the GPU with Lua and OpenCL

Quite a few years ago, I programmed the BeBox to display multiple streams of .mpg video, while simultaneously pulling in video feeds from Satellite and cable. In all, you could see snapshots of roughly six things on the screen, happening all at the same time.

The CPUs were utilized primarily for the mpeg part, doing decoding, and some special effects when changing sources being displayed in the primary area. The feeds coming off the Hauppauge video capture card were being DMAd directly into the framebuffer of the graphics card, so there wasn’t any work by the CPU going on there.

That was a pretty good result for a dual-proc machine circa 1996. That was at the very beginning of the birth of nVidia, and GPUs were actually first becoming mainstream from 3dfx. Roll forward 16 years… and where are we today?

Well, the machine whining away under my desk is a 3.4GHz AMD Phenom(tm) II X4 965 Processor, with 8GB of RAM. The graphics card is an nVidia GTX 275. This machine is a couple years old now, but compared to that BeBox, it’s a monster from another planet. As such, you would think it would be able to perform the same feats as that old machine, without even heating up a single resistor. To make it even more of a monster, there’s that GPU sitting in there which has 1000 times over the amount of processing power utilized to send people to the moon in the sixties.

So, what can this machine do? Well, it allows me to type really fast!! I can read emails in the blink of an eye, and Netflix movies play with nary a stutter! I tell you, it’s simply amazing! But, what about all that horsepower that’s sitting idle under my desk? Surely I can put it to some good use.

Well, of course graphics processing can largely be offloaded to the GPU these days. Although I conjured up a graphics library that lives completely on the CPU, and just draws to memory, doing the same using the GPU is far faster, and takes a lot less electricity.

And finally, I come to the point.  I have gotten far enough along with my OpenCL binding that I can now actually do some OpenCL programming.  OpenCL is an interesting little thing.  Basically, it introduces the concept of ‘kernel’ programming.  And here, Kernel does not mean the OS kernel, but rather the small little bit of code that will run in parallel on the same piece of memory that other little bits of code are running against.  This is in fact what happens when you’re running a GLSL shader.  It’s just a little ‘kernel’, and in the case of a fragment shader, that little kernel runs against all the pixels in a frame, in parallel with hundreds of others doing the same thing.
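To make the kernel idea concrete, here is a minimal plain-Lua sketch that mimics the execution model on the CPU. The names run_kernel and scale_kernel are made up for illustration; they are not part of OpenCL or of the binding, and a real device would run the work items in parallel rather than in a loop.

```lua
-- A "kernel" is a small function that handles exactly one work item.
-- It receives its global id and the buffers it reads from and writes to.
local function scale_kernel(gid, src, dst, factor)
    dst[gid] = src[gid] * factor
end

-- Hypothetical stand-in for the device: call the kernel once per work item.
-- A GPU would execute these invocations in parallel; here they run serially.
local function run_kernel(kernel, global_work_size, ...)
    for gid = 1, global_work_size do
        kernel(gid, ...)
    end
end

local src, dst = {}, {}
for i = 1, 8 do src[i] = i end

run_kernel(scale_kernel, #src, src, dst, 2)
print(table.concat(dst, " "))
```

The point is that the kernel never loops over the data itself. It only knows its own id, which is what lets the hardware spread the work across hundreds of execution units.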

Using GLSL based fragment shaders is great for graphics programming, but for general computing, it’s kind of clunky as you’d have to cast your compute problem into terms that the graphics pipeline can understand. Furthermore, in order to use GLSL at all, you have to do things like create a GLContext, which requires a DeviceContext, which requires a Window, or at least a GDIBitmap. That’s a lot of machinery to just write a bit of code to manipulate some data.

OpenCL changes things a bit.  First of all, you have access to the GPU power without the graphics constructs.  You still have to create a proper context, but it’s far easier without having to worry about windows and bitmaps.  There are some concepts, and a hierarchy for doing things.  You start at the top with platforms.  There may be multiple “platforms” within your machine.  Usually there is only one though.  Within a platform, there are devices.  There may be multiple devices in a platform.  For example, you might have two nVidia cards in your machine, and that will list as two devices.

After the device, there is the concept of a context.  The context can span multiple devices.  The context controls things like where memory is created, where programs are created, where kernels are run, and the like.  This is really where things start to get interesting.

From the context, you can create a “program”.  Here, I think it is easier to think of the program as “image”.  You are essentially placing an “image” onto the context.  I think of the image as the raw OS image, ready to have little bits of code running in it.

Then, finally, you can create a “kernel”, which is actually a piece of code that’s going to execute on the device.

That’s a lot of stuff, and a lot of error checking, and a lot of pointers that can go wrong, etc.  So, the Lua version looks like this:

local platform, num = CLGetPlatform()
local devices = platform:GetDevices(CL_DEVICE_TYPE_GPU)

That is, get the first platform available. Then, get the list of devices available on the platform. And finally, run a kernel (code below).

Using Lua is nice because garbage collection can be used to release various resources when they’re no longer in use. That saves a bit of typing, and you don’t have to remember anything.

To run a kernel, I looked at a simple example in C, written by Clifford Wolf.

local ffi = require "ffi"

local program_source = [[
    __kernel void simple_demo(__global int *src, __global int *dst, int factor)
    {
        int i = get_global_id(0);
        dst[i] = src[i] * factor;
    }
]]

function runkernel(device)
    local context = CLContext():CreateForDevice(device);

    local program = context:CreateProgramFromSource(program_source);

    local NUM_DATA = 100;
    local buffsize = ffi.sizeof("int")*NUM_DATA;

    local input_buffer = context:CreateBuffer(buffsize, CL_MEM_READ_ONLY);
    local output_buffer = context:CreateBuffer(buffsize, CL_MEM_WRITE_ONLY);

    local factor = 2;
    local lpfactor = ffi.new("int[1]", factor);

    local kernel = program:CreateKernel("simple_demo");

    kernel:SetIndexedArg(0, input_buffer.Handle, ffi.sizeof("cl_mem"));
    kernel:SetIndexedArg(1, output_buffer.Handle, ffi.sizeof("cl_mem"));
    kernel:SetIndexedArg(2, lpfactor, ffi.sizeof("int"));

    local queue = context:CreateCommandQueue(input_buffer);

    -- Fill the input buffer, one int at a time
    local intsize = ffi.sizeof("int");
    local lpi = ffi.new("int[1]");
    for i=0, NUM_DATA-1 do
        local offset = intsize*i;
        lpi[0] = i;
        queue:EnqueueWriteBuffer(input_buffer, offset, lpi, intsize);
    end

    -- Run one instance of the kernel per element
    local global_work_size = ffi.new("size_t[1]",NUM_DATA);
    local kernel_completion = queue:EnqueueNDRangeKernel(kernel, global_work_size);

    -- Read the results back using the raw OpenCL call
    local lpdata = ffi.new("int[1]");
    for i=0, NUM_DATA-1 do
        local offset = i*intsize;
        local err = ocl.clEnqueueReadBuffer(queue.Handle, output_buffer.Handle,
            CL_TRUE, offset, intsize, lpdata, 0, nil, nil);
        CL_CHECK(err, "clEnqueueReadBuffer");
    end
end

In the first part of runkernel(), I’m using the nice object-like interface that the Lua binding provides. In the last part of the function, I’m using the straight OpenCL calls, just to show how that’s done.

There are a couple of things of note here. First, the ‘program_source’ is just a string. This is the same as with GLSLProgram. There are various environments available, including from nVidia, which will help you create these kernel strings. Once you have your string perfected, you can just drop it in for inclusion as your kernel.

Since a kernel is not a function in Lua that you can just pass variables to, you have to do some explicit work to pass values in as arguments. kernel:SetIndexedArg() performs this task. This is an ideal candidate for some Lua magic to make it simpler. Unlike the GLSL interface, I can’t query the program to find out the types of the various arguments. But, since I wrote the kernel, I do know their types. So, if I write a little table that maps each name to its index and data type, this code could turn into the more familiar:

kernel.src = input_buffer
kernel.dst = output_buffer
kernel.factor = 2

Then I’d be happy as a clam. There is another concept that gets in your face here. That’s the whole queuewrite, queueread business. Basically, all data and kernel deployment happens as commands executed from a queue. That fact does not need to be front and center, and a little bit of wrapping might make it nicer to deal with.
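Here is a minimal sketch of how that named-argument sugar could be built with a metatable. Everything in it is an assumption made for illustration: make_fake_kernel is a stand-in that only records its arguments, its SetIndexedArg takes just an index and a value (the binding’s version also takes a size), and with_named_args is a hypothetical helper, not something in the repository.

```lua
-- Stand-in for the binding's kernel object; it only records what it is given.
-- The real kernel:SetIndexedArg() would hand the value on to OpenCL.
local function make_fake_kernel()
    local k = { args = {} }
    function k:SetIndexedArg(index, value)
        self.args[index] = value
    end
    return k
end

-- Wrap a kernel so that assigning to a named field sets the matching
-- indexed argument. argmap is the hand-written name -> index table.
local function with_named_args(kernel, argmap)
    return setmetatable({}, {
        __newindex = function(_, name, value)
            local index = argmap[name]
            if index == nil then
                error("unknown kernel argument: " .. tostring(name), 2)
            end
            kernel:SetIndexedArg(index, value)
        end,
    })
end

local raw = make_fake_kernel()
local kernel = with_named_args(raw, { src = 0, dst = 1, factor = 2 })

kernel.src = "input_buffer"
kernel.dst = "output_buffer"
kernel.factor = 2

print(raw.args[0], raw.args[1], raw.args[2])
```

Because the proxy table itself stays empty, every assignment goes through __newindex, and a misspelled argument name fails loudly instead of silently creating a field.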

Now that this is in hand, what can be done with it? Well, there’s the obvious graphics stuff, which is where it came from, but there’s a whole lot more. I was just thinking that this might be a great way to perform base64 encoding for example. It’s a largely parallel task. You could write a kernel that turns a 3-character block into the equivalent 4-character code. As this kernel can run in parallel, you could literally have hundreds of them working on encoding your text at the same time. At the end, you’ve got a base64 encoded thing, in a fraction of the time it would normally take.
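As a rough illustration of why base64 parallelizes so well, here is a plain-Lua sketch of the per-block work. It runs serially on the CPU and is not OpenCL code; encode_block and base64_blocks are names invented for this example, and the sketch assumes the input length is a multiple of 3 so that no ‘=’ padding is needed.

```lua
local ALPHABET =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- The per-block "kernel": turn 3-byte block number gid (0-based) of src
-- into 4 output characters. It touches no other block, so every block
-- could be encoded by a separate work item.
local function encode_block(gid, src, dst)
    local b1, b2, b3 = src:byte(gid * 3 + 1, gid * 3 + 3)
    local n = b1 * 65536 + b2 * 256 + b3        -- pack 24 bits
    for j = 0, 3 do
        local shift = 64 ^ (3 - j)              -- 2^18, 2^12, 2^6, 2^0
        local idx = math.floor(n / shift) % 64  -- extract one 6-bit group
        dst[gid * 4 + j + 1] = ALPHABET:sub(idx + 1, idx + 1)
    end
end

-- Serial driver; assumes #text is a multiple of 3, so no '=' padding.
local function base64_blocks(text)
    assert(#text % 3 == 0, "sketch handles whole 3-byte blocks only")
    local dst = {}
    for gid = 0, #text / 3 - 1 do
        encode_block(gid, text, dst)
    end
    return table.concat(dst)
end

print(base64_blocks("Man"))
```

Each block reads three input bytes and writes four output slots at positions that depend only on its own id, which is exactly the shape of problem an NDRange kernel handles.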

Using a slightly different approach, that of stream processing, you could probably perform some cryptographic operations, like digest calculations and the like.

There is one tool that I found that makes exploring OpenCL fairly easy and fun. OpenCL Studio is done by Geist Software Labs, who appear to be a consultancy for high performance computing. They have a nice Lua scriptable environment that allows you to play with OpenCL and OpenGL, just like that.

Having such a tool available is an accelerant for me to get even more productivity wrung out of myself, and my machine.

With my little Lua binding to OpenCL, I am confident that I’m going to be able to get more per kilowatt out of my programming. That’s good for my programs, and good for the environment. I’m hoping that between a fast quad-proc, super duper graphics card, and Lua, I’ll finally be able to write and utilize programs that are more impressive than what I could do 15 years ago.

Kinect Edge Detect

And, here’s the live edge detection version. Again, pretty straightforward. Same setup as before, but I just changed the fragment shader to be an edge detector instead of that cross hatch thing. Really, you can do anything GLSL will support.

When it’s supported by OpenCL kernels, it gets even more interesting, because you’re not quite as constrained by the language constructs of GLSL, as I just explained to my daughter.

I must say, at this point, it really has nothing to do with the Kinect, other than to the extent that I’m using the Kinect as a cheap source of video input. Actually, it’s not a particularly cheap source for video input. The Kinect for PC cost $250, and requires a dedicated USB 2.0 bus, and is very finicky when it comes to performance.

For the same $250, you could purchase a couple of HD quality webcams, and enough hardware to put them on a piece of aluminum at a sufficient distance from each other to do some stereo correspondence work. That might be nice for some close up situations, which the Kinect for PC is supposed to deal with. The Kinect probably works better at larger distances from the screen, but does it…

Kinect Video with CrossHatch Shader

Now that I have the Kinect behaving reliably, and the GLSL Shader thing going on fairly reasonably, it’s time to combine the two.

In the video, I am using the Kinect to capture a movie playing on the screen of my monitor. Any live video source will do really, but I might as well use the Kinect as a WebCam for this purpose.

The setup for the Kinect is the same as before. Just get the sensor, and for every frame, grab the color image, and display it using a quad at the full window size.

Second, I just bring in the shader, just like before. The shader doesn’t know it’s modifying a live video feed. All it knows is that it’s sampling from a texture object, and doing its thing. Similarly, the Kinect code doesn’t know anything about the shader. It’s just doing its best to supply frames of color images, and leaving it at that.

Pulling it together just means instantiating the shader program, and letting nature run its course. The implications are fairly dramatic though. Through good composition, you can easily perform simple tricks like this. The next step would be to do some interesting processing on the frames as they come from the camera.

This brings up an interesting point in my mind. The Kinect is a highly specialized piece of optical equipment, promoted, and made cheap, because of Microsoft and the XBox. But, is it the best and only way to go? The quality of the 3D imaging is kind of limited by the resolution and technique of the IR projector/camera. Through some powerful software, you can get good stereo correspondence to match up the depth pixels with the color pixels.

But, why not go with just plain old ordinary WebCams and stereo correspondence? Really, it’s just a challenging math problem. Well, using OpenCL, and the power of the GPU, doing stereo correspondence calculations shouldn’t be much of a problem should it? Then, the visualization and 3D model generation could happen with a reasonable stereo camera rig. WebCam drivers are a well trod ground, so cross platform would be instantaneous. Are there limitations to this approach? Probably, but not any that are based on the physical nature of the sensors, but rather they are based on the ability to do some good quality number crunching in realtime, which is what the GPU excels at.
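To give a feel for the math involved, here is a deliberately tiny plain-Lua sketch of block matching along a single scanline, using the sum of absolute differences as the cost. It is a textbook-style illustration under simplifying assumptions (rectified images, one row, no sub-pixel refinement, caller keeps the window inside the row), not a description of any particular stereo library. The names sad and disparity_at are invented for the example.

```lua
-- Sum of absolute differences between a window in the left scanline
-- centred at x and the same window in the right scanline shifted by d.
local function sad(left, right, x, d, half)
    local sum = 0
    for k = -half, half do
        sum = sum + math.abs(left[x + k] - right[x + k - d])
    end
    return sum
end

-- For one pixel, try every candidate disparity and keep the best match.
-- Each pixel is independent of the others, which is what would make this
-- a natural fit for a GPU kernel. The caller must keep x + half within
-- the scanline; shifts that would run off the left edge are skipped.
local function disparity_at(left, right, x, max_d, half)
    local best_d, best_cost = 0, math.huge
    for d = 0, max_d do
        if x - half - d >= 1 then
            local cost = sad(left, right, x, d, half)
            if cost < best_cost then
                best_d, best_cost = d, cost
            end
        end
    end
    return best_d
end

-- Synthetic test row: the left view is the right view shifted by 3 pixels.
local right = {5, 1, 9, 3, 7, 2, 8, 4, 6, 0, 11, 13, 12, 10, 15, 14}
local left = {}
for x = 1, #right do
    left[x] = right[x - 3] or 0
end

print(disparity_at(left, right, 8, 5, 1))
```

The recovered shift is the disparity, and a larger disparity means the point is closer to the cameras. The real work is doing this for every pixel of every frame, which is where the GPU comes in.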

Kinecting with Lua, Phase II

I did the initial interop FFI work for the Kinect a couple weeks back. That initial effort was sufficient to allow me to view an upside down image on my screen. The video was not very reliable, and quite a few frames were dropped, and eventually the thing would just kind of hang. Same as on the XBox actually. Mostly it works, but occasionally, it just kind of loses its mind.

Well, I’ve updated the code a bit. But, one of the most noticeable changes in my rig is that I’ve purchased the Kinect for PC. It does in fact make a difference. I don’t know if it’s primarily the slightly shorter USB cable, or because I plugged into a better USB port on my machine, or that MS has actually done something to improve it, but it is better.

In this picture, I’m capturing the color video stream at 640×480, while simultaneously capturing the depth at 320×240. It’s good enough that I can display both in realtime (~30 fps). It does get a bit jerky when there is a lot of movement in the scene, but it will run on and on and on.

I’ve done a couple of things to the code itself. First, I put everything into a single file (Kinect.lua). It’s a bit beefy at about 1070 lines, but compared to various other files, it’s fairly tame.

Second, I abstracted the video and depth streams, so there is a KinectImageStream object. Getting that from the sensor looks like this now:

local sensor0 = Kinect.GetSensorByIndex(0, flags);

local colorStream = sensor0.ColorStream;
local depthStream = sensor0.DepthStream;

Pretty simple. And what do you get from these streams? Well, in my case, I want to copy the data from the stream into a texture object so that I can display it using a quad, or do other interesting processing. The setup of the textures looks like this:

local colorWidth = 640
local colorHeight = 480
local colorImage = GLTexture.Create(colorWidth, colorHeight)

local depthWidth = 320
local depthHeight = 240
local depthImage = GLTexture.Create(depthWidth, depthHeight, GL_LUMINANCE, nil, 

Then, at every clock tick, the display routine does the following:

function displaycolorimage()
    local success = colorStream:GetCurrentFrame(300)

    if success then
        colorImage:CopyPixelData(colorWidth, colorHeight,
            colorStream.LockedRect.pBits, GL_BGRA)

        glViewport(windowWidth/2, windowHeight/2, windowWidth/2, windowHeight/2)
        displayfullquad(colorImage, windowWidth, windowHeight)
    end
end

function displaydepthimage()
    local success = depthStream:GetCurrentFrame(300)

    if success then
        depthImage:CopyPixelData(depthWidth, depthHeight,
            depthStream.LockedRect.pBits, GL_LUMINANCE, GL_SHORT)

        glViewport(0, windowHeight/2, windowWidth/2, windowHeight/2)
        displayfullquad(depthImage, windowWidth, windowHeight)
    end
end

function display(canvas)
    displaycolorimage()
    displaydepthimage()
end

That’s about as simple as it gets. To go further, I could have had the stream automatically copy into a texture object, but it’s fine the way it is right now. This gets very interesting when you consider the possible collaboration with OpenCL and OpenGL shaders. First of all, the memory can be a memory Buffer, or Image2D in OpenCL terms. That can be shared with OpenGL, for zero copying when going between the two environments.

So, imagine if you will, you can now capture data from the Kinect with very little glue code. Next, you can process that data with OpenCL kernels if you feel like it, taking advantage of some GPU processing without having to deal with GLSL. Then, in the end, you can utilize the results in OpenGL, for doing whatever it is that you’ll be doing.

That’s quite a few different pieces of technology coming together there. I think LuaJIT and Lua in general makes this task a bit easier. Once the basic interfaces are done up in Lua, stitching things together gets progressively easier.

I imagine I’ll be able to shave off at least a couple hundred lines of code from that Kinect.lua file. There’s some cruft in there that I just did not clean up yet. With this one file, a developer is able to easily incorporate Kinect data into their programming flow. So, now things get really fun.

Dynamic Setting of Uniform Shader Values

I’m all about the lazy, and avoiding typing when I program. If I had my way, I would just think my program into existence. But alas, I still have to type.

But now, I can type a lot less error prone code when it comes to dealing with OpenGL, and in particular with shaders. What’s a ‘shader’? You know, those pesky little programs that you have to write to get anything done on the GPU these days.

Here’s the entirety of the code for the Mandelbrot zooming thing, at least the part that runs on the GPU. It’s a ‘fragment shader’, and pretty rudimentary at that:

local fragtext = [[
uniform sampler1D tex;
uniform vec2 center;
uniform float scale;
uniform int iter;

void main() {
    vec2 z, c;

    c.x = 1.3333 * (gl_TexCoord[0].x - 0.5) * scale - center.x;
    c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y;

    int i;
    z = c;
    for(i=0; i<iter; i++) {
        float x = (z.x * z.x - z.y * z.y) + c.x;
        float y = (z.y * z.x + z.x * z.y) + c.y;

        if((x * x + y * y) > 4.0) break;
        z.x = x;
        z.y = y;
    }

    gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0);
}
]]

Hey! Wait a minute. Isn’t that “C” code? Well, not quite. Notice a couple of things. First of all, this is the shader code encapsulated in a Lua string (the [[]]). When you look at the code itself, you see those words “uniform”. Those are variables that can be set from the outside world. In the case of the zooming Mandelbrot explorer from the video, I’m changing things like the ‘iter’(ations), and the scale, and the center.
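For anyone who reads Lua more easily than GLSL, here is a sketch of the escape-time loop that a Mandelbrot fragment shader of this kind runs for each pixel, written as an ordinary Lua function. The name mandel_iterations is made up for this example, and it leaves out the mapping from texture coordinates to the point c as well as the palette lookup.

```lua
-- Per-pixel work of a Mandelbrot shader, as a plain Lua function.
-- (cx, cy) is the point c in the complex plane. The return value is the
-- iteration at which the point escaped, or iter if it never did.
local function mandel_iterations(cx, cy, iter)
    local zx, zy = cx, cy
    for i = 0, iter - 1 do
        -- z = z*z + c, split into real and imaginary parts
        local x = (zx * zx - zy * zy) + cx
        local y = (zy * zx + zx * zy) + cy
        if x * x + y * y > 4.0 then
            return i
        end
        zx, zy = x, y
    end
    return iter
end

print(mandel_iterations(0, 0, 70))   -- inside the set: never escapes
print(mandel_iterations(2, 2, 70))   -- far outside: escapes immediately
```

The uniforms are what turn this into an explorer: iter caps the loop, while scale and center decide which c each pixel maps to.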

Normally, in order to change these values, you have to do things like this:

gpuprog:set_uniform1i("iter", iter);
gpuprog:set_uniform2f("center", cx, cy);
gpuprog:set_uniform1f("scale", scale);

My fingers start to cramp even from having to just copy and paste that code. Not only that, but it requires the programmer know a lot more about the intricacies of the GLSL/OpenGL API, which means they spend less time actually coding, and more time doing glue work.

What I really want to do as a programmer is this:

gpuprog.iter = iter;
gpuprog.center = float2(cx, cy)
gpuprog.scale = scale;

And while I’m at it, creating that GPU program is quite a pain in the first place. Roughly speaking, the steps are like this:

  • Create Program
  • Create Shader
  • Compile Shader
  • Attach Shader to Program
  • Link Program
  • Use Program

That’s more than a mouthful, and there’s tons of error checking in esoteric sorts of ways. But really, as a programmer who’s not particularly focused on learning the details of OpenGL APIs, but just wants to use the GPU for some kick butt graphics, I simply want to do the following:

gpuprog = GLSLProgram(fragtext)

Once again, saving my arthritic hands from a lot of typing, and sparing the few brain cells I have left from having to remember a bunch of APIs that I will use infrequently.

There’s really something to this. Because of the magic of Lua, I can implement the GLSLProgram to do lookups whenever it doesn’t recognize a field access. Basically, the lookup does the following:

function glsl_get(self, key)
    -- try the class table, as it might be a
    -- function for the class
    local field = rawget(GPUProgram,key)
    if field ~= nil then
        print("returning glsl field: ", field)
        return field
    end

    -- Last, do whatever magic to return a value
    -- or nil
    local value, ncomps =  GetUniformValue(self, key)

    if ncomps == 1 then
        return value[0];
    end

    return value
end

Setting a field is even easier:

function glsl_set(self, key, value)
    -- try to set the value
    -- in the shader
    SetUniformValue(self, key, value)
end

The functions GetUniformValue and SetUniformValue are where all the action is. They figure out what type of field it is we’re accessing (there’s a GLSL API for that), then construct the appropriate type of array, stuff it with the value, and call the appropriate API to get the values in and out. Then they return to the user, in a nice sort of way. You can actually round trip values between get/set, so everything works out great. All the error checking can be put in one place, rather than spread all over your code.

And the way it ties together is like this:

GPUProgram = {}
GPUProgram_mt = {}

function GPUProgram.new(fragtext, vertext)
    local self = {}

    self.ID = ogm.glCreateProgram();

    if fragtext ~= nil then
        self.FragmentShader = GPUShader(GL_FRAGMENT_SHADER, fragtext);
        GPUProgram.AttachShader(self, self.FragmentShader);
    end

    if vertext ~= nil then
        self.VertexShader = GPUShader(GL_VERTEX_SHADER, vertext);
        GPUProgram.AttachShader(self, self.VertexShader);
    end

    setmetatable(self, GPUProgram_mt)

    return self
end

GPUProgram_mt.__index = glsl_get
GPUProgram_mt.__newindex = glsl_set

function GLSLProgram(fragtext, vertext)
    local prog = GPUProgram.new(fragtext, vertext)

    return prog
end

Basically, the GLSLProgram function will create an ‘instance’ of a GPUProgram table. That table will have a metatable, which has the __index and __newindex functions on it. These functions will be called whenever a value is not found in the instance of the GPUProgram, or when there is a desire to set a value.
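If you want to see the mechanism work without a GPU in the loop, here is a self-contained sketch that swaps the uniform calls for a plain Lua table. new_fake_program and Describe are names invented for this example; in the real thing the two marked lines would call GetUniformValue and SetUniformValue instead.

```lua
-- Stand-in for the GPU: a plain table of "uniforms".
local function new_fake_program()
    local uniforms = {}
    local class = {}                       -- ordinary methods live here

    function class.Describe(self)
        return "program with fake uniforms"
    end

    local mt = {}
    -- Called only when the key is missing from the instance itself.
    mt.__index = function(self, key)
        local method = rawget(class, key)
        if method ~= nil then
            return method
        end
        return uniforms[key]               -- would be GetUniformValue(self, key)
    end
    -- Called only when assigning to a key the instance does not have yet.
    mt.__newindex = function(self, key, value)
        uniforms[key] = value              -- would be SetUniformValue(self, key, value)
    end

    return setmetatable({}, mt), uniforms
end

local prog, uniforms = new_fake_program()
prog.scale = 2.5
prog.iter = 70
print(prog.scale, prog.iter, rawget(prog, "scale"))
print(prog:Describe())
```

Note that rawget(prog, "scale") comes back nil: the value never lands in the instance table, so every later read and write keeps going through the metamethods, which is exactly what keeps the Lua side and the shader in sync.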

It’s a good thing I was able to come up with this little string of code. Otherwise, I fear that I would never be able to make any progress using the GPU. The same little tricks will need to be done to set vertex attributes as well, which is a whole other beast to deal with.

But, there you have it. I don’t know about using other dynamic programming languages to deal with GPU shading, but I find this to be the cat’s meow as far as I’m concerned. Having this particular mechanism in place makes GPU shader programming feel as natural as my regular Lua programming. There is that small step of writing that bit of shader code in what looks like “C”, but maybe even that isn’t a big deal. I don’t see why shader code can’t be written in Lua, and just translated into the appropriate backend language. If I could do that, then I could stay safely tucked away in my “Lua Only” world. But I digress…

HeadsUp Live Mandelbrot Zoom

To kick off the usage of shaders, I figured I’d go back to the Mandelbrot example. In this particular case, showing a static image isn’t that exciting, so I figured I’d produce a little movie clip instead. So, what you see here is some typical OpenGL code that uses a fragment shader to do the actual work. I borrowed the fragment shader from here.

This is really getting interesting. The fragment shader itself had zero changes, because it’s GLSL code, so nothing to change. The code wrapping changed in the usual sorts of ways. Here is how the shader program is set up:

function setup_shader(fname)
    local fp = io.open(fname, "r");

    local src_buf = fp:read("*all");
    local src_array = ffi.new("char*[1]", ffi.cast("char *",src_buf));

    local sdr = ogm.glCreateShader(GL_FRAGMENT_SHADER_ARB);
    ogm.glShaderSource(sdr, 1, src_array, nil);

    local prog = ogm.glCreateProgram();
    ogm.glAttachShader(prog, sdr);
    return prog;
end

There’s only one slightly tricky line in here. A shader is a program, and that program gets compiled on the GPU by the vendor’s OpenGL GLSL compiler. So, you’ve got to get the text of that program over to the GPU. The API for doing that is:

void glShaderSource (GLuint shader, GLsizei count, GLchar* *string, const GLint *length);

It’s the “GLchar **string” that’s the only challenge here. Basically, the function expects an array of pointers to strings. So, using the LuaJIT ffi, this turns out to be achievable with the following:

local src_array = ffi.new("char*[1]", ffi.cast("char *",src_buf));

It may look like a bit of a magical incantation, but once it’s done, you’re good to go. From then on out, it’s standard stuff. Notice the usage of ‘ogm’. That’s the alias for the OglMan table, which is used to pull in all the extensions you could care to use. It really was brain dead easy to do this. Whenever the LuaJIT compiler complained about not being able to find something, I just put “ogm.” in front of it, until all complaints were solved, and the program finally ran.
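For reference, here is the same idea in isolation, as a small LuaJIT-only sketch that needs no OpenGL at all. It uses const char* rather than char*, which is my choice here so that the Lua string converts without a cast; the variable names and the one-line shader text are just for illustration. One thing to keep in mind is that the array only holds a pointer, so the Lua string has to stay referenced for as long as the pointer is in use.

```lua
local ffi = require("ffi")

-- Illustrative shader text; any Lua string works the same way.
local src = "void main() { gl_FragColor = vec4(1.0); }"

-- A one-element array of C string pointers (the 'GLchar **' shape).
local src_array = ffi.new("const char*[1]")

-- Storing a Lua string into a 'const char *' slot stores a pointer to its data.
-- 'src' must stay alive while src_array is in use.
src_array[0] = src

-- Round trip: read the pointer back as a Lua string to confirm it is the same text.
local roundtrip = ffi.string(src_array[0])
print(roundtrip == src)
```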

And the result in this case is a nice fly through of a Mandelbrot set. Julia sets can be added just as easily by changing the .glsl file that I’m loading into the fragment shader.

This bodes well. It will be a small matter to wrap this stuff up in a couple of convenience objects so that I won’t have to make all those GLSL calls explicitly.

One of the hardest parts to deal with typically is the setting of ‘uniform’ variables. This is the way in which you communicate values from the outside world into the shader code. I’m thinking Lua will help me do that in such a way that’s fairly natural, and doesn’t take a lot of code. Maybe I can use the same trick I did with OglMan (implement __index and __newindex). If I could do that, then it would look the same as setting/getting properties on an object to interact with your GLSL shader. And that would be a fine thing indeed as then the code would just slip right into the rest of the Lua code, without looking dramatically different. Never mind that the underlying code is actually running on the GPU.

At any rate, there you go. Live zooming on a Mandelbrot set, utilizing the GPU for acceleration, all written in Lua (except for the shader code of course). I wonder if the shader code could be written in Lua as well, and then just converted…

HeadsUp NeHe Tutorials

Since time immemorial, I have learned from the NeHe OpenGL Tutorials.  These tutorials have been great, particularly up through the 2.1 version of OpenGL.  With the advent of more and more shader programming, WebGL, and other advancements, the old tutorials are now listed as “Legacy”.

These tutorials are still useful for a couple of reasons.  Not everyone programs with shaders as yet, and they are a great way to flush out all the challenges with a new OpenGL interface, such as what is in HeadsUp.  I have implemented tutorials 2-8, just for kicks.  The picture here shows lesson8.lua, which is about blending, some lighting, and using texture objects.  I’ve implemented an extremely simple brain dead Targa image viewer, just to get some images into texture objects.  The rest is pure OpenGL.

The way I’ve done it is to grab the GLUT or C++ version of the code if it exists, and then just do some massaging of the code until it compiles.  It’s typically a fairly simple, straightforward process.  I’ve even added some of the most common glxxx functions to the global namespace for convenience.   One example is dealing with color.  Of course, you can be very explicit:


And if you want to do that without the ‘gl.’ prefix, you can simply do:


That allows your code to look exactly like the typical ‘C’ version of the code. But wait a minute, this is Lua, so convenience is the name of the game. We can do some overloading and get an even better effect:

function glColor(...)
    local arg = {...};
    if #arg == 3 then
        gl.glColor3d(arg[1], arg[2], arg[3]);
    elseif #arg == 4 then
        gl.glColor4d(arg[1], arg[2], arg[3], arg[4]);
    elseif #arg == 2 then
        gl.glColor4d(arg[1], arg[1], arg[1], arg[2]);
    elseif #arg == 1 then
        if type(arg[1]) == "number" then
            gl.glColor3d(arg[1], arg[1], arg[1]);
        elseif type(arg[1]) == "table" then
            local c = arg[1];  -- color components passed as a single Lua table
            if #c == 3 then
                gl.glColor3d(c[1], c[2], c[3]);
            elseif #c == 4 then
                gl.glColor4d(c[1], c[2], c[3], c[4]);
            end
        end
    end
end

With this function, you can use several calling conventions:

glColor(0.63)        -- Set a grayscale value
glColor(0.63, 0.5)   -- Set a grayscale value with alpha
glColor(0.25, 0.30, 0.30, 1)  -- Set a full color value
glColor({0.24, 0.30,0.30,1})  -- Set a full color, with alpha using a table
glColor({0.24, 0.30, 0.30})   -- Set a full color using a table

The only one that is missing is:

glColor(vec3(0.24, 0.3,0.3))  -- Set color using a vec (float[3])

If you are familiar with the Processing environment, this flexibility in setting color values might seem familiar. There is something nasty about the difference between counting from ‘0’ as is typical in C, and counting from ‘1’ which is standard for Lua tables. By using this sort of construct, you can get it both ways. If you want the typical C version, including passing array structures, then use the standard C looking functions. If you want to pass your Lua based tables around, then use the more generic versions of the function, and pass tables around.
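One way to keep that kind of overloading easy to check is to separate the argument handling from the OpenGL call. The sketch below is plain Lua with no GL dependency; normalize_color is a hypothetical helper name of mine, not something from HeadsUp, and it assumes a missing alpha means 1.

```lua
local unpack = table.unpack or unpack  -- Lua 5.2+ vs. Lua 5.1/LuaJIT

-- Hypothetical helper: turn any of the glColor calling conventions into r, g, b, a.
-- Returns nil if the arguments match none of the supported forms.
local function normalize_color(...)
    local n = select("#", ...)
    local a, b, c, d = ...

    if n == 1 and type(a) == "table" then
        -- A table of 3 or 4 components is handled like separate arguments.
        if #a == 3 or #a == 4 then
            return normalize_color(unpack(a))
        end
        return nil
    elseif n == 1 then
        return a, a, a, 1      -- grayscale
    elseif n == 2 then
        return a, a, a, b      -- grayscale with alpha
    elseif n == 3 then
        return a, b, c, 1      -- full color
    elseif n == 4 then
        return a, b, c, d      -- full color with alpha
    end
    return nil
end

print(normalize_color(0.63, 0.5))
print(normalize_color({0.24, 0.30, 0.30}))
```

A glColor built on top of this would only need a single gl.glColor4d(r, g, b, a) call at the end, and the table indexing lives in exactly one place.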

The same is true for Vertex objects.

I find this to be a useful construct. Although the flexibility can be a bit much when you try to think about the many ways you can do something, really it just feels natural because you just do whatever suits your programming style, and it will probably work. You can stick with copy/paste of code you find from elsewhere, or you can tailor it to the Lua environment as suits your needs.

Now, on to those shaders!

HeadsUp OpenGL Extension Wrangling

I have dealt with OpenGL extensions in a previous library I did in C#.  I can tell you, it’s rather a pain.  First of all, there are so many ‘extensions’, it can make your head spin just thinking about it.  Second, with static languages, you have to create all these various wrapper type things to get things to work correctly.  Create a delegate thing, a declaration thing, the glue code to tie the delegate to the declaration…

So, I figured with LuaJIT and LuaJIT ffi in particular, I might be able to make an easier time of it.  There is one unavoidable part though.  You have to have the prototypes of your functions somewhere.  I’ll start with the simplest one:

// WGL_ARB_extensions_string 1  
const char *  wglGetExtensionsStringARB (HDC hdc); 

Calling wglGetExtensionsStringARB() gets you the list of WGL extensions that your current GL driver supports. This isn’t a normal function in a library. My first inclination might be to simply do:

ffi.cdef[[
const char * wglGetExtensionsStringARB (HDC hdc);
]]

gl = ffi.load("opengl32")
local extensions = gl.wglGetExtensionsStringARB(hdc);

But, you can’t do that. This function isn’t necessarily located within the opengl32.dll library at all. In order to find out where it actually is, you have to call another function: wglGetProcAddress(), which is actually in the library. So, in order to string this together, you have to do the following:

gl = ffi.load("opengl32")
local funcptr = gl.wglGetProcAddress("wglGetExtensionsStringARB")
local castfunc = ffi.cast("PFNWGLGETEXTENSIONSSTRINGARBPROC", funcptr);

local extensions = castfunc(hdc);

That’s a handful, but it’s not too bad. First, get the address of the extension function you’re looking for (wglGetProcAddress). Then, cast it to a function prototype so that when you try to call it, LuaJIT knows about the parameter types and can do the marshaling for you automatically. Then, call the function.

But, I want this to be as easy as possible, and being the error prone programmer that I am, I want it to be automated as well, because I’m not good at typing a whole bunch of repetitive stuff correctly.

OK, so how to do this? First of all, I downloaded the wglext.h and glext.h files from the Khronos site:
You can get the .spec files, and start parsing from there, or you can download the already made .h files. You can also get these from various vendors, or the GLEW library. I just started from the Khronos ones.

I performed some hand massaging on the .h files, to come up with things like all the constants pulled out from the #defines, and generally made the thing look like a Lua file: wglext.lua. Within this file, you see all the function prototypes, wrapped up in an ffi.cdef[[]] thing, as seen above.

A thing to note about the function prototypes. For each function, there is a prototype, and a typedef. I actually only use the typedef, but the prototype is there as well for completeness. By a stroke of luck, or more likely design, the typedefs are created in a consistent way, that is an easy modification of the function name. So, in the case of wglGetExtensionsStringARB, the typedef name, which is the part I’m interested in, looks like:

PFNWGLGETEXTENSIONSSTRINGARBPROC

If I were to represent this transformation as a simple function, it would be:

function GetFunctionProtoName(fname)
    return string.format("PFN%sPROC", fname:upper());
end

That’s good. So, now, when I want to go from the name of a function, to the name of the typedef that represents that function, I can simply do this:

local protoname = GetFunctionProtoName("wglGetExtensionsStringARB");

That’s grand. Now, tying this piece to the lookup piece, I might have two more functions:

function GetWglFunctionPointer(fname, funcptr)
    local protoname = GetFunctionProtoName(fname);
    local castfunc = ffi.cast(protoname, funcptr);

    return castfunc;
end

function GetWglFunction(fname)
    local funcptr = opengl32.wglGetProcAddress(fname);
    if funcptr == nil then
        return nil
    end

    local castfunc = GetWglFunctionPointer(fname, funcptr);

    return castfunc;
end

And how to use this?

wglGetExtensionsStringARB = GetWglFunction ("wglGetExtensionsStringARB");

local exts = wglGetExtensionsStringARB(hdc);

Isn’t that spiffy? I don’t think it gets much easier than that. So, for the extensions you care about, just repeat the line that has “GetWglFunction”, and you’re done…

But wait, that’s still a lot of error prone copy/paste typing isn’t it? Can’t Lua enable my laziness even more? Well, sure it can. How about we create a simple interface to deal with all this nonsense for us?

OglMan_mt = {
    __index = function(tbl, key)
        local funcptr = GetWglFunction(key)
        rawset(tbl, key, funcptr)
        return funcptr;
    end,
}

setmetatable(OglMan, OglMan_mt)

Ah… now I can do the following:

local getextensions = OglMan.wglGetExtensionsStringARB
if getextensions ~= nil then
    local exts = getextensions(hdc);
end

Or, if I’m feeling particularly daring, I can simply do this:

local exts = OglMan.wglGetExtensionsStringARB(hdc);

Pick your level of error checking.

Why does this work? Well, the OglMan table has a metatable (OglMan_mt). That metatable defines a function ‘__index’. Through the magic of Lua, this function is called whenever you try to look up something in the table, and it doesn’t already exist. So, when I do this: OglMan.wglGetExtensionsStringARB, my __index function is called, and the runtime hands me the name of the thing that was being looked up. In normal circumstances, a nil value would be returned, but since I’ve already created those functions that can go from a string to a cast function pointer, I can use that first. If it fails, then I can simply return nil as usual. If it succeeds, I can return a function “pointer” that’s already cast in the appropriate way, ready to be used.
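The caching side of this is easy to see in a sketch that runs without Windows or OpenGL. Everything here is a stand-in: fake_resolver plays the role of GetWglFunction, and Lazy plays the role of OglMan. A counter shows how often the resolver is actually consulted.

```lua
-- Hypothetical resolver standing in for GetWglFunction: it "knows" one function
-- and counts how often it is asked, so the caching is observable.
local lookups = 0
local function fake_resolver(name)
    lookups = lookups + 1
    if name == "sayHello" then
        return function() return "hello" end
    end
    return nil
end

local Lazy = setmetatable({}, {
    __index = function(tbl, key)
        local f = fake_resolver(key)
        -- Cache a hit in the table itself. Storing nil is a no-op,
        -- so a failed lookup is simply retried on the next access.
        rawset(tbl, key, f)
        return f
    end,
})

print(Lazy.sayHello())    -- first access goes through __index
print(Lazy.sayHello())    -- second access finds the cached field directly
print(lookups)            -- the resolver was consulted only once so far
print(Lazy.noSuchThing)   -- unknown names come back as nil
```

The rawset is what makes the second access cheap: once the field exists in the table, __index is never called for that name again.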

I think that’s pretty spiff.

In conclusion, after doing a bit of grunt work on those header files, it’s less than 100 lines of code to make all OpenGL extensions fully available to the Lua programmer. Of course, this works because of the ease of LuaJIT, and the __index trick of Lua in general. But, I’m very pleased with this outcome. I don’t have to take a dependency on GLEW or any other extensions wrangler. I just need to do the initial .h file wrangling, and then go on about my business as usual.

As an added bonus, it turns out that sometimes it’s better to use this trick on functions that are actually in the OpenGL32.dll as well. The ones that are in the .dll might have bugs, that Microsoft doesn’t bother to fix. The ones that can be found using the lookup come from the vendor of the graphics card, and they have more of a vested interest in ensuring they work correctly. Just saying.