LuaJIT to Khronos

I’ve been sitting on some code for quite some time, hoarding it, as it were, for my own devices. As my current desktop machine seems to be failing more frequently, I thought it would be a good time to do some spring cleaning and put up some more code.

The current round has to do with things related to the Khronos Group of APIs. The Khronos Group is one of those industry bodies set up for collaboration across multiple companies. Probably the most famous of the APIs they’ve dealt with to date is the OpenGL API. The follow-on to that was the OpenGL ES API. Then along came one of their own, OpenVG, which deals with vector graphics. Rounding out the set, you have OpenCL for parallel computing across CPUs and GPUs, and OpenMAX for audio/video stuff.

Since the group was originally founded by companies interested in GPUs, those are the APIs that are the most mature. Well, recently, playing with the Raspberry Pi, I found that OpenGL ES/EGL, OpenVG, and OpenMAX are the only ways to get at hardware acceleration on the device. There are already a couple of examples of OpenVG running on the Pi. One example of using OpenVG on the Pi was done by Anthony Starks. If for no other reason, you’ve got to check out the code because of the author’s name…

Anthony Starks has several examples of how to use OpenVG, and how to bind and use the Go language to do some really nice stuff. Well, no reason for those Go programmers to have all the fun, so I decided to make a really convenient binding to LuaJIT for OpenVG. But why stop there? Why not all the APIs? Well, I already had the OpenCL and OpenGL bindings lying about, so I’ve put them all together into a single repository, the LJIT2Khronos project.

One of the first things that I realized way back was that these things can only be useful if they can be demonstrated. Well, one of the first things you need to do to use any of these APIs is establish a connection to the windowing system. In this case, I had to include some bindings to Windows APIs as well. There is a Win32 directory, which contains some basic bindings for User32, GDI, Kernel32, Windows types and the like. More than enough to get a basic window up on the screen, and certainly enough to get the window handle and device context that are required for the various APIs.

What do you get for your troubles?

Let’s say you want to create a window, which has a ‘frame rate’ of 3 frames per second, and a routine you specify will be called, and rendering will occur…

local NativeWindow = require "User32Window"
local EGL = require "egl_utils"

local OpenVG = require "OpenVG"
local OpenVGUtils = require "OpenVG_Utils"
local ogm = require "OglMan"
local RenderClass = require"Drawing"

-- Setup the "display" object
local dpy =, EGL.EGL_OPENVG_API);

local screenWidth = 640;
local screenHeight = 480;

-- Create the renderer class which
-- will handle drawing tasks
local Renderer =, screenWidth, screenHeight);

local tick = function(ticker, tickCount)
	print("Tick: ", tickCount);

	Renderer:Background(0, 0, 0);	  -- Black background
end

-- Create a window
local winParams = {
	ClassName = "EGLWindow",
	Title = "EGL Window",
	Origin = {10,10},
	Extent = {screenWidth, screenHeight},
	FrameRate = 3,

	OnTickDelegate = tick;
}

-- create an EGL window surface
local win =
assert(win, "Window not created");

local surf = dpy:CreateWindowSurface(win:GetHandle())

-- Make the context current


local ratio = screenWidth/screenHeight;
glFrustum(-ratio, ratio, -1, 1, 1, 10);

-- Now, finally do some drawing

-- free up the display

I’ll be the first to admit, this is still quite a lot to type to do some very basic rendering, but it is way less than you’d have to type on your own. I’ll write up a separate article that goes into more depth on how to use this OpenVG stuff. For now, suffice it to say, it will work on whatever environment has the OpenVG and EGL libraries available (at least Windows, Raspberry Pi, Linux in general).

But of course, there’s more. This repository also includes the OpenGL bindings, and the OpenCL stuff as well. Those bindings are fairly mature, at least I’ve written a couple of apps using them.

So, there you have it. Some fairly complete bindings to these various Khronos Group APIs. Getting them off my machine, and into the interwebs gives me some relief. As well, I expect to make full use of them across the multiple environments in which they are available.

The Power of Leveraged Frameworks that work

I was browsing the web site recently, when I ran across a talk by Mark Kilgard. The talk was at a recent GPU Technology Conference (GTC 2012 – San Jose). The talk was entitled “NVIDIA OpenGL in 2012“.

It was a general overview talk, covering the history of OpenGL, its present, and near future. There are lots of little details related to OpenGL of course, but there were a couple that stood out for me. At roughly 49:15 into the talk, there’s a slide entitled “What is path rendering?”, with a bunch of 2D path rendered images on it. Now this gets really interesting.

Basically, if you’ve been doing 2D graphics for the past few years, you realize that the GPU revolution has largely left you behind. Yes, you can render millions of triangles per second, but just try to render some nicely kerned text, say for a web page, and you’re pretty much on your own. OpenGL has got nothing for you, or rather, what it does have for you will leave you dealing with the rasterization process yourself, for the most part.

What this means is that if you want to render high quality path driven stuff, like what you find in PostScript, OpenVG, or HTML text, you’re going to have to do a whole bunch of work. But wait! Seeing the sad state of affairs, perhaps feeling guilty for their sins, nVidia has decided to tackle the problem space of path based rendering, using the GPU to accelerate it. What a novel idea! I think it stems from the fact that their growth market is underpowered mobile devices, which have GPUs. The more you can offload to the GPU the better, as it’s more energy efficient for certain things than a CPU would be.

During the presentation, he talks about the various 2D APIs such as Quartz, OpenVG, Direct2D, Cairo, Skia, Qt::QPainter, Anti-grain… All APIs I’ve touched one way or another over the years.  He goes on about the greatness of these new extensions, which apparently have been in the nVidia drivers for a while.  Then I get to thinking.

I want 2D graphics.  I want it to work across multiple platforms, I want it to be fast and efficient.  At first I thought, maybe I should get Cairo and use that as my cross platform 2D graphics system.  Eventually Cairo will likely utilize this new path rendering stuff, and I’ll eventually benefit.  So, I looked at Cairo, took one look at the build system, and turned green.  Then I had another thought.

OpenGL is already THE cross platform graphics API. And, since I have Lua, and more specifically LuaJIT with FFI, and I’ve already coded up my OpenGL interfaces, I can just use that, and it should work across multiple platforms.

So, sounds good. I went off to the nVidia site to see what I could see with regard to using this newfangled path rendering stuff. Right now it’s only in the nVidia driver, so AMD, not so much. I took one of the whitepapers that has examples in it, and just started coding what was there. After a few turns of the crank, I was finally able to generate the image seen above.

Here’s one sequence of calls that I used:

local ogl = require "OglMan"

ogl.glMatrixOrthoEXT(GL_PROJECTION, 0, 500, 0, 400, -1, 1);
ogl.glPathStringNV(pathObj, GL_PATH_FORMAT_SVG_NV, #svgPathString, svgPathString);
ogl.glStencilFillPathNV(pathObj, GL_COUNT_UP_NV, 0x1F);

OglMan is my OpenGL Manager. It’s effectively the same thing as using the familiar GLEW (GL Extension Wrangler), but done up in Lua, not as an interop thing.

I was not familiar with any of these calls before I wrote this code. But, just putting ‘ogl.’ at the front of each one of them, I assumed they would just work, and they did! I was actually amazed at how simple it was to code up this example.

This speaks volumes to the ease of use of Lua as a rapid prototyping tool. To do the same in C would take me a lot more scaffolding, compiling, sweating and praying. In my little HeadsUp harness, I can just code and go, trying things out, with zero “compile”.

At any rate, it’s nice to know that seeing the world through a Lua lens is not a bad thing. I am just as capable as anyone on any other platform. I am asking myself this question now… If I could have a high quality text renderer done using nothing more than the GPU, and whatever text rendering library I write in Lua, could I write a nicely specialized HTML viewer?


Unchaining the GPU with Lua and OpenCL

Quite a few years ago, I programmed the BeBox to display multiple streams of .mpg video, while simultaneously pulling in video feeds from Satellite and cable. In all, you could see snapshots of roughly six things on the screen, happening all at the same time.

The CPUs were utilized primarily for the mpeg part, doing decoding, and some special effects when changing sources being displayed in the primary area. The feeds coming off the Hauppauge video capture card were being DMAd directly into the framebuffer of the graphics card, so there wasn’t any work by the CPU going on there.

That was a pretty good result for a dual-proc machine circa 1996. That was at the very beginning of the birth of nVidia, and GPUs were actually first becoming mainstream from 3dfx. Roll forward 16 years… and where are we today?

Well, the machine whining away under my desk is a 3.4GHz AMD Phenom(tm) II X4 965 Processor, with 8GB of RAM. The graphics card is an nVidia GTX 275. This machine is a couple years old now, but compared to that BeBox, it’s a monster from another planet. As such, you would think it would be able to perform the same feats as that old machine, without even heating up a single resistor. To make it even more of a monster, there’s that GPU sitting in there which has 1000 times over the amount of processing power utilized to send people to the moon in the sixties.

So, what can this machine do? Well, it allows me to type really fast!! I can read emails in the blink of an eye, and Netflix movies play with nary a stutter! I tell you, it’s simply amazing! But, what about all that horsepower that’s sitting idle under my desk? Surely I can put it to some good use.

Well, of course graphics processing can largely be offloaded to the GPU these days. Although I conjured up a graphics library that lives completely on the CPU, and just draws to memory, doing the same using the GPU is far faster, and takes a lot less electricity.

And finally, I come to the point.  I have gotten far enough along with my OpenCL binding that I can now actually do some OpenCL programming.  OpenCL is an interesting little thing.  Basically, it introduces the concept of ‘kernel’ programming.  And here, Kernel does not mean the OS kernel, but rather the small little bit of code that will run in parallel on the same piece of memory that other little bits of code are running against.  This is in fact what happens when you’re running a GLSL shader.  It’s just a little ‘kernel’, and in the case of a fragment shader, that little kernel runs against all the pixels in a frame, in parallel with hundreds of others doing the same thing.
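To make the kernel idea concrete, here is a minimal CPU-side sketch in plain Lua. It is only an emulation under stated assumptions: `run_kernel` and `scale_kernel` are hypothetical names, and the loop runs the work items one after another, where a GPU would run many of them at once.

```lua
-- CPU-side emulation of the kernel model: one small function is invoked
-- once per work item, and each invocation handles exactly one index.
-- run_kernel and scale_kernel are hypothetical names used for illustration.

-- The "kernel": computes a single output element from its global id.
local function scale_kernel(global_id, src, dst, factor)
    dst[global_id] = src[global_id] * factor
end

-- Stand-in for the device. A GPU would run these invocations in parallel;
-- here they simply run one after another.
local function run_kernel(kernel, global_work_size, ...)
    for global_id = 1, global_work_size do
        kernel(global_id, ...)
    end
end

local src, dst = {}, {}
for i = 1, 8 do src[i] = i end

run_kernel(scale_kernel, #src, src, dst, 2)
print(table.concat(dst, " "))
```

The point of the shape is that `scale_kernel` never loops and never looks at its neighbours, which is what lets the hardware schedule the invocations in any order.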

Using GLSL based fragment shaders is great for graphics programming, but for general computing, it’s kind of clunky as you’d have to cast your compute problem into terms that the graphics pipeline can understand. Furthermore, in order to use GLSL at all, you have to do things like create a GLContext, which requires a DeviceContext, which requires a Window, or at least a GDIBitmap. That’s a lot of machinery to just write a bit of code to manipulate some data.

OpenCL changes things a bit.  First of all, you have access to the GPU power without the graphics constructs.  You still have to create a proper context, but it’s far easier without having to worry about windows and bitmaps.  There are some concepts, and a hierarchy for doing things.  You start at the top with platforms.  There may be multiple “platforms” within your machine.  Usually there is only one though.  Within a platform, there are devices.  There may be multiple devices in a platform.  For example, you might have two nVidia cards in your machine, and that will list as two devices.

After the device, there is the concept of a context.  The context can span multiple devices.  The context controls things like where memory is created, where programs are created, where kernels are run, and the like.  This is really where things start to get interesting.

From the context, you can create a “program”.  Here, I think it is easier to think of the program as “image”.  You are essentially placing an “image” onto the context.  I think of the image as the raw OS image, ready to have little bits of code running in it.

Then, finally, you can create a “kernel”, which is actually a piece of code that’s going to execute on the device.

That’s a lot of stuff, and a lot of error checking, and a lot of pointers that can go wrong, etc.  So, the Lua version looks like this:

local platform, num = CLGetPlatform()
local devices = platform:GetDevices(CL_DEVICE_TYPE_GPU)

That is, get the first platform available. Then, get the list of devices available on the platform. And finally, run a kernel (code below).

Using Lua is nice because garbage collection can be used to release various resources when they’re no longer in use. That saves a bit of typing, and you don’t have to remember anything.

To run a kernel, I looked at a simple example in C, written by Clifford Wolf.

local ffi = require "ffi"

local program_source = [[
    __kernel void simple_demo(__global int *src, __global int *dst, int factor)
    {
        int i = get_global_id(0);
        dst[i] = src[i] * factor;
    }
]];

function runkernel(device)
    local context = CLContext():CreateForDevice(device);

    local program = context:CreateProgramFromSource(program_source);

    local NUM_DATA = 100;
    local buffsize = ffi.sizeof("int")*NUM_DATA;

    local input_buffer = context:CreateBuffer(buffsize, CL_MEM_READ_ONLY);
    local output_buffer = context:CreateBuffer(buffsize, CL_MEM_WRITE_ONLY);

    local factor = 2;
    local lpfactor = ffi.new("int[1]", factor);

    local kernel = program:CreateKernel("simple_demo");

    kernel:SetIndexedArg(0, input_buffer.Handle, ffi.sizeof("cl_mem"));
    kernel:SetIndexedArg(1, output_buffer.Handle, ffi.sizeof("cl_mem"));
    kernel:SetIndexedArg(2, lpfactor, ffi.sizeof("int"));

    local queue = context:CreateCommandQueue(device);

    -- Fill the input buffer one int at a time
    local intsize = ffi.sizeof("int");
    local lpi = ffi.new("int[1]");
    for i=0, NUM_DATA-1 do
        local offset = intsize*i;
        lpi[0] = i;
        queue:EnqueueWriteBuffer(input_buffer, offset, lpi, intsize);
    end

    -- Run one work item per element
    local global_work_size = ffi.new("size_t[1]",NUM_DATA);
    local kernel_completion = queue:EnqueueNDRangeKernel(kernel, global_work_size);

    -- Read the results back, one int at a time, with blocking reads
    local lpdata = ffi.new("int[1]");
    for i=0, NUM_DATA-1 do
        local offset = i*intsize;
        local err = ocl.clEnqueueReadBuffer(queue.Handle, output_buffer.Handle, 
            CL_TRUE, offset, intsize, lpdata, 0, nil, nil);
        CL_CHECK(err, "clEnqueueReadBuffer");
    end
end

In the first part of runkernel(), I’m using the nice object-like interface that the Lua binding provides. In the last part of the function, I’m using the straight OpenCL calls, just to show how that’s done.

There are a couple of things of note here. First, the ‘program_source’ is just a string. This is the same as with GLSLProgram. There are various environments available, including from nVidia, which will help you create these kernel strings. Once you have your string perfected, you can just drop it in for inclusion as your kernel.

Since a kernel is not a function in Lua that you can just pass variables to, you have to do some explicit work to pass values in as arguments. kernel:SetIndexedArg() performs this task. This is an ideal candidate for some Lua magic to make it simpler. Unlike the GLSL interface, I can’t query the program to find out the types of the various arguments. But since I wrote the kernel, I do know their types, so I could write a little table that maps each index to a name and its data value, and this code could turn into the more familiar:

kernel.src = input_buffer
kernel.dst = output_buffer
kernel.factor = 2

Then I’d be happy as a clam. There is another concept that gets in your face here. That’s the whole queuewrite, queueread business. Basically, all data and kernel deployment happens as commands executed from a queue. That fact does not need to be front and center, and a little bit of wrapping might make it nicer to deal with.
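One way the name-based arguments might be wired up is with a hand-written name-to-index table and a `__newindex` metamethod. The sketch below is an assumption-laden illustration, not the binding's actual code: `wrap_kernel` and the stub kernel object are hypothetical, and only the `SetIndexedArg(index, value, ...)` shape is borrowed from the listing earlier in this post. The size argument and the boxing of scalars into FFI arrays are left out.

```lua
-- Sketch of name-based kernel arguments. The index table is written by hand
-- because the author of the kernel knows its signature.
-- wrap_kernel and the stub kernel object below are hypothetical.

local function wrap_kernel(kernel, arg_index)
    local proxy = {}
    setmetatable(proxy, {
        -- Assigning proxy.name = value forwards to the indexed setter.
        __newindex = function(_, name, value)
            local index = arg_index[name]
            if index == nil then
                error("unknown kernel argument: " .. tostring(name), 2)
            end
            -- A real wrapper would also pass the argument size here.
            kernel:SetIndexedArg(index, value)
        end
    })
    return proxy
end

-- Stub standing in for a real kernel object; it only records what was set.
local stub = { args = {} }
function stub:SetIndexedArg(index, value)
    self.args[index] = value
end

-- Argument names and positions for simple_demo(src, dst, factor).
local k = wrap_kernel(stub, { src = 0, dst = 1, factor = 2 })
k.src = "input_buffer"
k.dst = "output_buffer"
k.factor = 2

print(stub.args[0], stub.args[1], stub.args[2])
```

Because the proxy table itself stays empty, every assignment goes through `__newindex`, so a misspelled argument name fails loudly instead of silently creating a field.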

Now that this is in hand, what can be done with it? Well, there’s the obvious graphics stuff, which is where it came from, but there’s a whole lot more. I was just thinking that this might be a great way to perform base64 encoding for example. It’s a largely parallel task. You could write a kernel that turns a 3-character block into the equivalent 4-character code. As this kernel can run in parallel, you could literally have hundreds of them working on encoding your text at the same time. At the end, you’ve got a base64 encoded thing, in a fraction of the time it would normally take.
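To show why base64 splits so cleanly into independent work items, here is a plain Lua sketch of the per-block step. It is a CPU illustration only: `encode_block` and `encode` are hypothetical names, and the blocks are processed in a sequential loop where an OpenCL kernel would handle one block per work item.

```lua
-- Each 3-byte block maps to 4 output characters independently of every
-- other block, which is what makes the task a good fit for a kernel.
-- encode_block and encode are hypothetical names.

local ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

local function sextet_char(n)
    return ALPHABET:sub(n + 1, n + 1)
end

-- The per-block "kernel": bytes b1..b3 in, 4 characters out.
-- b2 and b3 may be nil for the final, short block.
local function encode_block(b1, b2, b3)
    local n = b1 * 65536 + (b2 or 0) * 256 + (b3 or 0)
    local c1 = sextet_char(math.floor(n / 262144) % 64)
    local c2 = sextet_char(math.floor(n / 4096) % 64)
    local c3 = b2 and sextet_char(math.floor(n / 64) % 64) or "="
    local c4 = b3 and sextet_char(n % 64) or "="
    return c1 .. c2 .. c3 .. c4
end

local function encode(text)
    local out = {}
    for block = 0, math.ceil(#text / 3) - 1 do
        local first = block * 3 + 1
        out[block + 1] = encode_block(text:byte(first, first + 2))
    end
    return table.concat(out)
end

print(encode("Man"))   -- TWFu
print(encode("Ma"))    -- TWE=
```

Block number `block` reads bytes `3*block+1` to `3*block+3` and writes output slot `block+1`, so no two blocks touch the same data. Only the padding of the last block is a special case.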

Using a slightly different approach, that of stream processing, you could probably perform some cryptographic operations, like digest calculations and the like.

There is one tool that I found that makes exploring OpenCL fairly easy and fun. OpenCL Studio is done by Geist Software Labs, who appear to be a consultancy for high performance computing. They have a nice Lua scriptable environment that allows you to play with OpenCL and OpenGL, just like that.

Having such a tool available is an accelerant for me to get even more productivity wrung out of myself, and my machine.

With my little Lua Binding to OpenCL, I am confident that I’m going to be able to get more per kilowatt out of my programming. That’s good for my programs, and good for the environment. I’m hoping that between a fast quad-proc, super duper graphics card, and Lua, I’ll finally be able to write and utilize programs that are more impressive than what I could do 15 years ago.

Kinect Edge Detect

And, here’s the live edge detection version. Again, pretty straightforward. Same setup as before, but just changed the fragment shader to be an edge detector instead of that cross hatch thing. Really, you can do anything GLSL will support.
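The post does not say which operator the edge detecting shader uses. As one common choice, the sketch below computes a Sobel gradient magnitude for a single pixel in plain Lua. It is a CPU illustration of the per-pixel arithmetic, not the shader itself, and `sobel_at` is a hypothetical name.

```lua
-- Sobel gradient magnitude for one interior pixel of a grayscale image
-- stored as image[row][col]. A fragment shader would do the equivalent
-- once per pixel, sampling the eight neighbours from a texture.
-- The choice of Sobel is an assumption made for this sketch.
local function sobel_at(image, row, col)
    local function px(dr, dc) return image[row + dr][col + dc] end
    local gx = (px(-1, 1) + 2 * px(0, 1) + px(1, 1))
             - (px(-1, -1) + 2 * px(0, -1) + px(1, -1))
    local gy = (px(1, -1) + 2 * px(1, 0) + px(1, 1))
             - (px(-1, -1) + 2 * px(-1, 0) + px(-1, 1))
    return math.sqrt(gx * gx + gy * gy)
end

-- 4x4 test image: dark left half, bright right half (a vertical edge).
local image = {
    { 0, 0, 1, 1 },
    { 0, 0, 1, 1 },
    { 0, 0, 1, 1 },
    { 0, 0, 1, 1 },
}
print(sobel_at(image, 2, 2), sobel_at(image, 2, 3))
```

Both sampled pixels sit next to the vertical edge, so the horizontal gradient is large and the vertical one is zero. On a flat region the result would be 0.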

When it’s supported by OpenCL kernels, then it gets even more interesting because you’re not quite as constrained by the language constructs of GLSL. As I just explained to my daughter.

I must say, at this point, it really has nothing to do with the Kinect, other than to the extent that I’m using the Kinect as a cheap source of video input. Actually, it’s not a particularly cheap source for video input. The Kinect for PC cost $250, and requires a dedicated USB 2.0 bus, and is very finicky when it comes to performance.

For the same $250, you could purchase a couple of HD quality webcams, and enough hardware to put them on a piece of aluminum at a sufficient distance from each other to do some stereo correspondence work. That might be nice for some close up situations, which the Kinect for PC is supposed to deal with. The Kinect probably works better for larger distances away from the screen, but does it…

Kinect Video with CrossHatch Shader

Now that I have the Kinect behaving reliably, and the GLSL Shader thing going on fairly reasonably, it’s time to combine the two.

In the video, I am using the Kinect to capture a movie playing on the screen of my monitor. Any live video source will do really, but I might as well use the Kinect as a WebCam for this purpose.

The setup for the Kinect is the same as before. Just get the sensor, and for every frame, grab the color image, and display it using a quad to the full window size.

Second, I just bring in the shader, just like before. The shader doesn’t know it’s modifying a live video feed. All it knows is that it’s sampling from a texture object, and doing its thing. Similarly, the Kinect code doesn’t know anything about the shader. It’s just doing its best to supply frames of color images, and leaving it at that.

Pulling it together just means instantiating the shader program, and letting nature run its course. The implications are fairly dramatic though. Through good composition, you can easily perform simple tricks like this. The next step would be to do some interesting processing on the frames as they come from the camera.

This brings up an interesting point in my mind. The Kinect is a highly specialized piece of optical equipment, promoted, and made cheap, because of Microsoft and the XBox. But, is it the best and only way to go? The quality of the 3D imaging is kind of limited by the resolution and technique of the IR projector/camera. Through some powerful software, you can get good stereo correspondence to match up the depth pixels with the color pixels.

But, why not go with just plain old ordinary WebCams and stereo correspondence? Really, it’s just a challenging math problem. Well, using OpenCL, and the power of the GPU, doing stereo correspondence calculations shouldn’t be much of a problem should it? Then, the visualization and 3D model generation could happen with a reasonable stereo camera rig. WebCam drivers are a well trod ground, so cross platform would be instantaneous. Are there limitations to this approach? Probably, but not any that are based on the physical nature of the sensors, but rather they are based on the ability to do some good quality number crunching in realtime, which is what the GPU excels at.

Kinecting with Lua, Phase II

I did the initial interop FFI work for the Kinect a couple weeks back. That initial effort was sufficient to allow me to view an upside down image on my screen. The video was not very reliable, and quite a few frames were dropped, and eventually the thing would just kind of hang. Same as on the XBox actually. Mostly it works, but occasionally, it just kind of loses its mind.

Well, I’ve updated the code a bit. But, one of the most noticeable changes in my rig is that I’ve purchased the Kinect for PC. It does in fact make a difference. I don’t know if it’s primarily the slightly shorter USB cable, or because I plugged into a better USB port on my machine, or that MS has actually done something to improve it, but it is better.

In this picture, I’m capturing the color video stream at 640×480, while simultaneously capturing the depth at 320×240. It’s good enough that I can display both in realtime (~30 fps). It does get a bit jerky when there is a lot of movement in the scene, but it will run on and on and on.

I’ve done a couple of things to the code itself. First, I put everything into a single file (Kinect.lua). It’s a bit beefy at about 1070 lines, but compared to various other files, it’s fairly tame.

Second, I abstracted the video and depth streams, so there is a KinectImageStream object. Getting that from the sensor looks like this now:

local sensor0 = Kinect.GetSensorByIndex(0, flags);

local colorStream = sensor0.ColorStream;
local depthStream = sensor0.DepthStream;

Pretty simple. And what do you get from these streams? Well, in my case, I want to copy the data from the stream into a texture object so that I can display it using a quad, or do other interesting processing. The setup of the textures looks like this:

local colorWidth = 640
local colorHeight = 480
local colorImage = GLTexture.Create(colorWidth, colorHeight)

local depthWidth = 320
local depthHeight = 240
local depthImage = GLTexture.Create(depthWidth, depthHeight, GL_LUMINANCE, nil, 

Then, at every clock tick, the display routine does the following:

function displaycolorimage()
    local success = colorStream:GetCurrentFrame(300)

    if success then
        colorImage:CopyPixelData(colorWidth, colorHeight, 
            colorStream.LockedRect.pBits, GL_BGRA)

        glViewport(windowWidth/2, windowHeight/2, windowWidth/2, windowHeight/2)
        displayfullquad(colorImage, windowWidth, windowHeight)
    end
end

function displaydepthimage()
    local success = depthStream:GetCurrentFrame(300)

    if success then
        depthImage:CopyPixelData(depthWidth, depthHeight, 
            depthStream.LockedRect.pBits, GL_LUMINANCE, GL_SHORT)

        glViewport(0, windowHeight/2, windowWidth/2, windowHeight/2)
        displayfullquad(depthImage, windowWidth, windowHeight)
    end
end

function display(canvas)
    displaycolorimage()
    displaydepthimage()
end

That’s about as simple as it gets. To go further, I could have had the stream automatically copy into a texture object, but it’s fine the way it is right now. This gets very interesting when you consider the possible collaboration with OpenCL, and OpenGL shaders. First of all, the memory can be a memory Buffer, or Image2D in OpenCL terms. That can be shared with OpenGL, for zero copying when going between the two environments.

So, imagine if you will, you can now capture data from the Kinect with very little glue code. Next, you can process that data with OpenCL kernels if you feel like it, taking advantage of some GPU processing without having to deal with GLSL. Then, in the end, you can utilize the results in OpenGL, for doing whatever it is that you’ll be doing.

That’s quite a few different pieces of technology coming together there. I think LuaJIT and Lua in general makes this task a bit easier. Once the basic interfaces are done up in Lua, stitching things together gets progressively easier.

I imagine I’ll be able to shave off at least a couple hundred lines of code from that Kinect.lua file. There’s some cruft in there that I just did not clean up yet. With this one file, a developer is able to easily incorporate Kinect data into their programming flow. So, now things get really fun.

Dynamic Setting of Uniform Shader Values

I’m all about the lazy, and avoiding typing when I program. If I had my way, I would just think my program into existence. But alas, I still have to type.

But now, I can type a lot less error prone code when it comes to dealing with OpenGL, and in particular with shaders. What’s a ‘shader’? You know, those pesky little programs that you have to write to get anything done on the GPU these days.

Here’s the entirety of the code for the Mandelbrot zooming thing, at least the part that runs on the GPU. It’s a ‘fragment shader’, and pretty rudimentary at that:

local fragtext = [[
uniform sampler1D tex;
uniform vec2 center;
uniform float scale;
uniform int iter;

void main() {
    vec2 z, c;

    c.x = 1.3333 * (gl_TexCoord[0].x - 0.5) * scale - center.x;
    c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y;

    int i;
    z = c;
    for(i=0; i<iter; i++) {
        float x = (z.x * z.x - z.y * z.y) + c.x;
        float y = (z.y * z.x + z.x * z.y) + c.y;

        if((x * x + y * y) > 4.0) break;
        z.x = x;
        z.y = y;
    }

    gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0);
}
]]

Hey! Wait a minute. Isn’t that “C” code? Well, not quite. Notice a couple of things. First of all, this is the shader code encapsulated in a Lua string (the [[]]). When you look at the code itself, you see the word “uniform”. Those are variables that can be set from the outside world. In the case of the zooming Mandelbrot explorer from the video, I’m changing things like the ‘iter’(ations), the scale, and the center.
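For readers who want to poke at the math without a GPU, the escape-time loop that a Mandelbrot fragment shader runs per pixel can be sketched in plain Lua. This is a CPU reference under stated assumptions: `mandel_iterations` is a hypothetical name, and `max_iter` plays the role of the `iter` uniform.

```lua
-- Escape-time iteration for one point c = (cx, cy): iterate z = z^2 + c,
-- starting from z = c, until |z|^2 exceeds 4 or max_iter is reached.
-- Returns the number of iterations completed before escape, or max_iter.
-- mandel_iterations is a hypothetical name used for illustration.
local function mandel_iterations(cx, cy, max_iter)
    local zx, zy = cx, cy
    for i = 0, max_iter - 1 do
        local x = zx * zx - zy * zy + cx
        local y = 2 * zx * zy + cy
        if x * x + y * y > 4.0 then
            return i
        end
        zx, zy = x, y
    end
    return max_iter
end

print(mandel_iterations(0, 0, 50))   -- a point inside the set
print(mandel_iterations(2, 2, 50))   -- a point far outside
```

Raising `max_iter` sharpens the boundary at the cost of more work per pixel, which is why it is a natural value to expose as a uniform.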

Normally, in order to change these values, you have to do things like this:

gpuprog:set_uniform1i("iter", iter);
gpuprog:set_uniform2f("center", cx, cy);
gpuprog:set_uniform1f("scale", scale);

My fingers start to cramp even from having to just copy and paste that code. Not only that, but it requires the programmer know a lot more about the intricacies of the GLSL/OpenGL API, which means they spend less time actually coding, and more time doing glue work.

What I really want to do as a programmer is this:

gpuprog.iter = iter;
gpuprog.center = float2(cx, cy)
gpuprog.scale = scale;

And while I’m at it, creating that GPU program is quite a pain in the first place. Roughly speaking, the steps are like this:

  • Create Program
  • Create Shader
  • Compile Shader
  • Attach Shader to Program
  • Link Program
  • Use Program

That’s more than a mouthful, and there’s tons of error checking in esoteric sorts of ways. But really, as a programmer, who’s not particularly focused on learning the details of OpenGL APIs, but just wants to use the GPU for some kick butt graphics, I simply want to do the following:

gpuprog = GLSLProgram(fragtext)

Once again, saving my arthritic hands from a lot of typing, and sparing the few brain cells I have left from having to remember a bunch of APIs that I will use infrequently.
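The six steps listed earlier can be sketched against an injected table of GL-like functions, which keeps the example runnable without a GPU. Everything here is an assumption for illustration: `build_program` and the recording fake are hypothetical, the function names only mirror the OpenGL entry points, and `ShaderSource` takes a plain Lua string where a real FFI binding would need to pass C string arrays. Error checking is left out.

```lua
-- The six steps, written against an injected table of GL-like functions.
-- build_program and the recording fake below are hypothetical.
local function build_program(gl, shader_type, source)
    local program = gl.CreateProgram()            -- 1. create program
    local shader = gl.CreateShader(shader_type)   -- 2. create shader
    gl.ShaderSource(shader, source)               --    hand it the source text
    gl.CompileShader(shader)                      -- 3. compile shader
    gl.AttachShader(program, shader)              -- 4. attach shader to program
    gl.LinkProgram(program)                       -- 5. link program
    gl.UseProgram(program)                        -- 6. use program
    return program
end

-- Recording fake: logs each call name so the order can be inspected.
local calls = {}
local fake_gl = setmetatable({}, {
    __index = function(_, name)
        return function()
            calls[#calls + 1] = name
            return #calls   -- stands in for a GL object id
        end
    end
})

build_program(fake_gl, "FRAGMENT", "void main() {}")
print(table.concat(calls, " > "))
```

Hiding the sequence behind one function is what lets a single call such as `GLSLProgram(fragtext)` stand in for all of it, and it gives the error checks one place to live.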

There’s really something to this. Because of the magic of Lua, I can implement the GLSLProgram to do lookups whenever it doesn’t recognize a field access. Basically, the lookup does the following:

function glsl_get(self, key)
    -- try the class table, as it might be a
    -- function for the class
    local field = rawget(GPUProgram,key)
    if field ~= nil then
        print("returning glsl field: ", field)
        return field
    end

    -- Last, do whatever magic to return a value
    -- or nil
    local value, ncomps =  GetUniformValue(self, key)

    if ncomps == 1 then
        return value[0];
    end

    return value
end

Setting a field is even easier:

function glsl_set(self, key, value)
    -- try to set the value
    -- in the shader
    SetUniformValue(self, key, value)
end

The functions GetUniformValue and SetUniformValue are where all the action is. They figure out what type of field it is we’re accessing (there’s a GLSL API for that), then construct the appropriate type of array, stuff it with the value, and call the appropriate API to get the values in and out. Then they return to the user in a nice sort of way. You can actually round trip values between get/set, so everything works out great. All the error checking can be put in one place, rather than spread all over your code.

And the way it ties together is like this:

GPUProgram = {}
GPUProgram_mt = {}

function GPUProgram.new(fragtext, vertext)
    local self = {}

    self.ID = ogm.glCreateProgram();

    if fragtext ~= nil then
        self.FragmentShader = GPUShader(GL_FRAGMENT_SHADER, fragtext);
        GPUProgram.AttachShader(self, self.FragmentShader);
    end

    if vertext ~= nil then
        self.VertexShader = GPUShader(GL_VERTEX_SHADER, vertext);
        GPUProgram.AttachShader(self, self.VertexShader);
    end

    setmetatable(self, GPUProgram_mt)

    return self
end

GPUProgram_mt.__index = glsl_get
GPUProgram_mt.__newindex = glsl_set

function GLSLProgram(fragtext, vertext)
    local prog = GPUProgram.new(fragtext, vertext)

    return prog
end

Basically, the GLSLProgram function will create an ‘instance’ of a GPUProgram table. That table will have a metatable, which has the __index and __newindex functions on it. These functions will be called whenever a value is not found in the instance of the GPUProgram, or when there is a desire to set a value.
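The metatable wiring can be tried out without a GPU by letting a plain Lua table stand in for the shader's uniforms. The sketch below is a model under stated assumptions, not the code from this post: `FakeProgram`, its `uniforms` table, and `Describe` are hypothetical, and the places where real code would call the GLSL uniform APIs are marked in comments.

```lua
-- Runnable model of the metatable wiring, with a plain Lua table standing
-- in for the GPU. FakeProgram and its uniforms table are hypothetical.
local FakeProgram = {}
local FakeProgram_mt = {}

function FakeProgram.new()
    -- Fields placed in the table before setmetatable never pass
    -- through __newindex.
    local self = { uniforms = {} }
    return setmetatable(self, FakeProgram_mt)
end

function FakeProgram:Describe()
    return "program with fake uniforms"
end

FakeProgram_mt.__index = function(self, key)
    -- Methods on the class table win over uniform names.
    local field = rawget(FakeProgram, key)
    if field ~= nil then
        return field
    end
    -- Otherwise treat the key as a uniform name (real code: query the GPU).
    return rawget(self, "uniforms")[key]
end

FakeProgram_mt.__newindex = function(self, key, value)
    -- Real code: look up the uniform's type and call the matching setter.
    rawget(self, "uniforms")[key] = value
end

local prog = FakeProgram.new()
prog.scale = 2.5
prog.iter = 50
print(prog.scale, prog.iter, prog:Describe())
```

Note that `prog.scale = 2.5` never creates a `scale` field on the instance itself. If it did, later reads and writes would bypass the metamethods and stop reaching the shader.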

It’s a good thing I was able to come up with this little string of code. Otherwise, I fear that I would never be able to make any progress using the GPU. The same little tricks will need to be done to set vertex attributes as well, which is a whole other beast to deal with.

But, there you have it. I don’t know about using other dynamic programming languages to deal with GPU shading, but I find this to be the cat’s meow as far as I’m concerned. Having this particular mechanism in place makes GPU shader programming feel as natural as my regular Lua programming. There is that small step of writing that bit of shader code in what looks like “C”, but maybe even that isn’t a big deal. I don’t see why shader code can’t be written in Lua, and just translated into the appropriate backend language. If I could do that, then I could stay safely tucked away in my “Lua Only” world. But I digress…