The Power of Leveraged Frameworks that work

I was browsing the Opengl.org web site recently, when I ran across a talk by Mark Kilgard. The talk was at a recent GPU Technology Conference (GTC 2012 – San Jose). The talk was entitled “NVIDIA OpenGL in 2012“.

It was a general overview talk, covering the history of OpenGL, it’s present, and near future.  There are lots of little details related to OpenGL of course, but there were a couple that stood out for me.  At roughly 49:15 into the talk, there’s a slide entitled “What is path rendering?”, whith a bunch of 2D path rendered images on it.  Now this gets really interesting.

Basically, if you’ve been doing 2D graphics for the past few years, you realize that the GPU revolution has largely left you behind.  Yes, you can render milliions of triangles per second, but just try to render some nicely kerned text, say for a web page, and you’re pretty much on your own.  OpenGL has got nothing for you, or rather, what it does have for you will leave you completely dealing with the rasterization and process, for the most part.

What this means is that if you want to render high quality path driven stuff, like what you find in postscript, OpenVG, or HTML text, you’re going to have to do a whole bunch of work.  But wait!  Seeing the sad state of affairs, perhaps feeling guilty for their sins, nVidia has decided to tackle the problem space of path based rendering, using the GPU to accelerate.  What a novel idea!  I think it stems from the fact that their growth market is underpowered mobile devices, which have GPUs.  The more you can offload to the GPU the better as it’s more energy efficient for certain things than a CPU would be.

During the presentation, he talks about the various 2D APIs such as Quartz, OpenVG, Direct2D, Cairo, Skia, Qt::QPainter, Anti-grain… All APIs I’ve touched one way or another over the years.  He goes on about the greatness of these new extensions, which apparently have been in the nVidia drivers for a while.  Then I get to thinking.

I want 2D graphics.  I want it to work across multiple platforms, I want it to be fast and efficient.  At first I thought, maybe I should get Cairo and use that as my cross platform 2D graphics system.  Eventually Cairo will likely utilize this new path rendering stuff, and I’ll eventually benefit.  So, I looked at Cairo, took one look at the build system, and turned green.  Then I had another thought.

OpenGL is already THE cross platform graphics API.  And, since I have Lua, and more specifically LuaJIT with FFI, and I’ve already coded up my opengl interfaces, I can just use that, and it should work across multiple platforms.

So, sounds good.  I went off to the nVidia site to see what I could see with regards to the using this newfangled path rendering stuff.  Right now it’s only in the nVidia driver, so AMD, not so much.  I took one of the whitepapers that has examples on it, and just started coding what was there.  After a few turns of the crank, I was finally able to generate the image seen above.

Here’s one sequence of calls that I used:

local ogl = require "OglMan"

ogl.glMatrixLoadIdentityEXT(GL_PROJECTION);
ogl.glMatrixOrthoEXT(GL_PROJECTION, 0, 500, 0, 400, -1, 1);
ogl.glMatrixLoadIdentityEXT(GL_MODELVIEW);
ogl.glPathStringNV(pathObj, GL_PATH_FORMAT_SVG_NV, #svgPathString, svgPathString);
ogl.glStencilFillPathNV(pathObj, GL_COUNT_UP_NV, 0x1F);

OglMan is my OpenGL Manager. It’s effectively the same thing as using the familiar GLEW (GL Extension Wrangler), but done up in Lua, not as an interop thing.

I was not familiar with any of these calls before I wrote this code. But, just putting ‘ogl.’ at the front of each one of them, I assumed they would just work, and they did! I was actually amazed at how simple it was to code up this example.

This speaks volumes to the ease of use of Lua as a rapid prototyping tools. To do the same in C, would take me a lot more scaffolding, compiling, sweating and praying. In my little HeadsUp harness, I can just code and go, trying things out, with zero “compile”.

At any rate, it’s nice to know that seeing the world through a Lua lense is not a bad thing. I am just as capable as anyone on any other platform. I am asking myself this question now… If I could have a high quality text renderer done using nothing more than the GPU, and whatever text rendering library I write in Lua, could I write a nicely specialized HTML viewer?

 


Unchaining the GPU with Lua and OpenCL

Quite a few years ago, I programmed the BeBox to display multiple streams of .mpg video, while simultaneously pulling in video feeds from Satellite and cable. In all, you could see snapshots of roughly six things on the screen, happening all at the same time.

The CPUs were utilized primarily for the mpeg part, doing decoding, and some special effects when changing sources being displayed in the primary area. The feeds coming off the Happauge video capture card were being DMAd directly into the framebuffer of the graphics card, so there wasn’t any work by the CPU going on there.

That was a pretty good result for a dual-proc machine circa 1996. That was at the very beginning of the birth of nVidia, and GPUs were actually first becoming mainstream from 3dfx. Roll forward 16 years… and where are we today?

Well, the machine whining away under my desk is a 3.4Ghz AMD Phenom(tm) II X4 965 Processor, with 8Gb of RAM. The graphics card is an nVidia gfx 275. This machine is a couple years old now, but compared to that BBox, it’s a monster from another planet. As such, you would think it would be able to perform the same feats as that old machine, without even heating up a single resistor. To make it even more of a monster, there’s that GPU sitting in there which has 1000 times over the amount of processing power utilized to send people to the moon in the sixties.

So, what can this machine do? Well, It allows me to type really fast!! I can read emails in the blink of an eye, and Netflix movies play with nary a stutter!  I tell you, it’s simply amazing!  But, what about all that horsepower that’s sitting idle under my desk?  Surely I can put it to some good usage.

Well, of course graphics processing can largely be offloaded to the GPU these days.  Although I conjured up a graphics library that lives complely on the CPU, and just draws to memory, doing the same using the GPU is far faster, and takes a lot less electricity.

And finally, I come to the point.  I have gotten far enough along with my OpenCL binding that I can now actually do some OpenCL programming.  OpenCL is an interesting little thing.  Basically, it introduces the concept of ‘kernel’ programming.  And here, Kernel does not mean the OS kernel, but rather the small little bit of code that will run in parallel on the same piece of memory that other little bits of code are running against.  This is in fact what happens when you’re running a GLSL shader.  It’s just a little ‘kernel’, and in the case of a fragment shader, that little kernel runs against all the pixels in a frame, in parallel with hundreds of others doing the same thing.

Using GLSL based fragment shaders is great for graphics programming, but for general computing, it’s kind of clunky as you’d have to cast your compute problem into terms that the graphics pipeline can understand.  Furthermore, in order to use GLSL at all, you have to do things like create a GLContext, which requires a DeviceContext, which requires a Window, or at least a GDIBitmap.  That’s a lot of machiner to just write a bit of code to manipulate some data.

OpenCL changes things a bit.  First of all, you have access to the GPU power without the graphics constructs.  You still have to create a proper context, but it’s far easier without having to worry about windows and bitmaps.  There are some concepts, and a hierarchy for doing things.  You start at the top with platforms.  There may be multiple “platforms” within your machine.  Usually there is only one though.  Within a platform, there are devices.  There may be multiple devices in a platform.  For example, you might have two nVidia cards in your machine, and that will list as two devices.

After the device, there is the concept of a context.  The context can span multiple devices.  The context controls things like where memory is created, where programs are created, where kernels are run, and the like.  This is really where things start to get interesting.

From the context, you can create a “program”.  Here, I think it is easier to think of the program as “image”.  You are essentially placing an “image” onto the context.  I think of the image as the raw OS image, ready to have little bits of code running in it.

Then, finally, you can create a “kernel”, which is actually a piece of code that’s going to execute on the device.

That’s a lot of stuff, and a lot of error checking, and a lot of pointers that can go wrong, etc.  So, the Lua version looks like this:

local platform, num = CLGetPlatform()
local devices = platform:GetDevices(CL_DEVICE_TYPE_GPU)
runkernel(devices[1]);

That is, get the first plaform available. Then, get the list of devices available on the platform. And finally, run a kernel (code below).

Using Lua is nice because garbage collection can be used to release various resources when they’re no longer in use. That saves a bit of typing, and you don’t have to remember anything.

To run a kernel, I looked at a simple example in C, written by Clifford Wolf.

local program_source = [[
    __kernel void simple_demo(__global int *src, __global int *dst, int factor)
    {
        int i = get_global_id(0);
        dst[i] = src[i] * factor;
    }
]];

function runkernel(device)
    local context = CLContext():CreateForDevice(device);

    local program = context:CreateProgramFromSource(program_source);
    program:Build();

    local NUM_DATA = 100;
    local buffsize = ffi.sizeof("int")*NUM_DATA;

    local input_buffer = context:CreateBuffer(buffsize, CL_MEM_READ_ONLY);
    local output_buffer = context:CreateBuffer(buffsize, CL_MEM_WRITE_ONLY);

    local factor = 2;
    local lpfactor = ffi.new("int[1]", factor);

    local kernel = program:CreateKernel("simple_demo");

    kernel:SetIndexedArg(0, input_buffer.Handle, ffi.sizeof("cl_mem"));
    kernel:SetIndexedArg(1, output_buffer.Handle, ffi.sizeof("cl_mem"));
    kernel:SetIndexedArg(2, lpfactor, ffi.sizeof("int"));

    local queue = context:CreateCommandQueue(input_buffer);

    local intsize = ffi.sizeof("int");
    local lpi = ffi.new("int[1]");
    for i=0, NUM_DATA-1 do
        local offset = intsize*i;
        lpi[0] = i;
        queue:EnqueueWriteBuffer(input_buffer, offset, lpi, intsize);
    end

    local global_work_size = ffi.new("size_t[1]",NUM_DATA);
    local kernel_completion = queue:EnqueueNDRangeKernel(kernel, global_work_size);

    kernel_completion:Wait();
    kernel_completion:Release();

    print("Result:");
    local lpdata = ffi.new("int[1]");
    for i=0, NUM_DATA-1 do
        local offset = i*intsize;
        local err = ocl.clEnqueueReadBuffer(queue.Handle, output_buffer.Handle, 
            CL_TRUE, offset, intsize, lpdata, 0, nil, nil);
        CL_CHECK(err, "clEnqueueReadBuffer");
        print(lpdata[0]);
    end
end

In the first part of runkernel(), I’m using the nice object like interface that the Lua binding provides. In the last part of the function, I’m using the straight OpenCL calls, just to show how that’s done.

There are a couple of things of note here. First, the ‘program_source’ is just a string. This is the same as with GLSLProgram. There are various environments available, including from nVidia, which will help you create these kernel strings. Once you have your string perfected, you can just drop it in for inclusion as your kernel.

Since a kernel is not a function in lua that you can just pass variables to, you have to do some explicit work to pass values in as arguments. kernel:SetIndexedArg() performs this task. This is an ideal candidate for some Lua magic to make it simpler. Unlike the GLSL interface, I can’t query the program to find out the types of the various arguments. But, since I wrote the kernel, I do know their types, so, I write a little table that maps the index to a name, and the data values, and this code could turn into a more familiar:

kernel.src = input_buffer
kernel.dst = output_buffer
kernel.factor = 2

Then I’d be happy as a clam. There is another concept that gets in your face here. That’s the whole queuewrite, queueread business. Basically, all data and kernel deployment happens as commands executed from a queue. That fact does not need to be front and center, and a little bit of wrapping might make it nicer to deal with.

Now that this is in hand, what can be done with it? Well, there’s the obvious graphics stuff, which is where it came from, but there’s a whole lot more. I was just thinking that this might be a great way to perform base64 encoding for example. It’s a largely parallel task. You could write a kernel that turns a 3-character block into the equivalent 4-character code. As this kernel can run in parallel, you could literally have hundreds of them working on encoding your text at the same time. At the end, you’ve got a base64 encoded thing, in a fraction of the time it would normally take.

Using a slightly different approach, that of stream processing, you could probably perform some cryptographic operations, like digest calculations and the like.

There is one tool that I found that makes exploring OpenCL fairly easy and fun. OpenCL Studio is done by Geist Software Labs, who appear to be a consultancy for high performance computing. They have a nice Lua scriptable environment that allows you to play with OpenCL and OpenGL, just like that.

Having such a tool available is an accelerant for me to get even more productivity wrung out of myself, and my machine.

With my little Lua Binding to OpenCL, I am confident that I’m going to be able to get more per killowatt out of my programming.  That’s good for my programs, and good for the environment.  I’m hoping that between a fast quad-proc, super duper graphics card, and Lua, I’ll finally be able to write and utilize programs that are more impressive that what I could do 15 years ago.


HeadsUp Live Mandelbrot Zoom

To kickoff the usage of shaders, I figured I go back to the Mandelbrot example. In this particular case, showing a static image isn’t that exciting, so I figured I’d produce a little movie clip instead. So, what you see here is some typical OpenGL code written, that uses a fragment shader to do the actual work. I borrowed the fragment shader from here.

This is really getting interesting.  The fragment shader itself had zero changes, because it’s GLSL code, so nothing to change.  The code wrapping changed in the usual sorts of way.  Here is how the shader program is setup:

function setup_shader(fname)
    local fp = io.open(fname, "r");

    local src_buf = fp:read("*all");
    local src_array = ffi.new("char*[1]", ffi.cast("char *",src_buf));

    local sdr = ogm.glCreateShader(GL_FRAGMENT_SHADER_ARB);	
    ogm.glShaderSource(sdr, 1, src_array, nil);
    ogm.glCompileShader(sdr);

    local prog = ogm.glCreateProgram();
    ogm.glAttachShader(prog, sdr);
    ogm.glLinkProgram(prog);
    ogm.glUseProgram(prog);
    
    return prog;
end

There’s only one slightly tricky line in here. A shader is a program, and that program gets compiled on the GPU by the vendor’s OpenGL GLSL compiler. So, you’ve got to get the text of that program over to the GPU. The API for doing that is:

void glShaderSource (GLuint shader, GLsizei count, GLchar* *string, const GLint *length);

It’s the “GLchar **string” that’s the only challenge here. Basically, the function expects an array of pointers to strings. So, using the LuaJIT ffi, this turns out to be achievable with the following:

local src_array = ffi.new("char*[1]", ffi.cast("char *",src_buf));

It maybe looks like a bit of a magical incantation, but once it’s done, you’re good to go. From then on out, it’s standard stuff. Notice the usage of ‘ogm’. That’s the alias for the OglMan table, which is used to pull in all the extensions you could care to use. It really was brain dead easy to do this. Whenever the LuaJIT compiler complained about not being able to find something, I just put “ogm.” in front of it, until all complaints were solved, and the program finally ran.

And the result in this case is a nice fly through of a mandelbrot set. Julia sets can be added just as easily by changing the .glsl file that I’m loading into the fragment shader.

This bodes well. It will be a small matter to wrap this stuff up in a couple of convenience objects so that I won’t have to make all those GLSL Calls explicitly.

One of the hardest parts to deal with typically is the setting of ‘uniform’ variables. This is the way in which you communicate values from the outside world into the shader code. I’m thinking Lua will help me do that in such a way that’s fairly natural, and doesn’t take a lot of code. Maybe I can use the same trick I did with OglMan (implement __index and __newindex). If I could do that, then it would look the same as setting/getting properties on an object to interact with your GLSL shader. And that would be a fine thing indeed as then the code would just slip right into the rest of the Lua code, without looking dramatically different. Never mind that the underlying code is actually running on the GPU.

At any rate, there you go. Live zooming on a Mandelbrot set, utilizing the GPU for acceleration, all written in Lua (except for the shader code of course). I wonder if the shader code could be written in Lua as well, and then just converted…


HeadsUp NeHe Tutorials

Since time immemorial, I have learned from the NeHe OpenGL Tutorials.  These tutorials have been great, particularly up through the 2.1 version of OpenGL.  With the advent of more and more shader programming, WebGL, and other advancements, the old tutorials are now listed as “Legacy”.

These tutorials are still useful for a couple of reasons.  Not everyone programs with shaders as yet, and they are a great way to flush out all the challenges with a new OpenGL interface, such as what is in HeadsUp.  I have implemented tutorials 2-8, just for kicks.  The picture here shows lesson8.lua, which is about blending, some lighting, and using texture objects.  I’ve implemented an extremely simple brain dead Targa image viewer, just to get some images into texture objects.  The rest is pure OpenGL.

The way I’ve done it is to grab the GLUT or C++ version of the code if it exists, and then just do some massaging of the code until it compiles.  It’s typically a fairly simple straight forward process.  I’ve even added some of the most common glxxx functions to the global namespace for convenience.   One example is dealing with color.  Of course, you can be very explicit:

gl.glColor4f(1,o,o,1)

And if you want to do that without the ‘gl.’ prefix, you can simply do:

glColor4f(1,0,0,1)

That allows your code to look exactly like the typical ‘C’ version of the code. But wait a minute, this is Lua, so convenience is the name of the game. We can do some overloading and get an even better effect:

function glColor(...)
local arg={...};
    if #arg == 3 then
        gl.glColor3d(arg[1], arg[2], arg[3]);
    elseif #arg == 4 then
        gl.glColor4d(arg[1], arg[2], arg[3], arg[4]);
    elseif #arg == 2 then
        gl.glColor4d(arg[1], arg[1], arg[1], arg[2]);
    elseif #arg == 1 then
        if type(arg[1] == "number") then
            gl.glColor3d(arg[1], arg[1], arg[1]);
        elseif type(arg[1]) == "table" then
            if #arg[1] == 3 then
                gl.glColor3d(arg[1], arg[2], arg[3]);
            elseif #arg[1] == 4 then
                gl.glColor4d(arg[1], arg[2], arg[3], arg[4]);
            end
        end
    end
end

With this function, yoiu can use several calling conventions:

glColor(0.63)        -- Set a grayscale value
glColor(0.63, 0.5)   -- Set a grayscale value with alpha
glColor(0.25, 0.30, 0.30, 1)  -- Set a full color value
glColor({0.24, 0.30,0.30,1})  -- Set a full color, with alpha using a table
glColor({0.24, 0.30, 0.30})   -- Set a full color using a table

The only one that is missing is:

glColor(vec3(0.24, 0.3,0.3))  -- Set color using a vec (float[3])

If you are familiar with using the processing environment, this flexibility in setting color values might seem more familiar. There is something nasty about the difference between counting from ‘0’ as is typical in C, and counting from ‘1’ which is standard for Lua tables. By using this sort of construct, you can get it both ways. If you want the typical C version, including passing array structures, then use the standard C looking functions. If you want to pass your Lua based tables around, then use the more generic versions of the function, and pass tables around.

The same is true for Vertex objects.

I find this to be a useful construct. Although the flexibility can be a bit much when you try to think about the many ways you can do something, really it just feels natural because you just do whatever feels natural to your programming style, and it will probably work. You can stick with copy/paste of code you find from elsewhere, or you can taylor it to the Lua environment as suits your needs.

Now, on to those shaders!


HeadsUp OpenGL Extension Wrangling

I have dealt with OpenGL extensions in a previous library I did in C#.  I can tell you, it’s rather a pain.  First of all, there are so many ‘extensions’, it can make your head spin just thinking about it.  Second, with static languages, you have to create all these various wrapper type things to get things to work correctly.  Create a delegate thing, a declaration thing, the glue code to tie the delegate to the declaration…

So, I figured with LuaJIT and LuaJIT ffi in particular, I might be able to make an easier time of it.  There is one unavoidable part though.  You have to have the prototypes of your functions somewhere.  I’ll start with the simplest one:

// WGL_ARB_extensions_string 1  
const char *  wglGetExtensionsStringARB (HDC hdc); 
typedef const char * (* PFNWGLGETEXTENSIONSSTRINGARBPROC) (HDC hdc);

The function: wglGetExtensionsStringARB(), if called, you can get the list of wglExtensions that your currenct gl driver supports. This isn’t a normal function in a library. My first inclination might be to simply do:

ffi.cdef[[
wglGetExtensionsStringARB (HDC hdc); 
]]

gl = ffi.load("opengl32")
local extensions = gl.wglGetExtensionsStringARB(hdc);

But, you can’t do that. This function isn’t necessarily located within the opengl32.dll library at all. In order to find out where it actually is, you have to call another function: wglGetProcAddress(), which is actually in the library. So, in order to string this together, you have to do the following:

gl = ffi.load("opengl32")
local funcptr = gl.wglGetProcAddress("wglGetExtensionsStringARB")
local castfunc = ffi.cast("PFNWGLGETEXTENSIONSSTRINGARBPROC", funcptr);

local extensions = castfunc(hdc);
print(extensions)

That’s a handful, but it’s not too bad. First, get the address of the extension function you’re looking for (wglGetProcAddress). Then, cast it to a function prototype so that when you try to call it, LuaJIT knows about the parameter types and can do the marshaling for your automatically. Then, call the function.

But, I want this to be as easy as possible, and being the error prone programmer that I am, I want it to be automated as well, because I’m not good at typing a whole bunch of repetitive stuff correctly.

OK, so how to do this? First of all, I downloaded the wglext.h and glext.h files from the Khronos site: http://www.opengl.org/registry/
You can get the .spec files, and start parsing from there, or you can download the already made .h files. You can also get these from various vendors, or the GLEW library. I just started from the Khronos ones.

I performed some hand massaging on the .h files, to come up with things like all the constants pulled out from the #defines, and generally made the thing look like a lua file: wglext.lua. Within this files, you see all the function prototypes, wrapped up in a ffi.cdef[[]] thing, as seen above.

A thing to note about the function prototypes. For each function, there is a prototype, and a typedef. I actually only use the typedef, but the prototype is there as well for completeness. By a stroke of luck, or more likely design, the typedefs are created in a consistent way, that is an easy modification of the function name. So, in the case of wglGetExtensionsStringARB, the typedef name, which is the part I’m interested in, looks like:

PFNWGLGETEXTENSIONSSTRINGARBPROC

If I were to represent this transformation as a simple function, it would be:

function GetFunctionProtoName(fname)
    return string.format("PFN%sPROC", fname:upper());
end

That’s good. So, now, when I wanto to go from the name of a function, to the name of the typedef that represents that function, I can simply do this:

GetFunctionProtoName("wglGetExtensionsStringARB");

That’s grand. Now, tying this piece to the lookup piece, I might have two more functions:

function GetWglFunctionPointer(fname, funcptr)
    local protoname = GetFunctionProtoName(fname);
    local castfunc = ffi.cast(protoname, funcptr);

    return castfunc;
end

function GetWglFunction(fname)
    local funcptr = opengl32.wglGetProcAddress(fname);
    if funcptr == nil then
        return nil
    end

    local castfunc = GetWglFunctionPointer(fname, funcptr);

    return castfunc;
end

And how to use this?

wglGetExtensionsStringARB = GetWglFunction ("wglGetExtensionsStringARB");

local exts = wglGetExtensionsStringARB(hdc);
print(exts);

Isn’t that spiffy? I don’t think it gets much easier than that. So, for the extensions you care about, just repeat the line that has “GetWglFunction”, and you’re done…

But wait, that’s still a lot of error prone copy/paste typing isn’t it? Can’t Lua enable my laziness even more? Well, sure it can. How about we create a simple interface to deal with all this nonsense for us?

OglMan={}
OglMan_mt = {
    __index = function(tbl, key)
        local funcptr = GetWglFunction(key)
        rawset(tbl, key, funcptr)
        return funcptr;
    end,
}

setmetatable(OglMan, OglMan_mt)

Ah… now I can do the following:

local getextensions = OglMan.wglGetExtensionsStringARB
if getextensions  ~= nil then
  local exts = getextensions(hdc);
  print(exts)
end

Or, if I’m feeling particularly daring, I can simply do this:

print(OglMan.wglGetExtensionsStringARB(hdc))

Pick your level of error checking.

Why does this work? Well, the OglMan table has a meta table (OglMan_mt). That metatable defines a function ‘__index’. Through the magic of Lua, this function is called whenever you try to lookup something in the table, and it doesn’t already exist. So, when I do this:
OglMan.wglGetExtensionsStringARB, my __index function is called, and the runtime hands me the name of the thing that was being looked up. In normal circumstances, a nil value would be returned, but since I’ve already created those functions that can go from a string to a cast function pointer, I can use that first. If it fails, then I can simply return nil as usual. If it succeeds, I can return a function “pointer” that’s already cast in the appropriate way, ready to be used.

I think that’s pretty spiff.

In conclusion, after doing a bit of grunt work on those header files, it’s less than 100 lines of code to make all OpenGL extensions fully available to the Lua programmer. Of course, this works because of the ease of LuaJIT, and the __index trick of Lua in general. But, I’ve very pleased with this outcome. I don’t have to take a dependency on GLEW or any other extensions wrangler. I just need to do the initial .h file wrangling, and then go on about my business as usual.

As an added bonus, it turns out that sometimes it’s better to use this trick on functions that are actually in the OpenGL32.dll as well. The ones that are in the .dll might have bugs, that Microsoft doesn’t bother to fix. The ones that can be found using the lookup come from the vendor of the graphics card, and they have more of a vested interest in ensuring they work correctly. Just saying.


Compact Component Composability

What’s that?  Finally able to display 3D again?  Well yah.  Given the way HeadsUp works these days, a CAD program, or at least a simple STL Viewer, is just a few lines of “sample app” code. It looks like this:

require "STLCodec"
require "Scene"
require "SceneViewer"

local defaultscene = Scene();
local sceneviewer = SceneViewer();

-- Typical rendering initialization stuff
function init()
    -- create a mesh to be rendered
    local mesh = import_stl_mesh("bowl_12.stl");

    -- Add the mesh into the scene
    defaultscene:appendCommand(CADVM.mesh(mesh))
end

function reshape(w, h)
    if h == 0 then
        h = 1
    end
    gl.glViewport(0, 0, w, h)
    sceneviewer:SetSize(w, h)
end

function display()
    sceneviewer:Render(defaultscene);
end

So, for roughly 25 lines of code, you can load and display STL files. The SceneViewer object takes care of displaying the scene, including zooming, rotating around and all that. To load the stl in the first place, I’m using the import_stl_mesh() function, which comes from the STLCodec class. That’s a nicely isolated component. It can deal with reading and writing STL files. Right now it only does ASCII forms, but a little bit of code will enable binary as well. It’s nice because it’s totally isolated from the rest of the code, so it can easily be improved.

Another bit of isolation comes from the Scene and SceneViewer objects. The scene is a simple repository for things like shapes, and meshes, and other things, like rotations, translations, and the like. Your basic scene manager. The SceneViewer knows how to take a scene, and render it. It starts with a default camera and position, and color scheme. All of those things are changeable, but the defaults are good enough for most uses.

And that’s about it. All the little pieces are fairly small. Nothing more than 400 lines of code, but how they compose is the key here.

Now, to add another format, such as VRML, is a matter of adding a VRML codec. It would load into a TriangleMesh just like the stl codec does. One of the nice benefits of the separability of concerns is that you can deal with things at whatever level you like. For example, once you load a mesh in, you don’t have to render it at all. You can simply run some routines over it, and save it out again. For example, if you wanted to eliminate duplicate vertices from a mesh, you might do something like:

local mesh = import_stl_mesh("bowl.stl");
export_stl_mesh("bowl_new.stl", eliminate_duplicate_vertices(mesh));

…and you’re done. I’ve written about the desire for this kind of composability previously, but now it’s actually achievable. This kind of composability also makes for a ready made “extensions” capability for the CAD program. Since the CAD program BanateCAD is nothing more than the composition of a few different things, they can be reshaped into any configuration, and any part can load any new modules it feels like, without the primary app having to intervene in any way.


Eventus Obscuricus

There always comes a time in a programmer’s life when they must deal with an “event loop”.  Since the dawn of the teletype, programmers have had to deal with the age old question “should I poll for events, or should I be notified asynchronously”.  This very question is baked deeply into our CPU architectures with things like interrupts and queues.  It’s simply unavoidable.  So, here I sit, at the precipice of unification (2D and 3D) and the question arises, should I poll, or should I be notified.

Well, I’m actually in favor of asynchronous processing because that’s the way the world works in general.  Sure, we drive towards things, but more often than not, we are responding to something in our environment.  I figure the same should be true of a program.  Yes, it has some goals in mind, but largely it should be responsive to activities that occur on the periphery.

HeadsUp is the 2D aspect of BanateCAD.  This is where events like mouse and keyboard interaction enter the system.  Up until recently, I dealt with these events in a synchronous sort of way, with a typical event loop.  Now, I do a hybrid.  Yes, there is a fundamental message pump that pulls messages out of the system’s message queue, but then that pump turns around and stuffs them into an asynchronous queue, which is handled by a coroutine (cooperative multitasking).  That’s handy for a couple of reasons.

First of all, the message pump does not have to be completely synchronous with the rest of the program.  It can just pull messages out of the pipe, and stick them into the queue, rinse and repeat.  That makes the message pump very simple, and easily separable from the rest of the program.  Second, the part of the program that deals with processing messages does not have to know anything about the mechanics of the message pump.  All it knows it that it will be notified when there are messages to be processed.  It can then go pull as many messages from the async queue as it likes, and deal with them at its leisure.

Perhaps it’s only a point an asynchronous programmer could love.

Now that there’s a generic queue that can process commands, I want to make everything into commands.  So, here’s an example of a very typical case.  When I want to draw a line, there is typically some interface connected to the graphics API that looks like this:

Graphics:DrawLine(pt1, pt2)

That will talk to the rendering engine directly, and a line will magically appear.  But, what if under the covers, what was really happening was this:


void Graphics:DrawLine(pt1, pt2)
local cmd = Graphics:CreateDrawLine(pt1, pt2)
outboundqueue:Enque(cmd)
end

From the programmer’s perspective, there’s nothing different in their conceptual model.  They call DrawLine(), and that’s the end of it.  They assume a line is drawn, and it will be drawn, eventually.

From the system’s perspective, a whole world of possibilities just opened up.  Now that the command is packaged up and stuck into a queue, it could be immediately removed from the queue, and really executed against the graphics system, or it could be stored off in some persistent store somewhere, possibly on a distant part of the planet.  And, as long as you’ve got commands all packaged up and ready to go, you could ship them to other renderers while you’re at it.  That might be interesting and useful.

At any rate, it’s a fairly fundamental question to deal with.  Should you poll, or be notified, along with, should you call directly, or issue commands.  In the case of BanateCAD, there is a hybrid of poll/notify, and commands are definitely the way to go.