Splunking Windows – Extracting pleasure from legacy apis

If you are a modern programmer of Windows apps, there are numerous frameworks for you, hundreds of SDKs, scripted wrappers, IDEs to hide behind, and just layers upon layers of goodness to keep you safe and sane.  So, when it comes to using some of the core Windows APIs directly, you can be forgiven for not even knowing they exist, let alone how to use them from your favorite environment.

I’ve done a ton of exploration on the intricacies of the various Linux interfaces, Spelunking Linux goes over everything from auxv to procfs, and quite a few in between.  But, what about Windows?  Well, I’ve recently embarked on a new project lj2win32 (not to be confused with earlier LJIT2Win32).  The general purpose of this project is to bring the goodness of TINN to the average LuaJIT developer.  Whereas TINN is a massive project that strives to cover the entirety of the known world of common Windows interfaces, and provides a ready to go multi-tasking programming environment, lj2win32 is almost the opposite.  It does not provide its own shell, rather it just provides the raw bindings necessary for the developer to create whatever they want.  It’s intended to be a simple luarocks install, much in the way the ljsyscall works for creating a standard binding to UNIX kinds of systems without much fuss or intrusion.

In creating this project, I’ve tried to adhere to a couple of design principles to meet some objectives.

First objective is that it must ultimately be installable using luarocks.  This means that I have to be conscious about the organization of the file structure.  To wit, everything of consequence lives in a ‘win32’ directory.  The package name might ultimately be ‘win32’.  Everything is referenced from there.

Second objective, provide the barest minimum bindings.  Don’t change names of things, don’t introduce non-windows semantics, don’t create several layers of class hierarchies to objectify the interfaces.  Now, of course there are some very simple exceptions, but they should be fairly limited.  The idea being, anyone should be able to take this as a bare minimum, and add their own layers atop it.  It’s hard to resist objectifying these interfaces though, and everything from Microsoft’s ancient MFC, ATL, and every framework since, has thrown layers of object wrappers on the core Win32 interfaces.  In this case, wrappers and other suggestions will show up in the ‘tests’ directory.  That is fertile ground for all manner of fantastical object wrapperage.

Third objective, keep the dependencies minimal.  If you do programming in C on Windows, you include a couple of well known headers files at the beginning of your program, and the whole world gets dragged in.  Everything is pretty much in a global namespace, which can lead to some bad conflicts, but they’ve been worked out over time.  In lj2win32, there are only a couple things in the global namespace, everything else is either in some table, or within the ffi.C facility.  Additionally, the wrappings are clustered in a way that follows the Windows API Sets.  API sets are a mechanism Windows has for pulling apart interdependencies in the various libraries that make up the OS.  In short, it’s just a name (so happens to end in ‘.dll’) which is used by the loader to load in various functions.  If you use these special names, instead of the traditional ‘kernel32’, ‘advapi32’, you might pull in a smaller set of stuff.

With all that, I thought I’d explore one particular bit of minutia as an example of how things could go.

The GetSystemMetrics() function call is a sort of dumping ground for a lot of UI system information.  Here’s where you can find things like how big the screen is, how many monitors there are, how many pixels are used for the menu bars, and the like.  Of course this is just a wrapper on items that probably come from the registry, or various devices and tidbits hidden away in other databases throughout the system, but it’s the convenient developer friendly interface.

The signature looks like this

int WINAPI GetSystemMetrics(
_In_ int nIndex
);

A simple enough call. And a simple enough binding:

ffi.cdef[[
int GetSystemMetrics(int nIndex);
]]

Of course, there is the ‘nIndex’, which in the Windows headers is a bunch of manifest constants, which in LuaJIT might be defined thus:

ffi.cdef[[
	// Used for GetSystemMetrics
static const int	SM_CXSCREEN = 0;
static const int	SM_CYSCREEN = 1;
static const int	SM_CXVSCROLL = 2;
static const int	SM_CYHSCROLL = 3;
static const int	SM_CYCAPTION = 4;
static const int	SM_CXBORDER = 5;
static const int	SM_CYBORDER = 6;
]] 

 
Great. Then I can simply do

local value = ffi.C.GetSystemMetrics(ffi.C.SM_CXSCREEN)

 
Fantastic, I’m in business!

So, this meets the second objective of bare minimum binding. But, it’s not a very satisfying programming experience for the LuaJIT developer. How about just a little bit of sugar? Well, I don’t want to violate the same second objective of non-wrapperness, so I’ll create a separate thing in the tests directory. The systemmetrics.lua file contains a bit of an exploration in getting of system metrics.

It starts out like this:

local ffi = require("ffi")
local errorhandling = require("win32.core.errorhandling_l1_1_1");

ffi.cdef[[
int GetSystemMetrics(int nIndex);
]]

local exports = {}

local function SM_toBool(value)
	return value ~= 0
end

Then defines something like this:

exports.names = {
    SM_CXSCREEN = {value = 0};
    SM_CYSCREEN = {value = 1};
    SM_CXVSCROLL = {value = 2};
    SM_CYHSCROLL = {value = 3};
    SM_CYCAPTION = {value = 4};
    SM_CXBORDER = {value = 5};
    SM_CYBORDER = {value = 6};
    SM_CXDLGFRAME = {value = 7};
    SM_CXFIXEDFRAME = {value = 7};
    SM_CYDLGFRAME = {value = 8};
    SM_CYFIXEDFRAME = {value = 8};
    SM_CYVTHUMB = {value = 9};
    SM_CXHTHUMB = {value = 10};
    SM_CXICON = {value = 11};
    SM_CYICON = {value = 12};
    SM_CXCURSOR = {value = 13};
    SM_CYCURSOR = {value = 14};
    SM_CYMENU = {value = 15};
    SM_CXFULLSCREEN = {value = 16};
    SM_CYFULLSCREEN = {value = 17};
    SM_CYKANJIWINDOW = {value = 18, converter = SM_toBool};
    SM_MOUSEPRESENT = {value = 19, converter = SM_toBool};
    SM_CYVSCROLL = {value = 20};
    SM_CXHSCROLL = {value = 21};
    SM_DEBUG = {value = 22, converter = SM_toBool};
    SM_SWAPBUTTON = {value = 23, converter = SM_toBool};
    SM_RESERVED1 = {value = 24, converter = SM_toBool};
    SM_RESERVED2 = {value = 25, converter = SM_toBool};
    SM_RESERVED3 = {value = 26, converter = SM_toBool};
    SM_RESERVED4 = {value = 27, converter = SM_toBool};
}

And finishes with a flourish like this:

local function lookupByNumber(num)
	for key, entry in pairs(exports.names) do
		if entry.value == num then
			return entry;
		end
	end

	return nil;
end

local function getSystemMetrics(what)
	local entry = nil;
	local idx = nil;

	if type(what) == "string" then
		entry = exports.names[what]
		idx = entry.value;
	else
		idx = tonumber(what)
		if not idx then 
			return nil;
		end
		
		entry = lookupByNumber(idx)

        if not entry then return nil end
	end

	local value = ffi.C.GetSystemMetrics(idx)

    if entry.converter then
        value = entry.converter(value);
    end

    return value;
end

-- Create C definitions derived from the names table
function exports.genCdefs()
    for key, entry in pairs(exports.names) do
        ffi.cdef(string.format("static const int %s = %d", key, entry.value))
    end
end

setmetatable(exports, {
	__index = function(self, what)
		return getSystemMetrics(what)
	end,
})

return exports

All of this allows you to do a couple of interesting things. First, what if you wanted to print out all the system metrics. This same technique can be used to put all the metrics into a table to be used within your program.

local sysmetrics = require("systemmetrics");

local function testAll()
    for key, entry in pairs(sysmetrics.names) do
        local value, err = sysmetrics[key]
        if value ~= nil then
            print(string.format("{name = '%s', value = %s};", key, value))
        else
            print(key, err)
        end
    end
end

OK, so what? Well, the systemmetrics.names is a dictionary matching a symbolic name to the value used to get a particular metric. And what’s this magic with the ‘sysmetrics[key]’ thing? Well, let’s take a look back at that hand waving from the systemmetrics.lua file.

setmetatable(exports, {
	__index = function(self, what)
		return getSystemMetrics(what)
	end,
})

Oh, I see now, it’s obvious…

So, what’s happening here with the setmetatable thing is, Lua has a way of setting some functions on a table which will dictate the behavior they will exhibit in certain situations. In this case, the ‘__index’ function, if it exists, will take care of the cases when you try to look something up, and it isn’t directly in the table. So, in our example, doing the ‘sysmetrics[key]’ thing is essentially saying, “Try to find a value with the string associated with ‘key’. If it’s not found, then do whatever is associated with the ‘__index’ value”. In this case, ‘__index’ is a function, so that function is called, and whatever that returns becomes the value associated with that key.

I know, it’s a mouth full, and metatables are one of the more challenging aspects of Lua to get your head around, but once you do, it’s a powerful concept.

How about another example which will be a more realistic and typical case.

local function testSome()
    print(sysmetrics.SM_MAXIMUMTOUCHES)
end

In this case, the exact same mechanism is at play. In Lua, there are two ways to get a value out of a table. The first one we’ve already seen, where the ‘[]’ notation is used, as if the thing were an array. In the ‘testSome()’ case, the ‘.’ notation is being utilized. This is accessing the table as if it were a data structure, but it’s exactly the same as trying to access as an array, at least as far as the invocation of the ‘__index’ function is concerned. The ‘SM_MAXIMUMTOUCHES’ is taken as a string value, so it’s the same as doing: sysmetrics[‘SM_MAXIMUMTOUCHES’], and from the previous example, we know how that works out.

Now, there’s one more thing to note from this little escapade. The implementation of the helper function:

local function getSystemMetrics(what)
	local entry = nil;
	local idx = nil;

	if type(what) == "string" then
		entry = exports.names[what]
		idx = entry.value;
	else
		idx = tonumber(what)
		if not idx then 
			return nil;
		end
		
		entry = lookupByNumber(idx)

        if not entry then return nil end
	end

	local value = ffi.C.GetSystemMetrics(idx)

    if entry.converter then
        value = entry.converter(value);
    end

    return value;
end

There’s all manner of nonsense in here. The ‘what’ can be either a string or something that can be converted to a number. This is useful because it allows you to pass in symbolic names like “SM_CXBLAHBLAHBLAH” or a number 123. That’s great depending on what you’re interacting with and how the values are held. You might have some UI for example where you just want to use the symbolic names and not deal with numbers.

The other thing of note is that ‘entry.converter’ bit at the end. If you look back at the names table, you’ll notice that some of the entries have a ‘converter’ field associated with them. this is an optional function that can be associated with the entries. If it exists, it is called, with the value from the system called passed to it. In most cases, what the system returns is a number (number of mouse buttons, size of screen, etc). In some cases, the value returned is ‘0’ for false, and ‘non-zero’ for true. Well, as a Lua developer, I’d rather just get a bool in those cases where it’s appropriate, and this helper function is in a position to provide that for me. This is great because it allows me to not have to check the documentation to figure it out.

There’s one more tiny gem hidden in all this madness.

function exports.genCdefs()
    for key, entry in pairs(exports.names) do
        ffi.cdef(string.format("static const int %s = %d", key, entry.value))
    end
end

What does this do exactly? Simply, it generates those constants in the ffi.C space, so that you can still do this:

ffi.C.GetSystemMetrics(ffi.C.SM_MAXIMUMTOUCHES)

So, there you have it. You can go with the raw traditional sort of ffi binding, or you can spice things up a bit and make things a bit more useful with a little bit of effort. I like doing the latter, because I can generate the more traditional binding from the table of names that I’ve created. That’s a useful thing for documentation purposes, and in general.

I have stuck to my objectives, and this little example just goes to prove how esoteric minute details can be turned into approachable things of beauty with a little bit of Lua code.

Advertisements

SVG And Me – Don’t tell me, just another database!

A picture is worth 175Kb…

grapes

So, SVG right? Well, the original was, but this image was converted to a .png file for easy embedding in WordPress. The file size of the original grapes.svg is 75K. That savings in space is one of the reasons to use .svg files whenever you can.

But, I digress. The remotesvg project has been moving right along.

Last time around, I was able to use Lua syntax as a stand in for the raw .svg syntax.  That has some benefits because since your in a programming language, you can use programming constructs such as loops, references, functions and the like to enhance the development of your svg.  That’s great when you’re creating something from scratch programmatically, rather than just using a graphical editing tool such as inkscape to construct your .svg.  If you’re constructing a library of svg handling routines, you need a bit more though.

This time around, I’m adding in some parsing of svg files, as well as general manipulation of the same from within Lua.  Here’s a very simple example of how to read an svg file into a lua table:

 

local parser = require("remotesvg.parsesvg")

local doc = parser:parseFile("grapes.svg");

That’s it! You now have the file in a convenient lua table, ready to be manipulated. But wait, what do I have exactly? Let’s look at a section of that file and see what it gives us.

    <linearGradient
       inkscape:collect="always"
       id="linearGradient4892">
      <stop
         style="stop-color:#eeeeec;stop-opacity:1;"
         offset="0"
         id="stop4894" />
      <stop
         style="stop-color:#eeeeec;stop-opacity:0;"
         offset="1"
         id="stop4896" />
    </linearGradient>
    <linearGradient
       inkscape:collect="always"
       xlink:href="#linearGradient4892"
       id="linearGradient10460"
       gradientUnits="userSpaceOnUse"
       gradientTransform="translate(-208.29289,-394.63604)"
       x1="-238.25415"
       y1="1034.7042"
       x2="-157.4043"
       y2="1093.8906" />

This is part of the definitions, which later get used on portions of representing the grapes. A couple of things to notice. As a straight ‘parsing’, you’ll get a bunch of text values. For example: y2 = “109.8906”, that will turn into a value in the lua table like this: {y2 = “109.8906”}, the ‘109.8906’ is still a string value. That’s useful, but a little less than perfect. Sometimes, depending on what I’m doing, retaining that value as a string might be just fine, but sometimes, I’ll want that value to be an actual lua number. So, there’s an additional step I can take to parse the actual attributes values and turn them into a more native form:

local parser = require("remotesvg.parsesvg")

local doc = parser:parseFile("grapes.svg");
doc:parseAttributes();

doc:write(ImageStream)

That line with doc:parseAttributes(), tells the document to go through all its attributes and parse them, turning them into more useful values from the Lua perspective. In the case above, the representation of ‘y2’ would become: {y2 = 109.8906}, which is a string value.

This gets very interesting when you have values where the string representation and the useful lua representation are different.

<svg>
<line x1="10", y1="20", width = "10cm", height= "12cm" />
</svg>

This will be turning into:

{
  x1 = {value = 10},
  y1 = {value = 20},
  width = {value = 10, units = 'cm'},
  height = {value = 12, units = 'cm'}
}

Now, in my Lua code, I can access these values like so:

local doc = parser:parseFile("grapes.svg");
doc:parseAttributes();
print(doc.svg[1].x1.value);

When I subsequently want to write this value out as valid svg, it will turn back into the string representation with no loss of fidelity.

Hidden in this example is a database query. How do I know that doc.svg[1] is going to give me the ” element that I’m looking for? In this particular case, it’s only because the svg is so simple that I know for a fact that the ” element is going to show up as the first child in the svg document. But, most of the time, that is not going to be the case.

In any svg that’s of substance, there is the usage of various ‘id’ fields, and that’s typically what is used to find an element. So, how to do that in remotesvg? If we look back at the example svg, we see this ‘id’ attribute on the first gradient: id=”linearGradient4892″.

How could I possibly find that gradient element based on the id field? Before that though, let’s look at how to enumerate elements in the document in the first place.

local function printElement(elem)
    if type(elem) == "string" then
        -- don't print content values
        return 
    end
    
    print(string.format("==== %s ====", elem._kind))

    -- print the attributes
    for name, value in elem:attributes() do
        print(name,value)
    end
end

local function test_selectAll()
    -- iterate through all the nodes in 
    -- document order, printing something interesting along
    -- the way

    for child in doc:selectAll() do
	   printElement(child)
    end
end

Here is a simple test case where you have a document already parsed, and you want to iterate through the elements, in document order, and just print them out. This is the first step in viewing the document as a database, rather than as an image. The working end of this example is the call to ‘doc:selectAll()’. This amounts to a call to an iterator that is on the BasicElem class, which looks like this:

--[[
	Traverse the elements in document order, returning
	the ones that match a given predicate.
	If no predicate is supplied, then return all the
	elements.
--]]
function BasicElem.selectElementMatches(self, pred)
	local function yieldMatches(parent, predicate)
		for idx, value in ipairs(parent) do
			if predicate then
				if predicate(value) then
					coroutine.yield(value)
				end
			else
				coroutine.yield(value)
			end

			if type(value) == "table" then
				yieldMatches(value, predicate)
			end
		end
	end

  	return coroutine.wrap(function() yieldMatches(self, pred) end)	
end

-- A convenient shorthand for selecting all the elements
-- in the document.  No predicate is specified.
function BasicElem.selectAll(self)
	return self:selectElementMatches()
end

As you can see, ‘selectAll()’ just turns around and calls ‘selectElementMatches()’, passing in no parameters. The selectElementMatches() function then does the actual work. In Lua, there are a few ways to create iterators. In this particular case, where we want to recursive traverse down a hierarchy of nodes (document order), it’s easiest to use this coroutine method. You could instead keep a stack of nodes, pushing as you go down the hierarchy, popping as you return back up, but this coroutine method is much more compact to code, if a bit harder to understand if you’re not used to coroutines. The end result is an iterator that will traverse down a document hierarchy, in document order.

Notice also that the ‘selectElementMatches’ function takes a predicate. A predicate is simply a function that takes a single parameter, and will return ‘true’ or ‘false’ depending on what it sees there. This will become useful.

So, how to retrieve an element with a particular ID? Well, when we look at our elements, we know that the ‘id’ field is one of the attributes, so essentially, what we want to do is traverse the document looking for elements that have an id attribute that matches what we’re looking for.

function BasicElem.getElementById(self, id)
    local function filterById(entry)
        print("filterById: ", entry.id, id)
        if entry.id == id then
            return true;
        end
    end

    for child in self:selectMatches(filterById) do
        return child;
    end
end

Here’s a convenient function to do just that. And to use it:

local elem = doc:getElementById("linearGradient10460")

That will retrieve the second linear gradient of the pair of gradients from our svg fragment. That’s great! And the syntax is looking very much like what I might write in javascript against the DOM. But, it’s just a database!

Given the selectMatches(), you’re not just limited to querying against attribute values. You can get at anything, and form as complex queries as you like. For example, you could find all the elements that are deep green, and turn them purple with a simple query loop.

Here’s an example of finding all the elements of a particular kind:

local function test_selectElementMatches()
    print("<==== selectElementMatches: entry._kind == 'g' ====>")
	for child in doc:selectElementMatches(function(entry) if entry._kind == "g" then return true end end) do
		print(child._kind)
	end
end

Or finding all the elements that have a ‘sodipodi’ attribute of some kind:

local function test_selectAttribute()
    -- select the elements that have an attribute
    -- with the name 'sodipodi' in them
    local function hasSodipodiAttribute(entry)
        if type(entry) ~= "table" then
            return false;
        end

        for name, value in entry:attributes() do
            --print("hasSodipodi: ", entry._kind, name, value, type(name))
            if name:find("sodipodi") then
                return true;
            end
        end

        return false
    end

    for child in doc:selectElementMatches(hasSodipodiAttribute) do
        if type(child) == "table" then
            printElement(child)
        end
    end
end

Of course, just finding these elements is one thing. Once found, you can use this to filter out those elements you don’t want. for example, eliminating the ones that are inkscape specific.

Well, there you have it. First, you can construct your svg programmatically using Lua syntax. Alternatively, you can simply parse a svg file into a lua structure. Last, you can query your document, no matter how it was constructed, for fun and profit.

Of course, the real benefit of being able to parse, and find elements and the like, is it makes manipulating the svg that much easier. Find the node that represents the graph of values, for example, and change those values over time for some form of animation…


Note To Self – Enumerating bit flags

I’ve been trawling through the Linux V4L2 group of libraries of late as part of LLUI.  v4l2 is one of those sprawling libraries that does all things for all people in terms of video on Linux machines.  It’s roughly equivalent to oh so many similar things from the past on the Windows side.  This is one of the libraries you might utilize if you were to get into streaming from your webcam programmatically.  Of course, you could just read from it directly with libusb, but then you lose out on all the nifty format conversions, and I miss this chance to write another pointless reminder for my later coding self.

So, what’s got me so bothered this time around?  Well, lets say I’m just parusing my system, turning everything into a database as I go along.  I’d like to get a hold of my webcam, and see what it’s capable of.  There’s a call for that of course.  Once you make the appropriate IOCtl call, you end up with a struct that looks like this:

 
[soucecode]
struct v4l2_capability {
uint8_t driver[16];
uint8_t card[32];
uint8_t bus_info[32];
uint32_t version;
uint32_t capabilities;
uint32_t device_caps;
uint32_t reserved[3];
};
[/sourcecode]

The driver, card, and bus_info fields are pretty straight forward as they are simple ‘null terminated’ strings, so you have print them out if you like. It’s that ‘capabilities’ field that gives me fits. This is one of those combined bit flags sort of things. The value can be a combination of any of the numerous ‘capability’ flags, which are these:

-- Values for 'capabilities' field
caps = {
	V4L2_CAP_VIDEO_CAPTURE		= 0x00000001 ; -- Is a video capture device */
	V4L2_CAP_VIDEO_OUTPUT		= 0x00000002; -- Is a video output device */
	V4L2_CAP_VIDEO_OVERLAY		= 0x00000004; -- Can do video overlay */
	V4L2_CAP_VBI_CAPTURE		= 0x00000010; -- Is a raw VBI capture device */
	V4L2_CAP_VBI_OUTPUT			= 0x00000020; -- Is a raw VBI output device */
	V4L2_CAP_SLICED_VBI_CAPTURE	= 0x00000040; -- Is a sliced VBI capture device */
	V4L2_CAP_SLICED_VBI_OUTPUT	= 0x00000080; -- Is a sliced VBI output device */
	V4L2_CAP_RDS_CAPTURE		= 0x00000100; -- RDS data capture */
	V4L2_CAP_VIDEO_OUTPUT_OVERLAY	= 0x00000200; -- Can do video output overlay */
	V4L2_CAP_HW_FREQ_SEEK		= 0x00000400; -- Can do hardware frequency seek  */
	V4L2_CAP_RDS_OUTPUT			= 0x00000800; -- Is an RDS encoder */

	V4L2_CAP_VIDEO_CAPTURE_MPLANE	= 0x00001000;
	V4L2_CAP_VIDEO_OUTPUT_MPLANE	= 0x00002000;
	V4L2_CAP_VIDEO_M2M_MPLANE		= 0x00004000;
	V4L2_CAP_VIDEO_M2M				= 0x00008000;

	V4L2_CAP_TUNER			= 0x00010000; -- has a tuner */
	V4L2_CAP_AUDIO			= 0x00020000; -- has audio support */
	V4L2_CAP_RADIO			= 0x00040000; -- is a radio device */
	V4L2_CAP_MODULATOR		= 0x00080000; -- has a modulator */

	V4L2_CAP_READWRITE              = 0x01000000; -- read/write systemcalls */
	V4L2_CAP_ASYNCIO                = 0x02000000; -- async I/O */
	V4L2_CAP_STREAMING              = 0x04000000; -- streaming I/O ioctls */
}

For the embedded webcam in my laptop, the reported value is: 0x04000001;

Of course, when you’re doing something programmatically, and you just want to check whether a particular flag is set or not, you can just do:

canStream = band(V4L2_CAP_STREAMING, 0x04000001) ~= 0

Very common, and probably some of the most common code you’ll see anywhere. But what else? For various reasons, I want to create the string values for those bit fields, and use those values as keys to tables, or just to print, or to send somewhere, or display, or what have you.

I’ve seen enough ‘C’ code deal with this there is a common patter. First create the #define, or enum statement which encapsulates the values for all the flags. Then, to get the values as strings, create a completely separate string table, which does the mapping of the nice tight enum values and the string values. Then write a little lookup function which can go from the value to the string.

Well, here’s one of those things I love about Lua. In this case, the program IS the database. No need for those parallel representations. Here’s some code:

local pow = math.pow
local bit = require("bit")
local lshift, rshift, band, bor = bit.lshift, bit.rshift, bit.band, bit.bor

local function getValueName(value, tbl)
	for k,v in pairs(tbl) do
		if v == value then
			return k;
		end
	end

	return nil;
end

local function enumbits(bitsValue, tbl, bitsSize)
	local function name_gen(params, state)

		if state >= params.bitsSize then return nil; end

		while(true) do
			local mask = pow(2,state)
			local maskedValue = band(mask, params.bitsValue)
--print(string.format("(%2d) MASK [%x] - %#x", state, mask, maskedValue))			
			if maskedValue ~= 0 then
				return state + 1, getValueName(maskedValue, params.tbl) or "UNKNOWN"
			end

			state = state + 1;
			if state >= params.bitsSize then return nil; end
		end

		return nil;
	end

	return name_gen, {bitsValue = bitsValue, tbl = tbl, bitsSize = bitsSize or 32}, 0
end

return enumbits

The function “getValueName()” at the top there simply does a reverse lookup in the table. That is, given a value, return the string that represents that value (is the string key for that value).

Next, the “enumbits()” function is an enumerator. It will iterate over the bit flags, returning a string name for all the ones that are set to ‘1’, and nothing for any of the other bits. Here’s an example:

local bit = require("bit")
local lshift, rshift, band, bor = bit.lshift, bit.rshift, bit.band, bit.bor

local enumbits = require("enumbits")

local testtbl = {
	LOWEST 	= 0x0001;
	MEDIUM 	= 0x0002;
	HIGHEST = 0x0004;
	MIGHTY 	= 0x0008;
	SLUGGO 	= 0x0010;
	MUGGO 	= 0x0020;
	BUGGO 	= 0x0040;
	PUGGO 	= 0x0080;
}


local caps = {
	V4L2_CAP_VIDEO_CAPTURE		= 0x00000001 ; -- Is a video capture device */
	V4L2_CAP_VIDEO_OUTPUT		= 0x00000002; -- Is a video output device */
	V4L2_CAP_VIDEO_OVERLAY		= 0x00000004; -- Can do video overlay */
	V4L2_CAP_VBI_CAPTURE		= 0x00000010; -- Is a raw VBI capture device */
	V4L2_CAP_VBI_OUTPUT			= 0x00000020; -- Is a raw VBI output device */
	V4L2_CAP_SLICED_VBI_CAPTURE	= 0x00000040; -- Is a sliced VBI capture device */
	V4L2_CAP_SLICED_VBI_OUTPUT	= 0x00000080; -- Is a sliced VBI output device */
	V4L2_CAP_RDS_CAPTURE		= 0x00000100; -- RDS data capture */
	V4L2_CAP_VIDEO_OUTPUT_OVERLAY	= 0x00000200; -- Can do video output overlay */
	V4L2_CAP_HW_FREQ_SEEK		= 0x00000400; -- Can do hardware frequency seek  */
	V4L2_CAP_RDS_OUTPUT			= 0x00000800; -- Is an RDS encoder */

	V4L2_CAP_VIDEO_CAPTURE_MPLANE	= 0x00001000;
	V4L2_CAP_VIDEO_OUTPUT_MPLANE	= 0x00002000;
	V4L2_CAP_VIDEO_M2M_MPLANE		= 0x00004000;
	V4L2_CAP_VIDEO_M2M				= 0x00008000;

	V4L2_CAP_TUNER			= 0x00010000; -- has a tuner */
	V4L2_CAP_AUDIO			= 0x00020000; -- has audio support */
	V4L2_CAP_RADIO			= 0x00040000; -- is a radio device */
	V4L2_CAP_MODULATOR		= 0x00080000; -- has a modulator */

	V4L2_CAP_READWRITE              = 0x01000000; -- read/write systemcalls */
	V4L2_CAP_ASYNCIO                = 0x02000000; -- async I/O */
	V4L2_CAP_STREAMING              = 0x04000000; -- streaming I/O ioctls */
}

local function printBits(bitsValue, tbl)
	tbl = tbl or testtbl
	for _, name in enumbits(bitsValue, tbl) do
		io.write(string.format("%s, ",name))
	end
	print()
end

-- single bits
printBits(lshift(1,0))
printBits(lshift(1,1))
printBits(lshift(1,31))

-- combined bits
printBits(0x0045)
printBits(0x04000001, caps)

With that last test case, what you’ll get is the output:

V4L2_CAP_VIDEO_CAPTURE, V4L2_CAP_STREAMING

Well that’s handy, particularly when you’re doing some debugging. Just a simple 20 line iterator, and you’re in business, printing flag fields like a boss! That is, if you’re in the lua environment, or any dynamic programming environment that supports iteration of a dictionary.

So, this note to future self is about pointing out the fact that even bitflags are nothing than a very compact form of database. Unpacking them into human readable, programmable form, requires just the right routine, and away you go, you never have to bother with dealing with this little item again. Great for debugging, great for sticking keys in tables, great for displaying on controls!


Optimize At the Right Time and Level

It takes a long time to do the simplest things correctly. You want to receive some bytes on a socket?
int recv(SOCKET s,char *buf,int len,int flags);

No problem right? How about that socket? How did that get setup? Is it right for low latency, big buffers, small buffers? what? And what about that buffer? and lord knows which are the best flags to use? First question I guess is, ‘for what purpose’?

For the longest time, I have known that my IP sockets were not the most performant. Through rounds of trying to get low level networking, all the way up through http handling working correctly, I deliberately made some things extremely brain dead simple.

One of those things was my readline() function. this is the workhorse of http processing. Previously, I would literally read a single character at a time, looking for those line terminators. Which meant making that recv() call a bazillion times during normal http processing.

Well, it has worked great, reduces the surface area to help flesh out various bugs, and allowed me to finally implement the IO Completion Port based networking stack that I now have. It works well enough. But, it’s time to optimize. I am now about to add various types of streams such as compressed, encrypted, and the like. For that I need to actually buffer up chunks at a time, not read single characters at a time.

So, I finally optimized this code.

function IOCPNetStream:refillReadingBuffer()
	print("IOCPNetStream:RefillReadingBuffer(): ",self.ReadingBuffer:BytesReadyToBeRead());

	-- if the buffer already has data in it
	-- then just return the number of bytes that
	-- are currently ready to be read.
	if self.ReadingBuffer:BytesReadyToBeRead() > 0 then
		return self.ReadingBuffer:BytesReadyToBeRead();
	end

	-- If there are currently no bytes ready, then we need
	-- to refill the buffer from the source
	local err
	local bytesread

	bytesread, err = self.Socket:receive(self.ReadingBuffer.Buffer, self.ReadingBuffer.Length)

	print("-- LOADED BYTES: ", bytesread, err);

	if not bytesread then
		return false, err;
	end

	if bytesread == 0 then
		return false, "eof"
	end

	self.ReadingBuffer:Reset()
	self.ReadingBuffer.BytesWritten = bytesread

	return bytesread
end

Basically, whenever some bytes are needed within the stream, the refillReadingBuffer() function is called. If there are still bytes sitting in the buffer, then those bytes are used. If it’s empty, then new bytes are read in. The buffer can be whatever size you want, so if your app is better off reading 1500 bytes at a time, then make it so. If you find performance is best at 4096 bytes per read, then set it up that way.

Now that pre-fetching is occuring, all operations related to reading bytes from the source MUST go through this buffer, or bytes will be missed. So, the readLine() function looks like this now:

function IOCPNetStream:readByte()
	-- First see if we can get a byte out of the
	-- Reading buffer
	local abyte,err = self.ReadingBuffer:ReadByte()

	if abyte then
		return abyte
	end

	-- If we did not get a byte out of the reading buffer
	-- try refilling the buffer, and try again
	local bytesread, err = self:refillReadingBuffer()

	if bytesread then
		abyte, err = self.ReadingBuffer:ReadByte()
		return abyte, err
	end

	-- If there was an error
	-- then return that error immediately
	print("-- IOCPNetStream:readByte, ERROR: ", err)
	return false, err
end

function IOCPNetStream:readOneLine(buffer, size)
	local nchars = 0;
	local ptr = ffi.cast("uint8_t *", buff);
	local bytesread
	local byteread
	local err

	while nchars < size do
		byteread, err = self:readByte();

		if not byteread then
			-- err is either "eof" or some other socket error
			break
		else
			if byteread == 0 then
				break
			elseif byteread == LF then
				--io.write("LF]\n")
				break
			elseif byteread == CR then
				-- swallow it and continue on
				--io.write("[CR")
			else
				-- Just a regular character, so save it
				buffer[nchars] = byteread
				nchars = nchars+1
			end
		end
	end

	if err and err ~= "eof" then
		return nil, err
	end

	return nchars
end

The readLine() is still reading a byte at a time, but this time, instead of going to the underlying socket directly, it’s calling the stream’s version of readByte(), which in turn is reading a single byte from the pre-fetched buffer. That’s a lot faster than making the recv() system call, and when the buffer runs out of bytes, it will be automatically refilled, until there is a socket error.

Well, that’s just great. The performance boost is one thing, but there is another benefit. Now that I am reading in controllable chunks at a time, I could up the size to say 16K for the first chunk read for an http request. That would likely get all the data for whatever headers there are included in the buffer up front. From there I could do simple pointer offset/size objects, instead of creating actual strings for every line (creates copies). That would be a great optimization.

I can wait for further optimizations. Getting the pre-fetch working, and continuing to work with IOCP is a more important optimization at this time.

And so it goes. Don’t optimize too early until you really understand the performance characteristics of your system. Then, only belatedly, do what minimal amount you can to achieve the specific goal you are trying to achieve before moving on.


false or nil, which is the better indicator?

I find that it’s the minutia at the lowest level of my frameworks that makes the difference between an elegant well executing system and a bug farm that causes constant pain and suffering. One of the design decisions I struggle with is “should I return nil or ‘false'”.

local func1 = function(condition)
  if condition then
    return true
  end

  return nil
end

local func2 = function(condition)
  if condition then
    return true
  end
  return false
end

if func1(true) then print("Hello, World") end;
if not func1(false) then print("no hope") end;
if not func2(false) then print ("false profit") end;

In these contrived cases, the consequence of using ‘nil’ or ‘false’ does not make a difference. When I perform the ‘if not’ test at the end, both ‘nil’ and ‘false’ will evaluate to ‘false’ and the statement is printed.

I would say that in 99% of the cases in my code, this is perfectly fine, and I can just use either nil or ‘false’ without worrying about the difference.

When you’re capturing return values, and storing the results in a table, then things get much more interesting.

local func1 = function()
  return nil, nil, false
end

local func2 = function()
  return false, "message2", nil
end

local func3 = function()
  return nil, "message3", false, true
end

local func4 = function()
  return nil, "message3", nil, false, true
end

res1 = {func1()};
res2 = {func2()};
res3 = {func3()};
res4 = {func4()};

The key ‘funky’ thing here has to do with how Lua stores ‘nil’ values, and the dual nature of the table structure as both an array and a dictionary. In this case, I a table is constructed by simply wrapping the return value of each of the functions. You might do something like this if you were writing a scheduler which executes a function and captures the returned values in a table, or some similar situation.

print("Size Res1: ", #res1);
>Size Res1: 0

print("Size Res2: ", #res2);
>Size Res2: 2

print("Size Res3: ", #res3);
>Size Res3: 4

print("Size Res4: ", #res4);
>Size Res4: 0

What?

Yah, it’s kind of funky. The challenge is with using the ‘#’ – __len operator is the heuristic to determine the size of a table is kind of funky. As you can see, if there are any gaps in values, as in cases 1, and 4, the thing will be considered to be a dictionary, instead of an array, and thus the length will be reported as zero. On the other hand, if the ‘nil’ is at the beginning of the sequence, the table will still be considered to be an array. You see this with res3. Of course, the fact that it also counts the ‘nil’ value at the beginning as part of the size, is not a good thing.

In the case of res2, you get a size of 2 as there’s no gap that includes a nil, and the trailing ‘nil’ value is not counted.

This seems very wacky behavior. Certainly not reasonably deterministic from a consumer’s perspective. But, there it is.

What this indicates to me is a couple of things. First of all, if you’re going to do something like stuff the return values of a function into a table in this way, you need to be very careful. If you’re then going to iterate those results using the following mechanism:

for i=1,#res do
  somefunc(res[i]);
end

Be extremely careful! Depending on what’s being returned, you may not get the results you’re expecting.

The other thing this little exercise indicates to me is that I want to be very deliberate about how I choose my return values. Instead of being fairly loose, flipping a coin each time to decide whether I will return ‘nil’ or ‘false’, 99% of the time, I’m now going to return ‘false’ instead of nil, unless I have a really good reason for returning ‘nil’.

This will become an issue when I’ve got class constructors, and I immediately start trying to exercise code, but on the other hand, I should be checking the return value of a class constructor anyway, and that check will work properly if I use ‘false’ or ‘nil’, so I’ll favor ‘false’.

And there you have it. The minutia that kills otherwise awesome code. If you pay attention to it early on, you avoid tremendous headaches later.


Lua Coroutines – Getting Started

Sometimes I run across a concept that’s really exciting, and I want to learn/know/master it, but my brain just throws up block after block, preventing me from truly learning it. Lua’s coroutines are kind of like that for me. I mean, think of the prospect of using ‘light weight threads’ in your programs. Lua’s coroutines aren’t ‘threads’ at all though, at least not in the usual ‘preemptive multitasking’ sense that one might think.

Lua coroutines are like programming in Windows 1.0. Basically, cooperative multi-tasking. If any one ‘thread’ doesn’t cooperate, the whole system comes to a standstill until that one task decides to take a break. So, some amount of work is required to do things correctly.

How to get started though?

Well, first of all, I start with a simple function:

local routines = {}

routines.doOnce = function(phrase)
  print(phrase)
end

routines.doOnce("Hello, World")
>Hello, World

A simple routine, that prints whatever I tell it to print. Fantastic, woot! Raise the roof up!!

Why did I go through the trouble of creating a table, and having a function, just to print? All will become clear in a bit.

Now, how about another routine? Perhaps one that does something in a loop:

routines.doLoop = function(iterations)
  for i=1,iterations do
    print("Iteration: ", i)
  end
end

routines.doLoop(3)
>Iteration: 1
>Iteration: 2
>Iteration: 3

Fantastic! Now I have one routine that prints a single thing, and exits, and another that prints a few numbers and exits.

Now just one last one:

routines.doTextChar = function(phrase)
  local nchars = #phrase

  for i=1,nchars do
    print(phrase:sub(i,i))
  end
end
>routines.doTextChar("brown")
>b
>r
>o
>w
>n

Now, what I really want to do is to be able to run all three of these routines “at the same time”, without a lot of fuss. That is, I want to allow the doOnce() routine to run once, then I want the doTextChar(), and doLoop() routines to interleave their work, first one going, then the next.

This is where lua’s coroutines come into the picture.

First of all, I’ll take the doOnce as an example. In order to run it as a corutine, there isn’t anything that needs to change about the routine. But, you do have to start using some coroutine machinery. Instead of calling the routine directly as usual, you call it using coroutine.resume(), and before that, you have to create something to resume, what Lua knows as a ‘thread’, using coroutine.create(). Like this:

local doOnceT = coroutine.create(doOnce)
coroutine.resume(doOnceT, "Hello, World");

This will have the same effect as just running the routine directly, but will run it as a coroutine. Not particularly useful in this case, but it sets us up for some further success.

The looping routines are a little bit more interesting. You could do the following:

local doLoopT = coroutine.create(doLoop)
coroutine.resume(doLoopT, 3)

That will have the same output as before. But, it won’t be very cooperative. The way in which a routine signals that it’s willing to give up a little slice of it’s CPU time is by calling the ‘coroutine.yield()’ function. So, to alter the original routine slightly:

routines.doLoop = function(iterations)
  for i=1,iterations do
    print("Iteration: ", i)
    coroutine.yield();
  end
end

Now, running will do exactly the same thing, but we’re now setup to cooperate with multiple tasks.

I will alter the ‘doTextChar()’ routine in a similar way:

routines.doTextChar = function(phrase)
  local nchars = #phrase

  for i=1,nchars do
    print(phrase:sub(i,i))
    coroutine.yield();
  end
end

OK, so now I have three routines, which are ready for doing some coroutine cooperation. But how to do that? The essential ingredient that is missing is a scheduler, or rather, something to take care of the mundane task of resuming each routine until it is done.

So, in comes the SimpleDispatcher:

local Collections = require "Collections"

local SimpleDispatcher_t = {}
local SimpleDispatcher_mt = {
  __index = SimpleDispatcher_t
}

local SimpleDispatcher = function()
  local obj = {
    tasklist = Collections.Queue.new();
  }
  setmetatable(obj, SimpleDispatcher_mt)

  return obj
end


SimpleDispatcher_t.AddTask = function(self, atask)
  self.tasklist:Enqueue(atask)	
end

SimpleDispatcher_t.AddRoutine = function(self, aroutine, ...)
  local routine = coroutine.create(aroutine)
  local task = {routine = routine, params = {...}}
  self:AddTask(task)

  return task
end

SimpleDispatcher_t.Run =  function(self)
  while self.tasklist:Len() > 0 do
    local task = self.tasklist:Dequeue()
    if not task then 
      break
    end

    if coroutine.status(task.routine) ~= "dead" then
      local status, values = coroutine.resume(task.routine, unpack(task.params));

      if coroutine.status(task.routine) ~= "dead" then
        self:AddTask(task)
      else
        print("TASK FINISHED")
      end
    else
      print("DROPPING TASK")
    end
  end
end

return SimpleDispatcher

Before explaining much, here is how I would use it:

local runner = SimpleDispatcher();

runner:AddRoutine(routines.doOnce, "Hello, World");
runner:AddRoutine(routines.doLoop, 3)
runner:AddRoutine(routines.doTextChar, "The brown")

runner:Run();

Basically, create an instance of this simple dispatcher.
Then, add some routines to it for execution. AddRoutine() does not actually start the routines going, it just queues them up to be run using coroutine.resume().
Lastly, call ‘runner:Run’, which will start a loop going, resulting in each of the tasks being called one after the other. This will result in the following output:

>Hello, World
>1
>T
>2
>h
>3
>e
>
>b
>r
>o
>w
>n

Basically, each time a routine calls ‘coroutine.yield()’, it’s giving up for a bit, and allowing the dispatcher to call the next routine. When it comes back around to calling the routine that gave up a slice of CPU time, it will resume from wherever the ‘yield()’ was called. And that’s the really part of coroutines!

I find the easiest way to think about coroutines is to simply code something, as if it were going to be an independent ‘thread’. Then I throw in a few choice yield() calls here and there to ensure the thread cooperates with other threads that might be running.

There are quite exotic things you can do with coroutines, and all sorts of frameworks to make working with them better/easier, and perhaps more mysterious.

For my simplistic pea brain though, this is a good place to start. Just taking simple routines, and handing them to the dispatcher.

That dispatcher is quite a nice little piece of work. If you think about it, it’s the equivalent of the scheduler within any OS. Assuming your Lua code is running in a single threaded process, being able to do your own scheduling is quite a powerful thing. You can add all sorts of exotics like aging, and thread priority. You can add timers, and I/O event or what have you. The beauty is, the scheduler can become as complex as you care to make it, and your coroutine code does not have to do anything special to adapt to it, just like with thread code in the OS proper.

So, there you have it. Getting started with Lua coroutines.

Now, if only I could combine socket pooling and this Dispatcher in some meaningful way. I wonder…


Examining C++ 11

I was recently looking at doing some more C/C++ coding, and was wandering through the interwebs and came across this succinct explanation of what’s in C++ 11:

http://blog.smartbear.com/software-quality/bid/167271/The-Biggest-Changes-in-C-11-and-Why-You-Should-Care

From this very brief description of the key features, I came to the conclusion that there’s not much there for me really.  There are some good things like ‘auto’, which allows you to let the compiler infer the type of something rather than being explicit.  Not news to those who have been using dynamic languages for the past few years.

Then there’s Lambda expressions.  Ditto on the dynamic language comment.

A refresh of the various libraries, yah, ok, thanks.  Probably one of the most important additions is the nullptr type.  Phew!!  After so many years, there finally a formal definition of null, such that there’s no confusion between “NULL” and ‘0’.

Still existant is the confusion one gets from trying to figure out the exact semantics of ++foo vs foo++, depending on the context.

The saddest thing of all is that there is no garbage collection, or auto reference counting.  There is some support for smarter pointers, so it’s a little easier to not shoot yourself in the foot.

As an evolution of the C/C++ franchise goes, this seems as good an evolutionary step as any.  I personally would have voted for garbage collection over async support, but that’s just me.

What about the current version compared to a modern dynamic language such as LuaJIT?  Lua, with its terse language, does a lot of things, and has very few standard libraries to speak of.  All it really has going for it is the garbage collector, and a very small footprint.  And herein lies the interesting inflection point.  Who is the modern programmer?  Back when C was in its infancy, the “programmer” was typically an electrical engineer, and possibly an “IT” professional, but a fairly low level hacker.  Certainly there were no script kiddies, and everyone at the time probably had gone through flipping switches on a PDP 10, if not at least using IBM punch cards to enter their programs in a batch to get their reams of line printed output.

C has acted as the basis for many a language since.  Simple control loops, assignment statements, and the like.  But the programmer population has changed, the demographics have shifted.  With Visicalc, suddenly office workers were ‘coding’, along with visual basic, then html, and nowadays JavaScript.  Many modern day “programmers” doesn’t know anything about the PDP 10, how memory is laid out, what the Operating system is really doing down there, or anything else about the machine they’re programming for.  Having very low level access that C provides, just isn’t as relevant.

Where this stuff really does matter is when your down programming micro controllers, and other tiny devices.  In those cases, you’re just barely a step above assembly language, and this is where the original intention of C really shines.  By the same token though, C++ does not shine, because it’s too big and bloated.  All the libraries, the complexity, and niceties really don’t serve you well when you’re running in a constrained 64K environment.

So, I’m conflicted.  At one point in my career, I was a C++ expert.  It served me well.  Then I left it behind for things like C#.  I don’t really see the value in getting deeply immersed in the intricacies of the current wave of C++.  It lacks sufficient memory management, and I’ve found over the years, this is one of the most important aspects of a runtime environment.

It seems more prudent to stay immersed in basic C, because that’s what is universal from lowest level microcontrollers to mainframes, and stay connected to a dynamic language, such as javascript, or LuaJIT.

I know that many people will think “but real programmers program in C++”, and all I can say is, real programmers use whatever tools are necessary and appropriate for a particular job.  Real programmers write code that is easy to read and maintain, and that might mean writing in any number of languages over time.

I like the evolution of the C++ language.  I’m happy to not ride that particular train into the future, if I don’t have to.