Iterating over oddities – mstrziter

In some parts of the Windows API, you will run across the case where a ‘multi-string’ is the return value. A ‘multi-string’ might look like this:


Basically, a byte array with embedded null terminated strings. You find this in cases like when you’re enumerating devices, and the locationpaths (where the device is mounted) is returned. You find these in the system registry where they are the REG_MULTI_SZ type. So, if you’re wanting to deal with the system registry, you might run across this particular type. I’m not sure I’d ever have a need to create this particular type in modern day programming. I might choose a different delimeter than the ” byte, but it is fairly effective for ascii strings.

At any rate, I want an iterator over this thing so that I can easily navigate around it without much fuss. Thus was born mstrziter.

local function mstrziter(params)
	params = params or {}

	params.separator = params.separator or 0
	params.datalength = params.datalength or
	params.basetype = params.basetype or ffi.typeof("char")
	params.basetypeptr = ffi.typeof("const $ *", params.basetype)
	params.maxlen = params.maxlen or params.datalength-1;

	local function closure(param, idx)
		if not then
			return nil;

		local len = 0;
		while ffi.cast(param.basetypeptr,[idx + len] ~= param.separator and (len < param.maxlen) do
			len = len +1;

		if len == 0 then
			return nil;

		return (idx + len+1), ffi.string(ffi.cast(param.basetypeptr,, len*ffi.sizeof(param.basetype));

	return closure, params, 0;

With this little construct, I can do the following:

local src = "big\0boy\0baby\0bear\0bounces\0basketballs\0behind\0the\0barn\0\0"

print("---- each(print, mstrziter(src)")
each(print, mstrziter{data = src})

That is, passing a multi-string to mstrziter and it will feed out the strings one by one. Well, I think that’s pretty nifty, and cracks multi-strings wide open. When I first created the thing, I only wanted to deal with null delimeted strings, but there’s no reason this same routine can’t do more.

I can just as easily do the following:

local src2 = "big\rboy\rbaby\rbear\rbounces\rbasketballs\rbehind\rthe\rbarn\r\r"
print("---- take(3, mstrziter(src2) ----")
each(print, take(3, mstrziter{data = src2, separator = string.byte('\r')}))

In this case I’m using ‘\r’ as the delimeter. Of course Lua already has the gmatch() function on strings, so why would I bother with this at all? Well, because I’m not always handed a string to begin with. Most of the times I’m handed a pointer to some buffer. I don’t want to waste time turning that into a Lua string, simply to chop it up into more strings.

Another interesting thing you can do is pass in the base type of the array:

each(print, mstrziter{data = src, basetype=ffi.typeof(“const wchar_t *”)})

In this way, it can deal with standard ‘char *’, or ‘wchar_t *’. But why stop there? You can pass in an array of any type, as long as there is an ‘__eq’ implemented for that type. So, fixed sized records in a database? If you do that, then your queries using the functional programming stuff become fairly easy and interesting.

Well, there you have it. A fairly simple iterator. It serves the purpose I set out for it, and gave me a whole lot more in return. The one wish I have for it is one for Lua in general. When I have a buffer which already contains the content, I would like to create ‘tokens’ instead of full on strings. Meaning, I’d like to have a token which consists of the base pointer, offset, length in the original buffer. But, that becomes a hairy mess, not immutable, not properly garbage collected… So, for now, I’m happy with just creating new copies of the strings.

Device Iteration with Functional Programming

One of the great pleasures I have in life is learning something new. There’s nothing greater than those ‘light bulb goes on’ moments as you realize something and gain a much deeper understanding than you had before.

Well, a little while ago, there was an announcement of this thing called Lua Fun.  Lua Fun is a large set of functions which make functional programming in Lua really easy.  It has the usual suspects such as map, reduce, filter, each, etc.  If you read the documentation, you get a really good understanding of how iterators work in Lua, and more importantly, how LuaJIT is able to fold and manipulate things in hot loops such that the produced code is much tighter than anything I could possibly write in C, or any other language I so happen to use.

So, now I’m a fan of Lua Fun, and I would encourage anyone who’s both into Lua, and functional programming to take a look.

How to use it?  I’ve been enumerating various types of things in the Windows system of late.  Using the MMDevice subsystem, I was able to get an enumeration of the audio devices (that took a lot of COM work).  What about displays, and disk drives, and USB devices, and…  Yes, each one of those things has an attendant API which will facilitate monitoring said category.  But, is there one to rule them all?  Well yes, as it turns out, in most cases what these various APIs are doing is getting information out of the System Registry, and just presenting it reasonably.

There is a way to enumerate all the devices in the system.  You know, like when you bring up the Device Manager in Windows, and you see a tree of devices, and their various details.  The stage is set, how do you do that?  I created a simple object that does the grunt work of enumerating the devices in the system.

The DeviceRecordSet is essentially a query. Creating an instance of the object just gives you a handle onto making query requests. Here is the code:

local ffi = require("ffi")
local bit = require("bit")
local bor = bit.bor;
local band =;

local errorhandling = require("core_errorhandling_l1_1_1");
local SetupApi = require("SetupApi")
local WinNT = require("WinNT")

local DeviceRecordSet = {}
setmetatable(DeviceRecordSet, {
	__call = function(self, ...)
		return self:create(...)

local DeviceRecordSet_mt = {
	__index = DeviceRecordSet,

function DeviceRecordSet.init(self, rawhandle)
	print("init: ", rawhandle)

	local obj = {
		Handle = rawhandle,
	setmetatable(obj, DeviceRecordSet_mt)

	return obj;

function DeviceRecordSet.create(self, Flags)
	Flags = Flags or bor(ffi.C.DIGCF_PRESENT, ffi.C.DIGCF_ALLCLASSES)

	local rawhandle = SetupApi.SetupDiGetClassDevs(

	if rawhandle == nil then
		return nil, errorhandling.GetLastError();

	return self:init(rawhandle)

function DeviceRecordSet.getNativeHandle(self)
	return self.Handle;

function DeviceRecordSet.getRegistryValue(self, key, idx)
	idx = idx or 0;

	did.cbSize = ffi.sizeof("SP_DEVINFO_DATA");

--print("HANDLE: ", self.Handle)
	local res = SetupApi.SetupDiEnumDeviceInfo(self.Handle,idx,did)

	if res == 0 then
		local err = errorhandling.GetLastError()
		--print("after SetupDiEnumDeviceInfo, ERROR: ", err)
		return nil, err;

	local regDataType ="DWORD[1]")
	local pbuffersize ="DWORD[1]",260);
	local buffer ="char[260]")

	local res = SetupApi.SetupDiGetDeviceRegistryProperty(

	if res == 0 then
		local err = errorhandling.GetLastError();
		--print("after GetDeviceRegistryProperty, ERROR: ", err)
		return nil, err;

	--print("TYPE: ", regDataType[0])
	if (regDataType[0] == 1) or (regDataType[0] == 7) then
		return ffi.string(buffer, pbuffersize[0]-1)
	elseif regDataType[0] == ffi.C.REG_DWORD_LITTLE_ENDIAN then
		return ffi.cast("DWORD *", buffer)[0]

	return nil;

function DeviceRecordSet.devices(self, fields)
	fields = fields or {
		{ffi.C.SPDRP_DEVICEDESC, "description"},
		{ffi.C.SPDRP_MFG, "manufacturer"},
		{ffi.C.SPDRP_DEVTYPE, "devicetype"},
		{ffi.C.SPDRP_CLASS, "class"},
		{ffi.C.SPDRP_ENUMERATOR_NAME, "enumerator"},
		{ffi.C.SPDRP_FRIENDLYNAME, "friendlyname"},
		{ffi.C.SPDRP_LOCATION_INFORMATION , "locationinfo"},
		{ffi.C.SPDRP_LOCATION_PATHS, "locationpaths"},
		{ffi.C.SPDRP_SERVICE, "service"},

	local function closure(fields, idx)
		local res = {}

		local count = 0;
		for _it, field in ipairs(fields) do
			local value, err = self:getRegistryValue(field[1], idx)
			if value then
				count = count + 1;
				res[field[2]] = value;

		if count == 0 then
			return nil;
		return idx+1, res;

	return closure, fields, 0

return DeviceRecordSet

The ‘getRegistryValue()’ function is the real workhorse of this object. That’s what gets your values out of the system registry. The other function of importance is ‘devices()’. This is an iterator.

There are a couple of things of note about this iterator. First of all, it does not require ‘up values’ to be held onto. All that means is that everything the iterator needs to operate is carried in the return values from the function. The ‘state’ if you will, is handed in fresh every time the ‘closure()’ is called. This is the key to creating an iterator that will work well with Lua Fun.

By default, this iterator will return quite a few (but not all) fields related to each object, and it will return all the objects. This is ok, because there are typically less than 150 objects in any given system.

Now, I want to do various queries against this set without much fuss. This is where Lua Fun, and functional programming in general, really shines.

First, a little setup:

local ffi = require("ffi")
local DeviceRecordSet = require("DeviceRecordSet")
local serpent = require("serpent")
local Functor = require("Functor")

local fun = require("fun")()
local drs = DeviceRecordSet();

local function printIt(record)
	each(print, record)

This creates an instance of the DeviceRecordSet object, which will be used in the queries. Already the printIt() function is utilizing Lua Fun. The ‘each()’ function will take whatever it’s handed, and perform the function specified. In this case, the ‘record’ will be a table. So, each will iterate over the table entries and print each one of them out. This is the equivalent of doing:

for k,v in pairs(record)
print(k, v)

I think that simply typing ‘each’ is a lot simpler and pretty easy to understand.

How about a query then?

-- show everything for every device
each(printIt, drs:devices())

In this case, the ‘each’ is applied to the results of the ‘devices()’ iterator. For each record coming from the devices iterator, the printIt function will be called, which will in turn print out all the values in the record. That’s pretty nice.

What if I don’t want to see all the fields in the record, I just want to see the objectname, and description fields. Well, this is a ‘map’ operation, or a projection in database parlance, so:

-- do a projection on the fields
local function projection(x)
  return {objectname = x.objectname, description = x.description}
each(printIt, map(projection, drs:devices()))

Working from the inside out, for each record coming from the devices() iterator, call the ‘projection’ function. The return value from the projection function becomes the new record for this iteration. For each of those records, call the printIt function.

Using ‘map’ is great as you can reshape data in any way you like without much fuss.

Lastly, I want to see only the records that are related to “STORAGE”, so…

-- show only certain records
local function enumeratorFilter(x)
	return x.enumerator == "STORAGE"

each(printIt, filter(enumeratorFilter, drs:devices()))

Here, the ‘filter’ iterator is used. So, again, for each of the records coming from the ‘devices()’ enumerator, call the ‘enumeratorFilter’ function. If this function returns ‘true’ for the record, then it is passed along as the next record for the ‘each’. If ‘false’, then it is skipped, and the next record is tried.

This is pretty powerful, and yet simple stuff. The fact that iterators create new iterators, in tight loops, makes for some very dense and efficient code. If you’re interested in why this is so special in LuaJIT, and not many other languages, read up on the Lua Fun documentation.

I’ve killed two birds with one stone. I have finally gotten to the root of all device iterators. I have also learned how to best write iterators that can be used in a functional programming way. Judicious usage of this mechanism will surely make a lot of my code more compact and readable, as well as highly performant.


All the pretty little asyncs…

I have gone on about various forms of async for quite some time now. So could there possibly be more? Well, yes of course!

Here’s the scenario I want to enable. I want to keep track of my file system activity, sending the various operations to a distant storage facility. I want to do this while a UI displays what’s going on, and I want to be able to configure things while its happening, like which events I really care to shadow, and the like.

I don’t want to use multiple OS level threads if I can at all avoid them as they will complicate my programming tremendously. So, what to do.

Well, first I’ll start with the file tracking business. I have talked about change journals in the past. This is a basic mechanism that Windows has to track changes to the file system. Every single open, close, delete, write, etc, has an entry in this journal. If you’re writing a backup program, you’ll be using change journals.

The essence of the change journal functionality is usage of the DeviceIoControl() function. Most of us are very familiar with the likes of CreateFile(), ReadFile(), WriteFile(), CloseHandle(), when it comes to dealing with files. But, for everything else, there is this DeviceIOControl() function.

What is a device? Well, you’d be surprised to learn that most things in the Windows OS are represented by ‘devices’ just like they are in UNIX systems. For example, ‘C:’, is a device. But, also “DISPLAY1” is also a device, as are “LCD” and “PhysicalDisk0”. When it comes to controlling devices, the Win32 level API calls will ultimately make DeviceIoControl() calls with various parameters. That’s great to know as it allows you to create whatever API you want, as long as you know the nuances of the device driver you’re trying to control.

But, I digress. The key point here is that I can open up a device, and I can make a DeviceIoControl() call, and true to form, I can use OVERLAPPED structures, and IO Completion Ports. That makes these calls “async”, or with TINN, cooperative.

To wrap it up in a tidy little bow, here is a Device class which does the grunt work for me:

local ffi = require("ffi")
local bit = require("bit")
local bor = bit.bor;

local core_file = require("core_file_l1_2_0");
local core_io = require("core_io_l1_1_1");
local Application = require("Application")
local IOOps = require("IOOps")
local FsHandles = require("FsHandles");
local errorhandling = require("core_errorhandling_l1_1_1");
local WinBase = require("WinBase");

local Device = {}
setmetatable(Device, {
	__call = function(self, ...)
		return self:open(...)
local Device_mt = {
	__index = Device,

function Device.init(self, rawhandle)
	local obj = {
		Handle = FsHandles.FsHandle(rawhandle)
	setmetatable(obj, Device_mt)
	Application:watchForIO(rawhandle, rawhandle)

	return obj;

function, devicename, dwDesiredAccess, dwShareMode)
	local lpFileName = string.format("\\\\.\\%s", devicename);
	dwDesiredAccess = dwDesiredAccess or bor(ffi.C.GENERIC_READ, ffi.C.GENERIC_WRITE);
	local lpSecurityAttributes = nil;
	local dwCreationDisposition = OPEN_EXISTING;
	local dwFlagsAndAttributes = FILE_FLAG_OVERLAPPED;
	local hTemplateFile = nil;

	local handle = core_file.CreateFileA(

	if handle == INVALID_HANDLE_VALUE then
		return nil, errorhandling.GetLastError();

	return self:init(handle)

function Device.getNativeHandle(self)
	return self.Handle.Handle;

function Device.createOverlapped(self, buff, bufflen)
	local obj ="FileOverlapped");
	obj.file = self:getNativeHandle();
	obj.OVL.Buffer = buff;
	obj.OVL.BufferLength = bufflen;

	return obj;

function Device.control(self, dwIoControlCode, lpInBuffer, nInBufferSize, lpOutBuffer, nOutBufferSize)
	local lpBytesReturned = nil;
	local lpOverlapped = self:createOverlapped(ffi.cast("void *", lpInBuffer), nInBufferSize);

	local status = core_io.DeviceIoControl(self:getNativeHandle(), 
          ffi.cast("void *", lpInBuffer),
          ffi.cast("OVERLAPPED *",lpOverlapped));

	local err = errorhandling.GetLastError();

	-- Error conditions
	-- status == 1, err == WAIT_TIMEOUT (258)
	-- status == 0, err == ERROR_IO_PENDING (997)
	-- status == 0, err == something else

	if status == 0 then
		if err ~= ERROR_IO_PENDING then
			return false, err

    local key, bytes, ovl = Application:waitForIO(self, lpOverlapped);

    return bytes;

return Device

I’ve shown this kind of construct before with the NativeFile object. That object contains Read, and Write functions as well, but lacks the control() function. Of course the two could be combined for maximum benefit.

How to use this thing?

dev = Device("c:")

OK, that’s out of the way. Now, what about this change journal thing? Very simple now that the device is handled.
A change journal can look like this:

-- USNJournal.lua
-- References

local ffi = require("ffi");
local bit = require("bit");
local bor = bit.bor;
local band =;

local core_io = require("core_io_l1_1_1");
local core_file = require("core_file_l1_2_0");
local WinIoCtl = require("WinIoCtl");
local WinBase = require("WinBase");
local errorhandling = require("core_errorhandling_l1_1_1");
local FsHandles = require("FsHandles");
local Device = require("Device")


	An abstraction for NTFS Change journal management
local ChangeJournal = {}
setmetatable(ChangeJournal, {
	__call = function(self, ...)
		return self:open(...);

local ChangeJournal_mt = {
	__index = ChangeJournal;

ChangeJournal.init = function(self, device)
	local obj = {
		Device = device;
	setmetatable(obj, ChangeJournal_mt);

	local jinfo, err = obj:getJournalInfo();

	print("ChangeJournal.init, jinfo: ", jinfo, err)

	if jinfo then
		obj.JournalID = jinfo.UsnJournalID;
		obj.LowestUsn = jinfo.LowestValidUsn;
		obj.FirstUsn = jinfo.FirstUsn;
		obj.MaxSize = jinfo.MaximumSize;
		obj.MaxUsn = jinfo.MaxUsn;
		obj.AllocationSize = jinfo.AllocationDelta;

	return obj;
end = function(self, driveLetter)
	local device, err = Device(driveLetter)

	if not device then
		print(", ERROR: ", err)
		return nil, err

	return self:init(device);

ChangeJournal.getNextUsn = function(self)
	local jinfo, err = self:getJournalInfo();

	if not jinfo then
		return false, err;

	return jinfo.NextUsn;

ChangeJournal.getJournalInfo = function(self)
	local dwIoControlCode = FSCTL_QUERY_USN_JOURNAL;
	local lpInBuffer = nil;
	local nInBufferSize = 0;
	local lpOutBuffer ="USN_JOURNAL_DATA");
	local nOutBufferSize = ffi.sizeof(lpOutBuffer);

	local success, err = self.Device:control(dwIoControlCode, 

	if not success then
		return false, errorhandling.GetLastError();

	return lpOutBuffer;

function ChangeJournal.waitForNextEntry(self, usn, ReasonMask) 
 	usn = usn or self:getNextUsn();
 	local ReasonMask = ReasonMask or 0xFFFFFFFF;
 	local ReturnOnlyOnClose = false;
 	local Timeout = 0;
 	local BytesToWaitFor = 1;

    local ReadData ="READ_USN_JOURNAL_DATA", {usn, ReasonMask, ReturnOnlyOnClose, Timeout, BytesToWaitFor, self.JournalID});

    local pusn ="USN");
    -- This function does not return until the USN
    -- record exits
	local BUF_LEN = ffi.C.USN_PAGE_SIZE;
	local Buffer ="uint8_t[?]", BUF_LEN);
    local dwBytes ="DWORD[1]");

	local success, err = self.Device:control(FSCTL_READ_USN_JOURNAL, 

	if not success then 
		return false, err

	local UsnRecord = ffi.cast("PUSN_RECORD", ffi.cast("PUCHAR",Buffer) + ffi.sizeof("USN")); 

    return UsnRecord;

return ChangeJournal;

This very much looks like the change journal I created a few months back. The primary difference is the device control stuff is abstracted out into the Device object, so it does not need to be repeated here.

When we want to track the changes to the device, we make repeated calls to ‘waitForNextEntry’.

local function test_waitForNextEntry(journal)
    local entry = journal:waitForNextEntry();

    while entry do
        entry = journal:waitForNextEntry();

This is your typical serially written code. There’s nothing that look special about it, no hint of async processing. Behind the covers though, way back in the Device:control() function, the actual sending of a command to the device happens using IO Completion Port, so if you’re running with TINN, this particular task will ‘waitForIO’, and other tasks can continue.

So, using it in context looks like this:

local function main()
    local journal, err = ChangeJournal("c:")

    spawn(test_waitForNextEntry, journal);
    periodic(function() print("tick") end, 1000)


In this case, I spawn the journal waiting/printing thing as its own task. Then I setup a periodic timer to simply print “tick” every second, to show there is some activity.

Since the journaling is cooperative (mostly waiting for io control to complete), the periodic timer, or UI processing, or what have you, is free to run, without any hindrance.

Combine this with the already cooperative UI stuff, and you can imagine how easy it could be to construct the scenario I set out to construct. Since all networking and file system operations in TINN are automatically async, it would be easy to log these values, or send them across a network to be stored for later analysis or what have you.

And there you have it. Async everywhere makes for some very nice scenarios. Being able to do async on any device, whether with standard read/write operations, or io control, makes for very flexible programming.

Next time around, I’m going to show how to do async DNS queries for fun and profit.

TINN Version 0.7 Available

Although TINN is under constant development, there’s nothing like declaring a new “release”. It’s been 3 months since the 0.6 release. So, now there is a 0.7 release. You can read about the TINN v0.7 Release and install it if you would like.

There were 84 commits since the previous release, so I can’t even remember all the changes that were involved. The major addition from my most recent work has to do with the new scheduler as described on this blog. That’s the extensible, plug-in driven scheduler. Pretty nifty for my work at least.

There are quite a few additions, such as a revamped stream object, io completion port supported file interface, a logfile thing, and quite a few more interfaces from the OS.

Other items I have been working on include support for various COM interfaces such as DXGI, DXVA, MMDevice and some others. There are all in the “experimental” folder if you look at the enlistment. They are not quite ready for prime time, so they’re not actually in the v0.7 release.

What can you do with TINN? In short, you can create all sorts of Windows based applications. Everything from scalable web services to interactive multi-tasking UI (including OpenGL based).

TINN is a command line tool (tinn.exe). As such, the easiest thing to do is bring up a command line shell and run various scripts through tinn.

c:\> tinn.exe hello.lua

The TINN repository contains numerous test cases that utilize the various modules of TINN.

That’s it for now.  Next release will be about cleanup and stabilization primarily.

Parallel Conversations

I have been tinkering with words of late. I’ve added ‘waitFor’, ‘when’ and ‘whenever’ to the TINN lexicon, and my programming is becoming easier, more organized, and easier to describe.

Recently I added another set of words: ‘waitSignal’, ‘signalOne’, and ‘signalAll’. If you were using another system, you might call these ‘events’, but really they’re just signaling.

Here’s how I might use them:

local Application = require("Application")(true)

local function waiter(num)
  num = num or 0

  local function closure()
    print(string.format("WAITED: %d",num))

  return closure;

local function main()

  for i=1,4 do
    onSignal(waiter(i), "waiting")

  -- signal only one of them

  -- sleep a bit giving other tasks
  -- a chance to run

  -- signal the rest of them	
  print("SLEEP AFTER signalAll")


First, it spawns 4 tasks which will all be waiting on the ‘waiting’ signal. With ‘signalOne’, only 1 of the 4 waiting tasks will be awakened and continue execution. With the ‘signalAll’, the rest of the waiting tasks are all scheduled to run, and thus continue from where they left of.

The core primitive for signaling is the ‘waitSignal()’ function. This will essentially suspend the currently running task until the specified signal has been given. The ‘onSignal()’ function is a convenience which will spawn a separate task which will do the waiting and execute the specified function at the appropriate time.

If you were using another language, these might be ‘onEvent()’ and ’emit()’. Of course it’s really easy to code up that way, just create a name alias to whatever you like. You could even do:

local function on(eventName, func)
  return onSignal(func, eventName)

So, these are some more words that are in the parallel programming arsenal. In totality, the set is:

Time related

  • sleep
  • delay
  • periodic


  • waitFor
  • when
  • whenever


  • onSignal
  • signalOne
  • signalAll
  • waitSignal

Task and scheduler

  • run
  • stop
  • spawn
  • yield

With these words in hand, doing my multi-task programming is becoming easier by the moment. I don’t worry about the lower level multi-tasking primitive such as mutexes, locks, and the like. I’m not really that concerned with timers, thread pools, or any of the other machinery that makes this all happen. I just write my simple code.

One thing has struck me though. This is all great for a single threaded environment. Basically collaborative multi-tasking. In this age of super fast CPUs, it turns out that going single threaded is just fine. In many cases it’s actually better than going preemptive multi-threaded because you don’t necessarily incur as much OS thread level context switching.

I’d like to go one step further though. A long time ago we didn’t have L1/L2 caches on the CPUs, then they came along, got better, and now are fairly large. As a programmer, my concern for them is fairly minimal. I don’t do much to ensure that the caches are used properly. I just write my code, and either the compiler, or the CPU itself deals with all the mechanics. Same goes with prefetching instructions and the like. I don’t write my code to make that job any easier, it just happens automagically.

Here I am contemplating the same thing about threads. As my CPU has increasingly more cores, I am wondering if the CPU cycles should be viewed the same as we viewed levels of memory in the past. Is the a C1/C2 level of core execution. That is, most things should stay on the same thread because switching to a different core is costly. Something very low level should decide on when that switching occurs.

My application/runtime can know a lot about what I’m going to do because the thing is all laid out in script. After some analysis, perhaps I can decide “this thing is going to sleep for more than 500 ms, therefore I’m going to actually put it over here on this other thread which is actually sleeping.”. In that way, I can reduce my resource usage because I won’t be polling the timer every time through the scheduling loop, I can be smarter.

If I had such smartness, then when I have 16 – 100 cores, I’ll be able to easily take advantage of those multiple cores without having to truly understand and navigate that madness on my own. Let the machine figure it out, it’s much better at that than I am.

In the meanwhile, I’ve got these multi-tasking primitives which are making my life a lot easier when it comes to programming fairly complex systems.

Lua Coroutine Roundup

My WordPress dashboard says this is my 250th post. I figure what better way to mark the occasion than to do a little bit of a roundup on one of my favorite topics.

One of the most awesome features of the Lua language is a built in co-routine mechanism. I’ve talked about co-routines quite a bit, doing a little series on the basics of coroutines, all the way up through a scheduler for multi-tasking.

Coroutines give the programmer the illusion of creating a parallel processing environment. Basically, write up your code “spawn” different coroutines, and then just forget about them… Well, almost. The one thing the Lua language does not give you is a way to manage those coroutines. Co-routines are a mechanism by which ‘threads’ can do cooperative multi-tasking. This is different from the OS level preemptive multitasking that you get with ‘threads’ on most OS libraries. So, I can create coroutines, resume, and yield. It’s the ‘yield’ that gives coroutines the multi-tasking flavor. One co-routine yields, and another co-routine must be told to ‘resume’. Something has to do the telling, and keep track of who’s yielded, and who needs to be resumed. In steps the scheduler.

Summaries from my own archives
Lua Coroutines – Getting Started

Multitask UI Like it’s 1995

Hurry Up and Wait – TINN Timing

Computicles – A tale of two schedulers

Parallel Computing is Child’s Play

Multitasking single threaded UI – Gaming meets networking

Alertable Predicates? What’s that?

Pinging Peers – Creating Network Meshes

From the Interwebs
GitHub Gist: Deco / coroutine_scheduler.lua

Beginner’s Guide to Coroutines – Roblox Wiki

LOOP: Thread Scheduler


And of course the search engines will show you thousands more:

lua coroutine scheduler

I think it would be really great if the Lua language itself had some form of rudimentary scheduler, but scheduling is such a domain specific thing, and the rudiments are so basic, that it’s possibly not worth providing such support. The Go language supports a basic scheduler, and has coroutine support, to great effect I think.

The simple scheduler is simple, just do a round robbin execution of one task after the next as each task yields. I wrote a simple dispatcher which started with exactly this. Then interesting things start to happen. You’d like to have your tasks go into a wait state if they’re doing an IO operation, and not be scheduled again until some IO has occurred. Or, you want to put a thread to sleep for some amount of time. Or, you want a thread to wait on a condition of one sort or another, or signals, or something else. Pretty soon, you basic scheduler is hundreds of lines of code, and things start to get rather complicated.

After going round and round in this problem space, I finally came up with a solution that I think has some legs. Rather than a complex scheduler, I promote a simple scheduler, and add functions to it through normal thread support. So, here is my final ‘scheduler’ for 2013:

local ffi = require("ffi");

local Collections = require("Collections");
local StopWatch = require("StopWatch");

	The Scheduler supports a collaborative processing
	environment.  As such, it manages multiple tasks which
	are represented by Lua coroutines.
local Scheduler = {}
setmetatable(Scheduler, {
	__call = function(self, ...)
		return self:create(...)
local Scheduler_mt = {
	__index = Scheduler,

function Scheduler.init(self, scheduler)
	local obj = {
		Clock = StopWatch();

		TasksReadyToRun = Collections.Queue();
	setmetatable(obj, Scheduler_mt)
	return obj;

function Scheduler.create(self, ...)
	return self:init(...)

		Instance Methods
function Scheduler.tasksArePending(self)
	return self.TasksReadyToRun:Len() > 0

function Scheduler.tasksPending(self)
	return self.TasksReadyToRun:Len();

function Scheduler.getClock(self)
	return self.Clock;

	Task Handling

function Scheduler.scheduleTask(self, afiber, ...)
	if not afiber then
		return false, "no fiber specified"

	afiber.state = "readytorun"

	return afiber;

function Scheduler.removeFiber(self, fiber)
	--print("DROPPING DEAD FIBER: ", fiber);
	return true;

function Scheduler.inMainFiber(self)
	return coroutine.running() == nil; 

function Scheduler.getCurrentFiber(self)
	return self.CurrentFiber;

function Scheduler.step(self)
	-- Now check the regular fibers
	local task = self.TasksReadyToRun:Dequeue()

	-- If no fiber in ready queue, then just return
	if task == nil then
		return true

	if task:getStatus() == "dead" then

		return true;

	-- If the task we pulled off the active list is 
	-- not dead, then perhaps it is suspended.  If that's true
	-- then it needs to drop out of the active list.
	-- We assume that some other part of the system is responsible for
	-- keeping track of the task, and rescheduling it when appropriate.
	if task.state == "suspended" then
		return true;

	-- If we have gotten this far, then the task truly is ready to 
	-- run, and it should be set as the currentFiber, and its coroutine
	-- is resumed.
	self.CurrentFiber = task;
	local results = {task:resume()};

	-- no task is currently executing
	self.CurrentFiber = nil;

	-- once we get results back from the resume, one
	-- of two things could have happened.
	-- 1) The routine exited normally
	-- 2) The routine yielded
	-- In both cases, we parse out the results of the resume 
	-- into a success indicator and the rest of the values returned 
	-- from the routine
	local success = results[1];

	--print("SUCCESS: ", success);
	if not success then
		print("RESUME ERROR")

	-- Again, check to see if the task is dead after
	-- the most recent resume.  If it's dead, then don't
	-- bother putting it back into the readytorun queue
	-- just remove the task from the list of tasks
	if task:getStatus() == "dead" then
		--print("Scheduler, DEAD coroutine, removing")

		return true;

	-- The only way the task will get back onto the readylist
	-- is if it's state is 'readytorun', otherwise, it will
	-- stay out of the readytorun list.
	if task.state == "readytorun" then
		self:scheduleTask(task, results);

	Primary Interfaces

function Scheduler.suspend(self, aTask)
	if not aTask then
		self.CurrentFiber.state = "suspended"
		return self:yield()

	aTask.state = "suspended";

	return true

function Scheduler.yield(self, ...)
	return coroutine.yield(...);

	Running the scheduler itself
function Scheduler.start(self)
	if self.ContinueRunning then
		return false, "scheduler is already running"
	self.ContinueRunning = true;

	while self.ContinueRunning do

function Scheduler.stop(self)
	self.ContinueRunning = false;

return Scheduler

At 195 LOC, this is back down to the basic size of the original simple dispatcher that I started with. There are only a few functions to this interface:

  • scheduleTask()
  • step()
  • suspend()
  • yield()
  • start()

These are the basics to do some scheduling of tasks. There is an assumption that there is an object that can be scheduled, such that ‘scheduleTask’ can attach some metadata, and the like. Other than that, there’s not much here but the basic round robin scheduler.

So, how do you make a more complex scheduler, say one that deals with time? Well, first of all, I’ve also added an Application object, which actually contains a scheduler, and supports various other things related to an application, including add ons to the scheduler. Here’s an excerpt of the Application object construction.

function Application.init(self, ...)
	local sched = Scheduler();

	waitForIO.MessageQuanta = 0;

	local obj = {
		Clock = Stopwatch();
		Scheduler = sched;
		TaskID = 0;
		wfc = waitForCondition(sched);
		wft = waitForTime(sched);
		wfio = waitForIO(sched);
	setmetatable(obj, Application_mt)

	-- Create a task for each add-on
	obj:spawn(obj.wfc.start, obj.wfc)
	obj:spawn(obj.wft.start, obj.wft)
	obj:spawn(obj.wfio.start, obj.wfio)

	return obj;

In the init(), the Application object creates instances for the add ons to the scheduler. These add-ons (waitForCondition, waitForTime, waitForIO) don’t have to adhere to much of an interface, but you do need some function which is callable so that it can be spawned. The Application construction ends with the spawning of the three add-ons as normal threads in the scheduler. Yep, that’s right, the add ons are just like any other thread, running in the scheduler. You might think; But these things need to run with a higher priority than regular threads! And yes, in some cases they do need that. But hay, that’s an exercise for the scheduler. If the core scheduler gains a ‘priority’ mechanism, then these threads can be run with a higher priority, and everything is good.

And what do these add ons look like? I’ve gone over the before, but here’s an example of the timing one:

local Functor = require("Functor")
local tabutils = require("tabutils");

local waitForTime = {}
setmetatable(waitForTime, {
	__call = function(self, ...)
		return self:create(...)

local waitForTime_mt = {
	__index = waitForTime;

function waitForTime.init(self, scheduler)
	local obj = {
		Scheduler = scheduler;
		TasksWaitingForTime = {};
	setmetatable(obj, waitForTime_mt)

	--scheduler:spawn(obj.step, obj)

	return obj;

function waitForTime.create(self, scheduler)
	if not scheduler then
		return nil, "no scheduler specified"

	return self:init(scheduler)

function waitForTime.tasksArePending(self)
	return #self.TasksWaitingForTime > 0

function waitForTime.tasksPending(self)
	return #self.TasksWaitingForTime;

local function compareTaskDueTime(task1, task2)
	if task1.DueTime < task2.DueTime then
		return true
	return false;

function waitForTime.yieldUntilTime(self, atime)
	--print("waitForTime.yieldUntilTime: ", atime, self.Scheduler.Clock:Milliseconds())
	--print("Current Fiber: ", self.CurrentFiber)
	local currentFiber = self.Scheduler:getCurrentFiber();
	if currentFiber == nil then
		return false, "not currently in a running task"

	currentFiber.DueTime = atime;
	tabutils.binsert(self.TasksWaitingForTime, currentFiber, compareTaskDueTime);

	return self.Scheduler:suspend()

function waitForTime.yield(self, millis)
	--print('waitForTime.yield, CLOCK: ', self.Scheduler.Clock)

	local nextTime = self.Scheduler.Clock:Milliseconds() + millis;

	return self:yieldUntilTime(nextTime);

function waitForTime.step(self)
	--print("waitForTime.step, CLOCK: ", self.Scheduler.Clock);

	local currentTime = self.Scheduler.Clock:Milliseconds();

	-- traverse through the fibers that are waiting
	-- on time
	local nAwaiting = #self.TasksWaitingForTime;
--print("Timer Events Waiting: ", nAwaiting)
	for i=1,nAwaiting do

		local fiber = self.TasksWaitingForTime[1];
		if fiber.DueTime <= currentTime then
			--print("ACTIVATE: ", fiber.DueTime, currentTime);
			-- put it back into circulation
			-- preferably at the front of the list
			fiber.DueTime = 0;
			--print("waitForTime.step: ", self)

			-- Remove the fiber from the list of fibers that are
			-- waiting on time
			table.remove(self.TasksWaitingForTime, 1);

function waitForTime.start(self)
	while true do

return waitForTime

The ‘start()’ function is what was scheduled with the scheduler, so that’s the thread that’s actually running. As you can see, it’s just a normal cooperative task.

The real action comes from the ‘yield()’ and ‘step()’ functions. The ‘yield()’s purpose in life is to suspend the current task (take it out of the active running list of the scheduler), and setup the appropriate stuff such that it can be activated later. The purpose of the “step()” function is to go through the list of tasks which are currently suspended, and determine which ones need to be scheduled again, because their time signal has been met.

And that’s how you add items such as “sleep()” to the system. You don’t change the core scheduler, but rather, you add a small module which knows how to deal with timed events, and that’s that.

The application object comes along and provides some convenience methods that can easily be turned into globals, so that you can easily type things like:

delay(function() print("Hello, World!") end, 5000)  -- print "Hello, World! after 5 seconds
periodic(function() print("Hi There!") end, 1000)  -- print "Hi, There!" every second

And there you have it. The core scheduler is a very simple thing. Just a ready list, and some algorithm that determines which one of the tasks in the list gets to run next. This can be as complex as necessary, or stay very simple.

Using an add on mechanism, the overall system can take on more robust scenarios, without increasing the complexity of the individual parts.

This simplified design satisfies all the scenario needs I currently have from high throughput web server to multi-tasking UI with network interaction. I will improve the scheduling algorithm, and probably make that a pluggable function for ease.

Lua coroutines are an awesome feature to have in a scripting language. With a little bit of work, a handy scheduler makes multi-task programming a mindless exercise, and that’s a good thing.

Happy New Year!

What The Functor? – Making them useful

Last time, I introduced the Functor in a rather convoluted way. I said that particular implementation may or may not be as performant as a straight up closure. Well, as it turns out, that’s going to be true most of the time. Why?

You see, when you use a table as a functor, you are forcing a call to ‘getmetatable’ to do a lookup to see if the ‘__call’ metamethod is implemented or not. If it’s not implemented, an error is thrown, if it is implemented, then your operation is performed. Well, that means an extra function call and dereference into our table for every single function call. When this is on a hot path, it could be quite a bummer in terms of performance.

So, why bother with this particular convoluted construct for functor implementation? The beauty of the table technique is that since your functor is actually a table, which gives you they syntactic sugar of being called like a function, it has all the attributes of a table. For example, let’s imagine you want to associate some bit of information with your function, such as when it was created, or version information:

local f1 = Functor(somefunction)
f1.Creation =
f1.Version = "1.2"

That might be useful in a case where you want to refresh your functors if they’re getting to be too old, or you want to compare versions of functions to ensure they are what you expect. Very handy, yet very costly.

Although this is useful, it does impose this overhead, so is there another way to implement the functor without incurring so much overhead?

We of course! If you’re willing to lose the table capabilities of your functor, but you want to retain the fact that you can associate some amount of state with your functor, then a closure is the way to go:

local Functor = function(func, target)
  if not target then return func end

  return function(...)
    return func(target,...)

Yah, that’s right. After I bad mouthed closures when I first introduced functors, here I am praising their value. And it’s true. Given the constraint of ditching associating information with the functor, beyond the passed in state, this closure implementation is a lot faster when it comes time to actually execute the code.

Not only is this code faster, but it’s a lot easier to understand, easier to maintain, etc.

And how would I use this in real life? Well, at the core of TINN is the scheduler, as represented by the IOProcessor class. The scheduler is at the heart of what makes TINN tick, and thus it must be the most compact, performant bit of code that it can possibly be. At the same time, I want it to be flexible, because ultimately I need to ability to change the very core scheduling algorithms and other core features, without having to totally reengineer the codebase to make changes.

So, I’ve been tinkering.

I’ve recently rewritten parts of the scheduler to utilize the functors, and iteratores. Here is the main loop as it stands today:

IOProcessor.start = function(self)

  self.ContinueRunning = true;

  when(Functor(self.noMoreTasks,self), Functor(self.stop, self))

  for astep in self:quantumSteps() do
end = function(self, func, ...)
  if func ~= nil then
    self:spawn(func, ...);


If you were writing a typical TINN based program where you wanted multi-tasking, you would be doing this:

local Task = require("IOProcessor")

local function main()
  -- after 5 seconds, stop the system
  delay(function(tick) Task:stop(), 5 * 1000)

  -- print something every half second
  periodic(function(tick) print("Hello Again: ",tick) end, 500)


The IOProcessor.start() routine is littered with Functors! What are they all doing? OK, first with the ‘addQuantaSteps’. The quanta steps are the runctions that are executed each time through the scheduler’s loop. You can see this in the iterator at the bottom:

  for astep in self:quantumSteps() do

The usual way you’d see this code written is as a while loop, or some other looping construct. In this case, I have used an iterator as the fundamental loop construct. That seems a bit novel, or at least slightly off the beaten path. Why bother with such a construct? Well, first of all, since it’s an iterator, the ‘step()’ function doesn’t actually know anything about the code that is being executed. It doesn’t know how long, and it doesn’t assume any side effects. This gives great flexibility because in order to change the behavior of the main loop only requires changing the behavior of the iterator that is being used. You could easily modify the scheduler to utilize your own custom iterator if you like. The only assumption is that the items coming out of the iterator are functions that can be called, and that’s it.

The ‘addQuantaStep()’ function calls are simply a way to prepopulate the list of functions with a known set of things I want to do each time through the scheduler’s loop. I made this a function call because that way any bit of code could do the same thing before running the scheduler. Here the Functor is being used to ensure we capture the state that goes along with the function.

The last place the Functor is being used here is in the ‘when()’ call:

  when(Functor(self.noMoreTasks,self), Functor(self.stop, self))

“when there are no more tasks, stop”

The functors are used because the functions in question live within the IOProcessor table, and are called with state information, so it needs to be associated with them. As an aside, why not just make these functions flat? Why are they even associated with a table anyway? Well, because, even though at the moment you can run only a single scheduler in your process, there’s no reason in the world why this MUST be the case. With another couple of changes, you’ll actually be able to run multiple schedulers at the same time within a single process. Yep, that might be a tad bit useful. Kind of like running multiple processes at the OS level.

At any rate, just for kicks, here is what the iterator of those functors looks like:

IOProcessor.addQuantaStep = function(self, astep)

IOProcessor.quantumSteps = function(self)
  local index = 0;
  local listSize = #self.QuantaSteps;

  local closure = function()
    if not self.ContinueRunning then
      return nil;

    index = index + 1;
    local astep = self.QuantaSteps[index]

    if (index % listSize) == 0 then
      index = 0

    return astep

  return closure	

Basically, just keep going through the list of steps over and over again until something sets ‘ContinueRunning’ to false.

Having these steps represented as an iterator satisfies the needs that I have for running a fairly dynamically configurable main loop. While I was doing this, I was also considering what implications this might have for doing other things with the iterator. For example, is there something I can do with functional programming? The recent Lua Fun library is all about iterators, and I’m wondering if I can do something with that.

Functor and iterator in hand, I am going to look to streamline the scheduler even further. I am imagining the core of the scheduler is not more than 100 lines of code. The quantum steps, which is where all the real action is, could live separately, and be easily pluggable modules. That’s where this is headed, because I don’t know how to deal with all that complexity otherwise.

So, there you have it. Functors can be useful in practice, and not just a software engineering curiosity.


Get every new post delivered to your Inbox.

Join 51 other followers