LJIT2libc – LuaJIT IS “batteries included”

I would say one of the most common complaints about any platform, framework, or product is a paucity of available stuff to fiddle about with.  I've heard this criticism leveled at the Lua world in general a few times.  Fatter frameworks are usually "batteries included".  That's great: you get a whole ton of stuff that makes writing all sorts of apps relatively easy right off the bat.  The challenge of "batteries included" is that you get a whole ton of stuff, most of which you will never use, and some of which doesn't have the best implementation.

Recently, I've been doing quite a lot of LuaJIT bindings on the Linux platform:

If you were into anything from ASCII art graphics drivers to raw frame buffer management in a Linux system, these might start to look like batteries.  They’re not included with the luajit compiler, but they’re relatively easy to add.

But, there’s another one that I’ve been playing with recently which I think is even better:

LuaJIT is typically built and compiled against the libc and libm libraries.  As such, being able to access routines within those libraries comes for 'free' from within LuaJIT, courtesy of the ffi capabilities.  This is very interesting because it means these routines, which are already on your system, are in fact just a stone's throw away from being available within your app.
Let’s imagine you wanted to write some script like this, using the libc provided ‘puts()’ function:
puts("Hello, World!");

Well, the Lua language has its own print routine, so this is a bit contrived, but let's say you wanted to do it anyway. To make this work, you need access to the function signature for the 'puts' routine, and you need that signature to be accessible in the global namespace. So, it would actually look like this:

local ffi = require("ffi")

ffi.cdef("int puts(const char *);");
puts = ffi.C.puts;

puts("Hello, World!")

Great. A bit of work, and now I can program as recklessly as I could in C, with all the batteries included in the libc and libm libraries. But it gets tedious having to write that ffi stuff all over the place, and this is where the LJIT2libc thing comes in. It basically implements a ton of the required ffi.cdef statements to make all those functions readily available to your script. An easy way to pull it into any script is to do the following:

local init = require("test_setup")()

puts("Hello, World!");

That first line ‘init = …’ in this case pulls in all of the definitions, and puts them into the global namespace, so that you can simply write your program without having to know anything about the ffi, or where functions come from or anything like that. Just start writing your code knowing that the batteries are already included.
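
For a sense of what such a setup module is doing, here's a minimal sketch (the three functions declared here are my own toy selection; the real LJIT2libc covers far more of libc and libm):

-- minimal_setup.lua (hypothetical name, for illustration)
local ffi = require("ffi")

ffi.cdef[[
int puts(const char *);
int printf(const char *, ...);
int getpid(void);
]]

-- return a function which hoists each declared routine
-- into the global namespace
return function()
    for _, name in ipairs({"puts", "printf", "getpid"}) do
        _G[name] = ffi.C[name]
    end
end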

Now, this example might seem too trivial for words, but just think about what’s in a typical libc library. It’s all the file system, sockets, system calls, math, random numbers, everything you’re likely to need to create higher level application stuff. It’s a lot of stuff that other systems end up creating from scratch, as part of their ‘batteries’.

Of course this is trivializing what’s happening in those batteries, because what you’re getting here is the raw C implementations of things, with all the headaches and dangers that are typically associated with writing code at that level. But, for those who want to have access to that level of code, and apply the safety net of lua where they see fit, this is quite a useful tool I think.

In short, batteries ARE included with every LuaJIT. Those batteries might be just anode, cathode, and electrolyte, without the shell, but they are the raw ingredients. Having these wrappers available just makes it that much easier to think about and deal with a lot of low level stuff where I might previously have had to resort to some sort of 'package' to achieve something as simple as traversing the file system.
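
Speaking of traversing the file system, here's a sketch of doing exactly that straight off libc (the struct dirent layout shown is the common glibc/Linux x86_64 one; check your platform's headers before relying on it):

local ffi = require("ffi")

ffi.cdef[[
typedef struct __dirstream DIR;
struct dirent {
    unsigned long  d_ino;
    long           d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[256];
};
DIR *opendir(const char *name);
struct dirent *readdir(DIR *dirp);
int closedir(DIR *dirp);
]]

-- walk the current directory, printing each entry name
local dir = ffi.C.opendir(".")
if dir ~= nil then
    local entry = ffi.C.readdir(dir)
    while entry ~= nil do
        print(ffi.string(entry.d_name))
        entry = ffi.C.readdir(dir)
    end
    ffi.C.closedir(dir)
end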

So there you have it. luajit is a ‘batteries included’ system.


LAPHLibs gets a makeover

Quite a while ago (it looks like about 3 years), I created the LAPHLibs repository.  It was an outgrowth of various projects I was doing, and an experiment in open licensing.  The repo is full of routines varying from hash functions to bit banging.  Not a 'library' as such, but a curation of things that are all written in pure LuaJIT code.

Well, after I spun it out, I didn't show it much love.  I made a couple of updates here and there as I found fixes while using the routines in other projects.  Recently though, I found that this is the most linked-to of all my GitHub-based projects.  As such, I thought it might be useful to give it a makeover, because having all that bad code out there doesn't really speak well of the Lua language, nor of my learnings of it over the past few years.

So, I recently spent a few hours cleaning things up.  Most of the changes are documented in the new CHANGELOG.md file.

If you are one of the ten people who so happens to read this blog, and are a user of bits and pieces from that library, you might want to take a look.

One of the biggest sins that I fixed is that in a lot of cases I was polluting the global namespace with my functions.  It was inconsistent: some functions were global, some local, sometimes even in the same file!  Now everything is local, and there are proper exports.
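
The pattern now looks roughly like this (bin2str is a hypothetical routine, just to show the shape):

-- before: function bin2str(bytes, len) ... end  -- leaked into _G

-- after: everything local, with an explicit exports table
local function bin2str(bytes, len)
    -- render a 0-indexed byte buffer as a hex string
    local chars = {}
    for i = 0, len-1 do
        chars[#chars+1] = string.format("%02x", bytes[i])
    end
    return table.concat(chars)
end

local exports = {
    bin2str = bin2str;
}

return exports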

I also got rid of a few things, like the implementation of strtoul().  The native tonumber() function is much more correct, and deals with all cases I had implemented.

There are a few places where I was doing idiomatic classes, and I cleaned those up by adding proper-looking 'constructors'.
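
That constructor pattern, for reference (the class name here is made up; the same shape shows up later in the schedlua Scheduler):

local MemoryStream = {}   -- hypothetical class name
setmetatable(MemoryStream, {
    __call = function(self, ...)
        return self:new(...)
    end,
})
local MemoryStream_mt = {
    __index = MemoryStream,
}

function MemoryStream.new(self, size)
    local obj = {
        size = size or 8192;
    }
    setmetatable(obj, MemoryStream_mt)
    return obj
end

-- usage: construction looks like a plain function call
local ms = MemoryStream(1024)
print(ms.size)    --> 1024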

Overall, the set of routines stands a little taller than it did before.  I can’t say when’s the next time I’ll do another overhaul.  I did want to play around a bit more with the bit banging stuff, and perhaps I can add a little bit more from projects I’ve done recently, like the schedlua scheduler and the like.

Bottom line, sometimes it’s actually worth revisiting old code, if for no other reason than to recognize the sins of your past and correct them if possible.


Accessing Open vSwitch using LuaJIT – The LJIT2ovs project

I’ve recently had the need to learn Open vSwitch to do some virtualized networking stuff. As part of my learning experience, I’ve created the LJIT2ovs project on GitHub.

Open vSwitch (OVS) is a pretty decent bit of kit.  Its use is to act as that bit of bridge code on a machine that supports some software-defined networking.  Essentially, a network hub/switch/bridge on your machine.  My particular use case is to support interesting networking topologies with VMs, virtual LANs, and the like in the context of a data center.  Having OVS allows you to do some fairly fancy things in terms of managing VM movements and virtual LANs.

I have certainly done tons of integration with libraries of various types, on platforms from the Raspberry Pi to Windows and Linux in general.  Some code is easier to interop with and some code is harder.  I wrote a series a couple years back on the benefits and pitfalls of different styles of API development.

The OVS code is fairly easy to interop with using LuaJIT.  The publicly exposed stuff is all C99, and not particularly fancy.  There are a few cases where the usage of macros had me scratching my head, but a quick pass with a C compiler typically helps flesh those out.

One example will help illustrate the ease and pitfalls of doing some forms of interop.  I’ll use the UUID features in the library as an example.

First there’s data structures:

ffi.cdef[[
struct uuid {
    uint32_t parts[4];
};
]]

The LuaJIT ffi makes pretty short work of that. Now the various routines that need a struct uuid will have access to what they need. The various functions are thus:

ffi.cdef[[
void uuid_init(void);
void uuid_generate(struct uuid *);
void uuid_zero(struct uuid *);
bool uuid_is_zero(const struct uuid *);
int uuid_compare_3way(const struct uuid *, const struct uuid *);
bool uuid_from_string(struct uuid *, const char *);
bool uuid_from_string_prefix(struct uuid *, const char *);
]]

Again, fairly simple. The use of 'bool' instead of 'int' as a return type makes life a lot easier, as the ffi will translate it to Lua's native boolean type. If it were an int, then instead of simply doing

if not uuid_from_string(id, someuuidstring) then
  print("FAIL")
end

You’d have to write

if uuid_from_string(id, someuuidstring) == 0 then
  print("FAIL")
end

It seems like such a minor difference, but it's actually a pretty huge pain in the backside. When the return value is an int, there are myriad things it can mean. One of them is '0 == success', another is '0 == failure', and there can be various others of course, including checking errno to find out what really went wrong or right. The bool is much more straightforward: success is usually indicated with 'true' and failure with 'false'.

Where things get a little more tricky and you have to pay special attention, is when you’re doing interop between different parts of the system and you flow through lua along the way.

Part of OVS has this dynamic string thing, which is one of those stretchy buffers that will grow as necessary as you add stuff to it. One of the things it has is a 'printf'-style formatting routine.

ffi.cdef[[
void ds_put_format(struct ds *, const char *, ...) ;
]]

Yes, the LuaJIT ffi can deal with variadic functions, but you have to take extra care when feeding it the variable arguments. In particular, you’re responsible for dealing with the types of those arguments. This is where the care comes in. With normal functions (non-variadic), the ffi marshaling will just take care of things, and you don’t even realize a “number” is being turned into an ‘int32_t’, or whatever it needs to be. The ffi just figures it out based on the signature, and does the right thing. Not so with variadics. It can’t know what you need to pass.
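
To make that concrete, here's a small sketch using libc's snprintf (declared here just for illustration). The fixed parameters are marshaled from the signature automatically; each variadic argument, by contrast, must carry an explicit C type. A plain Lua number becomes a double by default, which would be wrong for %d:

local ffi = require("ffi")

ffi.cdef[[
int snprintf(char *buf, size_t n, const char *fmt, ...);
]]

local buf = ffi.new("char[64]")
-- cast the variadic int explicitly; the Lua number passes as a double
ffi.C.snprintf(buf, 64, "%d and %g", ffi.new("int", 42), 3.14)
print(ffi.string(buf))    --> 42 and 3.14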

So, here’s more from the UUID routines, some macros.

#define UUID_LEN 36
#define UUID_FMT "%08x-%04x-%04x-%04x-%04x%08x"
#define UUID_ARGS(UUID)                             \
    ((unsigned int) ((UUID)->parts[0])),            \
    ((unsigned int) ((UUID)->parts[1] >> 16)),      \
    ((unsigned int) ((UUID)->parts[1] & 0xffff)),   \
    ((unsigned int) ((UUID)->parts[2] >> 16)),      \
    ((unsigned int) ((UUID)->parts[2] & 0xffff)),   \
    ((unsigned int) ((UUID)->parts[3]))

In general, macros are of two types. First are the simple #defines, which can turn into local variables. In this particular case, UUID_LEN and UUID_FMT fit the bill. UUID_LEN is just a numeric constant, so it can become a regular number. UUID_FMT doesn't use any particularly fancy format specifiers (luckily, Lua's string.format is similar enough in most cases), so it can become a simple string.

local UUID_LEN = 36
local UUID_FMT = "%08x-%04x-%04x-%04x-%04x%08x"

But what about that macro? Can we just turn it into a function, strip out the naughty bits, and call it a day?

local function UUID_ARGS(UUID)                             
    return ( ((UUID).parts[0])),            
    ( rshift((UUID).parts[1], 16)),      
    ( band((UUID).parts[1], 0xffff)),   
    ( rshift((UUID).parts[2], 16)),      
    ( band((UUID).parts[2], 0xffff)),   
    ( ((UUID).parts[3]))
end

Well, I want to write the following:

local success = uuid.uuid_from_string_prefix(id1, "12345678-1234-5678-1234-5678123456781234")
local output = ds.dynamic_string();
ds.ds_put_format(output, uuid.UUID_FMT, uuid.UUID_ARGS(id1))

print("OUTPUT: ", ds.ds_cstr(output));

Basically just round trip a string into a UUID and back out into a stretchy buffer, and print it as a lua string. Does it work? Nope, not quite. In this case, the output would be:

00000000-0000-0000-0000-45e4de9000000000

Not quite what I was expecting. Head scratch. OK, well, obviously I was a bit simplistic in my conversion of that UUID_ARGS macro. Maybe I should have preserved that casting stuff.

local uint32_t = ffi.typeof("uint32_t")

local function UUID_ARGS(UUID)                             
    local p1 = uint32_t(UUID.parts[0])
    local p2 = uint32_t(rshift(UUID.parts[1], 16))
    local p3 = uint32_t(band(UUID.parts[1], 0xffff))
    local p4 = uint32_t(rshift(UUID.parts[2], 16))
    local p5 = uint32_t(band(UUID.parts[2], 0xffff))
    local p6 = uint32_t(UUID.parts[3])

    return p1, p2, p3, p4, p5, p6
end

In addition to the temporary variables (used for debugging), I stripped out a number of parentheses, because Lua lives better with fewer parens. Basically, I used the 'uint32_t' type in a way that is similar to a typecast. Of course, it's actually worse the way I've used it here, because it creates a temporary cdata object, which needs to be garbage collected, but the result is now:

OUTPUT: 12345678-1234-5678-1234-567812345678

Which is what I was expecting. So, maybe one last pass to clean it up…

local function UUID_ARGS(UUID)                             
    return ffi.cast("unsigned int", (UUID.parts[0])),
        ffi.cast("unsigned int", rshift(UUID.parts[1], 16)),
        ffi.cast("unsigned int", band(UUID.parts[1], 0xffff)),
        ffi.cast("unsigned int", rshift(UUID.parts[2], 16)),
        ffi.cast("unsigned int", band(UUID.parts[2], 0xffff)),
        ffi.cast("unsigned int", UUID.parts[3])
end

So, essentially, a more faithful representation of the original. Note though: although this will work with the UUID_FMT when fed to a C-based 'printf'-style routine, it will NOT work with Lua's string.format routine. That will complain that it was expecting a number, but received a 'cdata' instead. So, you have to think fairly clearly about how those APIs and macros are going to be used, and therefore how they should be translated.
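
If the destination is Lua's string.format rather than a C printf, a variant returning plain Lua numbers does the trick (UUID_ARGS_LUA is a name of my own invention):

local bit = require("bit")
local rshift, band = bit.rshift, bit.band

local function UUID_ARGS_LUA(UUID)
    return tonumber(UUID.parts[0]),
        rshift(UUID.parts[1], 16),
        band(UUID.parts[1], 0xffff),
        rshift(UUID.parts[2], 16),
        band(UUID.parts[2], 0xffff),
        tonumber(UUID.parts[3])
end

-- string.format(UUID_FMT, UUID_ARGS_LUA(id1)) now yields the same string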

Here’s that uuid.lua file in its entirety:

local ffi = require("ffi")
local bit = require("bit")

local rshift, band = bit.rshift, bit.band

local function BUILD_ASSERT_DECL(...)
    assert(...);
end

local UUID_BIT = 128;              -- Number of bits in a UUID.
local UUID_OCTET = (UUID_BIT / 8); -- Number of bytes in a UUID.

ffi.cdef[[
/* A Universally Unique IDentifier (UUID) compliant with RFC 4122.
 *
 * Each of the parts is stored in host byte order, but the parts themselves are
 * ordered from left to right.  That is, (parts[0] >> 24) is the first 8 bits
 * of the UUID when output in the standard form, and (parts[3] & 0xff) is the
 * final 8 bits. */
struct uuid {
    uint32_t parts[4];
};
]]

BUILD_ASSERT_DECL(ffi.sizeof("struct uuid") == UUID_OCTET);


local UUID_LEN = 36;
local UUID_FMT = "%08x-%04x-%04x-%04x-%04x%08x";

local function UUID_ARGS(UUID)                             
    return ffi.cast("unsigned int", (UUID.parts[0])),
        ffi.cast("unsigned int", rshift(UUID.parts[1], 16)),
        ffi.cast("unsigned int", band(UUID.parts[1], 0xffff)),
        ffi.cast("unsigned int", rshift(UUID.parts[2], 16)),
        ffi.cast("unsigned int", band(UUID.parts[2], 0xffff)),
        ffi.cast("unsigned int", UUID.parts[3])
end


--[[
/* Returns a hash value for 'uuid'.  This hash value is the same regardless of
 * whether we are running on a 32-bit or 64-bit or big-endian or little-endian
 * architecture. */
--]]
local function uuid_hash(uuid)
    return uuid.parts[0];
end

-- Returns true if 'a == b', false otherwise.
local function uuid_equals(a, b)

    return (a.parts[0] == b.parts[0]
            and a.parts[1] == b.parts[1]
            and a.parts[2] == b.parts[2]
            and a.parts[3] == b.parts[3]);
end


  
ffi.cdef[[
void uuid_init(void);
void uuid_generate(struct uuid *);
void uuid_zero(struct uuid *);
bool uuid_is_zero(const struct uuid *);
int uuid_compare_3way(const struct uuid *, const struct uuid *);
bool uuid_from_string(struct uuid *, const char *);
bool uuid_from_string_prefix(struct uuid *, const char *);
]]

local Lib_uuid = ffi.load("openvswitch")

-- initialize uuid routines
Lib_uuid.uuid_init();

local exports = {
    Lib_uuid = Lib_uuid;

    UUID_BIT = UUID_BIT;
    UUID_OCTET = UUID_OCTET;
    UUID_FMT = UUID_FMT;
    UUID_ARGS = UUID_ARGS;

    -- types
    uuid = ffi.typeof("struct uuid");

    -- inline routines
    uuid_hash = uuid_hash;
    uuid_equals = uuid_equals;

    -- library functions
    uuid_init = Lib_uuid.uuid_init;
    uuid_generate = Lib_uuid.uuid_generate;
    uuid_zero = Lib_uuid.uuid_zero;
    uuid_is_zero = Lib_uuid.uuid_is_zero;
    uuid_compare_3way = Lib_uuid.uuid_compare_3way;
    uuid_from_string = Lib_uuid.uuid_from_string;
    uuid_from_string_prefix = Lib_uuid.uuid_from_string_prefix;
}

return exports

Just a couple more notes on style. When I’m using such a binding, I’ll typically do the following:

local uuid = require("uuid")

From there, I will want to utilize a simple notation to get at things.

local id = uuid.uuid(); 
uuid.uuid_generate(id);

local output = ds.dynamic_string();
ds.ds_put_format(output, uuid.UUID_FMT, uuid.UUID_ARGS(id))


I want to minimize the amount of times I have to use the ffi routines directly because they’re fairly low level, and can lead to some very subtle errors if you’re not careful. So, a properly constructed module will wrap up as much as possible.

In some cases I’ll create a ‘class’, which is a table to encapsulate the various bits and related bytes. In this case I did not, but I might still do it later if I find it useful.

I will typically do the ffi.load to get a handle on the library from which the routines are called, and stick that reference into a table, so the library reference won’t be garbage collected, and therefore possibly unloaded from the running system. And lastly, the function names have this construct:

    uuid_generate = Lib_uuid.uuid_generate;

It's a simple aliasing, so that I don't have to remember which library the function is located in; I can just call uuid.uuid_generate, and the right thing will happen. This isn't the most performant way to do it, as the compiler can't optimize the function call as well, so it will be slower. These routines are not typically called in a hot loop though, so it's an OK tradeoff. If you were being really hard core, and needed the extra cycles, you'd make the library call directly instead of going through the table indirection.
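
In that hard core case, the call site might look like this (a sketch, reusing the struct uuid cdef and the Lib_uuid handle from the module above):

local id = ffi.new("struct uuid")

-- calling through the clib namespace directly lets the JIT
-- specialize the call, avoiding a Lua table lookup per iteration
for i = 1, 1000000 do
    Lib_uuid.uuid_generate(id)
end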

The last trick you can do is to import that exports table into the global namespace:

for k,v in pairs(uuid) do
    _G[k] = v;
end

If you do this, then your code can drop the table indirection and become simply:

local id = uuid(); 
uuid_generate(id);

local output = dynamic_string();
ds_put_format(output, UUID_FMT, UUID_ARGS(id))

This is looking pretty close to the original ‘C’ code. Fewer pointers to deal with, no memory allocation problems, no funky difference between ‘.’ and ‘->’ indirection. In this way, Lua is just a nicer version of ‘C’ :-) My coworkers will get a kick out of that sentiment.

And so, there you have it. A fairly small example of beginning to interop with OVS. Open vSwitch is a fairly large codebase. It’s split between simple base library routines, reporting, core packet handling, and higher level apps, daemons and utilities. Starting from the top, the utilities are fairly easy to replace, once the interop at the lowest levels is dealt with. In LJIT2ovs, the lower levels are fairly well covered, and coverage increases as needs dictate. At the higher level, I am beginning with replacing simple routines, such as querying the database. At the end of the day, these higher level utilities are much easier to write, maintain, expand, and incorporate when they are written in script, rather than C.

Next time around I'll examine some of these routines, how they can be rewritten, and what benefits might accrue from that exercise.


schedlua – async io

And so, finally we come to the point. Thus far, we looked at the simple scheduler, the core signaling, and the predicate and alarm functions built atop that. That’s all great stuff for fairly straight forward apps that need some amount of concurrency. The best part though is when you can do concurrent networking stuff.

Here’s the setup; I want to issue 20 concurrent GET requests to 20 different sites, and get results back. I want the program to halt once all the tasks have been completed.

--test_linux_net.lua
package.path = package.path..";../?.lua"

local ffi = require("ffi")

local Kernel = require("kernel"){exportglobal = true}
local predicate = require("predicate")(Kernel, true)
local AsyncSocket = require("AsyncSocket")

local sites = require("sites");

-- list of tasks
local taskList = {}


local function httpRequest(s, sitename)
	local request = string.format("GET / HTTP/1.1\r\nUser-Agent: schedlua (linux-gnu)\r\nAccept: */*\r\nHost: %s\r\nConnection: close\r\n\r\n", sitename);
	return s:write(request, #request);
end

local function httpResponse(s)
	local BUFSIZ = 512;
	local buffer = ffi.new("char[?]", BUFSIZ+1);
	local bytesRead = 0
	local err = nil;
	local cumulative = 0

	repeat
		bytesRead, err = s:read(buffer, BUFSIZ);

		if bytesRead then
			cumulative = cumulative + bytesRead;
		else
			print("read, error: ", err)
			break;
		end
	until bytesRead < 1

	return cumulative;
end


local function siteGET(sitename)
	print("siteGET, BEGIN: ", sitename);

	local s = AsyncSocket();

	local success, err = s:connect(sitename, 80);  

	if success then
		httpRequest(s, sitename);
		httpResponse(s);
	else
		print("connect, error: ", err, sitename);
	end

	s:close();

	print("siteGET, FINISHED: ", sitename)
end


local function allProbesFinished()
	for idx, t in ipairs(taskList) do
		if t:getStatus() ~= "dead" then
			return false;
		end
	end

	return true;
end

local function main()
	for count=1,20 do
		table.insert(taskList, Kernel:spawn(siteGET, sites[math.random(#sites)]))
		Kernel:yield();
	end

	when(allProbesFinished, halt);
end

run(main)

Step by step. The httpRequest() function takes a socket and does the most bare-minimal HTTP GET request, assuming the socket is already connected to the site.

Similarly, the httpResponse() function gets a response back from the server, and reads as much as it can until the socket is closed (because the Connection: close header was sent).

That’s about the most basic of HTTP request/response pairs you can have, ignoring doing any parsing of the returned data.

Alright, so let’s wrap those two up into a function called siteGET(). siteGET(sitename) takes the name of a site, creates a socket, connects it to the site, and then issues the httpRequest(), and then the httpResponse(). Very simple. What I like about this is that the httpRequest(); httpResponse() sequence is executed in serial as far as I’m concerned. I don’t have to be worried about the httpResponse() being issued before the request completes. Furthermore, if I didn’t use a spawn(), I could simply execute the code directly and be none the wiser.

I want to execute these siteGET()s concurrently though, so within main(), I start up 20 of these tasks, and let them go. Then comes the waiting part:

local function allProbesFinished()
	for idx, t in ipairs(taskList) do
		if t:getStatus() ~= "dead" then
			return false;
		end
	end

	return true;
end

	when(allProbesFinished, halt);

Going back to our knowledge of predicates, we know that the ‘when’ function takes a predicate (function that returns true/false), and will execute the second function when the predicate returns true.

OK, so we just need to come up with a predicate which tells us that all the tasks have completed. Easy enough, as a list of the tasks is generated when they are spawned. So, we just go through that list and see if any of them are still running. If there is a single one still running, the predicate will return false, and 'halt()' will not be called. As soon as the last task finishes, the predicate will return true, and the halt() function will be called.

Of course, most things in schedlua are convenient compositions of deeper things (with signals being at the core).

Instead of using the ‘when’ function, you could write the code more directly like this:

	while true do
		if allProbesFinished() then
			halt();
			break;
		end
		yield();
	end

That doesn't quite look as nice as just using the when() function, I think. Also, you're sitting in the main() function, which is no big deal as there's nothing else trying to execute after this, but it just doesn't seem as clean. Furthermore, the 'when' function might have some magic in its implementation, such as a better understanding of the state of tasks, or special knowledge of the scheduler, or who knows what. At any rate, either way essentially implements a barrier, and the technique can be used anywhere you want to perform an action after some set of tasks has completed. The allProbesFinished() function can be generalized to wait on any list of tasks; maybe call it "waitForTasks()" or some such thing, as sketched below.
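
Here's a sketch of that generalization (waitForTasks is a name I'm inventing here; it is not part of schedlua):

-- returns a predicate which is true once every task in the list is dead
local function waitForTasks(tasks)
	return function()
		for _, t in ipairs(tasks) do
			if t:getStatus() ~= "dead" then
				return false
			end
		end
		return true
	end
end

-- usage: when(waitForTasks(taskList), halt)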

At any rate, that completes the primitives that are baked into the core schedlua package. Everything from signals, to predicates, alarms, and finally async io. Of course this is Linux, so async io works with any file descriptor, not just network sockets, so file management or device communications in general can be thrown into the mix.

Now that the basics work, it’s a matter of cleaning up, writing more test cases, fixing bugs, reorganizing, and optimizing resource usage a bit. In general though, the constructs are there, and it’s ready to be used for real applications.


schedlua – the kernel

Previously, I talked about the scheduler within the new schedlua.  A scheduler is a fairly simple thing; it just decides which of the many ready tasks will run next.  The default scheduler follows a fairly simple FIFO strategy, so there are no priorities, favorites, or the like.  Of course this wouldn't be any fun if you were stuck with just one scheduler, so naturally enough this is an easily pluggable part of the system.  But what does the plugging?

In steps the Kernel.  In general, the schedlua project is about creating a set of tools by which highly performant services can be constructed.  schedlua largely supports the concept that a single processor can be highly leveraged if programmed correctly.  It does not try to gain performance through the usage of multiple threads, but rather it just takes on the task of suspending various tasks which are blocked on IO or otherwise idle, and letting tasks which are ready to run do their thing.  The concurrency model is cooperative, not preemptive, so if any one task misbehaves, the process can become stuck.

So, let’s take a look at this code:

--kernel.lua
-- kernel is a singleton, so return
-- single instance if we've already been
-- through this code
print("== KERNEL INCLUDED ==")

local Scheduler = require("scheduler")
local Task = require("task")
local Queue = require("queue")
local Functor = require("functor")

local Kernel = {
	ContinueRunning = true;
	TaskID = 0;
	Scheduler = Scheduler();
	TasksSuspendedForSignal = {};
}

setmetatable(Kernel, {
    __call = function(self, params)
    	params = params or {}
    	params.Scheduler = params.Scheduler or self.Scheduler
    	
    	if params.exportglobal then
    		self:globalize();
    	end

    	self.Scheduler = params.Scheduler;

    	return self;
    end,
})

function Kernel.getNewTaskID(self)
	self.TaskID = self.TaskID + 1;
	return self.TaskID;
end

function Kernel.getCurrentTaskID(self)
	return self:getCurrentTask().TaskID;
end

function Kernel.getCurrentTask(self)
	return self.Scheduler:getCurrentTask();
end

function Kernel.spawn(self, func, ...)
	local task = Task(func, ...)
	task.TaskID = self:getNewTaskID();
	self.Scheduler:scheduleTask(task, {...});
	
	return task;
end

function Kernel.suspend(self, ...)
	self.Scheduler:suspendCurrentFiber();
	return self:yield(...)
end

function Kernel.yield(self, ...)
	return self.Scheduler:yield(...);
end


function Kernel.signalOne(self, eventName, ...)
	if not self.TasksSuspendedForSignal[eventName] then
		return false, "event not registered", eventName
	end

	local nTasks = #self.TasksSuspendedForSignal[eventName]
	if nTasks < 1 then
		return false, "no tasks waiting for event"
	end

	local suspended = self.TasksSuspendedForSignal[eventName][1];

	self.Scheduler:scheduleTask(suspended,{...});
	table.remove(self.TasksSuspendedForSignal[eventName], 1);

	return true;
end

function Kernel.signalAll(self, eventName, ...)
	if not self.TasksSuspendedForSignal[eventName] then
		return false, "event not registered"
	end

	local nTasks = #self.TasksSuspendedForSignal[eventName]
	if nTasks < 1 then
		return false, "no tasks waiting for event"
	end

	for i=1,nTasks do
		self.Scheduler:scheduleTask(self.TasksSuspendedForSignal[eventName][1],{...});
		table.remove(self.TasksSuspendedForSignal[eventName], 1);
	end

	return true;
end

function Kernel.waitForSignal(self, eventName)
	local currentFiber = self.Scheduler:getCurrentTask();

	if currentFiber == nil then
		return false, "not currently in a running task"
	end

	if not self.TasksSuspendedForSignal[eventName] then
		self.TasksSuspendedForSignal[eventName] = {}
	end

	table.insert(self.TasksSuspendedForSignal[eventName], currentFiber);

	return self:suspend()
end

function Kernel.onSignal(self, func, eventName)
	local function closure()
		self:waitForSignal(eventName)
		func();
	end

	return self:spawn(closure)
end



function Kernel.run(self, func, ...)

	if func ~= nil then
		self:spawn(func, ...)
	end

	while (self.ContinueRunning) do
		self.Scheduler:step();		
	end
end

function Kernel.halt(self)
	self.ContinueRunning = false;
end

function Kernel.globalize()
	halt = Functor(Kernel.halt, Kernel);
    onSignal = Functor(Kernel.onSignal, Kernel);

    run = Functor(Kernel.run, Kernel);

    signalAll = Functor(Kernel.signalAll, Kernel);
    signalOne = Functor(Kernel.signalOne, Kernel);

    spawn = Functor(Kernel.spawn, Kernel);
    suspend = Functor(Kernel.suspend, Kernel);

    waitForSignal = Functor(Kernel.waitForSignal, Kernel);

    yield = Functor(Kernel.yield, Kernel);
end

return Kernel;


From the top, you can see the Kernel requires the scheduler, task and functor. The scheduler has already been explained. The Kernel serves a couple of purposes. First of all, it manages the scheduler. The ‘run’ function at the bottom is the ‘loop’ of the application. It will run until ‘halt’ is called. Each time through the loop it’s telling the scheduler to take a step.

Also at the bottom, you can see usage of the Functor. A functor is just a simple convenience wrapper that helps you call a function on a table at a later point. Those functors are used to make the keywords global.
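
A minimal sketch of what such a functor might look like (the real functor.lua in schedlua may differ in detail):

-- bind a target object to a function, so that calling the
-- result later is the same as calling func(target, ...)
local function Functor(func, target)
	return function(...)
		if target then
			return func(target, ...)
		end
		return func(...)
	end
end

-- e.g. halt = Functor(Kernel.halt, Kernel), so halt() becomes Kernel:halt()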

There are two primary things the kernel does. One is to spawn new tasks, the other is to provide a central point for signal handling.

First, let’s look at the ‘spawn’.

function Kernel.spawn(self, func, ...)
	local task = Task(func, ...)
	task.TaskID = self:getNewTaskID();
	self.Scheduler:scheduleTask(task, {...});
	
	return task;
end

This is actually the coolest part of the system in terms of how the programming model is expressed. Here’s an example of it in use.

local Kernel = require("kernel"){exportglobal = true}


local function numbers(ending)
	local idx = 0;
	local function closure()
		idx = idx + 1;
		if idx > ending then
			return nil;
		end
		return idx;
	end
	
	return closure;
end

local function task1()
	print("first task, first line")
	yield();
	print("first task, second line")
end

local function task2()
	print("second task, only line")
end

local function counter(name, nCount)
	for num in numbers(nCount) do
		print(name, num);
		yield();
	end
	halt();
end

local function main()
	local t0 = spawn(counter, "counter1", 5)
	local t1 = spawn(task1)
	local t2 = spawn(task2)
	local t3 = spawn(counter, "counter2", 7)
end

run(main)

Basically, any time you want something to happen concurrently, you just say 'spawn(func, params)' and that's that.

What happens is a Task object is created which holds onto the function object as well as the initial set of parameters. This task is then sent to the scheduler to be run. From then on out you can forget about it. Of course, you’re handed the task when you say ‘spawn’, so you do have a chance of suspending and killing it off in the future if you like. Similarly, you can wait for a task to complete as well.
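
For reference, a minimal sketch of what such a Task object might look like, wrapping a coroutine (the real task.lua in schedlua may differ):

local Task = {}
local Task_mt = { __index = Task }

setmetatable(Task, {
	__call = function(self, func, ...)
		local obj = {
			routine = coroutine.create(func);
			params = {...};
		}
		return setmetatable(obj, Task_mt)
	end,
})

function Task.setParams(self, params)
	self.params = params
	return self
end

-- resume the underlying coroutine, feeding it the current params
function Task.resume(self)
	return coroutine.resume(self.routine, unpack(self.params))
end

-- "suspended", "running", "normal", or "dead"
function Task.getStatus(self)
	return coroutine.status(self.routine)
end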

So, that’s spawning.

The other major feature in the kernel is signal handling.

function Kernel.waitForSignal(self, eventName)
	local currentFiber = self.Scheduler:getCurrentTask();

	if currentFiber == nil then
		return false, "not currently in a running task"
	end

	if not self.TasksSuspendedForSignal[eventName] then
		self.TasksSuspendedForSignal[eventName] = {}
	end

	table.insert(self.TasksSuspendedForSignal[eventName], currentFiber);

	return self:suspend()
end

This is probably THE most important routine in the whole system. Basically: take the current task and put it onto the suspension list, keyed by the signal name (signals are just string values). Later on, when you want the task to resume, you signal it using either signalOne() or signalAll().

A little bit of code demonstrating this:

local Kernel = require("kernel"){exportglobal = true}
local Functor = require("functor")

local function numbers(ending)
	local idx = 0;
	local function closure()
		idx = idx + 1;
		if idx > ending then
			return nil;
		end
		return idx;
	end
	
	return closure;
end

local function waitingOnCount(name, ending)
	local eventName = name..tostring(ending)
	waitForSignal(eventName)

	print("REACHED COUNT: ", ending)
end

local function onCountFinished(name)
	print("Counter Finished: ", name)
end

local function counter(name, nCount)
	for num in numbers(nCount) do
		print(num)
		local eventName = name..tostring(num);
		--print(eventName)
		signalOne(eventName);

		yield();
	end

	signalAll(name..'-finished')
end

local function main()
	local t1 = spawn(counter, "counter", 50)
	local t2 = spawn(waitingOnCount, "counter", 20)
	local t3 = spawn(function() print("LAMBDA"); waitForSignal("counter15") print("reached 15!!") end)
	
	-- test signalAll().  All three of these should trigger when
	-- counter finishes
	local t13 = onSignal(Functor(onCountFinished, "counter-1"), "counter-finished")
	local t14 = onSignal(Functor(onCountFinished, "counter-2"), "counter-finished")
	local t15 = onSignal(Functor(onCountFinished, "counter-3"), "counter-finished")
end

run(main)

Here, there is a counter(), which is just counting, and firing off a signal for each number. The various waitingOnCount(), and LAMBDA routines are going to respond to the appropriate signals.

Finally, the t13, t14, and t15 tasks are waiting for the “counter-finished” signal, and they will all fire off and print their little message.

Of course, at this point you could have something call 'halt()' so you don't have to press Ctrl-C to stop the process, but you get the idea.

And that’s pretty much it for the kernel. Absent from here are the async io, predicates, alarms and the like. They are available in schedlua, but they’re not a part of the kernel. Instead of being part of the kernel proper, these are essentially modules. They utilize the signal and spawn features built into the kernel, and they’re free to do their own thing.

I’ll get into the details of alarms and predicates next time around to demonstrate the concept of easy add-on modules.


Schedulers revisited

I ran quite a long series on the buildup of the TINN scheduler, including this gem on the primitives included in the application model: Parallel Conversations

The scheduler developed in that whole series served me very well, taking on timing, predicates, signals, and async IO.  It was all based on Windows primitives, and although fairly modular, was an evolved codebase, which had some lumps and inefficiencies.  So now, two years on, I decided to reimagine that scheduler.  Primary reasons?  I had some time on my hands, and I wanted a scheduler that works as well on Linux as it does on Windows.

And so, the schedlua project was born.

What were the design criteria this time around?

  • Reduce size and complexity
  • Increase composability
  • Do not force features where they’re not desired

In the previous incarnation, I went back and forth between having an “Application” object, and the “Scheduler”.  Things were primarily driven by the Application, and it had a central role in the instance of any running app.  I still favor having this central role, but I changed the meaning and composition of that central player this time around.  First though, there is the “Scheduler”.


local ffi = require("ffi");

local Queue = require("queue")
local Task = require("task");


--[[
	The Scheduler supports a collaborative processing
	environment.  As such, it manages multiple tasks which
	are represented by Lua coroutines.

	The scheduler works by being the arbitrator of which routine is running
	in the application.  When work is to be done, it is encapsulated in 
	the form of a task object.  The scheduler relies upon that task object
	to call a form of 'yield' at some point, at which time it will receive
	the main thread of execution, and will pick the next task out of the ready
	list to run.
--]]
local Scheduler = {}
setmetatable(Scheduler, {
	__call = function(self, ...)
		return self:create(...)
	end,
})
local Scheduler_mt = {
	__index = Scheduler,
}

function Scheduler.init(self, ...)
	local obj = {
		TasksReadyToRun = Queue();
	}
	setmetatable(obj, Scheduler_mt)
	
	return obj;
end

function Scheduler.create(self, ...)
	return self:init(...)
end


function Scheduler.tasksPending(self)
	return self.TasksReadyToRun:length();
end


-- put a task on the ready list
-- the 'task' should be something that can be executed,
-- whether it's a function, functor, or something that has a '__call'
-- metamethod implemented.
-- The 'params' is a table of parameters which will be passed to the function
-- when it's ready to run.
function Scheduler.scheduleTask(self, task, params)
	--print("Scheduler.scheduleTask: ", task, params)
	params = params or {}
	
	if not task then
		return false, "no task specified"
	end

	task:setParams(params);
	self.TasksReadyToRun:enqueue(task);	
	task.state = "readytorun"

	return task;
end

function Scheduler.removeFiber(self, fiber)
	--print("REMOVING DEAD FIBER: ", fiber);
	return true;
end

function Scheduler.inMainFiber(self)
	return coroutine.running() == nil; 
end

function Scheduler.getCurrentFiber(self)
	return self.CurrentFiber;
end

function Scheduler.suspendCurrentFiber(self, ...)
	self.CurrentFiber.state = "suspended"
end

function Scheduler.step(self)
	-- Now check the regular fibers
	local task = self.TasksReadyToRun:dequeue()

	-- If no fiber in ready queue, then just return
	if task == nil then
		print("Scheduler.step: NO TASK")
		return true
	end

	if task:getStatus() == "dead" then
		self:removeFiber(task)

		return true;
	end

	-- If the task we pulled off the active list is 
	-- not dead, then perhaps it is suspended.  If that's true
	-- then it needs to drop out of the active list.
	-- We assume that some other part of the system is responsible for
	-- keeping track of the task, and rescheduling it when appropriate.
	if task.state == "suspended" then
		--print("suspended task wants to run")
		return true;
	end

	-- If we have gotten this far, then the task truly is ready to 
	-- run, and it should be set as the currentFiber, and its coroutine
	-- is resumed.
	self.CurrentFiber = task;
	local results = {task:resume()};

	-- once we get results back from the resume, one
	-- of two things could have happened.
	-- 1) The routine exited normally
	-- 2) The routine yielded
	--
	-- In both cases, we parse out the results of the resume 
	-- into a success indicator and the rest of the values returned 
	-- from the routine
	--local pcallsuccess = results[1];
	--table.remove(results,1);

	local success = results[1];
	table.remove(results,1);

--print("PCALL, RESUME: ", pcallsuccess, success)

	-- no task is currently executing
	self.CurrentFiber = nil;


	if not success then
		print("RESUME ERROR")
		print(unpack(results));
	end

	-- Again, check to see if the task is dead after
	-- the most recent resume.  If it's dead, then don't
	-- bother putting it back into the readytorun queue
	-- just remove the task from the list of tasks
	if task:getStatus() == "dead" then
		self:removeFiber(task)

		return true;
	end

	-- The only way the task will get back onto the readylist
	-- is if it's state is 'readytorun', otherwise, it will
	-- stay out of the readytorun list.
	if task.state == "readytorun" then
		self:scheduleTask(task, results);
	end
end

function Scheduler.yield(self, ...)
	return coroutine.yield(...);
end


return Scheduler

I think the code is pretty straightforward to follow, except for that 'step()' function.  So, what is a scheduler anyway?  Well, we want to have some level of 'concurrency' in our applications.  That is, I would love to be able to have the appearance of running several tasks at once on my machine, without much work.  In modern day computing, this is fairly well covered by the OS at the application level.  We can run several apps at the same time on our computers, and the kernel takes care of the details of how that gets done.  Modern kernels simply split time between tasks, giving some amount of time to each task in turn, so they feel like they are running in parallel, or concurrently.  It's a nice illusion, because the machines are fast enough, and there are enough times when a task will simply be waiting around anyway, that the amount of concurrency achieved can be very high.

So, what if you want some of that kernel goodness within the context of your app directly?  You want to handle mouse and keyboard, while doing some drawing and networking at the same time.  This is typically where ‘threads’ step in, and down the rabbit hole we go, getting involved in threading models, concurrency primitives and the like.  Well, in Lua, there’s a concurrency primitive built in, in the form of coroutines.  Coroutines are a fairly old style of concurrency whereby the application gets control of the CPU, and only gives it up willingly by calling a ‘yield()’ function at some point.  OK, great.  So, you start a routine, and at some point you call yield, and some other routine gets a chance to run.  But, which routine?  Aha, so this is where the ‘scheduler’ comes in.
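
To ground that, here's plain Lua coroutine usage, without any scheduler in sight:

local co = coroutine.create(function()
	print("step 1")
	coroutine.yield()        -- give up control, willingly
	print("step 2")
end)

coroutine.resume(co)    --> step 1
print("back in main")   -- we decide when co runs again
coroutine.resume(co)    --> step 2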

The scheduler might be considered the 'home' routine.  That is, once a routine gives up a slice of time by calling 'yield', the scheduler is the place it will return to.  At that point, the scheduler can decide which routine is to be run next.  It can do this because it has a list of ready-to-run tasks (those placed on the list using 'scheduleTask').  In this scheduler, it simply takes the next one off the list and tells it to resume.  That is the essence of the 'step()' function.

Great!  And what does it look like in action?

--test_scheduler.lua
package.path = package.path..";../?.lua"

Scheduler = require("scheduler")()
Task = require("task")
local taskID = 0;

local function getNewTaskID()
	taskID = taskID + 1;
	return taskID;
end

local function spawn(scheduler, func, ...)
	local task = Task(func, ...)
	task.TaskID = getNewTaskID();
	scheduler:scheduleTask(task, {...});
	
	return task;
end


local function task1()
	print("first task, first line")
	Scheduler:yield();
	print("first task, second line")
end

local function task2()
	print("second task, only line")
end

local function main()
	local t1 = spawn(Scheduler, task1)
	local t2 = spawn(Scheduler, task2)

	while (true) do
		--print("STATUS: ", t1:getStatus(), t2:getStatus())
		if t1:getStatus() == "dead" and t2:getStatus() == "dead" then
			break;
		end
		Scheduler:step()
	end
end

main()

That's the scheduler in isolation.  In this test case, you can see that task creation and the 'spawn' primitive aren't part of the scheduler proper.  They don't need to be.  The scheduler should only be concerned with deciding which task is going to run next.  In this way, you can imagine creating all sorts of different kinds of schedulers.  You could introduce the concept of priorities, or aging, or realtime, or whatever.  The same general idea would apply; the scheduler just needs to decide which task is going to run next.  Also of note here, the 'main()' function operates as the "event loop", if you will.  This is the bit of code that drives the scheduler between steps.  Having this separate from the scheduler proper will become advantageous down the road, when we want to compose the scheduler with a larger application framework.

So, this is the first step in revitalizing the concurrency core I built into the TINN project.  This time around, more composable, more feature rich, more easily cross platform, more bang for the buck.  Next time around, I’ll examine the “kernel”.


TINN Reboot

I always dread writing posts that start with “it’s been a long time since…”, but here it is.

It’s been a long time since I did anything with TINN.  I didn’t actually abandon it, I just put it on the back burner as I was writing a bunch of code in C/C++ over the past year.  I did do quite a lot of experimental stuff in TINN, adding new interfaces, trying out new classes, creating a better coroutine experience.

The thing with software is, a lot of testing is required to ensure things actually work as expected and fail gracefully when they don’t.  Some things I took from the ‘experimental’ category are:

fun.lua – A library of functional routines specifically built for LuaJIT and its great handling of tail recursion.

msiterators.lua – Some handy iterators that split out some very MS specific string types

Now that msiterators is part of the core, it makes it much easier to do things like query the system registry and get the list of devices, or batteries, or whatever, in a simple table form.  That opens up some of the other little experiments, like enumerating batteries, monitors, and whatnot, which I can add in later.

These are not earth shattering changes, and they don't represent a year's worth of waiting, but soon enough I'll create a new package with new goodness in it.  This begs the question: what is TINN useful for?  I originally created it for the purpose of doing network programming, like you could do with node.  Then it turned into a way of doing Windows programming in general.  Since TINN provides scripted access to almost all the interesting low level APIs that are in Windows, it's very handy for trying out how an API works, and whether it is good for a particular need.

In addition to just giving ready access to low level Windows APIs, it serves as a form of documentation as well.  When I look at a Windows API, it's not obvious how to handle all the parameters.  Which ones do I allocate?  Which ones come from the system?  Which special function do I call when I'm done?  Since I read the docs when I create the interface, the wrapper code encapsulates that reading of the documentation, and thus acts as an encapsulated source of knowledge that's sitting right there with the code.  Quite handy.

At any rate, TINN is not dead, long live TINN!

