Windows and Raspberry Pi, Oh my!
Posted: February 2, 2015 Filed under: Musings, RaspBerry Pi, TINN | Tags: windows Leave a commentI woke this morning to two strange realities. My sometimes beloved Seahawks did not win the SuperBowl, and the Raspberry Pi Foundation announced the Raspberry Pi 2, which will run Windows 10!
I’ll conveniently forget the first reality for now as there’s always next season. But that second reality? I’ve long been a fan of the Raspberry Pi. Not because of the specific piece of hardware, but because at the time it was first announced, it was the first of the somewhat reasonable $35 computers. The hardware itself has long since been eclipsed by other notables, but none of them have quite got the Raspberry Pi community thing going on, nor the volumes. Now the Pi is moving into “we use them for embedded” territory, not just for the kids to learn programming.
And now along comes Windows! This is interesting in two respects. First, I did quite a bit of work putting a LuaJIT skin on the Raspberry Pi some time back. At the time, I did it because I wanted to learn all about the deep down internals of the Raspberry Pi, but from the comforts of Lua. At the time, I leveraged an early form of the ljsyscall library to take care of the bulk of the *NIX specific system calls. I was going to go one step further and implement the very lowest interface to the Video chip, but that didn’t seem like a very worthwhile effort, so I left it at the Khronos OpenGL ES level.
At roughly the same time, I started implementing LuaJIT Win32 APIs, starting with LJIT2Win32. Then I went hog wild and implemeted TINN, which for me is the ultimate in LuaJIT APIs for Win32 systems. Both ljsyscall and TINN exist because programming at the OS level is a very tedious/esoteric process. Most of the time the low level OS specifics are paved over with one higher level API/framework or another. Well, these are in fact such frameworks, giving access to the OS at a very high level from the LuaJIT programming language.
So, this new Windows on Pi, what of it? Well, finally I can program the Raspberry Pi using the TINN tool. This is kind of cool for me. I’m not forced into using Linux on this tiny platform, where I might be more familiar with the Windows API and how things work. Even better, as TINN is tuned to running things like coroutines and IO Completion ports, I should be able to push the tiny device to its limits with respect to IO at least. Same goes for multi-threaded programming. All the goodness I’ve enjoyed on my Windows desktop will now be readily available to me on the tiny Pi.
The new pi is a quad core affair, which means the kids will learn about muteness, semaphores and the like… Well, actually, I’d expect the likes of the go language, TINN, and other tools to come to the rescue. The beauty of Windows on Pi is likely going to be the ease of programming. When I last programmed on the Pi directly, I used the nano editor, and print() for debugging. I couldn’t really use eclipse, as it was too slow back then. Now the Pi will likely just be a Visual Studio target, maybe even complete with simulator. That would be a great way to program. All the VS goodness that plenty of people have learned to love. Or maybe a slimmed down version that’s not quite so enterprise industrial.
But, what are these Pi used for anyway? Are they truly replacement PC? Are they media servers, NAS boxes, media players? The answer is YES to all, to varying degrees. Following along the ‘teach the kids to program’ theme, having an relatively inexpensive box that allows you to program can not be a bad thing. Making Windows and Linux available can not be a bad thing. Having a multi-billion dollar software company supporting your wares, MUST be a good thing. Love to hate Microsoft? Meh, lots of Windows based resources are available in the world, so, I don’t see how it does any harm.
On the very plus side, as this is a play towards makers, it will force Microsoft to consider the various and sundry application varieties that are currently being pursued by those outside the corporate enterprise space. Robotics will force a reconsideration of realtime constraints. As well, vision might become a thing. Creating an even more coherent story around media would be a great thing. And maybe bringing the likes of the Kinect to this class of machine? Well, not in this current generation.
The news on this monday is both melancholy and eye brow raising. I for one will be happy to program the latest Raspberry Pi using TINN.
TINN Reboot
Posted: January 14, 2015 Filed under: LuaJIT, News, TINN | Tags: core libraries, win32 Leave a commentI always dread writing posts that start with “it’s been a long time since…”, but here it is.
It’s been a long time since I did anything with TINN. I didn’t actually abandon it, I just put it on the back burner as I was writing a bunch of code in C/C++ over the past year. I did do quite a lot of experimental stuff in TINN, adding new interfaces, trying out new classes, creating a better coroutine experience.
The thing with software is, a lot of testing is required to ensure things actually work as expected and fail gracefully when they don’t. Some things I took from the ‘experimental’ category are:
fun.lua – A library of functional routines specifically built for LuaJIT and it’s great handling of tail recursion.
msiterators.lua – Some handy iterators that split out some very MS specific string types
Now that msiterators is part of the core, it makes it much easier to do things like query the system registry and get the list of devices, or batteries, or whatever, in a simple table form. That opens up some of the other little experiments, like enumerating batteries, monitors, and whatnot, which I can add in later.
There are not earth shattering, and don’t represent a year’s worth of waiting, but soon enough I’ll create a new package with new goodness in it. This begs the question, what is TINN useful for? I originally created it for the purpose of doing network programming, like you could do with node. Then it turned into a way of doing Windows programming in general. Since TINN provides scripted access to almost all the interesting low level APIs that are in Windows, it’s very handy for trying out how an API works, and whether it is good for a particular need.
In addition to just giving ready access to low level Windows APIs, it serves as a form of documentation as well. When I look at a Windows API, it’s not obvious how to handle all the parameters. Which ones do I allocate, which ones come from the system, which special function do I call when I’m done. Since I read the docs when I create the interface, the wrapper code encapsulates that reading of the documentation, and thus acts as an encapsulated source of knowledge that’s sitting right there with the code. Quite handy.
At any rate, TINN is not dead, long live TINN!
A Dictionary with a count
Posted: February 7, 2014 Filed under: Core Programming, LuaJIT, TINN | Tags: bag, luajit 1 CommentThe most interesting type in Lua is the table object. The table serves dual purposes. It can act as an array, as well as a hash table. As a dictionary, you can do simple things like:
local tbl = {} tbl["alpha"] = "alpha-value"
The problem with this construction is that you can not easily find out the number of items that are within the dictionary. The easiest way is to enumerate the whole thing, and keep a count:
local count = 0; for k,v in pairs(tbl) do count = count + 1; end
And so, what I really want is something that acts as a simple dictionary, but gives me the ability to easily find the count. I’ve created the “Bag”, which is simply a wrapper on a table, but it gives you a count.
local Bag = {} setmetatable(Bag, { __call = function(self, ...) return self:_new(...); end, }) local Bag_mt = { __index = function(self, key) --print("__index: ", key) return self.tbl[key] end, __newindex = function(self, key, value) --print("__newindex: ", key, value) if value == nil then self.__Count = self.__Count - 1; else self.__Count = self.__Count + 1; end --rawset(self, key, value) self.tbl[key] = value; end, __len = function(self) -- print("__len: ", self.__Count) return self.__Count; end, } function Bag._new(self, obj) local obj = { tbl = {}, __Count = 0, } setmetatable(obj, Bag_mt); return obj; end
Each Bag instance has a ‘tbl’ and a ‘__Count’. Within the metastable, both the ‘__newindex’ and the ‘__index’ are implemented. The ‘__newindex’ is used when you make an assignment. So, when I do:
tbl["alpha"] = "alpha-value"
The ‘__newindex’ is called. This will in turn simply put the value into the self.tbl table. While it does that, it also increments the count of values that are stored in the Bag. When you assign “nil” to an entry, this will remove it from the underlying table, and thus decrement the count.
Then there’s the ‘__len’ metamethod. This will normally return the length of an item. In the case of regular tables, as long as they are being used as arrays (contiguous indices), then it will return the number of items. In the case of a regular table being used as a dictionary, it will return 0. So, implementing it here gives the Bag the ability to use the convenient ‘#’ operator.
local Collections = require("Collections") local names = { alpha = "alpha-value"; beta = "beta-value"; gamma = "gamma-value"; } local bg = Collections.Bag(); for k,v in pairs(names) do print("adding: ", k, v) bg[k] = v; end; print("Count after add: ", #bg) bg["gamma"] = nil; print("Count, after 1 remove: ", #bg) print("beta: ", bg["beta"])
Lua 5.1 will not utilize the ‘__len’ metamethod, only 5.2 will do that. So, with LuaJIT, which is half way between the two, you need to make sure you compile with the -DLUAJIT_ENABLE_LUA52COMPAT flag, or you won’t get the expected behavior.
This is great, and achieves what I wanted. But, I still want it to act as a dictionary in terms of being iterable, so I should implement a __pairs metamethod, but that can wait.
So, there you have it. A quick and dirty improvement upon tables which saves me the headache and wasted CPU cycles of counting my dictionary entries.
There are many situations where I am using a simple dictionary, but I want to quickly find the count of items in the dictionary
Iterating Over Oddities – of strings, arrays, and counting from 0
Posted: February 5, 2014 Filed under: Core Programming, Lua, LuaJIT, TINN | Tags: functional programming, iterator, lua Leave a commentCould there possibly be anything more said about iterators and strings? Well, yah, actually tons. Last time around, I showed a simple iterator over a “string”. The focus was primarily on satisfying the job of parsing out null terminated strings from within a ‘null terminated’ string.
As I was doing that, I was also speculating as to whether I could use the exact same iterator to parse fixed sized records from an array of records. I actually wrote a giant iterator that does just that. Then I got to thinking, ‘this isn’t the way to do it’. Iterators, you see, can be broken down into constituent parts. The Lua documentation itself has quite a lot to say about iterators of various forms. One of the most promising bits of documentation is on 7.3 – Stateless Iterators. In order to pull this off, you split up the “iterator” into a few parts.
generator – The function that gets called every time you need a new value. The parameters that are passed to it are the “fixedpart”, and the “control”.
invariant state – this is the part of the iterator that doesn’t change much. For example, the source string that you might be iterating over.
index – this is the part that changes every time generator is called.
so, when you do the following:
local values = {'a', 'b', 'c'} for _idx, value in ipairs(values) do print(value) end
The ‘generator’ is some function returned from the ‘ipairs()’ function. It will be called again and again, until it returns nil.
The ‘invariant state’ is the ‘values’ array. This will be fed to the generator each time a new value is needed.
And last, the _idx, is the index value. It will also be fed to the generator, along with the ‘invariant state’.
So, how about applying this to my previous multi string iterator?
local ffi = require("ffi") local fun = require("fun") local floor = math.floor; -- a nil generator. -- good for cases when there's no data local function nil_gen(param, state) return nil end local function delim_gen(param, idx) local len = 0; while ((idx+len) < param.nelems) do --print("wchar: ", string.char(ffi.cast(param.basetypeptr, param.data)[idx + len])) if ffi.cast(param.basetypeptr, param.data)[idx + len] ~= param.separator then len = len + 1; else break end end if len == 0 then return nil; end return idx + len + 1, ffi.cast(param.basetypeptr, param.data)+idx, len end local function array_gen(param, idx) if idx >= param.nelems then return nil; end return idx+1, ffi.cast(param.basetypeptr, param.data)+idx, 1 end local function striter(params) if not params then return nil_gen, params, nil end if not params.data then return nil_gen, params, nil end params.datalength = params.datalength or #params.data if params.basetype then if type(params.basetype)== "string" then params.basetype = ffi.typeof(params.basetype) end end params.basetype = params.basetype or ffi.typeof("char") params.basetypeptr = ffi.typeof("const $ *", params.basetype) params.basetypesize = ffi.sizeof(params.basetype) params.nelems = math.floor(params.datalength / params.basetypesize) if params.separator ~= nil then return delim_gen, params, 0 else return array_gen, params, 0 end return nil_gen, nil, nil end
How to apply it?
local src3 = "big,boy,baby,bear,bounces,basketballs,behind,the,barn,," local function printAnsi(ptr, len) print(ffi.string(ptr, len)) end each(printAnsi, striter{data=src3, basetype="char"})
Here, I am using the Lua Fun ‘each’ function to drive my iterator. I could just as easily use a simple ‘for – in’ loop, but I’m getting all functional these days. What the last statement says is, “for each of the items coming out of the striter iterator, call the ‘printAnsi()’ function”.
The striter function is called with a table that contains the various parameters it will need. In this particular case, I’ve left off the parenthesis, because in Lua, if it’s just a single table value, or a string, you can do that.
So, how about that striter() function? Looking back, it has the job of returning a ‘generator’, ‘invariant state’, and a ‘index’. Well, it cheats a bit because the ‘invariant’ also contains the ‘index’. The ‘invariant’ is the fact that the table value doesn’t change, even though the contents can. This is just a matter of convenience.
At any rate, the striter() function decides which generator to return based on what it sees in the parameters. For example, if it sees a separator, then it will return the ‘delim_gen’ generator. That generator functions pretty much the same was as the one I created last time for the multi string thing. In the case where it doesn’t see a separator, it will return the ‘array_gen’ generator. That generator will assume it is being handed a pointer to an array of values of a particular type.
One thing to note that is different this time around from the mstrziter, Lua string creation does not occur within the iterator itself. Rather than return a string value, the iterator will simply return an offset and a length. It is up to the caller to determine what they want to do with the values.
This is kind of a key to an IEnumerable chain. Do the least amount of work as possibly, deferring really heavy work towards the end of your chain. This lazy evaluation makes for a more efficient chain. So, the ‘printAnsi’ function is at the end of the chain. It might have turned out that instead of creating strings at all, I might have wanted to send the values across a network to be stored in a database. In that case, the pointer, offset, length is perfect to be consumed directy by the Socket:send(buff, len) function, so no copying would be necessary.
How about that array case?
Let’s imagine I wanted to print out every value of the string one by one.
each(printAnsi, striter{data=src3, basetype="char"})
In this case, I’m creating an iterator, not specifying the separator (so the array_gen will be used). I have also specified the ‘basetype’ of the elements of my array. That’s so it can calculate how many there are, and create a pointer of the appropriate type. And you’re done!
Of course, the ‘basetype’ could just as easily be ‘BGR32’, or ‘PersonRecord’, or whatever fixed size type you so happen to have stored in some array. Makes for some fairly easy ‘tokenizing’ of array values.
To go further, what’s say you have a multi string based on ‘wchar_t’, and delimeted by ‘ ‘ (space) characters?
How about a little convenience function?
local function wmstriter(data, separator, datalength) if type(separator) == "string" then separator = string.byte(separator) end datalength = datalength or ffi.sizeof(data) return map(core_string.toAnsi, striter{data=data, datalength = datalength, basetype="wchar_t", separator=separator}) end
and using it:
local whello, whellolen = core_string.toUnicode("Hello World"); each(print, wmstriter(whello, ' '))
In this case, I create a ‘wchar_t’ based string, using the ‘toUnicode()’ function. Then I feed that to the wmstriter iterator, and print out each of the words, delimeted by a ‘(sp)’.
The core ‘striter()’ remains the same, and the generators don’t change. You just build up more useful ways of feeding them and consuming them.
I find this to be fairly useful and powerful. When you think about the generators, they have a fairly simple task. Do one small job, and do it well. The complexity of a larger system is gained through thoughtful composition of these simpler parts, rather than building hugely complex macro parts.
This is helpful for code maintenance. I find it hard to maintain largish single functions with hundreds of lines of code. I find it fairly easy to maintain a generator that has a few lines of code, and does a fairly simple job. I find the composition model easy as well. I can look back on it over time, read the functional chain, and understand what was intended.
And there you have it. More iterating over oddities.
Device Iteration with Functional Programming
Posted: January 29, 2014 Filed under: Core Programming, Lua, LuaJIT, Microsoft, System Programming, TINN | Tags: device enumeration, functional programming, lua fun, tinn, windows Leave a commentOne of the great pleasures I have in life is learning something new. There’s nothing greater than those ‘light bulb goes on’ moments as you realize something and gain a much deeper understanding than you had before.
Well, a little while ago, there was an announcement of this thing called Lua Fun. Lua Fun is a large set of functions which make functional programming in Lua really easy. It has the usual suspects such as map, reduce, filter, each, etc. If you read the documentation, you get a really good understanding of how iterators work in Lua, and more importantly, how LuaJIT is able to fold and manipulate things in hot loops such that the produced code is much tighter than anything I could possibly write in C, or any other language I so happen to use.
So, now I’m a fan of Lua Fun, and I would encourage anyone who’s both into Lua, and functional programming to take a look.
How to use it? I’ve been enumerating various types of things in the Windows system of late. Using the MMDevice subsystem, I was able to get an enumeration of the audio devices (that took a lot of COM work). What about displays, and disk drives, and USB devices, and… Yes, each one of those things has an attendant API which will facilitate monitoring said category. But, is there one to rule them all? Well yes, as it turns out, in most cases what these various APIs are doing is getting information out of the System Registry, and just presenting it reasonably.
There is a way to enumerate all the devices in the system. You know, like when you bring up the Device Manager in Windows, and you see a tree of devices, and their various details. The stage is set, how do you do that? I created a simple object that does the grunt work of enumerating the devices in the system.
The DeviceRecordSet is essentially a query. Creating an instance of the object just gives you a handle onto making query requests. Here is the code:
local ffi = require("ffi") local bit = require("bit") local bor = bit.bor; local band = bit.band; local errorhandling = require("core_errorhandling_l1_1_1"); local SetupApi = require("SetupApi") local WinNT = require("WinNT") local DeviceRecordSet = {} setmetatable(DeviceRecordSet, { __call = function(self, ...) return self:create(...) end, }) local DeviceRecordSet_mt = { __index = DeviceRecordSet, } function DeviceRecordSet.init(self, rawhandle) print("init: ", rawhandle) local obj = { Handle = rawhandle, } setmetatable(obj, DeviceRecordSet_mt) return obj; end function DeviceRecordSet.create(self, Flags) Flags = Flags or bor(ffi.C.DIGCF_PRESENT, ffi.C.DIGCF_ALLCLASSES) local rawhandle = SetupApi.SetupDiGetClassDevs( nil, nil, nil, Flags); if rawhandle == nil then return nil, errorhandling.GetLastError(); end return self:init(rawhandle) end function DeviceRecordSet.getNativeHandle(self) return self.Handle; end function DeviceRecordSet.getRegistryValue(self, key, idx) idx = idx or 0; did = ffi.new("SP_DEVINFO_DATA") did.cbSize = ffi.sizeof("SP_DEVINFO_DATA"); --print("HANDLE: ", self.Handle) local res = SetupApi.SetupDiEnumDeviceInfo(self.Handle,idx,did) if res == 0 then local err = errorhandling.GetLastError() --print("after SetupDiEnumDeviceInfo, ERROR: ", err) return nil, err; end local regDataType = ffi.new("DWORD[1]") local pbuffersize = ffi.new("DWORD[1]",260); local buffer = ffi.new("char[260]") local res = SetupApi.SetupDiGetDeviceRegistryProperty( self:getNativeHandle(), did, key, regDataType, buffer, pbuffersize[0], pbuffersize); if res == 0 then local err = errorhandling.GetLastError(); --print("after GetDeviceRegistryProperty, ERROR: ", err) return nil, err; end --print("TYPE: ", regDataType[0]) if (regDataType[0] == 1) or (regDataType[0] == 7) then return ffi.string(buffer, pbuffersize[0]-1) elseif regDataType[0] == ffi.C.REG_DWORD_LITTLE_ENDIAN then return ffi.cast("DWORD *", buffer)[0] end return nil; end function DeviceRecordSet.devices(self, fields) fields = fields or { {ffi.C.SPDRP_DEVICEDESC, "description"}, {ffi.C.SPDRP_MFG, "manufacturer"}, {ffi.C.SPDRP_DEVTYPE, "devicetype"}, {ffi.C.SPDRP_CLASS, "class"}, {ffi.C.SPDRP_ENUMERATOR_NAME, "enumerator"}, {ffi.C.SPDRP_FRIENDLYNAME, "friendlyname"}, {ffi.C.SPDRP_LOCATION_INFORMATION , "locationinfo"}, {ffi.C.SPDRP_LOCATION_PATHS, "locationpaths"}, {ffi.C.SPDRP_PHYSICAL_DEVICE_OBJECT_NAME, "objectname"}, {ffi.C.SPDRP_SERVICE, "service"}, } local function closure(fields, idx) local res = {} local count = 0; for _it, field in ipairs(fields) do local value, err = self:getRegistryValue(field[1], idx) if value then count = count + 1; res[field[2]] = value; end end if count == 0 then return nil; end return idx+1, res; end return closure, fields, 0 end return DeviceRecordSet
The ‘getRegistryValue()’ function is the real workhorse of this object. That’s what gets your values out of the system registry. The other function of importance is ‘devices()’. This is an iterator.
There are a couple of things of note about this iterator. First of all, it does not require ‘up values’ to be held onto. All that means is that everything the iterator needs to operate is carried in the return values from the function. The ‘state’ if you will, is handed in fresh every time the ‘closure()’ is called. This is the key to creating an iterator that will work well with Lua Fun.
By default, this iterator will return quite a few (but not all) fields related to each object, and it will return all the objects. This is ok, because there are typically less than 150 objects in any given system.
Now, I want to do various queries against this set without much fuss. This is where Lua Fun, and functional programming in general, really shines.
First, a little setup:
--test_enumdevices.lua local ffi = require("ffi") local DeviceRecordSet = require("DeviceRecordSet") local serpent = require("serpent") local Functor = require("Functor") local fun = require("fun")() local drs = DeviceRecordSet(); local function printIt(record) print("==========") each(print, record) print("----------") end
This creates an instance of the DeviceRecordSet object, which will be used in the queries. Already the printIt() function is utilizing Lua Fun. The ‘each()’ function will take whatever it’s handed, and perform the function specified. In this case, the ‘record’ will be a table. So, each will iterate over the table entries and print each one of them out. This is the equivalent of doing:
for k,v in pairs(record)
print(k, v)
end
I think that simply typing ‘each’ is a lot simpler and pretty easy to understand.
How about a query then?
-- show everything for every device each(printIt, drs:devices())
In this case, the ‘each’ is applied to the results of the ‘devices()’ iterator. For each record coming from the devices iterator, the printIt function will be called, which will in turn print out all the values in the record. That’s pretty nice.
What if I don’t want to see all the fields in the record, I just want to see the objectname, and description fields. Well, this is a ‘map’ operation, or a projection in database parlance, so:
-- do a projection on the fields local function projection(x) return {objectname = x.objectname, description = x.description} end each(printIt, map(projection, drs:devices()))
Working from the inside out, for each record coming from the devices() iterator, call the ‘projection’ function. The return value from the projection function becomes the new record for this iteration. For each of those records, call the printIt function.
Using ‘map’ is great as you can reshape data in any way you like without much fuss.
Lastly, I want to see only the records that are related to “STORAGE”, so…
-- show only certain records local function enumeratorFilter(x) return x.enumerator == "STORAGE" end each(printIt, filter(enumeratorFilter, drs:devices()))
Here, the ‘filter’ iterator is used. So, again, for each of the records coming from the ‘devices()’ enumerator, call the ‘enumeratorFilter’ function. If this function returns ‘true’ for the record, then it is passed along as the next record for the ‘each’. If ‘false’, then it is skipped, and the next record is tried.
This is pretty powerful, and yet simple stuff. The fact that iterators create new iterators, in tight loops, makes for some very dense and efficient code. If you’re interested in why this is so special in LuaJIT, and not many other languages, read up on the Lua Fun documentation.
I’ve killed two birds with one stone. I have finally gotten to the root of all device iterators. I have also learned how to best write iterators that can be used in a functional programming way. Judicious usage of this mechanism will surely make a lot of my code more compact and readable, as well as highly performant.
Asynchronous DNS lookups on Windows
Posted: January 28, 2014 Filed under: Core Programming, LuaJIT, Microsoft, Network Programming, System Programming, TINN | Tags: async, dns, getaddrinfo, tinn, windows Leave a commentI began this particular journey because I wanted to do DNS lookups asynchronously on Windows. There is of course a function for that:
DnsQueryEx
The problem I ran into is that unlike the various other Windows functions I’ve done with async, this one does not use an IO Completion Port. Instead it uses a mechanism called APC (Asynchronouse Procedure Call). With this little bit of magic, you pass in a pointer to a function which will be called in your thread context, kind of in between when other things are happening. Well, given the runtime environment I’m using, I don’t think this quite works out. Basically, I’d have a function being called leaving the VM in an unknown state.
So, I got to digging. I figured, how hard can it be to make calls to a DNS server directly? After all, it is nothing more than a network based service with a well known protocol. Once I could make a straight networking call, then I could go back to leveraging IO Completion Ports just like I do for all other IO.
You can view the DNS system as nothing more than a database to which you pose queries. You express your queries using some nice well defined protocol, which is ancient in origin, and fairly effective considering how frequently DNS queries are issued. Although I could man up and write the queries from scratch, Windows helps me here by providing functions that will format the query into a buffer for me.
But, before I get into that, what do the queries look like? What am I looking up? Well, a Domain Name Server serves up translations of names to other names and numbers. For example, I need to find the IP address of http://www.bing.com. I can look for CNAMES (an alias), or ‘A’ records (direct to an IP address. This gets esoteric and confusing, so a little code can help:
-- Prepare the DNS request local dwBuffSize = ffi.new("DWORD[1]", 2048); local buff = ffi.new("uint8_t[2048]") local wID = clock:GetCurrentTicks() % 65536; local res = windns_ffi.DnsWriteQuestionToBuffer_UTF8( ffi.cast("DNS_MESSAGE_BUFFER*",buff), dwBuffSize, ffi.cast("char *",strToQuery), wType, wID, true )
DnsWriteQuestionToBuffer_UTF8 is the Windows function which helps me to write a DNS query into a buffer, which will then be send to the actual dns server.
wType, represents the type of record you want to be returned. The values might be something like:
wType = ffi.C.DNS_TYPE_A wType = ffi.C.DNS_TYPE_MX - mail records wType = ffi.C.DNS_TYPE_CNAME
There are about a hundred different types that you can query for. The vast majority of the time though, you either looking for ‘A’, or ‘CNAME’ records.
The wID is just a unique identifier for the particular query so that if you’re issuing several on the same channel, you can check the response to ensure they match up.
OK. Now I have a DNS query stuffed into a buffer, how do I make the query and get the results?
-- Send the request. local IPPORT_DNS = 53; local remoteAddr = sockaddr_in(IPPORT_DNS, AF_INET); remoteAddr.sin_addr.S_addr = ws2_32.inet_addr( "209.244.0.3"); -- create the UDP socket local socket, err = NativeSocket( AF_INET, SOCK_DGRAM, IPPROTO_UDP ); -- send the query local iRes, err = socket:sendTo( ServerAddress, ffi.sizeof(ServerAddress), buff, dwBuffSize[0]);
This little bit of code shows the socket creation, and the actual ‘sendTo’ call. Of note, the “209.244.0.3” represents the IP address of a well known public DNS server. In this case it is hosted by Level 3, which is a internet services provider. There are of course calls you can make to figure out which DNS server your machine is typically configured to use, but this way the query will always work, no matter which network you are on.
Notice the socket is a UDP socket.
At this point, we’re already running cooperatively due to the fact that within TINN, all IO is done cooperatively, without the programmer needing to do much special.
Now to receive the query response back:
-- Try to receive the results local RecvFromAddr = sockaddr_in(); local RecvFromAddrSize = ffi.sizeof(RecvFromAddr); local cbReceived, err = self.Socket:receiveFrom(RecvFromAddr, RecvFromAddrSize, buff, 2048);
Basically just wait for the server to send back a response. Of course, like the sendTo, the receiveFrom works cooperatively, so that if the developer issues several ‘spawn’ commands, each query could be running in its own task, working cooperatively.
Once you have the response, you can parse out the results. The results come back as a set of records. There are of course functions which will help you to parse these records out. The key here is that the record type is indicated, and its up to the developer to pull out the relevant information.
The complete DNSNameServer class is here:
local ffi = require("ffi") local Application = require("Application") local windns_ffi = require("windns_ffi") local NativeSocket = require("NativeSocket") local ws2_32 = require("ws2_32") local Stopwatch = require("StopWatch") local clock = Stopwatch(); -- DNS UDP port local IPPORT_DNS = 53; local DNSNameServer = {} setmetatable(DNSNameServer, { __call = function(self, ...) return self:create(...) end, }) local DNSNameServer_mt = { __index = DNSNameServer, } function DNSNameServer.init(self, serveraddress) local socket, err = NativeSocket( AF_INET, SOCK_DGRAM, IPPROTO_UDP ); if not socket then return nil, err end local obj = { Socket = socket, ServerAddress = serveraddress, } setmetatable(obj, DNSNameServer_mt) return obj; end function DNSNameServer.create(self, servername) local remoteAddr = sockaddr_in(IPPORT_DNS, AF_INET); remoteAddr.sin_addr.S_addr = ws2_32.inet_addr( servername ); return self:init(remoteAddr) end -- Construct DNS_TYPE_A request, send it to the specified DNS server, wait for the reply. function DNSNameServer.Query(self, strToQuery, wType, msTimeout) wType = wType or ffi.C.DNS_TYPE_A msTimeout = msTimeout or 60 * 1000 -- 1 minute -- Prepare the DNS request local dwBuffSize = ffi.new("DWORD[1]", 2048); local buff = ffi.new("uint8_t[2048]") local wID = clock:GetCurrentTicks() % 65536; local res = windns_ffi.DnsWriteQuestionToBuffer_UTF8( ffi.cast("DNS_MESSAGE_BUFFER*",buff), dwBuffSize, ffi.cast("char *",strToQuery), wType, wID, true ) if res == 0 then return false, "DnsWriteQuestionToBuffer_UTF8 failed." end -- Send the request. local iRes, err = self.Socket:sendTo(self.ServerAddress, ffi.sizeof(self.ServerAddress), buff, dwBuffSize[0]); if (not iRes) then print("Error sending data: ", err) return false, err end -- Try to receive the results local RecvFromAddr = sockaddr_in(); local RecvFromAddrSize = ffi.sizeof(RecvFromAddr); local cbReceived, err = self.Socket:receiveFrom(RecvFromAddr, RecvFromAddrSize, buff, 2048); if not cbReceived then print("Error Receiving Data: ", err) return false, err; end if( 0 == cbReceived ) then return false, "Nothing received" end -- Parse the DNS response received with DNS API local pDnsResponseBuff = ffi.cast("DNS_MESSAGE_BUFFER*", buff); windns_ffi.DNS_BYTE_FLIP_HEADER_COUNTS ( pDnsResponseBuff.MessageHead ); if pDnsResponseBuff.MessageHead.Xid ~= wID then return false, "wrong transaction ID" end local pRecord = ffi.new("DNS_RECORD *[1]",nil); iRes = windns_ffi.DnsExtractRecordsFromMessage_W( pDnsResponseBuff, cbReceived, pRecord ); pRecord = pRecord[0]; local pRecordA = ffi.cast("DNS_RECORD *", pRecord); local function closure() if pRecordA == nil then return nil; end if pRecordA.wType == wType then local retVal = pRecordA pRecordA = pRecordA.pNext return retVal; end -- Find the next record of the specified type repeat pRecordA = pRecordA.pNext; until pRecordA == nil or pRecordA.wType == wType if pRecordA ~= nil then local retVal = pRecordA pRecordA = pRecordA.pNext return retVal; end -- Free the resources if pRecord ~= nil then windns_ffi.DnsRecordListFree( pRecord, ffi.C.DnsFreeRecordList ); end return nil end return closure end function DNSNameServer.A(self, domainToQuery) return self:Query(domainToQuery, ffi.C.DNS_TYPE_A) end function DNSNameServer.MX(self, domainToQuery) return self:Query(domainToQuery, ffi.C.DNS_TYPE_MX) end function DNSNameServer.CNAME(self, domainToQuery) return self:Query(domainToQuery, ffi.C.DNS_TYPE_CNAME) end function DNSNameServer.SRV(self, domainToQuery) return self:Query(domainToQuery, ffi.C.DNS_TYPE_SRV) end return DNSNameServer
Notice at the end there are some convenience functions for a few of the well known DNS record types. The ‘Query()’ function is generic, and will return records of any type. These convenience functions just make it easier.
And how to use it?
local ffi = require("ffi") local DNSNameServer = require("DNSNameServer") local core_string = require("core_string_l1_1_0") --local serveraddress = "10.211.55.1" -- xfinity local serveraddress = "209.244.0.3" -- level 3 local domains = { "www.nanotechstyles.com", "www.adafruit.com", "adafruit.com", "adamation.com", "www.adamation.com", "microsoft.com", "google.com", "ibm.com", "oracle.com", "sparkfun.com", "apple.com", "netflix.com", "www.netflix.com", "www.us-west-2.netflix.com", "www.us-west-2.prodaa.netflix.com", "news.com", "hardkernel.org", "amazon.com", "walmart.com", "target.com", "godaddy.com", "luajit.org", } local function queryA() local function queryDomain(name) local dns = DNSNameServer(serveraddress) -- ms corporate print("==== DNS A ====> ", name) for record in dns:A(name) do local a = IN_ADDR(); a.S_addr = record.Data.A.IpAddress print(string.format("name: %s\tIP: %s, TTL %d", name, a, record.dwTtl)); end end for _, name in ipairs(domains) do spawn(queryDomain, name) --queryDomain(name) end end local function queryCNAME() local dns = DNSNameServer(serveraddress) -- ms corporate local function queryDomain(name) print("==== DNS CNAME ====> ", name) for record in dns:CNAME(name) do print(core_string.toAnsi(record.pName), core_string.toAnsi(record.Data.CNAME.pNameHost)) end end for _, name in ipairs(domains) do queryDomain(name) end end local function queryMX() local function queryDomain(name) local dns = DNSNameServer(serveraddress) -- ms corporate print("==== DNS MX ====> ", name) for record in dns:MX(name) do print(core_string.toAnsi(record.pName), core_string.toAnsi(record.Data["MX"].pNameExchange)) end end for _, name in ipairs(domains) do spawn(queryDomain, name) end end local function querySRV() local dns = DNSNameServer(serveraddress) -- ms corporate for _, name in ipairs(domains) do print("==== DNS SRV ====> ", name) for record in dns:SRV(name) do print(core_string.toAnsi(record.pName), core_string.toAnsi(record.Data.SRV.pNameTarget)) end end end local function main() queryA(); --queryCNAME(); --queryMX(); --querySRV(); end run(main)
The function queryA() will query for the ‘A’ records, and print them out. Notice that it has knowledge of the giant union structure that contains the results, and it pulls out the specific information for ‘A’ records. It will create a new instance of the DNSNameServer for each query. That’s not as bad as it might seem. All it amounts to is creating a new UDP socket for each query. Since each query is spawned into its own task, they are all free to run and complete independently, which was the goal of this little exercise.
In the case of the CNAME query, there is only a single socket, and it is used repeatedly, serially, for each query.
The difference between the two styles is noticeable. For the serial case, the queries might ‘take a while’, because you have to wait for each result to come back before issuing the next query. In the cooperative case, you issue several queries in parallel, so the total time will only be as long as the longest query.
That’s a good outcome.
I like this style of programming. You go as low as you can to root out where the system might otherwise block, and you make that part cooperative. That way everything else above it is automatically cooperative. I also like the fact that it feels like I’m getting some parallelism, but I’m not using any of the typical primitives of parallelism, including actual threads, mutexes, and the like.
Well, that’s a hefty bit of code, and it serves the purpose I set out, so I’m a happy camper. Now, if I could just turn those unions into tables automatically…
All the pretty little asyncs…
Posted: January 24, 2014 Filed under: Core Programming, Lua, LuaJIT, Microsoft, System Programming, TINN | Tags: async, deviceiocontrol, microsoft, tinn, windows Leave a commentI have gone on about various forms of async for quite some time now. So could there possibly be more? Well, yes of course!
Here’s the scenario I want to enable. I want to keep track of my file system activity, sending the various operations to a distant storage facility. I want to do this while a UI displays what’s going on, and I want to be able to configure things while its happening, like which events I really care to shadow, and the like.
I don’t want to use multiple OS level threads if I can at all avoid them as they will complicate my programming tremendously. So, what to do.
Well, first I’ll start with the file tracking business. I have talked about change journals in the past. This is a basic mechanism that Windows has to track changes to the file system. Every single open, close, delete, write, etc, has an entry in this journal. If you’re writing a backup program, you’ll be using change journals.
The essence of the change journal functionality is usage of the DeviceIoControl() function. Most of us are very familiar with the likes of CreateFile(), ReadFile(), WriteFile(), CloseHandle(), when it comes to dealing with files. But, for everything else, there is this DeviceIOControl() function.
What is a device? Well, you’d be surprised to learn that most things in the Windows OS are represented by ‘devices’ just like they are in UNIX systems. For example, ‘C:’, is a device. But, also “DISPLAY1” is also a device, as are “LCD” and “PhysicalDisk0”. When it comes to controlling devices, the Win32 level API calls will ultimately make DeviceIoControl() calls with various parameters. That’s great to know as it allows you to create whatever API you want, as long as you know the nuances of the device driver you’re trying to control.
But, I digress. The key point here is that I can open up a device, and I can make a DeviceIoControl() call, and true to form, I can use OVERLAPPED structures, and IO Completion Ports. That makes these calls “async”, or with TINN, cooperative.
To wrap it up in a tidy little bow, here is a Device class which does the grunt work for me:
--[[ References http://msdn.microsoft.com/en-us/magazine/cc163415.aspx --]] local ffi = require("ffi") local bit = require("bit") local bor = bit.bor; local core_file = require("core_file_l1_2_0"); local core_io = require("core_io_l1_1_1"); local Application = require("Application") local IOOps = require("IOOps") local FsHandles = require("FsHandles"); local errorhandling = require("core_errorhandling_l1_1_1"); local WinBase = require("WinBase"); local Device = {} setmetatable(Device, { __call = function(self, ...) return self:open(...) end, }) local Device_mt = { __index = Device, } function Device.init(self, rawhandle) local obj = { Handle = FsHandles.FsHandle(rawhandle) } setmetatable(obj, Device_mt) Application:watchForIO(rawhandle, rawhandle) return obj; end function Device.open(self, devicename, dwDesiredAccess, dwShareMode) local lpFileName = string.format("\\\\.\\%s", devicename); dwDesiredAccess = dwDesiredAccess or bor(ffi.C.GENERIC_READ, ffi.C.GENERIC_WRITE); dwShareMode = bor(FILE_SHARE_READ, FILE_SHARE_WRITE); local lpSecurityAttributes = nil; local dwCreationDisposition = OPEN_EXISTING; local dwFlagsAndAttributes = FILE_FLAG_OVERLAPPED; local hTemplateFile = nil; local handle = core_file.CreateFileA( lpFileName, dwDesiredAccess, dwShareMode, lpSecurityAttributes, dwCreationDisposition, dwFlagsAndAttributes, hTemplateFile); if handle == INVALID_HANDLE_VALUE then return nil, errorhandling.GetLastError(); end return self:init(handle) end function Device.getNativeHandle(self) return self.Handle.Handle; end function Device.createOverlapped(self, buff, bufflen) local obj = ffi.new("FileOverlapped"); obj.file = self:getNativeHandle(); obj.OVL.Buffer = buff; obj.OVL.BufferLength = bufflen; return obj; end function Device.control(self, dwIoControlCode, lpInBuffer, nInBufferSize, lpOutBuffer, nOutBufferSize) local lpBytesReturned = nil; local lpOverlapped = self:createOverlapped(ffi.cast("void *", lpInBuffer), nInBufferSize); local status = core_io.DeviceIoControl(self:getNativeHandle(), dwIoControlCode, ffi.cast("void *", lpInBuffer), nInBufferSize, lpOutBuffer, nOutBufferSize, lpBytesReturned, ffi.cast("OVERLAPPED *",lpOverlapped)); local err = errorhandling.GetLastError(); -- Error conditions -- status == 1, err == WAIT_TIMEOUT (258) -- status == 0, err == ERROR_IO_PENDING (997) -- status == 0, err == something else if status == 0 then if err ~= ERROR_IO_PENDING then return false, err end end local key, bytes, ovl = Application:waitForIO(self, lpOverlapped); return bytes; end return Device
I’ve shown this kind of construct before with the NativeFile object. That object contains Read, and Write functions as well, but lacks the control() function. Of course the two could be combined for maximum benefit.
How to use this thing?
dev = Device("c:") dev:control(...)
OK, that’s out of the way. Now, what about this change journal thing? Very simple now that the device is handled.
A change journal can look like this:
-- USNJournal.lua -- References -- http://msdn.microsoft.com/en-us/library/windows/desktop/aa364563(v=vs.85).aspx -- http://www.microsoft.com/msj/0999/journal/journal.aspx -- http://www.microsoft.com/msj/1099/journal2/journal2.aspx -- local ffi = require("ffi"); local bit = require("bit"); local bor = bit.bor; local band = bit.band; local core_io = require("core_io_l1_1_1"); local core_file = require("core_file_l1_2_0"); local WinIoCtl = require("WinIoCtl"); local WinBase = require("WinBase"); local errorhandling = require("core_errorhandling_l1_1_1"); local FsHandles = require("FsHandles"); local Device = require("Device") --[[ ChangeJournal An abstraction for NTFS Change journal management --]] local ChangeJournal = {} setmetatable(ChangeJournal, { __call = function(self, ...) return self:open(...); end, }); local ChangeJournal_mt = { __index = ChangeJournal; } ChangeJournal.init = function(self, device) local obj = { Device = device; } setmetatable(obj, ChangeJournal_mt); local jinfo, err = obj:getJournalInfo(); print("ChangeJournal.init, jinfo: ", jinfo, err) if jinfo then obj.JournalID = jinfo.UsnJournalID; obj.LowestUsn = jinfo.LowestValidUsn; obj.FirstUsn = jinfo.FirstUsn; obj.MaxSize = jinfo.MaximumSize; obj.MaxUsn = jinfo.MaxUsn; obj.AllocationSize = jinfo.AllocationDelta; end return obj; end ChangeJournal.open = function(self, driveLetter) local device, err = Device(driveLetter) if not device then print("ChangeJournal.open, ERROR: ", err) return nil, err end return self:init(device); end ChangeJournal.getNextUsn = function(self) local jinfo, err = self:getJournalInfo(); if not jinfo then return false, err; end return jinfo.NextUsn; end ChangeJournal.getJournalInfo = function(self) local dwIoControlCode = FSCTL_QUERY_USN_JOURNAL; local lpInBuffer = nil; local nInBufferSize = 0; local lpOutBuffer = ffi.new("USN_JOURNAL_DATA"); local nOutBufferSize = ffi.sizeof(lpOutBuffer); local success, err = self.Device:control(dwIoControlCode, lpInBuffer, nInBufferSize, lpOutBuffer, nOutBufferSize); if not success then return false, errorhandling.GetLastError(); end return lpOutBuffer; end function ChangeJournal.waitForNextEntry(self, usn, ReasonMask) usn = usn or self:getNextUsn(); local ReasonMask = ReasonMask or 0xFFFFFFFF; local ReturnOnlyOnClose = false; local Timeout = 0; local BytesToWaitFor = 1; local ReadData = ffi.new("READ_USN_JOURNAL_DATA", {usn, ReasonMask, ReturnOnlyOnClose, Timeout, BytesToWaitFor, self.JournalID}); local pusn = ffi.new("USN"); -- This function does not return until the USN -- record exits local BUF_LEN = ffi.C.USN_PAGE_SIZE; local Buffer = ffi.new("uint8_t[?]", BUF_LEN); local dwBytes = ffi.new("DWORD[1]"); local success, err = self.Device:control(FSCTL_READ_USN_JOURNAL, ReadData, ffi.sizeof(ReadData), Buffer, BUF_LEN); if not success then return false, err end local UsnRecord = ffi.cast("PUSN_RECORD", ffi.cast("PUCHAR",Buffer) + ffi.sizeof("USN")); return UsnRecord; end return ChangeJournal;
This very much looks like the change journal I created a few months back. The primary difference is the device control stuff is abstracted out into the Device object, so it does not need to be repeated here.
When we want to track the changes to the device, we make repeated calls to ‘waitForNextEntry’.
local function test_waitForNextEntry(journal) local entry = journal:waitForNextEntry(); while entry do printJournalEntry(entry); entry = journal:waitForNextEntry(); end end
This is your typical serially written code. There’s nothing that look special about it, no hint of async processing. Behind the covers though, way back in the Device:control() function, the actual sending of a command to the device happens using IO Completion Port, so if you’re running with TINN, this particular task will ‘waitForIO’, and other tasks can continue.
So, using it in context looks like this:
local function main() local journal, err = ChangeJournal("c:") spawn(test_waitForNextEntry, journal); periodic(function() print("tick") end, 1000) end run(main)
In this case, I spawn the journal waiting/printing thing as its own task. Then I setup a periodic timer to simply print “tick” every second, to show there is some activity.
Since the journaling is cooperative (mostly waiting for io control to complete), the periodic timer, or UI processing, or what have you, is free to run, without any hindrance.
Combine this with the already cooperative UI stuff, and you can imagine how easy it could be to construct the scenario I set out to construct. Since all networking and file system operations in TINN are automatically async, it would be easy to log these values, or send them across a network to be stored for later analysis or what have you.
And there you have it. Async everywhere makes for some very nice scenarios. Being able to do async on any device, whether with standard read/write operations, or io control, makes for very flexible programming.
Next time around, I’m going to show how to do async DNS queries for fun and profit.
TINN Version 0.7 Available
Posted: January 12, 2014 Filed under: Computicle, Core Programming, Lua, LuaJIT, System Programming, TINN | Tags: computicle, lua, luajit, tinn 4 CommentsAlthough TINN is under constant development, there’s nothing like declaring a new “release”. It’s been 3 months since the 0.6 release. So, now there is a 0.7 release. You can read about the TINN v0.7 Release and install it if you would like.
There were 84 commits since the previous release, so I can’t even remember all the changes that were involved. The major addition from my most recent work has to do with the new scheduler as described on this blog. That’s the extensible, plug-in driven scheduler. Pretty nifty for my work at least.
There are quite a few additions, such as a revamped stream object, io completion port supported file interface, a logfile thing, and quite a few more interfaces from the OS.
Other items I have been working on include support for various COM interfaces such as DXGI, DXVA, MMDevice and some others. There are all in the “experimental” folder if you look at the enlistment. They are not quite ready for prime time, so they’re not actually in the v0.7 release.
What can you do with TINN? In short, you can create all sorts of Windows based applications. Everything from scalable web services to interactive multi-tasking UI (including OpenGL based).
TINN is a command line tool (tinn.exe). As such, the easiest thing to do is bring up a command line shell and run various scripts through tinn.
c:\> tinn.exe hello.lua
The TINN repository contains numerous test cases that utilize the various modules of TINN.
That’s it for now. Next release will be about cleanup and stabilization primarily.
Parallel Conversations
Posted: January 8, 2014 Filed under: Core Programming, Lua, LuaJIT, System Programming, TINN | Tags: event, lua, luajit, multi tasking, signal 1 CommentI have been tinkering with words of late. I’ve added ‘waitFor’, ‘when’ and ‘whenever’ to the TINN lexicon, and my programming is becoming easier, more organized, and easier to describe.
Recently I added another set of words: ‘waitSignal’, ‘signalOne’, and ‘signalAll’. If you were using another system, you might call these ‘events’, but really they’re just signaling.
Here’s how I might use them:
local Application = require("Application")(true) local function waiter(num) num = num or 0 local function closure() print(string.format("WAITED: %d",num)) end return closure; end local function main() for i=1,4 do onSignal(waiter(i), "waiting") end -- signal only one of them signalOne("waiting") -- sleep a bit giving other tasks -- a chance to run sleep(500) -- signal the rest of them signalAll("waiting")) sleep(2000) print("SLEEP AFTER signalAll") end run(main)
First, it spawns 4 tasks which will all be waiting on the ‘waiting’ signal. With ‘signalOne’, only 1 of the 4 waiting tasks will be awakened and continue execution. With the ‘signalAll’, the rest of the waiting tasks are all scheduled to run, and thus continue from where they left of.
The core primitive for signaling is the ‘waitSignal()’ function. This will essentially suspend the currently running task until the specified signal has been given. The ‘onSignal()’ function is a convenience which will spawn a separate task which will do the waiting and execute the specified function at the appropriate time.
If you were using another language, these might be ‘onEvent()’ and ’emit()’. Of course it’s really easy to code up that way, just create a name alias to whatever you like. You could even do:
local function on(eventName, func) return onSignal(func, eventName) end
So, these are some more words that are in the parallel programming arsenal. In totality, the set is:
Time related
- sleep
- delay
- periodic
Predicates
- waitFor
- when
- whenever
Signaling
- onSignal
- signalOne
- signalAll
- waitSignal
Task and scheduler
- run
- stop
- spawn
- yield
With these words in hand, doing my multi-task programming is becoming easier by the moment. I don’t worry about the lower level multi-tasking primitive such as mutexes, locks, and the like. I’m not really that concerned with timers, thread pools, or any of the other machinery that makes this all happen. I just write my simple code.
One thing has struck me though. This is all great for a single threaded environment. Basically collaborative multi-tasking. In this age of super fast CPUs, it turns out that going single threaded is just fine. In many cases it’s actually better than going preemptive multi-threaded because you don’t necessarily incur as much OS thread level context switching.
I’d like to go one step further though. A long time ago we didn’t have L1/L2 caches on the CPUs, then they came along, got better, and now are fairly large. As a programmer, my concern for them is fairly minimal. I don’t do much to ensure that the caches are used properly. I just write my code, and either the compiler, or the CPU itself deals with all the mechanics. Same goes with prefetching instructions and the like. I don’t write my code to make that job any easier, it just happens automagically.
Here I am contemplating the same thing about threads. As my CPU has increasingly more cores, I am wondering if the CPU cycles should be viewed the same as we viewed levels of memory in the past. Is the a C1/C2 level of core execution. That is, most things should stay on the same thread because switching to a different core is costly. Something very low level should decide on when that switching occurs.
My application/runtime can know a lot about what I’m going to do because the thing is all laid out in script. After some analysis, perhaps I can decide “this thing is going to sleep for more than 500 ms, therefore I’m going to actually put it over here on this other thread which is actually sleeping.”. In that way, I can reduce my resource usage because I won’t be polling the timer every time through the scheduling loop, I can be smarter.
If I had such smartness, then when I have 16 – 100 cores, I’ll be able to easily take advantage of those multiple cores without having to truly understand and navigate that madness on my own. Let the machine figure it out, it’s much better at that than I am.
In the meanwhile, I’ve got these multi-tasking primitives which are making my life a lot easier when it comes to programming fairly complex systems.
Lua Coroutine Roundup
Posted: January 3, 2014 Filed under: Core Programming, Lua, LuaJIT, System Programming, TINN | Tags: coroutine, lua, luajit, scheduler Leave a commentMy WordPress dashboard says this is my 250th post. I figure what better way to mark the occasion than to do a little bit of a roundup on one of my favorite topics.
One of the most awesome features of the Lua language is a built in co-routine mechanism. I’ve talked about co-routines quite a bit, doing a little series on the basics of coroutines, all the way up through a scheduler for multi-tasking.
Coroutines give the programmer the illusion of creating a parallel processing environment. Basically, write up your code “spawn” different coroutines, and then just forget about them… Well, almost. The one thing the Lua language does not give you is a way to manage those coroutines. Co-routines are a mechanism by which ‘threads’ can do cooperative multi-tasking. This is different from the OS level preemptive multitasking that you get with ‘threads’ on most OS libraries. So, I can create coroutines, resume, and yield. It’s the ‘yield’ that gives coroutines the multi-tasking flavor. One co-routine yields, and another co-routine must be told to ‘resume’. Something has to do the telling, and keep track of who’s yielded, and who needs to be resumed. In steps the scheduler.
Summaries from my own archives
Lua Coroutines – Getting Started
Hurry Up and Wait – TINN Timing
Computicles – A tale of two schedulers
Parallel Computing is Child’s Play
Multitasking single threaded UI – Gaming meets networking
Alertable Predicates? What’s that?
Pinging Peers – Creating Network Meshes
From the Interwebs
GitHub Gist: Deco / coroutine_scheduler.lua
Beginner’s Guide to Coroutines – Roblox Wiki
And of course the search engines will show you thousands more:
I think it would be really great if the Lua language itself had some form of rudimentary scheduler, but scheduling is such a domain specific thing, and the rudiments are so basic, that it’s possibly not worth providing such support. The Go language supports a basic scheduler, and has coroutine support, to great effect I think.
The simple scheduler is simple, just do a round robbin execution of one task after the next as each task yields. I wrote a simple dispatcher which started with exactly this. Then interesting things start to happen. You’d like to have your tasks go into a wait state if they’re doing an IO operation, and not be scheduled again until some IO has occurred. Or, you want to put a thread to sleep for some amount of time. Or, you want a thread to wait on a condition of one sort or another, or signals, or something else. Pretty soon, you basic scheduler is hundreds of lines of code, and things start to get rather complicated.
After going round and round in this problem space, I finally came up with a solution that I think has some legs. Rather than a complex scheduler, I promote a simple scheduler, and add functions to it through normal thread support. So, here is my final ‘scheduler’ for 2013:
local ffi = require("ffi"); local Collections = require("Collections"); local StopWatch = require("StopWatch"); --[[ The Scheduler supports a collaborative processing environment. As such, it manages multiple tasks which are represented by Lua coroutines. --]] local Scheduler = {} setmetatable(Scheduler, { __call = function(self, ...) return self:create(...) end, }) local Scheduler_mt = { __index = Scheduler, } function Scheduler.init(self, scheduler) local obj = { Clock = StopWatch(); TasksReadyToRun = Collections.Queue(); } setmetatable(obj, Scheduler_mt) return obj; end function Scheduler.create(self, ...) return self:init(...) end --[[ Instance Methods --]] function Scheduler.tasksArePending(self) return self.TasksReadyToRun:Len() > 0 end function Scheduler.tasksPending(self) return self.TasksReadyToRun:Len(); end function Scheduler.getClock(self) return self.Clock; end --[[ Task Handling --]] function Scheduler.scheduleTask(self, afiber, ...) if not afiber then return false, "no fiber specified" end afiber:setParams(...); self.TasksReadyToRun:Enqueue(afiber); afiber.state = "readytorun" return afiber; end function Scheduler.removeFiber(self, fiber) --print("DROPPING DEAD FIBER: ", fiber); return true; end function Scheduler.inMainFiber(self) return coroutine.running() == nil; end function Scheduler.getCurrentFiber(self) return self.CurrentFiber; end function Scheduler.step(self) -- Now check the regular fibers local task = self.TasksReadyToRun:Dequeue() -- If no fiber in ready queue, then just return if task == nil then return true end if task:getStatus() == "dead" then self:removeFiber(task) return true; end -- If the task we pulled off the active list is -- not dead, then perhaps it is suspended. If that's true -- then it needs to drop out of the active list. -- We assume that some other part of the system is responsible for -- keeping track of the task, and rescheduling it when appropriate. if task.state == "suspended" then return true; end -- If we have gotten this far, then the task truly is ready to -- run, and it should be set as the currentFiber, and its coroutine -- is resumed. self.CurrentFiber = task; local results = {task:resume()}; -- no task is currently executing self.CurrentFiber = nil; -- once we get results back from the resume, one -- of two things could have happened. -- 1) The routine exited normally -- 2) The routine yielded -- -- In both cases, we parse out the results of the resume -- into a success indicator and the rest of the values returned -- from the routine local success = results[1]; table.remove(results,1); --print("SUCCESS: ", success); if not success then print("RESUME ERROR") print(unpack(results)); end -- Again, check to see if the task is dead after -- the most recent resume. If it's dead, then don't -- bother putting it back into the readytorun queue -- just remove the task from the list of tasks if task:getStatus() == "dead" then --print("Scheduler, DEAD coroutine, removing") self:removeFiber(task) return true; end -- The only way the task will get back onto the readylist -- is if it's state is 'readytorun', otherwise, it will -- stay out of the readytorun list. if task.state == "readytorun" then self:scheduleTask(task, results); end end --[[ Primary Interfaces --]] function Scheduler.suspend(self, aTask) if not aTask then self.CurrentFiber.state = "suspended" return self:yield() end aTask.state = "suspended"; return true end function Scheduler.yield(self, ...) return coroutine.yield(...); end --[[ Running the scheduler itself --]] function Scheduler.start(self) if self.ContinueRunning then return false, "scheduler is already running" end self.ContinueRunning = true; while self.ContinueRunning do self:step(); end --print("FINISHED STEP ITERATION") end function Scheduler.stop(self) self.ContinueRunning = false; end return Scheduler
At 195 LOC, this is back down to the basic size of the original simple dispatcher that I started with. There are only a few functions to this interface:
- scheduleTask()
- step()
- suspend()
- yield()
- start()
These are the basics to do some scheduling of tasks. There is an assumption that there is an object that can be scheduled, such that ‘scheduleTask’ can attach some metadata, and the like. Other than that, there’s not much here but the basic round robin scheduler.
So, how do you make a more complex scheduler, say one that deals with time? Well, first of all, I’ve also added an Application object, which actually contains a scheduler, and supports various other things related to an application, including add ons to the scheduler. Here’s an excerpt of the Application object construction.
function Application.init(self, ...) local sched = Scheduler(); waitForIO.MessageQuanta = 0; local obj = { Clock = Stopwatch(); Scheduler = sched; TaskID = 0; wfc = waitForCondition(sched); wft = waitForTime(sched); wfio = waitForIO(sched); } setmetatable(obj, Application_mt) -- Create a task for each add-on obj:spawn(obj.wfc.start, obj.wfc) obj:spawn(obj.wft.start, obj.wft) obj:spawn(obj.wfio.start, obj.wfio) return obj; end
In the init(), the Application object creates instances for the add ons to the scheduler. These add-ons (waitForCondition, waitForTime, waitForIO) don’t have to adhere to much of an interface, but you do need some function which is callable so that it can be spawned. The Application construction ends with the spawning of the three add-ons as normal threads in the scheduler. Yep, that’s right, the add ons are just like any other thread, running in the scheduler. You might think; But these things need to run with a higher priority than regular threads! And yes, in some cases they do need that. But hay, that’s an exercise for the scheduler. If the core scheduler gains a ‘priority’ mechanism, then these threads can be run with a higher priority, and everything is good.
And what do these add ons look like? I’ve gone over the before, but here’s an example of the timing one:
local Functor = require("Functor") local tabutils = require("tabutils"); local waitForTime = {} setmetatable(waitForTime, { __call = function(self, ...) return self:create(...) end, }) local waitForTime_mt = { __index = waitForTime; } function waitForTime.init(self, scheduler) local obj = { Scheduler = scheduler; TasksWaitingForTime = {}; } setmetatable(obj, waitForTime_mt) --scheduler:addQuantaStep(Functor(obj.step,obj)); --scheduler:spawn(obj.step, obj) return obj; end function waitForTime.create(self, scheduler) if not scheduler then return nil, "no scheduler specified" end return self:init(scheduler) end function waitForTime.tasksArePending(self) return #self.TasksWaitingForTime > 0 end function waitForTime.tasksPending(self) return #self.TasksWaitingForTime; end local function compareTaskDueTime(task1, task2) if task1.DueTime < task2.DueTime then return true end return false; end function waitForTime.yieldUntilTime(self, atime) --print("waitForTime.yieldUntilTime: ", atime, self.Scheduler.Clock:Milliseconds()) --print("Current Fiber: ", self.CurrentFiber) local currentFiber = self.Scheduler:getCurrentFiber(); if currentFiber == nil then return false, "not currently in a running task" end currentFiber.DueTime = atime; tabutils.binsert(self.TasksWaitingForTime, currentFiber, compareTaskDueTime); return self.Scheduler:suspend() end function waitForTime.yield(self, millis) --print('waitForTime.yield, CLOCK: ', self.Scheduler.Clock) local nextTime = self.Scheduler.Clock:Milliseconds() + millis; return self:yieldUntilTime(nextTime); end function waitForTime.step(self) --print("waitForTime.step, CLOCK: ", self.Scheduler.Clock); local currentTime = self.Scheduler.Clock:Milliseconds(); -- traverse through the fibers that are waiting -- on time local nAwaiting = #self.TasksWaitingForTime; --print("Timer Events Waiting: ", nAwaiting) for i=1,nAwaiting do local fiber = self.TasksWaitingForTime[1]; if fiber.DueTime <= currentTime then --print("ACTIVATE: ", fiber.DueTime, currentTime); -- put it back into circulation -- preferably at the front of the list fiber.DueTime = 0; --print("waitForTime.step: ", self) self.Scheduler:scheduleTask(fiber); -- Remove the fiber from the list of fibers that are -- waiting on time table.remove(self.TasksWaitingForTime, 1); end end end function waitForTime.start(self) while true do self:step(); self.Scheduler:yield(); end end return waitForTime
The ‘start()’ function is what was scheduled with the scheduler, so that’s the thread that’s actually running. As you can see, it’s just a normal cooperative task.
The real action comes from the ‘yield()’ and ‘step()’ functions. The ‘yield()’s purpose in life is to suspend the current task (take it out of the active running list of the scheduler), and setup the appropriate stuff such that it can be activated later. The purpose of the “step()” function is to go through the list of tasks which are currently suspended, and determine which ones need to be scheduled again, because their time signal has been met.
And that’s how you add items such as “sleep()” to the system. You don’t change the core scheduler, but rather, you add a small module which knows how to deal with timed events, and that’s that.
The application object comes along and provides some convenience methods that can easily be turned into globals, so that you can easily type things like:
delay(function() print("Hello, World!") end, 5000) -- print "Hello, World! after 5 seconds periodic(function() print("Hi There!") end, 1000) -- print "Hi, There!" every second
And there you have it. The core scheduler is a very simple thing. Just a ready list, and some algorithm that determines which one of the tasks in the list gets to run next. This can be as complex as necessary, or stay very simple.
Using an add on mechanism, the overall system can take on more robust scenarios, without increasing the complexity of the individual parts.
This simplified design satisfies all the scenario needs I currently have from high throughput web server to multi-tasking UI with network interaction. I will improve the scheduling algorithm, and probably make that a pluggable function for ease.
Lua coroutines are an awesome feature to have in a scripting language. With a little bit of work, a handy scheduler makes multi-task programming a mindless exercise, and that’s a good thing.
Happy New Year!