Fast Apps, Microsoft Style


That’s what I exclaimed at least a couple of times this morning as I sat at a table in a makeshift “team room” in building 43 at Microsoft’s Redmond campus. What was the exclamation for? Well, over the past 3 months, I’ve been working on a quick strike project with a new team, and today we finally announced our “Public Preview“.  Or, if you want to get right to the product: Cloud App Discovery

I’m not a PM or marketing type, so it’s best to go and read the announcement for yourself if you want to get the official spiel on the project.  Here, I want to write a bit about the experience of coming up with a project, in short order, in the new Microsoft.

It all started back in January for me.  I was just coming off another project, and casting about for the next hardest ‘mission impossible’ to jump on.  I had a brief conversation with a dev manager who posed the question; “Is it possible to reestablish the ‘perimeter’ for IT guys in this world of cloud computing”?  An intriguing question.  The basic problem was, if you go to a lot of IT guys, they can barely tell you how many of the people within their corporation are using, let alone DropBox from a cafe in Singapore.  Forget the notion of even trying to control such access.  The corporate ‘firewall’ is almost nothing more than a quartz space heater at this point, preventing very little, and knowing about even less.

So, with that question in mind, we laid out 3 phases of development.  Actually, they were already laid out before I joined the party (by a couple of weeks), so I just heard the pitch.  It was simple, the first phase of development is to see if we can capture network traffic, using various means, and project it up to the  cloud where we could use some machine learning to give an admin a view of what’s going on.

Conveniently sidestepping any objections actual employees might have with this notion, I got to thinking on how it could be done.

For my part, we wanted to have something sitting on the client machine (a windows machine that the user is using), which will inspect all network traffic coming and going, and generate some reports to be sent up to the cloud.  Keep in mind, this is all consented activity, the employee gets to opt in to being monitored in this way.  All in the open and up front.

At the lowest level, my first inclination was to use a raw socket to create a packet sniffer, but Windows has a much better solution these days, built for exactly this purpose.  The Windows Filter Platform, allows you to create a ‘filter’ which you can configure to callout to a function whenever there is traffic.  My close teammate implemented that piece, and suddenly we had a handle on network packets.

We fairly quickly decided on an interface between that low level packet sniffing, and the higher level processor.  It’s as easy as this:


int WriteBytes(char *buff, int bufflen);
int ReadBytes(char *buff, int bufflen, int &bytesRead);

I’m paraphrasing a bit, but it really is that simple. What’s it do? Well, the fairly raw network packets are sent into ‘WriteBytes’, some processing is done, and a ‘report’ becomes available through ‘ReadBytes’. The reports are a JSON formatted string which then gets turned into the appropriate thing to be sent up to the cloud.

The time it took from hearing about the basic product idea, to a prototype of this thing was about 3 weeks.

What do I do once I get network packets? Well, the network packets represent a multiplexed stream of packets, as if I were a NIC. All incoming, outgoing, all TCP ports. Once I receive some bytes, I have to turn it back into individual streams, then start doing some ‘parsing’. Right now we handle http and TLS. For http, I do full http parsing, separating out headers, reading bodies, and the like. I did that by leveraging the http parsing work I had done for TINN already. I used C++ in this case, but it’s all relatively the same.

TLS is a different story. At this ‘discovery’ phase, it was more about simple parsing. So, reading the record layer, decoding client_hello and server_hello, certificate, and the like. This gave me a chance to implement TLS processing using C++ instead of Lua. One of the core components that I leveraged was the byte order aware streams that I had developed for TINN. That really is the crux of most network protocol handling. If you can make herds or tails of what the various RFCs are saying, it usually comes down to doing some simple serialization, but getting the byte ordering is the hardest part. 24-bit big endian integers?

At any rate, http parsing, fairly quick. TLS client_hello, fast enough, although properly handling the extensions took a bit of time. At this point, we’d be a couple months in, and our first partners get to start kicking the tires.

For such a project, it’s very critical that real world customers are involved really early, almost sitting in our design meetings. They course corrected us, and told us what was truly important and annoying about what we were doing, right from day one.

From the feedback, it becomes clear that getting more information, like the amount of traffic flowing through the pipes is as interesting as the meta information, so getting the full support for flows becomes a higher priority. For the regular http traffic, no problem. The TLS becomes a bit more interesting. In order to deal with that correctly, it becomes necessary to suck in more of the TLS implementation. Read the server_hello, and the certificate information. Well, if you’re going to read in the cert, you might as well get the subject common name out so you can use that bit of meta information. Now comes ASN.1 (DER) parsing, and x509 parsing. That code took about 2 weeks, working “nights and weekends” while the other stuff was going on. It took a good couple of weeks not to integrate, but to write enough test cases, with real live data, to ensure that it was actually working correctly.

The last month was largely a lot of testing, making sure corner cases were dealt with and the like. As the client code is actually deployed to a bunch of machines, it really needed to be rock solid, no memory leaks, no excessive resource utilization, no CPU spiking, just unobtrusive, quietly getting the job done.

So, that’s what it does.

Now, I’ve shipped at Microsoft for numerous years. The fastest cycles I’ve usually dealt with are on the order of 3 months. That’s usually for a product that’s fairly mature, has plenty of engineering system support, and a well laid out roadmap. Really you’re just turning the crank on an already laid out plan.

This AppDiscovery project has been a bit different. It did not start out with a plan that had a 6 month planning cycle in front of it. It was a hunch that we could deliver customer value by implementing something that was challenging enough, but achievable, in a short amount of time.

So, how is this different than Microsoft of yore? Well, yes, we’ve always been ‘customer focused’, but this is to the extreme. I’ve never had customers this involved in what I was doing this early in the development cycle. I mean literally, before the first prototypical bits are even dry, the PM team is pounding on the door asking “when can I give it to the customers?”. That’s a great feeling actually.

The second thing is how much process we allowed ourselves to use. Recognizing that it’s a first run, and recognizing that customers might actually say “mehh, not interested”, it doesn’t make sense to spin up the classic development cycle which is meant to maintain a product for 10-14 years. A much more streamlined lifecycle which favors delivering quality code and getting customer feedback, is what we employed. If it turns out that customers really like the product, then there’s room to fit the cycle to a cycle that is more appropriate for longer term support.

The last thing that’s special is the amount of leveraging Open Source we are allowing ourselves these days. Microsoft has gone full tilt on OpenSource support. I didn’t personally end up using much myself, but we are free to use it elsewhere (with some legal guidelines). This is encouraging, because for crypto, I’m looking forward to using things like SipHash, and ChaCha20, which don’t come natively with the Microsoft platform.

Overall, as Microsoft continues to evolve and deliver ‘customer centric’ stuff, I’m pretty excited and encouraged that we’ll be able to use this same model time and again to great effect. Microsoft has a lot of smart engineers. Combined with some new directives about meeting customer expectations at the market, we will surely be cranking out some more interesting stuff.

I’ve implemented some interesting stuff while working on this project, some if it I’ll share here.

Microsoft Part II

I joined Microsoft in 1998 to work on MSXML. One of the reasons I joined way back then is because MS was in trouble with the DOJ, and competitors were getting more interesting. I thought “They’re either going down, or they’re going to resurge, either way, it will be a fun ride”.

Here it is, more than 15 years later, and I find my sentiment about the same. Microsoft has been in trouble the past few years. Missing a few trends, losing our way, catching our breath as our competitors run farther and faster ahead of us…

In the past 4 years, I’ve been associated with the rise of Azure, and most recently associated with our various identity services. In the past couple of months, I’ve been heads down working in an internal startup, which is about to deliver bits to the web. That’s 2 months from conception to delivery of a public preview of a product. That’s fairly unheard of for our giant company.

But, today, I saw a blizzard of news that made me think ye olde company has some life yet left in it.

The strictly Microsoft related news…
Windows Azure Active Directory Premium
C# Goes Open Source
TypeScript goes 1.0
Windows 8.1 is FREE for devices less than 9″!!

Of all of these, I think the Windows 8.1 going for free is probably the most impactful from a ‘game changer’ perspective. Android is everywhere, probably largely because it is ‘free’. I can’t spit in the wind without hitting a new micro device that runs Android, and doesn’t run Windows. Perhaps this will begin to change somewhat.

Then there’s peripheral news like…
intel Galileo board ($99) is fully programmable from Visual Studio
Novena laptop goes for crowd funding

The Novena laptop is very interesting because it’s a substantial offering created by a couple of hardcore engineers. It is clearly a MUST HAVE machine for any self respecting hard/software hacker. It’s not the most powerful laptop in the world, and that’s beside the point. What it does represent is that some good engineers, hooked up with a solid supply chain, can produce goods that are almost price competitive with commodity goods. That and the fact that this is just an extraordinary hack machine.

I find the Galileo interesting because other than some third party support for Arduino programming from MSVC, this is a serious support drive for small things, from Microsoft. Given the previous news about the ‘free’, this Galileo support bodes well. You could conceivably get a $99 ‘computer’ with some form of Windows OS, and use it at the heart of your robot, quadcopter, art display, home automation thing…

Of course, the rest of the tinker market is heading even lower priced with things like the Teensy 3.1 at around $20. No “OS” per se, but surely capable hardware that could benefit from a nicely integrated programming environment and support from Microsoft. But, you don’t want Windows on such a device. You want to leverage some core technologies that Microsoft has in-house, and just apply it in various places. Wouldn’t it be great if all of Microsoft’s internal software was made available as installable packages…

Then there’s the whole ‘internet of things’ angle. Microsoft actually has a bunch of people focused in this space, but there’s no public offerings as yet. We’re Microsoft though, so you can imagine what the outcomes might look like. Just imagine lots of tiny little devices all tied to Microsoft services in some way, including good identities and all that.

Out on the fringe, non-Microsoft, there is, with their latest board back from manufacturing. I micro controller that runs node.js (and typescript for that matter), which is WiFi connected. That is bound to have a profound impact for those who are doing quick and dirty web connected physical computing.

Having spent the past few weeks coding in C++, I have been feeling the weight of years of piled on language cruft. I’ve been longing for the simplicity of the Lua language, and my beloved TINN, but that will just have to wait a few more weeks. In the meanwhile, I did purchase a Mojo FPGA board, in the hopes that I will once again get into FPGA programming, because “hardware is the new software”.

At the end of the day, I am as excited about the prospects of working at Microsoft as I was in 1998. My enthusiasm isn’t constrained by the possibilities of what Microsoft itself might do, rather I am overjoyed at the pace of development and innovation across the industry. There are new frontiers opening up all the time. New markets to explore, new waves to catch. It’s not all about desktops, browsers, office suites, search engines, phones, and tablets. Every day, there’s a new possibility, and the potential for a new application. Throw in 3D printing, instant manufacturing, and a smattering of BitCoin, and we’re living in a braver new world every day!!

Revisiting C++

I was a C++ expert twice in the past. The first time around was because I was doing some work for Taligent, and their whole operating system was written in C++. With that system I got knee deep into the finer details of templates, and exceptions, to a degree that will likely never be seen on the planet earth.

The second time around, was because I was programming on the BeOS. Not quite as crazy as the Taligent experience, but C/C++ were all the rage.

Then I drifted into Microsoft, and C# was born. For the past 15 years, it’s been a slow rise to dominance with C# in certain quarters of Microsoft. It just so happens that this corresponds to the rise of the virus attacks on Windows, as well as the shift in programming skills of college graduates. In the early days of spectacular virus attacks, you could attribute most of them to buffer overruns, which allowed code to run on the stack. This was fairly easily plugged by C#, and security coding standards.

Today, I am working on a project where once again I am learning C++. This time around it’s C++ 11, which is decidedly more mature than the C++ I learned while working on Taligent. It’s not so dramatically different as say the difference between Lisp and Cobol, but it gained a lot of stuff over the years.

I thought I would jot down some of the surface differences I have noticed since I’ve been away.

First, to compare C++ to Lua, there are some surface differences. Most of the languages I program in today have their roots in Algol, so they largely look the same. But, there are some simple dialect differences. C++ is full of curly braces ‘{}’, semi-colons ‘;’, and parenthesis ‘()’. Oh my god with the parens and semis!! With Lua, parens are optional, semis are optional, and instead of curlies, there are ‘do’, ‘end’, or simply ‘end’. For loops are different, array indices are different (unless you’re doing interop with the FFI), and do/while is repeat/until.

These are all minor differences, like say the differences between Portuguese and Spanish. You can still understand the other if you speak one. Perhaps not perfectly, but there is a relatively easy translation path.

Often times in language wars, these are the superficial differences that people talk about. Meh, not interesting enough to drive me one way or another.

But then, there’s this other stuff, which is truly the essence of the differences. Strong typing/duck typing, managed memory, dynamic code execution. I say ‘Lua’ here, but really that could be a standin for C#, node.js, Python, or Ruby. Basically, there are a set of modern languages which exhibit a similar set of features which are different enough from C/C++ that there is a difference in the programming models.

To illustrate, here’s a bit of C++ code that I have written recently. The setup is this, I receive a packet of data, typically the beginning of a HTTP conversation. From that packet of data, I must be able to ‘parse’ the thing, determine whether it is http/https, pull out headers, etc. I need to build a series of in-place parsers, which keep the amount of memory allocated to a minimum, and work fairly quickly. So, the first piece is this thing called a AShard_t:

#pragma once

#include "anl_conf.h"

class  DllExport AShard_t  {
	uint8_t *	m_Data;
	size_t	m_Length;
	size_t	m_Offset;

	// Constructors
	AShard_t(const char *);
	AShard_t(uint8_t *data, size_t length, size_t offset);

	// Virtual Destructor
	virtual ~AShard_t() {};

	// type cast
	operator uint8_t *() {return getData();}

	// Operator Overloads
	AShard_t & operator= (const AShard_t & rhs);

	// Properties
	uint8_t *	getData() {return &m_Data[m_Offset];};
	size_t		getLength() {return m_Length;};

	// Member functions
	AShard_t &	clear();
	AShard_t &	first(AShard_t &front, AShard_t &rest, uint8_t delim) const;
	bool		indexOfChar(const uint8_t achar, size_t &idx) const;
	bool		indexOfShard(const AShard_t &target, size_t &idx);
	bool 		isEmpty() const;
	void		print() const;
	bool		rebase();
	char *		tostringz() const;
	AShard_t &	trimfrontspace();


OK, so it’s actually a fairly simple data structure. Assuming you have a buffer of data, a shard is just a pointer into that buffer. It contains the pointer, an offset, and a length. You might say that the pointer/offset combo is redundant, you probably don’t need both. The offset could be eliminated, assuming the pointer is always at the base of the structure. But, there might be a design choice that makes this useful later.

At any rate, there’s a lot going on here for such a simple class. First of all, there’s that ‘#pragma once’ at the top. Ah yes, good ol’ C preprocessor, needs to be told not to load stuff it’s already loaded before. There’s there’s class vs struct, not to be confused with ‘typedef struct’. Public/Protected/Private, copy constructor or ‘operator=’. And heaven forbid you forget to make a public default constructor. You will not be able to create an array of these things without it!

These are not mere dialectual differences, these are the differences between Spanish and Hungarian. You MUST know about the default constructor thing, or things just won’t work.

As far as implementation is concerned, I did a mix of things here, primarily because the class is so small. I’ve inserted some simple “string” processing right into the class, because I found them to be constantly useful. ‘first’, ‘indexOfChar’, and ‘indexOfShard’ turn out to be fairly handy when you’re trying to parse through something in place. ‘first’ is like in Lisp, get the first element off the list of elements. In this case you can specify a single character delimiter. ‘indexOfChar’, is like strchr() function from C, except in this case it’s aware of the length, and it doesn’t assume a ‘null’ terminated string. ‘indexOfShard’ is like ‘strstr’, or ‘strpbrk’. With these in hand, you can do a lot of ‘tokenizing’.

Here’s an example of parsing a URL:

bool parseUrl(const AShard_t &uriShard)
  AShard_t shard = uriShard;
  AShard_t rest;
  AShard_t scheme;
  AShard_t url;
  AShard_t authority;
  AShard_t hostname;
  AShard_t port;
  AShard_t resquery;
  AShard_t resource;
  AShard_t query;

  // http:
  shard.first(scheme, rest, ':');

  // the 'rest' represents the resource, which 
  // includes the authority + query
  // so try and separate authority from query if the 
  // query part exists
  shard = rest;
  // skip past the '//'
  shard.m_Offset += 2;
  shard.m_Length -= 2;

  // Now we have the url separated from the scheme
  url = shard;

  // separate the authority from the resource based on '/'
  url.first(authority, rest, '/');
  resquery = rest;

  // Break the authority into host and port
  authority.first(hostname, rest, ':');
  port = rest;

  // Back to the resource.  Split it into resource/query
  parseResourceQuery(resquery, resource, query);

  // Print the shards
  printf("URI: "); uriShard.print();
  printf("  Scheme: "); scheme.print();
  printf("  URL: "); url.print();
  printf("    Authority: "); authority.print();
  printf("      Hostname: "); hostname.print();
  printf("      Port: "); port.print();
  printf("    Resquery: "); resquery.print();
  printf("      Resource: "); resource.print();
  printf("      Query: "); query.print();

  return true;

AShard_t url0("");

Of course, I’m leaving out error checking, but even for this simple tokenization, it’s fairly robust because in most cases, if a ‘first’ fails, you’ll just gen an empty ‘rest’, but definitely not a crash.

So, how does this fair against my beloved LuaJIT? Well, at this level things are about the same. In Lua, I could create exactly the same structure, using a table, and perform exactly the same operations. Only, if I wanted to do it without using the ffi, I’d have to stuff the data into a Lua string object (which causes a copy), then use the lua string.char, count from 1, etc. totally doable, and probably fairly optimized. There is a bit of a waste though because in Lua, everything interesting is represented by a table, so that’s a much bigger data structure than this simple AShard_t. It’s bigger in terms of memory footprint, and it’s probably slower in execution because it’s a generalized data structure that can serve many wonderful purposes.

For memory management, at this level of structure, things are relatively easy. Since the shard does not copy the data, it doesn’t actually do any allocations, so there’s relatively little to cleanup. The most common use case for shards is that they’ll either be stack based, or they’ll be stuffed into a data structure. In either case, their lifetime is fairly short and well managed, so memory management isn’t a big issue. If they are dynamically allocated, then of course there’s something to be concerned with.

Well, that touches the ice berg. I’ve re-attached to C++, and so far the gag reflex hasn’t driven me insane, so I guess it’s ok to continue.

Next, I’ll explore how insanely great the world becomes when shards roam the earth.

A Dictionary with a count

The most interesting type in Lua is the table object. The table serves dual purposes. It can act as an array, as well as a hash table. As a dictionary, you can do simple things like:

local tbl = {}
tbl["alpha"] = "alpha-value"

The problem with this construction is that you can not easily find out the number of items that are within the dictionary. The easiest way is to enumerate the whole thing, and keep a count:

local count = 0;
for k,v in pairs(tbl) do
  count = count + 1;

And so, what I really want is something that acts as a simple dictionary, but gives me the ability to easily find the count. I’ve created the “Bag”, which is simply a wrapper on a table, but it gives you a count.

local Bag = {}
setmetatable(Bag, {
	__call = function(self, ...)
		return self:_new(...);

local Bag_mt = {
	__index = function(self, key)
		--print("__index: ", key)
		return self.tbl[key]

	__newindex = function(self, key, value)		
		--print("__newindex: ", key, value)
		if value == nil then
			self.__Count = self.__Count - 1;
			self.__Count = self.__Count + 1;

		--rawset(self, key, value)
		self.tbl[key] = value;

	__len = function(self)
--		print("__len: ", self.__Count)
		return self.__Count;

function Bag._new(self, obj)
	local obj = {
		tbl = {},
		__Count = 0,

	setmetatable(obj, Bag_mt);

	return obj;

Each Bag instance has a ‘tbl’ and a ‘__Count’. Within the metastable, both the ‘__newindex’ and the ‘__index’ are implemented. The ‘__newindex’ is used when you make an assignment. So, when I do:

tbl["alpha"] = "alpha-value"

The ‘__newindex’ is called. This will in turn simply put the value into the self.tbl table. While it does that, it also increments the count of values that are stored in the Bag. When you assign “nil” to an entry, this will remove it from the underlying table, and thus decrement the count.

Then there’s the ‘__len’ metamethod. This will normally return the length of an item. In the case of regular tables, as long as they are being used as arrays (contiguous indices), then it will return the number of items. In the case of a regular table being used as a dictionary, it will return 0. So, implementing it here gives the Bag the ability to use the convenient ‘#’ operator.

local Collections = require("Collections")

local names = {
	alpha = "alpha-value";
	beta = "beta-value";
	gamma = "gamma-value";

local bg = Collections.Bag();

for k,v in pairs(names) do
	print("adding: ", k, v)
	bg[k] = v;

print("Count after add: ", #bg)

bg["gamma"] = nil;

print("Count, after 1 remove: ", #bg)

print("beta: ", bg["beta"])

Lua 5.1 will not utilize the ‘__len’ metamethod, only 5.2 will do that. So, with LuaJIT, which is half way between the two, you need to make sure you compile with the -DLUAJIT_ENABLE_LUA52COMPAT flag, or you won’t get the expected behavior.

This is great, and achieves what I wanted. But, I still want it to act as a dictionary in terms of being iterable, so I should implement a __pairs metamethod, but that can wait.

So, there you have it. A quick and dirty improvement upon tables which saves me the headache and wasted CPU cycles of counting my dictionary entries.

There are many situations where I am using a simple dictionary, but I want to quickly find the count of items in the dictionary

Iterating Over Oddities – of strings, arrays, and counting from 0

Could there possibly be anything more said about iterators and strings? Well, yah, actually tons. Last time around, I showed a simple iterator over a “string”. The focus was primarily on satisfying the job of parsing out null terminated strings from within a ‘null terminated’ string.

As I was doing that, I was also speculating as to whether I could use the exact same iterator to parse fixed sized records from an array of records. I actually wrote a giant iterator that does just that. Then I got to thinking, ‘this isn’t the way to do it’. Iterators, you see, can be broken down into constituent parts. The Lua documentation itself has quite a lot to say about iterators of various forms. One of the most promising bits of documentation is on 7.3 – Stateless Iterators.  In order to pull this off, you split up the “iterator” into a few parts.


generator – The function that gets called every time you need a new value.  The parameters that are passed to it are the “fixedpart”, and the “control”.

invariant state – this is the part of the iterator that doesn’t change much.  For example, the source string that you might be iterating over.

index – this is the part that changes every time generator is called.

so, when you do the following:

local values = {'a', 'b', 'c'}
for _idx, value in ipairs(values) do

The ‘generator’ is some function returned from the ‘ipairs()’ function. It will be called again and again, until it returns nil.

The ‘invariant state’ is the ‘values’ array. This will be fed to the generator each time a new value is needed.

And last, the _idx, is the index value. It will also be fed to the generator, along with the ‘invariant state’.

So, how about applying this to my previous multi string iterator?

local ffi = require("ffi")
local fun = require("fun")

local floor = math.floor;

-- a nil generator.  
-- good for cases when there's no data
local function nil_gen(param, state)
    return nil

local function delim_gen(param, idx)
	local len = 0;

	while ((idx+len) < param.nelems) do
		--print("wchar: ", string.char(ffi.cast(param.basetypeptr,[idx + len]))
		if ffi.cast(param.basetypeptr,[idx + len] ~= param.separator then
			len = len + 1;
	if len == 0 then
		return nil;

	return idx + len + 1, ffi.cast(param.basetypeptr,, len

local function array_gen(param, idx)
	if idx >= param.nelems then
		return nil;

	return idx+1, ffi.cast(param.basetypeptr,, 1

local function striter(params)
	if not params then
		return nil_gen, params, nil

	if not then
		return nil_gen, params, nil

	params.datalength = params.datalength or
	if params.basetype then
		if type(params.basetype)== "string" then
			params.basetype = ffi.typeof(params.basetype)
	params.basetype = params.basetype or ffi.typeof("char")
	params.basetypeptr = ffi.typeof("const $ *", params.basetype)
	params.basetypesize = ffi.sizeof(params.basetype)
	params.nelems = math.floor(params.datalength / params.basetypesize)

	if params.separator ~= nil then
		return delim_gen, params, 0
		return array_gen, params, 0

	return nil_gen, nil, nil

How to apply it?

local src3 = "big,boy,baby,bear,bounces,basketballs,behind,the,barn,,"

local function printAnsi(ptr, len)
  print(ffi.string(ptr, len))

each(printAnsi, striter{data=src3, basetype="char"})

Here, I am using the Lua Fun ‘each’ function to drive my iterator. I could just as easily use a simple ‘for – in’ loop, but I’m getting all functional these days. What the last statement says is, “for each of the items coming out of the striter iterator, call the ‘printAnsi()’ function”.

The striter function is called with a table that contains the various parameters it will need. In this particular case, I’ve left off the parenthesis, because in Lua, if it’s just a single table value, or a string, you can do that.

So, how about that striter() function? Looking back, it has the job of returning a ‘generator’, ‘invariant state’, and a ‘index’. Well, it cheats a bit because the ‘invariant’ also contains the ‘index’. The ‘invariant’ is the fact that the table value doesn’t change, even though the contents can. This is just a matter of convenience.

At any rate, the striter() function decides which generator to return based on what it sees in the parameters. For example, if it sees a separator, then it will return the ‘delim_gen’ generator. That generator functions pretty much the same was as the one I created last time for the multi string thing. In the case where it doesn’t see a separator, it will return the ‘array_gen’ generator. That generator will assume it is being handed a pointer to an array of values of a particular type.

One thing to note that is different this time around from the mstrziter, Lua string creation does not occur within the iterator itself. Rather than return a string value, the iterator will simply return an offset and a length. It is up to the caller to determine what they want to do with the values.

This is kind of a key to an IEnumerable chain. Do the least amount of work as possibly, deferring really heavy work towards the end of your chain. This lazy evaluation makes for a more efficient chain. So, the ‘printAnsi’ function is at the end of the chain. It might have turned out that instead of creating strings at all, I might have wanted to send the values across a network to be stored in a database. In that case, the pointer, offset, length is perfect to be consumed directy by the Socket:send(buff, len) function, so no copying would be necessary.

How about that array case?

Let’s imagine I wanted to print out every value of the string one by one.

each(printAnsi, striter{data=src3, basetype="char"})

In this case, I’m creating an iterator, not specifying the separator (so the array_gen will be used). I have also specified the ‘basetype’ of the elements of my array. That’s so it can calculate how many there are, and create a pointer of the appropriate type. And you’re done!

Of course, the ‘basetype’ could just as easily be ‘BGR32′, or ‘PersonRecord’, or whatever fixed size type you so happen to have stored in some array. Makes for some fairly easy ‘tokenizing’ of array values.

To go further, what’s say you have a multi string based on ‘wchar_t’, and delimeted by ‘ ‘ (space) characters?

How about a little convenience function?

local function wmstriter(data, separator, datalength)
  if type(separator) == "string" then
    separator = string.byte(separator)

  datalength = datalength or ffi.sizeof(data)
  return map(core_string.toAnsi, striter{data=data, datalength = datalength, basetype="wchar_t", separator=separator})

and using it:

local whello, whellolen = core_string.toUnicode("Hello World");

each(print, wmstriter(whello, ' '))

In this case, I create a ‘wchar_t’ based string, using the ‘toUnicode()’ function. Then I feed that to the wmstriter iterator, and print out each of the words, delimeted by a ‘(sp)’.

The core ‘striter()’ remains the same, and the generators don’t change. You just build up more useful ways of feeding them and consuming them.

I find this to be fairly useful and powerful. When you think about the generators, they have a fairly simple task. Do one small job, and do it well. The complexity of a larger system is gained through thoughtful composition of these simpler parts, rather than building hugely complex macro parts.

This is helpful for code maintenance. I find it hard to maintain largish single functions with hundreds of lines of code. I find it fairly easy to maintain a generator that has a few lines of code, and does a fairly simple job. I find the composition model easy as well. I can look back on it over time, read the functional chain, and understand what was intended.

And there you have it. More iterating over oddities.


Iterating over oddities – mstrziter

In some parts of the Windows API, you will run across the case where a ‘multi-string’ is the return value. A ‘multi-string’ might look like this:


Basically, a byte array with embedded null terminated strings. You find this in cases like when you’re enumerating devices, and the locationpaths (where the device is mounted) is returned. You find these in the system registry where they are the REG_MULTI_SZ type. So, if you’re wanting to deal with the system registry, you might run across this particular type. I’m not sure I’d ever have a need to create this particular type in modern day programming. I might choose a different delimeter than the ” byte, but it is fairly effective for ascii strings.

At any rate, I want an iterator over this thing so that I can easily navigate around it without much fuss. Thus was born mstrziter.

local function mstrziter(params)
	params = params or {}

	params.separator = params.separator or 0
	params.datalength = params.datalength or
	params.basetype = params.basetype or ffi.typeof("char")
	params.basetypeptr = ffi.typeof("const $ *", params.basetype)
	params.maxlen = params.maxlen or params.datalength-1;

	local function closure(param, idx)
		if not then
			return nil;

		local len = 0;
		while ffi.cast(param.basetypeptr,[idx + len] ~= param.separator and (len < param.maxlen) do
			len = len +1;

		if len == 0 then
			return nil;

		return (idx + len+1), ffi.string(ffi.cast(param.basetypeptr,, len*ffi.sizeof(param.basetype));

	return closure, params, 0;

With this little construct, I can do the following:

local src = "big\0boy\0baby\0bear\0bounces\0basketballs\0behind\0the\0barn\0\0"

print("---- each(print, mstrziter(src)")
each(print, mstrziter{data = src})

That is, passing a multi-string to mstrziter and it will feed out the strings one by one. Well, I think that’s pretty nifty, and cracks multi-strings wide open. When I first created the thing, I only wanted to deal with null delimeted strings, but there’s no reason this same routine can’t do more.

I can just as easily do the following:

local src2 = "big\rboy\rbaby\rbear\rbounces\rbasketballs\rbehind\rthe\rbarn\r\r"
print("---- take(3, mstrziter(src2) ----")
each(print, take(3, mstrziter{data = src2, separator = string.byte('\r')}))

In this case I’m using ‘\r’ as the delimeter. Of course Lua already has the gmatch() function on strings, so why would I bother with this at all? Well, because I’m not always handed a string to begin with. Most of the times I’m handed a pointer to some buffer. I don’t want to waste time turning that into a Lua string, simply to chop it up into more strings.

Another interesting thing you can do is pass in the base type of the array:

each(print, mstrziter{data = src, basetype=ffi.typeof(“const wchar_t *”)})

In this way, it can deal with standard ‘char *’, or ‘wchar_t *’. But why stop there? You can pass in an array of any type, as long as there is an ‘__eq’ implemented for that type. So, fixed sized records in a database? If you do that, then your queries using the functional programming stuff become fairly easy and interesting.

Well, there you have it. A fairly simple iterator. It serves the purpose I set out for it, and gave me a whole lot more in return. The one wish I have for it is one for Lua in general. When I have a buffer which already contains the content, I would like to create ‘tokens’ instead of full on strings. Meaning, I’d like to have a token which consists of the base pointer, offset, length in the original buffer. But, that becomes a hairy mess, not immutable, not properly garbage collected… So, for now, I’m happy with just creating new copies of the strings.

Device Iteration with Functional Programming

One of the great pleasures I have in life is learning something new. There’s nothing greater than those ‘light bulb goes on’ moments as you realize something and gain a much deeper understanding than you had before.

Well, a little while ago, there was an announcement of this thing called Lua Fun.  Lua Fun is a large set of functions which make functional programming in Lua really easy.  It has the usual suspects such as map, reduce, filter, each, etc.  If you read the documentation, you get a really good understanding of how iterators work in Lua, and more importantly, how LuaJIT is able to fold and manipulate things in hot loops such that the produced code is much tighter than anything I could possibly write in C, or any other language I so happen to use.

So, now I’m a fan of Lua Fun, and I would encourage anyone who’s both into Lua, and functional programming to take a look.

How to use it?  I’ve been enumerating various types of things in the Windows system of late.  Using the MMDevice subsystem, I was able to get an enumeration of the audio devices (that took a lot of COM work).  What about displays, and disk drives, and USB devices, and…  Yes, each one of those things has an attendant API which will facilitate monitoring said category.  But, is there one to rule them all?  Well yes, as it turns out, in most cases what these various APIs are doing is getting information out of the System Registry, and just presenting it reasonably.

There is a way to enumerate all the devices in the system.  You know, like when you bring up the Device Manager in Windows, and you see a tree of devices, and their various details.  The stage is set, how do you do that?  I created a simple object that does the grunt work of enumerating the devices in the system.

The DeviceRecordSet is essentially a query. Creating an instance of the object just gives you a handle onto making query requests. Here is the code:

local ffi = require("ffi")
local bit = require("bit")
local bor = bit.bor;
local band =;

local errorhandling = require("core_errorhandling_l1_1_1");
local SetupApi = require("SetupApi")
local WinNT = require("WinNT")

local DeviceRecordSet = {}
setmetatable(DeviceRecordSet, {
	__call = function(self, ...)
		return self:create(...)

local DeviceRecordSet_mt = {
	__index = DeviceRecordSet,

function DeviceRecordSet.init(self, rawhandle)
	print("init: ", rawhandle)

	local obj = {
		Handle = rawhandle,
	setmetatable(obj, DeviceRecordSet_mt)

	return obj;

function DeviceRecordSet.create(self, Flags)
	Flags = Flags or bor(ffi.C.DIGCF_PRESENT, ffi.C.DIGCF_ALLCLASSES)

	local rawhandle = SetupApi.SetupDiGetClassDevs(

	if rawhandle == nil then
		return nil, errorhandling.GetLastError();

	return self:init(rawhandle)

function DeviceRecordSet.getNativeHandle(self)
	return self.Handle;

function DeviceRecordSet.getRegistryValue(self, key, idx)
	idx = idx or 0;

	did.cbSize = ffi.sizeof("SP_DEVINFO_DATA");

--print("HANDLE: ", self.Handle)
	local res = SetupApi.SetupDiEnumDeviceInfo(self.Handle,idx,did)

	if res == 0 then
		local err = errorhandling.GetLastError()
		--print("after SetupDiEnumDeviceInfo, ERROR: ", err)
		return nil, err;

	local regDataType ="DWORD[1]")
	local pbuffersize ="DWORD[1]",260);
	local buffer ="char[260]")

	local res = SetupApi.SetupDiGetDeviceRegistryProperty(

	if res == 0 then
		local err = errorhandling.GetLastError();
		--print("after GetDeviceRegistryProperty, ERROR: ", err)
		return nil, err;

	--print("TYPE: ", regDataType[0])
	if (regDataType[0] == 1) or (regDataType[0] == 7) then
		return ffi.string(buffer, pbuffersize[0]-1)
	elseif regDataType[0] == ffi.C.REG_DWORD_LITTLE_ENDIAN then
		return ffi.cast("DWORD *", buffer)[0]

	return nil;

function DeviceRecordSet.devices(self, fields)
	fields = fields or {
		{ffi.C.SPDRP_DEVICEDESC, "description"},
		{ffi.C.SPDRP_MFG, "manufacturer"},
		{ffi.C.SPDRP_DEVTYPE, "devicetype"},
		{ffi.C.SPDRP_CLASS, "class"},
		{ffi.C.SPDRP_ENUMERATOR_NAME, "enumerator"},
		{ffi.C.SPDRP_FRIENDLYNAME, "friendlyname"},
		{ffi.C.SPDRP_LOCATION_INFORMATION , "locationinfo"},
		{ffi.C.SPDRP_LOCATION_PATHS, "locationpaths"},
		{ffi.C.SPDRP_SERVICE, "service"},

	local function closure(fields, idx)
		local res = {}

		local count = 0;
		for _it, field in ipairs(fields) do
			local value, err = self:getRegistryValue(field[1], idx)
			if value then
				count = count + 1;
				res[field[2]] = value;

		if count == 0 then
			return nil;
		return idx+1, res;

	return closure, fields, 0

return DeviceRecordSet

The ‘getRegistryValue()’ function is the real workhorse of this object. That’s what gets your values out of the system registry. The other function of importance is ‘devices()’. This is an iterator.

There are a couple of things of note about this iterator. First of all, it does not require ‘up values’ to be held onto. All that means is that everything the iterator needs to operate is carried in the return values from the function. The ‘state’ if you will, is handed in fresh every time the ‘closure()’ is called. This is the key to creating an iterator that will work well with Lua Fun.

By default, this iterator will return quite a few (but not all) fields related to each object, and it will return all the objects. This is ok, because there are typically less than 150 objects in any given system.

Now, I want to do various queries against this set without much fuss. This is where Lua Fun, and functional programming in general, really shines.

First, a little setup:

local ffi = require("ffi")
local DeviceRecordSet = require("DeviceRecordSet")
local serpent = require("serpent")
local Functor = require("Functor")

local fun = require("fun")()
local drs = DeviceRecordSet();

local function printIt(record)
	each(print, record)

This creates an instance of the DeviceRecordSet object, which will be used in the queries. Already the printIt() function is utilizing Lua Fun. The ‘each()’ function will take whatever it’s handed, and perform the function specified. In this case, the ‘record’ will be a table. So, each will iterate over the table entries and print each one of them out. This is the equivalent of doing:

for k,v in pairs(record)
print(k, v)

I think that simply typing ‘each’ is a lot simpler and pretty easy to understand.

How about a query then?

-- show everything for every device
each(printIt, drs:devices())

In this case, the ‘each’ is applied to the results of the ‘devices()’ iterator. For each record coming from the devices iterator, the printIt function will be called, which will in turn print out all the values in the record. That’s pretty nice.

What if I don’t want to see all the fields in the record, I just want to see the objectname, and description fields. Well, this is a ‘map’ operation, or a projection in database parlance, so:

-- do a projection on the fields
local function projection(x)
  return {objectname = x.objectname, description = x.description}
each(printIt, map(projection, drs:devices()))

Working from the inside out, for each record coming from the devices() iterator, call the ‘projection’ function. The return value from the projection function becomes the new record for this iteration. For each of those records, call the printIt function.

Using ‘map’ is great as you can reshape data in any way you like without much fuss.

Lastly, I want to see only the records that are related to “STORAGE”, so…

-- show only certain records
local function enumeratorFilter(x)
	return x.enumerator == "STORAGE"

each(printIt, filter(enumeratorFilter, drs:devices()))

Here, the ‘filter’ iterator is used. So, again, for each of the records coming from the ‘devices()’ enumerator, call the ‘enumeratorFilter’ function. If this function returns ‘true’ for the record, then it is passed along as the next record for the ‘each’. If ‘false’, then it is skipped, and the next record is tried.

This is pretty powerful, and yet simple stuff. The fact that iterators create new iterators, in tight loops, makes for some very dense and efficient code. If you’re interested in why this is so special in LuaJIT, and not many other languages, read up on the Lua Fun documentation.

I’ve killed two birds with one stone. I have finally gotten to the root of all device iterators. I have also learned how to best write iterators that can be used in a functional programming way. Judicious usage of this mechanism will surely make a lot of my code more compact and readable, as well as highly performant.



Get every new post delivered to your Inbox.

Join 47 other followers