Spelunking Windows – Extracting pleasure from legacy APIs

If you are a modern programmer of Windows apps, there are numerous frameworks for you, hundreds of SDKs, scripted wrappers, IDEs to hide behind, and just layers upon layers of goodness to keep you safe and sane.  So, when it comes to using some of the core Windows APIs directly, you can be forgiven for not even knowing they exist, let alone how to use them from your favorite environment.

I’ve done a ton of exploration on the intricacies of the various Linux interfaces; Spelunking Linux goes over everything from auxv to procfs, and quite a few in between.  But what about Windows?  Well, I’ve recently embarked on a new project, lj2win32 (not to be confused with the earlier LJIT2Win32).  The general purpose of this project is to bring the goodness of TINN to the average LuaJIT developer.  Whereas TINN is a massive project that strives to cover the entirety of the known world of common Windows interfaces, and provides a ready-to-go multi-tasking programming environment, lj2win32 is almost the opposite.  It does not provide its own shell; rather, it just provides the raw bindings necessary for the developer to create whatever they want.  It’s intended to be a simple luarocks install, much in the way ljsyscall works for creating a standard binding to UNIX kinds of systems without much fuss or intrusion.

In creating this project, I’ve tried to adhere to a couple of design principles to meet some objectives.

First objective is that it must ultimately be installable using luarocks.  This means that I have to be conscious about the organization of the file structure.  To wit, everything of consequence lives in a ‘win32’ directory.  The package name might ultimately be ‘win32’.  Everything is referenced from there.

Second objective, provide the barest minimum bindings.  Don’t change names of things, don’t introduce non-windows semantics, don’t create several layers of class hierarchies to objectify the interfaces.  Now, of course there are some very simple exceptions, but they should be fairly limited.  The idea being, anyone should be able to take this as a bare minimum, and add their own layers atop it.  It’s hard to resist objectifying these interfaces though, and everything from Microsoft’s ancient MFC, ATL, and every framework since, has thrown layers of object wrappers on the core Win32 interfaces.  In this case, wrappers and other suggestions will show up in the ‘tests’ directory.  That is fertile ground for all manner of fantastical object wrapperage.

Third objective, keep the dependencies minimal.  If you do programming in C on Windows, you include a couple of well known header files at the beginning of your program, and the whole world gets dragged in.  Everything is pretty much in a global namespace, which can lead to some bad conflicts, but they’ve been worked out over time.  In lj2win32, there are only a couple things in the global namespace; everything else is either in some table, or within the ffi.C facility.  Additionally, the wrappings are clustered in a way that follows the Windows API Sets.  API sets are a mechanism Windows has for pulling apart interdependencies in the various libraries that make up the OS.  In short, an API set is just a name (which so happens to end in ‘.dll’) that is used by the loader to load in various functions.  If you use these special names, instead of the traditional ‘kernel32’, ‘advapi32’, you might pull in a smaller set of stuff.

With all that, I thought I’d explore one particular bit of minutia as an example of how things could go.

The GetSystemMetrics() function call is a sort of dumping ground for a lot of UI system information.  Here’s where you can find things like how big the screen is, how many monitors there are, how many pixels are used for the menu bars, and the like.  Of course this is just a wrapper on items that probably come from the registry, or various devices and tidbits hidden away in other databases throughout the system, but it’s the convenient developer friendly interface.

The signature looks like this:

int WINAPI GetSystemMetrics(
_In_ int nIndex
);

A simple enough call. And a simple enough binding:

ffi.cdef[[
int GetSystemMetrics(int nIndex);
]]

Of course, there is the ‘nIndex’, which in the Windows headers is a bunch of manifest constants, which in LuaJIT might be defined thus:

ffi.cdef[[
	// Used for GetSystemMetrics
static const int	SM_CXSCREEN = 0;
static const int	SM_CYSCREEN = 1;
static const int	SM_CXVSCROLL = 2;
static const int	SM_CYHSCROLL = 3;
static const int	SM_CYCAPTION = 4;
static const int	SM_CXBORDER = 5;
static const int	SM_CYBORDER = 6;
]] 

 
Great. Then I can simply do

local value = ffi.C.GetSystemMetrics(ffi.C.SM_CXSCREEN)

 
Fantastic, I’m in business!

So, this meets the second objective of bare minimum binding. But, it’s not a very satisfying programming experience for the LuaJIT developer. How about just a little bit of sugar? Well, I don’t want to violate the same second objective of non-wrapperness, so I’ll create a separate thing in the tests directory. The systemmetrics.lua file contains a bit of an exploration in getting of system metrics.

It starts out like this:

local ffi = require("ffi")
local errorhandling = require("win32.core.errorhandling_l1_1_1");

ffi.cdef[[
int GetSystemMetrics(int nIndex);
]]

local exports = {}

local function SM_toBool(value)
	return value ~= 0
end

Then defines something like this:

exports.names = {
    SM_CXSCREEN = {value = 0};
    SM_CYSCREEN = {value = 1};
    SM_CXVSCROLL = {value = 2};
    SM_CYHSCROLL = {value = 3};
    SM_CYCAPTION = {value = 4};
    SM_CXBORDER = {value = 5};
    SM_CYBORDER = {value = 6};
    SM_CXDLGFRAME = {value = 7};
    SM_CXFIXEDFRAME = {value = 7};
    SM_CYDLGFRAME = {value = 8};
    SM_CYFIXEDFRAME = {value = 8};
    SM_CYVTHUMB = {value = 9};
    SM_CXHTHUMB = {value = 10};
    SM_CXICON = {value = 11};
    SM_CYICON = {value = 12};
    SM_CXCURSOR = {value = 13};
    SM_CYCURSOR = {value = 14};
    SM_CYMENU = {value = 15};
    SM_CXFULLSCREEN = {value = 16};
    SM_CYFULLSCREEN = {value = 17};
    SM_CYKANJIWINDOW = {value = 18};
    SM_MOUSEPRESENT = {value = 19, converter = SM_toBool};
    SM_CYVSCROLL = {value = 20};
    SM_CXHSCROLL = {value = 21};
    SM_DEBUG = {value = 22, converter = SM_toBool};
    SM_SWAPBUTTON = {value = 23, converter = SM_toBool};
    SM_RESERVED1 = {value = 24, converter = SM_toBool};
    SM_RESERVED2 = {value = 25, converter = SM_toBool};
    SM_RESERVED3 = {value = 26, converter = SM_toBool};
    SM_RESERVED4 = {value = 27, converter = SM_toBool};
}

And finishes with a flourish like this:

local function lookupByNumber(num)
	for key, entry in pairs(exports.names) do
		if entry.value == num then
			return entry;
		end
	end

	return nil;
end

local function getSystemMetrics(what)
	local entry = nil;
	local idx = nil;

	if type(what) == "string" then
		entry = exports.names[what]
		if not entry then return nil end	-- unknown metric name
		idx = entry.value;
	else
		idx = tonumber(what)
		if not idx then 
			return nil;
		end
		
		entry = lookupByNumber(idx)

        if not entry then return nil end
	end

	local value = ffi.C.GetSystemMetrics(idx)

    if entry.converter then
        value = entry.converter(value);
    end

    return value;
end

-- Create C definitions derived from the names table
function exports.genCdefs()
    for key, entry in pairs(exports.names) do
        ffi.cdef(string.format("static const int %s = %d", key, entry.value))
    end
end

setmetatable(exports, {
	__index = function(self, what)
		return getSystemMetrics(what)
	end,
})

return exports

All of this allows you to do a couple of interesting things. First, what if you wanted to print out all the system metrics? This same technique can be used to put all the metrics into a table to be used within your program.

local sysmetrics = require("systemmetrics");

local function testAll()
    for key, entry in pairs(sysmetrics.names) do
        -- indexing yields a single value, so there is no second 'err' result
        local value = sysmetrics[key]
        if value ~= nil then
            print(string.format("{name = '%s', value = %s};", key, tostring(value)))
        else
            print(key, "not available")
        end
    end
end

OK, so what? Well, the systemmetrics.names is a dictionary matching a symbolic name to the value used to get a particular metric. And what’s this magic with the ‘sysmetrics[key]’ thing? Well, let’s take a look back at that hand waving from the systemmetrics.lua file.

setmetatable(exports, {
	__index = function(self, what)
		return getSystemMetrics(what)
	end,
})

Oh, I see now, it’s obvious…

So, what’s happening here with the setmetatable thing is this: Lua has a way of setting some functions on a table which dictate the behavior that table will exhibit in certain situations. In this case, the ‘__index’ function, if it exists, will take care of the cases when you try to look something up, and it isn’t directly in the table. So, in our example, doing the ‘sysmetrics[key]’ thing is essentially saying, “Try to find a value with the string associated with ‘key’. If it’s not found, then do whatever is associated with the ‘__index’ value”. In this case, ‘__index’ is a function, so that function is called, and whatever that returns becomes the value associated with that key.

I know, it’s a mouthful, and metatables are one of the more challenging aspects of Lua to get your head around, but once you do, it’s a powerful concept.

How about another example, one that is a more realistic and typical case?

local function testSome()
    print(sysmetrics.SM_MAXIMUMTOUCHES)
end

In this case, the exact same mechanism is at play. In Lua, there are two ways to get a value out of a table. The first one we’ve already seen, where the ‘[]’ notation is used, as if the thing were an array. In the ‘testSome()’ case, the ‘.’ notation is being utilized. This is accessing the table as if it were a data structure, but it’s exactly the same as trying to access as an array, at least as far as the invocation of the ‘__index’ function is concerned. The ‘SM_MAXIMUMTOUCHES’ is taken as a string value, so it’s the same as doing: sysmetrics[‘SM_MAXIMUMTOUCHES’], and from the previous example, we know how that works out.

Now, there’s one more thing to note from this little escapade. The implementation of the helper function:

local function getSystemMetrics(what)
	local entry = nil;
	local idx = nil;

	if type(what) == "string" then
		entry = exports.names[what]
		if not entry then return nil end	-- unknown metric name
		idx = entry.value;
	else
		idx = tonumber(what)
		if not idx then 
			return nil;
		end
		
		entry = lookupByNumber(idx)

        if not entry then return nil end
	end

	local value = ffi.C.GetSystemMetrics(idx)

    if entry.converter then
        value = entry.converter(value);
    end

    return value;
end

There’s all manner of nonsense in here. The ‘what’ can be either a string or something that can be converted to a number. This is useful because it allows you to pass in symbolic names like “SM_CXBLAHBLAHBLAH” or a number 123. That’s great depending on what you’re interacting with and how the values are held. You might have some UI for example where you just want to use the symbolic names and not deal with numbers.

The other thing of note is that ‘entry.converter’ bit at the end. If you look back at the names table, you’ll notice that some of the entries have a ‘converter’ field associated with them. This is an optional function that can be associated with the entries. If it exists, it is called, with the value from the system call passed to it. In most cases, what the system returns is a number (number of mouse buttons, size of screen, etc). In some cases, the value returned is ‘0’ for false, and ‘non-zero’ for true. Well, as a Lua developer, I’d rather just get a bool in those cases where it’s appropriate, and this helper function is in a position to provide that for me. This is great because it allows me to not have to check the documentation to figure it out.

There’s one more tiny gem hidden in all this madness.

function exports.genCdefs()
    for key, entry in pairs(exports.names) do
        ffi.cdef(string.format("static const int %s = %d", key, entry.value))
    end
end

What does this do exactly? Simply, it generates those constants in the ffi.C space, so that you can still do this:

ffi.C.GetSystemMetrics(ffi.C.SM_MAXIMUMTOUCHES)

So, there you have it. You can go with the raw traditional sort of ffi binding, or you can spice things up a bit and make things a bit more useful with a little bit of effort. I like doing the latter, because I can generate the more traditional binding from the table of names that I’ve created. That’s a useful thing for documentation purposes, and in general.

I have stuck to my objectives, and this little example just goes to prove how esoteric minute details can be turned into approachable things of beauty with a little bit of Lua code.


Note To Self – VS Code seems reasonable

No secret, I still work for Microsoft…

Over the past 17 years of working for the company, my go-to editor had largely been Visual Studio.  Since about 2000, it was Visual C#.  Then around 2011, I switched up, and started doing a lot of JavaScript, Lua, and other languages, and my editor went from Notepad++, to a combination of Sublime Text and vim.

Most recently, I’ve had the opportunity to try and enable some editing on Windows 10 tablets, and I chose a new editor, Visual Studio Code.  I am by no means a corporate apologist, but I will certainly point out when I think my company is doing something good.  Visual Studio Code is an easy replacement for Sublime Text, at least for my needs and tastes.  I’ve been trying it out on and off for the past few months, and it just keeps improving.

Like all modern editors, it has an ‘add-on’ capability, which has a huge community of add-on builders adding on stuff.  Of course, there’s some Lua syntax highlighting, which makes it A number one in my book already.  But, there are other built-in features that I like as well.  It has a simple and sane integration with git repositories right out of the box.  So, I just open up my favorite projects, start editing, and it shows which files are out of sync.  A couple of clicks, type in my credentials, and the sync/push happens.  I’m sure there’s an extension for that in all modern editors, including Sublime Text, but here it’s just built into the base editor.

One item that struck me as a pleasant surprise the other day was built-in support for Markdown.  I was refreshing the documentation files for schedlua, and I was putting in code block indicators (```lua).  After I put in one such indicator, I noticed the quoted code suddenly had the Lua syntax highlighting!  Yah, well, ok, getting excited about not much.  But, I had never seen that with Sublime Text, so it was new for me.  That was one of those features that just made me go ‘sold’.

The editor has other features such as being able to run a command line from within the editor and such, but it’s not a full-blown IDE like Visual Studio, which is good because the tablets I’m running it on don’t have the 4–8 GB of RAM needed to run Visual Studio comfortably.  So, it’s just enough editor to replace the likes of Sublime Text.  I also like the fact that it’s backed by a large company that is dedicated to continue to improve it over time with regular updates.  The community that’s being built up around add-ons seems fairly robust, which is also another good sign.  Given Microsoft’s current penchant for Open Sourcing things, I would not be surprised if it showed up available on GitHub some day in the future, which would just make it that much more interesting.

So, for now (future self), I will be using VS Code as my editor on Windows, MacOS, and Linux.  It has the stability and feature set that I need, and it continues to evolve, adding more stability and features that I find to be useful.

 


Drawproc – Like Processing, but in C++

[Image: Triangle Strips using drawproc]

So, what’s the code look like to create this gem?

#include "drawproc.h"

int x;
int y;
float outsideRadius = 150;
float insideRadius = 100;


void setup() {
	size(640, 360);
	background(204);
	x = width / 2;
	y = height / 2;
}


void drawStrip()
{
	int numPoints = int(map(mouseX, 0, width, 6, 60));
	float angle = 0;
	float angleStep = 180.0 / numPoints;

	beginShape(GR_TRIANGLE_STRIP);
	for (int i = 0; i <= numPoints; i++) {
		float px = x + cos(radians(angle)) * outsideRadius;
		float py = y + sin(radians(angle)) * outsideRadius;
		angle += angleStep;
		vertex(px, py);
		px = x + cos(radians(angle)) * insideRadius;
		py = y + sin(radians(angle)) * insideRadius;
		vertex(px, py);
		angle += angleStep;
	}
	endShape();
}

void draw() {
  background(204);
  drawStrip();
}

If you’ve done any coding in Processing, you can look at the example that inspired this bit of code here: Triangle Strip

What’s notable about it is the similarity to the Java or even the JavaScript version (if you use processing.js). It takes about 10 minutes to convert a sketch from Processing to drawproc. So, what is drawproc?

Drawproc is an application and library which facilitates the creation of interactive graphics. It is the culmination of taking the work from graphicc and encapsulating it in a way that makes it easy to use in multiple situations.

So, how does it work?  Basically, there is the drawproc.exe application.  This application contains a main(), and a primary event loop which takes care of capturing mouse and keyboard events, and issuing “draw()” calls.  Previously (Dynamic Programming in C) I explained how that dynamic bit of machinery works. All that machinery is at work here with the addition of one more dynamic programming item.

bool InitializeInstance(const char *moduleName)
{

	// Get pointers to client setup and loop routines
	clientModule = LoadLibrary(moduleName);

	printf("modH: 0x%p\n", clientModule);

	// Bail out early if the module could not be loaded
	if (clientModule == NULL) {
		return false;
	}

	SetupHandler procAddr = (SetupHandler)GetProcAddress(clientModule, "setup");
	printf("proc Address: 0x%p\n", procAddr);

	if (procAddr != NULL) {
		setSetupRoutine(procAddr);
	}

	LoopHandler loopAddr = (LoopHandler)GetProcAddress(clientModule, "draw");
	printf("loop Addr: 0x%p\n", loopAddr);

	if (loopAddr != NULL) {
		setLoopRoutine(loopAddr);
	}

	if ((procAddr == nullptr) && (loopAddr == nullptr))
	{
		return false;
	}

	gClock = dproc_clock_new();

	return true;
}

When invoking drawproc, you give a name of a module, which is a .dll file compiled against the .exe. Typical execution looks like this:

c:\tools>drawproc trianglestrip.dll

That ‘trianglestrip.dll’ is passed along to the InitializeInstance() call, the module is loaded, and the ‘setup()’ and ‘draw()’ functions are looked for. If neither of them is found, or the .dll doesn’t load, then the program quits. At this point, everything is the same as if you had linked the drawing module into the drawproc.exe program directly. The advantage is you have a simple small (~200K for debug version) executable (drawproc.exe) which is very slow changing. Then you have the modules, which can be numerous, and dynamic. You can create modules independently of the drawproc.exe and run as you wish. You could even write a single module which loads .lua, or any other embedded scripting environment, and write your code using that scripting language instead.

How do you create these modules? Well, you just write your code, make reference to the header files within drawproc, and use drawproc.lib as the library reference. All the relevant symbols within drawproc are exported, so this just works. At the end of the day, the drawproc.exe looks just like any other .dll that might be out there.
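As a rough illustration of the module side, here is a minimal sketch of a client module. It assumes the entry points need C linkage so that GetProcAddress() can find them under the plain names "setup" and "draw". The DPROC_EXPORT macro and the frameCount variable are hypothetical stand-ins, not names taken from drawproc.h, which presumably takes care of the export details itself.

```cpp
#include <cassert>

// Hypothetical export macro; drawproc.h presumably supplies its own.
#if defined(_WIN32)
#define DPROC_EXPORT extern "C" __declspec(dllexport)
#else
#define DPROC_EXPORT extern "C"
#endif

// Module-local state, just so the entry points do something observable.
static int frameCount = -1;

// Looked up once by name ("setup") when the host loads the module.
DPROC_EXPORT void setup()
{
	frameCount = 0;
}

// Looked up by name ("draw") and called repeatedly from the host's event loop.
DPROC_EXPORT void draw()
{
	// Assumption: the host runs setup() before the first draw().
	assert(frameCount >= 0);
	++frameCount;
}
```

Without C linkage, a C++ compiler would export decorated names, and a lookup for the literal string "draw" would come back empty.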

In case you’re still reading, here’s another picture.

[Images: SineConsole; Banate CAD 2011]

This one is interesting because it’s actually an animation (SineConsole).  A few years back, when I was experimenting with BanateCAD, I had done something similar, all in Lua: Banate CAD 2011.

Why bother with all this though?  Why C? What’s the point?  I had this interesting conversation last week with a co-worker.  We were discussing whether people who are becoming software programmers would be better served by learning C#, or C/C++.  I think my answer was C#, simply because it seems more in fashion and more applicable to other dynamic languages than does C/C++.  But, here we’re doing a lot of ‘dynamic’ with standard C/C++.  Really the answer to that question is “you will need to learn and use many languages, frameworks, and tools in your software development.  Learning some C will likely serve you well, but be prepared to learn many different things.”

drawproc being written in C/C++ is great because it makes programming graphics fairly simple (because of the Processing mimicry).  Using the Processing API makes the graphics stuff really easy.  At the same time, since it’s written in C/C++, gaining access to the lowest level stuff of the platform is really easy as well.  For example, integrating with the Microsoft Kinect sensor is as easy as just using the Microsoft-provided SDK directly.  No shim, no translation layer, no ‘binding’ to get in the way.  That’s a very good thing.  Also, as time goes on, doing the accelerated this and that, throwing in networking and the like will be a relative no-brainer.

So, there you have it.  drawproc is a new standalone tool which can be used for fiddling about with graphics.  For those who are into such things, it’s a nice tool to play with.


William Does Linux on Azure!

What?

You see, it’s like this.  As it turns out, a lot of people want to run code against a Linux kernel in the cloud.  Even though Windows might be a fine OS for cloud computing, the truth is, many customers are simply Linux savvy.  So, if we want to make those customers happy, then Linux needs to become a first class citizen in the Azure ecosystem.

Being a person to jump on technological and business related grenades, I thought I would join the effort within Microsoft to make Linux a fun place to be on Azure.  What does that mean?  Well, you can already get a Linux VM on Azure pretty easily, just like with everyone else.  But what added value is there coming from Microsoft so this isn’t just a simple commodity play?  Microsoft does in fact have a rich set of cloud assets, and not all of them are best accessed from a Linux environment.  This might mean anything from providing better access to Azure Active Directory, to creating new applications and frameworks altogether.

One thing is for sure.  As the Windows OS heads for the likes of the Raspberry Pi, and Linux heads for Azure, the world of computing is continuing to be a very interesting place.


Goodbye to colleagues

July 17 2014, some have called it Black Thursday at Microsoft.

I’ve been with the company for more than 15 years now, and I was NOT given the pink slip this time around.

Over those years, I have worked with tons of people, helped develop some careers, shipped lots of software, and generally had a good time.  Some of my colleagues were let go.  I actually feel fairly sad about it.  This is actually the second time I’ve known of colleagues being let go.  These are not people who are low performers.  In fact, last time around, the colleague found another job instantly within the company.

I remember back in the day Apple Computer would go through these fire/hire binges.  They’d let go a bunch of people, due to some change in direction or market, and then within 6 months end up hiring back just as many because they’d figured out something new which required those skilled workers.

In this case, it feels a bit different.  New head guy, new directions, new leadership, etc.

I’ve done some soul searching over this latest cull.  It’s getting lonely in my old Microsoft.  When you’ve been there as long as I have, the number of people you started with becomes very thin.  So, what’s my motivation?

It’s always the same I think.  I joined the company originally to work on the birth of XML.  I’ve done various other interesting things since then, and they all have the same pattern.  Some impossible task, some new business, some new technical challenge.

This is just the beginning of the layoffs, and I don’t know if I’ll make the next cull, but until then, I’ll be cranking code, doing the impossible, lamenting the departure of some very good engineering friends.  Mega corp is gonna do what mega corp’s gonna do.  I’m an engineer, and I’m gonna do some more engineering.

 


Fast Apps, Microsoft Style

Pheeeuuww!!

That’s what I exclaimed at least a couple of times this morning as I sat at a table in a makeshift “team room” in building 43 at Microsoft’s Redmond campus. What was the exclamation for? Well, over the past 3 months, I’ve been working on a quick strike project with a new team, and today we finally announced our “Public Preview”.  Or, if you want to get right to the product: Cloud App Discovery

I’m not a PM or marketing type, so it’s best to go and read the announcement for yourself if you want to get the official spiel on the project.  Here, I want to write a bit about the experience of coming up with a project, in short order, in the new Microsoft.

It all started back in January for me.  I was just coming off another project, and casting about for the next hardest ‘mission impossible’ to jump on.  I had a brief conversation with a dev manager who posed the question: “Is it possible to reestablish the ‘perimeter’ for IT guys in this world of cloud computing?”  An intriguing question.  The basic problem was, if you go to a lot of IT guys, they can barely tell you how many of the people within their corporation are using SalesForce.com, let alone DropBox from a cafe in Singapore.  Forget the notion of even trying to control such access.  The corporate ‘firewall’ is almost nothing more than a quartz space heater at this point, preventing very little, and knowing about even less.

So, with that question in mind, we laid out 3 phases of development.  Actually, they were already laid out before I joined the party (by a couple of weeks), so I just heard the pitch.  It was simple: the first phase of development is to see if we can capture network traffic, using various means, and project it up to the cloud where we could use some machine learning to give an admin a view of what’s going on.

Conveniently sidestepping any objections actual employees might have with this notion, I got to thinking on how it could be done.

For my part, we wanted to have something sitting on the client machine (a Windows machine that the user is using), which will inspect all network traffic coming and going, and generate some reports to be sent up to the cloud.  Keep in mind, this is all consented activity, the employee gets to opt in to being monitored in this way.  All in the open and up front.

At the lowest level, my first inclination was to use a raw socket to create a packet sniffer, but Windows has a much better solution these days, built for exactly this purpose.  The Windows Filtering Platform allows you to create a ‘filter’ which you can configure to call out to a function whenever there is traffic.  My close teammate implemented that piece, and suddenly we had a handle on network packets.

We fairly quickly decided on an interface between that low level packet sniffing, and the higher level processor.  It’s as easy as this:

 

int WriteBytes(char *buff, int bufflen);
int ReadBytes(char *buff, int bufflen, int &bytesRead);

I’m paraphrasing a bit, but it really is that simple. What’s it do? Well, the fairly raw network packets are sent into ‘WriteBytes’, some processing is done, and a ‘report’ becomes available through ‘ReadBytes’. The reports are a JSON formatted string which then gets turned into the appropriate thing to be sent up to the cloud.
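To make the shape of that interface a bit more concrete, here is a minimal sketch of how the two calls could sit on top of a queue of pending reports. Everything behind the signatures is an assumption for illustration: the gReports queue, the return conventions (0 for success, -1 for failure), and the stand-in "processing", which only records the chunk size as a tiny JSON string instead of doing any real parsing.

```cpp
#include <cassert>
#include <cstring>
#include <deque>
#include <string>

// Pending reports, oldest first. Purely illustrative state.
static std::deque<std::string> gReports;

// Accept a chunk of raw packet bytes. The "processing" here is a stand-in:
// it only records the chunk size as a small JSON report.
// Return convention (assumed): 0 on success, -1 on bad arguments.
int WriteBytes(char *buff, int bufflen)
{
	if (buff == nullptr || bufflen <= 0)
		return -1;
	gReports.push_back("{\"bytes\":" + std::to_string(bufflen) + "}");
	return 0;
}

// Copy the oldest pending report into buff and drop it from the queue.
// bytesRead is 0 when no report is waiting.
// Return convention (assumed): 0 on success, -1 if buff is null or too small.
int ReadBytes(char *buff, int bufflen, int &bytesRead)
{
	bytesRead = 0;
	if (buff == nullptr || bufflen < 0)
		return -1;
	if (gReports.empty())
		return 0;
	const std::string &report = gReports.front();
	if (static_cast<int>(report.size()) > bufflen)
		return -1;              // report stays queued for a bigger buffer
	std::memcpy(buff, report.data(), report.size());
	bytesRead = static_cast<int>(report.size());
	gReports.pop_front();
	return 0;
}
```

The appeal of a two-function boundary like this is that the packet capture side and the report consumer never need to know anything about each other beyond plain byte buffers.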

The time it took from hearing about the basic product idea, to a prototype of this thing was about 3 weeks.

What do I do once I get network packets? Well, the network packets represent a multiplexed stream of packets, as if I were a NIC. All incoming, outgoing, all TCP ports. Once I receive some bytes, I have to turn it back into individual streams, then start doing some ‘parsing’. Right now we handle http and TLS. For http, I do full http parsing, separating out headers, reading bodies, and the like. I did that by leveraging the http parsing work I had done for TINN already. I used C++ in this case, but it’s all relatively the same.
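The "turn it back into individual streams" step can be sketched roughly as follows, assuming each payload arrives already tagged with its source and destination address and port. FlowKey and StreamDemux are hypothetical names, and the sketch simply appends payloads in arrival order; it ignores TCP sequence numbers, retransmits, and reordering, which a real implementation would have to deal with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Identifies one direction of one TCP connection (hypothetical type).
struct FlowKey {
	uint32_t srcAddr;
	uint16_t srcPort;
	uint32_t dstAddr;
	uint16_t dstPort;

	bool operator<(const FlowKey &o) const
	{
		return std::tie(srcAddr, srcPort, dstAddr, dstPort) <
		       std::tie(o.srcAddr, o.srcPort, o.dstAddr, o.dstPort);
	}
};

// Collects payload bytes per flow, in arrival order.
class StreamDemux {
public:
	void addPayload(const FlowKey &key, const char *data, size_t len)
	{
		streams[key].append(data, len);
	}

	// Bytes gathered so far for a flow; empty if the flow is unknown.
	std::string bytesFor(const FlowKey &key) const
	{
		auto it = streams.find(key);
		return it == streams.end() ? std::string() : it->second;
	}

	size_t flowCount() const { return streams.size(); }

private:
	std::map<FlowKey, std::string> streams;
};
```

Once the bytes for one flow are contiguous again, a protocol parser (http or TLS) can be pointed at that buffer without caring how the packets were interleaved on the wire.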

TLS is a different story. At this ‘discovery’ phase, it was more about simple parsing. So, reading the record layer, decoding client_hello and server_hello, certificate, and the like. This gave me a chance to implement TLS processing using C++ instead of Lua. One of the core components that I leveraged was the byte order aware streams that I had developed for TINN. That really is the crux of most network protocol handling. If you can make heads or tails of what the various RFCs are saying, it usually comes down to doing some simple serialization, but getting the byte ordering right is the hardest part. 24-bit big endian integers?
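For a feel of what that byte-order-aware reading looks like, here is a small sketch that pulls a TLS record header and a handshake header out of a buffer. The field layout follows the TLS record and handshake formats (a 1-byte content type, 2-byte version, and 2-byte length, then a 1-byte message type and a 24-bit length, all big endian). The BigEndianReader class and the struct names are hypothetical helpers written for this example, not the stream classes from TINN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal big-endian reader over a byte buffer (hypothetical helper).
class BigEndianReader {
public:
	BigEndianReader(const uint8_t *data, size_t len)
		: data(data), len(len), pos(0), ok(true) {}

	uint8_t  readU8()  { return static_cast<uint8_t>(readN(1)); }
	uint16_t readU16() { return static_cast<uint16_t>(readN(2)); }
	uint32_t readU24() { return readN(3); }

	// False once any read has run past the end of the buffer.
	bool good() const { return ok; }

private:
	// Read n bytes (n <= 4), most significant byte first.
	uint32_t readN(size_t n)
	{
		if (!ok || len - pos < n) {
			ok = false;
			return 0;
		}
		uint32_t value = 0;
		for (size_t i = 0; i < n; i++)
			value = (value << 8) | data[pos++];
		return value;
	}

	const uint8_t *data;
	size_t len;
	size_t pos;
	bool ok;
};

struct TlsRecordHeader {
	uint8_t  contentType;   // 22 = handshake
	uint16_t version;       // e.g. 0x0301
	uint16_t length;        // bytes of record payload that follow
};

struct TlsHandshakeHeader {
	uint8_t  msgType;       // 1 = client_hello, 2 = server_hello
	uint32_t length;        // 24-bit big-endian length
};

bool readRecordHeader(BigEndianReader &r, TlsRecordHeader &h)
{
	h.contentType = r.readU8();
	h.version = r.readU16();
	h.length = r.readU16();
	return r.good();
}

bool readHandshakeHeader(BigEndianReader &r, TlsHandshakeHeader &h)
{
	h.msgType = r.readU8();
	h.length = r.readU24();
	return r.good();
}
```

Building the value one byte at a time with shifts keeps the code independent of the host's own byte order, which is the main reason to wrap this in a reader rather than casting the buffer to a struct.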

At any rate, http parsing, fairly quick. TLS client_hello, fast enough, although properly handling the extensions took a bit of time. At this point, we’d be a couple months in, and our first partners get to start kicking the tires.

For such a project, it’s very critical that real world customers are involved really early, almost sitting in our design meetings. They course corrected us, and told us what was truly important and annoying about what we were doing, right from day one.

From the feedback, it becomes clear that getting more information, like the amount of traffic flowing through the pipes is as interesting as the meta information, so getting the full support for flows becomes a higher priority. For the regular http traffic, no problem. The TLS becomes a bit more interesting. In order to deal with that correctly, it becomes necessary to suck in more of the TLS implementation. Read the server_hello, and the certificate information. Well, if you’re going to read in the cert, you might as well get the subject common name out so you can use that bit of meta information. Now comes ASN.1 (DER) parsing, and x509 parsing. That code took about 2 weeks, working “nights and weekends” while the other stuff was going on. It took a good couple of weeks not to integrate, but to write enough test cases, with real live data, to ensure that it was actually working correctly.
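The starting point for any ASN.1 (DER) work is reading the tag and length that precede every value. Here is a minimal sketch of that one step, assuming single-byte tags only. DerHeader and parseDerHeader are hypothetical names for this example, and a real X.509 parser would go on to walk the nested SEQUENCEs to reach the subject common name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct DerHeader {
	uint8_t tag;         // identifier octet, e.g. 0x30 = SEQUENCE
	size_t  length;      // number of content bytes
	size_t  headerSize;  // bytes used by the tag and length fields
};

// Parse one DER tag/length header from the start of buf.
// Handles single-byte tags only; returns false on truncated or
// unsupported input.
bool parseDerHeader(const uint8_t *buf, size_t len, DerHeader &out)
{
	if (buf == nullptr || len < 2)
		return false;

	out.tag = buf[0];
	if ((out.tag & 0x1F) == 0x1F)
		return false;                 // multi-byte tag: not handled here

	uint8_t first = buf[1];
	if (first < 0x80) {               // short form: length in one byte
		out.length = first;
		out.headerSize = 2;
		return true;
	}

	size_t numBytes = first & 0x7F;   // long form: count of length bytes
	if (numBytes == 0 || numBytes > sizeof(size_t) || len < 2 + numBytes)
		return false;                 // 0x80 (indefinite) is not valid DER

	size_t value = 0;
	for (size_t i = 0; i < numBytes; i++)
		value = (value << 8) | buf[2 + i];

	out.length = value;
	out.headerSize = 2 + numBytes;
	return true;
}
```

A certificate typically opens with 0x30 0x82 followed by two length bytes, so this long-form path is exercised on the very first header of real data.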

The last month was largely a lot of testing, making sure corner cases were dealt with and the like. As the client code is actually deployed to a bunch of machines, it really needed to be rock solid, no memory leaks, no excessive resource utilization, no CPU spiking, just unobtrusive, quietly getting the job done.

So, that’s what it does.

Now, I’ve shipped at Microsoft for numerous years. The fastest cycles I’ve usually dealt with are on the order of 3 months. That’s usually for a product that’s fairly mature, has plenty of engineering system support, and a well laid out roadmap. Really you’re just turning the crank on an already laid out plan.

This AppDiscovery project has been a bit different. It did not start out with a plan that had a 6 month planning cycle in front of it. It was a hunch that we could deliver customer value by implementing something that was challenging enough, but achievable, in a short amount of time.

So, how is this different than Microsoft of yore? Well, yes, we’ve always been ‘customer focused’, but this is to the extreme. I’ve never had customers this involved in what I was doing this early in the development cycle. I mean literally, before the first prototypical bits are even dry, the PM team is pounding on the door asking “when can I give it to the customers?”. That’s a great feeling actually.

The second thing is how much process we allowed ourselves to use. Recognizing that it’s a first run, and recognizing that customers might actually say “mehh, not interested”, it doesn’t make sense to spin up the classic development cycle which is meant to maintain a product for 10-14 years. A much more streamlined lifecycle, which favors delivering quality code and getting customer feedback, is what we employed. If it turns out that customers really like the product, then there’s room to move to a cycle that is more appropriate for longer term support.

The last thing that’s special is the amount of leveraging Open Source we are allowing ourselves these days. Microsoft has gone full tilt on OpenSource support. I didn’t personally end up using much myself, but we are free to use it elsewhere (with some legal guidelines). This is encouraging, because for crypto, I’m looking forward to using things like SipHash, and ChaCha20, which don’t come natively with the Microsoft platform.

Overall, as Microsoft continues to evolve and deliver ‘customer centric’ stuff, I’m pretty excited and encouraged that we’ll be able to use this same model time and again to great effect. Microsoft has a lot of smart engineers. Combined with some new directives about meeting customer expectations at the market, we will surely be cranking out some more interesting stuff.

I’ve implemented some interesting stuff while working on this project, some of which I’ll share here.


Microsoft Part II

I joined Microsoft in 1998 to work on MSXML. One of the reasons I joined way back then is because MS was in trouble with the DOJ, and competitors were getting more interesting. I thought “They’re either going down, or they’re going to resurge, either way, it will be a fun ride”.

Here it is, more than 15 years later, and I find my sentiment about the same. Microsoft has been in trouble the past few years. Missing a few trends, losing our way, catching our breath as our competitors run farther and faster ahead of us…

In the past 4 years, I’ve been associated with the rise of Azure, and most recently associated with our various identity services. In the past couple of months, I’ve been heads down working in an internal startup, which is about to deliver bits to the web. That’s 2 months from conception to delivery of a public preview of a product. That’s fairly unheard of for our giant company.

But, today, I saw a blizzard of news that made me think ye olde company has some life yet left in it.

The strictly Microsoft related news…
Windows Azure Active Directory Premium
C# Goes Open Source
TypeScript goes 1.0
Windows 8.1 is FREE for devices less than 9″!!

Of all of these, I think the Windows 8.1 going for free is probably the most impactful from a ‘game changer’ perspective. Android is everywhere, probably largely because it is ‘free’. I can’t spit in the wind without hitting a new micro device that runs Android, and doesn’t run Windows. Perhaps this will begin to change somewhat.

Then there’s peripheral news like…
intel Galileo board ($99) is fully programmable from Visual Studio
Novena laptop goes for crowd funding

The Novena laptop is very interesting because it’s a substantial offering created by a couple of hardcore engineers. It is clearly a MUST HAVE machine for any self respecting hard/software hacker. It’s not the most powerful laptop in the world, and that’s beside the point. What it does represent is that some good engineers, hooked up with a solid supply chain, can produce goods that are almost price competitive with commodity goods. That and the fact that this is just an extraordinary hack machine.

I find the Galileo interesting because other than some third party support for Arduino programming from MSVC, this is a serious support drive for small things, from Microsoft. Given the previous news about the ‘free’, this Galileo support bodes well. You could conceivably get a $99 ‘computer’ with some form of Windows OS, and use it at the heart of your robot, quadcopter, art display, home automation thing…

Of course, the rest of the tinker market is heading even lower priced with things like the Teensy 3.1 at around $20. No “OS” per se, but surely capable hardware that could benefit from a nicely integrated programming environment and support from Microsoft. But, you don’t want Windows on such a device. You want to leverage some core technologies that Microsoft has in-house, and just apply it in various places. Wouldn’t it be great if all of Microsoft’s internal software was made available as installable packages…

Then there’s the whole ‘internet of things’ angle. Microsoft actually has a bunch of people focused in this space, but there’s no public offerings as yet. We’re Microsoft though, so you can imagine what the outcomes might look like. Just imagine lots of tiny little devices all tied to Microsoft services in some way, including good identities and all that.

Out on the fringe, non-Microsoft, there is Tessel.io, with their latest board back from manufacturing. It is a microcontroller that runs node.js (and TypeScript for that matter), and is WiFi connected. That is bound to have a profound impact for those who are doing quick and dirty web connected physical computing.

Having spent the past few weeks coding in C++, I have been feeling the weight of years of piled on language cruft. I’ve been longing for the simplicity of the Lua language, and my beloved TINN, but that will just have to wait a few more weeks. In the meanwhile, I did purchase a Mojo FPGA board, in the hopes that I will once again get into FPGA programming, because “hardware is the new software”.

At the end of the day, I am as excited about the prospects of working at Microsoft as I was in 1998. My enthusiasm isn’t constrained by the possibilities of what Microsoft itself might do, rather I am overjoyed at the pace of development and innovation across the industry. There are new frontiers opening up all the time. New markets to explore, new waves to catch. It’s not all about desktops, browsers, office suites, search engines, phones, and tablets. Every day, there’s a new possibility, and the potential for a new application. Throw in 3D printing, instant manufacturing, and a smattering of BitCoin, and we’re living in a braver new world every day!!


Revisiting C++

I was a C++ expert twice in the past. The first time around was because I was doing some work for Taligent, and their whole operating system was written in C++. With that system I got knee deep into the finer details of templates, and exceptions, to a degree that will likely never be seen on the planet earth.

The second time around, was because I was programming on the BeOS. Not quite as crazy as the Taligent experience, but C/C++ were all the rage.

Then I drifted into Microsoft, and C# was born. For the past 15 years, it’s been a slow rise to dominance with C# in certain quarters of Microsoft. It just so happens that this corresponds to the rise of the virus attacks on Windows, as well as the shift in programming skills of college graduates. In the early days of spectacular virus attacks, you could attribute most of them to buffer overruns, which allowed code to run on the stack. This was fairly easily plugged by C#, and security coding standards.

Today, I am working on a project where once again I am learning C++. This time around it’s C++11, which is decidedly more mature than the C++ I learned while working on Taligent. It’s not so dramatically different as, say, the difference between Lisp and Cobol, but it gained a lot of stuff over the years.

I thought I would jot down some of the surface differences I have noticed since I’ve been away.

First, to compare C++ to Lua, there are some surface differences. Most of the languages I program in today have their roots in Algol, so they largely look the same. But, there are some simple dialect differences. C++ is full of curly braces ‘{}’, semi-colons ‘;’, and parentheses ‘()’. Oh my god with the parens and semis!! With Lua, parens are optional, semis are optional, and instead of curlies, there are ‘do’, ‘end’, or simply ‘end’. For loops are different, array indices are different (unless you’re doing interop with the FFI), and do/while is repeat/until.

These are all minor differences, like say the differences between Portuguese and Spanish. You can still understand the other if you speak one. Perhaps not perfectly, but there is a relatively easy translation path.

Often times in language wars, these are the superficial differences that people talk about. Meh, not interesting enough to drive me one way or another.

But then, there’s this other stuff, which is truly the essence of the differences. Strong typing/duck typing, managed memory, dynamic code execution. I say ‘Lua’ here, but really that could be a standin for C#, node.js, Python, or Ruby. Basically, there are a set of modern languages which exhibit a similar set of features which are different enough from C/C++ that there is a difference in the programming models.

To illustrate, here’s a bit of C++ code that I have written recently. The setup is this: I receive a packet of data, typically the beginning of an HTTP conversation. From that packet of data, I must be able to ‘parse’ the thing, determine whether it is http/https, pull out headers, etc. I need to build a series of in-place parsers, which keep the amount of memory allocated to a minimum, and work fairly quickly. So, the first piece is this thing called an AShard_t:

#pragma once

#include "anl_conf.h"

class  DllExport AShard_t  {
public:
	uint8_t *	m_Data;
	size_t	m_Length;
	size_t	m_Offset;

	// Constructors
	AShard_t();
	AShard_t(const char *);
	AShard_t(uint8_t *data, size_t length, size_t offset);

	// Virtual Destructor
	virtual ~AShard_t() {};

	// type cast
	operator uint8_t *() {return getData();}

	// Operator Overloads
	AShard_t & operator= (const AShard_t & rhs);

	// Properties
	uint8_t *	getData() {return &m_Data[m_Offset];};
	size_t		getLength() {return m_Length;};

	// Member functions
	AShard_t &	clear();
	AShard_t &	first(AShard_t &front, AShard_t &rest, uint8_t delim) const;
	bool		indexOfChar(const uint8_t achar, size_t &idx) const;
	bool		indexOfShard(const AShard_t &target, size_t &idx);
	bool 		isEmpty() const;
	void		print() const;
	bool		rebase();
	char *		tostringz() const;
	AShard_t &	trimfrontspace();

};

OK, so it’s actually a fairly simple data structure. Assuming you have a buffer of data, a shard is just a pointer into that buffer. It contains the pointer, an offset, and a length. You might say that the pointer/offset combo is redundant, you probably don’t need both. The offset could be eliminated, assuming the pointer is always at the base of the structure. But, there might be a design choice that makes this useful later.

At any rate, there’s a lot going on here for such a simple class. First of all, there’s that ‘#pragma once’ at the top. Ah yes, good ol’ C preprocessor, needs to be told not to load stuff it’s already loaded before. Then there’s class vs struct, not to be confused with ‘typedef struct’. Public/Protected/Private, copy constructor or ‘operator=’. And heaven forbid you forget to make a public default constructor. You will not be able to create an array of these things without it!

These are not mere dialectal differences, these are the differences between Spanish and Hungarian. You MUST know about the default constructor thing, or things just won’t work.
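
The default constructor point is easy to demonstrate in isolation. The two structs below are hypothetical stand-ins (not AShard_t itself) that differ only in whether a public default constructor exists; the static_asserts let the compiler confirm the difference.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in with only a three-argument constructor.
struct NoDefault {
    const uint8_t *data;
    size_t length;
    size_t offset;
    NoDefault(const uint8_t *d, size_t len, size_t off)
        : data(d), length(len), offset(off) {}
};

// Same fields, plus a public default constructor.
struct WithDefault {
    const uint8_t *data;
    size_t length;
    size_t offset;
    WithDefault() : data(nullptr), length(0), offset(0) {}
    WithDefault(const uint8_t *d, size_t len, size_t off)
        : data(d), length(len), offset(off) {}
};

// NoDefault pieces[8];    // does not compile: no way to build each element
// WithDefault pieces[8];  // fine: each element is default-constructed

static_assert(!std::is_default_constructible<NoDefault>::value,
              "NoDefault cannot be created without arguments");
static_assert(std::is_default_constructible<WithDefault>::value,
              "WithDefault can be created without arguments");

// Builds an array of default-constructed elements and counts the empty ones.
inline size_t countEmpty()
{
    WithDefault pieces[8];
    size_t n = 0;
    for (const WithDefault &p : pieces) {
        if (p.length == 0) ++n;
    }
    return n;
}
```

Declaring any constructor of your own suppresses the implicit default one, which is why adding that convenient three-argument constructor is exactly what breaks the array declaration.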

As far as implementation is concerned, I did a mix of things here, primarily because the class is so small. I’ve inserted some simple “string” processing right into the class, because I found them to be constantly useful. ‘first’, ‘indexOfChar’, and ‘indexOfShard’ turn out to be fairly handy when you’re trying to parse through something in place. ‘first’ is like in Lisp, get the first element off the list of elements. In this case you can specify a single character delimiter. ‘indexOfChar’, is like strchr() function from C, except in this case it’s aware of the length, and it doesn’t assume a ‘null’ terminated string. ‘indexOfShard’ is like ‘strstr’, or ‘strpbrk’. With these in hand, you can do a lot of ‘tokenizing’.
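
The implementation of these functions isn’t shown here, so the following is only a sketch of how ‘indexOfChar’ and ‘first’ could work. It uses a simplified, hypothetical `Shard` struct (pointer plus length, no offset), and it assumes that when the delimiter is missing, ‘front’ gets the whole shard and ‘rest’ comes back empty. The `str()` helper is also an addition, there only to make the results easy to inspect.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical, simplified stand-in for AShard_t: a non-owning view of bytes.
struct Shard {
    const uint8_t *data;
    size_t length;

    Shard() : data(nullptr), length(0) {}
    Shard(const uint8_t *d, size_t len) : data(d), length(len) {}
    explicit Shard(const char *s)
        : data(reinterpret_cast<const uint8_t *>(s)), length(std::strlen(s)) {}

    bool isEmpty() const { return length == 0; }

    // Like strchr(), but bounded by length instead of a terminating null.
    bool indexOfChar(uint8_t c, size_t &idx) const {
        for (size_t i = 0; i < length; ++i) {
            if (data[i] == c) { idx = i; return true; }
        }
        return false;
    }

    // Split at the first delimiter. Nothing is copied; 'front' and 'rest'
    // point into the same underlying buffer. Returns false when the
    // delimiter is missing (assumed behavior: front = everything, rest = empty).
    bool first(Shard &front, Shard &rest, uint8_t delim) const {
        size_t idx = 0;
        if (!indexOfChar(delim, idx)) {
            const Shard whole(data, length);
            front = whole;
            rest = Shard();
            return false;
        }
        // Build both halves before assigning, in case an output aliases *this.
        const Shard f(data, idx);
        const Shard r(data + idx + 1, length - idx - 1);
        front = f;
        rest = r;
        return true;
    }

    // Copies the bytes out; only meant for printing and tests.
    std::string str() const {
        return length ? std::string(reinterpret_cast<const char *>(data), length)
                      : std::string();
    }
};
```

Splitting "http://host" on ':' this way yields the shards "http" and "//host", both still pointing at the original bytes.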

Here’s an example of parsing a URL:

bool parseUrl(const AShard_t &uriShard)
{
  AShard_t shard = uriShard;
  AShard_t rest;
	
  AShard_t scheme;
  AShard_t url;
  AShard_t authority;
  AShard_t hostname;
  AShard_t port;
  AShard_t resquery;
  AShard_t resource;
  AShard_t query;

  // http:
  shard.first(scheme, rest, ':');

  // the 'rest' represents the resource, which 
  // includes the authority + query
  // so try and separate authority from query if the 
  // query part exists
  shard = rest;
  // skip past the '//'
  shard.m_Offset += 2;
  shard.m_Length -= 2;

  // Now we have the url separated from the scheme
  url = shard;

  // separate the authority from the resource based on '/'
  url.first(authority, rest, '/');
  resquery = rest;

  // Break the authority into host and port
  authority.first(hostname, rest, ':');
  port = rest;

  // Back to the resource.  Split it into resource/query
  parseResourceQuery(resquery, resource, query);


  // Print the shards
  printf("URI: "); uriShard.print();
  printf("  Scheme: "); scheme.print();
  printf("  URL: "); url.print();
  printf("    Authority: "); authority.print();
  printf("      Hostname: "); hostname.print();
  printf("      Port: "); port.print();
  printf("    Resquery: "); resquery.print();
  printf("      Resource: "); resource.print();
  printf("      Query: "); query.print();
  printf("\n");

  return true;
}

AShard_t url0("http://www.sharmin.com:8080/resources/gifs/bunny.gif?user=willynilly&password=funnybunny");
parseUrl(url0);

Of course, I’m leaving out error checking, but even for this simple tokenization, it’s fairly robust because in most cases, if a ‘first’ fails, you’ll just get an empty ‘rest’, but definitely not a crash.
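
As one example of the error checking being skipped, the port shard could be validated and converted without ever copying it into a null terminated string. The helper below is hypothetical (`parsePort` is not part of the class above) and takes a raw pointer and length so that it stands on its own.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: convert a port shard (bytes + length, no terminator)
// into a number. Rejects empty input, non-digit characters, and values
// above 65535.
inline bool parsePort(const uint8_t *data, size_t length, uint16_t &port)
{
    // Five digits is the most a valid port can have.
    if (data == nullptr || length == 0 || length > 5) return false;

    uint32_t value = 0;
    for (size_t i = 0; i < length; ++i) {
        if (data[i] < '0' || data[i] > '9') return false;
        value = value * 10 + static_cast<uint32_t>(data[i] - '0');
    }
    if (value > 65535) return false;

    port = static_cast<uint16_t>(value);
    return true;
}
```

Because the length is passed explicitly, the function can be pointed straight at the "8080" inside the original URL buffer, and whatever follows it is never read.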

So, how does this fare against my beloved LuaJIT? Well, at this level things are about the same. In Lua, I could create exactly the same structure, using a table, and perform exactly the same operations. Only, if I wanted to do it without using the ffi, I’d have to stuff the data into a Lua string object (which causes a copy), then use the Lua string.byte, count from 1, etc. Totally doable, and probably fairly optimized. There is a bit of a waste though because in Lua, everything interesting is represented by a table, so that’s a much bigger data structure than this simple AShard_t. It’s bigger in terms of memory footprint, and it’s probably slower in execution because it’s a generalized data structure that can serve many wonderful purposes.

For memory management, at this level of structure, things are relatively easy. Since the shard does not copy the data, it doesn’t actually do any allocations, so there’s relatively little to cleanup. The most common use case for shards is that they’ll either be stack based, or they’ll be stuffed into a data structure. In either case, their lifetime is fairly short and well managed, so memory management isn’t a big issue. If they are dynamically allocated, then of course there’s something to be concerned with.

Well, that touches the tip of the iceberg. I’ve re-attached to C++, and so far the gag reflex hasn’t driven me insane, so I guess it’s ok to continue.

Next, I’ll explore how insanely great the world becomes when shards roam the earth.


Device Iteration with Functional Programming

One of the great pleasures I have in life is learning something new. There’s nothing greater than those ‘light bulb goes on’ moments as you realize something and gain a much deeper understanding than you had before.

Well, a little while ago, there was an announcement of this thing called Lua Fun.  Lua Fun is a large set of functions which make functional programming in Lua really easy.  It has the usual suspects such as map, reduce, filter, each, etc.  If you read the documentation, you get a really good understanding of how iterators work in Lua, and more importantly, how LuaJIT is able to fold and manipulate things in hot loops such that the produced code is much tighter than anything I could possibly write in C, or any other language I so happen to use.

So, now I’m a fan of Lua Fun, and I would encourage anyone who’s both into Lua, and functional programming to take a look.

How to use it?  I’ve been enumerating various types of things in the Windows system of late.  Using the MMDevice subsystem, I was able to get an enumeration of the audio devices (that took a lot of COM work).  What about displays, and disk drives, and USB devices, and…  Yes, each one of those things has an attendant API which will facilitate monitoring said category.  But, is there one to rule them all?  Well yes, as it turns out, in most cases what these various APIs are doing is getting information out of the System Registry, and just presenting it reasonably.

There is a way to enumerate all the devices in the system.  You know, like when you bring up the Device Manager in Windows, and you see a tree of devices, and their various details.  The stage is set, how do you do that?  I created a simple object that does the grunt work of enumerating the devices in the system.

The DeviceRecordSet is essentially a query. Creating an instance of the object just gives you a handle onto making query requests. Here is the code:

local ffi = require("ffi")
local bit = require("bit")
local bor = bit.bor;
local band = bit.band;

local errorhandling = require("core_errorhandling_l1_1_1");
local SetupApi = require("SetupApi")
local WinNT = require("WinNT")


local DeviceRecordSet = {}
setmetatable(DeviceRecordSet, {
	__call = function(self, ...)
		return self:create(...)
	end,
})

local DeviceRecordSet_mt = {
	__index = DeviceRecordSet,
}


function DeviceRecordSet.init(self, rawhandle)
	print("init: ", rawhandle)

	local obj = {
		Handle = rawhandle,
	}
	setmetatable(obj, DeviceRecordSet_mt)

	return obj;
end

function DeviceRecordSet.create(self, Flags)
	Flags = Flags or bor(ffi.C.DIGCF_PRESENT, ffi.C.DIGCF_ALLCLASSES)

	local rawhandle = SetupApi.SetupDiGetClassDevs(
		nil, 
        nil, 
        nil, 
        Flags);

	if rawhandle == nil then
		return nil, errorhandling.GetLastError();
	end

	return self:init(rawhandle)
end

function DeviceRecordSet.getNativeHandle(self)
	return self.Handle;
end

function DeviceRecordSet.getRegistryValue(self, key, idx)
	idx = idx or 0;

	local did = ffi.new("SP_DEVINFO_DATA")
	did.cbSize = ffi.sizeof("SP_DEVINFO_DATA");

--print("HANDLE: ", self.Handle)
	local res = SetupApi.SetupDiEnumDeviceInfo(self.Handle,idx,did)

	if res == 0 then
		local err = errorhandling.GetLastError()
		--print("after SetupDiEnumDeviceInfo, ERROR: ", err)
		return nil, err;
	end

	local regDataType = ffi.new("DWORD[1]")
	local pbuffersize = ffi.new("DWORD[1]",260);
	local buffer = ffi.new("char[260]")

	local res = SetupApi.SetupDiGetDeviceRegistryProperty(
            self:getNativeHandle(),
            did,
			key,
			regDataType,
            buffer,
            pbuffersize[0],
            pbuffersize);

	if res == 0 then
		local err = errorhandling.GetLastError();
		--print("after GetDeviceRegistryProperty, ERROR: ", err)
		return nil, err;
	end

	--print("TYPE: ", regDataType[0])
	if (regDataType[0] == 1) or (regDataType[0] == 7) then
		return ffi.string(buffer, pbuffersize[0]-1)
	elseif regDataType[0] == ffi.C.REG_DWORD_LITTLE_ENDIAN then
		return ffi.cast("DWORD *", buffer)[0]
	end

	return nil;
end


function DeviceRecordSet.devices(self, fields)
	fields = fields or {
		{ffi.C.SPDRP_DEVICEDESC, "description"},
		{ffi.C.SPDRP_MFG, "manufacturer"},
		{ffi.C.SPDRP_DEVTYPE, "devicetype"},
		{ffi.C.SPDRP_CLASS, "class"},
		{ffi.C.SPDRP_ENUMERATOR_NAME, "enumerator"},
		{ffi.C.SPDRP_FRIENDLYNAME, "friendlyname"},
		{ffi.C.SPDRP_LOCATION_INFORMATION , "locationinfo"},
		{ffi.C.SPDRP_LOCATION_PATHS, "locationpaths"},
		{ffi.C.SPDRP_PHYSICAL_DEVICE_OBJECT_NAME, "objectname"},
		{ffi.C.SPDRP_SERVICE, "service"},
	}

	local function closure(fields, idx)
		local res = {}

		local count = 0;
		for _it, field in ipairs(fields) do
			local value, err = self:getRegistryValue(field[1], idx)
			if value then
				count = count + 1;
				res[field[2]] = value;
			end
		end

		if count == 0 then
			return nil;
		end
				
		return idx+1, res;
	end

	return closure, fields, 0
end

return DeviceRecordSet

The ‘getRegistryValue()’ function is the real workhorse of this object. That’s what gets your values out of the system registry. The other function of importance is ‘devices()’. This is an iterator.

There are a couple of things of note about this iterator. First of all, the iteration state is not held in ‘up values’. (The closure does still capture ‘self’, but that never changes during the iteration.) All that means is that everything that changes from one step to the next is carried in the return values from the function. The ‘state’, if you will, is handed in fresh every time the ‘closure()’ is called. This is the key to creating an iterator that will work well with Lua Fun.
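
For readers more at home in C++, the same protocol can be sketched there, purely as an analogy. Everything below is hypothetical: `Source` stands in for the fixed parameter (‘fields’ in the Lua code), `state` for the index, and the driver loop plays the role that Lua’s generic ‘for’ plays in the real thing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record source standing in for the device list.
typedef std::vector<std::string> Source;

// One step of a stateless iterator: everything it needs arrives as
// arguments ('param' and 'state'), and the next state is handed back to
// the caller. Returns false when the sequence is exhausted.
inline bool step(const Source &param, size_t state,
                 size_t &nextState, std::string &value)
{
    if (state >= param.size()) return false;
    value = param[state];
    nextState = state + 1;
    return true;
}

// The driver loop owns the state, the way Lua's generic 'for' does.
inline std::vector<std::string> collect(const Source &param)
{
    std::vector<std::string> out;
    size_t state = 0;
    size_t next = 0;
    std::string value;
    while (step(param, state, next, value)) {
        out.push_back(value);
        state = next;
    }
    return out;
}
```

Since `step` keeps nothing between calls, the same function can be restarted from any state or wrapped by another function that follows the same protocol, which is the property that lets iterators be stacked on top of each other.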

By default, this iterator will return quite a few (but not all) fields related to each object, and it will return all the objects. This is ok, because there are typically fewer than 150 objects in any given system.

Now, I want to do various queries against this set without much fuss. This is where Lua Fun, and functional programming in general, really shines.

First, a little setup:

--test_enumdevices.lua
local ffi = require("ffi")
local DeviceRecordSet = require("DeviceRecordSet")
local serpent = require("serpent")
local Functor = require("Functor")

local fun = require("fun")()
local drs = DeviceRecordSet();

local function printIt(record)
	print("==========")
	each(print, record)
	print("----------")
end

This creates an instance of the DeviceRecordSet object, which will be used in the queries. Already the printIt() function is utilizing Lua Fun. The ‘each()’ function will take whatever it’s handed, and perform the function specified. In this case, the ‘record’ will be a table. So, each will iterate over the table entries and print each one of them out. This is the equivalent of doing:

for k,v in pairs(record) do
	print(k, v)
end

I think that simply typing ‘each’ is a lot simpler and pretty easy to understand.

How about a query then?

-- show everything for every device
each(printIt, drs:devices())

In this case, the ‘each’ is applied to the results of the ‘devices()’ iterator. For each record coming from the devices iterator, the printIt function will be called, which will in turn print out all the values in the record. That’s pretty nice.

What if I don’t want to see all the fields in the record, I just want to see the objectname, and description fields. Well, this is a ‘map’ operation, or a projection in database parlance, so:

-- do a projection on the fields
local function projection(x)
  return {objectname = x.objectname, description = x.description}
end
each(printIt, map(projection, drs:devices()))

Working from the inside out, for each record coming from the devices() iterator, call the ‘projection’ function. The return value from the projection function becomes the new record for this iteration. For each of those records, call the printIt function.

Using ‘map’ is great as you can reshape data in any way you like without much fuss.

Lastly, I want to see only the records that are related to “STORAGE”, so…

-- show only certain records
local function enumeratorFilter(x)
	return x.enumerator == "STORAGE"
end

each(printIt, filter(enumeratorFilter, drs:devices()))

Here, the ‘filter’ iterator is used. So, again, for each of the records coming from the ‘devices()’ enumerator, call the ‘enumeratorFilter’ function. If this function returns ‘true’ for the record, then it is passed along as the next record for the ‘each’. If ‘false’, then it is skipped, and the next record is tried.
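
For comparison, here is roughly what the same filter and projection look like with the C++ standard algorithms. The `DeviceRecord` struct and both function names are hypothetical. Note the difference in model: this version is eager and builds an intermediate vector at each stage, whereas the Lua Fun version chains lazy iterators and never materializes the intermediate results.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical record holding three of the fields devices() returns.
struct DeviceRecord {
    std::string enumerator;
    std::string objectname;
    std::string description;
};

// filter: keep only the records whose enumerator matches 'name'.
inline std::vector<DeviceRecord>
filterByEnumerator(const std::vector<DeviceRecord> &in, const std::string &name)
{
    std::vector<DeviceRecord> out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 [&name](const DeviceRecord &r) { return r.enumerator == name; });
    return out;
}

// map: project each record down to just its description.
inline std::vector<std::string>
descriptions(const std::vector<DeviceRecord> &in)
{
    std::vector<std::string> out;
    std::transform(in.begin(), in.end(), std::back_inserter(out),
                   [](const DeviceRecord &r) { return r.description; });
    return out;
}
```

Calling `descriptions(filterByEnumerator(devices, "STORAGE"))` reads inside out, much like the Lua one-liners, but each call walks and copies a whole vector.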

This is pretty powerful, and yet simple stuff. The fact that iterators create new iterators, in tight loops, makes for some very dense and efficient code. If you’re interested in why this is so special in LuaJIT, and not many other languages, read up on the Lua Fun documentation.

I’ve killed two birds with one stone. I have finally gotten to the root of all device iterators. I have also learned how to best write iterators that can be used in a functional programming way. Judicious usage of this mechanism will surely make a lot of my code more compact and readable, as well as highly performant.

 


Asynchronous DNS lookups on Windows

I began this particular journey because I wanted to do DNS lookups asynchronously on Windows. There is of course a function for that:

DnsQueryEx

The problem I ran into is that unlike the various other Windows functions I’ve done with async, this one does not use an IO Completion Port. Instead it uses a mechanism called APC (Asynchronous Procedure Call). With this little bit of magic, you pass in a pointer to a function which will be called in your thread context, kind of in between when other things are happening. Well, given the runtime environment I’m using, I don’t think this quite works out. Basically, I’d have a function being called leaving the VM in an unknown state.

So, I got to digging. I figured, how hard can it be to make calls to a DNS server directly? After all, it is nothing more than a network based service with a well known protocol. Once I could make a straight networking call, then I could go back to leveraging IO Completion Ports just like I do for all other IO.

You can view the DNS system as nothing more than a database to which you pose queries. You express your queries using some nice well defined protocol, which is ancient in origin, and fairly effective considering how frequently DNS queries are issued. Although I could man up and write the queries from scratch, Windows helps me here by providing functions that will format the query into a buffer for me.

But, before I get into that, what do the queries look like? What am I looking up? Well, a Domain Name Server serves up translations of names to other names and numbers. For example, I need to find the IP address of http://www.bing.com. I can look for CNAME records (an alias), or ‘A’ records (direct to an IP address). This gets esoteric and confusing, so a little code can help:

-- Prepare the DNS request
local dwBuffSize = ffi.new("DWORD[1]", 2048);
local buff = ffi.new("uint8_t[2048]")

local wID = clock:GetCurrentTicks() % 65536;
        
local res = windns_ffi.DnsWriteQuestionToBuffer_UTF8( 
  ffi.cast("DNS_MESSAGE_BUFFER*",buff), 
  dwBuffSize, 
  ffi.cast("char *",strToQuery), 
  wType, 
  wID, 
  true )

DnsWriteQuestionToBuffer_UTF8 is the Windows function which helps me to write a DNS query into a buffer, which will then be sent to the actual DNS server.

wType, represents the type of record you want to be returned. The values might be something like:

wType = ffi.C.DNS_TYPE_A
wType = ffi.C.DNS_TYPE_MX  -- mail records
wType = ffi.C.DNS_TYPE_CNAME

There are about a hundred different types that you can query for. The vast majority of the time though, you are either looking for ‘A’, or ‘CNAME’ records.

The wID is just a unique identifier for the particular query so that if you’re issuing several on the same channel, you can check the response to ensure they match up.
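
For the curious, writing the query from scratch is not much code either. The sketch below builds a single-question query in the RFC 1035 wire format: a 12-byte header, the name as length-prefixed labels, then the type and class. The function names are made up, and this is not a claim about what DnsWriteQuestionToBuffer_UTF8 does internally; the `0x0100` flags value simply sets the "recursion desired" bit.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append a 16-bit value in network (big-endian) byte order.
inline void put16(std::vector<uint8_t> &out, uint16_t v)
{
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v & 0xFF));
}

// Build a single-question DNS query (hypothetical helper).
// Returns false for an empty or over-long name, or when a label is
// empty or longer than 63 bytes (so a trailing dot is rejected too).
inline bool buildDnsQuery(const std::string &name, uint16_t qtype,
                          uint16_t id, std::vector<uint8_t> &out)
{
    if (name.empty() || name.size() > 253) return false;

    std::vector<uint8_t> msg;
    put16(msg, id);      // transaction ID, echoed back in the response
    put16(msg, 0x0100);  // flags: standard query, recursion desired
    put16(msg, 1);       // QDCOUNT: one question
    put16(msg, 0);       // ANCOUNT
    put16(msg, 0);       // NSCOUNT
    put16(msg, 0);       // ARCOUNT

    // QNAME: each label is a length byte followed by its characters.
    size_t start = 0;
    while (start <= name.size()) {
        size_t dot = name.find('.', start);
        if (dot == std::string::npos) dot = name.size();
        const size_t len = dot - start;
        if (len == 0 || len > 63) return false;
        msg.push_back(static_cast<uint8_t>(len));
        for (size_t i = start; i < dot; ++i)
            msg.push_back(static_cast<uint8_t>(name[i]));
        start = dot + 1;
    }
    msg.push_back(0);    // zero-length root label ends the name

    put16(msg, qtype);   // QTYPE, e.g. 1 for an 'A' record
    put16(msg, 1);       // QCLASS: IN (Internet)

    out.swap(msg);
    return true;
}
```

A query for "www.bing.com" comes out at 30 bytes this way, which gives some idea of why a 2048 byte buffer is more than generous.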

OK. Now I have a DNS query stuffed into a buffer, how do I make the query and get the results?

-- Send the request.
local IPPORT_DNS = 53;
local remoteAddr = sockaddr_in(IPPORT_DNS, AF_INET);
remoteAddr.sin_addr.S_addr = ws2_32.inet_addr( "209.244.0.3");

-- create the UDP socket
local socket, err = NativeSocket( AF_INET, SOCK_DGRAM, IPPROTO_UDP );

-- send the query
local iRes, err = socket:sendTo(
  remoteAddr, ffi.sizeof(remoteAddr),
  buff, dwBuffSize[0]);

This little bit of code shows the socket creation, and the actual ‘sendTo’ call. Of note, the “209.244.0.3” represents the IP address of a well known public DNS server. In this case it is hosted by Level 3, which is an internet service provider. There are of course calls you can make to figure out which DNS server your machine is typically configured to use, but this way the query will always work, no matter which network you are on.

Notice the socket is a UDP socket.

At this point, we’re already running cooperatively due to the fact that within TINN, all IO is done cooperatively, without the programmer needing to do anything special.

Now to receive the query response back:

   -- Try to receive the results
    local RecvFromAddr = sockaddr_in();
    local RecvFromAddrSize = ffi.sizeof(RecvFromAddr);
    local cbReceived, err = self.Socket:receiveFrom(RecvFromAddr, RecvFromAddrSize, buff, 2048);

Basically just wait for the server to send back a response. Of course, like the sendTo, the receiveFrom works cooperatively, so that if the developer issues several ‘spawn’ commands, each query could be running in its own task, working cooperatively.

Once you have the response, you can parse out the results. The results come back as a set of records. There are of course functions which will help you to parse these records out. The key here is that the record type is indicated, and it’s up to the developer to pull out the relevant information.
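
Before getting to the records, the first 12 bytes of the response are worth a look on their own. The sketch below reads the header fields out of network byte order by hand and checks that the message is a response, carries the expected transaction ID, and reports no error. The struct and function names are hypothetical; the field layout follows RFC 1035.

```cpp
#include <cstddef>
#include <cstdint>

// The six 16-bit fields of a DNS message header, in host byte order.
struct DnsHeader {
    uint16_t id;
    uint16_t flags;
    uint16_t questions;
    uint16_t answers;
    uint16_t authorities;
    uint16_t additionals;
};

// Read a big-endian 16-bit value.
inline uint16_t get16(const uint8_t *p)
{
    return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// Decode the fixed 12-byte header. Returns false if the message is too short.
inline bool parseDnsHeader(const uint8_t *msg, size_t size, DnsHeader &h)
{
    if (msg == nullptr || size < 12) return false;
    h.id          = get16(msg + 0);
    h.flags       = get16(msg + 2);
    h.questions   = get16(msg + 4);
    h.answers     = get16(msg + 6);
    h.authorities = get16(msg + 8);
    h.additionals = get16(msg + 10);
    return true;
}

// True when the message is a response (QR bit set) to 'expectedId'
// and the response code in the low four flag bits is 0 (no error).
inline bool isGoodResponse(const DnsHeader &h, uint16_t expectedId)
{
    const bool isResponse = (h.flags & 0x8000) != 0;
    const unsigned rcode = h.flags & 0x000F;
    return isResponse && h.id == expectedId && rcode == 0;
}
```

Checking the ID matters more over UDP than it might seem, since anything that can reach the socket can send a datagram that looks like an answer.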

The complete DNSNameServer class is here:

local ffi = require("ffi")

local Application = require("Application")
local windns_ffi = require("windns_ffi")
local NativeSocket = require("NativeSocket")
local ws2_32 = require("ws2_32")
local Stopwatch = require("StopWatch")

local clock = Stopwatch();

-- DNS UDP port
local IPPORT_DNS = 53;

local DNSNameServer = {}
setmetatable(DNSNameServer, {
    __call = function(self, ...)
        return self:create(...)
    end,
})
local DNSNameServer_mt = {
    __index = DNSNameServer,
}

function DNSNameServer.init(self, serveraddress)
    local socket, err = NativeSocket( AF_INET, SOCK_DGRAM, IPPROTO_UDP );

    if not socket then
        return nil, err
    end

    local obj = {
        Socket = socket,
        ServerAddress = serveraddress,
    }
    setmetatable(obj, DNSNameServer_mt)

    return obj;
end

function DNSNameServer.create(self, servername)
    local remoteAddr = sockaddr_in(IPPORT_DNS, AF_INET);
    remoteAddr.sin_addr.S_addr = ws2_32.inet_addr( servername );

    return self:init(remoteAddr)
end

-- Construct DNS_TYPE_A request, send it to the specified DNS server, wait for the reply.
function DNSNameServer.Query(self, strToQuery, wType, msTimeout)
    wType = wType or ffi.C.DNS_TYPE_A
    msTimeout = msTimeout or 60 * 1000  -- 1 minute


    -- Prepare the DNS request
    local dwBuffSize = ffi.new("DWORD[1]", 2048);
    local buff = ffi.new("uint8_t[2048]")

    local wID = clock:GetCurrentTicks() % 65536;
        
    local res = windns_ffi.DnsWriteQuestionToBuffer_UTF8( ffi.cast("DNS_MESSAGE_BUFFER*",buff), dwBuffSize, ffi.cast("char *",strToQuery), wType, wID, true )

    if res == 0 then
        return false, "DnsWriteQuestionToBuffer_UTF8 failed."
    end

    -- Send the request.
    local iRes, err = self.Socket:sendTo(self.ServerAddress, ffi.sizeof(self.ServerAddress), buff, dwBuffSize[0]);
    

    if (not iRes) then
        print("Error sending data: ", err)
        return false, err
    end

    -- Try to receive the results
    local RecvFromAddr = sockaddr_in();
    local RecvFromAddrSize = ffi.sizeof(RecvFromAddr);
    local cbReceived, err = self.Socket:receiveFrom(RecvFromAddr, RecvFromAddrSize, buff, 2048);

    if not cbReceived then
        print("Error Receiving Data: ", err)
        return false, err;
    end

    if( 0 == cbReceived ) then
        return false, "Nothing received"
    end

    -- Parse the DNS response received with DNS API
    local pDnsResponseBuff = ffi.cast("DNS_MESSAGE_BUFFER*", buff);
    windns_ffi.DNS_BYTE_FLIP_HEADER_COUNTS ( pDnsResponseBuff.MessageHead );

    if pDnsResponseBuff.MessageHead.Xid ~= wID then        
        return false, "wrong transaction ID"
    end

    local pRecord = ffi.new("DNS_RECORD *[1]",nil);

    iRes = windns_ffi.DnsExtractRecordsFromMessage_W( pDnsResponseBuff, cbReceived, pRecord );
    
    pRecord = pRecord[0];
    local pRecordA = ffi.cast("DNS_RECORD *", pRecord);
    
    local function closure()
        -- Skip ahead to the next record of the requested type
        while pRecordA ~= nil and pRecordA.wType ~= wType do
            pRecordA = pRecordA.pNext
        end

        if pRecordA ~= nil then
            local retVal = pRecordA
            pRecordA = pRecordA.pNext

            return retVal;
        end

        -- End of the list: free the resources, exactly once
        if pRecord ~= nil then
            windns_ffi.DnsRecordListFree( pRecord, ffi.C.DnsFreeRecordList );
            pRecord = nil
        end

        return nil
    end

    return closure
end

function DNSNameServer.A(self, domainToQuery) return self:Query(domainToQuery, ffi.C.DNS_TYPE_A) end
function DNSNameServer.MX(self, domainToQuery) return self:Query(domainToQuery, ffi.C.DNS_TYPE_MX) end
function DNSNameServer.CNAME(self, domainToQuery) return self:Query(domainToQuery, ffi.C.DNS_TYPE_CNAME) end
function DNSNameServer.SRV(self, domainToQuery) return self:Query(domainToQuery, ffi.C.DNS_TYPE_SRV) end

return DNSNameServer

Notice at the end there are some convenience functions for a few of the well-known DNS record types. The ‘Query()’ function is generic, and will return records of whatever type you ask for. The convenience functions just save you from passing the type constant yourself.
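
If you want more of these, you don’t have to write each one by hand. Here is a minimal sketch that generates the convenience functions from a table. To keep it runnable anywhere, the DNSNameServer below is a stand-in whose Query() just echoes its arguments, and the type numbers are the standard DNS values written out literally rather than pulled from ffi.C. Treat both as assumptions for illustration, not as part of the real module.

```lua
-- Stand-in for the real DNSNameServer: Query() only echoes what was asked for
local DNSNameServer = {}
DNSNameServer.__index = DNSNameServer

function DNSNameServer.Query(self, domainToQuery, wType)
    return { name = domainToQuery, wType = wType }
end

-- Standard DNS record type numbers (literal values, not taken from the bindings)
local recordTypes = { A = 1, CNAME = 5, MX = 15, AAAA = 28, SRV = 33 }

-- Generate one convenience method per record type
for typeName, typeValue in pairs(recordTypes) do
    DNSNameServer[typeName] = function(self, domainToQuery)
        return self:Query(domainToQuery, typeValue)
    end
end

local dns = setmetatable({}, DNSNameServer)
local q = dns:AAAA("luajit.org")
print(q.name, q.wType)
```

With the real module, the table values would be the ffi.C.DNS_TYPE_* constants, assuming the bindings define the ones you need.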

And how to use it?

local ffi = require("ffi")
local DNSNameServer = require("DNSNameServer")
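local core_string = require("core_string_l1_1_0")

-- Note: IN_ADDR, spawn, and run are not required here; this script assumes
-- they are already available as globals from the socket bindings and scheduler.
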
--local serveraddress = "10.211.55.1"		-- xfinity
local serveraddress = "209.244.0.3" -- level 3

local domains = {
	"www.nanotechstyles.com",
	"www.adafruit.com",
	"adafruit.com",
	"adamation.com",
	"www.adamation.com",
	"microsoft.com",
	"google.com",
	"ibm.com",
	"oracle.com",
	"sparkfun.com",
	"apple.com",
	"netflix.com",
	"www.netflix.com",
	"www.us-west-2.netflix.com",
	"www.us-west-2.prodaa.netflix.com",
	"news.com",
	"hardkernel.org",
	"amazon.com",
	"walmart.com",
	"target.com",
	"godaddy.com",
	"luajit.org",
}



local function queryA()
	local function queryDomain(name)
		local dns = DNSNameServer(serveraddress)
		print("==== DNS A ====> ", name)
		for record in dns:A(name) do
			local a = IN_ADDR();
			a.S_addr = record.Data.A.IpAddress

			print(string.format("name: %s\tIP: %s, TTL %d", name, tostring(a), record.dwTtl));
		end
	end

	for _, name in ipairs(domains) do 
		spawn(queryDomain, name)
		--queryDomain(name)
	end
end

local function queryCNAME()
	local dns = DNSNameServer(serveraddress)
	local function queryDomain(name)
		print("==== DNS CNAME ====> ", name)
		for record in dns:CNAME(name) do
			print(core_string.toAnsi(record.pName), core_string.toAnsi(record.Data.CNAME.pNameHost))
		end
	end

	for _, name in ipairs(domains) do 
		queryDomain(name)
	end
end

local function queryMX()
	local function queryDomain(name)
		local dns = DNSNameServer(serveraddress)
		print("==== DNS MX ====> ", name)
		for record in dns:MX(name) do
			print(core_string.toAnsi(record.pName), core_string.toAnsi(record.Data["MX"].pNameExchange))
		end
	end

	for _, name in ipairs(domains) do 
		spawn(queryDomain, name)
	end
end

local function querySRV()
	local dns = DNSNameServer(serveraddress)
	for _, name in ipairs(domains) do 
		print("==== DNS SRV ====> ", name)
		for record in dns:SRV(name) do
			print(core_string.toAnsi(record.pName), core_string.toAnsi(record.Data.SRV.pNameTarget))
		end
	end
end

local function main()
  queryA();
  --queryCNAME();
  --queryMX();
  --querySRV();
end

run(main)

The function queryA() queries for the ‘A’ records and prints them out. Notice that it has knowledge of the giant union structure that contains the results, and it pulls out the specific information for ‘A’ records. It also creates a new instance of the DNSNameServer for each query. That’s not as bad as it might seem. All it amounts to is creating a new UDP socket for each query, and it means one task can’t accidentally pick up a reply meant for another and trip the transaction ID check. Since each query is spawned into its own task, they are all free to run and complete independently, which was the goal of this little exercise.
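
If you don’t have an IN_ADDR type with a handy string conversion lying around, the address is easy enough to format yourself. This is only a sketch, and it rests on two assumptions: the IpAddress field holds the four address bytes in network order, and the host is little-endian (as x86 and x64 Windows are), so the first octet ends up in the low byte of the number. The helper name ip4ToString is mine, not something from the bindings.

```lua
-- Format a 32-bit IPv4 address, as read on a little-endian host,
-- into dotted-quad notation. The lowest byte is the first octet.
local function ip4ToString(addr)
    local octets = {}
    for i = 1, 4 do
        octets[i] = addr % 256
        addr = math.floor(addr / 256)
    end
    return table.concat(octets, ".")
end

-- 0x0300F4D1 is the byte sequence D1 F4 00 03 read as a little-endian DWORD
print(ip4ToString(0x0300F4D1))
```

Inside the loop above, that would be something like ip4ToString(record.Data.A.IpAddress) in place of the IN_ADDR dance.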

In the case of the CNAME query, there is only a single socket, and it is used repeatedly, serially, for each query.

The difference between the two styles is noticeable. In the serial case, the queries might ‘take a while’, because you have to wait for each result to come back before issuing the next query, so the total time is the sum of all the round trips. In the cooperative case, all the queries are in flight at the same time, so the total time is roughly that of the slowest single query.
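
To put some numbers on that, here is a toy simulation using plain Lua coroutines and a virtual clock. Nothing in it touches the network: the latencies are made up, and the little scheduling loop is a stand-in for illustration, not the scheduler behind spawn() and run().

```lua
-- Made-up round trip times, in milliseconds
local latencies = { 30, 120, 45, 80 }

-- Serial: each query waits for the previous reply before it is sent
local serialTotal = 0
for _, ms in ipairs(latencies) do
    serialTotal = serialTotal + ms
end

-- Cooperative: every task sends its query at time 0, then yields
-- until its reply is due
local tasks = {}
for _, ms in ipairs(latencies) do
    local co = coroutine.create(function()
        coroutine.yield(ms)   -- query sent; wake me up in 'ms' milliseconds
    end)
    local _, delay = coroutine.resume(co)
    tasks[#tasks + 1] = { co = co, wakeAt = delay }
end

-- Toy scheduler: jump the virtual clock to each wake-up time in order
table.sort(tasks, function(a, b) return a.wakeAt < b.wakeAt end)
local clock = 0
for _, task in ipairs(tasks) do
    clock = task.wakeAt
    coroutine.resume(task.co)
end
local cooperativeTotal = clock

print("serial:", serialTotal, "cooperative:", cooperativeTotal)
```

The serial total is the sum of the latencies, while the cooperative total is just the largest one.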

That’s a good outcome.

I like this style of programming. You go as low as you can to root out where the system might otherwise block, and you make that part cooperative. That way everything else above it is automatically cooperative. I also like the fact that it feels like I’m getting some parallelism, but I’m not using any of the typical primitives of parallelism, including actual threads, mutexes, and the like.
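
Here is a small sketch of that idea in plain Lua. The fake socket, the "wouldblock" error string, and the round-robin loop are all stand-ins I made up for illustration. The point is that receive() is the only function that knows anything about blocking, and lookup() above it is written as if the call simply waits.

```lua
-- Fake non-blocking socket: reports "wouldblock" until the data "arrives"
local function fakeSocket(readyAfter, payload)
    local polls = 0
    return {
        tryReceive = function()
            polls = polls + 1
            if polls < readyAfter then
                return nil, "wouldblock"
            end
            return payload
        end
    }
end

-- The one cooperative point: yield instead of blocking
local function receive(sock)
    while true do
        local data, err = sock.tryReceive()
        if data then return data end
        if err ~= "wouldblock" then return nil, err end
        coroutine.yield()
    end
end

-- Higher level code has no idea any yielding is going on
local results = {}
local function lookup(name, sock)
    results[#results + 1] = name .. " -> " .. receive(sock)
end

-- Minimal round-robin scheduler
local tasks = {
    coroutine.create(function() lookup("slow.example", fakeSocket(3, "10.0.0.1")) end),
    coroutine.create(function() lookup("fast.example", fakeSocket(1, "10.0.0.2")) end),
}
while #tasks > 0 do
    for i = #tasks, 1, -1 do
        local ok, err = coroutine.resume(tasks[i])
        if not ok then error(err) end
        if coroutine.status(tasks[i]) == "dead" then
            table.remove(tasks, i)
        end
    end
end

for _, line in ipairs(results) do print(line) end
```

The fast lookup finishes first even though the slow one was listed first, and neither of them contains a single line of scheduling code.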

Well, that’s a hefty bit of code, and it serves the purpose I set out, so I’m a happy camper. Now, if I could just turn those unions into tables automatically…