TINN Reboot

I always dread writing posts that start with “it’s been a long time since…”, but here it is.

It’s been a long time since I did anything with TINN.  I didn’t actually abandon it, I just put it on the back burner as I was writing a bunch of code in C/C++ over the past year.  I did do quite a lot of experimental stuff in TINN, adding new interfaces, trying out new classes, creating a better coroutine experience.

The thing with software is, a lot of testing is required to ensure things actually work as expected and fail gracefully when they don’t.  Some things I took from the ‘experimental’ category are:

fun.lua – A library of functional routines specifically built for LuaJIT and its great handling of tail recursion.

msiterators.lua – Some handy iterators that split out some very MS-specific string types

Now that msiterators is part of the core, it makes it much easier to do things like query the system registry and get the list of devices, or batteries, or whatever, in a simple table form.  That opens up some of the other little experiments, like enumerating batteries, monitors, and whatnot, which I can add in later.

These are not earth-shattering additions, and don’t represent a year’s worth of waiting, but soon enough I’ll create a new package with new goodness in it.  This raises the question: what is TINN useful for?  I originally created it for the purpose of doing network programming, like you could do with node.  Then it turned into a way of doing Windows programming in general.  Since TINN provides scripted access to almost all the interesting low level APIs that are in Windows, it’s very handy for trying out how an API works, and whether it is good for a particular need.

In addition to just giving ready access to low level Windows APIs, it serves as a form of documentation as well.  When I look at a Windows API, it’s not obvious how to handle all the parameters.  Which ones do I allocate, which ones come from the system, which special function do I call when I’m done?  Since I read the docs when I create the interface, the wrapper code encapsulates that reading of the documentation, and thus acts as an encapsulated source of knowledge that’s sitting right there with the code.  Quite handy.

At any rate, TINN is not dead, long live TINN!


ReadFile – The Good, the Bad, and the Async

If you use various frameworks on any platform, you’re probably an arm’s length away from the nasty little quirks of the underlying operating system.  If you are the creator of such frameworks, the nasty quirks are what you live with on a daily basis.

In TINN, I want to be async from soup to nuts.  All the tcp/udp socket stuff is already that way.  Recently I’ve been adding async support for “file handles”, and let me tell you, you have to be very careful around these things.

In the core Windows APIs, in order to read from a file, you do two things.  You first open a file using the CreateFile() function.  This may be a bit confusing, because why would you use “create” to ‘open’ an existing file?  Well, you have to think of it like a kernel developer might.  From that perspective, what you’re doing is ‘create a file handle’.  While you’re doing this, you can tell the function whether to actually create the file if it doesn’t exist already, open only if it exists, open read-only, etc.

The basic function signature for CreateFile() looks like this:

HANDLE WINAPI CreateFile(
  _In_      LPCTSTR lpFileName,
  _In_      DWORD dwDesiredAccess,
  _In_      DWORD dwShareMode,
  _In_opt_  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  _In_      DWORD dwCreationDisposition,
  _In_      DWORD dwFlagsAndAttributes,
  _In_opt_  HANDLE hTemplateFile
);

Well, that’s a mouthful, just to get a file handle. But hey, it’s not much more than you’d do in Linux, except it has some extra flags and attributes that you might want to take care of. Here’s where the history of Windows gets in the way. There is a much simpler function, “OpenFile()”, which on the surface might do what you want, but beware, it’s a lot less capable, a leftover from the MS-DOS days. The documentation is pretty clear about this point: “don’t use this, use CreateFile instead…”, but still, you’d have to wade through some documentation to reach this conclusion.

Then, the ReadFile() function has this signature:

BOOL WINAPI ReadFile(
  _In_         HANDLE hFile,
  _Out_        LPVOID lpBuffer,
  _In_         DWORD nNumberOfBytesToRead,
  _Out_opt_    LPDWORD lpNumberOfBytesRead,
  _Inout_opt_  LPOVERLAPPED lpOverlapped
);

Don’t be confused by another function, ReadFileEx(). That one sounds even more modern, but in fact, it does not support the async file reading that I want.

Seems simple enough. Take the handle you got from CreateFile(), and pass it to this function, including a buffer, and you’re done? Well yeah, this is where things get really interesting.

Windows supports two forms of IO processing: asynchronous and synchronous. The synchronous case is easy. You just make your call, and your thread will be blocked until the IO “completes”. That is certainly easy to understand, and if you’re a user of the standard C library, or most other frameworks, this is exactly the behaviour you can expect. Lua, using the standard io library, will do exactly this by default.
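
Just to make that contrast concrete, here is the blocking behaviour using nothing but the standard Lua io library (sample.txt is the same file used in the examples further down):

local f = assert(io.open("sample.txt", "rb"))
local chunk = f:read(512)   -- the thread sits right here until the read completes
f:close()
print(chunk)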

The other case is when you want to do async io. That is, you want to initiate the ReadFile() and get an immediate return, and handle the processing of the result later, perhaps with an alert on an io completion port.

Here’s the nasty bit. This same function can be used in both cases, but has very different behavior. It’s a subtle thing. If you’re doing synchronous IO, the kernel will track the file position, and automatically update it for you. So, you can do consecutive ReadFile() calls, and read the file contents from beginning to end.

But… When you do things async, the kernel will not track your file pointer. Instead, you must do this on your own! When you do async, you pass in an instance of an OVERLAPPED structure, which carries the per-operation state, most importantly the offset within the file to read from. By default, the offset is ‘0’, which will have you reading from the beginning of the file every single time.

typedef struct _OVERLAPPED {
    ULONG_PTR Internal;
    ULONG_PTR InternalHigh;
    union {
        struct {
            DWORD Offset;
            DWORD OffsetHigh;
        };

        PVOID Pointer;
    };

    HANDLE hEvent;
} OVERLAPPED, *LPOVERLAPPED;
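
Before getting to the TINN wrapper, here is a small sketch of what tracking the position yourself looks like at the OVERLAPPED level. This is my own illustration, not part of the TINN code that follows; it assumes the usual Windows typedefs and the OVERLAPPED declaration above have already been loaded through ffi.cdef, and it leaves out the completion handling (an event or an IO completion port wait), because the only point is that you set the offset before each read and advance it yourself afterwards.

local ffi = require("ffi")

-- assumes ffi.cdef has been fed the OVERLAPPED declaration shown above
local ovl = ffi.new("OVERLAPPED")
local filePosition = 0     -- the position WE maintain; the kernel won't

local setOffset = function(ovl, pos)
  -- OVERLAPPED wants the 64-bit position split into two 32-bit halves
  ovl.Offset     = pos % 4294967296
  ovl.OffsetHigh = math.floor(pos / 4294967296)
end

-- before every async ReadFile(hFile, buff, bufflen, nil, ovl):
setOffset(ovl, filePosition)

-- ...and after that read completes with 'bytesread' bytes:
-- filePosition = filePosition + bytesread
-- setOffset(ovl, filePosition)     -- ready for the next read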

You have to be very careful and diligent with using this structure, and the proper calling sequences. In addition, if you’re going to do async, you need to call CreateFile() with the FILE_FLAG_OVERLAPPED flag. In TINN, I have created the NativeFile object, which presents a basic block device interface to the user and wraps up all of this subtlety, so that the interface to files is clean and simple.

-- NativeFile.lua

local ffi = require("ffi")
local bit = require("bit")
local bor = bit.bor;

local core_file = require("core_file_l1_2_0");
local errorhandling = require("core_errorhandling_l1_1_1");
local FsHandles = require("FsHandles")
local WinBase = require("WinBase")
local IOOps = require("IOOps")

ffi.cdef[[
typedef struct {
  IOOverlapped OVL;

  // Our specifics
  HANDLE file;
} FileOverlapped;
]]

-- A win32 file interface
-- put the standard async stream interface onto a file
local NativeFile={}
setmetatable(NativeFile, {
  __call = function(self, ...)
    return self:create(...);
  end,
})

local NativeFile_mt = {
  __index = NativeFile;
}

NativeFile.init = function(self, rawHandle)
	local obj = {
		Handle = FsHandles.FsHandle(rawHandle);
		Offset = 0;
	}
	setmetatable(obj, NativeFile_mt)

	if IOProcessor then
		IOProcessor:observeIOEvent(obj:getNativeHandle(), obj:getNativeHandle());
	end

	return obj;
end

NativeFile.create = function(self, lpFileName, dwDesiredAccess, dwCreationDisposition, dwShareMode)
	if not lpFileName then
		return nil;
	end
	dwDesiredAccess = dwDesiredAccess or bor(ffi.C.GENERIC_READ, ffi.C.GENERIC_WRITE)
	dwCreationDisposition = dwCreationDisposition or OPEN_ALWAYS;
	dwShareMode = dwShareMode or bor(FILE_SHARE_READ, FILE_SHARE_WRITE);
	local lpSecurityAttributes = nil;
	local dwFlagsAndAttributes = bor(ffi.C.FILE_ATTRIBUTE_NORMAL, FILE_FLAG_OVERLAPPED);
	local hTemplateFile = nil;

	local rawHandle = core_file.CreateFileA(
        lpFileName,
        dwDesiredAccess,
        dwShareMode,
    	lpSecurityAttributes,
        dwCreationDisposition,
        dwFlagsAndAttributes,
    	hTemplateFile);

	if rawHandle == INVALID_HANDLE_VALUE then
		return nil, errorhandling.GetLastError();
	end

	return self:init(rawHandle)
end

NativeFile.getNativeHandle = function(self)
  return self.Handle.Handle
end

-- Cancel current IO operation
NativeFile.cancel = function(self)
  local res = core_file.CancelIo(self:getNativeHandle());
end

-- Close the file handle
NativeFile.close = function(self)
  self.Handle:free();
  self.Handle = nil;
end

NativeFile.createOverlapped = function(self, buff, bufflen, operation, deviceoffset)
	if not IOProcessor then
		return nil
	end

	deviceoffset = deviceoffset or 0;

	local obj = ffi.new("FileOverlapped");

	obj.file = self:getNativeHandle();
	obj.OVL.operation = operation;
	obj.OVL.opcounter = IOProcessor:getNextOperationId();
	obj.OVL.Buffer = buff;
	obj.OVL.BufferLength = bufflen;
	obj.OVL.OVL.Offset = deviceoffset;

	return obj, obj.OVL.opcounter;
end

-- Write bytes to the file
NativeFile.writeBytes = function(self, buff, nNumberOfBytesToWrite, offset, deviceoffset)
	offset = offset or 0

	if not self.Handle then
		return nil;
	end

	local lpBuffer = ffi.cast("const char *",buff) + offset
	local lpNumberOfBytesWritten = nil;
	local lpOverlapped = self:createOverlapped(ffi.cast("uint8_t *",buff)+offset,
		nNumberOfBytesToWrite,
		IOOps.WRITE,
		deviceoffset);

	if lpOverlapped == nil then
		lpNumberOfBytesWritten = ffi.new("DWORD[1]")
	end

	local res = core_file.WriteFile(self:getNativeHandle(), lpBuffer, nNumberOfBytesToWrite,
		lpNumberOfBytesWritten,
  		ffi.cast("OVERLAPPED *",lpOverlapped));

	if res == 0 then
		local err = errorhandling.GetLastError();
		if err ~= ERROR_IO_PENDING then
			return false, err
		end
	else
		return lpNumberOfBytesWritten[0];
	end

	if IOProcessor then
    	local key, bytes, ovl = IOProcessor:yieldForIo(self, IOOps.WRITE, lpOverlapped.OVL.opcounter);
--print("key, bytes, ovl: ", key, bytes, ovl)
	    return bytes
	end
end

NativeFile.readBytes = function(self, buff, nNumberOfBytesToRead, offset, deviceoffset)
	offset = offset or 0
	local lpBuffer = ffi.cast("char *",buff) + offset
	local lpNumberOfBytesRead = nil
	local lpOverlapped = self:createOverlapped(ffi.cast("uint8_t *",buff)+offset,
		nNumberOfBytesToRead,
		IOOps.READ,
		deviceoffset);

	if lpOverlapped == nil then
		lpNumberOfBytesRead = ffi.new("DWORD[1]")
	end

	local res = core_file.ReadFile(self:getNativeHandle(), lpBuffer, nNumberOfBytesToRead,
		lpNumberOfBytesRead,
		ffi.cast("OVERLAPPED *",lpOverlapped));

	if res == 0 then
		local err = errorhandling.GetLastError();

--print("NativeFile, readBytes: ", res, err)

		if err ~= ERROR_IO_PENDING then
			return false, err
		end
	else
		return lpNumberOfBytesRead[0];
	end

	if IOProcessor then
    	local key, bytes, ovl = IOProcessor:yieldForIo(self, IOOps.READ, lpOverlapped.OVL.opcounter);

--    	local ovlp = ffi.cast("OVERLAPPED *", ovl)
--    	print("overlap offset: ", ovlp.Offset)

--print("key, bytes, ovl: ", key, bytes, ovl)
	    return bytes
	end

end

return NativeFile;

This is enough of a start. If you want to simply open a file:

local NativeFile = require("NativeFile")
local fd = NativeFile("sample.txt");

From there you can use readBytes(), and writeBytes(). If you want to do streaming, you can feed this into the new and improved Stream class like this:

local NativeFile = require("NativeFile") 
local Stream = require("stream") 
local IOProcessor = require("IOProcessor")

local function main()

  local filedev, err = NativeFile("./sample.txt", nil, OPEN_EXISTING, FILE_SHARE_READ)

  -- wrap the file block device with a stream
  local filestrm = Stream(filedev)

  local line1, err = filestrm:readLine();  
  local line2, err = filestrm:readLine();  
  local line3, err = filestrm:readLine()

  print("line1: ", line1, err)  
  print("line2: ", line2, err)  
  print("line3: ", line3, err) 
end

run(main)

The Stream class looks for readBytes() and writeBytes(), and can provide the higher level readLine(), writeLine(), read/writeString(), and a few others. This is great because it can be fed by anything that purports to be a block device, which could be anything from an async file, to a chunk of memory.
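
To illustrate that last point, here is a rough sketch of about the smallest thing that could be fed to Stream: a read-only block device backed by a Lua string, exposing readBytes() and writeBytes() with the same shape as NativeFile’s. The MemoryBlockDevice name is made up for this example, and it assumes Stream drives those two methods the same way it drives NativeFile’s.

-- MemoryBlockDevice.lua (hypothetical)

local ffi = require("ffi")

local MemoryBlockDevice = {}
local MemoryBlockDevice_mt = {
  __index = MemoryBlockDevice;
}

setmetatable(MemoryBlockDevice, {
  __call = function(self, contents)
    local obj = {
      Size = #contents;
      Offset = 0;     -- mirror NativeFile's position field, in case Stream relies on it
    }
    obj.Data = ffi.new("uint8_t[?]", obj.Size);
    ffi.copy(obj.Data, contents, obj.Size);

    return setmetatable(obj, MemoryBlockDevice_mt);
  end,
})

-- same shape as NativeFile.readBytes()
MemoryBlockDevice.readBytes = function(self, buff, nNumberOfBytesToRead, offset, deviceoffset)
  offset = offset or 0
  deviceoffset = deviceoffset or 0

  local avail = self.Size - deviceoffset;
  if avail <= 0 then
    return false, "eof";
  end

  local nread = math.min(nNumberOfBytesToRead, avail);
  ffi.copy(ffi.cast("uint8_t *", buff) + offset, self.Data + deviceoffset, nread);

  return nread;
end

-- read-only for the purposes of this sketch
MemoryBlockDevice.writeBytes = function(self, buff, nNumberOfBytesToWrite, offset, deviceoffset)
  return false, "read-only device";
end

return MemoryBlockDevice

With that in place, Stream(MemoryBlockDevice("line one\nline two\n")) should respond to readLine() just like the file-backed example above, assuming Stream only relies on those two methods.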

And that’s about it for now. There are subtleties when dealing with async file access in Windows. Having a nice abstraction on top of it gives you all the benefits of async without all the headaches.



Creating a Udp Echo Service in TINN

These days, networking applications utilize more than a single protocol at a time. My current server, which is a software router of sorts, needs to support TCP/IP as well as Udp channels at the same time. On top of the TCP is HTTP, but that’s already been covered.

Here I present the support for the Udp protocol. Udp differs from TCP in a few ways, the key one being the lack of a “connection”. Every single packet is individually addressed and sent to the intended recipient. Of course you can cache the DNS lookup, so that the delivery of the packets themselves is blazing fast. There’s no redundancy, no ack, no error recovery.

When TCP/IP/UDP were first created, the error rate was probably much higher than it is today. These days, depending on the network, Udp might be a perfectly reasonable choice. The trick, from a TINN perspective, is to make programming with either protocol look relatively the same. For the most part, this just means using the same mechanism for async calls.

Here’s what the server code looks like, minus the error recovery logic:

local ffi = require("ffi");

local IOProcessor = require("IOProcessor");
local IOCPSocket = require("IOCPSocket");

-- Setup the server socket
local socket, err = IOCPSocket:create(AF_INET, SOCK_DGRAM, 0);
local success, err = socket:bindToPort(9090);

-- Setup buffers to be used to receive data
local bufflen = 1500;
local buff = ffi.new("uint8_t[?]", bufflen);
local from = sockaddr_in();
local fromLen = ffi.sizeof(from);


-- The primary application loop
local loop = function()

  while true do
    local bytesread, err = socket:receiveFrom(from, fromLen, buff, bufflen);

    if not bytesread then
      return false, err;
    end

    -- echo back to the sender, only the bytes we actually received
    local bytessent, err = socket:sendTo(from, fromLen, buff, bytesread);
  end
end

run(loop);

And that’s about all there is to it. In this particular case, a single packet is received, and that packet is immediately sent back to whoever sent it. The receiveFrom() and sendTo() calls do in fact use IO completion ports, and for a more complex server that actually does work, you might formulate this differently, utilizing multiple receive buffers and spawning a task for each packet (a sketch of that follows below). But this is the most basic form of doing Udp handling with TINN.
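
As a rough illustration of that formulation, here is a sketch of the task-per-packet variation. This is my own guess, not code from the original server: it assumes a spawn() function that schedules a new coroutine on TINN’s scheduler (the real API may differ), and it reuses the socket, from, fromLen and bufflen values set up above.

local echoPacket = function(sender, senderLen, buff, len)
  socket:sendTo(sender, senderLen, buff, len);
end

local loop = function()
  while true do
    -- a fresh buffer per packet, so the echo task owns it outright
    local buff = ffi.new("uint8_t[?]", bufflen);

    local bytesread, err = socket:receiveFrom(from, fromLen, buff, bufflen);
    if not bytesread then
      return false, err;
    end

    -- copy the sender's address, since 'from' gets reused on the next iteration
    local sender = sockaddr_in();
    ffi.copy(sender, from, fromLen);

    -- hypothetical: hand the echo off to its own task and go back to receiving
    spawn(echoPacket, sender, fromLen, buff, bytesread);
  end
end

run(loop);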

The socket:receiveFrom() implementation is pretty much the same as that for socket:receive(), except for the addition of the address information so you can see who sent the message, and so you can subsequently return the packet to the source.

This code is not particularly hard, and if you were programming in ‘C’, it would look pretty much the same. The key benefit, though, comes from the semi-concurrency you get automatically: because receiveFrom() and sendTo() yield to the scheduler instead of blocking, this Udp loop can sit alongside the TCP and HTTP handling in the same program, without really changing the code that much. That is what makes it easy to integrate.


The Challenge of Writing Correct System Calls

If you are a veteran of using Windows APIs, you might be familiar with some patterns. One of them is the dual return value:

int somefunc();

Documentation: somefunc() will return some value other than 0 upon success. On failure, it will return 0, and you can then call GetLastError() to find out which error actually occurred…

In some cases, 0 means error. In some cases, 0 means success, and anything else is actually the error. In some cases they return BOOL, and in others BOOLEAN (a completely different type!).
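
The first of those patterns is mechanical enough to capture once. Here is a small helper of my own (not a TINN function) that folds the “non-zero means success, zero means go ask GetLastError()” dance into the return-the-value-or-false-plus-error convention used throughout these posts; it assumes the core_errorhandling_l1_1_1 binding that shows up in the NativeFile code earlier.

local errorhandling = require("core_errorhandling_l1_1_1");

-- wrap a "0 on failure, call GetLastError() for details" style result
local checkNonZero = function(res)
  if res == 0 then
    return false, errorhandling.GetLastError();
  end

  return res;
end

-- usage (with some hypothetical binding 'somelib'):
-- local res, err = checkNonZero(somelib.somefunc());
-- if not res then print("somefunc failed: ", err) end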

Another pattern is the “pass me a buffer, and a size…”

int somefunc(char *buff, int sizeofbuff)

Documentation: Pass in a buffer, and the size of the buffer. If ‘sizeofbuff’ == 0, then the return value will indicate how many bytes need to be allocated in the buffer, so you can call the function again.

A slight variant is this one:

int somefunc(char *buff, int * sizeofbuff)

In this case, the return value of the function will indicate whether there was an error or not. If there was an error such as ERROR_INSUFFICIENT_BUFFER, then ‘sizeofbuff’ is stuffed with the actual size needed to fulfill the request.

This can be very confusing to say the least. What makes it more confusing, at least in the Windows world, is that there is no single way this is done. Windows APIs have existed since the beginning of time, so there is as much variety in the various APIs as there are programmers who have worked on them over the years.

How to bring sanity to this world? I’ll examine just one case where I use Lua to make a more sane picture. I want to get the current system time. In kernel32, there are a few date/time formatting functions, so I’ll use one of those. Here is the ffi declaration for the one I want:

ffi.cdef[[
int
GetTimeFormatEx(
    LPCWSTR lpLocaleName,
    DWORD dwFlags,
    const SYSTEMTIME *lpTime,
    LPCWSTR lpFormat,
    LPWSTR lpTimeStr,
    int cchTime
);
]]

That’s one hefty function to get a time printed out in a nice way. There are pointers to unicode strings, pointers to structures that contain the system time, size of buffers, buffers in unicode…

In the end, I want to be able to call GetTimeFormat(), and have it return something like “7:54 AM”.

Alrighty then. Can’t be too hard…

local GetTimeFormat = function(lpFormat, dwFlags, lpTime, lpLocaleName)
  dwFlags = dwFlags or 0;
	
  --lpFormat = lpFormat or "hh':'mm':'ss tt";
  if lpFormat then
    lpFormat = k32.AnsiToUnicode16(lpFormat);
  end

  -- first call to figure out how big the string needs to be
  -- no output buffer yet, and a size of 0
  local buffsize = k32Lib.GetTimeFormatEx(
    lpLocaleName,
    dwFlags,
    lpTime,
    lpFormat,
    nil,
    0);

  -- buffsize should be the required size
  if buffsize < 1  then
    return false,  k32Lib.GetLastError();
  end

  local lpDataStr = ffi.new("WCHAR[?]", buffsize);
  local res = k32Lib.GetTimeFormatEx(
    lpLocaleName,
    dwFlags,
    lpTime,
    lpFormat,
    lpDataStr,
    buffsize);


  if res == 0 then
    return false, k32Lib.GetLastError();
  end

  -- We have a widechar, turn it into ASCII
  return k32.Unicode16ToAnsi(lpDataStr);
end

Not too bad, if a bit redundant. There are a couple of things of note, which are easy to miss if you’re not paying close attention.

First of all, I’m following the convention that any system function that succeeds should return the value, and if it fails, it should return false, and an error.

First thing to do is deal with default parameters. The dwFlags parameter is an integer, so if it has not been specified, a default value of ‘0’ will be used. If you don’t do this, then a ‘nil’ will be passed to the system function, and that will surely not work.

The time value can be passed in. If it is, it will be used. If not, then the nil in this position will result in using current system time. Same goes for localeName, and lpFormat. If they are nil, then the system default values will be used, according to the function call documentation.

The next important thing is turning the lpFormat string into a unicode string if it was specified. Lua, by default, deals in straight ANSI 8-bit strings, not unicode, so I assume what’s been specified is a standard Lua ANSI string, and I convert it to unicode.

And finally, the first function call to the system. In this first call, I want to get the size of the buffer needed, so I pass in ‘0’ as the size of the buffer. The return value of the function will be the size of the buffer needed to fill in the string. Of course, if the return value is ‘0’, then the ‘GetLastError()’ function must be called to figure out what was wrong. In this case, it could be that one of the parameters specified was wrong, or something else. But, bail out at any rate.

Now that I know how big the buffer needs to be (in unicode characters, not in bytes), I allocate the appropriate buffer, and make the call again, this time passing in the specified buffer.

Last step, take the unicode string that was returned, and turn it back into an ANSI string so the rest of Lua can be happy with it.

There are a couple more error conditions that could possibly be handled, like checking the types of the passed in parameters, or the size of the needed buffer might change between the two system calls, but this is a ‘good enough’ approach.

It’s 38 lines of boilerplate code to ensure the simplicity and relative correctness of a single system call. With literally hundreds of very interesting system calls in the Win32 system, you can imagine how challenging it can get to do these things right.

Of course, this is why libraries exist, because someone has actually gone through and done all the challenging work to get things right. I find that doing this work in Lua is pretty easy. The biggest challenge is reading and interpreting the documentation of the API. Sometimes it’s clear, sometimes it’s not. Once conquered though, it sure does make programming in Windows a lot easier. I suspect the same is true of any language/os binding.


Code Consolidation

I have these two projects:

LJIT2Win32 – Provides LuaJIT FFI access to many of Microsoft’s most useful APIs

and

LJIT2WinCNG – Provides LuaJIT FFI access to Microsoft’s latest crypto APIs

Windows is a large sprawling API set which represents the very history of the last 30 years of personal computing.  As such, you find many different styles, many different approaches, redundancies, inefficiencies, etc.

I want to program on Windows, and I want to do it in a way that is fairly consistent, predictable, robust, and reliable.  These libraries take a layered approach to providing exactly this.  I had originally started with another project, BanateCoreWin32, in which I had dumped a whole bunch of experimental stuff, then I took a more systematic approach to create LJIT2Win32.  When I started into the Crypto stuff, I thought it warranted its own project, so I developed the LJIT2WinCNG project.  I haven’t made many changes to this last one, so I thought it was good enough to join the mainstream.  That way, any future changes can all be consolidated in one place.

I have some more Windows specific code related to crypto and networking that I will eventually merge into this codebase as well.

Microsoft itself has provided various frameworks on top of the core Win32 functions over the years.  Everything from MFC and ATL to .NET and Silverlight.  With each iteration and library comes a whole new ecosystem and environment.  What I’ve done in my approach is to not introduce too much overhead or many new concepts.  At the lowest level, it’s just FFI interfaces to core .dll functions.  You could program at this level and be very happy.  Your C/C++ code would port almost seamlessly.  I do provide wrappers and higher level functions where it makes sense.

I’ve written about API pain previously, and I try to apply the lessons learned as much as possible.  I will say that once you get a more uniform API to work against, it’s amazing how powerful Windows really is.  I’d say Linux gives you a lot more control of some rather lower level stuff.  Windows has been around long enough that device drivers and other software libraries are often available only on Windows, and only slowly show up in other environments.  Having easy, garbage-collected interfaces to this stuff really makes programming in Windows fairly straightforward.

At any rate, some consolidation has occurred, and now I can continue to evolve a single codebase.