My Head In The Cloud – putting my code where my keyboard is

I have written a lot about the greatness of LuaJIT, coding for the internet, async programming, and the wonders of Windows. Now, I have finally reached a point where it’s time to put the code to the test.

I am running a service in Azure: http://nanotechstyles.cloudapp.net

This site is totally fueled by the work I have done with TINN. It is a static web page server with a couple of twists.

First of all, you can access the site through a pretty name:
http://www.nanotechstyles.com

If you just hit the site directly, you will get the static front page which has nothing more than an “about” link on it.

If you want to load up a little 3D model viewer, hit this:
http://www.nanotechstyles.com/threed.html

If you want to see what your browser is actually sending to the server, then hit this:
http://www.nanotechstyles.com/echo

I find the echo thing to be interesting, and I try hitting the site using different browsers to see what they produce.  This kind of feedback makes it relatively easy to do rapid turnarounds on the webpage content, challenging my assumptions and filling in the blanks.

The code for this web server is not very complex.  It’s the same standard ‘main’ that I’ve used in the past:

local resourceMap = require("ResourceMap");
local ResourceMapper = require("ResourceMapper");
local HttpServer = require("HttpServer")

local port = arg[1] or 8080

local Mapper = ResourceMapper(resourceMap);

local obj = {}

local OnRequest = function(param, request, response)
	local handler, err = Mapper:getHandler(request)


	-- recycle the socket, unless the handler explicitly says
	-- it will do it, by returning 'true'
	if handler then
		if not handler(request, response) then
			param.Server:HandleRequestFinished(request);
		end
	else
		print("NO HANDLER: ", request.Url.path);
		-- send back content not found
		response:writeHead(404);
		response:writeEnd();

		-- recycle the request in case the socket
		-- is still open
		param.Server:HandleRequestFinished(request);
	end

end

obj.Server = HttpServer(port, OnRequest, obj);
obj.Server:run()

In this case, I’m dealing with the OnRequest() directly, rather than using the WebApp object.  I’m doing this because I want to do some more interactions at this level that the standard WebApp may not support.

Of course the ‘handlers’ are where all the fun is. I guess it makes sense to host the content of the site up on the site for all to see and poke fun at.
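For reference, a handler in this scheme is just a function that takes the request and response objects, and returns true only if it intends to manage the socket lifetime itself. A minimal sketch (the writeHead()/writeEnd() calls mirror the 404 path above; everything else is illustrative, not necessarily TINN's exact response API):

-- a minimal handler sketch; writeHead()/writeEnd() are taken from
-- the 404 path above, the rest is illustrative
local HandleAbout = function(request, response)
	response:writeHead(200);
	response:writeEnd();

	-- returning nothing (nil) tells the server to recycle the socket;
	-- return true only if the handler takes over the connection
end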

My little experiment here is to give my code real world exposure, with the intention of hardening it, and gaining practical experience on what a typical web server is likely to see out in the wild.

So, if you read this blog, go hit those links. Soon enough, perhaps I will be able to serve up my own blog using my own software. That’s got a certain circular reference to it.


ReadFile – The Good, the bad, and the async

If you use various frameworks on any platform, you’re probably an arm’s length away from the nasty little quirks of the underlying operating system.  If you are the creator of such frameworks, the nasty quirks are what you live with on a daily basis.

In TINN, I want to be async from soup to nuts.  All the TCP/UDP socket stuff is already that way.  Recently I’ve been adding async support for “file handles”, and let me tell you, you have to be very careful around these things.

In the core Windows APIs, in order to read from a file, you do two things.  You first open a file using the CreateFile() function.  This may be a bit confusing, because why would you use “create” to ‘open’ an existing file?  Well, you have to think of it like a kernel developer might.  From that perspective, what you’re doing is ‘creating a file handle’.  While you’re doing this, you can tell the function whether to actually create the file if it doesn’t exist already, open it only if it exists, open it read-only, etc.

The basic function signature for CreateFile() looks like this:

HANDLE WINAPI CreateFile(
  _In_      LPCTSTR lpFileName,
  _In_      DWORD dwDesiredAccess,
  _In_      DWORD dwShareMode,
  _In_opt_  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  _In_      DWORD dwCreationDisposition,
  _In_      DWORD dwFlagsAndAttributes,
  _In_opt_  HANDLE hTemplateFile
);

Well, that’s a mouthful, just to get a file handle. But hey, it’s not much more than you’d do in Linux, except it has some extra flags and attributes that you might want to take care of. Here’s where the history of Windows gets in the way. There is a much simpler function, OpenFile(), which on the surface might do what you want, but beware: it’s a lot less capable, a leftover from the MS-DOS days. The documentation is pretty clear about this point (“don’t use this, use CreateFile instead…”), but still, you’d have to wade through some documentation to reach this conclusion.

Then, the ReadFile() function has this signature:

BOOL WINAPI ReadFile(
  _In_         HANDLE hFile,
  _Out_        LPVOID lpBuffer,
  _In_         DWORD nNumberOfBytesToRead,
  _Out_opt_    LPDWORD lpNumberOfBytesRead,
  _Inout_opt_  LPOVERLAPPED lpOverlapped
);

Don’t be confused by another function, ReadFileEx(). That one sounds even more modern, but it uses completion routines (APCs) rather than the completion-port style of async file reading that I want.

Seems simple enough. Take the handle you got from CreateFile(), and pass it to this function, along with a buffer, and you’re done? Well yeah, this is where things get really interesting.

Windows supports two forms of IO processing: asynchronous and synchronous. The synchronous case is easy. You just make your call, and your thread will be blocked until the IO “completes”. That is certainly easy to understand, and if you’re a user of the standard C library, or most other frameworks, this is exactly the behaviour you can expect. Lua, by default, using the standard io library, will do exactly this.
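For instance, the plain blocking flavor through the standard io library looks like this (nothing Windows-specific here, just ordinary Lua):

-- ordinary blocking reads; the thread waits until each read completes
local f = assert(io.open("sample.txt", "rb"))
local chunk = f:read(512)      -- blocks until 512 bytes (or EOF) are available
local rest  = f:read("*a")     -- blocks until the remainder is read
f:close()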

The other case is when you want to do async io. That is, you want to initiate the ReadFile() and get an immediate return, and handle the processing of the result later, perhaps with an alert on an io completion port.

Here’s the nasty bit. This same function can be used in both cases, but it has very different behavior. It’s a subtle thing. If you’re doing synchronous IO, then the kernel will track the file position, and automatically update it for you. So, you can do consecutive ReadFile() calls, and read the file contents from beginning to end.

But… When you do things async, the kernel will not track your file pointer. Instead, you must do this on your own! When you do async, you pass in an instance of an OVERLAPPED structure, which carries, among other things, the offset within the file to read from. By default, the offset is ‘0’, which will have you reading from the beginning of the file every single time.

typedef struct _OVERLAPPED {
    ULONG_PTR Internal;
    ULONG_PTR InternalHigh;
    union {
        struct {
            DWORD Offset;
            DWORD OffsetHigh;
        };

        PVOID Pointer;
    };

    HANDLE hEvent;
} OVERLAPPED, *LPOVERLAPPED;
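
In practice that means the caller has to carry a running file position and stuff it into the Offset/OffsetHigh fields before each read. A rough sketch of the bookkeeping, using a raw ffi binding (the ReadFile call and completion wait are abbreviated; waitForCompletion() is an assumed helper, not a real API):

-- sketch only: when reading async, advance the offset yourself between reads
local ffi = require("ffi")

local filePosition = 0               -- running position we maintain ourselves
local ovl = ffi.new("OVERLAPPED")    -- assumes the OVERLAPPED cdef above is loaded

local function readNextChunk(hFile, buff, bufflen)
	ffi.fill(ovl, ffi.sizeof("OVERLAPPED"))
	ovl.Offset = filePosition        -- low 32 bits of the file offset
	ovl.OffsetHigh = 0               -- high 32 bits, needed for files > 4GB

	-- kernel32.ReadFile(hFile, buff, bufflen, nil, ovl) would be issued here;
	-- the completion notification reports how many bytes actually landed
	local bytesTransferred = waitForCompletion(ovl)   -- assumed helper

	filePosition = filePosition + bytesTransferred
	return bytesTransferred
end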

You have to be very careful and diligent with using this structure, and the proper calling sequences. In addition, if you’re going to do async, you need to call CreateFile() with the FILE_FLAG_OVERLAPPED flag. In TINN, I have created the NativeFile object, which deals with all of this. The NativeFile object presents a basic block device interface to the user, and wraps up the subtlety so that the interface to files is clean and simple.

-- NativeFile.lua

local ffi = require("ffi")
local bit = require("bit")
local bor = bit.bor;

local core_file = require("core_file_l1_2_0");
local errorhandling = require("core_errorhandling_l1_1_1");
local FsHandles = require("FsHandles")
local WinBase = require("WinBase")
local IOOps = require("IOOps")

ffi.cdef[[
typedef struct {
  IOOverlapped OVL;

  // Our specifics
  HANDLE file;
} FileOverlapped;
]]

-- A win32 file interface
-- put the standard async stream interface onto a file
local NativeFile={}
setmetatable(NativeFile, {
  __call = function(self, ...)
    return self:create(...);
  end,
})

local NativeFile_mt = {
  __index = NativeFile;
}

NativeFile.init = function(self, rawHandle)
	local obj = {
		Handle = FsHandles.FsHandle(rawHandle);
		Offset = 0;
	}
	setmetatable(obj, NativeFile_mt)

	if IOProcessor then
		IOProcessor:observeIOEvent(obj:getNativeHandle(), obj:getNativeHandle());
	end

	return obj;
end

NativeFile.create = function(self, lpFileName, dwDesiredAccess, dwCreationDisposition, dwShareMode)
	if not lpFileName then
		return nil;
	end
	dwDesiredAccess = dwDesiredAccess or bor(ffi.C.GENERIC_READ, ffi.C.GENERIC_WRITE)
	dwCreationDisposition = dwCreationDisposition or OPEN_ALWAYS;
	dwShareMode = dwShareMode or bor(FILE_SHARE_READ, FILE_SHARE_WRITE);
	local lpSecurityAttributes = nil;
	local dwFlagsAndAttributes = bor(ffi.C.FILE_ATTRIBUTE_NORMAL, FILE_FLAG_OVERLAPPED);
	local hTemplateFile = nil;

	local rawHandle = core_file.CreateFileA(
        lpFileName,
        dwDesiredAccess,
        dwShareMode,
    	lpSecurityAttributes,
        dwCreationDisposition,
        dwFlagsAndAttributes,
    	hTemplateFile);

	if rawHandle == INVALID_HANDLE_VALUE then
		return nil, errorhandling.GetLastError();
	end

	return self:init(rawHandle)
end

NativeFile.getNativeHandle = function(self)
  return self.Handle.Handle
end

-- Cancel current IO operation
NativeFile.cancel = function(self)
  local res = core_file.CancelIo(self:getNativeHandle());
end

-- Close the file handle
NativeFile.close = function(self)
  self.Handle:free();
  self.Handle = nil;
end

NativeFile.createOverlapped = function(self, buff, bufflen, operation, deviceoffset)
	if not IOProcessor then
		return nil
	end

	deviceoffset = deviceoffset or 0;

	local obj = ffi.new("FileOverlapped");

	obj.file = self:getNativeHandle();
	obj.OVL.operation = operation;
	obj.OVL.opcounter = IOProcessor:getNextOperationId();
	obj.OVL.Buffer = buff;
	obj.OVL.BufferLength = bufflen;
	obj.OVL.OVL.Offset = deviceoffset;

	return obj, obj.OVL.opcounter;
end

-- Write bytes to the file
NativeFile.writeBytes = function(self, buff, nNumberOfBytesToWrite, offset, deviceoffset)
	offset = offset or 0

	if not self.Handle then
		return nil;
	end

	local lpBuffer = ffi.cast("const char *",buff) + offset
	local lpNumberOfBytesWritten = nil;
	local lpOverlapped = self:createOverlapped(ffi.cast("uint8_t *",buff)+offset,
		nNumberOfBytesToWrite,
		IOOps.WRITE,
		deviceoffset);

	if lpOverlapped == nil then
		lpNumberOfBytesWritten = ffi.new("DWORD[1]")
	end

	local res = core_file.WriteFile(self:getNativeHandle(), lpBuffer, nNumberOfBytesToWrite,
		lpNumberOfBytesWritten,
  		ffi.cast("OVERLAPPED *",lpOverlapped));

	if res == 0 then
		local err = errorhandling.GetLastError();
		if err ~= ERROR_IO_PENDING then
			return false, err
		end
	else
		return lpNumberOfBytesWritten[0];
	end

	if IOProcessor then
    	local key, bytes, ovl = IOProcessor:yieldForIo(self, IOOps.WRITE, lpOverlapped.OVL.opcounter);
--print("key, bytes, ovl: ", key, bytes, ovl)
	    return bytes
	end
end

NativeFile.readBytes = function(self, buff, nNumberOfBytesToRead, offset, deviceoffset)
	offset = offset or 0
	local lpBuffer = ffi.cast("char *",buff) + offset
	local lpNumberOfBytesRead = nil
	local lpOverlapped = self:createOverlapped(ffi.cast("uint8_t *",buff)+offset,
		nNumberOfBytesToRead,
		IOOps.READ,
		deviceoffset);

	if lpOverlapped == nil then
		lpNumberOfBytesRead = ffi.new("DWORD[1]")
	end

	local res = core_file.ReadFile(self:getNativeHandle(), lpBuffer, nNumberOfBytesToRead,
		lpNumberOfBytesRead,
		ffi.cast("OVERLAPPED *",lpOverlapped));

	if res == 0 then
		local err = errorhandling.GetLastError();

--print("NativeFile, readBytes: ", res, err)

		if err ~= ERROR_IO_PENDING then
			return false, err
		end
	else
		return lpNumberOfBytesRead[0];
	end

	if IOProcessor then
    	local key, bytes, ovl = IOProcessor:yieldForIo(self, IOOps.READ, lpOverlapped.OVL.opcounter);

    	local ovlp = ffi.cast("OVERLAPPED *", ovl)
    	print("overlap offset: ", ovlp.Offset)

--print("key, bytes, ovl: ", key, bytes, ovl)
	    return bytes
	end

end

return NativeFile;

This is enough of a start. If you want to simply open a file:

local NativeFile = require("NativeFile")
local fd = NativeFile("sample.txt");

From there you can use readBytes(), and writeBytes(). If you want to do streaming, you can feed this into the new and improved Stream class like this:

local NativeFile = require("NativeFile") 
local Stream = require("stream") 
local IOProcessor = require("IOProcessor")

local function main()

  local filedev, err = NativeFile("./sample.txt", nil, OPEN_EXISTING, FILE_SHARE_READ)

  -- wrap the file block device with a stream
  local filestrm = Stream(filedev)

  local line1, err = filestrm:readLine();  
  local line2, err = filestrm:readLine();  
  local line3, err = filestrm:readLine()

  print("line1: ", line1, err)  
  print("line2: ", line2, err)  
  print("line3: ", line3, err) 
end

run(main)

The Stream class looks for readBytes() and writeBytes(), and can provide the higher level readLine(), writeLine(), read/writeString(), and a few others. This is great because it can be fed by anything that purports to be a block device, which could be anything from an async file, to a chunk of memory.
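
As a quick illustration of that last point, here is a sketch of a trivial memory-backed “block device” that Stream could wrap. The readBytes()/writeBytes() signatures mirror NativeFile above; whether Stream needs anything beyond those two methods is an assumption on my part.

-- sketch: a memory-backed block device exposing readBytes()/writeBytes()
local ffi = require("ffi")

local MemoryDevice = {}
local MemoryDevice_mt = {__index = MemoryDevice}

MemoryDevice.new = function(size)
	local obj = {
		Data = ffi.new("uint8_t[?]", size);
		Size = size;
	}
	return setmetatable(obj, MemoryDevice_mt)
end

-- copy up to nBytes from deviceoffset into buff+offset, return bytes read
MemoryDevice.readBytes = function(self, buff, nBytes, offset, deviceoffset)
	offset = offset or 0
	deviceoffset = deviceoffset or 0
	local avail = math.min(nBytes, self.Size - deviceoffset)
	if avail <= 0 then return nil end

	ffi.copy(ffi.cast("uint8_t *", buff) + offset, self.Data + deviceoffset, avail)
	return avail
end

-- copy nBytes from buff+offset into the device at deviceoffset
MemoryDevice.writeBytes = function(self, buff, nBytes, offset, deviceoffset)
	offset = offset or 0
	deviceoffset = deviceoffset or 0

	ffi.copy(self.Data + deviceoffset, ffi.cast("const uint8_t *", buff) + offset, nBytes)
	return nBytes
end

return MemoryDevice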

And that’s about it for now. There are subtleties when dealing with async file access in windows. Having a nice abstraction on top of it gives you all the benefits of async without all the headaches.


Bit Twiddling Again? – How I finally came to my senses

Right after I published my last little missive, I saw an announcement that VNC is available on the Chrome browser. Go figure…

It’s been almost a year since I wrote about stuff related to serialization: Serialization Series

Oh, what a difference a year makes!  I was recently implementing code to support WebSocket client and server, so I had reason to revisit this topic.  For WebSocket, the protocol specifies things at the bit level, and in bigendian order.  This poses some challenges for the little-endian machines that I use.  That’s some extreme bit twiddling, and although I did revise my low level BitBang code, that’s not what I’m writing about today.

I have another bit of code that just deals with bytes as the smallest element.  This is the BinaryStream object.  BinaryStream allows me to simply read numeric values out of a stream.  It takes care of handling the big/little endian nature of things.  The BinaryStream wraps any other stream, so you can do things like this:

local mstream = MemoryStream.new(1024);
local bstream = BinaryStream.new(mstream, true);

bstream:WriteInt16(0x00ff);
bstream:WriteInt16(0xff00);

mstream:Seek(0);

print(bstream:ReadInt16())
print(bstream:ReadInt16())

This is quite handy for doing all the things related to packing/unpacking bytes of memory. Of course there are plenty of libraries that do this sort of thing, but this is the one that I use.

The revelation for me this time around had to do with the nature of my implementation. In my first incarnation of these routines, I was doing byte swapping manually, like this:

function BinaryStream:ReadInt16()

  -- Read two bytes
  -- return nil if two bytes not read
  if (self.Stream:ReadBytes(types_buffer.bytes, 2, 0) <2)
    then return nil
  end

  -- if we don't need to do any swapping, then
  -- we can just return the Int16 right away
  if not self.NeedSwap then
    return types_buffer.Int16;
  end

  local tmp = types_buffer.bytes[0]
  types_buffer.bytes[0] = types_buffer.bytes[1]
  types_buffer.bytes[1] = tmp

  return types_buffer.Int16;
end

Well, this works, but… It’s the kind of code I would teach to someone who was new to programming: not necessarily the best, but it shows all the detail.

Given Lua’s nature, I could have done the byte swapping like this:

types_buffer.bytes[0], types_buffer.bytes[1] = types_buffer.bytes[1], types_buffer.bytes[0]

Yep, yes sir, that would work. But, it’s still a bit clunky.

I have recently also been implementing some TLS related stuff, and in TLS there are 24-bit (3 byte) integers. In order to read them, I really want a generic integer reader:

function BinaryStream:ReadIntN(n)
  local value = 0;

  if self.BigEndian then
    for i=1,n do
      value = lshift(value,8) + self:ReadByte()
    end
  else
    for i=1,n do
      value = value + lshift(self:ReadByte(),8*(i-1))
    end
  end

  return value;
end

print(bstream:ReadIntN(3))

Well, this will work for 1, 2, 3, or 4 byte integers. It can’t go beyond that, because the bit operations only work up to 32 bits. But, OK, that makes things a lot easier, reduces the amount of code I have to write, and puts all the endian handling in one place.
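
Writing can be collapsed in exactly the same way. Here is a sketch of the mirror-image writer, under the same 32-bit ceiling (band/rshift come from the bit library, and the WriteByte() primitive is assumed to exist alongside ReadByte()):

-- sketch: the mirror image of ReadIntN; WriteByte() is assumed
local bit = require("bit")
local band, rshift = bit.band, bit.rshift

function BinaryStream:WriteIntN(value, n)
  if self.BigEndian then
    -- most significant byte first
    for i=n,1,-1 do
      self:WriteByte(band(rshift(value, 8*(i-1)), 0xff))
    end
  else
    -- least significant byte first
    for i=1,n do
      self:WriteByte(band(rshift(value, 8*(i-1)), 0xff))
    end
  end
end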

Then there’s 64-bit, float, and double.

In these cases, the easiest thing is to use a union structure:

ffi.cdef[[
typedef union  {
  int64_t		Int64;
  uint64_t	UInt64;
  float 		Single;
  double 		Double;
  uint8_t bytes[8];
} bstream_types_t;
]]

function BinaryStream:ReadBytesN(buff, n, reverse)
  if reverse then
    for i=n,1,-1 do
      buff[i-1] = self:ReadByte()
    end
  else
    for i=1,n do
      buff[i-1] = self:ReadByte()
    end
  end
end

function BinaryStream:ReadInt64()
  self:ReadBytesN(self.valunion.bytes, 8, self.NeedSwap)
  return tonumber(self.valunion.Int64);
end

function BinaryStream:ReadSingle()
  self:ReadBytesN(self.valunion.bytes, 4, self.NeedSwap)
  return tonumber(self.valunion.Single);
end

function BinaryStream:ReadDouble()
  self:ReadBytesN(self.valunion.bytes, 8, self.NeedSwap)
  return tonumber(self.valunion.Double);
end

Of course, with Lua, the 64-bit int, once converted with tonumber(), is limited to 53 bits of integer precision, but the technique will work in general. For Single and Double, you just need to get the bytes in the right order and everything is fine. Whether this is compatible with another application’s byte ordering depends on that application, but at least this is self-consistent.
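
As a side note on where the NeedSwap flag comes from: LuaJIT can report the host byte order directly via ffi.abi("be"), so the flag can be computed once when the stream is constructed rather than hard-coded. A sketch (the constructor shape mirrors the BinaryStream.new(stream, bigendian) usage earlier; the metatable name is an assumption):

-- sketch: derive NeedSwap from the host byte order
local ffi = require("ffi")

function BinaryStream.new(stream, bigendian)
  local obj = {
    Stream = stream;
    BigEndian = bigendian;
    -- swap only when the stream's byte order differs from the machine's
    NeedSwap = (not bigendian) ~= (not ffi.abi("be"));
  }
  return setmetatable(obj, BinaryStream_mt)  -- BinaryStream_mt assumed to exist
end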

This incarnation uses functions a lot more than the last one did. That is the big revelation for me. In the past, I was thinking like a ‘C’ programmer, essentially trying to do what I would do in assembly language. Well, I realized this is not necessarily the best way to go with LuaJIT. Also, I was trying to optimize by getting stuff into a buffer and messing around with it from there, assuming that getting stuff from the underlying stream is expensive. Well, that simply might not be a good assumption, so I relaxed it.

With this newer implementation, I was able to drop 200 lines of code out of 428. That’s a pretty good savings. This in and of itself might be worthwhile, because the code will be more easily maintained, thanks to the smaller and simpler implementation.

So, every day, I see and hear things about either my own code or someone else’s, and I try to apply what I’ve learned to my own cases. I’m happy to rewrite code when it results in smaller, tighter, more accurate code.

And there you have it.


Tech Preview 2013

I’ve been champing at the bit to write this post. A new year, a sunny day! I read something somewhere about how futurists got the whole prediction game wrong. Rather than trying to describe the new and exciting things that would be showing up in oh so many years in the future, they should be describing what would be removed from our current context. With that in mind, here are some short to medium term predictions.

More wires will be removed from our environment. The keyboard and mouse wires will be gone, replaced by wireless, bluetooth, or otherwise. The ubiquitous s-video, and various component audio/video cables will disappear. They’ll be replaced with a single wire in the form of hdmi, or they’ll disappear altogether, being replaced by wireless audio/video transmission.

The “desktop Computer” will disappear. Basic “airplay” capability will be baked into monitors of all stripes, so that the compute component of anyone’s environment will consist of nothing more than a little computicle, combined with whatever input devices the user so happens to need for their particular task. “Touch” sensors, such as the Leap Motion, will find their way into more interesting and interactive input activities.

Power consumption related to computing will continue to shrink. Since cell phones seem to be the focus of current compute innovation, this genre will drive the compute world. With the likes of the Raspberry Pi and the Odroid-U2 becoming popular System on Chip compute platforms, the raw requirements for a compute experience will be reduced from 80 watts to about 5 watts.

A bit longer term…

Drivers will be removed from the driving experience. With BMW, Google, and others experimenting with driverless vehicles, this will eventually become the preferred method of transportation. Particularly with an aging population in places, it will likely be safer to have seniors driven around in nice Prius like pods, rather than having them drive themselves.

“Data Centers” will become irrelevant. This is a bit of a stretch, but the thinking goes like this. A Data Center is a concentration of communications, power consumption, compute capability, and ultimately data storage. They are essentially the timesharing/mainframe model of 50 years back, done up in a large centralized format. Why would I use a data center, though? If I had fast internet (100 Mbps or better) to my home/business, would I need a data center? If I have enough compute power in my home, in the form of 3 or 4 communications servers, do I need a data center? If I have 16TB of storage sitting under my desk at home, do I need a data center? In short, if I eliminate the redundancy, uptime guarantees, etc., that a data center gives me, I can probably supply the same from my home/small business at an equally affordable cost. As compute and storage costs continue to decrease, and the power consumption of my devices goes with them, the tipping point for data center value will change, and doing it from home will become more appealing.

Trending… breaking from the “things that will be removed”, and applying a more traditional “what will come” filter…

“Data” will become less important as “Connections” become more important. In recent years, blogs were interesting, then tweeting, the micro form of the blog, became more interesting. At the same time, Facebook started to emerge, which is kind of like a stretched out tweet/blog. Now along comes Pinterest. And in the background, there’s been Google. Pinterest represents a new form of communication. Gone are the words, replaced by connections. I can infer that I’ll be interested in something you’re interested in by seeing how many times I’ve liked other things that you’ve been interested in. If I’m an advertiser, tracking down the most pinned people and following what they’re pinning is probably a better indicator of relevance than anything Google or Facebook have to offer. The connections are more important than any data you can actually read. In fact, you probably won’t read much; you’ll look at pictures, and tiny captions, and possibly follow some links. If there’s a prediction here, it is that the likes of Google, Facebook, and Twitter will be overtaken by the likes of Pinterest and others who are driving in the “graph” space.

3D Printing. This year will likely see the emergence of the Formlabs printer, as well as continued evolution of the ABS and PLA based printers, and content for these printers will become more interesting. The price point for the printers will continue to hover around $1500 for the truly “plug and play” variety, as opposed to the DIY variety.

For 3D content design, the Raspberry Pi, and ODroid-U2, will be combined with the LeapMotion input device to create truly spectacular and easy to use 3D design tools, for less than $200 for a hardware/software combination.

As computicles become cheaper, the premium for software will continue to decrease. There will be a consolidation of the hardware/software offering. When a decent computicle costs $35 (Raspberry Pi), it’s hard to justify software costing any more than $5. If you’re a software manufacturer, you will consider creating packages that include both the hardware and software in order to get any sort of premium.

Ubiquitous computicles will see the emergence of a new hardware “hub”. Basically a wifi connected device that attaches to various peripherals such as the LeapMotion, a Kinect, or an HDMI based screen. It will manage the various interactive inputs from the other computicles located within the home/office. Rather than being a primary compute device itself, it will act as a coordinator of many devices in a given environment.

That’s about it for this year. No doom and gloom zombie scenarios as far as I can see. Some esoteric trending on many fronts, some empire breaking trends, some evolution of technologies which will make our lives a little easier to live.


The Esoteric Year – 2012

Well, here it is, the end of another year. December 21st came and went, and the earth’s crust did not open up and swallow us all, unless it did and we’re all now in purgatory waiting to sort things out.

For my part, I spent the year relearning how to program. That’s an odd statement coming from someone who’s been programming professionally for about 25 years. Towards the end of 2011, I decided to pick up the Lua language as a tool for doing rapid prototyping and whatnot. I chose it because it’s a fairly minimalist language, very tiny runtime, runs everywhere, etc.

My editor of choice for much of the year was SciTe. At some point, I switched away from that and started using NotePad++. Finally, I’m settling into Sublime Text 2, because it works on the Mac, as well as Windows.

For most of the time, I did not use any IDE, or any specialized debugging tools, other than a ‘print()’ function. This is pretty much the same way that I programmed before the days of GDB. It imposes a certain amount of discipline, and forces you to program in a very succinct way.

I found that I was more thoughtful about my designs. I worried more about very basic data structures, and getting some very small things right. I wrote a lot of code, and a lot of it has references in my Summaries page. I visited things as basic as endianness when reading and writing integers to streams. I explored interfacing to the Kinect, playing with Windows Cryptography, Next Generation (Win CNG), and I played around a lot with the Raspberry Pi.

My primary development environment is Windows, and there’s a lot of positives I experience from being on that platform. I have found that Lua has been the tool that makes exploration of any platform the most productive. You can easily create simple interfaces to underlying COM and C interfaces which otherwise require quite a lot of code.

One nice thing is that I discovered some very esoteric libraries related to hash functions, data structures, and the like. I put some of those discoveries into my LAPHLIBS project. You can find things like a minimal CRC32 function which does not require a giant table of integers like most implementations. I had initially started writing my own version of atoi, because I wasn’t happy with the native one provided by Lua, but then the LuaJIT version improved dramatically, so I gave up on my own. Similarly with the random number generator: LuaJIT has a great one that works across all the platforms LuaJIT runs on.

Another path that I really explored deeply was networking. With all the excitement of Node.js and the underlying LibUV library, I wanted to see how hard it was to do the same purely in Lua. Well, it turns out that doing the whole epoll thing on Linux, or the equivalent in Windows, is pretty straightforward using LuaJIT and its FFI. So, no need for LibUV, at least not for interfacing with epoll or IO completion ports. Just go to them directly from LuaJIT, and you’re all set. I think this actually makes for a more compact and maintainable system, but it’s definitely not mainstream.

One of the benefits I found in using LuaJIT is portability to very small devices. As it turns out, Node.js is fairly capable, even on routers that have only 64Mb of RAM. But, LuaJIT is even faster on such small devices. It turns out that many router manufacturers include it as part of their distributions. That makes for interesting theatre.

Lastly, I explored the world of the tiny devices. The most recent exploration is looking at the Odroid-U2. A very tiny device indeed. It runs Linux and Android, and I think it can be the basis of many small and powerful computing projects. It may turn out that something like Microsoft’s Drawbridge shows up on these devices, bringing a wealth of Win32 programs along with it, or it might just be a Linux and Android world. Or, it might be something else altogether. One thing is for sure, computing is definitely getting interesting.

In short, it’s been an interesting year, and I’ve explored a lot of esoteric stuff. I believe now I am actually in a position to start stitching things together in meaningful ways, which is essentially a reset on my programming capabilities to fit with the modern world.


Final Approach – Residential Gateway cleanup

From my last outing, I was figuring out how to optimize my residential internet equipment. From my little experiments, I had determined that my Vonage box was wrongly placed in the chain of things. Since it has been moved out to just being another device, rather than a bottleneck, things are humming right along.

One of the other findings from my tests was that my ethernet over AC powerline adapter was really not that fast. Despite the gigabit ethernet connection on my desktop machine, I really wasn’t getting anywhere close to my money’s worth. I recently changed this situation somewhat.

First, I put my HP C6380 printer on wireless, instead of having it attached to the PC. That’s a good thing, as all devices in the house can now print without having the PC turned on. Of course, the printer is old enough that not all devices have the appropriate drivers to make the thing go. Next up was the desktop machine itself. Ideally, I’d plug it directly into a gig port on the switch, which is just on the other side of my office wall, so close I can smell it, but alas, I cannot punch holes in the wall, so I have to settle for alternatives.

For the moment, I’ve stuck a fairly inexpensive wifi usb dongle into the machine, this one from Adafruit: OurLink

This is one of those micro dongles that’s really only meant for microcontrollers, laptops, and the like, but I figured it would be worth a try on the desktop machine so I could compare to what I was getting out of the AC Adapter ethernet connection.  So, what did I get with Speedtest?

2ms (ping)   37.35Mbps (download)   32.25Mbps (upload)

So, that’s better than what I was seeing going through the AC Powerline ethernet adapter, but a far cry from what this machine is capable of doing. But, the experiment was worth it. So, I’ve disconnected the machine from the small router in the office.

The only two pieces of equipment remaining tethered are my Buffalo disk storage device, and my Raspberry Pi. The Odroid-X is already wireless. I’d like to get the Raspberry Pi speaking wireless, but thus far, having it deal with a Wifi Dongle, as well as a wireless keyboard/mouse, hasn’t worked out too well. Perhaps a recent board will do better at the task. Until then, I’ll just keep it tethered.

Lessons learned from this little venture: there are a lot more wires that can be eliminated from my work environment. HDMI has taken care of the VGA/audio cables. Same for my TV. WiFi has taken care of my ethernet cables. A wireless keyboard and mouse are taking care of those pesky mouse/keyboard cables. It’s getting downright uncluttered around my desk these days.

Well, that’s a good way to reflect back upon a year. Removing wires, becoming that much more tidy and efficient.


Residential Internet

I live in the Seattle area.  Furthermore, I live in an apartment building.  Recently there was a flyer in the elevator that said “Condo Internet.  The fastest residential internet available”.  It promised speeds of 100Mbps (megabits per second), for only $60/mo, and even gigabit if you’re willing to pay $120/month.

I had Comcast, and that was giving me a super duper 12Mbps download, 5Mbps upload, at roughly $85/mo.  Being college educated, I did the math and figured going with Condo Internet was worth a shot.

They were in our building one night, doing sign ups, so I had them do their thing and install stuff.  It turns out there is a wiring closet in the apartment, and a single ethernet wire comes into there.  That wire goes down to a closet a couple floors below ours, and I guess they installed some nice router hardware in that closet, bringing the 100Mbps into our apartment.

It worked right off the bat.  I put our older netgear router in the wiring closet, and that seemed to work.  But, the netgear, being a few years old, was not really up to the task of handling 100Mbps, let alone multiple attached gigabit connections.  Wanting to get full spectrum out of my new deal, I replaced the router in the closet with a simple switch.

First thing, I purchased a Netgear ProSAFE 8-port Gigabit Desktop Switch (GS108NA).  If you work at some company that has offices, you might have one of these sitting under your desk.  This has gigabit connections all the way through.  So, the single blue wire from the closet gets connected to this device.  Then 4 other blue ethernet wires get connected to this switch.  Those wires terminate at RJ45 connectors which are in each of the major rooms in the apartment.  Good bit of forward thinking on the apartment builder’s part.  The only way it could have been better is if there were multiple lines coming into the apartment.

Alright, at this point, I’ve got 4 hard wired gigabit connections to the closet, and each of those will get whatever speed Condo Internet so happens to be giving us.

Great!  And I speed tested while they were in the apartment, and I was in fact seeing roughly 85Mbps through the older Netgear router (RangeMax Duo WNDR3300).

Great!  I’m all set.  Super fast internet access is now at my disposal… Or is it?

When they did the hookup, I had taken everything off the local home LAN.  So, I started putting things back together.  Since I had replaced the NetGear router with a gigabit switch, I figured I’d upgrade the wireless router as well.  So, I purchased an ASUS RTN66U.  I had actually already purchased it for another project, but I figured I’d put it to use at home.  The Condo Internet people recommended the Apple AirPort Extreme, because it seems to have superior wireless, but, not wanting to spend the extra $180, I just stuck with the ASUS, which is no slouch, and was second on their list anyway.

These things are supposed to be able to give you 300Mbps speeds on wireless.  I certainly never saw anything like that with the NetGear.

I also have Vonage for internet phone.  The setup in the past has been: Vonage first in the chain, then the router plugs into the Vonage box.  So, I did that again.  Then, hard wired to the router are: Xbox, Blu-ray player, and Ethernet over Power Line (for those places where I don’t have RJ45 or wireless).

It all seemed to work fine.  But, I’m paying for 100Mbps, so I run the SpeedTest thing. Well, lo and behold, I wasn’t getting anywhere near the speeds I was seeing before! What the heck?  Was I just punked?  Doing a bit of debugging, I went back and plugged the hard wire into the different RJ45 jacks around the house, and I was getting the full 100Mbps.  But, not with anything connected to the router, including wireless.

Hmmm, what a mystery.  Well, how about I start from basics.  Remove the Vonage from the loop and see what happens.  Pow!!  That was it.  With the Vonage in the loop, the best speeds were sucking along at a paltry 5Mbps down, and 1Mbps up!!  Hah, I was probably getting that even with Comcast.  Basically, I was rate limiting my own internet access because I was using this dumb Vonage hardware.

Once the Vonage was out of the loop, things looked like this:

Device            Ping      Download (Mbps)   Upload (Mbps)
Wireless
iPad II           9ms            39                 21
Surface           0ms            40                 32
Dell Latitude     4ms           101                149
Macbook Pro       3ms            96                207

Wired
Dell Latitude     2ms           101                345
Desktop (AC)      4ms            30                 32

The Download/Upload numbers are in megabits per second. A couple of things of note. The Dell Latitude laptop, when plugged into the ASUS router, got speeds way above advertised. At 345Mbps, that’s much better than what I’ve ever seen in any work environment I’ve ever been in. Certainly it’s better than anything I’ve ever seen with Comcast.

The Windows Surface performs roughly the same as an iPad 2. I don’t have a 3 or 4 to test with, so that’s that.

The Dell and MacBook Pro perform roughly the same on downloads, but the MacBook edges out the Dell for uploads.

The Desktop machine is a sad story. Right now I’m using these Netgear ethernet over AC adapters, to connect the gigabit ethernet port on the desktop machine to the rest of the network. Unfortunately, there was not an RJ45 in the room where this machine is. As the paltry numbers indicate, I’d be better off just getting a nice fast wireless adapter for that machine, and call it a day.

All in all, for $60/month, I am very pleased with my Condo Internet connection. It’s funny when you consider the state of most business internet connections. I’ve seen ads for 45Mbps internet for $200/month, and more. For that price, I could be getting gigabit speeds at home.

The other lesson I learned here is that sometimes it’s worth throwing out some ancient technology (very old home networking equipment) to take advantage of new stuff.

This gives me another thought. Through my newfound ISP, I can also get a static IP address, and they fully support IPv6 all the way through. Given that, and equipment like the Synology box, I wonder if the internet of things is just around the corner in our house…


Screencast of the Raspberry Pi

It’s one of those inevitabilities.  Start with fiddling about with low level graphics system calls, do some screen capture, then some single file saving, and soon enough you’ve got screen capture movies!  Assuming WordPress does this right.

If you’ve been following along, the relevant code looks like this:

-- Create the resource that will be used
-- to copy the screen into.  Do this so that
-- we can reuse the same chunk of memory
local resource = DMXResource(displayWidth, displayHeight, ffi.C.VC_IMAGE_RGB888);

local p_rect = VC_RECT_T(0,0,displayWidth, displayHeight);
local pixdata = resource:CreateCompatiblePixmap(displayWidth, displayHeight);

local framecount = 120

for i=1,framecount do
	-- Do the snapshot
	Display:Snapshot(resource);

	local pixeldata, err = resource:ReadPixelData(pixdata, p_rect);
	if pixeldata then
		-- Write the data out
		local filename = string.format("screencast/desktop_%06d.ppm", i);
		print("Writing: ", filename);

		WritePPM(filename, pixeldata);
	end
end

In this case, I’m capturing into a bitmap that is 640×320, which roughly matches the aspect ratio of my wide monitor.

This isn’t the fastest method of capturing on the planet. It actually takes a fair amount of time to save each image to the SD card in my Pi. Also, I might be able to eliminate the copy (ReadPixelData), if I can get the pointer to the memory that the resource uses.

This little routine will generate a ton of .ppm image files stored in the local ‘screencast’ directory.
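
The WritePPM routine isn’t shown above, but the binary PPM (‘P6’) format is about as simple as image formats get: a short text header followed by raw RGB bytes, which is exactly what a VC_IMAGE_RGB888 snapshot provides. Here is a sketch of what such a writer might look like; the pixeldata fields used (Data, Width, Height, Stride) are assumptions about the pixmap layout, not the actual LJIT2RPi structure.

-- sketch of a binary PPM writer; the pixeldata fields used here
-- (Data, Width, Height, Stride) are assumed, not the actual pixmap layout
local ffi = require("ffi")

local function WritePPM(filename, pixeldata)
	local f = io.open(filename, "wb")
	if not f then return false end

	-- P6 header: magic, width, height, max component value
	f:write(string.format("P6\n%d %d\n255\n", pixeldata.Width, pixeldata.Height))

	-- dump the RGB888 rows; the stride may be wider than Width*3
	for row = 0, pixeldata.Height-1 do
		local rowptr = ffi.cast("uint8_t *", pixeldata.Data) + row*pixeldata.Stride
		f:write(ffi.string(rowptr, pixeldata.Width*3))
	end

	f:close()
	return true
end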

From there, I use ffmpeg to turn the sequence of images into a movie:

ffmpeg -i screencast/desktop_%06d.ppm desktop.mp4

If you’re a ffmpeg guru, you can set all sorts of flags to change the framerate, encoder, and the like. I just stuck with defaults, and the result is what you see here.
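
If you are inclined to tune it, something along these lines would pin the input framerate and pick an explicit encoder (the values are just examples, not what was used for the clip above):

ffmpeg -framerate 9 -i screencast/desktop_%06d.ppm -c:v libx264 -pix_fmt yuv420p desktop.mp4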

So, the Pi is capable. It’s not the MOST capable, but it can get the job done. If I were trying to do this in a production environment, I’d probably attach a nice SSD drive to the USB port, and stream out to that. I might also choose a smaller image format such as YUV, which is easier to compress. As it is, the compression was getting about 9fps, which ain’t too bad for short clips like this.

One nice thing about this screen capture method is that it doesn’t matter whether you’re running X Windows, or not. So, you’re not limited to things that run in X. You can capture simple terminal sessions as well.

I’m rambling…

This works, and it can only get better from here.

It is part of the LJIT2RPi project.


Simplified API Development – Part 2, First Steps

The series so far:

Simplified API Development – Part I, Ground Rules

To recap the ground rules:

1) All functions will have a return value

2) The first value is either true/false, or the return value of the function

So, I’ll take a step back.  What functions are we talking about exactly?  Well, let’s take a look at networking APIs just for kicks.  Of course I’m going to use the LuaJIT FFI mechanism to demonstrate, but realistically you could use any modern programming language that has multiple return value capabilities.

First up is the FFI layer.  In LuaJIT, the FFI layer is the most basic interop to whatever C library functions you are trying to call.  In order to create the interop layer, you need to identify a few key components.  First of all, which library (.dll) are your functions located in?  In the case of winsock, the primary .dll is ws2_32.dll.  This library is located on every modern day Windows machine.  If you wanted to, you could do a ‘dumpbin /EXPORTS ws2_32.dll’ and see all the functions that are exported from that library.

Next, you have to identify the header files that contain the function prototypes that you’re going to be binding to.  In the case of WinSock, the header files are numerous, and include: WinTypes.h, WinBase.h, mstcpip.h, inaddr.h, in6addr.h, ws2tcpip.h, ws2def.h, winsock2.h… There’s probably a couple more, but these contain the bulk of what you’ll actually use.

These various headers contain all the enums, constants, typedefs, and function prototypes that are expressed within the library.  For interop through FFI, these headers need to be massaged a bit before they can be used.  Here’s a very simple case: the various functions utilize some specific data type aliases, so you either need to change the function prototypes, or you need to express those aliases.  If you choose to express the aliases, then you’ll have code like this:

local ffi = require "ffi"

ffi.cdef[[
typedef uint8_t		u_char;
typedef uint16_t 	u_short;
typedef uint32_t        u_int;
typedef unsigned long   u_long;
typedef uint64_t 	u_int64;
typedef uintptr_t	SOCKET;
typedef uint16_t 	ADDRESS_FAMILY;
typedef unsigned int    GROUP;
]]

Then there’s the various constants. In typical ‘C’ fashion, the header files are littered with #define statements such as:

#define SOCK_STREAM     1    // stream socket
#define SOCK_DGRAM      2    // datagram socket
#define SOCK_RAW        3    // raw-protocol interface

That’s great for C programming, as it’s just lexical sugar. With LuaJIT, you might want to put a little bit more structure around it to get optimal performance:

-- Socket Types
ffi.cdef[[
struct SocketType {
static const int SOCK_STREAM     = 1;    // stream socket
static const int SOCK_DGRAM      = 2;    // datagram socket
static const int SOCK_RAW        = 3;    // raw-protocol interface
static const int SOCK_RDM        = 4;    // reliably-delivered message
static const int SOCK_SEQPACKET  = 5;    // sequenced packet stream
};
]]

I’ll come back to this a bit later.

Next, there’s the interfaces to the functions themselves. In the case of networking, I’ll just focus on the traditional Berkeley socket calls:

-- Berkeley Sockets calls
ffi.cdef[[
u_long	htonl(u_long hostlong);
u_short htons(u_short hostshort);
u_short ntohs(u_short netshort);
u_long	ntohl(u_long netlong);

unsigned long inet_addr(const char* cp);
char* inet_ntoa(struct   in_addr in);

int inet_pton(int Family, const char * szAddrString, const void * pAddrBuf);
const char * inet_ntop(int Family, const void *pAddr, intptr_t strptr, size_t len);

SOCKET socket(int af, int type, int protocol);

SOCKET accept(SOCKET s,struct sockaddr* addr,int* addrlen);

int bind(SOCKET s, const struct sockaddr* name, int namelen);

int closesocket(SOCKET s);

int connect(SOCKET s, const struct sockaddr * name, int namelen);

int getsockname(SOCKET s, struct sockaddr* name, int* namelen);

int getsockopt(SOCKET s, int level, int optname, char* optval,int* optlen);

int ioctlsocket(SOCKET s, long cmd, u_long* argp);

int listen(SOCKET s, int backlog);

int recv(SOCKET s, char* buf, int len, int flags);

int recvfrom(SOCKET s, char* buf, int len, int flags, struct sockaddr* from, int* fromlen);

int select(int nfds, fd_set* readfds, fd_set* writefds, fd_set* exceptfds, const struct timeval* timeout);

int send(SOCKET s, const char* buf, int len, int flags);

int sendto(SOCKET s, const char* buf, int len, int flags, const struct sockaddr* to, int tolen);

int setsockopt(SOCKET s, int level, int optname, const char* optval, int optlen);

int shutdown(SOCKET s, int how);



int gethostname(char* name, int namelen);

struct hostent* gethostbyaddr(const char* addr,int len,int type);
struct hostent* gethostbyname(const char* name);

int GetNameInfoA(const struct sockaddr * sa, DWORD salen, char * host, DWORD hostlen, char * serv,DWORD servlen,int flags);
int getaddrinfo(const char* nodename,const char* servname,const struct addrinfo* hints,PADDRINFOA * res);
void freeaddrinfo(PADDRINFOA pAddrInfo);
]]

At this point, you have all you need to interop with the socket interfaces of whatever platform you’re running on, in this case Windows.

local ffi = require "ffi"
local lib = ffi.load("ws2_32")
local SocketType = ffi.new("struct SocketType");
local Family = ffi.new("struct FamilyType");
local Protocol = ffi.new("struct Protocol");

local sock = lib.socket(Family.AF_INET, SocketType.SOCK_STREAM, Protocol.IPPROTO_TCP);

That’s a good start, and if you don’t want to get any more hand holding from the Lua environment, you’re done.

Of particular note, there is no “C” side to this. The LuaJIT environment dynamically loads the winsock library into the running address space, and makes the calls to the appropriate functions magically. There’s not a ton of marshalling going on, because the FFI layer represents data structures directly in their ‘C’ native form. If you have more complex structures on the script side, you’ll have to do some marshalling. There’s no separate “interop library” that needs to be compiled and shipped with your executable, nothing to get in the way, you’re just done. This is quite different from the native Lua language, and JavaScript, which require you to write some C code to perform interop with these native libraries. Same goes for languages such as C#, where you have to create the data structures for marshalling, and create the P/Invoke signatures for each and every function. A very time consuming and error prone exercise to be sure.

At any rate, this is what’s required for basic interop. If the API is as simple as the Berkeley sockets API, then you might not want to go any further, but, as we’ll see, going just a little bit further will make life a lot easier when you start composing your application.
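
As a small preview of where those ground rules lead, here is a sketch of a thin wrapper around the raw socket() call that converts the C-style error convention into the “first value is the result, or false plus an error” convention. It builds on the cdefs above (the SOCKET typedef in particular); the wrapper itself is illustrative, not TINN’s actual code.

-- sketch: wrapping a raw FFI call to follow the ground rules
local ffi = require "ffi"
local ws2_32 = ffi.load("ws2_32")

ffi.cdef[[
int WSAGetLastError(void);
]]

local INVALID_SOCKET = ffi.cast("SOCKET", -1);   -- from winsock2.h

local function CreateSocket(af, socktype, protocol)
	local sock = ws2_32.socket(af, socktype, protocol);

	if sock == INVALID_SOCKET then
		-- first value false, second value the winsock error code
		return false, ws2_32.WSAGetLastError();
	end

	-- first value is the usable result
	return sock;
end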

Until next time…


LuaJIT to Khronos

I’ve been sitting on some code for quite some time.   Hoarding it, as it were, for my own devices.  As my current desktop machine seems to be failing more frequently, I thought it would be a good time to do some spring cleaning and put up some more code.

The current round has to do with things related to the Khronos Group of APIs.  The Khronos Group is one of those industry bodies set up for collaboration across multiple companies.  Probably the most famous of the APIs they’ve dealt with to date is the OpenGL API.  The follow-on to that was the OpenGL ES API.  Then along came one of their own, OpenVG, which deals with vector graphics.  Rounding out the set, you have OpenCL for distributed computing, and OpenMAX for audio/video stuff.

Since the group was originally founded by companies interested in GPUs, those are the APIs that are the most mature.  Well, recently, playing with the Raspberry Pi, I found that OpenGL ES/EGL, OpenVG, and OpenMAX are the only ways to get at hardware acceleration on the device.  There are already a couple of examples of OpenVG running on the Pi.  One example of using OpenVG on the Pi was done by Anthony Starks.  If for no other reason, you’ve got to check out the code because of the author’s name…

Anthony Starks has several examples of how to use OpenVG, and how to bind and use the Go language to do some really nice stuff.  Well, no reason for those Go programmers to have all the fun, so I decided to make a really convenient binding to LuaJIT for OpenVG.  But why stop there?  Why not all the APIs?  Well, I already had the OpenCL and OpenGL bindings lying about, so I’ve put them all together into a single repository, the LJIT2Khronos project.

One of the first things that I realized way back was that these bindings can only be useful if they can be demonstrated.  And one of the first things you need to do to use any of these APIs is establish a connection to the windowing system.  In this case, I had to include some bindings to Windows APIs as well.  There is a Win32 directory, which contains some basic bindings for User32, GDI, Kernel32, Windows types, and the like.  More than enough to get a basic window up on the screen, and certainly enough to get a window handle and device context, which are required for the various APIs.

What do you get for your troubles?

Let’s say you want to create a window, which has a ‘frame rate’ of 3 frames per second, and a routine you specify will be called, and rendering will occur…

local NativeWindow = require "User32Window"
local EGL = require "egl_utils"

local OpenVG = require "OpenVG"
local OpenVGUtils = require "OpenVG_Utils"
local ogm = require "OglMan"
local RenderClass = require"Drawing"

-- Setup the "display" object
local dpy = EglDisplay.new(nil, EGL.EGL_OPENVG_API);

local screenWidth = 640;
local screenHeight = 480;

-- Create the renderer class which
-- will handle drawing tasks
local Renderer = RenderClass.new(dpy, screenWidth, screenHeight);

local tick = function(ticker, tickCount)
	print("Tick: ", tickCount);

	Renderer:Begin();

	Renderer:Background(0, 0, 0);	  -- Black background

	drawEllipses();
	drawRectangles();
	drawLines();

	Renderer:End();
end

-- Create a window
local winParams = {
	ClassName = "EGLWindow",
	Title = "EGL Window",
	Origin = {10,10},
	Extent = {screenWidth, screenHeight},
	FrameRate = 3,

	OnTickDelegate = tick;
};

-- create an EGL window surface
local win = NativeWindow.new(winParams)
assert(win, "Window not created");

local surf = dpy:CreateWindowSurface(win:GetHandle())

-- Make the context current
dpy:MakeCurrent();

glViewport(0,0,screenWidth,screenHeight);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();

local ratio = screenWidth/screenHeight;
glFrustum(-ratio, ratio, -1, 1, 1, 10);

-- Now, finally do some drawing
win:Run();

-- free up the display
dpy:free();

I’ll be the first to admit, this is still quite a lot to type to do some very basic rendering, but it’s way less typing than you’d have to do on your own. I’ll write up a separate article that goes into more depth on how to use this OpenVG stuff; for now, suffice it to say, it will work on whatever environment has the OpenVG and EGL libraries available (at least Windows, the Raspberry Pi, and Linux in general).

But of course, there’s more. This repository also includes the OpenGL bindings, and the OpenCL stuff as well. Those bindings are fairly mature; at least I’ve written a couple of apps using them.

So, there you have it. Some fairly complete bindings to these various Khronos Group APIs. Getting them off my machine, and into the interwebs gives me some relief. As well, I expect to make full use of them across the multiple environments in which they are available.

