Optimize At the Right Time and Level

It takes a long time to do the simplest things correctly. You want to receive some bytes on a socket?
int recv(SOCKET s,char *buf,int len,int flags);

No problem right? How about that socket? How did that get setup? Is it right for low latency, big buffers, small buffers? what? And what about that buffer? and lord knows which are the best flags to use? First question I guess is, ‘for what purpose’?

For the longest time, I have known that my IP sockets were not the most performant. Through rounds of trying to get low level networking, all the way up through http handling working correctly, I deliberately made some things extremely brain dead simple.

One of those things was my readline() function. this is the workhorse of http processing. Previously, I would literally read a single character at a time, looking for those line terminators. Which meant making that recv() call a bazillion times during normal http processing.

Well, it has worked great, reduces the surface area to help flesh out various bugs, and allowed me to finally implement the IO Completion Port based networking stack that I now have. It works well enough. But, it’s time to optimize. I am now about to add various types of streams such as compressed, encrypted, and the like. For that I need to actually buffer up chunks at a time, not read single characters at a time.

So, I finally optimized this code.

function IOCPNetStream:refillReadingBuffer()
	print("IOCPNetStream:RefillReadingBuffer(): ",self.ReadingBuffer:BytesReadyToBeRead());

	-- if the buffer already has data in it
	-- then just return the number of bytes that
	-- are currently ready to be read.
	if self.ReadingBuffer:BytesReadyToBeRead() > 0 then
		return self.ReadingBuffer:BytesReadyToBeRead();
	end

	-- If there are currently no bytes ready, then we need
	-- to refill the buffer from the source
	local err
	local bytesread

	bytesread, err = self.Socket:receive(self.ReadingBuffer.Buffer, self.ReadingBuffer.Length)

	print("-- LOADED BYTES: ", bytesread, err);

	if not bytesread then
		return false, err;
	end

	if bytesread == 0 then
		return false, "eof"
	end

	self.ReadingBuffer:Reset()
	self.ReadingBuffer.BytesWritten = bytesread

	return bytesread
end

Basically, whenever some bytes are needed within the stream, the refillReadingBuffer() function is called. If there are still bytes sitting in the buffer, then those bytes are used. If it’s empty, then new bytes are read in. The buffer can be whatever size you want, so if your app is better off reading 1500 bytes at a time, then make it so. If you find performance is best at 4096 bytes per read, then set it up that way.

Now that pre-fetching is occuring, all operations related to reading bytes from the source MUST go through this buffer, or bytes will be missed. So, the readLine() function looks like this now:

function IOCPNetStream:readByte()
	-- First see if we can get a byte out of the
	-- Reading buffer
	local abyte,err = self.ReadingBuffer:ReadByte()

	if abyte then
		return abyte
	end

	-- If we did not get a byte out of the reading buffer
	-- try refilling the buffer, and try again
	local bytesread, err = self:refillReadingBuffer()

	if bytesread then
		abyte, err = self.ReadingBuffer:ReadByte()
		return abyte, err
	end

	-- If there was an error
	-- then return that error immediately
	print("-- IOCPNetStream:readByte, ERROR: ", err)
	return false, err
end

function IOCPNetStream:readOneLine(buffer, size)
	local nchars = 0;
	local ptr = ffi.cast("uint8_t *", buff);
	local bytesread
	local byteread
	local err

	while nchars < size do
		byteread, err = self:readByte();

		if not byteread then
			-- err is either "eof" or some other socket error
			break
		else
			if byteread == 0 then
				break
			elseif byteread == LF then
				--io.write("LF]\n")
				break
			elseif byteread == CR then
				-- swallow it and continue on
				--io.write("[CR")
			else
				-- Just a regular character, so save it
				buffer[nchars] = byteread
				nchars = nchars+1
			end
		end
	end

	if err and err ~= "eof" then
		return nil, err
	end

	return nchars
end

The readLine() is still reading a byte at a time, but this time, instead of going to the underlying socket directly, it’s calling the stream’s version of readByte(), which in turn is reading a single byte from the pre-fetched buffer. That’s a lot faster than making the recv() system call, and when the buffer runs out of bytes, it will be automatically refilled, until there is a socket error.

Well, that’s just great. The performance boost is one thing, but there is another benefit. Now that I am reading in controllable chunks at a time, I could up the size to say 16K for the first chunk read for an http request. That would likely get all the data for whatever headers there are included in the buffer up front. From there I could do simple pointer offset/size objects, instead of creating actual strings for every line (creates copies). That would be a great optimization.

I can wait for further optimizations. Getting the pre-fetch working, and continuing to work with IOCP is a more important optimization at this time.

And so it goes. Don’t optimize too early until you really understand the performance characteristics of your system. Then, only belatedly, do what minimal amount you can to achieve the specific goal you are trying to achieve before moving on.


Exposing Yourself on the Interwebs – Baby Steps

I have begun a little experiement. Over the past year, I have written quite a bit of code related to networking. I have prototyped a lot of different things, and actually used some of it in a production environment. I have written http parser, websocket implementation, xml parser, and myriad wrappers for standard libraries.

So, now the experiment. I want to expose a few web services, running from my home, running on nothing but code that I have written (except for core OS code). How hard could it be?

Yesterday, I packaged up a bit of TINN and put it on my desktop machine to run a very simple http static content server. It’s a Windows box, and of course I could simply run IIS, but that’s a bit of a cheat. So, I started the service:

tinn main.lua

And that’s that. According to my intentions, the only content that should be served up is stuff that’s sitting within the ‘./wwwroot’ directory relative to where I started the service running. This is essentially the server that I outlined previously.

I am an average internet consumer when it comes to home network setup. I have an ASUS router that’s pretty fast and decent with its various security holes and strengths. At home I am sitting behind it’s “firewall” protection. But, I do want to expose myself, so what do I do?

Well, I must change the configuration on the router. First of all, I need to get a DNS entry that will point a certain URL to my router. Luckily, the ASUS router has a dynamic DNS service built right in. So, I choose a name (I’ll show that later), and simply select a button, and “Apply”. OK. Now my router is accessible on a well known url/ip: chosenname.asuscomm.com I confirm this by typing that into my we browser, and sure enough, I can connect to my browser over the internet. I am prompted for the admin password, and I’m in!

So, the first scary thought is, I hope I chose a password that is relatively strong. I hope I didn’t use the default ‘password’, like so many people do.

Alright. Now I know my router, and thus my network in general, can be accessed through a well known public url. The next thing I need to do is set a static IP address for my web server machine. This isn’t strictly necessary, but as I’m about to enable port forwarding, it will just be easier to use a static IP within my home domain. I set it up as: 192.168.1.4 The HP printer is 1, the Synology box is 2, and everything else gets random numbers.

Next is port forwarding. What I want is to have the web server machine, which is listening on port 8080, receive any traffice coming from the well known url headed to port 8080. I want the following URL to land on this machine and be handled by the web server code that’s running:

http://chosenname.asuscomm.com:8080/index.htm

So, I set that configuration in the router, and press ‘Apply’…

Back to my browser, type in that URL and voila! It works!

Now I take a pause at this point and ask myself a few questions. First of all, am I really confident enough in my programming skills to expose myself to the wide open internet like this? Second, I ask myself if my brother, or mother could have worked their way through a similar exercise?

Having gotten this far, I’m feeling fairly confident, so I let it run overnight to see what happens. Mind you, I’m not accessing it myself at night, but I wanted to see what would happen just having my router and server hanging out there on the internet.

I cam back in the morning, and checked the console output to see what happened, if anything. What I saw was this:

NO RESPONSE BODY: ./wwwroot/HNAP1

Hah! It happened twice, then never more. Well, that HNAP1 trick is a particular vulnerability to home routers which are configured by default to do automatic configuration stuff. D-Link routers, in particular, are vulnerable to an attack whereby they can be compromised through a well scripted Soap exchange, starting from here.

I’ve turned off that particular feature of my router, so, I think I luckily dodged that particular bullet.

The funny thing is though, I didn’t advertise my url, and I didn’t tell anyone that there would be an http server hanging out on port 8080. This happened within 8 hours of my service going live. So, it tells you what a teaming pool hackedness the internet truly is.

The other thing I have learned thus far is that I need a nice logging module. I just so happen to be printing out the URL of each request that comes in, but I should like to have the IP address of the requester, and some more interesting information that you typically find in web logs. So, I’ll have to add that module.

Having started down this path, I have another desire as well. My desktop machine is way too loud, and consumes too much power to be an always on web server. So, I’ve ordered the parts to build a nice Shuttle PC which will serve this purpose. It’s a decent enough machine. 256Gb SSD, i7, onboard video. I don’t need it to be a gaming rig, nor an HTPC, nor serve any other purpose. It just needs to run whatever web services I come up with, and it must run Windows. This goes towards the purpose built argument I made about the Surface 2. A machine specific to a specific job, without concern for any other purpose it might have. You could argue that I should just purchase a router that has a built in web server, or just use the Synology box, which will do this just fine. But, my criteria is that I want to write code, tinker about, and it must run Windows.

And so it begins. I’ve got the basic server up and running, and I’m already popular enough to be attacked. Now I am confident to add some features and content over time to make it more interesting.


Hurry Up and Wait – TINN Timing

Moving right along. First I needed to do basic networking. Starting at the lowest level of socket interaction, advancing up the stack through specific TCP and HTTP uses. Then back down to UDP.

With basic async networking covered, the next thing that comes up is timing. The general socket IO is covered. You can basically build an entire service using nothing more than asynchronous socket calls. But, most servers are more interesting than that. There are situations where you’ll want to cancel out an async operation if it’s taking too long, or you might want to perform some operation over time, repeatedly. So, clearly the TINN system needs some concept of time management.

Here’s the kind of code I would like to write in order to do something every 500 milliseconds:

require ("IOProcessor");

local test_wait = function(interval)
  while true do
    wait(interval);
    print(string.format("interval: %d", IOProcessor.Clock:Milliseconds()));
  end
end

run(test_wait)

Basically, there is a new ‘wait()’ function. You give it a number of milliseconds, and it will suspend the coroutine you’re currently in for the given amount of time. This capability comes courtesy of some changes to the base scheduler. The changes are the following:

wait = function(millis)
  local nextTime = IOProcessor.Clock:Milliseconds() + millis;
  return IOProcessor:yieldUntilTime(nextTime);
end

IOProcessor.yieldUntilTime = function(self, atime)
  if self.CurrentFiber ~= nil then
    self.CurrentFiber.DueTime = atime;
    tabutils.binsert(self.FibersAwaitingTime, self.CurrentFiber, compareTaskDueTime);

    return self:yield();
  end
  return false;
end

The yieldUntilTime() function will take the currently running fiber (coroutine) and put it into the list of FibersAwaitingTime. This is simply a table which is maintained in sorted order from lowest to highest. Once a fiber is placed on this list, it is no longer in the list of currently active fibers. it will sit on this list until it’s DueTime has passed.

The main scheduling loop will step through the fibers that are sitting in the AwaitingTime list using the following:

IOProcessor.stepTimeEvents = function(self)
  local currentTime = self.Clock:Milliseconds();

  -- traverse through the fibers that are waiting
  -- on time
  local nAwaiting = #self.FibersAwaitingTime;
  for i=1,nAwaiting do

    local fiber = self.FibersAwaitingTime[1];
    if fiber.DueTime <= currentTime then
      -- put it back into circulation
      fiber.DueTime = 0;
      self:scheduleFiber(fiber);

      -- Remove the fiber from the list of fibers that are
      -- waiting on time
      table.remove(self.FibersAwaitingTime, 1);
    end
  end
end

Basically, step through the list of fibers that are waiting for their wait time to expire. For all those that qualify, put them back into the list of active fibers by calling the ‘shcheduleFiber()’ function.

This begins to get very interesting I think. Of course once you create a timer, or even async i/o, you probably also want the ability to cancel such operations. But, that’s another matter.

Doing this little exercise, the scheduler begins to take on some more of the attributes of what you might find in the core OS. The big difference is that everything is done in the Lua space, and does not rely on OS primitives, so it is actually somewhat portable to other architectures. That’s a nice idea to have such a nice scheduler available across multiple LuaJIT environments.

While going through this exercise, I also started looking at other features of schedulers, thinking about priorities, and other factors that might go into a really good scheduler. So, at least one thing becomes apparent to me. Having this scheduler readily available and transformable in the Lua space makes it relatively easy to try out different scheduling techniques that will match the particular situation at hand. There is even the possibility of changing the scheduling algorithm dynamically based on attributes of the running system.

Exciting times.


Creating Udp Echo Service in TINN

These days, networking applications utilize more than a single protocol at a time. My current server, which is a software router of sorts, needs to support TCP/IP as well as Udp channels at the same time. On top of the TCP is HTTP, but that’s already been covered.

Here I present the support for the Udp protocol. Udp differs from TCP in a few ways, key being the lack of “connection”. Every single packet is individually addressed and sent to the intended recipient. Of course you can cache the DNS lookup, so that the delivery of the packets themselves is blazing fast. There’s no redundancy, no ack, no error recovery.

When TCP/IP/UDP were first created, the error rate was probably much higher than it is today. These days, depending on the network, Udp might be a perfectly reasonable choice. The trick, from a TINN perspective, is to make programming with either protocol look relatively the same. For the most part, this just means using the same mechanism for async calls.

Here’s what the server code looks like, minus the error recovery logic:

local ffi = require("ffi");

local IOProcessor = require("IOProcessor");
local IOCPSocket = require("IOCPSocket");

-- Setup the server socket
local socket, err = IOCPSocket:create(AF_INET, SOCK_DGRAM, 0);
local success, err = socket:bindToPort(9090);

-- Setup buffers to be used to receive data
local bufflen = 1500;
local buff = ffi.new("uint8_t[?]", bufflen);
local from = sockaddr_in();
local fromLen = ffi.sizeof(from);


-- The primary application loop
local loop = function()

  while true do
    local bytesread, err = socket:receiveFrom(from, fromLen, buff, bufflen);

    if not bytesread then
      return false, err;
    end

    -- echo back to sender
    local bytessent, err = socket:sendTo(from, fromLen, buff, bufflen);
  end
end

run(loop);

And that’s about all there is to it. In this particular case, a single packet is received, and that packet is immediately sent back to whomever sent it. In this The receiveFrom(), and sendTo(), do in fact use IOCompletion ports, and, for a more complex server that actually does work, you might formulate this differently, utilizing multiple receive buffers, and spawning a task for each packet. But, this is the most basic form of doing Udp handling with TINN.

The socket:receiveFrom() implementation is pretty much the same as that for socket:receive(), except for the addition of the address information so you can see who sent the message, and so you can subsequently return the packet to the source.

This code is not particularly hard, and if you were programming in ‘C’, it would look pretty much the same. The key benefit though comes from the automatic semi-concurrency which is possible, without really changing the code that much. This is what makes it easier to integrate and handle.


How About that Web Server

Although echo servers are the “Hello, World” of network programs, http servers are much more useful.

local HttpServer = require("HttpServer");
local StaticService = require("StaticService");

-- a simple ping template
local pingtemplate = [[
<html>
  <head>
    <title>HTTP Server</title>
  </head>
  <body>ping</body>
</html>
]]

-- Called every time there is a new http request
local OnRequest = function(param, request, response)
  if request.Url.path == "/ping" then
    response:writeHead("200")
    response:writeEnd(pingtemplate);
  else
    local filename = './wwwroot'..request.Url.path;
    StaticService.SendFile(filename, response);
  end
end

local server = HttpServer(8080, OnRequest);
server:run();

This looks a little bit like the echo service which was based on the raw socket server. Instead of “OnAccept”, this one implements “OnRequest”. The ‘request’ is an instance of a ‘WebRequest’ object, which contains all the stuff you’d expect in a WebRequest (resource, headers…). The routine is also handed a ‘WebResponse’ object as well. This is a simple convenience because it just wraps the netstream that is associated with the request object.

The HttpServer code itself looks like this:

local SocketServer = require("SocketServer")

local IOCPSocket = require("IOCPSocket")
local IOCPNetStream = require("IOCPNetStream");
local WebRequest = require("WebRequest");
local WebResponse = require("WebResponse");
local URL = require("url");

HttpServer = {}
setmetatable(HttpServer, {
  __call = function(self, ...)
    return self:create(...);
  end,
});

HttpServer_mt = {
  __index = HttpServer;
}

HttpServer.init = function(self, port, onRequest, onRequestParam)
  local obj = {
    OnRequest = onRequest;
    OnRequestParam = onRequestParam;
  };
  setmetatable(obj, HttpServer_mt);
	
  obj.SocketServer = SocketServer(port, HttpServer.OnAccept, obj);

  return obj;
end

HttpServer.create = function(self, port, onRequest, onRequestParam)
  return self:init(port, onRequest, onRequestParam);
end


HttpServer.OnAccept = function(self, sock)
  local socket = IOCPSocket:init(sock, IOProcessor);
  local stream, err = IOCPNetStream:init(socket);

  if self.OnRequest then
    local request, err  = WebRequest:Parse(stream);

    if request then
      request.Url = URL.parse(request.Resource);
      local response = WebResponse:OpenResponse(request.DataStream)
      self.OnRequest(self.OnRequestParam, request, response);
    else
      print("HandleSingleRequest, Dump stream: ", err)
    end
  else
    -- do nothing and let the socket close
  end
end

HttpServer.run = function(self)
  return self.SocketServer:run();
end

return HttpServer;

The ‘OnAccept()’ function this time around takes the unadorned socket, wraps it into a nicer socket object (so the io completion stuff can happen), and then uses the HttpRequest object to parse what’s on the stream. If the request is found the be intact, a response object is created and the two are handed off to the ‘OnRequest’ function if it exists.

This construct allows you to compose a webserver to meet your needs. You can spawn wherever you want, to run whichever part you want to run in parallel. At the top end, the consumer of this object won’t know the different, and can thus just handle the individual requests.

So, what’s so good about all this?
Well, first of all the TINN runtime, all up, is about 3Mb.
What you get for that is access to pretty much all the interesting stuff that Windows APIs have to offer. Whether it be network, OS, graphics, crypto, or multi-thread related, it’s all available right there in the little package.

This is good when you want to start creating simple REST based web services for this and that. For example, if you want to expose a webcam feed from your PC, or your PC acts as a hub for various wireless “internet of things” devices around your home, or whatever, you just write some lua code, without worrying about interop libraries, compiling, or anything else more interesting. Just a little script and away you go.


Name That Framework – Echo Service in two lines of code

… and here they are:

SocketServer = require("SocketServer");
SocketServer(9090):run(function(s, b, l)  s:send(b, l); end);

Back in the day there was this gameshow called “Name That Tune”, where contestants would be told a clue about a song, then they would bid on the fewest number of notes it would take for them to name the tune. Once the bids were fixed, the orchestra would play the number of notes, and the contestant would have to correctly guess the name of the tune.

So, above are two lines of code which implement a highly scalable “echo” service. Can you name the framework?

It’s TINN of course!

Here’s a more reasonable rendition of the same:

local SocketServer = require("SocketServer");

local function OnData(socket, buff, bufflen)
  socket:send(buff, bufflen);
end;

local server = SocketServer(9090);
server:run(OnData)

Simply put, a SocketServer is a generic service that will listen on a particular port that you specify. Whenever it receives any data on the port, it will call the supplied ‘OnData’ function. Each time ‘OnData’ is called, it could be with a different socket and data. You could build a fairly rudimentary http server on top of this if you like. what’s most important to me is the fact that you don’t have to write any of the underlying low level networking code. Nothing about accept, IO Completion ports, etc. Just, call me when some data comes in on this specified port.

The SocketServer code itself looks like this:

local ffi = require("ffi");
local IOProcessor = require("IOProcessor");
local IOCPSocket = require("IOCPSocket");

IOProcessor:setMessageQuanta(nil);

SocketServer = {}
setmetatable(SocketServer, {
  __call = function(self, ...)
    return self:create(...);
  end,
});

SocketServer_mt = {
  __index = SocketServer;
}

SocketServer.init = function(self, socket, datafunc)
--print("SocketServer.init: ", socket, datafunc)
  local obj = {
    ServerSocket = socket;
    OnData = datafunc;
  };

  setmetatable(obj, SocketServer_mt);

  return obj;
end

SocketServer.create = function(self, port, datafunc)
  port = port or 9090;

  local socket, err = IOProcessor:createServerSocket({port = port, backlog = 15});

  if not socket then
    print("Server Socket not created!!")
    return nil, err
  end

  return self:init(socket, datafunc);
end

-- The primary application loop
SocketServer.loop = function(self)
  local bufflen = 1500;
  local buff = ffi.new("uint8_t[?]", bufflen);

  while true do
    local sock, err = self.ServerSocket:accept();

    if sock then
      local socket = IOCPSocket:init(sock, IOProcessor);
      local bytesread, err = socket:receive(buff, bufflen);

      if not bytesread then
        print("RECEIVE ERROR: ", err);
      elseif self.OnData ~= nil then
        self.OnData(socket, buff, bytesread);
      else
        socket:closeDown();
        socket = nil
      end
    else
       print("Accept ERROR: ", err);
    end

    collectgarbage();
  end
end

SocketServer.run = function(self, datafunc)
  if datafunc then
    self.OnData = datafunc;
  end

  IOProcessor:spawn(self.loop, self));
  IOProcessor:run();
end

return SocketServer;

This basic server loop is good for a lot of little tiny tasks where you just need to put a listener on the front of something. No massive scaleout, not multi-threading, just good straight forward stuff. But, it’s already plumbed to go big too.

Here’s a slight modification:

SocketServer.handleAccepted = function(self, sock)
  local handleNewSocket = function()
    local bufflen = 1500;
    local buff = ffi.new("uint8_t[?]", bufflen);
    
    local socket = IOCPSocket:init(sock, IOProcessor);

    if self.OnAccepted then
    else
      local bytesread, err = socket:receive(buff, bufflen);
  
      if not bytesread then
        print("RECEIVE ERROR: ", err);
      elseif self.OnData ~= nil then
        self.OnData(socket, buff, bytesread);
      else
        socket:closeDown();
        socket = nil
      end
    end
  end

  return IOProcessor:spawn(handleNewSocket);
end

-- The primary application loop
SocketServer.loop = function(self)

  while true do
    local sock, err = self.ServerSocket:accept();

    if sock then
      self:handleAccepted(sock);
    else
       print("Accept ERROR: ", err);
    end

    collectgarbage();
  end
end

In the main loop, instead of doing the processing directly, call the ‘self:handleAccepted()’ function. That function in turn will spawn an internal function to actually handle the request. Everything else remains the same.

If you do it this way, then the ‘OnData’ will run cooperatively with other accepts that might be going on. Also, this highlights, in an invisible way, that the ‘accept()’ call is actually cooperative. Meaning, since IO completion ports are being used int he background, the accept call is actually async. As soon as it issues the accept, that coroutine will wait in place until another socket comes in. Meanwhile, the last socket that was being handled will get some time slice to do what it wants.

And thus, you get massive scale (thousands of potential connections) from using this fairly simple code.

Well, those are the basics. Now that I have plumbed TINN from the ground up to utilize the IO Completion Ports, I can start to build upon that. There are a couple of nice benefits to marrying IOCP and Lua coroutines. I’ll be exploring this some more, but it’s basically a match made in heaven.


Screen Capture for Fun and Profit

In Screen Sharing from a Browser I wrote about how relatively easy it is to display a continuous snapshot of a remote screen, and even send mouse and keyboard events back to it.  That was the essence of modern day browser based screen sharing.  Everything else is about compression for bandwidth management.

In this article, I’ll present the “server” side of the equation.  Since I’ve discovered the ‘sourcecode’ bracket in WordPress, I can even present the code with line numbers.  So, here in its entirety is the server side:

 


local ffi = require "ffi"

local WebApp = require("WebApp")

local HttpRequest = require "HttpRequest"
local HttpResponse = require "HTTPResponse"
local URL = require("url")
local StaticService = require("StaticService")

local GDI32 = require ("GDI32")
local User32 = require ("User32")
local BinaryStream = require("core.BinaryStream")
local MemoryStream = require("core.MemoryStream")
local WebSocketStream = require("WebSocketStream")
local Network = require("Network")

local utils = require("utils")
local zlib = require ("zlib")

local UIOSimulator = require("UIOSimulator")

--[[
	Application Variables
--]]
local ScreenWidth = User32.GetSystemMetrics(User32.FFI.CXSCREEN);
local ScreenHeight = User32.GetSystemMetrics(User32.FFI.CYSCREEN);

local captureWidth = ScreenWidth;
local captureHeight = ScreenHeight;

local ImageWidth = captureWidth;
local ImageHeight = captureHeight;
local ImageBitCount = 16;

local hbmScreen = GDIDIBSection(ImageWidth, ImageHeight, ImageBitCount);
local hdcScreen = GDI32.CreateDCForDefaultDisplay();

local net = Network();

--[[
	Application Functions
--]]
function captureScreen(nWidthSrc, nHeightSrc, nXOriginSrc, nYOriginSrc)
  nXOriginSrc = nXOriginSrc or 0;
  nYOriginSrc = nYOriginSrc or 0;

  -- Copy some of the screen into a
  -- bitmap that is selected into a compatible DC.
  local ROP = GDI32.FFI.SRCCOPY;

  local nXOriginDest = 0;
  local nYOriginDest = 0;
  local nWidthDest = ImageWidth;
  local nHeightDest = ImageHeight;
  local nWidthSrc = nWidthSrc;
  local nHeightSrc = nHeightSrc;

  GDI32.Lib.StretchBlt(hbmScreen.hDC.Handle,
    nXOriginDest,nYOriginDest,nWidthDest,nHeightDest,
    hdcScreen.Handle,
    nXOriginSrc,nYOriginSrc,nWidthSrc,nHeightSrc,
    ROP);

  hbmScreen.hDC:Flush();
end

-- Serve the screen up as a bitmap image (.bmp)
local getContentSize = function(width, height, bitcount, alignment)
  alignment = alignment or 4

  local rowsize = GDI32.GetAlignedByteCount(width, bitcount, alignment);
  local pixelarraysize = rowsize * math.abs(height);
  local filesize = 54+pixelarraysize;
  local pixeloffset = 54;

  return filesize;
end

local filesize = getContentSize(ImageWidth, ImageHeight, ImageBitCount);
local memstream = MemoryStream.new(filesize);
local zstream = MemoryStream.new(filesize);

local writeImage = function(dibsec, memstream)
  --print("printImage")
  local width = dibsec.Info.bmiHeader.biWidth;
  local height = dibsec.Info.bmiHeader.biHeight;
  local bitcount = dibsec.Info.bmiHeader.biBitCount;
  local rowsize = GDI32.GetAlignedByteCount(width, bitcount, 4);
  local pixelarraysize = rowsize * math.abs(height);
  local filesize = 54+pixelarraysize;
  local pixeloffset = 54;

  -- allocate a MemoryStream to fit the file size
  local streamsize = GDI32.GetAlignedByteCount(filesize, 8, 4);

  memstream:Seek(0);

  local bs = BinaryStream.new(memstream);

  -- Write File Header
  bs:WriteByte(string.byte('B'))
  bs:WriteByte(string.byte('M'))
  bs:WriteInt32(filesize);
  bs:WriteInt16(0);
  bs:WriteInt16(0);
  bs:WriteInt32(pixeloffset);

  -- Bitmap information header
  bs:WriteInt32(40);
  bs:WriteInt32(dibsec.Info.bmiHeader.biWidth);
  bs:WriteInt32(dibsec.Info.bmiHeader.biHeight);
  bs:WriteInt16(dibsec.Info.bmiHeader.biPlanes);
  bs:WriteInt16(dibsec.Info.bmiHeader.biBitCount);
  bs:WriteInt32(dibsec.Info.bmiHeader.biCompression);
  bs:WriteInt32(dibsec.Info.bmiHeader.biSizeImage);
  bs:WriteInt32(dibsec.Info.bmiHeader.biXPelsPerMeter);
  bs:WriteInt32(dibsec.Info.bmiHeader.biYPelsPerMeter);
  bs:WriteInt32(dibsec.Info.bmiHeader.biClrUsed);
  bs:WriteInt32(dibsec.Info.bmiHeader.biClrImportant);

  -- Write the actual pixel data
  memstream:WriteBytes(dibsec.Pixels, pixelarraysize, 0);
end

local getSingleShot = function(response, compressed)
  captureScreen(captureWidth, captureHeight);

  writeImage(hbmScreen, memstream);

  zstream:Seek(0);
  local compressedLen = ffi.new("int[1]", zstream.Length);
  local err = zlib.compress(zstream.Buffer,   compressedLen, memstream.Buffer, memstream:GetPosition() );

  zstream.BytesWritten = compressedLen[0];

  local contentlength = zstream.BytesWritten;
  local headers = {
    ["Content-Length"] = tostring(contentlength);
    ["Content-Type"] = "image/bmp";
    ["Content-Encoding"] = "deflate";
  }

  response:writeHead("200", headers);
  response:WritePreamble();
  return response.DataStream:WriteBytes(zstream.Buffer, zstream.BytesWritten);
end

local handleUIOCommand = function(command)

  local values = utils.parseparams(command)

  if values["action"] == "mousemove" then
    UIOSimulator.MouseMove(tonumber(values["x"]), tonumber(values["y"]))
  elseif values["action"] == "mousedown" then
    UIOSimulator.MouseDown(tonumber(values["x"]), tonumber(values["y"]))
  elseif values["action"] == "mouseup" then
    UIOSimulator.MouseUp(tonumber(values["x"]), tonumber(values["y"]))
  elseif values["action"] == "keydown" then
    UIOSimulator.KeyDown(tonumber(values["which"]))
  elseif values["action"] == "keyup" then
    UIOSimulator.KeyUp(tonumber(values["which"]))
  end
end

local startupContent = nil

local handleStartupRequest = function(request, response)
  -- read the entire contents
  if not startupContent then
    -- load the file into memory
    local fs, err = io.open("viewscreen2.htm")

    if not fs then
      response:writeHead("500")
      response:writeEnd();

      return true
    end

    local content = fs:read("*all")
    fs:close();

    -- perform the substitution of values
    -- assume content looks like this:
    -- <!--?hostip? -->:<!--?serviceport?-->
    local subs = {
      ["frameinterval"]	= 300,
      ["hostip"] 			= net:GetLocalAddress(),
      ["capturewidth"]	= captureWidth,
      ["captureheight"]	= captureHeight,
      ["imagewidth"]		= ImageWidth,
      ["imageheight"]		= ImageHeight,
      ["screenwidth"]		= ScreenWidth,
      ["screenheight"]	= ScreenHeight,
      ["serviceport"] 	= Runtime.config.port,
    }
    startupContent = string.gsub(content, "%<%?(%a+)%?%>", subs)
  end

  -- send the content back to the requester
  response:writeHead("200",{["Content-Type"]="text/html"})
  response:writeEnd(startupContent);

  return true
end

--[[
  Responding to remote user input
]]--
local handleUIOSocketData = function(ws)
  while true do
    local bytes, bytesread = ws:ReadFrame()

    if not bytes then
      print("handleUIOSocketData() - END: ", err);
      break
    end

    local command = ffi.string(bytes, bytesread);
    handleUIOCommand(command);
  end
end

local handleUIOSocket = function(request, response)
  local ws = WebSocketStream();
  ws:RespondWithServerHandshake(request, response);

  Runtime.Scheduler:Spawn(handleUIOSocketData, ws);

  return false;
end

--[[
  Primary Service Response routine
]]--
local HandleSingleRequest = function(stream, pendingqueue)
  local request, err  = HttpRequest.Parse(stream);

  if not request then
    -- dump the stream
    --print("HandleSingleRequest, Dump stream: ", err)
    return
  end

  local urlparts = URL.parse(request.Resource)
  local response = HttpResponse.Open(stream)
  local success = nil;

  if urlparts.path == "/uiosocket" then
    success, err = handleUIOSocket(request, response)
  elseif urlparts.path == "/screen.bmp" then
    success, err = getSingleShot(response, true);
  elseif urlparts.path == "/screen" then
    success, err = handleStartupRequest(request, response)
  elseif urlparts.path == "/favicon.ico" then
    success, err = StaticService.SendFile("favicon.ico", response)
  elseif urlparts.path == "/jquery.js" then
    success, err = StaticService.SendFile("jquery.js", response)
  else
    response:writeHead("404");
    success, err = response:writeEnd();
  end

  if success then
    return pendingqueue:Enqueue(stream)
  end
end

--[[
  Start running the service
--]]
local serviceport = tonumber(arg[1]) or 8080

Runtime = WebApp({port = serviceport, backlog=100})

Runtime:Run(HandleSingleRequest);

As a ‘server’ this code is responsible for handling a couple of things. First, it needs to act as a basic http server, serving up relatively static content to get things started. When the user specifies the url http://localhost/screen, the server will respond by sending back the browser code that I showed in the previous article. The function “handleStartupRequest()” performs this operation. The file ‘viewscreen2.htm’ is HTML, but it’s a bit of a template as well. You can delimit a piece to be replaced by enclosing it in a tag such as: . This tag can be replaced by any bit of code that you choose. In this case, I’m doing replacements for the size of the image, the size of the screen, the refreshinterval, and the hostid and port. This last is most important because without it, you won’t be able to setup the websocket.

The other parts are fairly straight forward. Of particular note is the ‘captureScreen()’ function. In Windows, since the dawn of man, there has been GDI for graphics. Good ol’ GDI still has the ability to capture the screen, or a single window, or a portion of the screen. this still works in Windows 8 as well. So, capturing the screen is nothing more that drawing into a DIBSection, and that’s that. Just one line of code.

The magic happens after that. Rather than handing the raw image back to the client, I want to send it out as a compressed BMP image. I could choose PNG, or JPG, or any other format browsers are capable of handling, but BMP is the absolute easiest to deal with, even if it is the most bulky. I figure that since I’m using zlib to deflate it before sending it out, that will be somewhat helpful, and it turns out this works just fine.

The rest of the machinery there is just to deal with being an http server. A lot is hidden behind the ‘WebApp’ and the ‘WebSocket’ classes. Those are good for another discussion.

So, all in, this is about 300 lines of code. Not too bad for a rudimentary screen sharing service. Of course, there’s a supporting cast that runs into the thousands of lines of code, but I’m assuming this as a given since frameworks such as Node and various others exist.

I could explain each and every line of code here, but I think it’s small enough and easy enough to read that won’t be necessary. I will point out that there’s not much difference between sending single snapshots one at a time vs having an open stream and presenting the screen as h.264 or WebM. For that scenario, you just need a library that can capture snapshots of the screen and turn them into the properly encoded video stream. Since you have the WebSocket, it could easily be put to use for that purpose, rather than just receiving the mouse and keyboard events.

Food for thought.