Spelunking Windows – Tokens for fun and profit
Posted: May 22, 2013 | Filed under: LuaJIT, Microsoft, System Programming, TINN | Tags: lua, luajit, microsoft, shutdown, token
I want to shut down or restart my machine programmatically. There’s an API for that:
-- advapi32.dll
BOOL
InitiateSystemShutdownExW(
LPWSTR lpMachineName,
LPWSTR lpMessage,
DWORD dwTimeout,
BOOL bForceAppsClosed,
BOOL bRebootAfterShutdown,
DWORD dwReason);
Wow, it’s that easy?!!
OK. So, I need the name of the machine, some message to display in a dialog box, a timeout, whether to force apps to close, whether to reboot, and some reason why the shutdown is occurring. That sounds easy enough. So, I’ll just give it a call…
local status = core_shutdown.InitiateSystemShutdownExW(
    nil,    -- nil, so local machine
    nil,    -- no special message
    10,     -- wait 10 seconds
    false,  -- don't force apps to close
    true,   -- reboot after shutdown
    ffi.C.SHTDN_REASON_MAJOR_APPLICATION);
And what do I get for my troubles?
> error: (5) ERROR_ACCESS_DENIED
Darn, now I’m going to have to read the documentation.
In the Remarks of the documentation, it plainly states:
To shut down the local computer, the calling thread must have the SE_SHUTDOWN_NAME privilege.
Yah, ok, right then. What’s a privilege? And thus Alice went down into the rabbit’s hole…
As it turns out, there are quite a few concepts in Windows that are related to identity, security, authorization, and the like. As soon as you log into your machine, even if done programmatically, you get this thing called a ‘Token’ attached to your process. The easiest way to think of the token is it’s your electronic proxy and passport. Just like your passport, this token contains some basic identity information about who you are (name, identifying marks…). Some things in the system, such as being able to access a file, can be handled simply by knowing your name. These are simple access rights. But, other things in the system require a ‘visa’, meaning, not only does the operation have to know who you are, but it also needs to know you have the proper permissions to perform the operation you’re about to perform. It’s just like getting a visa stamped into your passport. If I want to travel to India, my passport alone is not enough. I need to get a visa as well. The same is true of this token thing. It’s not enough that I simply have an identity, I must also have a “privilege” in order to perform certain operations.
In addition to having a privilege, I must actually ‘activate’ it. So, yes, the system may have granted me the privilege, but it’s like super powers, you don’t want them to always be active. It’s like when you’re walking down the street in that foreign country you’re visiting. You don’t walk down the street flashing your fancy passport showing everyone the neat visas you have stamped in there. If you do, you’ll likely get a crowd following you trying to relieve you of said passport. So, you generally keep it to yourself, and only flash it when the need arises. So too with token privilege. Yes, you might have the ability to reboot the machine, but you don’t always want to have that privilege enabled, in case some nefarious software so happens to come along to exploit that fact.
Alright, that’s enough analogizing. How about some code? Well, it can be daunting to get your head around the various APIs associated with tokens. To begin with, there is a token associated with the process you’re currently running in, and an individual thread can carry its own token as well (typically when it is impersonating another user). Generally, you want the process token if you’re single threaded. That’s one API call:
BOOL
OpenProcessToken (
HANDLE ProcessHandle,
DWORD DesiredAccess,
PHANDLE TokenHandle
);
This is one of those standard API calls where you pass in a couple of parameters (ProcessHandle, DesiredAccess), and a ‘handle’ is returned (TokenHandle). You then use the ‘handle’ to make subsequent calls to the various API functions. This is ripe for wrapping up in some nice data structure to deal with it.
I’ve created the ‘Token’ object, as the convenience point. One of the functions in there is this one:
getProcessToken = function(self, DesiredAccess)
DesiredAccess = DesiredAccess or ffi.C.TOKEN_QUERY;
local ProcessHandle = core_process.GetCurrentProcess();
local pTokenHandle = ffi.new("HANDLE [1]")
local status = core_process.OpenProcessToken (ProcessHandle, DesiredAccess, pTokenHandle);
if status == 0 then
return false, errorhandling.GetLastError();
end
return Token(pTokenHandle[0]);
end
One of the important things to take note of when you open a token is the DesiredAccess. What you can do with the token handle afterwards is determined by the access rights you request when you open it. Here are the various options available:
static const int TOKEN_ASSIGN_PRIMARY = (0x0001);
static const int TOKEN_DUPLICATE = (0x0002);
static const int TOKEN_IMPERSONATE = (0x0004);
static const int TOKEN_QUERY = (0x0008);
static const int TOKEN_QUERY_SOURCE = (0x0010);
static const int TOKEN_ADJUST_PRIVILEGES = (0x0020);
static const int TOKEN_ADJUST_GROUPS = (0x0040);
static const int TOKEN_ADJUST_DEFAULT = (0x0080);
static const int TOKEN_ADJUST_SESSIONID = (0x0100);
For the case where we want to turn on a privilege that’s attached to the token, we will want to make sure the ‘TOKEN_ADJUST_PRIVILEGES’ access right is requested. It also does not hurt to add the ‘TOKEN_QUERY’ access as well. It’s best to request only the rights that are necessary to get the job done.
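As a small illustration of how these access rights combine, here is a sketch in plain Lua. The helper names `combine` and `hasRight` are made up for this example; in LuaJIT you would more likely reach for `bit.bor` and `bit.band`. Because each right occupies its own bit, simple arithmetic is enough to show the idea.

```lua
local TOKEN_DUPLICATE         = 0x0002
local TOKEN_QUERY             = 0x0008
local TOKEN_ADJUST_PRIVILEGES = 0x0020

-- Test a single-bit flag inside a mask (flag must be a power of two).
local function hasRight(mask, flag)
    return math.floor(mask / flag) % 2 == 1
end

-- Each right is a distinct bit, so adding a flag that is not yet
-- present is the same as OR-ing it in.
local function combine(...)
    local mask = 0
    for _, flag in ipairs({...}) do
        if not hasRight(mask, flag) then
            mask = mask + flag
        end
    end
    return mask
end

local desired = combine(TOKEN_ADJUST_PRIVILEGES, TOKEN_QUERY)
print(string.format("0x%04X", desired))       -- 0x0028
print(hasRight(desired, TOKEN_QUERY))         -- true
print(hasRight(desired, TOKEN_DUPLICATE))     -- false
```

The resulting value is what would be handed in as DesiredAccess when opening the token.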
Setting a privilege on a token is another bit of work. It’s not hard, but it’s just one of those things where you have to read the docs, and look at a few samples on the internet in order to get it right. Assuming your token has the TOKEN_ADJUST_PRIVILEGES access right on it, you can do the following:
Token.enablePrivilege = function(self, privilege)
local lpLuid, err = self:getLocalPrivilege(privilege);
if not lpLuid then
return false, err;
end
local tkp = ffi.new("TOKEN_PRIVILEGES");
tkp.PrivilegeCount = 1;
tkp.Privileges[0].Luid = lpLuid;
tkp.Privileges[0].Attributes = ffi.C.SE_PRIVILEGE_ENABLED;
local status = security_base.AdjustTokenPrivileges(self.Handle.Handle, false, tkp, 0, nil, nil);
if status == 0 then
return false, errorhandling.GetLastError();
end
return true;
end
Well, that gets into some data structures, and introduces this thing called a LUID, and that AdjustTokenPrivileges function, and… I get tired just thinking about it. Luckily, once you have this function, it’s a fairly easy task to turn a privilege on and off.
OK. So, with this little bit of code in hand, I can now do the following:
local token = Token:getProcessToken(ffi.C.TOKEN_ADJUST_PRIVILEGES);
token:enablePrivilege(Token.Privileges.SE_SHUTDOWN_NAME);
This just gets a token that is associated with the current process and turns on the privilege that allows us to successfully call the shutdown function.
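To see why the privilege has to be switched on before the call succeeds, here is a toy model in plain Lua. It is emphatically not the Windows API: `newToken`, `enablePrivilege`, and `requestShutdown` are invented for this sketch, and the error strings are only labels. It just captures the distinction between holding a privilege and having it enabled.

```lua
-- Toy model only: a token holds privileges, each of which starts disabled.
local function newToken(privileges)
    local token = { privileges = {} }
    for _, name in ipairs(privileges) do
        token.privileges[name] = { enabled = false }
    end
    return token
end

-- Enabling only works for a privilege the token actually holds.
local function enablePrivilege(token, name)
    local priv = token.privileges[name]
    if not priv then
        return false, "privilege not held"
    end
    priv.enabled = true
    return true
end

-- The "operation" checks that the privilege is both held and enabled.
local function requestShutdown(token)
    local priv = token.privileges["SeShutdownPrivilege"]
    if not (priv and priv.enabled) then
        return false, "access denied"
    end
    return true
end

local token = newToken({ "SeShutdownPrivilege" })
print(requestShutdown(token))                  -- false, "access denied"
enablePrivilege(token, "SeShutdownPrivilege")
print(requestShutdown(token))                  -- true
```

The first call fails even though the privilege is present, which mirrors the ERROR_ACCESS_DENIED seen earlier.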
In totality:
-- test_shutdown.lua
local ffi = require("ffi");
local core_shutdown = require("core_shutdown_l1_1_0");
local errorhandling = require("core_errorhandling_l1_1_1");
local Token = require("Token");
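local function test_Shutdown()
    -- the token must be opened with TOKEN_ADJUST_PRIVILEGES,
    -- otherwise enabling the privilege will fail
    local token, err = Token:getProcessToken(ffi.C.TOKEN_ADJUST_PRIVILEGES);
    if not token then
        return false, err;
    end
    local enabled, enableErr = token:enablePrivilege(Token.Privileges.SE_SHUTDOWN_NAME);
    if not enabled then
        return false, enableErr;
    end
    local status = core_shutdown.InitiateSystemShutdownExW(nil, nil,
        10, false, true, ffi.C.SHTDN_REASON_MAJOR_APPLICATION);
    if status == 0 then
        return false, errorhandling.GetLastError();
    end
    return true;
end
print(test_Shutdown());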
And finally we emerge back into the light! This will now actually work. It’s funny, when I got this to work correctly, I pointed out to my wife that my machine was rebooting without me touching it. She tried to muster a smile of support, but really, she wasn’t that impressed. But, knowing the amount of work that goes into such a simple task, I gave myself a pat on the back, and smiled inwardly at the greatness of my programming fu.
Tokens are a very powerful thing in Windows. Being able to master both the concepts, and the API calls themselves, gives you a lot of control over what happens with your machine.
Spelunking Windows – Exploring the file system
Posted: May 20, 2013 | Filed under: LuaJIT, Microsoft, System Programming, TINN | Tags: file system, lua, luajit, windows
I’m working on programs that generally make your stuff available to you “from any device anywhere”. While some of the work is of the generic internet high performance server variety, some of it is much more esoteric. A lot of your “stuff” is in files located on your machine. Windows has a wild array of filesystem related APIs, and it can be a daunting task to try and wrap your head around them to achieve some given task.
I set out a task for myself to turn the file system into a relatively easy to query “database” to get lists of files that meet certain criteria based on their various attributes. Some queries that I’m interested in:
List all directories on my machine
List all files that contain “.lua”
List all files which are hidden
List all system files
List all compressed files
Of course, using the standard search capabilities that are built into Windows, you can perform some of these tasks. This is different from the generally useful ‘find’ command of the command shell as well, as that command is interested in searching the contents of the file.
Yes, there are numerous tools and ways to perform these tasks, so what I am exploring is how you would actually create such a tool from scratch if you were so inclined.
I will begin at the beginning. Exploring the file system doesn’t require that much. Just a few data structures, and 3 function calls. The key components are the following:
(Full Source Available here: https://github.com/Wiladams/TINN/tree/master/tests FileSystemItem.lua, test_filesystem.lua)
ffi.cdef[[
typedef struct _WIN32_FIND_DATAW {
DWORD dwFileAttributes;
FILETIME ftCreationTime;
FILETIME ftLastAccessTime;
FILETIME ftLastWriteTime;
DWORD nFileSizeHigh;
DWORD nFileSizeLow;
DWORD dwReserved0;
DWORD dwReserved1;
WCHAR cFileName[ MAX_PATH ];
WCHAR cAlternateFileName[ 14 ];
} WIN32_FIND_DATAW, *PWIN32_FIND_DATAW, *LPWIN32_FIND_DATAW;
HANDLE
FindFirstFileExW(
LPCWSTR lpFileName,
FINDEX_INFO_LEVELS fInfoLevelId,
LPVOID lpFindFileData,
FINDEX_SEARCH_OPS fSearchOp,
LPVOID lpSearchFilter,
DWORD dwAdditionalFlags);
BOOL
FindNextFileW(HANDLE hFindFile,
LPWIN32_FIND_DATAW lpFindFileData);
BOOL
FindClose(HANDLE hFindFile);
]]
The three functions, FindFirstFileExW, FindNextFileW, and FindClose, combine to form an ‘iteration’ set. The iteration begins with FindFirstFileExW, which also returns the first result, and continues with FindNextFileW. After all is done, you finish up with FindClose, to recover the system resources that were allocated for this little search. Although there are flags that can be set to enhance the search capabilities, I don’t actually want to use them as I can create much more interesting search filters from the Lua side.
First things first though. The ‘handle’ that is created when you call ‘FindFirstFileExW’ must be cleaned up with a matching ‘FindClose’. If you don’t do this, you’ll end up with a leaked handle, which is essentially a resource leak. You don’t want that, so a ‘smart pointer’ is created to deal with the lifetime of that thing.
ffi.cdef[[
typedef struct {
HANDLE Handle;
} FsFindFileHandle;
]]
local FsFindFileHandle = ffi.typeof("FsFindFileHandle");
local FsFindFileHandle_mt = {
__gc = function(self)
core_file.FindClose(self.Handle);
end,
__index = {
isValid = function(self)
return self.Handle ~= INVALID_HANDLE_VALUE;
end,
},
};
ffi.metatype(FsFindFileHandle, FsFindFileHandle_mt);
This little wrapper ensures that whenever the handle is no longer being referenced, it will automatically get cleaned up because the __gc method will call ‘FindClose()’, which is exactly what we want. Here is how it can be used:
local rawHandle = core_file.FindFirstFileExW(lpFileName, fInfoLevelId, lpFindFileData, fSearchOp, lpSearchFilter, dwAdditionalFlags);
local handle = FsFindFileHandle(rawHandle);
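Garbage collection decides when `__gc` runs, so the handle may stay open for a while after you are done with it. When you want the cleanup to happen at a known point, a different and complementary technique is a scoped helper built on `pcall`. The sketch below is plain Lua; `using` is a hypothetical helper, and the fake handle and close function stand in for the real find handle and FindClose.

```lua
-- Hypothetical helper: run fn(resource), then always call close(resource),
-- even if fn raises an error. Only a single return value is passed through.
local function using(resource, close, fn)
    local ok, result = pcall(fn, resource)
    close(resource)
    if not ok then
        error(result, 0)    -- re-raise the original error after cleanup
    end
    return result
end

-- Stand-ins for a real find handle and FindClose.
local closed = 0
local fakeHandle = { name = "find-handle" }
local function fakeClose(h)
    closed = closed + 1
end

local n = using(fakeHandle, fakeClose, function(h)
    return #h.name
end)
print(n, closed)    -- 11 and 1: the work ran, and the handle was closed once
```

The `__gc` wrapper remains useful as a safety net for handles that escape such a scope.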
Alright, so there’s a nicely wrapped handle to the beginning of the iterator. The full iterator looks like this:
-- Iterate over the subitems this item might contain
FileSystemItem.items = function(self, pattern)
pattern = pattern or self:getFullPath().."\\*";
local lpFileName = core_string.toUnicode(pattern);
--local fInfoLevelId = ffi.C.FindExInfoStandard;
local fInfoLevelId = ffi.C.FindExInfoBasic;
local lpFindFileData = ffi.new("WIN32_FIND_DATAW");
local fSearchOp = ffi.C.FindExSearchNameMatch;
local lpSearchFilter = nil;
local dwAdditionalFlags = 0;
local rawHandle = core_file.FindFirstFileExW(lpFileName,
fInfoLevelId,
lpFindFileData,
fSearchOp,
lpSearchFilter,
dwAdditionalFlags);
local handle = FsFindFileHandle(rawHandle);
local firstone = true;
local closure = function()
if not handle:isValid() then
return nil;
end
if firstone then
firstone = false;
return FileSystemItem({
Parent = self;
Attributes = lpFindFileData.dwFileAttributes;
Name = core_string.toAnsi(lpFindFileData.cFileName);
Size = (lpFindFileData.nFileSizeHigh * (MAXDWORD+1)) + lpFindFileData.nFileSizeLow;
});
end
local status = core_file.FindNextFileW(handle.Handle, lpFindFileData);
if status == 0 then
return nil;
end
return FileSystemItem({
Parent = self;
Attributes = lpFindFileData.dwFileAttributes;
Name = core_string.toAnsi(lpFindFileData.cFileName);
Size = (lpFindFileData.nFileSizeHigh * (MAXDWORD+1)) + lpFindFileData.nFileSizeLow;
});
end
return closure;
end
Refer to the full source to see it in context.
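One detail in the iterator worth spelling out is the Size calculation: the find data reports the file size as two 32-bit halves, and `MAXDWORD+1` is simply 2^32. A minimal sketch in plain Lua, with `fileSize` as a made-up helper name:

```lua
-- Combine the two 32-bit halves of a file size into one Lua number.
-- With Lua numbers stored as doubles, the result is exact up to 2^53 bytes.
local TWO_32 = 4294967296    -- 2^32, i.e. MAXDWORD + 1

local function fileSize(high, low)
    return high * TWO_32 + low
end

print(fileSize(0, 1024))    -- 1024
print(fileSize(1, 0))       -- 4294967296
print(fileSize(2, 5))       -- 8589934597
```

For files under 4 GB the high half is zero and the size is just the low half.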
Here is how I would use it:
local depthQuery = {}
depthQuery.traverseItems = function(starting, indentation, filterfunc)
indentation = indentation or "";
starting = starting or FileSystemItem({Name="c:"});
for item in starting:items() do
if filterfunc then
if filterfunc(item) then
if item.Name ~= '.' and item.Name ~= ".." then
io.write(indentation, item.Name, '\n');
end
end
else
if item.Name ~= '.' and item.Name ~= ".." then
io.write(indentation, item.Name, '\n');
end
end
if item:isDirectory() and item.Name ~= "." and item.Name ~= ".." then
depthQuery.traverseItems(item, indentation.." ", filterfunc);
end
end
end
-- Iterate all files/directories on 'c:' drive
depthQuery.traverseItems(FileSystemItem({Name="c:"}));
Scanning the entire c: drive on my fairly new Lenovo X1 Carbon takes about 6 seconds, and there are a few hundred thousand files on it. Most of the time is spent on string creation and I/O. On smaller directory searches, the scan is essentially instantaneous.
With these basics in hand, I can now do some more interesting queries.
local function passLua(item)
return item.Name:find(".lua", 1, true);
end
depthQuery.traverseItems(FileSystemItem({Name="c:"}), "", passLua);
In this case, I want to find all the files on my system that have ‘.lua’ in their name. The ‘passLua()’ function is very simple, just doing a plain substring search. Of course, you have the full power of Lua and any libraries at your disposal. You could even open up the file if you like, and read the contents and decide whether you wanted to pass it along or not. Your filter just returns a true value to pass the item along, or ‘false’ (or nil) to block it.
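Since a filter is just a function from an item to a truthy or falsy value, filters compose nicely. The combinators below (`allOf`, `negate`, `nameContains`) are hypothetical helpers, not part of TINN, and plain tables with a Name field stand in for FileSystemItem objects.

```lua
-- Pass an item only if every given filter passes it.
local function allOf(...)
    local filters = {...}
    return function(item)
        for _, f in ipairs(filters) do
            if not f(item) then
                return false
            end
        end
        return true
    end
end

-- Invert a filter.
local function negate(f)
    return function(item)
        return not f(item)
    end
end

-- Plain (non-pattern) substring match on the item's name.
local function nameContains(text)
    return function(item)
        return item.Name:find(text, 1, true) ~= nil
    end
end

local luaButNotTests = allOf(nameContains(".lua"), negate(nameContains("test_")))

print(luaButNotTests({ Name = "Query.lua" }))         -- true
print(luaButNotTests({ Name = "test_query.lua" }))    -- false
print(luaButNotTests({ Name = "readme.txt" }))        -- false
```

A combined filter like this could be passed as the third argument to traverseItems in the same way as passLua.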
The FileSystemItem object has some of the file’s properties readily available, so they can be a part of the filtering as well. If I wanted a list of all the directories on my ‘c:’ drive, I would do the following:
local function passDirectory(item)
return item:isDirectory();
end
depthQuery.traverseItems(FileSystemItem({Name="c:"}), "", passDirectory);
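The hidden, system, and compressed queries from the list at the top follow the same shape, testing bits in the attributes value. This sketch assumes the item exposes the raw attribute bits as a number in an `Attributes` field, as the iterator above sets it; `hasAttribute` and the three filters are made-up names, and a plain table stands in for a FileSystemItem. The constant values are the standard Windows file attribute flags.

```lua
local FILE_ATTRIBUTE_HIDDEN     = 0x0002
local FILE_ATTRIBUTE_SYSTEM     = 0x0004
local FILE_ATTRIBUTE_ARCHIVE    = 0x0020
local FILE_ATTRIBUTE_COMPRESSED = 0x0800

-- Test one single-bit attribute flag without needing a bit library.
local function hasAttribute(item, flag)
    return math.floor(item.Attributes / flag) % 2 == 1
end

local function passHidden(item)
    return hasAttribute(item, FILE_ATTRIBUTE_HIDDEN)
end

local function passSystem(item)
    return hasAttribute(item, FILE_ATTRIBUTE_SYSTEM)
end

local function passCompressed(item)
    return hasAttribute(item, FILE_ATTRIBUTE_COMPRESSED)
end

-- A made-up sample: hidden + system + archive.
local sample = {
    Name = "pagefile.sys",
    Attributes = FILE_ATTRIBUTE_HIDDEN + FILE_ATTRIBUTE_SYSTEM + FILE_ATTRIBUTE_ARCHIVE,
}
print(passHidden(sample), passSystem(sample), passCompressed(sample))
-- hidden and system pass, compressed does not
```

In LuaJIT the same test would usually be written with `bit.band(item.Attributes, flag) ~= 0`.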
Of course, if you were using the .NET Framework, or Java, or Python, or any number of mature libraries in the world, you’d be thinking this was very simple work indeed, and what’s all the fuss? No fuss, really. The key here is showing how simple it really is to do these things, and create your own. I find it useful to do it in Lua because it’s easier than trying to do it with some more involved environments. Once the basic wrappers are in place, spelunking around becomes much easier.
Windows is a vast landscape of mature APIs which have evolved over time to meet the needs of diverse consumers over the years. The APIs are raw, and at times daunting. With a little bit of wrapping though, exploring them becomes much easier. In this particular case, by doing everything from the low level ffi to the higher level iterator, I’ve put in place some rope ladders, pitons, and other core exploration equipment. Now I can reap the benefits and do some more exploration with relative ease and safety.
When Is Software Engineering – Surely a database is required
Posted: May 14, 2013 | Filed under: Lua, LuaJIT, Microsoft, Network Programming, System Programming, TINN | Tags: lua, query, tinn
So, I’ve gotten data, and presented it on a web page in JSON format. If that’s not engineering, I’m not sure what is, but wait, surely a database of sorts must be involved.
There are plenty of times in my code where I need to quickly filter some ‘records’, performing some activity only on those records that meet particular criteria. Given that Lua is table based, everything of interest becomes a ‘record’. This applies to “classes” as well as the more garden variety of ‘records’ that might be streaming out of an actual database, or in my recent example, a simple iterator over the services on my machine. It would be nice if I had some fairly straightforward way to deal with those records. What I need is an iterator based query processor.
The requirements are fairly simple. There are three things that are typical of record processors:
record source – The source of data. In my case, the source will be any iterator that feeds out simple key/value table structures.
projection – In database terminology, ‘projection’ is simply the list of fields that you want to actually present in the query results. I might have a record that looks like this:
{name = "William", address="1313 Mockingbird Lane", occupation="eng"}
I might want to just retrieve the name though, so the projection would be simply:
{name = "William"}
filter – I want the ability to only retrieve the records that meet particular criteria.
I will ignore aggregate functions, such as groupby, sort, and the like as those do not work particularly well with a streaming interface. What follows is a simple implementation of a query processor that satisfies the needs I listed above:
-- Query.lua
--
--[[
the query function receives its parameters as a single table
params.source - The data source. It should be an iterator that returns
table values
params.filter - a function, called as filter(self, record), that receives a
table value as input and returns a table value as output. The 'self'
argument is a placeholder and can be ignored. If the record is 'passed' then
it is returned as the return value. If the record does not meet the filter
criteria, then 'nil' will be returned.
params.projection - a function to morph a single entry, called as
projection(self, record). It receives a table value as input, and returns
a single table value as output.
The 'filter' and 'projection' functions are very similar, and in fact, the
filter can also be used to transform the input. They are kept separate
so that each can remain fairly simple in terms of their implementations.
--]]
local query = function(params)
if not params or not params.source then
return false, "source not specified";
end
local nextRecord = params.source;
local filter = params.filter;
local projection = params.projection;
local function closure()
local record;
if filter then
while true do
record = nextRecord();
if not record then
return nil;
end
record = filter(self, record);
if record then
break;
end
end
else
record = nextRecord();
end
if not record then
return nil;
end
if projection then
return projection(self, record);
end
return record;
end
return closure;
end
-- A simple iterator over a table
-- returns the embedded table entries
-- individually.
local irecords = function(tbl)
local i=0;
local closure = function()
i = i + 1;
if i > #tbl then
return nil;
end
return tbl[i];
end
return closure
end
-- given a key/value record, and a filter table
-- pass the record if every field in the filtertable
-- matches a field in the record.
local recordfilter = function(record, filtertable)
for key,value in pairs(filtertable) do
if not record[key] then
print("record does not have field: ", key)
return nil;
end
if tostring(record[key]) ~= tostring(value) then
print(record[key], "~=", value);
return nil;
end
end
return record;
end
return {
irecords = irecords,
recordfilter = recordfilter,
query = query,
}
The ‘query()’ function represents the bulk of the operation. The other two functions help in forming iterators and doing simple queries.
Here is one example of how it can be used:
-- test_query.lua
--
local JSON = require("dkjson");
local Query = require("Query");
local irecords = Query.irecords
local records = {
{name = "William", address="1313 Mockingbird Lane", occupation = "eng"},
{name = "Daughter", address="university", occupation="student"},
{name = "Wife", address="home", occupation="changer"},
}
local test_query = function()
local source = irecords(records);
local res = {}
for record in Query.query {
source = source,
projection = function(self, record)
return {name=record.name, address=record.address, };
end,
filter = function(self, record)
if record.occupation == "eng" then
return record;
end
end
} do
table.insert(res, record);
end
local jsonstr = JSON.encode(res, {indent=true});
print(jsonstr);
end
test_query();
Which results in the following:
[{
"name":"William",
"address":"1313 Mockingbird Lane"
}]
This uses the iterator, a specified filter, and projection. The query() function itself is an iterator, so it will iterate over the data source, and apply the filter and projection to each record, returning results. Nice and easy, very Lua like.
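Writing a projection by hand for every query gets repetitive, so one possible convenience is a helper that builds the projection function from a list of field names. `fields` is a hypothetical name, not part of Query.lua; the `(self, record)` signature is kept only to match how query() invokes its callbacks.

```lua
-- Build a projection function that keeps only the named fields.
local function fields(...)
    local names = {...}
    return function(self, record)
        local out = {}
        for _, name in ipairs(names) do
            out[name] = record[name]
        end
        return out
    end
end

local project = fields("name", "address")
local row = project(nil, {
    name = "William",
    address = "1313 Mockingbird Lane",
    occupation = "eng",
})
print(row.name, row.address, row.occupation)    -- occupation has been dropped (nil)
```

With this in place, the projection in the test above could be written as `projection = fields("name", "address")`.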
Now that I have a very rudimentary query processor, I can apply it to my web case. Here is the web page that shows the services on my machine, rewritten so that it can deal with a little bit of query processing:
--[[
Description: A very simple demonstration of one way a static web server
can be built using TINN.
In this case, the WebApp object is being used. It is handed a routine to be
run for every http request that comes in (HandleSingleRequest()).
Either a file is fetched, or an error is returned.
Usage:
tinn staticserver.lua 8080
default port used is 8080
]]
local WebApp = require("WebApp")
local HttpRequest = require "HttpRequest"
local HttpResponse = require "HttpResponse"
local URL = require("url");
local StaticService = require("StaticService");
local SCManager = require("SCManager");
local JSON = require("dkjson");
local Query = require("Query");
local utils = require("utils");
local getRecords = function(query)
local mgr, err = SCManager();
local filter = nil;
local queryparts;
if query then
queryparts = utils.parseparams(query);
filter = function(self, record)
return Query.recordfilter(record, queryparts);
end
end
local res = {};
for record in Query.query {
source = mgr:services(),
filter = filter,
} do
table.insert(res, record);
end
return res;
end
local HandleSingleRequest = function(stream, pendingqueue)
local request, err = HttpRequest.Parse(stream);
if not request then
print("HandleSingleRequest, Dump stream: ", err)
return
end
local urlparts = URL.parse(request.Resource)
local response = HttpResponse.Open(stream)
if urlparts.path == "/system/services" then
local res = getRecords(urlparts.query);
local jsonstr = JSON.encode(res, {indent=true});
--print("echo")
response:writeHead("200")
response:writeEnd(jsonstr);
else
response:writeHead("404");
response:writeEnd();
end
-- recycle the stream in case a new request comes
-- in on it.
return pendingqueue:Enqueue(stream)
end
--[[ Configure and start the service ]]
local port = tonumber(arg[1]) or 8080
Runtime = WebApp({port = port, backlog=100})
Runtime:Run(HandleSingleRequest);
Here I have introduced the ‘getRecords()’ function, which takes care of getting the raw records from the list of services, and running the query to filter for the ones that I might want to see. In this case, a filter is created if the user specifies something interesting in the url. Without a filter, the url is simply:
http://localhost:8080/system/services
In which case you’ll get the list of all services on the machine, regardless of their current running state.
If you wanted to filter for only the services that were currently running, you would specify a URL such as this:
http://localhost:8080/system/services?State=RUNNING
And if you want to look for a particular service, by name, you would do:
http://localhost:8080/system/services?ServiceName=ACPI
[{
"ServiceType":"KERNEL_DRIVER",
"ProcessId":0,
"DisplayName":"Microsoft ACPI Driver",
"ServiceName":"ACPI",
"ServiceFlags":0,
"State":"RUNNING"
}]
Of course, you can also do simple combinations:
http://localhost:8080/system/services?State=RUNNING;ServiceType=KERNEL_DRIVER
This will return the list of all the kernel drivers that are currently running.
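The piece that turns the query string into a filter table is `utils.parseparams`, whose source is not shown here. As a rough idea of what such a function might do, here is a hypothetical stand-in, `parseParams`, that splits key=value pairs on ';' or '&'. It is an assumption about the behavior, and it makes no attempt at URL-decoding.

```lua
-- Hypothetical stand-in for utils.parseparams:
-- "a=1;b=2" or "a=1&b=2"  ->  { a = "1", b = "2" }
local function parseParams(query)
    local params = {}
    for pair in query:gmatch("[^;&]+") do
        local key, value = pair:match("^([^=]+)=(.*)$")
        if key then
            params[key] = value
        end
    end
    return params
end

local p = parseParams("State=RUNNING;ServiceType=KERNEL_DRIVER")
print(p.State, p.ServiceType)    -- RUNNING and KERNEL_DRIVER
```

The resulting table is exactly the shape that recordfilter() expects as its filtertable argument.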
Of course, if you’re sitting on your local machine, you could bring up the Task Manager, export the list of services, import it into a real database or Excel, and perform queries to your heart’s content…
This type of coding makes spelunking your system really easy. The fact that it’s available through a web interface opens up some possibilities in terms of display, interaction, and accessibility. Since the stream is just JSON, it could be fairly straightforward to present this information in a much more interesting form, perhaps by using D3 or WebGL, or who knows what.
So, is this software engineering?
Having gone from a low level system call to a higher level web based interface with interactive query capabilities, I’d say it must be approaching the term. Perhaps the ‘engineering’ lies in the simplicity. Rather than this being a fairly large integrated system, it’s just a few lines of script code that tie together well.
I believe the “engineering”, and thus an “engineer” comes from being able to recognize the minimal amount of code necessary to get a job done. The “engineering” lies in the process of finding those minimal lines of code.
When Is Software Engineering – Hitting the Interwebs
Posted: May 13, 2013 | Filed under: Lua, LuaJIT, Microsoft, Network Programming, System Programming | Tags: http, lua, luajit, services, system info
So far, I’ve tamed an OS API that gives me the list of services running on my machine. I’ve been able to slap an iterator on it so that I could more easily deal with the information within my scripting language. Surely this is engineering?
My daughter, who knows Python, is not impressed. Next step, surely if I can make this information readily available through a web interface, it will be “Engineering”…
local WebApp = require("WebApp")
local HttpRequest = require "HttpRequest"
local HttpResponse = require "HttpResponse"
local URL = require("url");
local StaticService = require("StaticService");
local SCManager = require("SCManager");
local JSON = require("dkjson");
local HandleSingleRequest = function(stream, pendingqueue)
local request, err = HttpRequest.Parse(stream);
if not request then
print("HandleSingleRequest, Dump stream: ", err)
return
end
local urlparts = URL.parse(request.Resource)
if urlparts.path == "/system/services" then
local mgr, err = SCManager();
local res = {}
for service in mgr:services() do
if service.Status.State == "RUNNING" then
table.insert(res, service);
end
end
local jsonstr = JSON.encode(res, {indent=true});
local response = HttpResponse.Open(stream)
response:writeHead("200")
response:writeEnd(jsonstr);
else
local filename = './wwwroot'..urlparts.path;
local response = HttpResponse.Open(stream);
StaticService.SendFile(filename, response)
end
-- recycle the stream in case a new request comes
-- in on it.
return pendingqueue:Enqueue(stream)
end
Runtime = WebApp({port = 8080, backlog=100})
Runtime:Run(HandleSingleRequest);
Surely this is engineering!! The business end of this code is right there with the familiar mgr:services() iterator. In this case, I take each of the records returned and stuff them into a table. Then, after all are returned, I turn that into a JSON string, and return it as the web result.
I only want to return the services that are in a “RUNNING” state, so I do that check before I actually stuff the record into the table. Well, there you have it. I’ve now gone from simply being able to make a simple system call, to being able to display the results of that call in a webpage accessible on my machine. Is this “Software Engineering” yet?
Does it become engineering because I wrote the code that consumes a lower level framework? Does it become engineering if I actually wrote the lower level framework? Is it only engineering if I wrote the OS that supports that lower level framework?
As Doctor Evil would say: “Somebody throw me a frickin bone…”
I’ll try to impress my daughter with this code, then I’ll try it out on the cocktail circuit. Perhaps someone will see this as software engineering…
But wait, there’s more! That ‘query’ where I filtered for only the “RUNNING” services looked a bit anemic, and what about changing the state of those services? Surely there’s room to do some engineering in there…
When Is Software Engineering – That’s just coding…
Posted: May 10, 2013 | Filed under: Core Programming, Microsoft, Musings, System Programming | Tags: luajit, windows
Over the past couple of weeks, I’ve been thinking about the following: “Am I a ‘Software Engineer’?”, “Am I just a hacker?”, “What is software ‘Engineering’ exactly?”
When I was in high school, about ready to go to college, I told my math professor that I was studying “Electrical Engineering and Computer Science”. With a puzzled look, he said “what is computer science?”. This was no dullard of a man. He was actually a wealthy man who took up teaching as his way of paying back society for the public school education he had received. He was a Berkeley grad, who had worked on the Manhattan project. He was a true “Engineer”. His question was essentially along the lines of; “I know what engineers do, and we use machines to help, and sometimes you have to program those machines, but is there really a whole “science” behind it?”
Granted, this was back in 1982, so times were a bit different with respect to computing, but this stuck in my mind.
Then, recently, I was discussing the same question with a colleague at work. The line of reasoning we were pursuing was, “How much of our time and engineering is truly ‘engineering’, and how much of it is simply ‘coding’?” Coding is that mindless thing we do when we’re banging at our keyboards, looking up documentation on obscure functions, just trying to use some library that someone has provided. We didn’t really consider this “engineering” because it’s basically just a translation job. I’ve spent a fair bit of time with the Windows headers of late, and I can tell you, 90% of that time I would consider to be “coding”, but not “engineering”.
Here is one example. I want to get the list of services that are running on my local machine and project that out to the internet as a JSON structure, so I can easily display it in a web page. There’s a core function in Windows to get things started:
BOOL EnumServicesStatusExA(
    SC_HANDLE hSCManager,
    SC_ENUM_TYPE InfoLevel,
    DWORD dwServiceType,
    DWORD dwServiceState,
    LPBYTE lpServices,
    DWORD cbBufSize,
    LPDWORD pcbBytesNeeded,
    LPDWORD lpServicesReturned,
    LPDWORD lpResumeHandle,
    LPCSTR pszGroupName
);
Well, that’s a mouthful. First of all you have to discover that this call actually exists, and figure out all the parameters. Luckily MSDN documentation exists, so at least you can read about it. You might even get lucky enough to find an example lying around on the internet to help show you how to use it.
Since I’m using a scripting language, my first task is to wrap this in an ffi.cdef[[]] so that I can call it easily. In order to do that, I have to put everything into ffi.cdef[[]], the structures, enums, function call prototypes, etc. After all is said and done, I can make a call that looks like this:
local InfoLevel = ffi.C.SC_ENUM_PROCESS_INFO;
local dwServiceType = dwServiceType or ffi.C.SERVICE_TYPE_ALL;
local dwServiceState = dwServiceState or ffi.C.SERVICE_STATE_ALL;
local lpServices = nil;
local cbBufSize = 0;
local pcbBytesNeeded = ffi.new("DWORD[1]");
local lpServicesReturned = ffi.new("DWORD[1]");
local lpResumeHandle = ffi.new("DWORD[1]");
local pszGroupName = nil;
local status = service_core.EnumServicesStatusExA(
self.Handle,
InfoLevel,
dwServiceType,
dwServiceState,
lpServices,
cbBufSize,
pcbBytesNeeded,
lpServicesReturned,
lpResumeHandle,
pszGroupName);
That’s only the beginning. I have to call it again after allocating some space to receive the results… Although there’s a lot of little bits and pieces here, is this ‘engineering’?
If 90% of our time is “coding”, I would say at least 75% of that time is spent on this type of activity. You’re trying to use some API in some library, and you’re doing a lot of scaffolding and boilerplate work. What is the skill involved here? Well, an experienced coder will be more familiar with the various idioms involved, such as: make this call to figure out how much space to allocate, then allocate, then make the call again. They’ll also be familiar with the fact that a return value of 0 == failure in this case, and that you need to call GetLastError() to figure out what could really be wrong, if anything. I might call myself an “engineer” for having gained the wisdom and years of experience necessary to perform this particular task efficiently, but I’m not sure I’d call it “software engineering”.
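The size-then-allocate-then-call-again idiom can be sketched without touching Windows at all. The block below is only an illustration: `fakeEnumServices`, its payload, and the local `GetLastError` are made-up stand-ins, not real system calls. The one real detail is the error code 234 (ERROR_MORE_DATA), which is what the service enumeration call reports when the buffer is too small.

```lua
-- Stand-in for a Win32-style API (hypothetical, not a real system call).
-- Returns 0 on failure, and reports the bytes it needs through `needed`.
local ERROR_MORE_DATA = 234

local lastError = 0
local function GetLastError() return lastError end

local payload = "Spooler;W32Time;Dnscache"

local function fakeEnumServices(buffer, bufSize, needed)
    needed[1] = #payload
    if bufSize < #payload then
        lastError = ERROR_MORE_DATA
        return 0
    end
    for i = 1, #payload do
        buffer[i] = payload:sub(i, i)
    end
    lastError = 0
    return 1
end

-- The idiom: ask for the size, allocate, then call again.
local function enumServices()
    local needed = {0}
    local status = fakeEnumServices(nil, 0, needed)
    if status == 0 and GetLastError() ~= ERROR_MORE_DATA then
        -- A failure other than "buffer too small" is a real error.
        return false, GetLastError()
    end

    local buffer = {}
    status = fakeEnumServices(buffer, needed[1], needed)
    if status == 0 then
        return false, GetLastError()
    end
    return table.concat(buffer)
end

print(enumServices())
```

With the real API the shape stays the same; only the buffer becomes an `ffi.new` allocation and the size comes back through a `DWORD[1]`.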
But, this is only the beginning of the journey on this particular task. Ultimately I need to get this information projected out to the internet. There will be iterators, queries, and other big words involved, and somewhere along the line, I might find the dividing line where software turns into engineering.
The Challenge of Writing Correct System Calls
Posted: April 29, 2013 Filed under: Core Programming, Microsoft, System Programming | Tags: datetime, lua, luajit, win32
If you are a veteran of using Windows APIs, you might be familiar with some patterns. One of them is the dual return value:
int somefunc();
Documentation: somefunc() will return some value other than 0 upon success. On failure, it will return 0, and you can then call GetLastError() to find out which error actually occurred…
In some cases, 0 means error. In some cases, 0 means success, and anything else is actually the error. In some cases they return BOOL, and some BOOLEAN (completely different!).
Another pattern is the “pass me a buffer, and a size…”
int somefunc(char *buff, int sizeofbuff)
Documentation: Pass in a buffer, and the size of the buffer. If ‘sizeofbuff’ == 0, then the return value will indicate how many bytes need to be allocated in the buffer, so you can call the function again.
A slight variant is this one:
int somefunc(char *buff, int * sizeofbuff)
In this case, the return value of the function will indicate whether there was an error or not. If there was an error such as: ERROR_INSUFFICIENT_BUFFER, then the ‘sizeofbuff’ was stuffed with the actual size needed to fulfill the request.
This can be very confusing to say the least. What makes it more confusing, at least in the Windows world, is that there is no single way this is done. Windows APIs have existed since the beginning of time, so there is as much variety in the various APIs as there are programmers who have worked on them over the years.
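One way to cope with the variety is to funnel each convention through a tiny adapter so that everything downstream sees the same shape: the value on success, or false plus an error code on failure. The sketch below assumes just two of the conventions described above; the helper names and `fakeGetLastError` are invented for illustration.

```lua
-- Convention A: non-zero means success, 0 means "call GetLastError()".
local function fromNonZeroSuccess(result, getLastError)
    if result == 0 then
        return false, getLastError()
    end
    return result
end

-- Convention B: 0 means success, anything else is the error code itself.
local function fromZeroSuccess(result)
    if result ~= 0 then
        return false, result
    end
    return true
end

-- Hypothetical stand-in for the real GetLastError().
local function fakeGetLastError() return 5 end  -- 5 is ERROR_ACCESS_DENIED

print(fromNonZeroSuccess(42, fakeGetLastError))  -- success: the value itself
print(fromNonZeroSuccess(0, fakeGetLastError))   -- failure: false plus a code
print(fromZeroSuccess(0))                        -- success
print(fromZeroSuccess(87))                       -- failure: false plus a code
```

Which adapter applies still has to come from reading the documentation for each call; the adapters only keep that decision in one place.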
How to bring sanity to this world? I’ll examine just one case where I use Lua to make a more sane picture. I want to get the current system time. In kernel32, there are a few date/time formatting functions, so I’ll use one of those. Here is the ffi to the one I want:
ffi.cdef[[
int
GetTimeFormatEx(
LPCWSTR lpLocaleName,
DWORD dwFlags,
const SYSTEMTIME *lpTime,
LPCWSTR lpFormat,
LPWSTR lpTimeStr,
int cchTime
);
]]
That’s one hefty function to get a time printed out in a nice way. There are pointers to unicode strings, pointers to structures that contain the system time, size of buffers, buffers in unicode…
In the end, I want to be able to do this: GetTimeFormat(), and have it print: “7:54 AM”
Alrighty then. Can’t be too hard…
local GetTimeFormat = function(lpFormat, dwFlags, lpTime, lpLocaleName)
dwFlags = dwFlags or 0;
--lpFormat = lpFormat or "hh':'mm':'ss tt";
if lpFormat then
lpFormat = k32.AnsiToUnicode16(lpFormat);
end
-- first call to figure out how big the string needs to be
local buffsize = k32Lib.GetTimeFormatEx(
lpLocaleName,
dwFlags,
lpTime,
lpFormat,
nil, -- no buffer yet; this call only asks for the required size
0);
-- buffsize should be the required size
if buffsize < 1 then
return false, k32Lib.GetLastError();
end
local lpDataStr = ffi.new("WCHAR[?]", buffsize);
local res = k32Lib.GetTimeFormatEx(
lpLocaleName,
dwFlags,
lpTime,
lpFormat,
lpDataStr,
buffsize);
if res == 0 then
return false, k32Lib.GetLastError();
end
-- We have a widechar, turn it into ASCII
return k32.Unicode16ToAnsi(lpDataStr);
end
Not too bad, if a bit redundant. There are a couple of things of note, which are easy to miss if you’re not paying close attention.
First of all, I’m following the convention that any system function that succeeds should return the value, and if it fails, it should return false, and an error.
First thing to do is deal with default parameters. The dwFlags parameter is an integer, so if it has not been specified, a default value of ‘0’ will be used. If you don’t do this, then a ‘nil’ will be passed to the system function, and that will surely not work.
The time value can be passed in. If it is, it will be used. If not, then the nil in this position will result in using current system time. Same goes for localeName, and lpFormat. If they are nil, then the system default values will be used, according to the function call documentation.
The next important thing is turning the lpFormat string into a Unicode string if it was specified. Lua, by default, deals in plain 8-bit ANSI strings, not Unicode, so I assume what’s been specified is a standard Lua ANSI string and convert it to UTF-16.
And finally, the first function call to the system. In this first call, I want to get the size of the buffer needed, so I pass in ‘0’ as the size of the buffer. The return value of the function will be the size of the buffer needed to hold the string. Of course, if the return value is ‘0’, then the ‘GetLastError()’ function must be called to figure out what was wrong. In this case, it could be that one of the parameters specified was wrong, or something else. But, bail out at any rate.
Now that I know how big the buffer needs to be (counted in WCHAR characters, including the terminating null, not in bytes), I allocate the appropriate buffer, and make the call again, this time passing in the allocated buffer.
Last step, take the Unicode string that was returned, and turn it back into an ANSI string so the rest of Lua can be happy with it.
There are a couple more error conditions that could possibly be handled, like checking the types of the passed in parameters, or the size of the needed buffer might change between the two system calls, but this is a ‘good enough’ approach.
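For the case where the required size changes between the two calls, a bounded retry loop is one option. This is a sketch against a made-up `fakeQuery` function whose answer grows once on purpose; only the error code 122 (ERROR_INSUFFICIENT_BUFFER) is a real Win32 value.

```lua
local ERROR_INSUFFICIENT_BUFFER = 122

-- Hypothetical API: the answer grows after the first call, to mimic data
-- that changes between the sizing call and the real call.
local calls = 0
local lastError = 0
local function fakeQuery(buffer, size)
    calls = calls + 1
    local text = (calls < 2) and "short" or "a longer answer"
    if size < #text then
        lastError = ERROR_INSUFFICIENT_BUFFER
        return 0, #text          -- 0 == failure, plus the size required now
    end
    buffer.value = text
    return #text, #text
end

local function queryWithRetry(maxAttempts)
    local size = 0
    for _ = 1, maxAttempts do
        local buffer = {}
        local written, required = fakeQuery(buffer, size)
        if written > 0 then
            return buffer.value
        end
        if lastError ~= ERROR_INSUFFICIENT_BUFFER then
            return false, lastError
        end
        size = required          -- grow and try again
    end
    return false, ERROR_INSUFFICIENT_BUFFER
end

print(queryWithRetry(4))
```

Capping the attempts keeps a misbehaving call from spinning forever, at the cost of a few more lines of boilerplate.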
It’s 38 lines of boilerplate code to ensure the simplicity and relative correctness of a single system call. With literally hundreds of very interesting system calls in the Win32 system, you can imagine how challenging it can get to do these things right.
Of course, this is why libraries exist, because someone has actually gone through and done all the challenging work to get things right. I find that doing this work in Lua is pretty easy. The biggest challenge is reading and interpreting the documentation of the API. Sometimes it’s clear, sometimes it’s not. Once conquered though, it sure does make programming in Windows a lot easier. I suspect the same is true of any language/OS binding.
Taming VTables with Aplomb
Posted: April 22, 2013 Filed under: Core Programming, LuaJIT, Microsoft, System Programming | Tags: ffi, luajit, microsoft, security, vtable
If you do enough interop work, you’ll eventually run across a VTable that you’re going to have to work with. I have previously dealt with OpenGL, which doesn’t strictly have a vtable, but has a bunch of functions which you have to look up in order to use. I explored the topic in this article: HeadsUp OpenGL Extension Wrangling
Recently, I have been writing code to support TLS connections in TINN. This ultimately involves using the sspi interfaces in Windows, which leads you to the sspi.h header file which contains the following:
typedef struct _SECURITY_FUNCTION_TABLE_A {
unsigned long dwVersion;
ENUMERATE_SECURITY_PACKAGES_FN_A EnumerateSecurityPackagesA;
QUERY_CREDENTIALS_ATTRIBUTES_FN_A QueryCredentialsAttributesA;
ACQUIRE_CREDENTIALS_HANDLE_FN_A AcquireCredentialsHandleA;
FREE_CREDENTIALS_HANDLE_FN FreeCredentialHandle;
void * Reserved2;
INITIALIZE_SECURITY_CONTEXT_FN_A InitializeSecurityContextA;
ACCEPT_SECURITY_CONTEXT_FN AcceptSecurityContext;
COMPLETE_AUTH_TOKEN_FN CompleteAuthToken;
DELETE_SECURITY_CONTEXT_FN DeleteSecurityContext;
APPLY_CONTROL_TOKEN_FN ApplyControlToken;
QUERY_CONTEXT_ATTRIBUTES_FN_A QueryContextAttributesA;
IMPERSONATE_SECURITY_CONTEXT_FN ImpersonateSecurityContext;
REVERT_SECURITY_CONTEXT_FN RevertSecurityContext;
MAKE_SIGNATURE_FN MakeSignature;
VERIFY_SIGNATURE_FN VerifySignature;
FREE_CONTEXT_BUFFER_FN FreeContextBuffer;
QUERY_SECURITY_PACKAGE_INFO_FN_A QuerySecurityPackageInfoA;
void * Reserved3;
void * Reserved4;
EXPORT_SECURITY_CONTEXT_FN ExportSecurityContext;
IMPORT_SECURITY_CONTEXT_FN_A ImportSecurityContextA;
ADD_CREDENTIALS_FN_A AddCredentialsA ;
void * Reserved8;
QUERY_SECURITY_CONTEXT_TOKEN_FN QuerySecurityContextToken;
ENCRYPT_MESSAGE_FN EncryptMessage;
DECRYPT_MESSAGE_FN DecryptMessage;
SET_CONTEXT_ATTRIBUTES_FN_A SetContextAttributesA;
SET_CREDENTIALS_ATTRIBUTES_FN_A SetCredentialsAttributesA;
CHANGE_PASSWORD_FN_A ChangeAccountPasswordA;
} SecurityFunctionTableA, * PSecurityFunctionTableA;
You get at this function table by making the following call:
local sspilib = ffi.load("secur32");
local VTable = sspilib.InitSecurityInterfaceA();
And then, to execute one of the functions, you could do this:
local pcPackages = ffi.new("int[1]");
local ppPackageInfo = ffi.new("PSecPkgInfoA[1]");
local result = VTable["EnumerateSecurityPackagesA"](pcPackages, ppPackageInfo);
-- Print names of all security packages
for i=0,pcPackages[0]-1 do -- the C array is zero-based, so stop at count-1
print(ffi.string(ppPackageInfo[0][i].Name));
end
Tada!! What could be simpler…
Well, this is Lua of course, so things could be made a bit simpler.
First of all, why is there even a vtable in this case? All these functions are just in the .dll file directly aren’t they? Well, there’s a bit of trickery when it comes to security packages. It turns out, it’s best not to actually load the .dll that represents the security package into the address space of the program that’s using it, directly. By calling “InitSecurityInterface()”, the actual package is loaded into a different address space, and the vtable is then used to access the functions.
You can make multiple calls to InitSecurityInterface() to get that vtable pointer, or you could stuff it into a global variable, making it available to all modules within your program, or, you could stuff it into a bit of a table wrapping and make life much easier.
-- sspi.lua
local ffi = require("ffi");
local sspi_ffi = require("sspi_ffi");
local SecError = require ("SecError");
local sspilib = ffi.load("secur32");
local SecurityPackage = require("SecurityPackage");
local Credentials = require("CredHandle");
local schannel = require("schannel");
local SecurityInterface = {
VTable = sspilib.InitSecurityInterfaceA();
}
setmetatable(SecurityInterface, {
__index = function(self, key)
return self.VTable[key]
end,
});
return {
schannel = schannel;
SecurityInterface = SecurityInterface;
SecurityPackage = SecurityPackage;
Credentials = Credentials;
}
With this little bit, I can now do this in my program:
local sspi = require("sspi");
local SecurityInterface = sspi.SecurityInterface;
local pcPackages = ffi.new("int[1]");
local ppPackageInfo = ffi.new("PSecPkgInfoA[1]");
local result = SecurityInterface.EnumerateSecurityPackagesA(pcPackages, ppPackageInfo);
The SecurityInterface table takes care of loading the VTable as part of its construction. By doing the setmetatable, and implementing the ‘__index’ metamethod, whenever a ‘.functionname’ is asked for, as with ‘.EnumerateSecurityPackagesA’, the element within the vtable with that name will be returned. Those elements so happen to be function pointers, so they will then just be executed like regular functions!
I think that’s a pretty awesome trick. The SecurityInterface table looks like a static structure with function pointers, and you just get to call those functions directly, passing in the appropriate arguments. This looks pretty much exactly like what I would expect if I were writing this in C, but I don’t have to worry about type casts and the like.
This works in this particular case because there is a single table representing the function pointers. If you were instead doing something where there were instances of an object, and an attendant vtable, you’d have to do a little bit more work to preserve the instance data, and pass it into the individual functions. Not too hard, and I actually do this trick in my Kinect interface implementation.
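For the per-instance case, the extra work amounts to binding the instance into each function as it is looked up. The following is a pure-Lua sketch using a mock object rather than real FFI data; `newMockInstance`, `wrapInstance`, and the field name `lpVtbl` are illustrative choices, and it is not the code from the Kinect interface.

```lua
-- A mock "instance" whose vtable functions expect the instance as their
-- first argument, the way COM-style interfaces do.
local function newMockInstance(name)
    local instance = { name = name }
    instance.lpVtbl = {
        GetName = function(this) return this.name end,
        Describe = function(this, prefix) return prefix .. this.name end,
    }
    return instance
end

-- Wrap an instance so that wrapper:Method(...) forwards to
-- instance.lpVtbl.Method(instance, ...).
local function wrapInstance(instance)
    local wrapper = {}
    return setmetatable(wrapper, {
        __index = function(self, key)
            local fn = instance.lpVtbl[key]
            if fn == nil then return nil end
            local bound = function(_, ...)
                return fn(instance, ...)
            end
            rawset(self, key, bound)  -- cache so __index runs once per name
            return bound
        end,
    })
end

local sensor = wrapInstance(newMockInstance("kinect0"))
print(sensor:GetName())
print(sensor:Describe("device: "))
```

The wrapper swaps its own `self` for the wrapped instance, so callers can use ordinary method-call syntax and never handle the raw pointer.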
At any rate, that’s a relatively easy way to tackle vtables without much work. It was actually a bit surprising to me that it worked so easily, and I’ve been able to refine a pattern that I somewhat understood before, and now truly appreciate.
Leap Motion Event Filtering
Posted: March 21, 2013 Filed under: Leap Motion, Lua, LuaJIT, System Programming, TINN | Tags: enumerate, event, leap motion
So, I’m always trying to make my code simpler. Easier to understand, easier to maintain. With the Leap, or any input device, you are faced with a continuous stream of data. In many situations, you’d like to just filter through stuff and only deal with certain types of data. In one sense, you need a simple stream based query processor.
I had the beginnings of such a thing, and I’ve now separated out the pieces. In the case of the Leap Motion, you are faced with streams of data that might look like this:
{
"id":1237560,
"r":[[0.444044,0.663489,-0.602169],[0.184129,-0.725287,-0.663367],[-0.876882,0.183687,-0.444227]],
"s":762.482,
"t":[5336.48,-24560.1,5768.29],
"timestamp":13587071004,
"hands":[{
"id":4,
"direction":[-0.0793992,0.899586,-0.427785],
"palmNormal":[-0.16208,-0.432144,-0.886711],
"palmPosition":[27.138,227.235,80.2504],
"palmVelocity":[-136.716,-134.926,-359.534],
"sphereCenter":[9.15823,202.468,9.29922],
"sphereRadius":106.122,
"r":[[0.989305,-0.132062,-0.0619254],[0.117032,0.97208,-0.203384],[0.0870557,0.193962,0.977139]],
"s":1.45151,
"t":[-18.2708,21.6366,-106.687]
}],
"pointables":[
{
"direction":[0.196259,0.670762,-0.715235],
"handId":4,
"id":7,
"length":68.7964,
"tipPosition":[61.3422,285.46,38.3742],
"tipVelocity":[-184.398,-119.405,-322.679],
"tool":false
},
{
"direction":[0.0324904,0.792378,-0.609165],
"handId":4,
"id":3,
"length":76.8893,
"tipPosition":[14.7425,304.766,41.4163],
"tipVelocity":[-229.246,-95.6285,-323.667],
"tool":false
}
]
}
This data is representative of a single ‘frame’ coming off the device. As you can see, it’s hierarchical. That is, for every discrete frame, I get a grouping of hand(s) information as well as pointables and possibly gestures. This is how the Leap Motion software packages things up.
In some applications, what I really want is a stream of events, not presented hierarchically. Additionally, I want to read that stream of events and easily filter it looking for particular patterns. Basically, I’d like to be able to write a program like the following, which does nothing more than print the hand information. In particular, I’m looking for the radius of the sphere, as well as the sphere center.
package.path = package.path..";../?.lua"
local LeapInterface = require("LeapInterface");
local FrameEnumerator = require("FrameEnumerator");
local EventEnumerator = require("EventEnumerator");
local leap, err = LeapInterface();
assert(leap, "Error Loading Leap Interface: "..tostring(err));
local printHand = function(hand)
local c = hand.sphereCenter;
local r = hand.sphereRadius;
local n = hand.palmNormal;
print(string.format("HAND: %3.2f [%3.2f %3.2f %3.2f]", r, c[1], c[2], c[3]));
end
-- Only allow the hand events to come through
local handfilter = function(event)
return event.palmNormal ~= nil
end
local main = function()
for event in EventEnumerator(FrameEnumerator(leap), handfilter) do
printHand(event);
end
end
run(main);
Easy to digest. Looking at the main() function, there is an iterator chain. The core is the ‘FrameEnumerator(leap)’. This is the source. That is wrapped by the EventEnumerator() iterator, which consumes the frameiterator, as well as the ‘handfilter()’ function.
Looking at the FrameEnumerator:
-- FrameEnumerator.lua
local JSON = require("dkjson");
local ffi = require("ffi");
local FrameEnumerator = function(interface)
local closure = function()
for rawframe in interface:RawFrames() do
local frame = JSON.decode(ffi.string(rawframe.Data, rawframe.DataLength));
return frame;
end
end
return closure;
end
return FrameEnumerator
Not very much code at all, and borrowed from the FrameObserver object in a previous article. This enumerator has the sole purpose of turning the raw strings that arrive from the WebSocket interface into a stream of discrete table objects. It does that by doing the JSON.decode(), which takes a JSON formatted string and turns it into a Lua table object. It just hands those out until it can’t hand out any more.
The EventEnumerator is a little bit more complex, but not much. Its sole purpose in life is to take Lua tables, which presumably have the format of these Leap Motion frames, and turn them into discrete events, basically flattening the hierarchical data structure. In addition to flattening the data structure, you can apply the filter, thus throwing away bits of the frame that are not going to be processed any further:
-- EventEnumerator.lua
local Collections = require("Collections");
local EventEnumerator = function(frameemitter, filterfunc)
local eventqueue = Collections.Queue.new();
local addevent = function(event)
if filterfunc then
if filterfunc(event) then
eventqueue:Enqueue(event);
end
else
eventqueue:Enqueue(event);
end
end
local closure = function()
-- If the event queue is empty, then grab the
-- next frame from the emitter, and turn it into
-- discrete events.
while eventqueue:Len() < 1 do
local frame = frameemitter();
if frame == nil then
return nil
end
if frame.hands ~= nil then
for _,hand in ipairs(frame.hands) do
addevent(hand);
end
end
if frame.pointables ~= nil then
for _,pointable in ipairs(frame.pointables) do
addevent(pointable);
end
end
if frame.gestures then
for _,gesture in ipairs(frame.gestures) do
addevent(gesture);
end
end
end
return eventqueue:Dequeue();
end
return closure;
end
return EventEnumerator
Stitch it all together and the first program actually works: you get a stream of ‘hand’ events. The beauty of this system is that you can further string iterators together in this way. Of course, at the lowest level, you can easily filter to find hands, fingers, tools, gestures, and the like. You might also realize something else special about this. The ‘filter’ is just Lua code, so it can be as complex or simple as you want. In this particular case, I just wanted to check for the existence of a single field. If I thought the amount of data was too much, and I wanted to cut it in half, I could have easily kept a counter, and only returned every other event. You could go further, and not just return a true/false, but you could possibly alter the event as well in some way. But, the true power comes from composition. Rather than making a more complex single filter, I’d rather just make more filters and string them together.
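Here is a rough sketch of what stringing filters together might look like. The helper names `allOf` and `everyNth` are invented for this example, and the events are hand-made tables standing in for a live Leap stream. The combined function has the same shape as ‘handfilter’, so it could be passed to EventEnumerator as the filter argument.

```lua
-- Combine several filters: an event passes only if every filter accepts it.
local function allOf(...)
    local filters = {...}
    return function(event)
        for _, f in ipairs(filters) do
            if not f(event) then return false end
        end
        return true
    end
end

-- A stateful filter that lets through every n-th event it sees.
local function everyNth(n)
    local count = 0
    return function()
        count = count + 1
        return count % n == 0
    end
end

local function isHand(event) return event.palmNormal ~= nil end

-- Hypothetical flattened events, in place of a live Leap stream.
local events = {
    { id = 1, palmNormal = {0, -1, 0} },
    { id = 2, tool = false },
    { id = 3, palmNormal = {0, -1, 0} },
    { id = 4, palmNormal = {0, -1, 0} },
    { id = 5, palmNormal = {0, -1, 0} },
}

local filter = allOf(isHand, everyNth(2))
local kept = {}
for _, event in ipairs(events) do
    if filter(event) then kept[#kept + 1] = event.id end
end
print(table.concat(kept, ","))
```

Order matters here: because `allOf` stops at the first rejection, the counter in `everyNth` only advances for events that already passed `isHand`, so this keeps every second hand rather than every second event.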
Using this simple concept of Enumeration, I can imagine performing a whole bunch of simple, and even complex operations, simply by tying the proper enumerators, observers, queues and the like together.
Screen Capture for Fun and Profit
Posted: March 12, 2013 Filed under: Lua, LuaJIT, Network Programming, System Programming | Tags: http server, lua, network, screen capture
In Screen Sharing from the Browser I wrote about how relatively easy it is to display a continuous snapshot of a remote screen, and even send mouse and keyboard events back to it. That was the essence of modern day browser based screen sharing. Everything else is about compression for bandwidth management.
In this article, I’ll present the “server” side of the equation. Since I’ve discovered the ‘sourcecode’ bracket in WordPress, I can even present the code with line numbers. So, here in its entirety is the server side:
local ffi = require "ffi"
local WebApp = require("WebApp")
local HttpRequest = require "HttpRequest"
local HttpResponse = require "HTTPResponse"
local URL = require("url")
local StaticService = require("StaticService")
local GDI32 = require ("GDI32")
local User32 = require ("User32")
local BinaryStream = require("core.BinaryStream")
local MemoryStream = require("core.MemoryStream")
local WebSocketStream = require("WebSocketStream")
local Network = require("Network")
local utils = require("utils")
local zlib = require ("zlib")
local UIOSimulator = require("UIOSimulator")
--[[
Application Variables
--]]
local ScreenWidth = User32.GetSystemMetrics(User32.FFI.CXSCREEN);
local ScreenHeight = User32.GetSystemMetrics(User32.FFI.CYSCREEN);
local captureWidth = ScreenWidth;
local captureHeight = ScreenHeight;
local ImageWidth = captureWidth;
local ImageHeight = captureHeight;
local ImageBitCount = 16;
local hbmScreen = GDIDIBSection(ImageWidth, ImageHeight, ImageBitCount);
local hdcScreen = GDI32.CreateDCForDefaultDisplay();
local net = Network();
--[[
Application Functions
--]]
function captureScreen(nWidthSrc, nHeightSrc, nXOriginSrc, nYOriginSrc)
nXOriginSrc = nXOriginSrc or 0;
nYOriginSrc = nYOriginSrc or 0;
-- Copy some of the screen into a
-- bitmap that is selected into a compatible DC.
local ROP = GDI32.FFI.SRCCOPY;
local nXOriginDest = 0;
local nYOriginDest = 0;
local nWidthDest = ImageWidth;
local nHeightDest = ImageHeight;
local nWidthSrc = nWidthSrc;
local nHeightSrc = nHeightSrc;
GDI32.Lib.StretchBlt(hbmScreen.hDC.Handle,
nXOriginDest,nYOriginDest,nWidthDest,nHeightDest,
hdcScreen.Handle,
nXOriginSrc,nYOriginSrc,nWidthSrc,nHeightSrc,
ROP);
hbmScreen.hDC:Flush();
end
-- Serve the screen up as a bitmap image (.bmp)
local getContentSize = function(width, height, bitcount, alignment)
alignment = alignment or 4
local rowsize = GDI32.GetAlignedByteCount(width, bitcount, alignment);
local pixelarraysize = rowsize * math.abs(height);
local filesize = 54+pixelarraysize;
local pixeloffset = 54;
return filesize;
end
local filesize = getContentSize(ImageWidth, ImageHeight, ImageBitCount);
local memstream = MemoryStream.new(filesize);
local zstream = MemoryStream.new(filesize);
local writeImage = function(dibsec, memstream)
--print("printImage")
local width = dibsec.Info.bmiHeader.biWidth;
local height = dibsec.Info.bmiHeader.biHeight;
local bitcount = dibsec.Info.bmiHeader.biBitCount;
local rowsize = GDI32.GetAlignedByteCount(width, bitcount, 4);
local pixelarraysize = rowsize * math.abs(height);
local filesize = 54+pixelarraysize;
local pixeloffset = 54;
-- allocate a MemoryStream to fit the file size
local streamsize = GDI32.GetAlignedByteCount(filesize, 8, 4);
memstream:Seek(0);
local bs = BinaryStream.new(memstream);
-- Write File Header
bs:WriteByte(string.byte('B'))
bs:WriteByte(string.byte('M'))
bs:WriteInt32(filesize);
bs:WriteInt16(0);
bs:WriteInt16(0);
bs:WriteInt32(pixeloffset);
-- Bitmap information header
bs:WriteInt32(40);
bs:WriteInt32(dibsec.Info.bmiHeader.biWidth);
bs:WriteInt32(dibsec.Info.bmiHeader.biHeight);
bs:WriteInt16(dibsec.Info.bmiHeader.biPlanes);
bs:WriteInt16(dibsec.Info.bmiHeader.biBitCount);
bs:WriteInt32(dibsec.Info.bmiHeader.biCompression);
bs:WriteInt32(dibsec.Info.bmiHeader.biSizeImage);
bs:WriteInt32(dibsec.Info.bmiHeader.biXPelsPerMeter);
bs:WriteInt32(dibsec.Info.bmiHeader.biYPelsPerMeter);
bs:WriteInt32(dibsec.Info.bmiHeader.biClrUsed);
bs:WriteInt32(dibsec.Info.bmiHeader.biClrImportant);
-- Write the actual pixel data
memstream:WriteBytes(dibsec.Pixels, pixelarraysize, 0);
end
local getSingleShot = function(response, compressed)
captureScreen(captureWidth, captureHeight);
writeImage(hbmScreen, memstream);
zstream:Seek(0);
local compressedLen = ffi.new("int[1]", zstream.Length);
local err = zlib.compress(zstream.Buffer, compressedLen, memstream.Buffer, memstream:GetPosition() );
zstream.BytesWritten = compressedLen[0];
local contentlength = zstream.BytesWritten;
local headers = {
["Content-Length"] = tostring(contentlength);
["Content-Type"] = "image/bmp";
["Content-Encoding"] = "deflate";
}
response:writeHead("200", headers);
response:WritePreamble();
return response.DataStream:WriteBytes(zstream.Buffer, zstream.BytesWritten);
end
local handleUIOCommand = function(command)
local values = utils.parseparams(command)
if values["action"] == "mousemove" then
UIOSimulator.MouseMove(tonumber(values["x"]), tonumber(values["y"]))
elseif values["action"] == "mousedown" then
UIOSimulator.MouseDown(tonumber(values["x"]), tonumber(values["y"]))
elseif values["action"] == "mouseup" then
UIOSimulator.MouseUp(tonumber(values["x"]), tonumber(values["y"]))
elseif values["action"] == "keydown" then
UIOSimulator.KeyDown(tonumber(values["which"]))
elseif values["action"] == "keyup" then
UIOSimulator.KeyUp(tonumber(values["which"]))
end
end
local startupContent = nil
local handleStartupRequest = function(request, response)
-- read the entire contents
if not startupContent then
-- load the file into memory
local fs, err = io.open("viewscreen2.htm")
if not fs then
response:writeHead("500")
response:writeEnd();
return true
end
local content = fs:read("*all")
fs:close();
-- perform the substitution of values
-- assume content looks like this:
-- <?hostip?>:<?serviceport?>
local subs = {
["frameinterval"] = 300,
["hostip"] = net:GetLocalAddress(),
["capturewidth"] = captureWidth,
["captureheight"] = captureHeight,
["imagewidth"] = ImageWidth,
["imageheight"] = ImageHeight,
["screenwidth"] = ScreenWidth,
["screenheight"] = ScreenHeight,
["serviceport"] = Runtime.config.port,
}
startupContent = string.gsub(content, "%<%?(%a+)%?%>", subs)
end
-- send the content back to the requester
response:writeHead("200",{["Content-Type"]="text/html"})
response:writeEnd(startupContent);
return true
end
--[[
Responding to remote user input
]]--
local handleUIOSocketData = function(ws)
while true do
local bytes, bytesread = ws:ReadFrame()
if not bytes then
print("handleUIOSocketData() - END: ", bytesread); -- assumes ReadFrame() returns nil plus an error value on failure
break
end
local command = ffi.string(bytes, bytesread);
handleUIOCommand(command);
end
end
local handleUIOSocket = function(request, response)
local ws = WebSocketStream();
ws:RespondWithServerHandshake(request, response);
Runtime.Scheduler:Spawn(handleUIOSocketData, ws);
return false;
end
--[[
Primary Service Response routine
]]--
local HandleSingleRequest = function(stream, pendingqueue)
local request, err = HttpRequest.Parse(stream);
if not request then
-- dump the stream
--print("HandleSingleRequest, Dump stream: ", err)
return
end
local urlparts = URL.parse(request.Resource)
local response = HttpResponse.Open(stream)
local success = nil;
if urlparts.path == "/uiosocket" then
success, err = handleUIOSocket(request, response)
elseif urlparts.path == "/screen.bmp" then
success, err = getSingleShot(response, true);
elseif urlparts.path == "/screen" then
success, err = handleStartupRequest(request, response)
elseif urlparts.path == "/favicon.ico" then
success, err = StaticService.SendFile("favicon.ico", response)
elseif urlparts.path == "/jquery.js" then
success, err = StaticService.SendFile("jquery.js", response)
else
response:writeHead("404");
success, err = response:writeEnd();
end
if success then
return pendingqueue:Enqueue(stream)
end
end
--[[
Start running the service
--]]
local serviceport = tonumber(arg[1]) or 8080
Runtime = WebApp({port = serviceport, backlog=100})
Runtime:Run(HandleSingleRequest);
As a ‘server’ this code is responsible for handling a couple of things. First, it needs to act as a basic http server, serving up relatively static content to get things started. When the user specifies the url http://localhost/screen, the server will respond by sending back the browser code that I showed in the previous article. The function “handleStartupRequest()” performs this operation. The file ‘viewscreen2.htm’ is HTML, but it’s a bit of a template as well. You can delimit a piece to be replaced by enclosing it in a tag such as: <?hostip?>. This tag can be replaced by any bit of text that you choose. In this case, I’m doing replacements for the size of the image, the size of the screen, the frame interval, and the host IP and port. This last is most important because without it, you won’t be able to set up the websocket.
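The substitution step can be tried on its own. The sketch below uses the same gsub pattern as handleStartupRequest(), but the template text and the values in the table are made up for the example.

```lua
-- A two-line stand-in for the contents of the template file.
local template = [[var uioUri = "ws://<?hostip?>:<?serviceport?>/uiosocket"
FrameInterval = <?frameinterval?>]]

local subs = {
    hostip = "192.168.1.10",   -- made-up address
    serviceport = 8080,
    frameinterval = 300,
}

-- Each <?name?> is looked up in `subs`; the captured name is the key.
local page = string.gsub(template, "%<%?(%a+)%?%>", subs)
print(page)
```

When gsub is handed a table, a name with no entry in the table is left in place unchanged, so a typo in the template shows up in the served page instead of silently vanishing.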
The other parts are fairly straightforward. Of particular note is the ‘captureScreen()’ function. In Windows, since the dawn of man, there has been GDI for graphics. Good ol’ GDI still has the ability to capture the screen, or a single window, or a portion of the screen. This still works in Windows 8 as well. So, capturing the screen is nothing more than drawing into a DIBSection, and that’s that. Just one line of code.
The magic happens after that. Rather than handing the raw image back to the client, I want to send it out as a compressed BMP image. I could choose PNG, or JPG, or any other format browsers are capable of handling, but BMP is the absolute easiest to deal with, even if it is the most bulky. I figure that since I’m using zlib to deflate it before sending it out, that will be somewhat helpful, and it turns out this works just fine.
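To get a feel for how bulky the uncompressed image is, the size arithmetic from getContentSize() can be reproduced in a few lines. The actual GDI32.GetAlignedByteCount is not shown in this article, so `alignedByteCount` below is an assumed equivalent: it rounds each row up to a multiple of 4 bytes, which is what the BMP format requires. The 54 is the 14-byte file header plus the 40-byte info header.

```lua
-- Bytes per row, padded up to a multiple of `alignment` bytes.
local function alignedByteCount(width, bitsPerPixel, alignment)
    alignment = alignment or 4
    local bytes = math.ceil(width * bitsPerPixel / 8)
    return math.ceil(bytes / alignment) * alignment
end

local function bmpFileSize(width, height, bitsPerPixel)
    local headerSize = 14 + 40   -- file header + BITMAPINFOHEADER
    local rowSize = alignedByteCount(width, bitsPerPixel, 4)
    -- height may be negative for a top-down bitmap, hence math.abs
    return headerSize + rowSize * math.abs(height)
end

print(alignedByteCount(3, 24))       -- 9 bytes of pixels, padded to 12
print(bmpFileSize(1920, 1080, 16))   -- a full HD screen at 16 bits per pixel
```

For a 1920x1080 screen at 16 bits per pixel this works out to a little over 4 MB per frame, which is why deflating before sending matters.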
The rest of the machinery there is just to deal with being an http server. A lot is hidden behind the ‘WebApp’ and the ‘WebSocket’ classes. Those are good for another discussion.
So, all in, this is about 300 lines of code. Not too bad for a rudimentary screen sharing service. Of course, there’s a supporting cast that runs into the thousands of lines of code, but I’m assuming this as a given since frameworks such as Node and various others exist.
I could explain each and every line of code here, but I think it’s small enough and easy enough to read that it won’t be necessary. I will point out that there’s not much difference between sending single snapshots one at a time vs having an open stream and presenting the screen as h.264 or WebM. For that scenario, you just need a library that can capture snapshots of the screen and turn them into the properly encoded video stream. Since you have the WebSocket, it could easily be put to use for that purpose, rather than just receiving the mouse and keyboard events.
Food for thought.
Screen Sharing from the Browser
Posted: March 7, 2013 Filed under: html, System Programming | Tags: desktop sharing, html, screenshare
As Rocky and Bullwinkle used to say: And now for something we know you’ll really like…
There are literally hundreds of screen sharing solutions out and about in the market. Probably the reason there are so many is because the technologies behind them have become really pervasive, and the “good enough” solution is readily at hand. Well, not wanting to be left behind by the computing world, I decided to implement my own form of screen sharing that runs in any browser (that implements WebSocket at least).
Here’s my take on it:
<!DOCTYPE html>
<html>
<head>
<title>Screen View</title>
<script type="text/javascript" src="jquery.js"></script>
<script language="javascript" type="text/javascript">
window.addEventListener('load', eventWindowLoaded, false);
ScreenWidth = <?screenwidth?>
ScreenHeight = <?screenheight?>
CaptureWidth = <?capturewidth?>
CaptureHeight = <?captureheight?>
FrameInterval = <?frameinterval?>
ImageWidth = <?imagewidth?>
ImageHeight = <?imageheight?>
var uioUri = "ws://<?hostip?>:<?serviceport?>/uiosocket"
function eventWindowLoaded()
{
document.getElementById("screendiv").focus();
setInterval("refreshImage()", FrameInterval); // Capture the image once every few milliseconds
uioSocket = new WebSocket(uioUri);
}
function refreshImage()
{
if (!document.images)
return;
document.images['myScreen'].src = '/screen.bmp?' + Math.random();
}
</script>
</head>
<body style="margin:0px">
<div id="screendiv" tabindex="-1"
style="width:<?imagewidth?>px; height:<?imageheight?>px; margin:0px; background:yellow; border:0px groove;"
onselectstart="return false"
onmousedown="return false"
onkeydown="return false" >
<img src="/screen.bmp" name="myScreen">
</div>
</body>
<script>
function map(x, low, high, low2, high2)
{
return low2 + ((x - low) / (high - low)) * (high2 - low2);
}
$("#screendiv").keydown(function(e){
uioSocket.send("action=keydown;which="+e.which);
});
$("#screendiv").keyup(function(e){
uioSocket.send("action=keyup;which="+e.which);
});
$("#screendiv").mousedown(function(e){
var x = map(e.pageX, 0,ImageWidth, 0,CaptureWidth);
var y = map(e.pageY, 0,ImageHeight, 0,CaptureHeight);
uioSocket.send("action=mousedown;which="+e.which+";x="+x+";y="+y);
});
$("#screendiv").mouseup(function(e){
var x = map(e.pageX, 0,ImageWidth, 0,CaptureWidth);
var y = map(e.pageY, 0,ImageHeight, 0,CaptureHeight);
uioSocket.send("action=mouseup;which="+e.which+";x="+x+";y="+y);
});
$("#screendiv").mousemove(function(e){
var x = map(e.pageX, 0,ImageWidth, 0,CaptureWidth);
var y = map(e.pageY, 0,ImageHeight, 0,CaptureHeight);
uioSocket.send("action=mousemove;x="+x+";y="+y);
});
</script>
</html>
Well, this is a .htm file that any browser can load. It’s only about 85 lines of code, not counting jQuery itself. jQuery is incidental to the implementation; it’s only used to select the div tag and attach the keyboard and mouse handlers to it.
So, what’s going on here? It’s really fairly simple. To break it down, I’ll start from the reverse. The simplest form of screen sharing is taking snapshots of the screen that is to be shared, and sending those snapshots out to whomever you intend to share with. If you can send them fast enough, the viewer will have a nice live update of your screen.
To truly share the desktop though, you have to be able to send keyboard and mouse events back to the desktop that is taking the snapshots. If you can do this fast enough, then you can actually control elements on the desktop and see them update in real time as your screen updates.
So, to do this in the browser takes two parts. The first part is to simply display screen captures. In this code, I assume there is a service I can talk to and ask for a particular bitmap image. The refreshImage() function does exactly this. It basically just asks the server for this image, and displays it. The random number tacked onto the URL is only there to keep the browser from serving the image out of its cache. But we need this to occur on a regular basis.
That little bit of code in eventWindowLoaded(), which runs once the window’s load event fires, calls setInterval("refreshImage()", FrameInterval);
This will ensure that the function is called every FrameInterval milliseconds.
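For what it’s worth, the same refresh loop can be sketched with a function reference instead of a string, and a counter instead of Math.random(). This is just one way to write it; `frameUrl` and `startRefresh` are hypothetical names, and the optional `timer` argument exists only so the loop can be exercised without a real clock.

```javascript
// Hypothetical helper: build a unique URL per request so the browser
// does not serve the frame from its cache.
function frameUrl(base, sequence) {
  return base + '?' + sequence;
}

// Start the refresh loop. `img` is any object with a writable `src`
// (an <img> element in the browser); `timer` defaults to setInterval.
function startRefresh(img, base, intervalMs, timer) {
  var sequence = 0;
  var schedule = timer || setInterval;
  return schedule(function () {
    sequence += 1;
    img.src = frameUrl(base, sequence);
  }, intervalMs);
}
```

In the page, this would be called as startRefresh(document.images['myScreen'], '/screen.bmp', FrameInterval).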
Great. That covers the first part of the equation. Every few milliseconds, the screen will be updated with an image. This web page neither knows nor cares what the image is, where it comes from, or anything else. It just knows it will display the image. This is handy not just for screen sharing, but for cameras, or anything else that can generate an image with any regularity.
OK. So, now an image will show up every once in a while. What about sending keyboard and mouse events back the other way? Well, here I’ve chosen to employ WebSockets. The WebSocket API essentially gives you a bi-directional socket to the server you specify. This is great because you can then send anything down that pipe once the connection is established. This socket is completely independent from the one the browser might be using to do its normal tasks such as downloading that image. This is great for sending mouse and keyboard events out of band because it’s fast and minimal. Other than the initial setup, which is in http protocol speak, the socket is, as far as the script is concerned, just a plain old socket.
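One wrinkle with “once the connection is established”: calling send() on a WebSocket that is still connecting throws, and the mouse can easily move before the handshake finishes. A small guard along these lines would avoid that. It is only a sketch; `safeSend` is a hypothetical name, and dropping early events is an assumption about what is acceptable here.

```javascript
// WebSocket.readyState values are fixed by the WebSocket API: 1 means OPEN.
var WS_OPEN = 1;

// Hypothetical wrapper: send only when connected, and report whether
// the message actually went out.
function safeSend(socket, message) {
  if (!socket || socket.readyState !== WS_OPEN) {
    return false; // still connecting, closing, or closed: drop the event
  }
  socket.send(message);
  return true;
}
```

The handlers would then call safeSend(uioSocket, "action=keydown;which=" + e.which) instead of talking to the socket directly.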
So, down at the bottom of the code, I’ve implemented some mouse and keyboard event handlers. Basically, just capture the keyboard and mouse, create a string that the server side can understand, and send that string to the server side using the WebSocket. Again, the script doesn’t know what it’s talking to for real. The keyboard and mouse commands could cause any number of actions to occur on the server side. The script just sends the data.
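To make the wire format concrete, here is a small sketch of both directions: scaling a coordinate from image space to capture space, building the "key=value;key=value" string, and taking it apart again. The decoder is written in JavaScript purely for illustration and is not the actual server code; `mapRange`, `encodeEvent`, and `decodeEvent` are hypothetical names.

```javascript
// Linear remap of x from [low, high] to [low2, high2].
function mapRange(x, low, high, low2, high2) {
  return low2 + ((x - low) / (high - low)) * (high2 - low2);
}

// Build a message in the "key=value;key=value" form used by the page.
function encodeEvent(fields) {
  return Object.keys(fields).map(function (key) {
    return key + '=' + fields[key];
  }).join(';');
}

// Hypothetical receiving side: turn the string back into an object.
// Values stay strings; the receiver decides which ones are numbers.
function decodeEvent(message) {
  var result = {};
  message.split(';').forEach(function (pair) {
    var eq = pair.indexOf('=');
    if (eq > 0) {
      result[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  });
  return result;
}
```

For example, a click in the middle of a 640-pixel-wide image maps to the middle of a 1920-pixel-wide capture: mapRange(320, 0, 640, 0, 1920) is 960.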
And that’s about it!
But wait, there’s some funky looking stuff in there that doesn’t look like valid html or javascript!! Yah, to fill in things like the capture size, image size, frame interval and the like, the server side actually does a string replacement on these <?name?> parts before it sends the page to the browser. I’ll get to that when I explain the server side.
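The substitution itself is about as simple as templating gets. A minimal sketch of the idea, again in JavaScript rather than whatever the server really uses, with `fillTemplate` as a hypothetical name:

```javascript
// Replace every <?name?> marker with the matching value.
// Unknown names are left untouched so that mistakes stay visible.
function fillTemplate(template, values) {
  return template.replace(/<\?(\w+)\?>/g, function (marker, name) {
    return Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : marker;
  });
}
```

Given { hostip: '192.168.1.10', serviceport: 8080 } (made-up values), the uioUri line comes out as ws://192.168.1.10:8080/uiosocket.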
This is a very rudimentary implementation. It’s possibly good enough for me to help my mom if she’s having problems with her computer and I need to quickly help her out. It can only get better from here. You could add copy/paste, better compression, and all sorts of things that might have you thinking you’re ready to compete with the best of them in the race to replace the desktop.
But, for me, it just goes to show you that this is a category of software that has become easy enough that about 85 lines of HTML is enough to demonstrate the basic concept.
