LJIT2Http_Parser – All the world’s a http stream

It’s no joke that Node.js has taken the world by storm.  Rightfully so.  It makes web programming just that much easier for some reason.  But what about Apache, and IIS, which have dominated this domain for the past decade, and then some?  Well, I guess there’s nothing like the changing technological landscape to change what’s in vogue.

In roughly 2000, Microsoft introduced the CLR, and ASP.net to the world.  One of the biggest claims to fame at the time was that your server pages would be “compiled” if you wrote them in one of the many .net languages.  At the time this was a big deal, and quite a boost/advantage over other forms of web server programming.  Given advances in both processor speed, and language development, this particular advantage has been eroded over time.  At that time, XML was intended to be the one “to rule them all”…

Then along came JavaScript on the server side.  More importantly perhaps, came JSON.  JSON is a data format that is almost a direct representation of javascript objects.  Unlike XML, it’s easily readable, and especially easily parseable if you so happen to be using javascript to do your programming.  Since javascript is now the lingua franca on the client side, it makes sense to exchange data using JSON.  Similarly, since the advent of things like the V8 javascript engine, it makes sense to use javascript on the server side as well.  Given that there’s javascript on both ends of the pipe, it makes sense to have a server side that totally embraces that language, and thus node.js makes a lot of sense.

At the core of node.js, there’s some low level http handling going on.  Oddly enough, this core is written in pure unadulterated C.  One of the key pieces of this core is the http_parser.  This bit of code is responsible for receiving the various requests off the wire, parsing them, and telling your service about headers, body, date/time, and the like.  The nitty gritty of dealing with the http protocol.  As a bonus, it has a nice little URL parser to boot.

Of course I’d want to be able to access the same from Lua, so I created a quick and dirty http_parser binding.  Although most users of node.js seem to be Linux users, I wanted to use it from Windows, so I provided a quick compile of the http_parser, so there’s a .dll as well, that is meant specifically for Windows consumption.

The http parser code is a couple thousand lines of code.  This might be considered ‘huge’ by the standards typically used to measure a Lua codebase, but really it’s rather tiny for what it does.  One of the greatest design choices the designers made was to not do any allocations at runtime.  Everything is buffer based.  That is, you hand it a buffer to parse, and you give it pointers to functions to be called when various parsing events occur.  Your callback functions are handed pointers and lengths into the original buffer, so nothing is copied along the way.  This works out just fine because any string manipulation code worth its salt these days can deal with a pointer and length.  No need for null terminators and the like.  As such, this is about as lean as http parsing can become, at least in terms of resource allocation.  The core of the routine is a couple of state machines that deal with all the odd funky corner cases that are in the http protocol.
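To make the pointer-and-length idea concrete, here is a minimal sketch in plain Lua. It is not the http_parser code: scan_headers and span_text are hypothetical names, and string offsets stand in for the C pointers. The scanner never copies anything; it only tells the callbacks where each header field and value sits in the buffer.

```lua
-- Scan header lines and report each field/value as (offset, length) into buf.
local function scan_headers(buf, on_field, on_value)
  local pos = 1
  while pos <= #buf do
    local line_end = buf:find("\r\n", pos, true)
    -- stop at the blank line that ends the headers, or on a truncated buffer
    if not line_end or line_end == pos then break end
    local colon = buf:find(":", pos, true)
    if colon and colon < line_end then
      on_field(pos, colon - pos)
      local vstart = colon + 1
      while buf:byte(vstart) == 32 do vstart = vstart + 1 end -- skip spaces
      on_value(vstart, line_end - vstart)
    end
    pos = line_end + 2
  end
end

-- A string is only materialized when the caller actually asks for one.
local function span_text(buf, span)
  return buf:sub(span[1], span[1] + span[2] - 1)
end

local buf = "Host: example.com\r\nAccept: */*\r\n\r\n"
local spans = {}
local function record(offset, length) spans[#spans + 1] = { offset, length } end
scan_headers(buf, record, record)
```

On the LuaJIT side of a real binding, the callback would receive a const char * and a length instead, and ffi.string(at, length) would play the role of span_text.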

Doing the LuaJIT binding is an interesting exercise in subsumption.  For the first stab at it, I just had to make the relevant API calls accessible, and there are only a handful of them.  Taking the next step, three of the functions simply return string values for enums.  Well, since I have to replicate those enums on the Lua side anyway, I took it one step further, and represented them in table form, which makes it easy to look up values, and print out the corresponding names.  Thus, the three functions can be fully supported on the Lua side, and the C code can be ignored.
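As a rough sketch of that table form, something like the following would do. The numeric values are an assumption: they follow the first few method entries as I recall them from http_parser.h, so check them against the header you actually build with. The "<unknown>" fallback and the name http_method_values are my own choices for illustration.

```lua
-- value -> name, mirroring a few entries of the C method enum (assumed values)
local http_methods = {
  [0] = "DELETE",
  [1] = "GET",
  [2] = "HEAD",
  [3] = "POST",
  [4] = "PUT",
}

-- Build the reverse mapping (name -> value) once, up front.
local http_method_values = {}
for value, name in pairs(http_methods) do
  http_method_values[name] = value
end

-- Lua-side stand-in for a C function that returns the name of an enum value.
-- tonumber() also accepts an FFI enum value when running under LuaJIT.
local function http_method_str(m)
  return http_methods[tonumber(m)] or "<unknown>"
end
```

With this in place, http_method_str(1) gives the name without ever crossing into C, and http_method_values.POST gives the number back.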

The http_parser code has a typical pattern, which I like very much.  The core calls take a pointer to a structure, which is maintained by the API itself.

void http_parser_init(http_parser *parser, enum http_parser_type type);

In this case, a pointer to the http_parser structure is being passed around. This is good for the Lua side, because I can easily turn the http_parser structure into a metatype, which allows me to group the various related methods, and program against it in a more object oriented way:

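local ffi = require("ffi")

-- Assumes the http_parser declarations (struct http_parser and the API
-- functions) have already been handed to ffi.cdef, and that the
-- HTTP_REQUEST / HTTP_RESPONSE / HTTP_BOTH constants are defined in Lua.
local lib = ffi.load("http_parser")

http_parser = nil
http_parser_mt = {
  __index = {
    -- allocate a fresh, zero-filled parser structure
    new = function()
      return ffi.new("struct http_parser")
    end,

    init = function(self, parser_type)
      -- brief sanity check: valid types run from HTTP_REQUEST to HTTP_BOTH
      if parser_type < HTTP_REQUEST or parser_type > HTTP_BOTH then return end

      lib.http_parser_init(self, parser_type)
    end,

    execute = function(self, settings, data, len)
      return lib.http_parser_execute(self, settings, data, len)
    end,

    pause = function(self, paused)
      lib.http_parser_pause(self, paused)
    end,

    should_keep_alive = function(self)
      -- the C function returns an int, and 0 is truthy in Lua
      return lib.http_should_keep_alive(self) ~= 0
    end,
  },
}
http_parser = ffi.metatype("struct http_parser", http_parser_mt)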

With this in hand, typical usage becomes:

local parser = http_parser()   -- the metatype doubles as the constructor
parser:init(HTTP_REQUEST)

-- create a http_settings object similarly, and set callbacks
-- then...
parser:execute(settings, buf, buflen)

I can go one step further and absorb the settings into the http_parser itself, so that callbacks are registered directly on the parser object, rather than through a separate settings object, at least from the perspective of the programmer.
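Here is one way that absorption might look, sketched in plain Lua so it runs without the native library. Everything in it is hypothetical: Parser, on, and fake_execute are names invented for illustration, and fake_execute is only a stand-in for the real call into lib.http_parser_execute with a settings struct filled in from the registered callbacks.

```lua
local Parser = {}
Parser.__index = Parser

-- execute_fn stands in for the native call; it receives the callback table,
-- the data, and its length, and returns the number of bytes consumed.
function Parser.new(execute_fn)
  return setmetatable({ callbacks = {}, execute_fn = execute_fn }, Parser)
end

-- Register a callback directly on the parser object.
function Parser:on(event, fn)
  self.callbacks[event] = fn
  return self -- allow chaining
end

function Parser:execute(data, len)
  return self.execute_fn(self.callbacks, data, len or #data)
end

-- Stand-in backend: finds the URL in a request line and fires on_url.
local function fake_execute(callbacks, data, len)
  local url = data:sub(1, len):match("^%u+ (%S+)")
  if url and callbacks.on_url then callbacks.on_url(url) end
  return len
end

local seen = {}
local request = "GET /index.html HTTP/1.1\r\n\r\n"
local p = Parser.new(fake_execute)
p:on("on_url", function(url) seen[#seen + 1] = url end)
local consumed = p:execute(request)
```

From the programmer’s point of view there is no settings object at all; the wrapper owns it and the parser is the only thing you touch.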

It’s always a tossup when doing object oriented programming. Are you doing it for the purity of the thing, for maintainability, or because it truly is the appropriate model for the task? In this case, I do it because it simplifies my mental model of what the parser is, and what it does. It just seems natural to me. Of course, in javascript, the node.js code does a similar thing.

There’s another thing going on here though. At 500 lines, the interop code is becoming as big as the original library code. The only part of the library that is not essentially replicated in the interop code is the state machine handling. That’s just two primary routines, one for the URL parsing, and one for the http parsing in general. And what’s so magical about these two routines? Nothing horribly special. They are basically parsers, and as such, they do a lot of examining of characters in fairly involved case/if/then/else statements, with some bit twiddling and logical anding and oring thrown in to boot. Well, ok, LuaJIT should be just as good for that sort of thing as any shouldn’t it be? The bit twiddling can easily be handled using bitops, although things might look a little clunky. The pointer handling can clearly be dealt with, as witnessed by my previous musings on serialization. So, if this bit of code can be rewritten in LuaJIT directly, should it be? Would there be any benefit?

The benefits of having a pure LuaJIT implementation of a high speed http_parser are interesting to contemplate. As it is, LuaJIT makes for an excellent extension mechanism to almost any programming task. As I see it, the benefit of having Lua as an embedded add-on vs JavaScript is obvious: the Lua runtime is tiny in comparison to any JavaScript engine I’ve encountered, and it was designed from the beginning to be embeddable. Having a Lua native http parsing capability that is easily embeddable into any application means that, given a decent sockets library, any application can easily become a REST driven application. Not just web services, but any application. That means the likes of the Microsoft Calculator could suddenly gain a REST interface. Wouldn’t that be fun?

At any rate, it’s just speculation. Those case statements are daunting, suggesting a nightmare of deeply nested if/then/else statements. Or could it all be easily represented with some LPEG, or simply by clever usage of some tables? Who knows, such thoughts are above my pay grade.
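For what it’s worth, the "clever usage of some tables" option can be sketched for a toy case. This is not a port of the http_parser state machine; it only handles a single request line, and the state names, classify, and parse_request_line are made up for illustration. The point is that the nested case statements collapse into a lookup of (state, character class) in a transition table.

```lua
-- Reduce each character to a coarse class.
local function classify(c)
  if c == " " then return "space" end
  if c == "\r" then return "cr" end
  if c == "\n" then return "lf" end
  return "other"
end

-- transitions[state][class] names the next state; a missing entry is an error.
local transitions = {
  method  = { other = "method",  space = "url" },
  url     = { other = "url",     space = "version" },
  version = { other = "version", cr = "eol" },
  eol     = { lf = "done" },
}

-- Returns a table of (offset, length) spans keyed by field name,
-- or nil plus a message on malformed or incomplete input.
local function parse_request_line(buf)
  local state, start = "method", 1
  local fields = {}
  for i = 1, #buf do
    local nxt = transitions[state][classify(buf:sub(i, i))]
    if not nxt then return nil, "unexpected character at " .. i end
    if nxt ~= state then
      -- leaving a field: record where it sat in the buffer
      if state ~= "eol" then fields[state] = { start, i - start } end
      start = i + 1
      state = nxt
      if state == "done" then return fields end
    end
  end
  return nil, "incomplete"
end

local fields = parse_request_line("GET /index.html HTTP/1.1\r\n")
```

Whether this approach holds up against all the funky corner cases of the real protocol is exactly the open question.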

