What’s in a platform?
Posted: June 4, 2012
A couple of weeks back, I put out the LAPHLibs (see “Introducing LAPHLibs”).
There are a few reasons for having such a rudimentary library. The first is the realization that Lua does not really have what most would consider to be a standard library. Sure, it has built-in io, math, string, and table modules, and a handful of os calls, but that’s about it. It does not have anything like libc, which includes the kitchen sink.
This is not for lack of code, but rather a result of the specific philosophy of the Lua creators: small is beautiful, add in whatever you need, forget the rest. I happen to like this attitude, but it would be nice if there were an equivalent of libc readily available for LuaJIT.
LuaJIT can be used to do very good high-speed programming. When I say this to people, the typical response is “Isn’t Lua used for game programming glue?” It’s an odd response to me, because you’d think that something good enough for gaming would be good enough for most applications. The problem is that when Lua is used for gaming, it’s not used to drive the primary game loop, or the physics, or whatever. It’s typically used to change some configurations, or perhaps construct a character or simple maps. But I think LuaJIT kind of changes the game.
Dynamic languages are not evil and slow in and of themselves. It’s really easy to write extremely wasteful code in dynamic languages, because they go out of their way to make things convenient. But you can also do really great work with them, with a little bit of forethought.
One of the areas that I’ve been focusing on is the ‘zero allocations’ path of things. Simply put, a routine is “zero allocation” if within the routine itself, it does not allocate any memory, and thus does not increase pressure on the GC. Recently, I went down the path of creating a tiny little XML Lexer. I won’t call it a “parser” because there’s so much it does not do with respect to the XML standard. But, XML is fairly simple. It’s just angle brackets and slashes after all…
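To make the “zero allocation” idea concrete, here is a minimal sketch in plain Lua. It is not code from LUXL; `next_word` and `is_space` are hypothetical helpers invented for illustration. The routine scans a string with `string.byte` and reports what it found as an offset and a length, so it should not create any new strings or tables along the way.

```lua
-- Hypothetical helpers, not part of LUXL or the LAPHLibs.
local byte = string.byte
local SPACE, TAB, LF, CR = 32, 9, 10, 13

local function is_space(c)
  return c == SPACE or c == TAB or c == LF or c == CR
end

-- Find the next run of non-space characters in `s`, starting at the
-- 1-based index `start`. Returns the offset and length of the run,
-- or nil when nothing is left. Only numbers are returned.
local function next_word(s, start)
  local n = #s
  local i = start
  -- skip leading whitespace
  while i <= n and is_space(byte(s, i)) do
    i = i + 1
  end
  if i > n then return nil end
  local first = i
  -- advance to the end of the word
  while i <= n and not is_space(byte(s, i)) do
    i = i + 1
  end
  return first, i - first
end

-- Usage: walk the words without ever building a substring.
local text = "zero  allocation path"
local offset, size = next_word(text, 1)
while offset do
  -- the caller decides if and when to pay for text:sub(offset, offset + size - 1)
  offset, size = next_word(text, offset + size)
end
```

An allocating version would return `s:sub(first, i - 1)` instead, creating a new string for every word whether or not the caller needs it.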
At any rate, if you’ve done any XML over the years, you might have found things like msxml, system.xml, sax, expat, etc. Also, you may find the likes of PicoXML or Tiny XML (several variants). Pico XML is particularly interesting because it is less than 600 lines of code, most of which are taken up by a state transition table and some character arrays.
Some basics. I wanted zero allocations, and I wanted a pull model parser. By “pull model”, I mean the code that is using the parser calls “GetNext()” to move from state to state, pulling lexemes out of the stream. I like this model because it’s easy to control, and it’s easy to turn it into a push if you feel like it.
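As a rough illustration of how a pull model turns into a push model, here is a sketch built around a toy lexer that splits a string on commas. None of it is LUXL code: `Puller`, its `GetNext()` method, and `push` are hypothetical names chosen for this example. As in the description above, `GetNext()` only reports an offset and a size.

```lua
-- Hypothetical toy lexer, not LUXL: splits a string on commas.
local Puller = {}
Puller.__index = Puller

function Puller.new(s)
  return setmetatable({ src = s, pos = 1 }, Puller)
end

-- Pull model: the caller asks for the next field when it wants one.
-- Returns the offset and size of the field, or nil at end of input.
function Puller:GetNext()
  local n = #self.src
  if self.pos > n + 1 then return nil end
  local first = self.pos
  local comma = string.find(self.src, ",", first, true)
  local last = comma and (comma - 1) or n
  self.pos = last + 2
  return first, last - first + 1
end

-- Push model layered on top: run the loop here and hand each
-- lexeme to a caller-supplied handler.
local function push(puller, handler)
  local offset, size = puller:GetNext()
  while offset do
    handler(offset, size)
    offset, size = puller:GetNext()
  end
end

-- Usage: collect the fields through the push interface.
local src = "a,bb,c"
local fields = {}
push(Puller.new(src), function(offset, size)
  fields[#fields + 1] = src:sub(offset, offset + size - 1)
end)
-- fields now holds "a", "bb", "c"
```

Going the other way, from push to pull, generally needs coroutines or buffering, which is one reason to start with the pull interface.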
As far as the API is concerned, I want to be able to feed it a buffer that comes from somewhere, and just have it start moving. Lastly, I want the ability to plug in some callback routines, either to handle errors or just to watch the tokenization process in action.
What I ended up with is called LUXL and fits in about 300 lines of Lua code. LUa Xml Lexer (LUXL) is used like this:
    local buf = strdup(somexmlstring);
    local len = strlen(buf)
    local xlex = luxl.new(buf, len);

    for event, offset, size in xlex:Lexemes() do
        local txt = GetString(buf, offset, size);
        print(string.format("[%s] '%s'", pico_event_str(event), txt));
    end

Basically, create an instance of the lexer, handing it the source buffer. Then get the lexeme iterator, Lexemes(), and start iterating, using the standard Lua iterator pattern.
The Lexemes() routine is just a convenient wrapper on the underlying GetNext() call. The iterator looks like this:
    function luxl:Lexemes()
        return function()
            local event, offset, size = self:GetNext();
            if(event == PICO_EVENT_END_DOC) then
                return nil;
            else
                return event, offset, size;
            end
        end
    end
This is nice because it makes XML a data type that is easily integrated into the Lua world, without much fuss. It’s not perfect, and it won’t catch a lot of XML cases. It does well enough to deal with parsing the likes of an .xsd file, or AMF, or any number of random simple configuration files.
Being part of the LAPHLibs, it does not have any external dependencies. It joins a host of routines that deal with hashes, strings, bit banging, and memory streams. In the days of old, libc was very useful, and the basis for a ton of applications. In the good new days, network awareness, encryption, and high-bandwidth asynchronous processing are the norm. In order to deal with that world, there are certain core routines that need to exist, and they essentially define a new “platform”. The LAPHLibs provide some fundamentals to me as a programmer, and are the basis of my new platform.