Serialization 105 – The Lazy Programmer shares data

Serialization Series

And finally, to conclude this little serialization excursion, I want to create an application that is a collaborative whiteboard. All the application needs to do is receive a series of drawing commands from multiple participants and execute those commands to render something on each participant’s screen. Ideally, the code ends up doing something like the following:

local commands = {
    MoveTo({x=10, y=10}),
    LineTo({x=20, y=20}),
    LineTo({x=30, y=30}),
    LineTo({x=10, y=10}),
}

NetworkInterface:Send(commands)

And magic happens!

Well, that’s pretty much what can be done now that all these various bit-slinging pieces have been assembled. A couple of enhancements had to be made along the way, though. Here is what that MoveTo command looks like:

EMRTypes[EMR_MOVETOEX] = {
    name = "MoveTo";
    fields = {
        {name="id", basetype="int32_t", default= EMR_MOVETOEX},
        {name="x", basetype="int32_t"},
        {name="y", basetype="int32_t"},
    };
}

I have a bit of code that takes this description and turns it into a little class that handles all sorts of things. One of the new members in the field specification is a default value, which lets me have a value set on the structure as soon as it’s constructed. Here, the id field will have the value EMR_MOVETOEX, which is a number, without any further work on my part. So I could simply do this:

cmd = MoveTo()
network:send(cmd.DataPtr, cmd.BufferSize)

And what I’d get on the wire is int32(EMR_MOVETOEX), int32(0), int32(0).
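The generator itself isn’t shown here. As a rough sketch of the part that matters, computing each field’s offset and the total ClassSize and applying defaults at construction, something like the following would do. CreateClass and typeSizes are hypothetical names, only int32_t is handled, EMR_MOVETOEX is taken as 27 (its value in the Windows EMF format), and unlike the real thing, this version keeps field values in a Lua table instead of a byte buffer.

```lua
-- Hypothetical sketch of turning a field specification into a class.
-- Only int32_t is handled, and field values live in a plain Lua table
-- rather than a byte buffer, to keep the example short.
local EMR_MOVETOEX = 27                -- EMR_MOVETOEX in the Windows EMF format

local typeSizes = { int32_t = 4 }      -- byte size of each supported basetype

local function CreateClass(info)
  local class = { name = info.name, fields = {} }
  local offset = 0
  for _, f in ipairs(info.fields) do
    -- record each field's offset in declaration order, plus its default
    class.fields[#class.fields + 1] = {
      name = f.name,
      offset = offset,
      default = f.default or 0,        -- no default means zero-initialized
    }
    offset = offset + typeSizes[f.basetype]
  end
  class.ClassSize = offset             -- total bytes one instance occupies

  -- calling the class constructs an instance with every default applied
  return setmetatable(class, { __call = function(cls)
    local obj = {}
    for _, f in ipairs(cls.fields) do obj[f.name] = f.default end
    return obj
  end })
end

local MoveTo = CreateClass({
  name = "MoveTo";
  fields = {
    {name="id", basetype="int32_t", default=EMR_MOVETOEX},
    {name="x", basetype="int32_t"},
    {name="y", basetype="int32_t"},
  };
})

local cmd = MoveTo()
print(MoveTo.ClassSize, cmd.id, cmd.x, cmd.y)  --> 12  27  0  0
```

The 12-byte ClassSize is exactly the three int32 values listed above.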

That’s great, because on the receiving end I’ll first pick up the id field, which tells me which data structure to pull out of the list of data structures, and then I can deserialize the rest using that data structure:

cmd = network:receive()

Of course, the usual field accessors are available, because the autogen’d class has them, so:

cmd:get_x()
cmd:get_y()

These “commands” are just data; I did not send any behavior with them. What happens with the data is up to the recipient. The data could simply be logged, actual drawing commands could be executed, or the data could be routed to other recipients after sniffing out a couple of bits and pieces of information.
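Since the recipient only needs the leading id to know how to interpret the rest, both ends can be sketched in a few lines. This is plain Lua rather than the LuaJIT FFI, and it assumes little-endian int32 encoding (what x86 machines produce natively), EMR_MOVETOEX = 27 as in the Windows EMF format, and a hypothetical decoders table keyed by record id.

```lua
-- Sketch: a MoveTo command as three little-endian int32 values on the wire,
-- and a receiver that reads the id first to choose a decoder.
local EMR_MOVETOEX = 27

local function packInt32(v)
  local u = v % 4294967296            -- two's complement for negative values
  local b = {}
  for i = 1, 4 do
    b[i] = u % 256
    u = math.floor(u / 256)
  end
  return string.char(b[1], b[2], b[3], b[4])
end

local function unpackInt32(s, pos)
  local b1, b2, b3, b4 = s:byte(pos, pos + 3)
  local u = b1 + b2 * 256 + b3 * 65536 + b4 * 16777216
  if u >= 2147483648 then u = u - 4294967296 end
  return u
end

-- hypothetical registry: record id -> decoder for that layout
local decoders = {
  [EMR_MOVETOEX] = function(s)
    return { name = "MoveTo", x = unpackInt32(s, 5), y = unpackInt32(s, 9) }
  end,
}

local function receive(s)
  local id = unpackInt32(s, 1)        -- the id field always comes first
  return decoders[id](s)
end

local wire = packInt32(EMR_MOVETOEX) .. packInt32(0) .. packInt32(0)
local cmd = receive(wire)
print(#wire, cmd.name, cmd.x, cmd.y)  --> 12  MoveTo  0  0
```

A logging, drawing, or routing recipient would differ only in what it does with the decoded table.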

Sometimes you want to set values from the get-go, though, as in the opening example:

cmd = MoveTo({x=10, y=10})

This will allocate just enough space to hold the values for the class. At the same time, it will actually initialize the memory by calling the field setters with the specified values. That’s pretty neat I think.
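A minimal sketch of that idea follows. It assumes a hypothetical precomputed layout table and uses a plain Lua array of byte values in place of the FFI buffer: the constructor zero-fills ClassSize bytes, then routes each initial value (or the field’s default) through the generated set_ function, so the get_ accessors read back exactly what was written.

```lua
-- Sketch: a constructor that initializes memory through generated setters.
-- "Memory" is a Lua array of byte values standing in for an FFI buffer.
local function writeInt32(mem, off, v)       -- little-endian store
  local u = v % 4294967296                   -- two's complement for negatives
  for i = 1, 4 do
    mem[off + i] = u % 256
    u = math.floor(u / 256)
  end
end

local function readInt32(mem, off)           -- little-endian load
  local u = mem[off + 1] + mem[off + 2] * 256
          + mem[off + 3] * 65536 + mem[off + 4] * 16777216
  if u >= 2147483648 then u = u - 4294967296 end
  return u
end

-- hypothetical precomputed layout for MoveTo (byte offsets are 0-based)
local layout = {
  {name = "id", offset = 0, default = 27},   -- EMR_MOVETOEX
  {name = "x",  offset = 4},
  {name = "y",  offset = 8},
}

local MoveTo = { ClassSize = 12 }
MoveTo.__index = MoveTo

-- generate get_<field> / set_<field> accessors for each int32 field
for _, f in ipairs(layout) do
  MoveTo["set_" .. f.name] = function(self, v) writeInt32(self.mem, f.offset, v) end
  MoveTo["get_" .. f.name] = function(self) return readInt32(self.mem, f.offset) end
end

setmetatable(MoveTo, { __call = function(cls, init)
  local obj = setmetatable({ mem = {} }, cls)
  for i = 1, cls.ClassSize do obj.mem[i] = 0 end  -- just enough zeroed space
  for _, f in ipairs(layout) do
    local v = (init and init[f.name]) or f.default
    if v then obj["set_" .. f.name](obj, v) end   -- initialize via the setter
  end
  return obj
end })

local cmd = MoveTo({x = 10, y = 10})
print(cmd:get_id(), cmd:get_x(), cmd:get_y())  --> 27  10  10
```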

You can also control the memory allocation yourself, though:

local ffi = require("ffi")

local size = MoveTo.ClassSize * 10  -- space for 10 structures
local buff = ffi.new("char[?]", size)
local cmds = {}
for i=1,10 do
  local offset = (i-1) * MoveTo.ClassSize  -- each command gets its own slot
  table.insert(cmds, MoveTo(buff, offset, {x=i, y=i*2}))
end
sender:send(buff, size)

That would construct 10 move commands, all in the same preallocated buffer, and send it off somewhere. I think that’s pretty useful.
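The offset arithmetic behind that loop can be sketched without the FFI. This assumes a plain Lua array of byte values as the preallocated buffer and little-endian fields; the 12-byte ClassSize follows the MoveTo layout of three int32 fields, and EMR_MOVETOEX is taken as 27 (its Windows EMF value).

```lua
-- Sketch: ten MoveTo records written back to back into one preallocated
-- buffer; record i occupies bytes (i-1)*ClassSize up to i*ClassSize - 1.
local unpack = table.unpack or unpack        -- Lua 5.2+ / Lua 5.1 and LuaJIT
local EMR_MOVETOEX, ClassSize, count = 27, 12, 10

local function writeInt32(mem, off, v)       -- little-endian store
  local u = v % 4294967296
  for i = 1, 4 do
    mem[off + i] = u % 256
    u = math.floor(u / 256)
  end
end

local function readInt32(mem, off)           -- little-endian load (non-negative values)
  return mem[off + 1] + mem[off + 2] * 256 + mem[off + 3] * 65536 + mem[off + 4] * 16777216
end

local buff = {}
for i = 1, ClassSize * count do buff[i] = 0 end   -- allocate once, zeroed

for i = 1, count do
  local offset = (i - 1) * ClassSize         -- this record's slot
  writeInt32(buff, offset, EMR_MOVETOEX)     -- id
  writeInt32(buff, offset + 4, i)            -- x
  writeInt32(buff, offset + 8, i * 2)        -- y
end

local wire = string.char(unpack(buff))       -- the bytes a sender would transmit
print(#wire, readInt32(buff, 4), readInt32(buff, 116))  --> 120  1  20
```

Without the per-record offset, every command would land in the first 12 bytes and overwrite the previous one.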

Another way to see how this might be useful is to consider the typical network packet-building challenge.

For something like UDP, you have the following setup:

MAC Header
IP Header
UDP Header
Payload

That’s three headers and a payload. There are a couple of checksums in those headers as well, so it’s nice to have everything together for ease of calculation. Using these serialization techniques, you could allocate a single packet buffer, line up the various header structures at the appropriate offsets, and set your values. That’s useful because no copying has to occur along the way: no reallocation to prepend a header or any nonsense like that. If you’re doing something as silly as using IP to communicate between threads on the same machine, this is fairly useful, and it conserves resources.
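As a sketch of lining up the headers, assume an Ethernet II header with no VLAN tag (14 bytes), an IPv4 header with no options (20 bytes), and the 8-byte UDP header. Each header’s offset is then just the running sum of the ones before it. The checksum in the IPv4 header is the RFC 1071 Internet checksum, computed directly over the header’s bytes; the sample header below is a commonly used worked example, with its checksum field zeroed before computing. The names are illustrative.

```lua
-- Sketch: header offsets for an Ethernet II + IPv4 (no options) + UDP frame,
-- and the RFC 1071 Internet checksum computed over the IPv4 header bytes.
local ETH_LEN, IPV4_LEN, UDP_LEN = 14, 20, 8
local IP_OFF      = ETH_LEN                  -- 14
local UDP_OFF     = IP_OFF + IPV4_LEN        -- 34
local PAYLOAD_OFF = UDP_OFF + UDP_LEN        -- 42

-- one's complement sum of 16-bit big-endian words in s[first..last]
local function inetChecksum(s, first, last)
  local sum = 0
  for i = first, last, 2 do
    local hi, lo = s:byte(i, i + 1)
    sum = sum + hi * 256 + (lo or 0)         -- odd length: pad with a zero byte
  end
  while sum > 0xFFFF do                      -- fold carries back into 16 bits
    sum = (sum % 65536) + math.floor(sum / 65536)
  end
  return 0xFFFF - sum
end

-- example IPv4 header with its checksum field (bytes 11-12) zeroed
local hex = "450000730000400040110000c0a80001c0a800c7"
local hdr = hex:gsub("%x%x", function(h) return string.char(tonumber(h, 16)) end)
print(PAYLOAD_OFF, string.format("%04x", inetChecksum(hdr, 1, #hdr)))  --> 42  b861
```

Writing 0xb861 into bytes 11-12 makes the same computation over the header return zero, which is how a receiver verifies it.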

There are other interesting constructs that can be built up here. Take unions, for instance. A union is nothing more than two or more layouts thrown on top of the same chunk of memory, and that’s a no-brainer here: just define your layouts and apply them to the same chunk of memory. This comes up a lot when you’re dealing with IPv4 and IPv6 addresses. Which address type is it? Oh no, now I have to use that silly sockaddr_storage thing.
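In plain Lua, the union idea can be sketched as two decoding functions applied to the same 16 bytes, with a separate tag deciding which view is meaningful. The tag values and function names are hypothetical, not the operating system’s AF_INET/AF_INET6 constants or socket API.

```lua
-- Sketch of a "union": two read layouts applied to the same 16 bytes.
-- Which view is meaningful depends on a separate family tag (hypothetical values).
local FAMILY_V4, FAMILY_V6 = 4, 6

local function asIPv4(mem)         -- first 4 bytes as a dotted quad
  return table.concat({ mem:byte(1, 4) }, ".")
end

local function asIPv6(mem)         -- all 16 bytes as eight hex groups
  local groups = {}
  for i = 1, 16, 2 do
    local hi, lo = mem:byte(i, i + 1)
    groups[#groups + 1] = string.format("%x", hi * 256 + lo)
  end
  return table.concat(groups, ":")
end

local function addressToString(family, mem)
  if family == FAMILY_V4 then return asIPv4(mem) end
  return asIPv6(mem)
end

-- 192.168.0.1 stored in a 16-byte slot, remaining bytes zero
local mem = string.char(192, 168, 0, 1) .. string.rep("\0", 12)
print(addressToString(FAMILY_V4, mem))   --> 192.168.0.1
print(addressToString(FAMILY_V6, mem))   --> c0a8:1:0:0:0:0:0:0
```

Nothing is copied or converted up front; the tag only decides which layout to read through.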

There is another construct that I had to deal with, and that has to do with strings. Take the typical person data structure:

Person_Info = {
    name = "Person";
    fields = {
        {name = "First", basetype = "char", subtype="string", repeating = 20};
        {name = "Middle", basetype = "char", subtype="string", repeating = 20};
        {name = "Last", basetype = "char", subtype="string", repeating = 20};
        {name = "Age", basetype = "uint16_t"};
        {name = "City", basetype = "char", subtype="string", repeating = 32};
        {name = "State", basetype = "char", subtype="string", repeating = 32};
        {name = "Zip", basetype = "char", subtype="string", repeating = 10};
    };
};

There are a couple of things of note here. First of all, these fields are of a fixed size in a buffer. The strings are not pointers to pieces of memory allocated somewhere else, and they’re not tightly packed either: each one occupies its full width whether or not the value fills it. This is like a real fixed-size “record”. The basetype is “char”, which makes sense, but how do you really want to access this? Ideally, you’d want to do the following:

local p = Person();

p:set_First("William");
p:set_Middle("A");
p:set_Last("Adams");

Likewise, when you access the fields, you want Lua strings returned:

print("First: ", p:get_First())
print("Middle: ", p:get_Middle())
print("Last: ", p:get_Last())
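Because every field has a fixed width, the record’s layout can be computed once from the specification. Here is a sketch assuming C-style natural alignment, where each basetype’s alignment equals its size. The subtype entries are left out because they don’t affect layout, and computeLayout is a hypothetical helper.

```lua
-- Sketch: byte offsets for the fixed-size Person record under C-style natural
-- alignment. Assumes each basetype's alignment equals its size.
local sizes = { char = 1, uint16_t = 2 }

local Person_Info = {
  name = "Person";
  fields = {
    {name = "First",  basetype = "char", repeating = 20};
    {name = "Middle", basetype = "char", repeating = 20};
    {name = "Last",   basetype = "char", repeating = 20};
    {name = "Age",    basetype = "uint16_t"};
    {name = "City",   basetype = "char", repeating = 32};
    {name = "State",  basetype = "char", repeating = 32};
    {name = "Zip",    basetype = "char", repeating = 10};
  };
}

local function computeLayout(info)
  local offset, maxAlign, offsets = 0, 1, {}
  for _, f in ipairs(info.fields) do
    local align = sizes[f.basetype]
    if offset % align ~= 0 then                -- pad up to the field's alignment
      offset = offset + (align - offset % align)
    end
    offsets[f.name] = offset
    offset = offset + sizes[f.basetype] * (f.repeating or 1)   -- full width, used or not
    if align > maxAlign then maxAlign = align end
  end
  if offset % maxAlign ~= 0 then               -- pad the record so arrays stay aligned
    offset = offset + (maxAlign - offset % maxAlign)
  end
  return offsets, offset
end

local offsets, size = computeLayout(Person_Info)
print(offsets.Age, offsets.Zip, size)  --> 60  126  136
```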

Well, those “strings” are actually just bytes in a buffer. Normally, to convert to strings, you’d have to do something like:

ffi.string(charptr)

But that’s a pain. So the subtype="string" annotation is a bit of information that tells the serialization system that this field is really a null-terminated C-style string, and that setting and getting should treat it as such. When you set the value, it is copied into the buffer, as it should be, and a null terminator is put in place, so the field size must include room for that terminator (a 20-byte field holds at most 19 characters). Similarly, when you get the value out again, a Lua string is interned and returned to you. At that point you’re not pointing at anything within the buffer; it’s not a pointer to the start of the characters.
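A minimal sketch of those string semantics over a fixed-width field follows. It assumes a plain Lua array of byte values in place of the FFI buffer; setString and getString are hypothetical helpers, and the offsets used (0 and 20) are those of First and Middle in the Person record.

```lua
-- Sketch of subtype="string" semantics on a fixed-width char field.
-- mem is a Lua array of byte values standing in for the record's buffer.
local function setString(mem, offset, width, s)
  -- the terminator needs one byte, so at most width-1 characters fit
  assert(#s <= width - 1, "string too long for field")
  for i = 1, #s do mem[offset + i] = s:byte(i) end
  mem[offset + #s + 1] = 0                   -- null terminator
end

local function getString(mem, offset, width)
  local chars = {}
  for i = 1, width do
    local b = mem[offset + i]
    if b == 0 then break end                 -- stop at the terminator
    chars[#chars + 1] = string.char(b)
  end
  return table.concat(chars)                 -- a new Lua string, not a view into mem
end

local mem = {}
for i = 1, 40 do mem[i] = 0 end              -- zeroed bytes covering First and Middle
setString(mem, 0, 20, "William")             -- First
setString(mem, 20, 20, "A")                  -- Middle
print(getString(mem, 0, 20), getString(mem, 20, 20))  --> William  A
```

Rejecting an over-long value, rather than silently truncating it, is a design choice of this sketch.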

This feels natural; it just makes sense. If you actually do want a pointer to a particular field, just don’t give it the subtype ‘string’. Then, when you use the getter, you’ll get back a pointer and the size of the field:

ptr, size = p:get_Middle()

That way you’re free to do as you like, send it off somewhere, copy it, set the value, whatever. It’s just raw access within the underlying buffer at that point, with all the nastiness that entails.

Well, that’s enough for me to play with for now. It would be nice to add nested types, enums, and other things you typically find in type systems. Even as it stands, though, I was able to reduce the amount of code in my TargaReader, and I’m now able to look at packets coming off the wire, down to the lowest level of protocol.

If you’ve ever done any customized packet filtering with Wireshark, this might all seem very familiar. Wireshark uses Lua to do its network funny business. The only problem is that Wireshark does not provide a general library that others can use for whatever needs they might have. Its developers have created a set of routines that work very well within the context of the Wireshark environment. I wanted to generalize that sort of thing so that I could apply the techniques to anything from media parsing to protocol sniffing to collaborative apps.

And here it is. I am a lazy programmer, and this bit twiddling serializing stuff makes my life that much easier, and gives me that much more time to sit around and be lazy.

