Serialization 101

As soon as you want to communicate with something, you have to worry about how you’ll represent your data. If you look up binary serialization on the interwebs, you’ll get a laundry list full of history and methodologies. For a few years, I had the pleasure of pushing XML as the be-all and end-all data representation format. Nowadays, JSON is very popular, because it matches JavaScript, which is very popular.

The basic problem is this: I have data on my machine represented in some machine-readable form, and I want to communicate it somewhere else, possibly to a different architecture, and probably across the internet.

In the case of binary data, there is always the need for a very low-level serializer that can deal with reading and writing simple types, such as byte, int, float, double, short, and string. This lowest-level thing takes care of endianness, and of reliably converting between the wire format and the native format of the machine. I have implemented BitReader and BitWriter classes. They deal with byte arrays, so they are fairly agnostic as to how those byte arrays came to be.

BitReader
  ReadByte(bytes)
  ReadInt16(bytes)
  ReadUInt16(bytes)
  ReadInt32(bytes)
  ReadUInt32(bytes)
  ReadInt64(bytes)
  ReadSingle(bytes)
  ReadDouble(bytes)

BitWriter
  WriteByte(bytes, value)
  WriteInt16(bytes, value)
  WriteUInt16(bytes, value)
  WriteInt32(bytes, value)
  WriteUInt32(bytes, value)
  WriteInt64(bytes, value)
  WriteUInt64(bytes, value)
  WriteSingle(bytes, value)
  WriteDouble(bytes, value)
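
To make the endianness point concrete, here is a minimal sketch of what such low-level helpers might look like in plain Lua. It is not the actual BitReader/BitWriter code: the function names, the explicit offset parameter, the little-endian byte order, and the use of an ordinary Lua table as the byte array are all assumptions made for illustration.

```lua
-- Hypothetical helpers, not the author's BitReader/BitWriter.
-- `bytes` is a plain Lua table of byte values (0..255); offsets are 1-based.

local function writeUInt16LE(bytes, offset, value)
    bytes[offset]     = value % 256                     -- low byte first
    bytes[offset + 1] = math.floor(value / 256) % 256   -- then high byte
end

local function readUInt16LE(bytes, offset)
    return bytes[offset] + bytes[offset + 1] * 256
end

local function writeInt32LE(bytes, offset, value)
    -- Map negative values onto their two's complement bit pattern.
    if value < 0 then value = value + 4294967296 end
    for i = 0, 3 do
        bytes[offset + i] = value % 256
        value = math.floor(value / 256)
    end
end

local function readInt32LE(bytes, offset)
    local value = 0
    for i = 3, 0, -1 do
        value = value * 256 + bytes[offset + i]
    end
    -- Anything with the top bit set is a negative number.
    if value >= 2147483648 then value = value - 4294967296 end
    return value
end

local buf = {}
writeUInt16LE(buf, 1, 0x1234)   -- stored as 0x34, 0x12
writeInt32LE(buf, 3, -2)        -- stored as 0xFE, 0xFF, 0xFF, 0xFF
```

Because the byte order is fixed by the helper rather than by the machine, the same bytes come back as the same values no matter which architecture reads them.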

This is all in the name of convenience, of course. You might notice that there are no Read/WriteString() methods. That’s because a string does not have a universal representation as a data type. Is it a list of ASCII characters followed by a terminating null, or is it UTF-8 encoded, or is it like a Pascal string with a leading length, followed by ASCII or something else? So, at this very lowest level, there is no concept of a string. But you do want to deal with strings, and that comes at the next level up.
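
The ambiguity is easy to see when the same text is laid out both ways. The two encoders below are hypothetical and exist only to show the byte layouts; nothing here is claimed to match what the stream-level WriteString actually emits.

```lua
-- Two hypothetical encodings of the same text, as tables of byte values.
-- (s:byte(1, -1) is fine for short strings; very long ones need a loop.)

-- C style: the characters followed by a terminating zero byte.
local function encodeCString(s)
    local out = { s:byte(1, -1) }
    out[#out + 1] = 0
    return out
end

-- Pascal style: a one-byte length, then the characters (255 bytes at most).
local function encodePascalString(s)
    assert(#s <= 255, "string too long for a one-byte length prefix")
    return { #s, s:byte(1, -1) }
end

local c = encodeCString("hi")        -- { 104, 105, 0 }
local p = encodePascalString("hi")   -- { 2, 104, 105 }
```

Both results are three bytes long, yet a reader expecting one layout will misread the other, which is why the choice has to be made explicitly at a higher level.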

Bit stuffing and unstuffing is very low level and fundamental. At a higher level, you need to deal with the actual buffers, and move pointers along in the buffer that you’re reading and writing. This is where the BinaryStreamReader/Writer comes in.

Similar to the low level bit reader/writer pair, these stream versions add some convenience. Here is an example of a reader and writer in action:

function test_Int()
    local len = 1024;
    local bytes = Array1D(len, "uint8_t");
    local writer = BinaryStreamWriter.CreateForBytes(bytes,len);

    writer:WriteInt16(32);
    writer:WriteInt32(958);
    writer:WriteInt16(2301);
    writer:WriteInt32(23);

    local reader = BinaryStreamReader.CreateForBytes(bytes, len);
    assert(reader:ReadInt16() == 32);
    assert(reader:ReadInt32() == 958);
    assert(reader:ReadInt16() == 2301);
    assert(reader:ReadInt32() == 23);
end

First, I create a buffer of sufficient size using Array1D. Then I create a binary stream writer on top of the buffer. The writer just keeps track of where we are, and whether we’ve written off the end of the buffer. Every call to Write essentially passes through to the BitWriter.

The same is true of the binary stream reader. As the reader and writer are separate objects, you could actually create both on the same buffer, at the same time, with no problem, other than worrying about whether you’re overwriting something you don’t want to.
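
As a rough illustration of the bookkeeping involved, here is a small sketch of a writer and a reader that each keep their own position over a shared buffer. The Writer and Reader names, the method names, the plain-table buffer, and the choice to return false or nil at the end of the buffer are all assumptions for this example, not the behaviour of BinaryStreamWriter or BinaryStreamReader.

```lua
-- Hypothetical cursor-tracking writer and reader over a shared byte table.

local Writer = {}
Writer.__index = Writer

function Writer.new(bytes, size)
    return setmetatable({ bytes = bytes, size = size, pos = 0 }, Writer)
end

function Writer:writeByte(value)
    if self.pos >= self.size then return false end   -- would run off the end
    self.pos = self.pos + 1
    self.bytes[self.pos] = value % 256
    return true
end

function Writer:writeUInt16(value)
    if self.pos + 2 > self.size then return false end -- all or nothing
    self:writeByte(value % 256)                       -- low byte first
    self:writeByte(math.floor(value / 256))
    return true
end

local Reader = {}
Reader.__index = Reader

function Reader.new(bytes, size)
    return setmetatable({ bytes = bytes, size = size, pos = 0 }, Reader)
end

function Reader:readByte()
    if self.pos >= self.size then return nil end
    self.pos = self.pos + 1
    return self.bytes[self.pos]
end

function Reader:readUInt16()
    if self.pos + 2 > self.size then return nil end
    local lo = self:readByte()
    local hi = self:readByte()
    return lo + hi * 256
end

-- Both objects share one three-byte buffer but advance independently.
local bytes = {}
local writer = Writer.new(bytes, 3)
local reader = Reader.new(bytes, 3)

local ok1 = writer:writeUInt16(2301)   -- fits in the buffer
local ok2 = writer:writeUInt16(7)      -- only one byte left, so refused
local first = reader:readUInt16()
```

Each object holds nothing but a reference to the buffer and its own position, which is why a reader and a writer can coexist on the same memory.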

This is where the convenience methods for strings are implemented as well:

function test_string()
    local len = 1024;
    local bytes = Array1D(len, "uint8_t");

    local writer = BinaryStreamWriter.CreateForBytes(bytes,len);
    writer:WriteString("this is a whole long string that I want");

    local reader = BinaryStreamReader.CreateForBytes(bytes, len);
    local str = reader:ReadString();
    print(str);
end

I was actually able to use this in my Targa loading code like this:

fHeader.IDLength = reader:ReadByte();
fHeader.ColorMapType = reader:ReadByte();
fHeader.ImageType = reader:ReadByte();
fHeader.CMapStart = reader:ReadInt16();
fHeader.CMapLength = reader:ReadInt16();
fHeader.CMapDepth = reader:ReadByte();

-- Image description
fHeader.XOffset = reader:ReadInt16();
fHeader.YOffset = reader:ReadInt16();
fHeader.Width = reader:ReadUInt16();
fHeader.Height = reader:ReadUInt16();
fHeader.PixelDepth = reader:ReadByte();
fHeader.ImageDescriptor = reader:ReadByte();

It may not look like much, and if you’re a user of the .NET Framework, you’ll be doing a big yawn about now, because this is the same pattern that’s existed in that world since roughly 2001. But this is Lua, and the standard libraries don’t support this, so here it is. There are various libraries that support one or another of the various serialization schemes, but they are typically meant to support that scheme only, and don’t necessarily generalize the binary reading and writing aspect of things.

Well, now that this little tool is in hand, other things can come. When I have data to send somewhere, it’s a fairly easy matter to write some serialization code that will take my various attributes, and use the binary stream writer to stuff them into a chunk of memory. Once there, it looks like a payload, and it can be delivered to any other interface that accepts chunks of memory. From the last article, I’ve shown that it can be a separate thread that receives this payload, or it could be a machine across the net. Once received, the binary stream reader is used to unpack stuff, and on you go.

The serialization process can be tedious and, when implemented by hand, error prone. Many frameworks try to help by writing serializers based on an abstract description of the data structure. I’m thinking that, with Lua, I don’t need another format to represent data structures. I don’t need to resort to XML, ASN.1, JSON or anything else. I can just describe things in Lua, either using a table with names and data types, or some other mechanism. From the same description, I can generate a serializer on the fly, and get it JIT’d, which will be a very good thing indeed. I could also generate serializers for other languages fairly easily if need be.
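
Here is one way the idea might be sketched: the record layout is an ordinary Lua table of names and types, and a small function turns it into a pack/unpack pair. Everything in it is hypothetical, including the codecs table, makeSerializer, the type names, and the little-endian unsigned layout. It builds closures rather than generating source code, which is a simplification of the "generate a serializer on the fly" idea.

```lua
-- Hypothetical field codecs: little-endian, unsigned only, to keep it short.
local codecs = {
    uint8 = {
        size  = 1,
        write = function(b, o, v) b[o] = v % 256 end,
        read  = function(b, o) return b[o] end,
    },
    uint16 = {
        size  = 2,
        write = function(b, o, v)
            b[o]     = v % 256
            b[o + 1] = math.floor(v / 256) % 256
        end,
        read  = function(b, o) return b[o] + b[o + 1] * 256 end,
    },
}

-- Build pack/unpack functions from an ordered list of { name, type } entries.
local function makeSerializer(description)
    local function packRecord(record, bytes)
        local offset = 1
        for _, field in ipairs(description) do
            local codec = codecs[field.type]
            codec.write(bytes, offset, record[field.name])
            offset = offset + codec.size
        end
        return offset - 1            -- number of bytes written
    end

    local function unpackRecord(bytes)
        local record, offset = {}, 1
        for _, field in ipairs(description) do
            local codec = codecs[field.type]
            record[field.name] = codec.read(bytes, offset)
            offset = offset + codec.size
        end
        return record
    end

    return packRecord, unpackRecord
end

-- The description is plain Lua data; field names are borrowed from the
-- Targa header purely as an example.
local imageSpec = {
    { name = "Width",      type = "uint16" },
    { name = "Height",     type = "uint16" },
    { name = "PixelDepth", type = "uint8"  },
}

local packRecord, unpackRecord = makeSerializer(imageSpec)
local payload = {}
local written = packRecord({ Width = 640, Height = 480, PixelDepth = 24 }, payload)
local decoded = unpackRecord(payload)
```

Since the description is the single source of truth for field order and size, the writing and reading sides cannot drift apart the way hand-written code can.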

At any rate, armed with this tool, putting together things like protocol headers becomes a snap. Then, doing something like a remote desktop app, where you’re sharing bits of a screen, also becomes routine. You just construct payloads big enough to hold your screen bits, and send them away.

Once things are in payloads, it becomes relatively easy to do other things such as compression and encryption. That might be useful if you’re trying to construct a secured communications channel. But, I digress…

