Serialization 103 – Getting the Fiddly Bits

Early in the evolution of computing, storage was a scarce commodity. As such, the earliest protocols and data structures packed things as tightly as possible. You can see this all the way down in machine instructions, but at a higher level, all you have to do is take a look at the layout of a modern protocol header:

typedef struct IPv6Header {
    uint32_t version : 4;
    uint32_t priority : 4;
    uint32_t flowlabel : 24;
    uint16_t payloadlength;
    uint8_t nextheader;
    uint8_t hoplimit;
    uint8_t source[16];
    uint8_t destination[16];
} IPv6Header;

This structure says that the field ‘version’ takes up the first 4 bits (0-3) of an int32, and priority the next 4 bits (4-7) and flowlabel takes the remaining 24 (8-31). If you were writing code in C, you’d naturally have access to these fields just as if they were integers, without any concern as to how they got stuffed into the structure. The compiler will just deal with the access.

Same goes for LuaJIT. For the most part, once you have declared such a CType, you can just access it, without a concern for how it was laid out in memory.

But how do you get the bits and bytes stuffed in there in the first place, if they’re coming off the wire?

Well, depending on how well behaved the data structure is, you might just be able to read the bytes off the wired, and copy them directly into a data structure of the appropriate type, and be done with it. Wash your hands, call it a day…

But, what about endianness? Is it least significant bit first, or most significant? Well now, you’ve gone and ruined your perfect day! As is often the case with networking protocols, the representation is “big endian”. No doubt a vestige of the machines dominant at the time of the birth of the internet (VAX? PDP?). Nowadays, we’re dominated by little endian machines (intel). So, there’s this little bit twiddling dance that often has to occur to get things in the right order. At least the internet specs are clear about endianness, at least for the ones that I’ve read. Images, not always so clear. Many times there is an implied bigendian, but it just depends on what architecture the software was first created for.

So, here I am in Lua, which barely has support for bit operations, and I need to do a lot of bit twiddling. Well, like any good programming effort, I start with the test harness. I need a couple of things. First, I want to be able to test whether a bit is on or off in a number, I also want to be able to set the value of a bit to on or off. Of course, with LuaJIT, the raw bitops are there. But, I can never rember how to count by two, and in particular how to deal with the offsets, and which symbol to use for the power, etc. First step then is to create the functions for my lazy programming:

function isset(value, bit)
    return band(value, 2^bit) > 0

function setbit(value, bit)
    return bor(value, 2^bit)

function clearbit(value, bit)
    return bxor(value, 2^bit)

I’m sure these routines have been written 1000 times on numerous platforms, and probably in Lua as well. But, since they’re not in the standard Lua library, here are my versions.

The next thing I need is the ability to translate from binary string representation of a number to an actual number, and visa versa. To wit:

function numbertobinary(value, nbits)
    nbits = nbits or 32
    local res={}
    for i=nbits-1,0,-1 do
        if isset(value,i) then
            table.insert(res, '1')
            table.insert(res, '0')
    return table.concat(res)

function binarytonumber(str)
    local len = string.len(str)
    local value = 0

    for i=0,len-1 do
        if str:sub(len-i,len-i) == '1' then
            value = setbit(value, i)
    return value

Perhaps not the most elegant of solutions, but they get the job done, and I can move on to more interesting parts of the bit twiddling.

function getbitsvalue(src, lowbit, bitcount)
    lowbit = lowbit or 0
    bitcount = bitcount or 32

    local value = 0
    for i=0,bitcount-1 do
        value = bor(value, band(src, 2^(lowbit+i)))

    return rshift(value,lowbit)

And a little test program to convince myself things are working properly:

function test_getbitsvalue()
    print(getbitsvalue(0, 0, 8))
    print(getbitsvalue(3, 0, 8))
    print(getbitsvalue(6, 1, 8))
    print(getbitsvalue(0xff, 0, 8))

    local bin1 = "11000000"
    local n1 = binarytonumber(bin1)

    print("Binary 3: ", getbitsvalue(n1, 6, 2))

The test program shows how valuable it is to be able to create a human readable bit parttern and then turn that into a number that we can get bits out of. I can construct any manner of bit patterns, and then call on getbitsvalue (after converting to a number) and see if I get out what I expected to get out.

The astute reader will say “hay! you’re just doing little endian, wasn’t the point to be able to deal with big endian as well?”. So true. “left as an exercise for the reader”, but essentially, just reverse bits in the ‘getbitsvalue()’ routine. Throw in a check for big/little and you’re on your way. Although, this is a fairly low level routine, you have to check the specs of what it is that you’re trying to read off the wire. It might be a bitfield that is 4 bits. Are those 4 bits in little or big endian order? I don’t know, and neither does anyone else unless they read the spec. This is most applicable when you’ve got a set of bytes, and you go through the bytes one by one pulling out the bits that you want. If you’ve already read 4 bytes in as an int, then you have to think about whether you need to reverse things or not. You could use bit.swap() first, to get things in the proper order. But, close reading of the spec will tell you which way to go with these raw tools.

At any rate, armed with this little bit of twiddling, I can go back to my stream reader, and incorporate the reading of bitfields. That’s a big relief, because it means I will in fact be able to easily utilize funky protocols like IP to talk between both threads on a local machine, and across the internet.


One Comment on “Serialization 103 – Getting the Fiddly Bits”

  1. […] Serialization 103 – Getting the Fiddly Bits […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s