On the insidious deception of simplicity

I find that in programming, there is much tribal knowledge. This is true for every framework, language, OS, blogosphere, and the like. As such, it is the hapless seeker of knowledge who will often fall prey to false assumptions, miscalculations, missteps, and generally an unhappy time of it.

Case in point. A while back, I was working on schedlua, and it came time to add epoll to get some async socket action going.  Well, if you browse the interwebs, looking for things related to epoll, you’ll eventually come across ‘struct epoll_event’.  As I am usually working in some ‘binding’ environment, I need to define exactly what this data structure is.

Here is a typical answer:

struct epoll_event {
    uint32_t events; /* epoll events (bit mask) */
    epoll_data_t data; /* User data */
};

Well that seems easy enough. When I first played with epoll, it was on a Raspberry Pi, and I probably used such a simple definition. I probably copied it from some other code, probably didn’t think much about it at all.

Then, when I did schedlua, things didn’t turn out so well. I used the same simple definition, and somehow I found my data structures were being overwritten, or just had wrong values. What gives? Well, it was actually pointed out to me by someone in the LuaJIT community (Mike Pall) that this particular data structure is defined thus:

typedef union epoll_data {
	void *ptr;
	int fd;
	uint32_t u32;
	uint64_t u64;
} epoll_data_t;

struct epoll_event {
	uint32_t events;
	epoll_data_t data;
}
#ifdef __x86_64__
__attribute__ ((__packed__))
#endif
;

That is, specifically when you’re talking about x86_64 machines, you have to use that packed attribute so that the structure ends up having the same layout as it would have on a 32-bit x86 machine: 12 bytes, with the data member at offset 4, rather than the 16 bytes that natural 64-bit alignment would produce. I guess the Linux kernel itself cares about this alignment for this particular case.

Of course, if I weren’t a lazy programmer, I would have easily seen this if I examined the particulars in the header file it comes from (/usr/include/sys/epoll.h).

The Windows OS is littered with all sorts of this kind of goodness due to its long legacy of compatibility (all the way back to 16-bit computers). It’s just one of those things you have to be aware of.

More recently I’ve been working on the LJIT2libc project. As I wrote in: LJIT2libc – LuaJIT IS “batteries included”, the luajit program, as compared to the liblua51 library, provides access to all of libc and libm. The only catch is that you need the ffi.cdef definitions available to use it.  Well, that’s a heck of a lot of definitions!!  But, I am naive and undaunted by the enormity of the task, so I embarked…

Talk about esoteric tribal knowledge!  Do yourself a favor and browse through the code of one of these ‘libc’ libraries some time.  I used musl as my guide, because it’s modern and fairly well written in my opinion.  At the very least, you can gain an appreciation for the magic and seamless masking of platform specifics this library presents, by browsing the header file structure.  As libc is utilized on multiple different platforms, architectures, OSs, and the like, it has to cover the differences between all of those, and present something that is relatively the same for the programmer who chooses to utilize the library.  My epoll data structure is but one example.  Here are some more esoteric ones:

When you do something as simple as: #include <limits.h> in your C program, what do you get?

On 32-bit ARM, you get some standard stuff, but the definition of max integer values looks like this:

#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
 || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
#define PAGE_SIZE 4096
#define LONG_BIT 32
#endif

#define LONG_MAX  0x7fffffffL
#define LLONG_MAX  0x7fffffffffffffffLL

And if you’re dealing with a 64-bit platform, you’re likely to get something like this:

#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
 || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
#define PAGE_SIZE 4096
#define LONG_BIT 64
#endif

#define LONG_MAX  0x7fffffffffffffffL
#define LLONG_MAX  0x7fffffffffffffffLL

Did you blink? Did you catch that change? LONG_MAX isn’t always LONG_MAX. The same is true for the language-specific ‘size_t’. How big are these? What’s their range? It depends.

So, for something as simple as integer values and ranges, that tribal knowledge comes in handy. Of course the well-groomed programmer wouldn’t make mistakes related to any assumptions around the range of these values, but the poor lazy slobs, such as myself, who don’t know all the details, will make their own assumptions, and the bugs will come… eventually.

Doing LJIT2libc gives me an appreciation for how hard it is to actually create a library such as this to satisfy as broad an audience as it does. If I can match the headers in detail, then LJIT2libc will be applicable to all platforms where luajit lives. That’s probably a good thing.

Pursuing this path for libc makes me appreciate even more the work of Justin Cormack in creating ljsyscall.  I first leveraged ljsyscall back when I was doing LuaJIT on the Raspberry Pi.  Back then it was a matter of getting some ioctl calls right so I could read from joysticks, keyboards, and mice.  ljsyscall had all that covered.  What makes it so much more special, and cool for lazy programmers such as myself, is that it covers a wide swath of system programming.  Whereas libc tries to make a standard set of library functions available to multiple environments, ljsyscall tries to make those multiple environments look relatively the same.  For example, it provides a programming interface that is the same across Linux, FreeBSD, rump kernels, and other forms of unices that are similar, but each with their own tribal knowledge.  Quite a feat I’d say, and something I can really appreciate.  For me, ljsyscall has become the face of ‘UNIX’, at least when you’re programming using LuaJIT.

And so it goes.  It’s the small and innocuous which will trip you up time and time again.  Getting the smallest details right at the very start, sussing out that tribal knowledge, checking those facts again and again, are what will keep the simple things simple, and allow you to build cathedrals on granite rather than sand.
