The Lazy Programmer Understands PE Files

Yesterday, I started to look at PE files. That’s essentially .dll, .exe, and so on, on the Windows operating system. There were a couple of goals to my little excursion. One was to exercise the recently minted serialization code. I figure if the code is good enough to deal with a PE file, it’s probably sufficient to tackle a few other tasks as well, like parsing various types of media files.

Another goal I had was to see how much could be expressed declaratively (writing a description), and how much has to be expressed imperatively (writing code). Thus far, the descriptive part of the serialization process has been fairly sufficient for things like internet protocol stuffing. Those protocols are fairly simple. They mostly consist of tightly packed bit fields and other base types, with the occasional varying length field. PE files are a whole different ballgame.
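To make the declarative versus imperative split concrete, here is a minimal sketch. It is not the serialization code discussed in this post. The field names and sizes follow the documented 20-byte COFF file header; the `COFF_HEADER` description format and the helpers `read_uint` and `read_record` are names invented for this illustration.

```lua
-- Hypothetical declarative description of the 20-byte COFF file header.
-- Each entry is { field name, size in bytes }; all fields are little-endian.
local COFF_HEADER = {
    { "Machine",              2 },
    { "NumberOfSections",     2 },
    { "TimeDateStamp",        4 },
    { "PointerToSymbolTable", 4 },
    { "NumberOfSymbols",      4 },
    { "SizeOfOptionalHeader", 2 },
    { "Characteristics",      2 },
}

-- Read an unsigned little-endian integer of `size` bytes at 1-based `pos`.
local function read_uint(data, pos, size)
    local value = 0
    for i = size, 1, -1 do
        value = value * 256 + data:byte(pos + i - 1)
    end
    return value
end

-- The one generic, imperative piece: walk a description and fill a table.
-- Returns the record and the position just past the last field.
local function read_record(description, data, pos)
    local record = {}
    for _, field in ipairs(description) do
        local name, size = field[1], field[2]
        record[name] = read_uint(data, pos, size)
        pos = pos + size
    end
    return record, pos
end

-- Usage with a hand-built header: machine 0x8664 (x64), 3 sections.
local bytes = string.char(0x64, 0x86, 0x03, 0x00) .. string.rep("\0", 12)
    .. string.char(0xF0, 0x00, 0x22, 0x00)
local header, next_pos = read_record(COFF_HEADER, bytes, 1)
print(string.format("Machine: 0x%04X, sections: %d",
    header.Machine, header.NumberOfSections))
```

A fixed-layout header like this fits a description nicely. The trouble starts with fields whose meaning depends on other fields, which is where plain code tends to creep back in.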

The challenge has to do with the complexity of what they represent. The .exe on disk represents the starting point of all the bits that your app will need to execute. The loader pulls stuff in, and doesn’t just memory map it into virtual memory. There are very specific alignments on page boundaries, fixups for various addresses, thunks to new addresses. It is typical to find virtual addresses in the file, which are not pointers to positions in the file, but rather represent where something will be in memory, once the loader loads everything up properly.
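Most of those addresses are stored as relative virtual addresses (RVAs), meaning offsets from wherever the image ends up being loaded. To find the matching bytes on disk you have to go through the section table. Here is a rough sketch of that translation. The function name `rva_to_offset` and the plain-table layout of `sections` are assumptions for illustration, with field names mirroring the section header fields, and the sample values are made up.

```lua
-- Map a relative virtual address (RVA) to a file offset using section headers.
-- `sections` is assumed to be a plain array of tables; real code would fill
-- it in from the section table of the file.
local function rva_to_offset(sections, rva)
    for _, s in ipairs(sections) do
        local first = s.VirtualAddress
        -- Only the part backed by raw data actually exists in the file.
        if rva >= first and rva < first + s.SizeOfRawData then
            return s.PointerToRawData + (rva - first)
        end
    end
    return nil -- header area, or memory with no bytes in the file
end

-- Made-up layout: sections aligned to 0x1000 in memory and 0x200 on disk.
local sections = {
    { Name = ".text",  VirtualAddress = 0x1000,
      SizeOfRawData = 0x1A00, PointerToRawData = 0x0400 },
    { Name = ".rdata", VirtualAddress = 0x3000,
      SizeOfRawData = 0x0800, PointerToRawData = 0x1E00 },
}
print(string.format("0x%X", rva_to_offset(sections, 0x3010)))
```

With this sample layout, RVA 0x3010 falls 0x10 bytes into `.rdata`, so it maps to file offset 0x1E10. Anything that lands between sections comes back as nil.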

Again, the nightmare here is in reading the various specs. One spec will show all the data formats, but not really talk much about how it all ties together, or will talk about it in a way that assumes you already know what they’re talking about (probably written by the original developers). Another source will talk about how things are put together, but will be circa 2001, or some such, so it won’t cover the modern stuff, like .NET sections.

At any rate, I was able to get it all together, and now I can read in a PE file, seeing all the sections, directories, pointers to code, and all that. Of course, this isn’t anything spectacular because you can already do all this with myriad tools all over the place, but, this was an exercise to see how hard it would be.
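For a sense of where the reading starts, here is a small sketch of the very first step: finding the PE header. It relies on two documented facts, that the file begins with the DOS "MZ" signature and that the 4-byte field at offset 0x3C (`e_lfanew`) holds the file offset of the "PE\0\0" signature. The helper names `read_u32` and `find_pe_header` are invented for this example and are not part of the browser shown later.

```lua
-- Read an unsigned little-endian 32-bit value at 0-based file offset `offset`.
local function read_u32(data, offset)
    local b1, b2, b3, b4 = data:byte(offset + 1, offset + 4)
    return b1 + b2 * 256 + b3 * 65536 + b4 * 16777216
end

-- Return the 0-based offset of the "PE\0\0" signature, or nil plus a reason.
local function find_pe_header(data)
    if #data < 0x40 or data:sub(1, 2) ~= "MZ" then
        return nil, "missing MZ (DOS) header"
    end
    local pe_offset = read_u32(data, 0x3C) -- the e_lfanew field
    if data:sub(pe_offset + 1, pe_offset + 4) ~= "PE\0\0" then
        return nil, "missing PE signature"
    end
    return pe_offset
end

-- Usage with a synthetic image: a 64-byte DOS header pointing at offset 0x80.
local image = "MZ" .. string.rep("\0", 0x3A)
    .. string.char(0x80, 0, 0, 0)
    .. string.rep("\0", 0x40)
    .. "PE\0\0"
print(find_pe_header(image))
```

For a real file, the same function could be fed the result of `io.open(filename, "rb"):read("*a")`. The COFF header, optional header, and section table follow right after the signature.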

Now that it’s in hand, there are actually some useful things that can be done. This is Lua after all, where things are much simpler than in any other environment 😉

So, how hard is it to get a look at all the symbols that are being imported by a particular executable?

local pefile = CreatePEBrowser(filename)
for name, entry in pairs(pefile.Directories.Imports) do
    print(name)
end

The same is true of exports. Similarly, the list of sections within the image:

for name, section in pairs(pefile.Sections) do
    print(name)
    print(string.format("\tVirtual Size: 0x%08X", section:get_VirtualSize()))
    print(string.format("\tVirtual Address: 0x%08X", section:get_VirtualAddress()))
    print(string.format("\tSize of Raw Data: 0x%08X", section:get_SizeOfRawData()))
    print(string.format("\tPointer to Raw Data: 0x%08X", section:get_PointerToRawData()))
    print(string.format("\tCharacteristics: 0x%08X", section:get_Characteristics()))
end

Basically, just view the entirety of the PE file as if it were a nicely behaved hierarchical Lua table. Of course, some behaviors can be added as well, like adding a section, or removing a section, or just changing a bit here and there.

There are plenty of Windows APIs to help you look at Image files. But, they only work on Windows, so on Linux you’re out of luck. Why would you want to look at PE files on Linux? Who knows, but with a tool like this it’s relatively easy. Also, having a tool like this makes it easier to reason about and understand what’s going on within an executable, or .dll, or .lib, or .fon, or .whatever. Without such a tool, it seems like so much black magic to me, and it just doesn’t have to be that way.

One other thing I discovered was an appreciation for how hard it is to keep programs safe these days. These PE files are full of little nooks and crannies within which a bit of nasty code could hide, and make itself known later. I can fully appreciate how hard it is to both discover viruses and prevent them from taking hold on a machine. It’s darned near impossible in fact. With all the thunking and rebasing that goes into making something run, it’s no wonder our machines slowly come to a crawl over a long period of usage.

And, oddly enough, this brings me back to the topic of the Existential Program. I wonder if a program could be packaged up in such a way that it knows its own boundary. It’s active and alive. It has an identity, and knows how to manage changes to itself. How could that be done?

Perhaps the program, when it is conceived, chooses an encryption algorithm and key, perhaps unique to each instance of the program. Then, at launch time, the program is responsible for launching itself, not allowing the system loader to do the work. All it’s handed is a chunk of memory, and the rest is up to the program.

There is a chicken and egg problem here of course. How can the first bit of bootstrap even start unless the system is involved in starting it? Hmmmm

At any rate, I think it is a problem for our age. My programs need identity if for no other reason than to protect themselves from unsavory alteration.
