The Lazy Programmer looks at PE files
Posted: May 3, 2012
I thought I’d try to apply my bit twiddling skills to an interesting problem. I have reason to look at PE files, or ELF, or Mach-O, and do some interesting stuff based on what I find there. My experience dealing with these executable file formats is fairly limited. I probably knew about the inner workings of ELF back in the day, but I’ve never had any real need to know the depths of PE files.
How hard could it be? It’s just a bunch of bits and bytes, right?
The hardest part of a challenge like this is finding a good source of information on the subject. In the case of the PE file format, I started by getting the Microsoft PE File Format Specification. You have to accept a license agreement before you can download it. That document is full of information. It’s a bit cryptic in places though. It doesn’t spell out in absolute detail what the DOS header up front looks like. It just says “yah, it’s there, and it should start with ‘MZ’.”
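To make the “MZ” part concrete, here is a minimal sketch in plain Lua of what the front of the file has to look like. It assumes the whole file has already been read into a Lua string; the name findCoffHeader is made up for illustration and is not part of any library. It relies on two facts about the format: the 4-byte little-endian field e_lfanew lives at offset 0x3C of the DOS header, and it points at the “PE\0\0” signature.

```lua
-- Offset of the e_lfanew field inside the DOS header.
local E_LFANEW_OFFSET = 0x3C

-- Hypothetical helper: returns the 0-based file offset of the COFF header,
-- or nil plus an error message when the buffer does not look like a PE file.
local function findCoffHeader(buff)
    if #buff < E_LFANEW_OFFSET + 4 or buff:sub(1, 2) ~= "MZ" then
        return nil, "not an MZ executable"
    end
    -- e_lfanew is a 4-byte little-endian value (Lua strings are 1-based).
    local b1, b2, b3, b4 = buff:byte(E_LFANEW_OFFSET + 1, E_LFANEW_OFFSET + 4)
    local peOffset = b1 + b2 * 256 + b3 * 65536 + b4 * 16777216
    if buff:sub(peOffset + 1, peOffset + 4) ~= "PE\0\0" then
        return nil, "missing PE signature"
    end
    return peOffset + 4   -- the COFF header follows the 4-byte signature
end
```

Given the contents of a real executable in a string, findCoffHeader(contents) should hand back the offset where the COFF header begins.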
Like all things modern, I turned to the internet to get the real skinny on the format in question. I found one really good tutorial on the PE format, presented by CodeBreakers Magazine. This one is really interesting because it goes into intricate detail about what’s in the file, how it’s used, and how viruses are created and stashed away. The only challenge it has is that it doesn’t reflect all forms of the PE file format. It was written in 2006, so it’s missing some information. Still, it makes for a good companion to the MS document because it actually explains everything.
As well as having these raw reference materials, I also found a couple of useful tools. First was HexEdit. This is one of those simple file viewers that allows you to see the file in hex form, which is great when you’re trying to confirm what you’re reading. Another very useful tool is PEBrowse Professional. This tool parses the PE file and shows you all the detailed information you could care to look at. It includes a disassembler, and can read TYPELIB information if it’s present. That’s great for dealing with COM interfaces.
The way I got started was to define the metadata for the important data structures that can be found in the file. This includes things like the DOS header, the COFF header, and the PE32 and PE32+ optional headers. The stream of code that deals with the first few pieces of information looks like this:
    local buff, bufflen = copyFileToMemory("HeadsUp.exe")
    local offset = 0

    -- The DOS header sits at the very start of the file
    local dosinfo = IMAGE_DOS_HEADER(buff, bufflen, offset)
    printDOSInfo(dosinfo)
    offset = offset + dosinfo.ClassSize

    -- e_lfanew points at the 4 byte signature in front of the COFF header
    local ntheadertype = MAGIC4(buff, bufflen, dosinfo:get_e_lfanew())
    offset = ntheadertype.Offset + ntheadertype.ClassSize

    local fileHeader = COFF(buff, bufflen, offset)
    printCOFF(fileHeader)
    offset = offset + fileHeader.ClassSize

    -- Read the 2 byte magic for the optional header
    local pemagic = MAGIC2(buff, bufflen, offset)
    local peheader = nil
    if IsPe32Header(pemagic) then
        peheader = PE32Header(buff, bufflen, offset)
    elseif IsPe32PlusHeader(pemagic) then
        peheader = PE32PlusHeader(buff, bufflen, offset)
    end
    printPEHeader(peheader)
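The listing doesn’t show what IsPe32Header and IsPe32PlusHeader do internally. As a rough guess at the idea, the sketch below classifies the optional header by its 2-byte magic, using the values the PE specification assigns (0x10b for PE32, 0x20b for PE32+). The function optionalHeaderKind is hypothetical and works on a plain Lua string rather than on the MAGIC2 object used above.

```lua
-- Optional header magic values from the PE specification.
local PE32_MAGIC     = 0x10b
local PE32PLUS_MAGIC = 0x20b

-- Hypothetical helper: classify the optional header that starts at the
-- given 0-based offset. Returns "PE32", "PE32+", or nil for anything else.
local function optionalHeaderKind(buff, offset)
    local lo, hi = buff:byte(offset + 1, offset + 2)
    if not hi then
        return nil                 -- buffer too short to hold the magic
    end
    local magic = lo + hi * 256    -- 16-bit little-endian value
    if magic == PE32_MAGIC then
        return "PE32"
    elseif magic == PE32PLUS_MAGIC then
        return "PE32+"
    end
    return nil
end
```

Returning nil for an unknown magic gives the caller a chance to bail out before trying to print a header that was never created.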
That’s a start. From here, the basics are in hand, and other bits and pieces from the file can be loaded in. Not bad, going from knowing nothing about PE files to being able to browse the basics. Of course, Windows gives you plenty of library routines which make this trivial work, but perhaps I want to be able to perform this little dance without using Windows at all.
The real work comes when the parsing is more involved, such as following offsets that are relative to a base address rather than to the start of the file, and so on and so forth.
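As an example of that kind of work, many values in a PE file are relative virtual addresses (RVAs), which are offsets from the image’s load address and have to be mapped back to file offsets through the section table. Here is a minimal sketch, assuming the section headers have already been parsed into Lua tables that use the field names from the spec. The function rvaToFileOffset and the sample section values are made up for illustration.

```lua
-- Translate an RVA into a file offset using a list of section records.
-- Each record carries the fields of the same name from a section header.
local function rvaToFileOffset(sections, rva)
    for _, s in ipairs(sections) do
        local span = math.max(s.VirtualSize, s.SizeOfRawData)
        if rva >= s.VirtualAddress and rva < s.VirtualAddress + span then
            local delta = rva - s.VirtualAddress
            if delta >= s.SizeOfRawData then
                return nil   -- inside the section, but not backed by file data
            end
            return s.PointerToRawData + delta
        end
    end
    return nil               -- no section contains this RVA
end

-- Made-up section table, purely for illustration.
local sections = {
    { VirtualAddress = 0x1000, VirtualSize = 0x1800,
      PointerToRawData = 0x400, SizeOfRawData = 0x1A00 },
    { VirtualAddress = 0x3000, VirtualSize = 0x500,
      PointerToRawData = 0x1E00, SizeOfRawData = 0x200 },
}

print(string.format("0x%X", rvaToFileOffset(sections, 0x1234)))  -- 0x634
```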
I like the ease of programming like this though. After rough data structures are defined, I’m left with just sliding them into the right positions along a buffer, and reading what I find there.
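One way to picture “sliding a structure along a buffer” is a table of field names and sizes that gets overlaid at whatever offset you choose. The sketch below does that for the 20-byte COFF file header, whose layout comes from the PE specification. It is not the code behind the COFF class used earlier; readStruct and COFF_FIELDS are hypothetical names, and the buffer is assumed to be a plain Lua string.

```lua
-- Field layout of the 20-byte COFF file header: { name, size in bytes }.
local COFF_FIELDS = {
    { "Machine", 2 },
    { "NumberOfSections", 2 },
    { "TimeDateStamp", 4 },
    { "PointerToSymbolTable", 4 },
    { "NumberOfSymbols", 4 },
    { "SizeOfOptionalHeader", 2 },
    { "Characteristics", 2 },
}

-- Overlay a field layout on the buffer at a 0-based offset. Returns a table
-- of decoded little-endian values plus ClassSize, the number of bytes read.
local function readStruct(fields, buff, offset)
    local result, pos = {}, offset
    for _, field in ipairs(fields) do
        local name, size = field[1], field[2]
        local value = 0
        -- Walk from the most significant byte down to the least.
        for i = size, 1, -1 do
            value = value * 256 + buff:byte(pos + i)
        end
        result[name] = value
        pos = pos + size
    end
    result.ClassSize = pos - offset
    return result
end
```

Advancing to the next structure is then just offset = offset + header.ClassSize, the same move the listing above makes after each header.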