Reading PE format using data marshaling in .NET

I recently picked up Serge Lidin’s book on IL, Expert .NET 2.0 IL Assembler. It is by far the most interesting book I’ve read on .NET internals, and it is fairly easy to follow thanks to the friendly nature of the managed environment and the awesomeness of the author.

If you want to satisfy your curiosity about disassembling and generating IL code, this is the best book on the market. Anyway, the first couple of chapters go over the Portable Executable (PE) format, which is vital to understanding how your compiler pieces together assemblies and how the loader resolves dependencies and gets everything into memory for execution.

While the book’s explanation of the PE format is enough to understand the bare minimum, I decided to go hands-on and write a little PE parser.

Before I throw down some code, I’d like to reference a few resources which will help in understanding the PE format better.

PE File Structure by Matt Pietrek. Possibly the best resource on the PE file format. I would recommend going over this reference before reading the rest of this article, or at least keeping it open side-by-side for quick reference.

PInvoke.Net provides a ton of unmanaged types and structures ported to managed code.

So, where do we begin? First of all, it is necessary to understand that the PE format consists of a multitude of unmanaged data structures (C structs). Generally, the best way to read an unmanaged structure in .NET is to use marshaling. Let’s take a look at the code to see how this is accomplished.

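private static T MarshalBytesTo<T>(BinaryReader reader)
{
    int size = Marshal.SizeOf(typeof(T));
    byte[] bytes = reader.ReadBytes(size);
    if (bytes.Length < size)
    {
        throw new EndOfStreamException("Not enough data left to read the structure.");
    }

    GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
    try
    {
        return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
    }
    finally
    {
        // Release the pin even if marshaling throws
        handle.Free();
    }
}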

The function above is generic: T is the type, in our case a struct, that the byte data will be marshaled into. This is probably pretty obvious, but the raw data we’re reading resides in the BinaryReader instance passed to the method. The first thing the method does is ask Marshal.SizeOf for the unmanaged size of T, or simply the size the type would have in an unmanaged environment.
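To make the idea of an "unmanaged size" concrete, here is a minimal sketch. FileHeaderSketch is a hypothetical stand-in with the same fields as IMAGE_FILE_HEADER, and UnmanagedSizeOf is a helper name made up for this illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for IMAGE_FILE_HEADER, used only for this illustration.
[StructLayout(LayoutKind.Sequential)]
public struct FileHeaderSketch
{
    public UInt16 Machine;               // 2 bytes
    public UInt16 NumberOfSections;      // 2 bytes
    public UInt32 TimeDateStamp;         // 4 bytes
    public UInt32 PointerToSymbolTable;  // 4 bytes
    public UInt32 NumberOfSymbols;       // 4 bytes
    public UInt16 SizeOfOptionalHeader;  // 2 bytes
    public UInt16 Characteristics;       // 2 bytes
}

public static class SizeSketch
{
    // Number of bytes the marshaller expects for T in its unmanaged form,
    // which is how many bytes a reader would need to supply for one T.
    public static int UnmanagedSizeOf<T>() where T : struct
    {
        return Marshal.SizeOf(typeof(T));
    }
}
```

Calling SizeSketch.UnmanagedSizeOf<FileHeaderSketch>() should report 20, the sum of the field sizes noted in the comments, since none of the fields needs padding.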

We now read that chunk of data from the binary reader. It’s important to note that what we’re reading here is the same type that we duped in C#, but in its unmanaged form. We’ve got our data in the byte array, now we need to marshal it to the managed type!

Here comes the interesting part. Marshal.PtrToStructure copies data from a raw memory address into a managed structure, so we need a pointer to our bytes. The problem is that the byte array is a managed object, which the Garbage Collector (GC) can move to another location whenever it sees fit (the beauty of the non-deterministic nature of .NET). In other words, a raw pointer to the array could end up pointing to an invalid memory location at any given time.

So, what’s the solution? The solution is to tell GC to “pin” our managed memory in place while we’re moving data. While this fixes our problem, there is a drawback: pinning can seriously affect the performance of your code, because it keeps GC from doing its magic with memory optimization around the pinned object. However, the good news is that we’re not marshaling a ton of data here, so we probably won’t see any performance hits.
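If pinning ever did become a concern, one alternative (not the approach used in this article) is to copy the bytes into unmanaged memory first, so the GC is never involved. The sketch below assumes a hypothetical two-field struct, DosPrefixSketch, and a made-up helper name, FromBytes.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct covering only the first two fields of the DOS header.
[StructLayout(LayoutKind.Sequential)]
public struct DosPrefixSketch
{
    public UInt16 Magic;
    public UInt16 BytesOnLastPage;
}

public static class CopySketch
{
    // Marshals a byte array to T by way of a temporary unmanaged buffer,
    // so the managed array never has to be pinned.
    public static T FromBytes<T>(byte[] bytes) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        if (bytes.Length < size)
        {
            throw new ArgumentException("Buffer is smaller than the unmanaged size of T.");
        }

        IntPtr buffer = Marshal.AllocHGlobal(size);
        try
        {
            // Copy the managed bytes into memory the GC does not manage or move
            Marshal.Copy(bytes, 0, buffer, size);
            return (T)Marshal.PtrToStructure(buffer, typeof(T));
        }
        finally
        {
            // Always release the unmanaged buffer, even if marshaling throws
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

For example, CopySketch.FromBytes<DosPrefixSketch>(new byte[] { 0x4D, 0x5A, 0x90, 0x00 }) yields a Magic of 0x5A4D on a little-endian machine. The trade-off is an extra allocation and copy per structure, which is why pinning is a reasonable choice for a handful of small headers.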

Having said that, what we need to do now is allocate a handle to the byte array holding our raw data and “pin” it to prevent GC from moving it around while we’re marshaling. Next, we tell the marshaller to take that data and convert it to our managed type. Finally, we free the handle and move on to more trivial stuff. Here is the entire class.

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

public class PEReader
{
    #region Structs

    [StructLayout(LayoutKind.Sequential)]
    public struct IMAGE_DOS_HEADER
    {
        public UInt16 e_magic;
        public UInt16 e_cblp;
        public UInt16 e_cp;
        public UInt16 e_crlc;
        public UInt16 e_cparhdr;
        public UInt16 e_minalloc;
        public UInt16 e_maxalloc;
        public UInt16 e_ss;
        public UInt16 e_sp;
        public UInt16 e_csum;
        public UInt16 e_ip;
        public UInt16 e_cs;
        public UInt16 e_lfarlc;
        public UInt16 e_ovno;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
        public UInt16[] e_res1;
        public UInt16 e_oemid;
        public UInt16 e_oeminfo;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 10)]
        public UInt16[] e_res2;
        public UInt32 e_lfanew;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct IMAGE_NT_HEADERS
    {
        public UInt32 Signature;
        public IMAGE_FILE_HEADER FileHeader;
        public IMAGE_OPTIONAL_HEADER32 OptionalHeader32;
        public IMAGE_OPTIONAL_HEADER64 OptionalHeader64;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct IMAGE_FILE_HEADER
    {
        public UInt16 Machine;
        public UInt16 NumberOfSections;
        public UInt32 TimeDateStamp;
        public UInt32 PointerToSymbolTable;
        public UInt32 NumberOfSymbols;
        public UInt16 SizeOfOptionalHeader;
        public UInt16 Characteristics;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct IMAGE_OPTIONAL_HEADER32
    {
        public UInt16 Magic;
        public Byte MajorLinkerVersion;
        public Byte MinorLinkerVersion;
        public UInt32 SizeOfCode;
        public UInt32 SizeOfInitializedData;
        public UInt32 SizeOfUninitializedData;
        public UInt32 AddressOfEntryPoint;
        public UInt32 BaseOfCode;
        public UInt32 BaseOfData;
        public UInt32 ImageBase;
        public UInt32 SectionAlignment;
        public UInt32 FileAlignment;
        public UInt16 MajorOperatingSystemVersion;
        public UInt16 MinorOperatingSystemVersion;
        public UInt16 MajorImageVersion;
        public UInt16 MinorImageVersion;
        public UInt16 MajorSubsystemVersion;
        public UInt16 MinorSubsystemVersion;
        public UInt32 Win32VersionValue;
        public UInt32 SizeOfImage;
        public UInt32 SizeOfHeaders;
        public UInt32 CheckSum;
        public UInt16 Subsystem;
        public UInt16 DllCharacteristics;
        public UInt32 SizeOfStackReserve;
        public UInt32 SizeOfStackCommit;
        public UInt32 SizeOfHeapReserve;
        public UInt32 SizeOfHeapCommit;
        public UInt32 LoaderFlags;
        public UInt32 NumberOfRvaAndSizes;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
        public IMAGE_DATA_DIRECTORY[] DataDirectory;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct IMAGE_OPTIONAL_HEADER64
    {
        public UInt16 Magic;
        public Byte MajorLinkerVersion;
        public Byte MinorLinkerVersion;
        public UInt32 SizeOfCode;
        public UInt32 SizeOfInitializedData;
        public UInt32 SizeOfUninitializedData;
        public UInt32 AddressOfEntryPoint;
        public UInt32 BaseOfCode;
        public UInt64 ImageBase;
        public UInt32 SectionAlignment;
        public UInt32 FileAlignment;
        public UInt16 MajorOperatingSystemVersion;
        public UInt16 MinorOperatingSystemVersion;
        public UInt16 MajorImageVersion;
        public UInt16 MinorImageVersion;
        public UInt16 MajorSubsystemVersion;
        public UInt16 MinorSubsystemVersion;
        public UInt32 Win32VersionValue;
        public UInt32 SizeOfImage;
        public UInt32 SizeOfHeaders;
        public UInt32 CheckSum;
        public UInt16 Subsystem;
        public UInt16 DllCharacteristics;
        public UInt64 SizeOfStackReserve;
        public UInt64 SizeOfStackCommit;
        public UInt64 SizeOfHeapReserve;
        public UInt64 SizeOfHeapCommit;
        public UInt32 LoaderFlags;
        public UInt32 NumberOfRvaAndSizes;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
        public IMAGE_DATA_DIRECTORY[] DataDirectory;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct IMAGE_DATA_DIRECTORY
    {
        public UInt32 VirtualAddress;
        public UInt32 Size;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct IMAGE_SECTION_HEADER
    {
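        // Note: ByValTStr reserves one character for a terminating null, so a full
        // 8-character section name comes back truncated to 7 characters.
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
        public string Name;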
        public Misc Misc;
        public UInt32 VirtualAddress;
        public UInt32 SizeOfRawData;
        public UInt32 PointerToRawData;
        public UInt32 PointerToRelocations;
        public UInt32 PointerToLinenumbers;
        public UInt16 NumberOfRelocations;
        public UInt16 NumberOfLinenumbers;
        public UInt32 Characteristics;
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct Misc
    {
        [FieldOffset(0)]
        public UInt32 PhysicalAddress;
        [FieldOffset(0)]
        public UInt32 VirtualSize;
    }

    #endregion

    #region Fields

    private readonly IMAGE_DOS_HEADER _dosHeader;
    private IMAGE_NT_HEADERS _ntHeaders;
    private readonly IList<IMAGE_SECTION_HEADER> _sectionHeaders = new List<IMAGE_SECTION_HEADER>();

    #endregion

    public PEReader(BinaryReader reader)
    {
        // Reset reader position, just in case
        reader.BaseStream.Seek(0, SeekOrigin.Begin);

        // Read MS-DOS header section
        _dosHeader = MarshalBytesTo<IMAGE_DOS_HEADER>(reader);

        // MS-DOS magic number should read 'MZ'
        if (_dosHeader.e_magic != 0x5a4d)
        {
            throw new InvalidOperationException("File is not a portable executable.");
        }

        // Skip MS-DOS stub and seek reader to NT Headers
        reader.BaseStream.Seek(_dosHeader.e_lfanew, SeekOrigin.Begin);

        // Read NT Headers
        _ntHeaders.Signature = MarshalBytesTo<UInt32>(reader);

        // Make sure we have 'PE' in the pe signature
        if (_ntHeaders.Signature != 0x4550)
        {
            throw new InvalidOperationException("Invalid portable executable signature in NT header.");
        }

        _ntHeaders.FileHeader = MarshalBytesTo<IMAGE_FILE_HEADER>(reader);

        // Read optional headers
        if (Is32bitAssembly())
        {
            Load32bitOptionalHeaders(reader);
        }
        else
        {
            Load64bitOptionalHeaders(reader);
        }

        // Read section data
        foreach (IMAGE_SECTION_HEADER header in _sectionHeaders)
        {
            // Skip to beginning of a section
            reader.BaseStream.Seek(header.PointerToRawData, SeekOrigin.Begin);

            // Read section data... and do something with it
            byte[] sectiondata = reader.ReadBytes((int)header.SizeOfRawData);
        }
    }

    public IMAGE_DOS_HEADER GetDOSHeader()
    {
        return _dosHeader;
    }

    public UInt32 GetPESignature()
    {
        return _ntHeaders.Signature;
    }

    public IMAGE_FILE_HEADER GetFileHeader()
    {
        return _ntHeaders.FileHeader;
    }

    public IMAGE_OPTIONAL_HEADER32 GetOptionalHeaders32()
    {
        return _ntHeaders.OptionalHeader32;
    }

    public IMAGE_OPTIONAL_HEADER64 GetOptionalHeaders64()
    {
        return _ntHeaders.OptionalHeader64;
    }

    public IList<IMAGE_SECTION_HEADER> GetSectionHeaders()
    {
        return _sectionHeaders;
    }

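    public bool Is32bitAssembly()
    {
        // The IMAGE_FILE_32BIT_MACHINE flag (0x0100) is not set reliably by every
        // compiler, so test for the 64-bit machine types (AMD64, IA64, ARM64) instead.
        UInt16 machine = _ntHeaders.FileHeader.Machine;
        return machine != 0x8664 && machine != 0x0200 && machine != 0xAA64;
    }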

    private void Load64bitOptionalHeaders(BinaryReader reader)
    {
        _ntHeaders.OptionalHeader64 = MarshalBytesTo<IMAGE_OPTIONAL_HEADER64>(reader);

        // Should have 16 (0x10) data directories
        if (_ntHeaders.OptionalHeader64.NumberOfRvaAndSizes != 0x10)
        {
            throw new InvalidOperationException("Invalid number of data directories in NT header");
        }

        // Section headers follow the optional header; the file header says how many there are
        for (int i = 0; i < _ntHeaders.FileHeader.NumberOfSections; i++)
        {
            _sectionHeaders.Add(MarshalBytesTo<IMAGE_SECTION_HEADER>(reader));
        }
    }

    private void Load32bitOptionalHeaders(BinaryReader reader)
    {
        _ntHeaders.OptionalHeader32 = MarshalBytesTo<IMAGE_OPTIONAL_HEADER32>(reader);

        // Should have 16 (0x10) data directories
        if (_ntHeaders.OptionalHeader32.NumberOfRvaAndSizes != 0x10)
        {
            throw new InvalidOperationException("Invalid number of data directories in NT header");
        }

        // Section headers follow the optional header; the file header says how many there are
        for (int i = 0; i < _ntHeaders.FileHeader.NumberOfSections; i++)
        {
            _sectionHeaders.Add(MarshalBytesTo<IMAGE_SECTION_HEADER>(reader));
        }
    }

    private static T MarshalBytesTo<T>(BinaryReader reader)
    {
        // Raw bytes of the unmanaged structure, read into a managed array
        int size = Marshal.SizeOf(typeof(T));
        byte[] bytes = reader.ReadBytes(size);
        if (bytes.Length < size)
        {
            throw new EndOfStreamException("Not enough data left to read the structure.");
        }

        // Pin the array so GC cannot move it while the marshaller reads from its address
        GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try
        {
            // Marshal the pinned bytes to the specified type
            return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            // Unpin the array, even if marshaling throws
            handle.Free();
        }
    }
}

Here is a snippet using the class above.

using (Stream stream = new FileStream("my_assembly.dll", FileMode.Open, FileAccess.Read))
{
    PEReader peReader = new PEReader(new BinaryReader(stream));
}

The code above should be easy to follow at this point once we reference the PE diagram. Each struct defined in the code corresponds to a PE header structure and matches its byte layout. The only thing we’re not interested in is the DOS stub, a small MS-DOS program that sits between the DOS header and the NT headers; we skip it by seeking to the offset stored in e_lfanew. The reason we’re not interested in it is that it’s only responsible for outputting a message, typically “This program cannot be run in DOS mode.”
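To see how the jump over the DOS stub works without any structs at all, here is a small sketch that reads only the two signatures. LooksLikePE and BuildFakeImage are names made up for this example, and the fake image is just enough bytes to exercise the check, not a valid executable.

```csharp
using System;
using System.IO;

public static class SignatureSketch
{
    // Returns true when the stream starts with 'MZ' and the offset stored at 0x3C
    // (the e_lfanew field) points at the 'PE\0\0' signature.
    public static bool LooksLikePE(Stream stream)
    {
        if (stream.Length < 0x40) return false;            // DOS header is 64 bytes

        // The reader is deliberately not disposed so the caller's stream stays open
        BinaryReader reader = new BinaryReader(stream);

        stream.Seek(0, SeekOrigin.Begin);
        if (reader.ReadUInt16() != 0x5A4D) return false;   // 'MZ'

        stream.Seek(0x3C, SeekOrigin.Begin);               // e_lfanew: last field of the DOS header
        uint peOffset = reader.ReadUInt32();
        if (peOffset > stream.Length - 4) return false;    // offset must leave room for the signature

        stream.Seek(peOffset, SeekOrigin.Begin);           // this skips the DOS stub, whatever its size
        return reader.ReadUInt32() == 0x00004550;          // 'PE\0\0'
    }

    // Builds a buffer holding only the two signatures and e_lfanew,
    // with stubLength filler bytes standing in for the DOS stub.
    public static byte[] BuildFakeImage(int stubLength)
    {
        int peOffset = 0x40 + stubLength;
        byte[] image = new byte[peOffset + 4];
        image[0] = 0x4D;                                   // 'M'
        image[1] = 0x5A;                                   // 'Z'
        image[0x3C] = (byte)(peOffset & 0xFF);             // e_lfanew, little-endian
        image[0x3D] = (byte)((peOffset >> 8) & 0xFF);
        image[0x3E] = (byte)((peOffset >> 16) & 0xFF);
        image[0x3F] = (byte)((peOffset >> 24) & 0xFF);
        image[peOffset] = 0x50;                            // 'P'
        image[peOffset + 1] = 0x45;                        // 'E'
        return image;
    }
}
```

Passing new MemoryStream(SignatureSketch.BuildFakeImage(64)) to LooksLikePE returns true regardless of the stub length chosen, which is the point: the parser never needs to know how big the stub is.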

PE file structure

Defining structs in this case is a bit tricky in that you have to specify their layout so the marshaller knows how to marshal data into them. This is done using StructLayoutAttribute, which can be either sequential (members of the struct are laid out in the order they appear) or explicit (each member must specify its offset using FieldOffsetAttribute).
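The difference between the two layouts is easy to check with Marshal.OffsetOf. The sketch below uses two hypothetical structs: SequentialSketch, and UnionSketch, which mimics the Misc union from the class above. OffsetOf is a small wrapper name invented for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Sequential: fields are placed one after another in declaration order.
[StructLayout(LayoutKind.Sequential)]
public struct SequentialSketch
{
    public UInt16 A;   // offset 0
    public UInt16 B;   // offset 2
    public UInt32 C;   // offset 4
}

// Explicit: every field states its own offset, so two fields can overlap
// and behave like a C union.
[StructLayout(LayoutKind.Explicit)]
public struct UnionSketch
{
    [FieldOffset(0)]
    public UInt32 PhysicalAddress;
    [FieldOffset(0)]
    public UInt32 VirtualSize;
}

public static class LayoutSketch
{
    // Unmanaged byte offset of the named field within T.
    public static int OffsetOf<T>(string fieldName) where T : struct
    {
        return Marshal.OffsetOf(typeof(T), fieldName).ToInt32();
    }
}
```

Here LayoutSketch.OffsetOf<SequentialSketch>("C") is 4, while both fields of UnionSketch sit at offset 0 and the whole struct marshals as 4 bytes.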

If a member of a struct is an array, we need to “mark” it as an array and give it a size using MarshalAsAttribute. The structs seen in the code mirror those defined in winnt.h.
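As a quick illustration of what the array marking buys us, here is a sketch with a hypothetical struct, ReservedWordsSketch, shaped like a fragment of the DOS header. The Read helper is also invented for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct: one word followed by an inline array of four words.
[StructLayout(LayoutKind.Sequential)]
public struct ReservedWordsSketch
{
    public UInt16 Id;
    // ByValArray + SizeConst tells the marshaller that the four elements are
    // stored inline in the struct, not behind a pointer.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public UInt16[] Reserved;
}

public static class ArraySketch
{
    // Marshals raw bytes into the struct using the same pin-and-marshal steps as the article.
    public static ReservedWordsSketch Read(byte[] bytes)
    {
        if (bytes.Length < Marshal.SizeOf(typeof(ReservedWordsSketch)))
        {
            throw new ArgumentException("Buffer is too small for the structure.");
        }

        GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try
        {
            return (ReservedWordsSketch)Marshal.PtrToStructure(
                handle.AddrOfPinnedObject(), typeof(ReservedWordsSketch));
        }
        finally
        {
            handle.Free();
        }
    }
}
```

With this declaration the struct marshals as 10 bytes (2 for Id plus 4 x 2 for the array), and reading the little-endian bytes { 1, 0, 10, 0, 20, 0, 30, 0, 40, 0 } gives an Id of 1 and a Reserved array of length 4 ending in 40.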

Obviously I didn’t cover reading the entire file structure, as this would be an immense undertaking. However, this should provide a good head start for anyone looking to work with the PE format.

Hopefully, next time I’ll come back to explain how to read the Resource Directory structure, which is arguably the most difficult data structure in the PE format.