{{File formats}}
 
With Windows 95/[[Windows NT|NT]], a new executable file type was required. Thus was born the "PE" (Portable Executable) format, which is still in use. Unlike its predecessors, PE is a true 32-bit file format that supports relocatable code. It distinguishes between TEXT, DATA, and BSS. It is, in fact, a modified version of the [[COFF]] format.
 
If you have set up a [[Cygwin]] environment on your Windows machine, "PE" is the target format for your Cygwin GCC toolchain. This causes the unaware some headaches when trying to link parts built under Cygwin with parts built under Linux or BSD (which use the ELF target by default). (Hint: You have to build a [[GCC Cross-Compiler]].)
 
The PE format is used by Windows 95 and higher, Windows NT 3.1 and higher, the Mobius, ReactOS and [[UEFI]]. The PE format is also used as a container for .NET assemblies by both the Microsoft .Net CLI and Mono, in which case it doesn't actually store executable data but .Net metadata with IL attached to it.
 
= Inside the PE file =
This section explains the concepts and parts that make up a PE file rather than every exact data structure inside it, since that would take up too much space. Most resources on PE files tend to just dump a bunch of data structures in your face without fully explaining what they are for. By reading the following and knowing what a PE file consists of, you will have a better understanding of what to expect and how to use other PE file resources.
 
==Overview==
The PE file consists of several parts. They are briefly summarized below, then each part is covered in more detail.
The structures are given as C definitions with fixed-width field types. PE files are stored in little-endian order, the same byte order as x86.
 
[[Image:PEFigure1.jpg|left|frame|An overview of the format]]
 
== DOS Stub ==
The PE format begins with an MS-DOS stub (a header plus executable code), which makes it a valid MS-DOS executable. The MS-DOS header begins with the magic code 0x5A4D ("MZ") and is 64 bytes long, followed by real-mode executable code. The standard stub used almost universally is 128 bytes long (including header and executable code) and simply outputs "This program cannot be run in DOS mode."

Many utilities that work with PE files are hard-coded to expect the PE header to start exactly 128 bytes in. This is incorrect: some linkers, including Microsoft's own [[Link]], let you replace the MS-DOS stub with one of your own choosing, and many older programs did this to bundle an MS-DOS and a Windows version into a single file. The correct way is to read the formerly reserved 4-byte field at offset 0x3C of the MS-DOS header (commonly known as e_lfanew), which contains the file offset of the PE signature; the PE file header follows immediately after it. Usually this is a fairly standard value (most of the time the default link.exe stub sets this field to 0xE8). Microsoft seemingly recommends aligning the PE header on an 8-byte boundary (https://web.archive.org/web/20160609191558/http://msdn.microsoft.com/en-us/gg463119.aspx, page 10, figure 1).
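
The lookup described above can be sketched in C as follows. This is a minimal sketch, not a reference implementation: it assumes the whole file is already in a memory buffer, and the names <code>find_pe_header</code>, <code>read_le16</code> and <code>read_le32</code> are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values byte by byte, so the code does not depend on
   the host byte order or on the alignment of the buffer. */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the file offset of the "PE\0\0" signature,
   or -1 if the buffer does not look like a PE file. */
static int64_t find_pe_header(const uint8_t *file, size_t size) {
    assert(file != NULL);
    /* The MS-DOS header is 64 bytes long and starts with "MZ" (0x5A4D). */
    if (size < 0x40 || read_le16(file) != 0x5A4D)
        return -1;
    /* e_lfanew at offset 0x3C holds the file offset of the PE signature. */
    uint32_t e_lfanew = read_le32(file + 0x3C);
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;
    if (read_le32(file + e_lfanew) != 0x00004550)   /* "PE\0\0" */
        return -1;
    return (int64_t)e_lfanew;
}
```

A loader would call <code>find_pe_header(buffer, length)</code> and, on a non-negative result, start parsing the PE header at that offset instead of assuming 128.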
 
== PE header ==
The PE header contains information that concerns the entire file rather than the individual pieces that come up later. The bare minimum header contains a 4-byte signature (0x00004550), the machine type/architecture of the executable code inside, a time stamp, a pointer to symbols, as well as various flags (whether the file is an executable or a DLL, whether the application can handle addresses above 2GB, whether the file needs to be copied to the swap file if run from a removable device, etc). Unless you're using a really stripped-down, statically linked PE file with a hard-coded entry point and no resources to save memory, the PE header alone isn't enough.
 
<syntaxhighlight lang="c">
// 1 byte aligned
struct PeHeader {
    uint32_t mMagic; // PE\0\0 or 0x00004550
    uint16_t mMachine;
    uint16_t mNumberOfSections;
    uint32_t mTimeDateStamp;
    uint32_t mPointerToSymbolTable;
    uint32_t mNumberOfSymbols;
    uint16_t mSizeOfOptionalHeader;
    uint16_t mCharacteristics;
};
</syntaxhighlight>
 
=== Optional header ===
 
Part of the optional header is NT-specific. This includes the subsystem (console, driver, or GUI application), how much stack and heap space to reserve, and the minimum required operating system, subsystem and Windows version. You can use your own values for all of these depending on the needs of your OS.
 
<syntaxhighlight lang="c">
// 1 byte aligned
struct Pe32OptionalHeader {
    uint16_t mMagic; // 0x010b - PE32, 0x020b - PE32+ (64 bit)
    uint8_t  mMajorLinkerVersion;
    uint8_t  mMinorLinkerVersion;
    uint32_t mSizeOfCode;
    uint32_t mSizeOfInitializedData;
    uint32_t mSizeOfUninitializedData;
    uint32_t mAddressOfEntryPoint;
    uint32_t mBaseOfCode;
    uint32_t mBaseOfData;
    uint32_t mImageBase;
    uint32_t mSectionAlignment;
    uint32_t mFileAlignment;
    uint16_t mMajorOperatingSystemVersion;
    uint16_t mMinorOperatingSystemVersion;
    uint16_t mMajorImageVersion;
    uint16_t mMinorImageVersion;
    uint16_t mMajorSubsystemVersion;
    uint16_t mMinorSubsystemVersion;
    uint32_t mWin32VersionValue;
    uint32_t mSizeOfImage;
    uint32_t mSizeOfHeaders;
    uint32_t mCheckSum;
    uint16_t mSubsystem;
    uint16_t mDllCharacteristics;
    uint32_t mSizeOfStackReserve;
    uint32_t mSizeOfStackCommit;
    uint32_t mSizeOfHeapReserve;
    uint32_t mSizeOfHeapCommit;
    uint32_t mLoaderFlags;
    uint32_t mNumberOfRvaAndSizes;
};
</syntaxhighlight>
 
=== Data Directories ===
Technically part of the optional header, and following directly after its fixed fields, is a list of entries pointing to data directories (only in executable images and DLLs). Because the optional header can vary in size, you only need to pay attention to the directories that exist and that you expect; it is likely that new data directories will be added to the PE specification in the future (.Net is an example of one that was added later). Each data directory is referenced by an 8-byte entry in the optional header. The first 4 bytes are the Relative Virtual Address, or RVA (see Sections below), of the directory, and the last 4 bytes are the size of the directory.
 
Each data directory that the entries point to has its own format. Data directories are used to describe import tables for dynamic linking, a table of resources that are embedded inside the PE file, debug information (line numbers and break points), and the CLI .Net header.
 
{| class="wikitable"
! Offset in optional header (PE32/PE32+)
! Data directory
|-
| 96/112 || The export table address and size. Same format as .edata
|-
| 104/120 || The import table address and size. Same format as .idata
|-
| 112/128 || The resource table address and size. Same format as .rsrc
|-
| 120/136 || The exception table address and size. Same format as .pdata
|-
| 128/144 || The attribute certificate table offset (not RVA) and size. See [https://wiki.osdev.org/PE#Signed_PE_with_Attribute_Certificate_Table Signed PE] below
|-
| 136/152 || The base relocation table address and size. Same format as .reloc
|-
| 144/160 || The debug data starting address and size. Same format as .debug
|-
| 152/168 || Architecture, reserved MBZ
|-
| 160/176 || Global Ptr, the RVA of the value to be stored in the global pointer register. The size member of this structure must be set to zero.
|-
| 168/184 || The thread local storage (TLS) table address and size. Same format as .tls
|}
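
The offsets in the table follow a simple pattern: the first entry sits at offset 96 (PE32) or 112 (PE32+) of the optional header, and each further entry is 8 bytes later. The following is a hedged sketch of reading entry number <code>index</code>; it assumes the optional header has been read into a byte buffer, and <code>DataDirectory</code> and <code>get_data_directory</code> are hypothetical names used only here. The entry count is taken from the 4 bytes just before the first entry (mNumberOfRvaAndSizes in the structure above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct DataDirectory {
    uint32_t rva;   /* RVA of the directory (a file offset for entry 4) */
    uint32_t size;  /* size of the directory in bytes */
};

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* opt points at the optional header, opt_size is mSizeOfOptionalHeader.
   Returns 1 and fills *out if the entry exists, 0 otherwise. */
static int get_data_directory(const uint8_t *opt, size_t opt_size,
                              uint32_t index, struct DataDirectory *out) {
    assert(opt != NULL && out != NULL);
    if (opt_size < 2)
        return 0;
    size_t first;                       /* offset of directory entry 0 */
    uint16_t magic = read_le16(opt);
    if (magic == 0x010b)
        first = 96;                     /* PE32 */
    else if (magic == 0x020b)
        first = 112;                    /* PE32+ */
    else
        return 0;
    if (opt_size < first)
        return 0;
    /* The number of entries is stored right before the first entry. */
    uint32_t count = read_le32(opt + first - 4);
    if (index >= count || index >= (opt_size - first) / 8)
        return 0;
    const uint8_t *entry = opt + first + 8 * (size_t)index;
    out->rva = read_le32(entry);
    out->size = read_le32(entry + 4);
    return 1;
}
```

For example, index 1 yields the import table and index 5 the base relocation table, matching the rows above.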
 
== Sections ==
The Relative Virtual Address (RVA) is a term that comes up a lot in PE documentation. An RVA is the address at which something exists once it is loaded into memory, relative to the image base, rather than an offset into the file. To calculate the file offset from an RVA without actually loading the sections into memory, you can use the table of section entries. By using the virtual address and size of each section you can find which section the RVA belongs to, then subtract the difference between the section's virtual address and its file offset.
 
=== Section header ===
Each section has an entry in the section header table.
 
<syntaxhighlight lang="c">
struct IMAGE_SECTION_HEADER { // size 40 bytes
    char     mName[8]; // not NUL-terminated if the name is exactly 8 characters long
    uint32_t mVirtualSize;
    uint32_t mVirtualAddress;
    uint32_t mSizeOfRawData;
    uint32_t mPointerToRawData;
    uint32_t mPointerToRelocations;
    uint32_t mPointerToLinenumbers;
    uint16_t mNumberOfRelocations;
    uint16_t mNumberOfLinenumbers;
    uint32_t mCharacteristics;
};
</syntaxhighlight>
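
The RVA-to-file-offset lookup described under Sections can be sketched with only four of these fields. This is a minimal sketch: <code>SectionInfo</code> and <code>rva_to_file_offset</code> are hypothetical names, the fields mirror mVirtualSize, mVirtualAddress, mSizeOfRawData and mPointerToRawData, and it assumes the virtual size is non-zero for every section of interest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The subset of a section header needed for the lookup. */
struct SectionInfo {
    uint32_t virtual_size;
    uint32_t virtual_address;
    uint32_t size_of_raw_data;
    uint32_t pointer_to_raw_data;
};

/* Returns 1 and stores the file offset in *offset if the RVA is backed by
   file data, 0 if it lies outside all sections or in a zero-filled tail. */
static int rva_to_file_offset(const struct SectionInfo *sections, size_t count,
                              uint32_t rva, uint32_t *offset) {
    assert(offset != NULL);
    for (size_t i = 0; i < count; i++) {
        if (rva < sections[i].virtual_address)
            continue;
        uint32_t delta = rva - sections[i].virtual_address;
        if (delta >= sections[i].virtual_size)
            continue;                   /* RVA is past this section */
        if (delta >= sections[i].size_of_raw_data)
            return 0;                   /* in memory only, e.g. BSS-like data */
        *offset = sections[i].pointer_to_raw_data + delta;
        return 1;
    }
    return 0;
}
```

The early return for the zero-filled tail matters because a section can be larger in memory than on disk; such an RVA has no corresponding bytes in the file.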
 
===In asm linkage===
If you declare a block of code like this in NASM:
<syntaxhighlight lang="asm">
segment .code
aAsmFunction:
;Do whatever
mov BYTE[aData], 0
ret
segment .data
aData: db 0xFF
</syntaxhighlight>
The ''segments'' will appear as ''sections''. This makes it possible to keep C and Asm separate, as a linker will not automatically merge ''.code'' with ''.text'', which is the section C compilers normally emit code into.
=== Position Independent Code ===
If each section specifies which virtual address it should be loaded at, you may be wondering how multiple DLLs can exist in one virtual address space without conflict. It is true that most code you'll find in a PE file (DLL or otherwise) is position dependent and linked to a specific address. To resolve this issue there exists a structure called a Relocation Table (in images, the base relocation table referenced from the data directory). The table is basically a long list of every location that stores an address, so that each one can be adjusted by the difference between the linked address and the address you actually loaded at.
 
Because addresses can point across section borders, relocations should be done after every section has been loaded into memory. Then iterate over the sections again, go through each address in the Relocation Table, find out which section that RVA lies in, and add or subtract the offset between that section's linked virtual address and the virtual address you loaded it at.
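
As a hedged illustration of the fix-up pass, the sketch below walks a base relocation table of a 32-bit image that has already been copied into one contiguous buffer at its RVAs. It relies on the block layout from the PE specification (a 4-byte page RVA, a 4-byte block size, then 16-bit entries with the type in the top 4 bits and the page offset in the low 12 bits) and handles only type 0 (padding) and type 3 (HIGHLOW, a full 32-bit address). The name <code>apply_relocations</code> and its parameters are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* delta = address the image was loaded at minus the linked image base.
   Returns 0 on success, -1 on a malformed table or unsupported type. */
static int apply_relocations(uint8_t *image, size_t image_size,
                             uint32_t reloc_rva, uint32_t reloc_size,
                             uint32_t delta) {
    assert(image != NULL);
    if (reloc_rva > image_size || reloc_size > image_size - reloc_rva)
        return -1;
    const uint8_t *p = image + reloc_rva;
    const uint8_t *end = p + reloc_size;
    while ((size_t)(end - p) >= 8) {
        uint32_t page_rva = read_le32(p);
        uint32_t block_size = read_le32(p + 4);   /* includes the 8-byte header */
        if (block_size < 8 || block_size > (size_t)(end - p))
            return -1;
        for (uint32_t i = 8; block_size - i >= 2; i += 2) {
            uint16_t entry = read_le16(p + i);
            unsigned type = entry >> 12;
            uint64_t target = (uint64_t)page_rva + (entry & 0x0FFF);
            if (type == 0)
                continue;               /* padding entry, nothing to patch */
            if (type != 3)
                return -1;              /* only HIGHLOW is handled here */
            if (target + 4 > image_size)
                return -1;
            uint8_t *slot = image + (size_t)target;
            write_le32(slot, read_le32(slot) + delta);
        }
        p += block_size;
    }
    return 0;
}
```

A 64-bit loader would additionally need to handle type 10 (DIR64), which patches a 64-bit address in the same way.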
 
== Signed PE with Attribute Certificate Table ==
Many PE executables (most notably all Microsoft updates) are signed with a certificate. This information is stored in the Attribute Certificate Table, pointed to by the Data Directory's 5th entry. Note that for the Attribute Certificate Table the entry does not store an RVA but a plain file offset. The table consists of concatenated signatures, each with the following structure:
 
{| class="wikitable"
! Offset
! Size
! Field
! Description
|-
| 0 || 4 || dwLength || Specifies the length of the attribute certificate entry.
|-
| 4 || 2 || wRevision || Contains the certificate version number, magic 0x0200 (WIN_CERT_REVISION_2_0)
|-
| 6 || 2 || wCertificateType || Specifies the type of content in bCertificate, magic 0x0002 (WIN_CERT_TYPE_PKCS_SIGNED_DATA)
|-
| 8 || x || bCertificate || Contains a PKCS#7 SignedData structure
|}
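
Walking the concatenated entries can be sketched as below. This is a rough sketch under stated assumptions: the file is in memory, the table offset and size come from the 5th data directory entry, and each entry is advanced by dwLength rounded up to a multiple of 8 bytes, as the PE specification describes. The function name <code>count_certificates</code> is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Counts the entries in the attribute certificate table.
   table_offset is a file offset, not an RVA.
   Returns the number of entries, or -1 if the table is malformed. */
static int count_certificates(const uint8_t *file, size_t file_size,
                              uint32_t table_offset, uint32_t table_size) {
    assert(file != NULL);
    if (table_offset > file_size || table_size > file_size - table_offset)
        return -1;
    size_t pos = 0;
    int count = 0;
    while (table_size - pos >= 8) {
        const uint8_t *entry = file + table_offset + pos;
        uint32_t length = read_le32(entry);          /* dwLength */
        /* dwLength covers the 8-byte header plus bCertificate. */
        if (length < 8 || length > table_size - pos)
            return -1;
        /* wRevision is at entry + 4, wCertificateType at entry + 6,
           and the PKCS#7 data starts at entry + 8. */
        count++;
        size_t step = ((size_t)length + 7) & ~(size_t)7;
        if (step > table_size - pos)
            break;                                   /* last entry, no padding */
        pos += step;
    }
    return count;
}
```

A real verifier would hand the bytes at entry + 8 to a PKCS#7 parser; that part is outside the scope of this sketch.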
 
For [https://wiki.osdev.org/EFI#Secure_Boot Secure Boot] under [[EFI]] such a signature is a must. It is worth noting that the PE format allows multiple certificates to be embedded in a single PE file, but UEFI firmware implementations usually only '''allow one''', which must be signed by the Microsoft KEK. If the firmware allows installing additional KEKs (not typical), then you can use other certificates as well.
 
The bCertificate data is a PKCS#7 signature with certificate, encoded in ASN.1 format. Microsoft uses signtool.exe to create these signature entries, but an Open Source solution exists, called [https://git.kernel.org/pub/scm/linux/kernel/git/jejb/sbsigntools.git sbsigntool].
 
== CLI / .Net ==
4. Create a new thread at that address and begin executing!
 
To load a PE file that requires a dynamic DLL you can do the same, but check the Import Table (referred to by the data directory) to find which symbols and PE files are required. Then use the Export Table (also referred to by the data directory) inside each of those PE files to see where the symbols are, and match them up once you've loaded that PE's sections into memory (and relocated them!). Lastly, beware that you'll have to recursively resolve each DLL's Import Table as well, and some DLLs can use tricks to reference a symbol in the DLL loading them, so make sure you don't get your loader stuck in a loop! Registering loaded symbols and making them global might be a good solution.
 
It may also be a good idea to check the ''Machine'' and ''Magic'' fields for validity, not just the PE signature. This way your loader won't try loading a 64 bit binary into 32 bit mode (this would be certain to cause an exception).
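
Such a check might look like the sketch below. It assumes a buffer that starts at the PE signature, uses the field positions from the PeHeader structure above (Machine at offset 4, the optional header and its Magic at offset 24), and only knows the i386 and AMD64 values mentioned on this page. <code>PeKind</code> and <code>classify_pe</code> are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum PeKind { PE_INVALID = 0, PE_32, PE_64 };

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* pe points at the "PE\0\0" signature; size is the number of valid bytes. */
static enum PeKind classify_pe(const uint8_t *pe, size_t size) {
    assert(pe != NULL);
    if (size < 24 + 2)                       /* PE header plus Magic field */
        return PE_INVALID;
    if (read_le32(pe) != 0x00004550)         /* "PE\0\0" */
        return PE_INVALID;
    uint16_t machine = read_le16(pe + 4);    /* mMachine */
    uint16_t magic = read_le16(pe + 24);     /* optional header mMagic */
    /* Accept only combinations where Machine and Magic agree. */
    if (machine == 0x014c && magic == 0x010b)
        return PE_32;
    if (machine == 0x8664 && magic == 0x020b)
        return PE_64;
    return PE_INVALID;
}
```

A 32-bit loader would then refuse anything that is not <code>PE_32</code> before touching the sections.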
 
=64 bit PE=
64 bit PEs are extremely similar to normal PEs, but the machine type, if AMD64, is 0x8664, not 0x14c. This field is directly after the PE signature. The magic number also changes from 0x10b to 0x20b. The magic field is at the beginning of the optional header. In addition, the BaseOfData member of the optional header does not exist: the ImageBase member is expanded to 64 bits, and BaseOfData is removed to make room for it. The four stack and heap size fields are also widened to 64 bits, which is why the data directories start at offset 112 instead of 96.
 
= See Also =
* [https://learn.microsoft.com/en-us/windows/win32/debug/pe-format Microsoft Learn PE Format Reference]
* MSDN Magazine: [http://msdn.microsoft.com/en-us/magazine/cc301805.aspx Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format]
* PE Specification: [http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx latest edition, OOXML format], [http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/pecoff.doc 1999 edition, DOC format]
* Someone's Perspective On PE32 Vs ELF32. - http://forum.osdev.org/viewtopic.php?f=1&t=17686&p=133835#p133835
* http://www.ntcore.com/exsuite.php - Great tool for Windows that lets you explore inside of a PE file (and .Net metadata) to understand how it's laid out.
* [http://www.brokenthorn.com/Resources/OSDevPE.html BrokenThorn on the topic]
* [https://www.openrce.org/reference_library/files/reference/PE%20Format.pdf Graphical layout of relevant structures]
* [https://code.google.com/p/corkami/wiki/PE101 Illustrative walkthrough of simple PE executable]
* [https://github.com/erocarrera/pefile pefile] - handy python module for inspecting & manipulating PE files
* [https://github.com/cubiclesoft/php-winpefile php-winpefile] - Powerful PHP command-line tool and several PHP classes for inspecting, manipulating, and even creating PE files and finding PE file artifacts.
* [https://github.com/cubiclesoft/windows-pe-artifact-library Windows PE Artifact Library] - Over 375 carefully curated PE file artifacts that extensively cover the PE32 and PE32+ formats (plus some DOS and Win16 NE) for multiple architectures. Useful for seeing examples of what a bound imports table looks like, if a specific flag is ever used, or pretty much any option of the PE format.
 
[[Category:Executable Formats]]
[[Category:Object Files]]
[[Category:UEFI]]
[[Category:Windows]]
 
[[de:Microsoft Portable Executable and Common Object File Format]]