{{File formats}}
 
With Windows 9x/[[Windows NT|NT]], a new executable file type was required. Thus was born the Portable Executable ("PE") format, which is still in use. Unlike its predecessors, PE is a true 32-bit file format supporting relocatable code. It distinguishes between TEXT, DATA, and BSS. It is, in fact, a modified version of the [[COFF]] format.
 
If you have set up a [[Cygwin]] environment on your Windows machine, PE is the target format of your Cygwin GCC toolchain. This causes the unaware some headaches when trying to link parts built under Cygwin with parts built under Linux or BSD (which use the ELF target by default). (Hint: you have to build a [[GCC Cross-Compiler]].)
 
The PE format is used by Windows 95 and higher, Windows NT 3.1 and higher, the Mobius, ReactOS and [[UEFI]]. The PE format is also used as a container for .NET assemblies by both the Microsoft .NET CLI and Mono, in which case it doesn't actually store executable data but .NET metadata with IL attached to it.
 
= Inside the PE file =
 
== DOS Stub ==
The PE format begins with an MS-DOS stub (a header plus executable code), which makes it a valid MS-DOS executable. The MS-DOS header begins with the magic code 0x5A4D ("MZ") and is 64 bytes long, followed by real-mode executable code. The standard stub, used almost universally, is 128 bytes long (including header and executable code) and simply outputs "This program cannot be run in DOS mode."

Many utilities that work with PE files are hard-coded to expect the PE header to start exactly 128 bytes in. This is incorrect: some linkers, including Microsoft's own [[Link]], let you replace the MS-DOS stub with one of your own choosing, and many older programs did this to bundle an MS-DOS and a Windows version in a single file. The correct way is to read the formerly reserved 4-byte field at offset 0x3C of the MS-DOS header (commonly known as e_lfanew), which contains the address at which the PE file signature is found; the PE file header follows immediately. Usually this is a pretty standard value (most of the time this field is set to 0xE8 by the default link.exe stub). Microsoft seemingly recommends aligning the PE header on an 8-byte boundary (https://web.archive.org/web/20160609191558/http://msdn.microsoft.com/en-us/gg463119.aspx, page 10, figure 1).
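
The lookup described above can be sketched in C as follows. This is a minimal illustration, not a complete loader: the function name <code>pe_find_signature</code> and the helpers <code>rd16</code>/<code>rd32</code> are made up for this example, and the whole file is assumed to already be in memory.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

/* Little-endian reads, done byte by byte so that host endianness and
   alignment do not matter. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the file offset of the PE signature,
   or -1 if the buffer does not look like a PE file. */
int64_t pe_find_signature(const uint8_t *file, size_t size)
{
    if (size < 0x40 || rd16(file) != 0x5A4D)        /* 64-byte header, "MZ" */
        return -1;

    uint32_t e_lfanew = rd32(file + 0x3C);          /* do not assume 128 */
    if (e_lfanew > size || size - e_lfanew < 4)     /* must lie inside the file */
        return -1;

    if (rd32(file + e_lfanew) != 0x00004550)        /* "PE\0\0" */
        return -1;
    return (int64_t)e_lfanew;
}
</syntaxhighlight>

Bounds-checking e_lfanew before dereferencing it matters because the field is taken from the file and can hold any value.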
 
== PE header ==
The PE header contains information that concerns the entire file rather than the individual pieces that come later. The bare minimum header contains a 4-byte signature (0x00004550, i.e. "PE\0\0"), the machine type/architecture of the executable code inside, a time stamp, a pointer to symbols, and various flags (whether the file is an executable or a DLL, whether the application can handle addresses above 2 GB, whether the file needs to be copied to the swap file if run from a removable device, etc.). Unless you're using a really stripped-down, statically linked PE file with a hard-coded entry point and no resources to save memory, the PE header alone isn't enough.
 
<syntaxhighlight lang="c">
// 1 byte aligned
struct PeHeader {
    // ...
    uint16_t mCharacteristics;
};
</syntaxhighlight>
 
=== Optional header ===
Part of the optional header is NT-specific. This includes the subsystem (console, driver, or GUI application), how much stack and heap space to reserve, and the minimum required operating system, subsystem and Windows versions. You can use your own values for all of these depending on the needs of your OS.
 
<syntaxhighlight lang="c">
// 1 byte aligned
struct Pe32OptionalHeader {
    // ...
    uint32_t mNumberOfRvaAndSizes;
};
</syntaxhighlight>
 
=== Data Directories ===
The data directories are technically part of the optional header: a list of entries pointing to data directories follows directly after its fixed fields (only in executable images and DLLs). Because the optional header can vary in size, and because new data directories are likely to be added to the PE specification in the future (.NET is an example of one that was added later), you should only pay attention to the directories that actually exist (as counted by mNumberOfRvaAndSizes) and that you expect. Each data directory is referenced by an 8-byte entry in the optional header. The first 4 bytes are the Relative Virtual Address, or RVA (see Sections below), of the directory, and the last 4 bytes are the size of the directory.
 
Each data directory that the entries point to has its own format. Data directories are used to describe import tables for dynamic linking, a table of resources embedded inside the PE file, debug information (line numbers and break points), and the CLI (.NET) header.
 
{| class="wikitable"
! Offset in optional header (PE32/PE32+)
! Data directory
|-
| 96/112 || The export table address and size. Same format as .edata
|-
| 104/120 || The import table address and size. Same format as .idata
|-
| 112/128 || The resource table address and size. Same format as .rsrc
|-
| 120/136 || The exception table address and size. Same format as .pdata
|-
| 128/144 || The attribute certificate table offset (not RVA) and size. See [[#Signed PE with Attribute Certificate Table|Signed PE]] below
|-
| 136/152 || The base relocation table address and size. Same format as .reloc
|-
| 144/160 || The debug data starting address and size. Same format as .debug
|-
| 152/168 || Architecture: reserved, must be zero
|-
| 160/176 || Global Ptr, the RVA of the value to be stored in the global pointer register. The size member of this structure must be set to zero.
|-
| 168/184 || The thread local storage (TLS) table address and size. Same format as .tls
|}
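
As a rough illustration of the table above, the following C sketch looks up one data directory entry by index. The names <code>PeDataDirectory</code> and <code>pe_get_data_directory</code> are invented for this example; <code>opt</code> is assumed to point at the start of the optional header and <code>opt_size</code> to hold its size as given in the PE header. The entry count is read from the 4 bytes just before the first entry (mNumberOfRvaAndSizes).

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

struct PeDataDirectory {   /* hypothetical name for one 8-byte entry */
    uint32_t rva;          /* a file offset for the certificate table */
    uint32_t size;
};

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if entry `index` exists, 0 otherwise. */
int pe_get_data_directory(const uint8_t *opt, size_t opt_size,
                          unsigned index, struct PeDataDirectory *out)
{
    size_t first;                            /* offset of entry 0 */

    if (opt_size < 2)
        return 0;
    switch (rd16(opt)) {                     /* magic field */
    case 0x10b: first = 96;  break;          /* PE32  */
    case 0x20b: first = 112; break;          /* PE32+ */
    default:    return 0;
    }
    if (opt_size < first)
        return 0;

    uint32_t count = rd32(opt + first - 4);  /* mNumberOfRvaAndSizes */
    size_t off = first + 8 * (size_t)index;
    if (index >= count || off + 8 > opt_size)
        return 0;                            /* entry not present */

    out->rva  = rd32(opt + off);
    out->size = rd32(opt + off + 4);
    return 1;
}
</syntaxhighlight>

For example, index 1 yields the import table entry (offset 104 in PE32, 120 in PE32+).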
 
== Sections ==
Each section has an entry in the section header table.
 
<syntaxhighlight lang="c">
struct IMAGE_SECTION_HEADER { // size 40 bytes
    char mName[8];
    // ...
    uint32_t mSizeOfRawData;
    uint32_t mPointerToRawData;
    uint32_t mPointerToRelocations;
    uint32_t mPointerToLinenumbers;
    uint16_t mNumberOfRelocations;
    uint16_t mNumberOfLinenumbers;
    uint32_t mCharacteristics;
};
</syntaxhighlight>
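
One common use of the section header table is translating an RVA (as found in the data directories) into a file offset. The sketch below is only an illustration under a few assumptions: the struct <code>SectionHeader</code> mirrors the 40-byte layout above, the field names <code>mVirtualSize</code> and <code>mVirtualAddress</code> are assumed names for the two fields not shown in the listing, and <code>pe_rva_to_file_offset</code> is a hypothetical helper.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

struct SectionHeader {              /* same 40-byte layout as above */
    char     mName[8];
    uint32_t mVirtualSize;          /* assumed field name */
    uint32_t mVirtualAddress;       /* assumed field name: RVA of the section */
    uint32_t mSizeOfRawData;
    uint32_t mPointerToRawData;
    uint32_t mPointerToRelocations;
    uint32_t mPointerToLinenumbers;
    uint16_t mNumberOfRelocations;
    uint16_t mNumberOfLinenumbers;
    uint32_t mCharacteristics;
};

/* Returns 1 and stores the file offset if `rva` falls inside the raw
   (on-disk) data of some section, 0 otherwise. */
int pe_rva_to_file_offset(const struct SectionHeader *sections, size_t count,
                          uint32_t rva, uint32_t *file_offset)
{
    for (size_t i = 0; i < count; i++) {
        const struct SectionHeader *s = &sections[i];
        if (rva >= s->mVirtualAddress &&
            rva - s->mVirtualAddress < s->mSizeOfRawData) {
            *file_offset = s->mPointerToRawData + (rva - s->mVirtualAddress);
            return 1;
        }
    }
    return 0;   /* e.g. an RVA inside zero-filled BSS has no file data */
}
</syntaxhighlight>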
 
=== In asm linkage ===
If you declare a block of code like this in NASM:
<syntaxhighlight lang="asm">
segment .code
aAsmFunction:
; ...
segment .data
aData: db 0xFF
</syntaxhighlight>
The ''segments'' will appear as ''sections''. This makes it possible to keep C and Asm separate, as a linker will not automatically merge ''.code'' with ''.text'', which is the section C compilers normally output.
=== Position Independent Code ===
 
Because addresses can point across section borders, relocations should be applied only after every section has been loaded into memory. Then iterate over the sections again: for each address in the Relocation Table, find out which section that RVA lies in and add or subtract the offset between that section's linked virtual address and the virtual address you actually loaded it at.
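
The following C sketch shows one way to apply the base relocation table. It is a simplified illustration, not a full implementation: it assumes a PE32 image whose sections were all loaded at the same distance (<code>delta</code>) from their linked addresses, so no per-section lookup is needed, and it only handles the 32-bit HIGHLOW entry type (3) plus ABSOLUTE padding entries (0). The block layout (a 4-byte page RVA, a 4-byte block size, then 16-bit entries with the type in the top 4 bits and the page offset in the low 12) is taken from the PE specification; the function name is made up.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* image[r] is the byte at RVA r. reloc_rva/reloc_size come from the base
   relocation data directory. delta = actual base - linked base.
   Returns 1 on success, 0 on malformed or unsupported data. */
int pe_apply_relocations32(uint8_t *image, size_t image_size,
                           uint32_t reloc_rva, uint32_t reloc_size,
                           uint32_t delta)
{
    if (reloc_rva > image_size || reloc_size > image_size - reloc_rva)
        return 0;

    uint32_t pos = 0;
    while (reloc_size - pos >= 8) {
        const uint8_t *block = image + reloc_rva + pos;
        uint32_t page_rva   = rd32(block);
        uint32_t block_size = rd32(block + 4);   /* includes the 8-byte header */
        if (block_size < 8 || block_size > reloc_size - pos)
            return 0;

        for (uint32_t i = 8; i + 2 <= block_size; i += 2) {
            uint16_t entry = rd16(block + i);
            unsigned type  = entry >> 12;
            uint64_t target = (uint64_t)page_rva + (entry & 0x0FFF);

            if (type == 0)                       /* ABSOLUTE: padding, skip */
                continue;
            if (type != 3)                       /* only HIGHLOW is handled */
                return 0;
            if (target + 4 > image_size)
                return 0;
            wr32(image + target, rd32(image + target) + delta);
        }
        pos += block_size;
    }
    return 1;
}
</syntaxhighlight>

A loader that places sections at different offsets would compute the adjustment per target address instead of using a single <code>delta</code>.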
 
== Signed PE with Attribute Certificate Table ==
Many PE executables (most notably all Microsoft updates) are signed with a certificate. This information is stored in the Attribute Certificate Table, pointed to by the 5th entry of the data directories. Note that for the Attribute Certificate Table the entry stores a plain file offset rather than an RVA. The table consists of concatenated signatures, each with the following structure:
 
{| class="wikitable"
! Offset
! Size
! Field
! Description
|-
| 0 || 4 || dwLength || Specifies the length of the attribute certificate entry.
|-
| 4 || 2 || wRevision || Contains the certificate version number, magic 0x0200 (WIN_CERT_REVISION_2_0)
|-
| 6 || 2 || wCertificateType || Specifies the type of content in bCertificate, magic 0x0002 (WIN_CERT_TYPE_PKCS_SIGNED_DATA)
|-
| 8 || x || bCertificate || Contains a PKCS#7 SignedData structure
|}
 
For [https://wiki.osdev.org/EFI#Secure_Boot Secure Boot] under [[EFI]] such a signature is a must. It is worth noting that the PE format allows multiple certificates to be embedded in a single PE file, but UEFI firmware implementations usually only '''allow one''', which must be signed by the Microsoft KEK. If the firmware allows installing additional KEKs (not typical), then you can use other certificates as well.
 
The bCertificate data is a PKCS#7 signature with certificate, encoded in ASN.1 format. Microsoft uses signtool.exe to create these signature entries, but an Open Source solution exists, called [https://git.kernel.org/pub/scm/linux/kernel/git/jejb/sbsigntools.git sbsigntool].
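
Walking the concatenated entries can be sketched in C as below. This is an illustrative example only: <code>pe_count_pkcs7_certificates</code> is a made-up name, the table offset and size are assumed to come from the 5th data directory entry, and the code only counts entries instead of parsing the PKCS#7 data. According to the PE specification each entry starts on an 8-byte boundary, so dwLength is rounded up to a multiple of 8 when advancing.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of revision 2.0 PKCS#7 entries, or -1 if the table
   is malformed. table_offset is a file offset, not an RVA. */
int pe_count_pkcs7_certificates(const uint8_t *file, size_t file_size,
                                uint32_t table_offset, uint32_t table_size)
{
    if (table_offset > file_size || table_size > file_size - table_offset)
        return -1;

    const uint8_t *table = file + table_offset;
    size_t pos = 0;
    int count = 0;

    while (table_size - pos >= 8) {
        uint32_t length   = rd32(table + pos);       /* dwLength */
        uint16_t revision = rd16(table + pos + 4);   /* wRevision */
        uint16_t type     = rd16(table + pos + 6);   /* wCertificateType */

        if (length < 8 || length > table_size - pos)
            return -1;
        if (revision == 0x0200 && type == 0x0002)
            count++;    /* bCertificate: table + pos + 8, length - 8 bytes */

        size_t step = ((size_t)length + 7) & ~(size_t)7;
        if (step > table_size - pos)
            break;      /* last entry without trailing padding */
        pos += step;
    }
    return count;
}
</syntaxhighlight>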
 
== CLI / .Net ==
4. Create a new thread at that address and begin executing!
 
To load a PE file that requires a DLL, you can do the same, but check the Import Table (referred to by the data directory) to find which symbols and PE files are required, and the Export Table (also referred to by the data directory) inside each of those PE files to see where those symbols are; match them up once you've loaded that PE's sections into memory (and relocated them!). Lastly, beware that you'll have to recursively resolve each DLL's Import Tables as well, and some DLLs can use tricks to reference a symbol in the DLL loading them, so make sure your loader doesn't get stuck in a loop! Registering loaded symbols and making them global might be a good solution.
 
It may also be a good idea to check the ''Machine'' and ''Magic'' fields for validity, not just the PE signature. This way your loader won't try loading a 64-bit binary in 32-bit mode (this would be certain to cause an exception).
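
A check along those lines might look like the following C sketch. The names <code>PeKind</code> and <code>pe_classify</code> are invented for this example, <code>pe</code> is assumed to point at the PE signature, and only the i386 (0x14c) and AMD64 (0x8664) machine types are accepted; a real loader would accept whatever its own platform supports.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

enum PeKind { PE_INVALID = 0, PE_32 = 32, PE_64 = 64 };   /* hypothetical */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* pe points at the PE signature, size is the number of bytes available. */
enum PeKind pe_classify(const uint8_t *pe, size_t size)
{
    /* 4-byte signature + 20-byte header + 2-byte magic */
    if (size < 26 || rd32(pe) != 0x00004550)
        return PE_INVALID;

    uint16_t machine  = rd16(pe + 4);    /* directly after the signature */
    uint16_t opt_size = rd16(pe + 20);   /* size of the optional header */
    if (opt_size < 2)
        return PE_INVALID;               /* no optional header, no magic */
    uint16_t magic = rd16(pe + 24);      /* first field of optional header */

    if (machine == 0x14c && magic == 0x10b)
        return PE_32;
    if (machine == 0x8664 && magic == 0x20b)
        return PE_64;
    return PE_INVALID;                   /* mismatched or unsupported */
}
</syntaxhighlight>

A 32-bit loader would then refuse anything that is not <code>PE_32</code> before touching the sections.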
 
= 64 bit PE =
64-bit PEs are extremely similar to normal PEs, but the machine type, if AMD64, is 0x8664, not 0x14c. This field is directly after the PE signature. The magic number also changes from 0x10b to 0x20b. The magic field is at the beginning of the optional header. In addition, the BaseOfData member of the optional header does not exist: the ImageBase member is expanded to 64 bits, and BaseOfData is removed to make room for it.
 
Several other fields have also been expanded to 64 bits (but not RVAs or offsets). An example of these is the preferred base address.
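
The layout difference can be illustrated with a small C sketch that reads the image base from either variant. The helper name <code>pe_image_base</code> is an assumption of this example; the offsets follow the PE specification, where ImageBase is a 4-byte field at offset 28 in PE32 (after BaseOfData at offset 24) and an 8-byte field at offset 24 in PE32+.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t rd64(const uint8_t *p)
{
    return (uint64_t)rd32(p) | ((uint64_t)rd32(p + 4) << 32);
}

/* opt points at the optional header. Returns 1 and stores the preferred
   base address in *base, or 0 if the magic is unknown. */
int pe_image_base(const uint8_t *opt, size_t opt_size, uint64_t *base)
{
    if (opt_size < 32)
        return 0;
    switch (rd16(opt)) {
    case 0x10b:                     /* PE32: BaseOfData at 24, ImageBase at 28 */
        *base = rd32(opt + 28);
        return 1;
    case 0x20b:                     /* PE32+: 64-bit ImageBase at 24 */
        *base = rd64(opt + 24);
        return 1;
    default:
        return 0;
    }
}
</syntaxhighlight>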
 
= See Also =
* [https://learn.microsoft.com/en-us/windows/win32/debug/pe-format Microsoft Learn PE Format Reference]
* MSDN Magazine: [http://msdn.microsoft.com/en-us/magazine/cc301805.aspx Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format]
* PE Specification: [http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx latest edition, OOXML format], [http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/pecoff.doc 1999 edition, DOC format]
* [https://code.google.com/p/corkami/wiki/PE101 Illustrative walkthrough of simple PE executable]
* [https://github.com/erocarrera/pefile pefile] - handy python module for inspecting & manipulating PE files
* [https://github.com/cubiclesoft/php-winpefile php-winpefile] - Powerful PHP command-line tool and several PHP classes for inspecting, manipulating, and even creating PE files and finding PE file artifacts.
* [https://github.com/cubiclesoft/windows-pe-artifact-library Windows PE Artifact Library] - Over 375 carefully curated PE file artifacts that extensively cover the PE32 and PE32+ formats (plus some DOS and Win16 NE) for multiple architectures. Useful for seeing examples of what a bound imports table looks like, if a specific flag is ever used, or pretty much any option of the PE format.
 
[[Category:Executable Formats]]
[[Category:Object Files]]
[[Category:UEFI]]
[[Category:Windows]]
 
[[de:Microsoft Portable Executable and Common Object File Format]]