VMX

From OSDev.wiki

VMX (Virtual-Machine Extensions) is Intel's instruction set extension for hardware-assisted virtualization on x86.

Programming

Discovering support

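Execute CPUID with EAX = 1; bit 5 of the returned ECX tells you whether VMX is supported. This is the first thing that should be checked! Even when the bit is set, firmware can disable VMX through the IA32_FEATURE_CONTROL MSR (0x3A): VMXON raises a general protection fault unless the lock bit (bit 0) and the "enable VMX outside SMX operation" bit (bit 2) are both set.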
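
As a minimal sketch of that check (not taken from any particular kernel), the snippet below tests bit 5 of the ECX value returned by CPUID leaf 1. The names CPUID_1_ECX_VMX, cpuid_ecx_has_vmx and cpu_has_vmx are made up for this example, and the inline assembly assumes a GCC- or Clang-compatible compiler targeting x86.

```c
#include <stdint.h>

/* CPUID leaf 1, ECX bit 5 reports VMX support. */
#define CPUID_1_ECX_VMX (1u << 5)

/* Pure helper: decide from an ECX value that was already fetched. */
static inline int cpuid_ecx_has_vmx(uint32_t ecx)
{
    return (ecx & CPUID_1_ECX_VMX) != 0;
}

#if defined(__i386__) || defined(__x86_64__)
/* Illustrative wrapper: execute CPUID leaf 1 and test the VMX bit. */
static inline int cpu_has_vmx(void)
{
    uint32_t eax = 1, ebx, ecx = 0, edx;

    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
    (void)ebx;
    (void)edx;
    return cpuid_ecx_has_vmx(ecx);
}
#endif
```

Keeping the bit test separate from the CPUID wrapper makes it possible to exercise the logic on any host, including ones without VMX.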

Basic environment

Software must first set bit 13 of CR4 (CR4.VMXE). Here are definitions for the MSRs that we'll be dealing with:

 #define IA32_VMX_BASIC          0x480
 #define IA32_VMX_CR0_FIXED0     0x486
 #define IA32_VMX_CR0_FIXED1     0x487
 #define IA32_VMX_CR4_FIXED0     0x488
 #define IA32_VMX_CR4_FIXED1     0x489

To see what your OS needs to support, read IA32_VMX_CR0_FIXED0 and IA32_VMX_CR4_FIXED0 (on a 32-bit kernel only the lower 32 bits matter). Every bit that is 1 in one of these MSRs must also be 1 in the corresponding control register (CR0 or CR4). If any of those bits is clear, VMXON raises a general protection fault (#GP). To see which bits are allowed to be set, read IA32_VMX_CR0_FIXED1 and IA32_VMX_CR4_FIXED1. They act as masks: a 1 means the bit may be set, while a 0 means the bit must stay clear, and setting it anyway also results in a #GP.
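
The two rules above reduce to a pair of bitwise operations. The following is a small sketch, assuming the MSR values have already been read by the kernel's own RDMSR helper; vmx_adjust_cr and vmx_cr_is_valid are hypothetical names used only here.

```c
#include <stdint.h>

/* Force a CR0/CR4 value to satisfy the fixed-bit MSRs:
 * bits set in fixed0 must be 1, bits clear in fixed1 must be 0. */
static inline uint64_t vmx_adjust_cr(uint64_t cr, uint64_t fixed0,
                                     uint64_t fixed1)
{
    return (cr | fixed0) & fixed1;
}

/* Check whether a value already satisfies both constraints. */
static inline int vmx_cr_is_valid(uint64_t cr, uint64_t fixed0,
                                  uint64_t fixed1)
{
    return (cr & fixed0) == fixed0 && (cr & ~fixed1) == 0;
}
```

For example, with a FIXED0 value of 0x80000021 (PE, NE and PG required) and a FIXED1 value of 0xFFFFFFFF, a CR0 of 0x11 is rejected by vmx_cr_is_valid and becomes 0x80000031 after vmx_adjust_cr.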

On my own hardware and in the Bochs emulator, I've found that you will most likely need to enable these bits: CR0.PE, CR0.NE, CR0.PG and CR4.VMXE. For 64-bit kernels, IA32_EFER.LMA must also be set.

Executing VMXON

The main entry point for using VMX is the VMXON instruction. It takes a single m64 operand: a 64-bit memory location holding the physical address of the VMXON region. The region must be 4096-byte aligned (bits 0-11 of its address must be 0), and the only field software should write in it is the VMCS revision identifier in its first 4 bytes. This field must contain the value of bits 0-30 of the IA32_VMX_BASIC MSR (bit 31 of that MSR always reads as 0). In 32-bit PMode the address has to be widened to 64 bits before it can be used as an m64. On processors that are not long mode capable, the upper 32 bits of the m64 must be 0; otherwise VMXON fails with an invalid-address error, which it reports by setting the carry flag.

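 /* allocate_4k_aligned() is assumed to return identity-mapped memory */
 uint32_t *region = (uint32_t *)allocate_4k_aligned(4096);
 /* region[0] must hold the VMCS revision ID before VMXON is executed */
 uint64_t region64 = (uint64_t)(uintptr_t)region; /* zero-extend to 64 bits */
 asm volatile("vmxon %0" :: "m"(region64) : "cc", "memory");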

This general process of widening a 32-bit memory address into a pseudo-64-bit integer (unsigned long long) will be used for all m64 operands later. VMCLEAR is another instruction that requires the upper 32 bits of the address to be 0.

Long mode capable processors simply require a 64-bit pointer to the region.

Note: The operands of the VMXON, VMCLEAR and VMPTRLD instructions must contain the physical address of their respective regions.
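
Preparing the region before VMXON could look roughly like the sketch below. It assumes a 4096-byte region, that the caller already knows both a writable virtual mapping and the physical address, and that the IA32_VMX_BASIC value was read beforehand; vmx_region_init and VMX_REGION_SIZE are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define VMX_REGION_SIZE 4096u /* assumed region size for this sketch */

/* Zero a VMXON region and store the revision ID in its first 4 bytes.
 * 'virt' is where the kernel can write to the region, 'phys' is its
 * physical address, 'vmx_basic' is the value of IA32_VMX_BASIC.
 * Returns 0 on success, -1 if 'phys' is not 4096-byte aligned. */
static int vmx_region_init(void *virt, uint64_t phys, uint64_t vmx_basic)
{
    uint32_t revision = (uint32_t)(vmx_basic & 0x7FFFFFFFu); /* bits 0-30 */

    if (phys & (VMX_REGION_SIZE - 1))
        return -1;

    memset(virt, 0, VMX_REGION_SIZE);
    memcpy(virt, &revision, sizeof revision);
    return 0;
}
```

The physical address is passed separately so the same helper works whether or not the kernel identity-maps the region.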

VMX instruction error checking

VMX instructions have their own error reporting mechanism to indicate the success or failure of a given operation.

Two flags signal the success or failure of a VMX instruction: the carry flag (CF) and the zero flag (ZF).

If both of these flags are clear after a VMX instruction has executed, it succeeded.

If the carry flag is set, the current VMCS pointer is invalid.

If the zero flag is set, the VMCS pointer is valid but some other error occurred; its number is stored in the VM-instruction error field (encoding 4400h). Error numbers are listed in section 5.4 of the Intel SDM Volume 2B.
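
The three outcomes can be classified from a saved flags value as sketched below. The enum and function names are illustrative, and how the flags are captured right after the VMX instruction (for example with SETC/SETZ or PUSHF) is left to the caller.

```c
#include <stdint.h>

#define RFLAGS_CF (1u << 0) /* carry flag */
#define RFLAGS_ZF (1u << 6) /* zero flag */

enum vmx_result {
    VMX_SUCCEED,      /* CF = 0 and ZF = 0 */
    VMX_FAIL_INVALID, /* CF = 1: no valid current VMCS pointer */
    VMX_FAIL_VALID    /* ZF = 1: see the VM-instruction error field */
};

/* Classify the outcome of a VMX instruction from the flags it left. */
static inline enum vmx_result vmx_result_from_flags(uint64_t rflags)
{
    if (rflags & RFLAGS_CF)
        return VMX_FAIL_INVALID;
    if (rflags & RFLAGS_ZF)
        return VMX_FAIL_VALID;
    return VMX_SUCCEED;
}
```

Only when the result is VMX_FAIL_VALID is it meaningful to read the error number with VMREAD.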

The following table lists the error numbers that can appear in the VM-instruction error field:

Error Number    Description
0x01            VMCALL executed in VMX root operation
0x02            VMCLEAR with invalid physical address
0x03            VMCLEAR with VMXON pointer
0x04            VMLAUNCH with non-clear VMCS
0x05            VMRESUME with non-launched VMCS
0x06            VMRESUME with a corrupted VMCS (indicates corruption of the current VMCS)
0x07            VM entry with invalid VMX-control field(s)
0x08            VM entry with invalid host-state field(s)
0x09            VMPTRLD with invalid physical address
0x0A            VMPTRLD with VMXON pointer
0x0B            VMPTRLD with incorrect VMCS revision identifier
0x0C            VMREAD/VMWRITE from/to unsupported VMCS component
0x0D            VMWRITE to read-only VMCS component
0x0F            VMXON executed in VMX root operation
0x1A            VM entry with events blocked by MOV SS
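
For debugging output it can be handy to turn these numbers into text. The lookup below is a sketch that mirrors only the entries of the table above; vmx_instruction_error_string is a hypothetical helper name, and any number not in the table falls through to a generic message.

```c
#include <stdint.h>

/* Map a VM-instruction error number to the description from the table. */
static const char *vmx_instruction_error_string(uint32_t error)
{
    switch (error) {
    case 0x01: return "VMCALL executed in VMX root operation";
    case 0x02: return "VMCLEAR with invalid physical address";
    case 0x03: return "VMCLEAR with VMXON pointer";
    case 0x04: return "VMLAUNCH with non-clear VMCS";
    case 0x05: return "VMRESUME with non-launched VMCS";
    case 0x06: return "VMRESUME with a corrupted VMCS";
    case 0x07: return "VM entry with invalid VMX-control field(s)";
    case 0x08: return "VM entry with invalid host-state field(s)";
    case 0x09: return "VMPTRLD with invalid physical address";
    case 0x0A: return "VMPTRLD with VMXON pointer";
    case 0x0B: return "VMPTRLD with incorrect VMCS revision identifier";
    case 0x0C: return "VMREAD/VMWRITE from/to unsupported VMCS component";
    case 0x0D: return "VMWRITE to read-only VMCS component";
    case 0x0F: return "VMXON executed in VMX root operation";
    case 0x1A: return "VM entry with events blocked by MOV SS";
    default:   return "error number not listed in the table";
    }
}
```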

VMCS

Only two fields of the VMCS can be accessed directly in memory: a 4-byte VMCS revision ID located at byte offset 0 and a 4-byte abort indicator located at byte offset 4. The rest of the VMCS holds field data in an implementation-specific layout and is accessed with VMREAD and VMWRITE.

The revision field must be filled with the revision ID stored in bits 0-30 of the IA32_VMX_BASIC MSR, just as for the VMXON region.

The abort field contains a non-zero value if a VMX abort occurred during a VM exit; see section 23.7 in Intel's SDM 3B for the error values.
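
The directly accessible part of the layout can be written down as a C structure. This is a sketch assuming a 4096-byte region; struct vmcs_region and vmcs_has_aborted are illustrative names, and the data array must never be interpreted by software.

```c
#include <stddef.h>
#include <stdint.h>

struct vmcs_region {
    uint32_t revision_id;     /* byte offset 0: VMCS revision ID */
    uint32_t abort_indicator; /* byte offset 4: non-zero after a VMX abort */
    uint8_t  data[4096 - 8];  /* opaque, reached via VMREAD/VMWRITE only */
};

/* Report whether the processor recorded a VMX abort in this VMCS. */
static inline int vmcs_has_aborted(const struct vmcs_region *vmcs)
{
    return vmcs->abort_indicator != 0;
}
```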

A VMCS is loaded with the VMPTRLD instruction, which makes it the current, active VMCS. It takes a 64-bit memory operand in the same format as VMXON and VMCLEAR.

 asm volatile("vmptrld %0" :: "m"(vmcsRegion64) : "cc", "memory");

The structure of the VMCS is covered in detail in Chapter 20 of the Intel SDM volume 3B (see link below). Field encodings for VMWRITE and VMREAD are covered in Appendix H of the same manual.
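
A field encoding is itself a small bit field, which is useful to know when reading the encoding tables. The decoder below is a sketch based on my reading of the SDM's encoding layout (bit 0 access type, bits 1-9 index, bits 10-11 field type, bits 13-14 width); struct vmcs_field_info and vmcs_decode_field are hypothetical names.

```c
#include <stdint.h>

struct vmcs_field_info {
    unsigned high_access; /* bit 0: 1 = upper half of a 64-bit field */
    unsigned index;       /* bits 1-9 */
    unsigned type;        /* bits 10-11: 0 control, 1 read-only data,
                             2 guest state, 3 host state */
    unsigned width;       /* bits 13-14: 0 16-bit, 1 64-bit,
                             2 32-bit, 3 natural width */
};

/* Split a VMREAD/VMWRITE field encoding into its components. */
static inline struct vmcs_field_info vmcs_decode_field(uint32_t encoding)
{
    struct vmcs_field_info info;

    info.high_access = encoding & 1u;
    info.index       = (encoding >> 1) & 0x1FFu;
    info.type        = (encoding >> 10) & 0x3u;
    info.width       = (encoding >> 13) & 0x3u;
    return info;
}
```

Decoding 4400h, the VM-instruction error field, gives a 32-bit (width 2) read-only data field (type 1) with index 0.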

Peripheral Emulation

IO framework emulation

On x86 there are two kinds of IO channels: port-based IO (PIO) and memory-mapped IO (MMIO). PIO has its own address space and dedicated instructions for doing IO. With MMIO, the device IO space is mapped into the memory address space, so ordinary memory move instructions perform the IO.

PIO emulation

With Intel VT-x, the hypervisor decides whether the guest's IO instructions trap into VMX root mode. If bit 24 ("unconditional I/O exiting") of the primary processor-based VM-execution controls is set, all of the guest's IO instructions cause VM exits. Otherwise, set bit 25 ("use I/O bitmaps") and set up the two IO bitmap regions so that only the ports you are interested in cause VM exits.
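
Marking a port in the IO bitmaps could be sketched as follows. This assumes the usual layout of two 4 KiB bitmaps with one bit per port, bitmap A covering ports 0000h-7FFFh and bitmap B covering ports 8000h-FFFFh; the macro and function names are made up for this example.

```c
#include <stdint.h>

/* Primary processor-based VM-execution control bits. */
#define CPU_BASED_UNCOND_IO_EXITING (1u << 24)
#define CPU_BASED_USE_IO_BITMAPS    (1u << 25)

/* Make accesses to 'port' cause a VM exit. Each bitmap is 4096 bytes:
 * bitmap_a covers ports 0x0000-0x7FFF, bitmap_b covers 0x8000-0xFFFF. */
static inline void io_bitmap_trap_port(uint8_t *bitmap_a, uint8_t *bitmap_b,
                                       uint16_t port)
{
    uint8_t *bitmap = (port < 0x8000u) ? bitmap_a : bitmap_b;
    unsigned bit = port & 0x7FFFu;

    bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
}
```

The physical addresses of both bitmaps still have to be written into the corresponding VMCS fields, which is not shown here.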

An IO instruction causes a VM exit with basic exit reason 30. From the exit qualification you can retrieve the IO operation size, direction, port number and so on. For more, please refer to vmx_pio.c in the references.
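
Extracting those values from the exit qualification might look like the sketch below. It follows my understanding of the SDM's layout for IO-instruction exits (bits 0-2 size minus one, bit 3 direction, bit 4 string, bit 5 REP, bit 6 operand encoding, bits 16-31 port); struct pio_exit and pio_decode_exit are hypothetical names and are not taken from vmx_pio.c.

```c
#include <stdint.h>

struct pio_exit {
    unsigned size;        /* access size in bytes: 1, 2 or 4 */
    int      is_in;       /* 1 = IN (read from port), 0 = OUT */
    int      is_string;   /* 1 = INS/OUTS */
    int      is_rep;      /* 1 = REP prefixed */
    int      imm_operand; /* 1 = port given as immediate, 0 = in DX */
    uint16_t port;        /* port number */
};

/* Decode the exit qualification of an IO-instruction VM exit. */
static inline struct pio_exit pio_decode_exit(uint64_t qualification)
{
    struct pio_exit e;

    e.size        = (unsigned)(qualification & 0x7u) + 1;
    e.is_in       = (int)((qualification >> 3) & 1u);
    e.is_string   = (int)((qualification >> 4) & 1u);
    e.is_rep      = (int)((qualification >> 5) & 1u);
    e.imm_operand = (int)((qualification >> 6) & 1u);
    e.port        = (uint16_t)((qualification >> 16) & 0xFFFFu);
    return e;
}
```

A byte-sized IN from port 3F8h through DX, for instance, arrives as the qualification value 03F80008h.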

MMIO emulation

MMIO emulation on x86 is a bit different: we exploit EPT in order to capture MMIO accesses. VMX provides two kinds of EPT-related VM exits: EPT violation and EPT misconfiguration. In general, when the guest accesses memory whose EPT entry is not set up correctly (e.g. the memory is writable but not readable), a VM exit with EPT misconfiguration occurs. The hypervisor must then perform the following steps to emulate the MMIO operation:

 1). Decode the memory move instruction to determine its length, access size, direction, operation, register indices/immediates and memory address.
 2). Search the MMIO device regions to see whether the address is backed by a device, and let that device handle the access.
 3). Store the result in the destination register if necessary.
 4). Advance to the next instruction by adding the instruction length resolved in step 1 to the guest RIP.
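
Steps 2 to 4 can be sketched as a small dispatcher. Everything here is hypothetical scaffolding rather than the code from the referenced project: the instruction decoding of step 1 is assumed to have filled in struct mmio_access already, and demo_read/demo_write stand in for a real device model.

```c
#include <stddef.h>
#include <stdint.h>

/* One emulated device window in guest-physical address space. */
struct mmio_region {
    uint64_t base;
    uint64_t size;
    uint64_t (*read)(uint64_t offset, unsigned size);
    void (*write)(uint64_t offset, unsigned size, uint64_t value);
};

/* Output of step 1 (instruction decoding), done elsewhere. */
struct mmio_access {
    uint64_t gpa;      /* guest-physical address being accessed */
    unsigned size;     /* access size in bytes */
    unsigned insn_len; /* length of the faulting instruction */
    int      is_write; /* 1 = store to device, 0 = load from device */
    uint64_t value;    /* value to store, or result of a load */
};

/* Step 2: find the device backing a guest-physical address. */
static const struct mmio_region *
mmio_find(const struct mmio_region *regions, size_t count, uint64_t gpa)
{
    for (size_t i = 0; i < count; i++)
        if (gpa >= regions[i].base && gpa - regions[i].base < regions[i].size)
            return &regions[i];
    return NULL;
}

/* Steps 2-4: dispatch the access, then skip the instruction.
 * Returns 0 on success, -1 if no device claims the address. */
static int mmio_emulate(const struct mmio_region *regions, size_t count,
                        struct mmio_access *acc, uint64_t *guest_rip)
{
    const struct mmio_region *dev = mmio_find(regions, count, acc->gpa);

    if (dev == NULL)
        return -1;

    if (acc->is_write)
        dev->write(acc->gpa - dev->base, acc->size, acc->value);
    else /* step 3: caller copies acc->value into the destination register */
        acc->value = dev->read(acc->gpa - dev->base, acc->size);

    *guest_rip += acc->insn_len; /* step 4 */
    return 0;
}

/* Toy device for illustration: a single 64-bit scratch register. */
static uint64_t demo_reg;

static uint64_t demo_read(uint64_t offset, unsigned size)
{
    (void)offset;
    (void)size;
    return demo_reg;
}

static void demo_write(uint64_t offset, unsigned size, uint64_t value)
{
    (void)offset;
    (void)size;
    demo_reg = value;
}
```

A real hypervisor would write the updated RIP back to the guest-state area of the VMCS with VMWRITE before resuming the guest.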

Devices emulation examples

Device IO type Reference
Intel 8259 PIC PIO https://github.com/chillancezen/ZeldaOS.x86_64/blob/master/vm_monitor/device_8259pic.c
Intel 8253 PIT PIO https://github.com/chillancezen/ZeldaOS.x86_64/blob/master/vm_monitor/device_8253pit.c
Intel 8042 keyboard PIO https://github.com/chillancezen/ZeldaOS.x86_64/blob/master/vm_monitor/device_keyboard.c
Serial port controller PIO https://github.com/chillancezen/ZeldaOS.x86_64/blob/master/vm_monitor/device_serial.c
16-color video controller MMIO https://github.com/chillancezen/ZeldaOS.x86_64/blob/master/vm_monitor/device_video.c

References

Intel's SDM 3B: http://www.intel.com/Assets/PDF/manual/253669.pdf

Intel's SDM 2B: http://www.intel.com/Assets/PDF/manual/253667.pdf

KVM's VMX.c (GPLv2): http://lxr.free-electrons.com/source/arch/x86/kvm/vmx.c

BOCHS's VMX.c (LGPLv2): http://bochs.cvs.sourceforge.net/viewvc/bochs/bochs/cpu/vmx.cc

PIO sub handler: https://github.com/chillancezen/ZeldaOS.x86_64/blob/master/vm_monitor/vmx_pio.c

Memory move instruction decode: https://github.com/chillancezen/ZeldaOS.x86_64/blob/master/vm_monitor/vmx_instruction_decoding.c


Other examples

VMX implementation in a homemade OS: http://www.dumais.io/index.php?article=ac3267239dd3e34c061de6413203fb98