Timekeeping in virtual machines
Latest revision as of 15:46, 9 June 2024
There are several ways to keep track of time in a VM, but they're either very slow (e.g. HPET) or do not work correctly if the VM is migrated (e.g. TSC).
To work around this, hypervisors such as QEMU/KVM provide several ways to keep track of time whilst sacrificing little performance.
KVM_HC_CLOCK_PAIRING
This hypercall returns the parameters a guest needs to calculate one of the host's clocks (pass KVM_CLOCK_PAIRING_WALLCLOCK to select CLOCK_REALTIME).
The host copies the following structure to a physical address given by the guest:
struct kvm_clock_pairing {
    s64 sec;
    s64 nsec;
    u64 tsc;
    u32 flags;
    u32 pad[9];
};
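As a minimal sketch of how a guest might consume this structure once the host has filled it in: going by the KVM hypercall documentation, `sec` and `nsec` hold the host clock value and `tsc` holds the guest TSC value at which that clock was sampled. The helper name `kvm_pairing_to_ns` is hypothetical and the fixed-width types are stand-ins for the kernel's `s64`/`u64`/`u32`.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as above, written with standard fixed-width types. */
struct kvm_clock_pairing {
    int64_t  sec;
    int64_t  nsec;
    uint64_t tsc;
    uint32_t flags;
    uint32_t pad[9];
};

/* 8 + 8 + 8 + 4 + 9 * 4 = 64 bytes, so the guest buffer must be at least this large. */
_Static_assert(sizeof(struct kvm_clock_pairing) == 64,
               "unexpected kvm_clock_pairing size");

/* Hypothetical helper: collapse the sec/nsec pair into nanoseconds. */
static int64_t kvm_pairing_to_ns(const struct kvm_clock_pairing *p)
{
    return p->sec * INT64_C(1000000000) + p->nsec;
}
```

The resulting value is the host clock at the moment the guest TSC read `p->tsc`; a guest would still need to account for the TSC ticks elapsed since then.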
A hypercall is performed with the `vmcall` instruction (`vmmcall` on AMD processors). On KVM, RAX holds the hypercall number and receives the return value, and up to four arguments are passed in RBX, RCX, RDX and RSI. No other registers are clobbered unless a particular hypercall explicitly states otherwise.
For example, calling KVM_HC_CLOCK_PAIRING can be done as follows on x86_64:
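; rdi: physical address to copy structure to
; rsi: clock type (KVM_CLOCK_PAIRING_WALLCLOCK = 0)
kvm_hc_clock_pairing:
    push rbx        ; rbx is callee-saved in the System V ABI
    mov eax, 9      ; KVM_HC_CLOCK_PAIRING
    mov rbx, rdi    ; argument 0: physical address
    mov rcx, rsi    ; argument 1: clock type
    vmcall
    pop rbx
    ret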
pvclock
pvclock is a simple protocol and the fastest way to properly track system time in a VM.
To use it, write the 64-bit physical address of a 4-byte aligned buffer, with bit 0 set to 1 to enable the clock, to MSR_KVM_SYSTEM_TIME_NEW (0x4b564d01).
The presence of this MSR is indicated by bit 3 of EAX in CPUID leaf 0x40000001.
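The feature check and the MSR value can be sketched as two small pure helpers. This is only an illustration: the helper names are hypothetical, the constants follow the names used in the Linux headers (`KVM_CPUID_FEATURES` for the KVM feature leaf 0x40000001, `KVM_FEATURE_CLOCKSOURCE2` for bit 3), and actually executing `cpuid` and `wrmsr` is left to the guest kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_CPUID_FEATURES        0x40000001u /* KVM feature leaf */
#define KVM_FEATURE_CLOCKSOURCE2  3           /* bit index in EAX */
#define MSR_KVM_SYSTEM_TIME_NEW   0x4b564d01u

/* Hypothetical helper: features_eax is EAX as returned by CPUID leaf
 * KVM_CPUID_FEATURES. */
static bool kvm_has_clocksource2(uint32_t features_eax)
{
    return (features_eax >> KVM_FEATURE_CLOCKSOURCE2) & 1u;
}

/* Hypothetical helper: build the value to write to MSR_KVM_SYSTEM_TIME_NEW.
 * Returns 0 if the physical address is not 4-byte aligned. */
static uint64_t pvclock_msr_value(uint64_t paddr)
{
    if (paddr & 3u)
        return 0;
    return paddr | 1u; /* bit 0 enables the clock */
}
```

A guest would call `kvm_has_clocksource2` first and write the MSR only if it returns true.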
The host will write the following structure to this address:
struct pvclock_vcpu_time_info {
    u32 version;
    u32 pad0;
    u64 tsc_timestamp;
    u64 system_time;
    u32 tsc_to_system_mul;
    s8 tsc_shift;
    u8 flags;
    u8 pad[2];
};
The host will automatically update this structure when necessary (e.g. when finishing a migration).
The system time in nanoseconds is calculated as follows:
time = rdtsc() - tsc_timestamp;
if (tsc_shift >= 0)
    time <<= tsc_shift;
else
    time >>= -tsc_shift;
time = (time * tsc_to_system_mul) >> 32; /* the product can exceed 64 bits */
time = time + system_time;
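The version field is used to detect when the structure has been or is being updated. If the version is odd, an update is in progress and the guest must not read the other fields yet. After reading the other fields, the guest should read the version again and retry if it has changed.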
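Putting the calculation and the version check together, a read loop might look like the sketch below. It makes several assumptions: the TSC is read through a caller-supplied function pointer (a real guest would pass a wrapper around `rdtsc`; `example_tsc` is a hypothetical stand-in that returns a fixed value), the 96-bit product uses `unsigned __int128`, which is a GCC/Clang extension, and the fences are a simplification of whatever barriers a real kernel would use.

```c
#include <assert.h>
#include <stdint.h>

struct pvclock_vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;
    uint8_t  pad[2];
};

/* Scale a TSC delta to nanoseconds: shift first, then multiply by the
 * 32.32 fixed-point factor. */
static uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift >= 0)
        delta <<= shift;
    else
        delta >>= -shift;
    /* Widen to 128 bits so the 64x32-bit product cannot overflow. */
    return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/* Retry until the version is even and unchanged across the whole read. */
static uint64_t pvclock_read_ns(const volatile struct pvclock_vcpu_time_info *ti,
                                uint64_t (*read_tsc)(void))
{
    uint32_t version;
    uint64_t ns;

    do {
        version = ti->version;
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        ns = ti->system_time +
             pvclock_scale_delta(read_tsc() - ti->tsc_timestamp,
                                 ti->tsc_to_system_mul, ti->tsc_shift);
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    } while ((version & 1u) || version != ti->version);

    return ns;
}

/* Hypothetical stand-in for rdtsc so the sketch can run outside a VM. */
static uint64_t example_tsc(void)
{
    return 3000;
}
```

Reading the TSC inside the loop matters: if the host updates the structure mid-read, both the parameters and the TSC sample are discarded together.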
Hyper-V TSC page
struct ms_hyperv_tsc_page {
    volatile u32 tsc_sequence;
    u32 reserved1;
    volatile u64 tsc_scale;
    volatile s64 tsc_offset;
    u64 reserved2[509];
};
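This section lists only the page layout. As a rough sketch, based on the Hyper-V Top-Level Functional Specification rather than on anything stated here, the reference time in 100 ns units is computed as ((tsc * tsc_scale) >> 64) + tsc_offset. A tsc_sequence of 0 indicates that the page is not valid, and a guest is expected to re-read tsc_sequence after reading the other fields and retry if it changed. The helper name `hv_tsc_to_reference_time` is hypothetical, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

struct ms_hyperv_tsc_page {
    volatile uint32_t tsc_sequence;
    uint32_t          reserved1;
    volatile uint64_t tsc_scale;
    volatile int64_t  tsc_offset;
    uint64_t          reserved2[509];
};

/* The structure is meant to fill exactly one 4 KiB page. */
_Static_assert(sizeof(struct ms_hyperv_tsc_page) == 4096,
               "unexpected ms_hyperv_tsc_page size");

/* Hypothetical helper: apply the 64.64 fixed-point scale and the offset to a
 * raw TSC value. The result is in 100 ns units. */
static uint64_t hv_tsc_to_reference_time(uint64_t tsc, uint64_t scale,
                                         int64_t offset)
{
    uint64_t scaled = (uint64_t)(((unsigned __int128)tsc * scale) >> 64);
    return scaled + (uint64_t)offset;
}
```

The sequence check is deliberately left out of the helper so the arithmetic can be read on its own.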