Timekeeping in virtual machines

From OSDev.wiki
Latest revision as of 15:46, 9 June 2024

This page is a work in progress.
This page may thus be incomplete. Its content may be changed in the near future.

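There are several ways to keep track of time in a VM, but they are either very slow (e.g. the HPET, where every access traps to the hypervisor) or do not work correctly if the VM is migrated (e.g. the raw TSC, whose frequency and offset may differ on the new host).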

To work around this, hypervisors such as QEMU/KVM provide several paravirtualized ways to keep track of time whilst sacrificing little performance.

KVM_HC_CLOCK_PAIRING

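This hypercall is used to synchronize the guest's clock with one of the host's clocks: the host returns the current value of the requested clock together with the guest TSC value at which it was sampled. The only clock type currently supported is KVM_CLOCK_PAIRING_WALLCLOCK, which corresponds to the host's CLOCK_REALTIME.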

The host copies the following structure to a physical address given by the guest:

struct kvm_clock_pairing {
    s64 sec;
    s64 nsec;
    u64 tsc;
    u32 flags;
    u32 pad[9];
};

A hypercall is performed with the `vmcall` instruction on Intel CPUs (`vmmcall` on AMD CPUs). On KVM, RBX, RCX, RDX and RSI carry up to four arguments, and RAX holds the hypercall number on entry and the return value on exit. No other registers are clobbered (unless explicitly noted).

For example, calling KVM_HC_CLOCK_PAIRING can be done as follows on x86_64:

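; rdi: physical address to copy structure to
; rsi: clock type (KVM_CLOCK_PAIRING_WALLCLOCK = 0)
; returns rax: 0 on success, negative KVM error code (e.g. -KVM_EOPNOTSUPP) on failure
kvm_hc_clock_pairing:
    push rbx     ; rbx is callee-saved in the System V ABI
    mov eax, 9   ; KVM_HC_CLOCK_PAIRING (writing eax zero-extends into rax)
    mov rbx, rdi ; first hypercall argument
    mov rcx, rsi ; second hypercall argument
    vmcall
    pop rbx
    ret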
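
Once the hypercall returns, the guest still has to interpret the structure. The following is a minimal sketch of that step, not a definitive implementation. The helper name clock_pairing_to_ns is hypothetical, and it assumes that a return value of 0 means success and that any other value should be treated as a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout copied by the host; mirrors the structure shown above. */
struct kvm_clock_pairing {
    int64_t  sec;      /* seconds of the requested host clock */
    int64_t  nsec;     /* nanoseconds of the requested host clock */
    uint64_t tsc;      /* guest TSC value the sec/nsec pair belongs to */
    uint32_t flags;
    uint32_t pad[9];
};
static_assert(sizeof(struct kvm_clock_pairing) == 64, "unexpected layout");

/*
 * ret: value the hypercall left in RAX (assumption: 0 means success).
 * On success, stores the host clock in nanoseconds in *out_ns.
 */
static bool clock_pairing_to_ns(int64_t ret, const struct kvm_clock_pairing *p,
                                uint64_t *out_ns)
{
    if (ret != 0 || p->sec < 0 || p->nsec < 0)
        return false;
    *out_ns = (uint64_t)p->sec * UINT64_C(1000000000) + (uint64_t)p->nsec;
    return true;
}
```

The tsc field tells the guest which TSC reading the returned time corresponds to, so the result can be extrapolated with the guest's own TSC afterwards.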

pvclock

pvclock is a simple protocol and the fastest way to properly track system time in a VM.

To use it, write the 4-byte aligned physical address of a buffer in guest RAM, with bit 0 set to 1 as the enable bit, to MSR_KVM_SYSTEM_TIME_NEW (0x4b564d01). The presence of this MSR is indicated by bit 3 of EAX from CPUID leaf 0x40000001.
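
The feature check and the MSR value can be sketched as below. This is an illustration under stated assumptions: the helper names kvm_has_clocksource2 and pvclock_msr_value are hypothetical, the EAX value is assumed to come from KVM's feature leaf 0x40000001, and the actual cpuid and wrmsr instructions are left to the kernel's own wrappers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_CPUID_FEATURES       0x40000001u /* CPUID leaf holding KVM feature bits */
#define KVM_FEATURE_CLOCKSOURCE2 3           /* EAX bit: MSR_KVM_SYSTEM_TIME_NEW exists */
#define MSR_KVM_SYSTEM_TIME_NEW  0x4b564d01u

/* features_eax: EAX as returned by CPUID leaf KVM_CPUID_FEATURES. */
static bool kvm_has_clocksource2(uint32_t features_eax)
{
    return (features_eax >> KVM_FEATURE_CLOCKSOURCE2) & 1u;
}

/* Builds the value to write to MSR_KVM_SYSTEM_TIME_NEW. */
static uint64_t pvclock_msr_value(uint64_t phys)
{
    assert((phys & 3u) == 0);   /* the address must be 4-byte aligned */
    return phys | 1u;           /* bit 0 enables the clock */
}
```

A kernel would then write pvclock_msr_value(phys) to MSR_KVM_SYSTEM_TIME_NEW on each vCPU, using a separate buffer per vCPU.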

The host will write the following structure to this address:

struct pvclock_vcpu_time_info {
    u32 version;
    u32 pad0;
    u64 tsc_timestamp;
    u64 system_time;
    u32 tsc_to_system_mul;
    s8 tsc_shift;
    u8 flags;
    u8 pad[2];
};

The host will automatically update this structure when necessary (e.g. when finishing a migration).

The system time in nanoseconds is calculated as follows:

time = rdtsc() - tsc_timestamp;
if (tsc_shift >= 0)
    time <<= tsc_shift;
else
    time >>= -tsc_shift;
time = (time * tsc_to_system_mul) >> 32;  // the product can exceed 64 bits, use a wider intermediate
time = time + system_time;

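The version field is a sequence counter used to detect when the structure has been or is being updated. If the version is odd, an update is in progress and the guest must not read the other fields yet. The guest must read the version both before and after reading the other fields, and retry if the two values differ.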
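
Putting the calculation and the version check together, a read could look like the sketch below. It is one possible shape rather than the reference implementation. The names scale_delta, pvclock_read_ns and fake_tsc are hypothetical, fake_tsc stands in for rdtsc so the sketch can run outside a VM, and the 64-by-32-bit multiplication is split into two halves instead of using a 128-bit type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct pvclock_vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;
    uint8_t  pad[2];
};
static_assert(sizeof(struct pvclock_vcpu_time_info) == 32, "unexpected layout");

/* Applies tsc_shift, then computes (delta * mul) >> 32 without overflow. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift >= 0)
        delta <<= shift;
    else
        delta >>= -shift;
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* Stand-in for rdtsc, for demonstration only. */
static uint64_t fake_tsc(void) { return 2000; }

static uint64_t pvclock_read_ns(const volatile struct pvclock_vcpu_time_info *info,
                                uint64_t (*read_tsc)(void))
{
    uint32_t version, mul;
    uint64_t tsc, stamp, base;
    int8_t shift;

    for (;;) {
        version = info->version;
        if (version & 1u)
            continue;                        /* update in progress: retry */
        atomic_thread_fence(memory_order_acquire);
        stamp = info->tsc_timestamp;
        base  = info->system_time;
        mul   = info->tsc_to_system_mul;
        shift = info->tsc_shift;
        tsc   = read_tsc();                  /* read inside the checked region */
        atomic_thread_fence(memory_order_acquire);
        if (version == info->version)
            break;                           /* snapshot is consistent */
    }
    return scale_delta(tsc - stamp, mul, shift) + base;
}
```

The TSC is read inside the retry loop so that it always belongs to the same snapshot of the structure, for instance when an update lands during a migration.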

Hyper-V TSC page

struct ms_hyperv_tsc_page {
    volatile u32 tsc_sequence;
    u32 reserved1;
    volatile u64 tsc_scale;
    volatile s64 tsc_offset;
    u64 reserved2[509];
};
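
Assuming the semantics described in Microsoft's Hyper-V Top-Level Functional Specification (a tsc_sequence of 0 marks the page as invalid, and the reference time in 100 ns units is ((tsc * tsc_scale) >> 64) + tsc_offset), a guest could read the page roughly as sketched below. These semantics should be verified against the specification before relying on them. The names mulhi64, hv_read_reference_time and fake_tsc are hypothetical, fake_tsc stands in for rdtsc, and enabling the page through its MSR is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct ms_hyperv_tsc_page {
    volatile uint32_t tsc_sequence;
    uint32_t reserved1;
    volatile uint64_t tsc_scale;
    volatile int64_t  tsc_offset;
    uint64_t reserved2[509];
};
static_assert(sizeof(struct ms_hyperv_tsc_page) == 4096, "must fill one page");

/* Upper 64 bits of the 128-bit product a * b, using 32-bit halves. */
static uint64_t mulhi64(uint64_t a, uint64_t b)
{
    uint64_t al = a & 0xffffffffu, ah = a >> 32;
    uint64_t bl = b & 0xffffffffu, bh = b >> 32;
    uint64_t cross = ((al * bl) >> 32) + ((ah * bl) & 0xffffffffu) + al * bh;
    return ah * bh + ((ah * bl) >> 32) + (cross >> 32);
}

/* Stand-in for rdtsc, for demonstration only. */
static uint64_t fake_tsc(void) { return 2000; }

/* Returns false if the page is invalid (assumption: sequence 0). */
static bool hv_read_reference_time(const struct ms_hyperv_tsc_page *page,
                                   uint64_t (*read_tsc)(void), uint64_t *out)
{
    uint32_t seq;
    uint64_t scale, tsc;
    int64_t offset;

    do {
        seq = page->tsc_sequence;
        if (seq == 0)
            return false;          /* caller must fall back to another time source */
        atomic_thread_fence(memory_order_acquire);
        scale  = page->tsc_scale;
        offset = page->tsc_offset;
        tsc    = read_tsc();
        atomic_thread_fence(memory_order_acquire);
    } while (page->tsc_sequence != seq);   /* retry if the host updated the page */

    *out = mulhi64(tsc, scale) + (uint64_t)offset;
    return true;
}
```

The retry loop follows the same sequence-counter idea as pvclock's version field, except that validity is signalled by a non-zero sequence rather than by an even one.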

See also

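References

* [https://docs.kernel.org/virt/kvm/x86/hypercalls.html Linux KVM Hypercall]
* [https://docs.kernel.org/virt/kvm/x86/msr.html KVM-specific MSRs]
* [https://opensource.com/article/17/6/timekeeping-linux-vms An introduction to timekeeping in Linux VMs]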