Virtio

VirtIO is a standardized interface which allows virtual machines access to simplified "virtual" devices, such as block devices, network adapters and consoles. Accessing devices through VirtIO on a guest VM improves performance over more traditional "emulated" devices, as VirtIO devices require only the bare minimum setup and configuration needed to send and receive data, while the host machine handles the majority of the setup and maintenance of the actual physical hardware.

Technical Details

VirtIO devices appear to the guest VM as normal PCI devices with a specific Vendor ID and Device ID. All VirtIO devices have a Vendor ID of 0x1AF4 and a Device ID between 0x1000 and 0x103F. The type of VirtIO device (Network Adapter, Block Device, etc.) can be determined from the Subsystem ID field in the device's PCI Configuration Space. The currently defined types are:

Subsystem ID Name
01 Network Card
02 Block Device
03 Console
04 Entropy Source
05 Memory Ballooning
06 IO Memory
07 RPMSG
08 SCSI Host
09 9P Transport
10 MAC802.11 WLAN
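
As a minimal sketch of the detection rules above, the following C helpers test a Vendor ID / Device ID pair and map a Subsystem ID to its name from the table. The macro and function names are illustrative assumptions, not part of any specification, and reading the IDs out of PCI Configuration Space is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_PCI_VENDOR_ID     0x1AF4
#define VIRTIO_PCI_DEVICE_ID_MIN 0x1000
#define VIRTIO_PCI_DEVICE_ID_MAX 0x103F

/* True if the ID pair falls inside the VirtIO range described above. */
static bool virtio_pci_match(uint16_t vendor_id, uint16_t device_id)
{
    return vendor_id == VIRTIO_PCI_VENDOR_ID &&
           device_id >= VIRTIO_PCI_DEVICE_ID_MIN &&
           device_id <= VIRTIO_PCI_DEVICE_ID_MAX;
}

/* Map a Subsystem ID to its name from the table; NULL if it is not listed. */
static const char *virtio_subsystem_name(uint16_t subsystem_id)
{
    static const char *const names[] = {
        NULL,               /* 0 is not in the table */
        "Network Card",     /* 1 */
        "Block Device",     /* 2 */
        "Console",          /* 3 */
        "Entropy Source",   /* 4 */
        "Memory Ballooning",/* 5 */
        "IO Memory",        /* 6 */
        "RPMSG",            /* 7 */
        "SCSI Host",        /* 8 */
        "9P Transport",     /* 9 */
        "MAC802.11 WLAN",   /* 10 */
    };

    if (subsystem_id >= sizeof names / sizeof names[0])
        return NULL;
    return names[subsystem_id];
}
```

A PCI enumeration loop would typically call `virtio_pci_match` first and only then look at the Subsystem ID to pick a driver.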

I/O Registers

The PCI Subsystem ID determines which device-specific registers the device exposes. In addition to those, all devices share a common "header" block of I/O mapped registers:

Offset (Hex) Name
00 Device Features
04 Guest Features
08 Queue Address
0C Queue Size
0E Queue Select
10 Queue Notify
12 Device Status
13 ISR Status

The Device Features register is pre-configured by the device, and includes flags to notify the guest VM what features are supported by the device. The Guest Features register is used by the guest VM to communicate the features that the guest VM driver supports. This allows both the host and the guest to maintain both backward and forward compatibility.
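
The feature exchange described above amounts to a bitwise AND of what the device offers and what the driver understands. The sketch below names the common register offsets from the table and shows that computation; the identifiers are made up for illustration, and the actual port reads and writes are omitted.

```c
#include <stdint.h>

/* Offsets of the common registers, relative to the device's I/O base.
 * Values are copied from the register table above. */
enum virtio_reg {
    VIRTIO_REG_DEVICE_FEATURES = 0x00,
    VIRTIO_REG_GUEST_FEATURES  = 0x04,
    VIRTIO_REG_QUEUE_ADDRESS   = 0x08,
    VIRTIO_REG_QUEUE_SIZE      = 0x0C,
    VIRTIO_REG_QUEUE_SELECT    = 0x0E,
    VIRTIO_REG_QUEUE_NOTIFY    = 0x10,
    VIRTIO_REG_DEVICE_STATUS   = 0x12,
    VIRTIO_REG_ISR_STATUS      = 0x13,
};

/* Given the value read from Device Features and the set of feature bits
 * the driver implements, return the value to write to Guest Features:
 * only bits that both sides support are kept. */
static uint32_t virtio_negotiate_features(uint32_t device_features,
                                          uint32_t driver_supported)
{
    return device_features & driver_supported;
}
```

Because unknown bits are simply masked out, an older driver ignores features added by a newer device, and a newer driver does not enable features an older device never offered.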

The Device Status field is used by the guest VM to communicate the current state of the guest VM driver. The flags in this register indicate when the driver has found the device, when the driver has determined that the device is supported, and when all of the necessary registers have been configured by the guest driver so that communication between the guest and host may begin.
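
The status progression can be sketched as follows. The flag values are taken from the VirtIO specification rather than from this article, and the `virtio_status_reg` struct is a hypothetical stand-in for the 8-bit Device Status register so that the sequence can be followed without real port I/O.

```c
#include <stdint.h>

/* Device Status flag bits (values from the VirtIO specification). */
enum {
    VIRTIO_STATUS_ACKNOWLEDGE = 0x01, /* driver has found the device       */
    VIRTIO_STATUS_DRIVER      = 0x02, /* driver knows how to drive it      */
    VIRTIO_STATUS_DRIVER_OK   = 0x04, /* setup finished, device may be used */
    VIRTIO_STATUS_FAILED      = 0x80, /* driver has given up on the device */
};

/* Hypothetical stand-in for the Device Status register at offset 0x12. */
struct virtio_status_reg {
    uint8_t value;
};

static void virtio_status_set(struct virtio_status_reg *reg, uint8_t bits)
{
    reg->value |= bits; /* flags accumulate; earlier bits stay set */
}

/* Walk through the states described above and return the final status. */
static uint8_t virtio_bring_up(struct virtio_status_reg *reg, int supported)
{
    reg->value = 0; /* writing 0 resets the device */
    virtio_status_set(reg, VIRTIO_STATUS_ACKNOWLEDGE);

    if (!supported) {
        virtio_status_set(reg, VIRTIO_STATUS_FAILED);
        return reg->value;
    }
    virtio_status_set(reg, VIRTIO_STATUS_DRIVER);

    /* Feature negotiation and queue configuration would happen here. */

    virtio_status_set(reg, VIRTIO_STATUS_DRIVER_OK);
    return reg->value;
}
```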

Device Specific Registers

Any device-specific registers are located immediately after the common registers above, starting at offset 0x14.

Network Device Registers

Offset (Hex) Name
14 MAC Address 1
15 MAC Address 2
16 MAC Address 3
17 MAC Address 4
18 MAC Address 5
19 MAC Address 6
1A Status
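
As an illustration of the layout above, the MAC address can be collected by reading the six consecutive byte registers starting at offset 0x14. In this sketch the register access is abstracted behind a callback; `virtio_read8_fn` and `fake_read8` are hypothetical names, and on real hardware the callback would perform a port read at the device's I/O base plus the offset.

```c
#include <stdint.h>

#define VIRTIO_NET_REG_MAC    0x14 /* first of six MAC address bytes */
#define VIRTIO_NET_REG_STATUS 0x1A

/* Reads one byte from a device register at the given offset. */
typedef uint8_t (*virtio_read8_fn)(void *ctx, uint16_t offset);

/* Copy MAC Address 1..6 (offsets 0x14..0x19) into mac[0..5]. */
static void virtio_net_read_mac(virtio_read8_fn read8, void *ctx,
                                uint8_t mac[6])
{
    for (int i = 0; i < 6; i++)
        mac[i] = read8(ctx, (uint16_t)(VIRTIO_NET_REG_MAC + i));
}

/* Stand-in accessor for experimentation: treats ctx as a plain byte
 * array holding a snapshot of the register block. */
static uint8_t fake_read8(void *ctx, uint16_t offset)
{
    const uint8_t *regs = ctx;
    return regs[offset];
}
```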

Configuration

All VirtIO devices use one or more ring buffers on the guest machine to communicate with the host machine. These buffers are added to virtual queues in memory, and each device has a predefined number of queues. For instance, the VirtIO Network device has 2 mandatory queues (the receive queue and the send queue), and one optional queue (the control queue). The optional control queue must be supported by both the host and guest (i.e. feature bit must be set in both the Device Features register and the Guest Features register), and allows the guest to control network specific features on the host machine, like promiscuous mode and hardware checksum calculations.
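
For the network device, the rule about the optional control queue can be sketched as below. The feature bit number (17, `VIRTIO_NET_F_CTRL_VQ`) and the queue indices are taken from the VirtIO specification rather than from this article, and the helper names are illustrative.

```c
#include <stdint.h>

#define VIRTIO_NET_F_CTRL_VQ 17 /* feature bit: control queue available */

/* Queue indices for a network device without multiqueue support. */
enum {
    VIRTIO_NET_QUEUE_RECEIVE  = 0,
    VIRTIO_NET_QUEUE_TRANSMIT = 1,
    VIRTIO_NET_QUEUE_CONTROL  = 2, /* only if negotiated */
};

/* The control queue exists only if the bit is set on both sides. */
static int virtio_net_has_ctrl_queue(uint32_t device_features,
                                     uint32_t guest_features)
{
    uint32_t mask = UINT32_C(1) << VIRTIO_NET_F_CTRL_VQ;
    return (device_features & guest_features & mask) != 0;
}

/* Number of queues the driver has to configure. */
static unsigned virtio_net_queue_count(uint32_t device_features,
                                       uint32_t guest_features)
{
    return virtio_net_has_ctrl_queue(device_features, guest_features) ? 3 : 2;
}
```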

Each queue must be configured by the guest operating system before the device can be enabled. To determine how much memory a queue needs, the guest sets the Queue Select register to the desired queue index and then reads the Queue Size register, which gives the number of entries in that queue. Once the virtual queue has been created in memory, its address is written to the Queue Address register. The value written to this register is the physical address of the queue divided by 4096, which means that the virtual queue must be aligned on a 4096-byte boundary.
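
The memory requirement and the value for the Queue Address register can be worked out as in the sketch below. It assumes the field sizes of the Virtual Queue Descriptor layout in this article (16 bytes per buffer descriptor, 2 bytes per available ring entry, 8 bytes per used ring entry, and three 16-bit fields around each ring); the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTQ_ALIGN 4096u

/* Round n up to the next multiple of 4096. */
static size_t virtq_align_up(size_t n)
{
    return (n + VIRTQ_ALIGN - 1) & ~(size_t)(VIRTQ_ALIGN - 1);
}

/* Bytes needed for a queue with queue_size entries (the value read
 * from the Queue Size register). */
static size_t virtq_bytes(uint16_t queue_size)
{
    size_t desc  = (size_t)16 * queue_size;      /* Buffers array           */
    size_t avail = (size_t)2 * (3 + queue_size); /* Flags, Index, Ring, event */
    size_t used  = (size_t)2 * 3 + (size_t)8 * queue_size;

    /* The Used part starts on its own 4096-byte boundary. */
    return virtq_align_up(desc + avail) + virtq_align_up(used);
}

/* Value to write to the Queue Address register for a queue located at
 * the given physical address. */
static uint32_t virtq_address_register_value(uint64_t phys_addr)
{
    assert(phys_addr % VIRTQ_ALIGN == 0); /* must be 4096-byte aligned */
    return (uint32_t)(phys_addr / VIRTQ_ALIGN);
}
```

For example, a queue of 256 entries needs 4096 + 518 bytes for the first part, rounded up to 8192, plus 2054 bytes for the used part, rounded up to 4096, giving 12288 bytes in total.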

Virtual Queue Descriptor

struct VirtualQueue
{
  struct Buffers[QueueSize]
  {
    long Address; // 64-bit address of the buffer on the guest machine.
    int Length;   // 32-bit length of the buffer.
    short Flags;  // 1:Next field contains linked buffer index.  2:Buffer is write-only (clear for read-only).  4:Buffer contains additional buffer addresses.
    short Next;   // If flag is set, contains index of next buffer in chain.
  }
  struct Available
  {
    short Flags;            // 1: Do not trigger interrupts.
    short Index;            // Index of the next ring index to be used.  (Last available ring buffer index+1)
    short[QueueSize] Ring;  // List of available buffer indexes from Buffers array above.
    short InterruptIndex;   // If enabled, device will trigger interrupt after this ring index has been processed.
  }
  byte[] Padding;  // Reserved
  // 4096 byte alignment
  struct Used
  {
    short Flags;            // 1: Do not notify device when buffers are added to available ring.
    short Index;            // Index of the next ring index to be used.  (Last used ring buffer index+1)
    struct Ring[QueueSize]
    {
      int Index;  // Index of the used buffer in the Buffers array above.
      int Length; // Total bytes written to buffer.
    }
    short InterruptIndex;   // If enabled, the driver only needs to notify the device once this available ring index has been added.
  }
}
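
Since the layout above is pseudocode, one possible way to express it in C is sketched here: fixed-size element structs, flexible array members for the two rings, and a helper that carves the three parts out of one contiguous block. All names are assumptions made for illustration; the block is expected to be zeroed, 4096-byte aligned, and large enough for the chosen queue size.

```c
#include <stddef.h>
#include <stdint.h>

#define VIRTQ_DESC_F_NEXT     1 /* Next field is valid                   */
#define VIRTQ_DESC_F_WRITE    2 /* buffer is write-only for the device   */
#define VIRTQ_DESC_F_INDIRECT 4 /* buffer holds further buffer addresses */

struct virtq_desc {
    uint64_t address;
    uint32_t length;
    uint16_t flags;
    uint16_t next;
};

struct virtq_avail {
    uint16_t flags;
    uint16_t index;
    uint16_t ring[]; /* queue size entries, then the 16-bit event field */
};

struct virtq_used_elem {
    uint32_t index;
    uint32_t length;
};

struct virtq_used {
    uint16_t flags;
    uint16_t index;
    struct virtq_used_elem ring[]; /* followed by the 16-bit event field */
};

struct virtq {
    uint16_t size;
    struct virtq_desc  *desc;
    struct virtq_avail *avail;
    struct virtq_used  *used;
};

/* Point the three parts of vq into one contiguous memory block. */
static void virtq_layout(struct virtq *vq, void *mem, uint16_t size)
{
    uint8_t *base = mem;
    size_t avail_off = sizeof(struct virtq_desc) * size;
    size_t avail_end = avail_off + sizeof(uint16_t) * (3 + (size_t)size);
    size_t used_off  = (avail_end + 4095) & ~(size_t)4095;

    vq->size  = size;
    vq->desc  = (struct virtq_desc *)base;
    vq->avail = (struct virtq_avail *)(base + avail_off);
    vq->used  = (struct virtq_used *)(base + used_off);
}

/* Fill one descriptor and publish it on the available ring. */
static void virtq_post(struct virtq *vq, uint16_t desc_index,
                       uint64_t address, uint32_t length, uint16_t flags)
{
    vq->desc[desc_index] = (struct virtq_desc){
        .address = address, .length = length, .flags = flags, .next = 0,
    };
    vq->avail->ring[vq->avail->index % vq->size] = desc_index;
    /* A real driver needs a memory barrier here so the device sees the
     * ring entry before it sees the updated index. */
    vq->avail->index++;
}
```

After `virtq_post`, a driver would normally write the queue index to the Queue Notify register so the device looks at the available ring.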