GPU passthrough with libvirt qemu kvm
GPU passthrough is a technology that allows you to directly present an internal PCI GPU to a virtual machine. The device acts as if it were directly driven by the VM, and the VM detects the PCI device as if it were physically connected. GPU passthrough is also often known as IOMMU, although this is a bit of a misnomer, since the IOMMU is the hardware technology that provides this feature, but it also provides other features such as some protection from DMA attacks and the ability to address 64-bit memory spaces with 32-bit addresses.
As you can imagine, the most common application for GPU passthrough is gaming, since GPU passthrough gives a VM direct access to your graphics card, with the end result of being able to play games with nearly the same performance as if you were running the game directly on your computer.
QEMU (Quick EMUlator) is a generic, open source hardware emulator and virtualization suite.
This article uses KVM as the accelerator of choice due to its GPL licensing and availability. Nearly all commands described here will still work without KVM (unless they are KVM specific).
Installation
BIOS and UEFI firmware
In order to utilize KVM either VT-x or AMD-V must be supported by the processor. VT-x or AMD-V are Intel and AMD's respective technologies for permitting multiple operating systems to concurrently execute operations on the processors.
To inspect the hardware for virtualization support, issue the following command:
user $
grep --color -E "vmx|svm" /proc/cpuinfo
For a period, manufacturers were shipping systems with virtualization turned off by default in the system BIOS.
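If the grep above returns nothing even though the CPU should support these extensions, the feature may simply be disabled in firmware. lscpu also reports the extension by name (sample output shown; the value depends on your CPU):
user $
lscpu | grep -i virtualization
Virtualization:      VT-x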
Hardware
- A CPU that supports Intel VT-d or AMD-Vi. Check List of compatible Intel CPUs (Intel VT-x and Intel VT-d)
- A motherboard that supports the aforementioned technologies. To find out, check your motherboard's BIOS configuration for an option to enable IOMMU or something similar. Chances are that your motherboard will support it if it is from 2013 or newer, but make sure to check, since this is a niche technology and some manufacturers may save costs by axing it from their motherboards or delivering a defective implementation (such as Gigabyte's 2015-2016 series) simply because most users never touch it.
- At least two GPUs: one for your physical OS, another for your VM. (In theory you can run your computer headless through SSH or a serial console, but this might not work, and you risk locking yourself out of your computer if you do so.)
- Optional but recommended: Additional monitor, keyboard and mouse
EFI configuration
Go into BIOS (EFI) settings and turn on VT-d and IOMMU support.
In some firmware the VT-d and virtualization configuration parameters are the same setting.
Some EFI firmware does not provide IOMMU configuration settings.
IOMMU
IOMMU (input–output memory management unit) is a memory management unit (MMU) that connects a direct-memory-access-capable (DMA-capable) I/O bus to main memory. The IOMMU maps a device-visible virtual address (I/O virtual address, or IOVA) to a physical memory address; in other words, it translates the IOVA into a real physical address.
In an ideal world, every device has its own IOVA address space and no two devices share the same IOVA. But in practice this is often not the case. Moreover, the PCI-Express (PCIe) specifications allow PCIe devices to communicate with each other directly, called peer-to-peer transactions, thereby escaping the IOMMU.
That is where PCI Access Control Services (ACS) are called to the rescue. ACS is able to tell whether or not these peer-to-peer transactions are possible between any two or more devices, and can disable them. ACS features are implemented within the CPU and the chipset.
Unfortunately, the implementation of ACS varies greatly between CPU and chipset models.
IOMMU kernel configuration
To enable IOMMU support in the kernel:
Device Drivers  --->
    [*] IOMMU Hardware Support  --->
            Generic IOMMU Pagetable Support  ----
        [*] AMD IOMMU support
        <*>   AMD IOMMU Version 2 driver
        [*] Support for Intel IOMMU using DMA Remapping Devices
        [*]   Support for Shared Virtual Memory with Intel IOMMU
        [*]   Enable Intel DMA Remapping Devices by default
        [*] Support for Interrupt Remapping
Rebuild the kernel.
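To double-check that the options made it into the running kernel, you can grep /proc/config.gz (this assumes CONFIG_IKCONFIG_PROC is enabled):
user $
zgrep -E "CONFIG_(AMD_IOMMU|INTEL_IOMMU|IRQ_REMAP)=" /proc/config.gz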
Turn on IOMMU in GRUB: edit /etc/default/grub and add the following parameters to the kernel command line (the pcie_acs_override option only takes effect with the ACS override patch described below).
/etc/default/grub
GRUB_CMDLINE_LINUX="... iommu=pt intel_iommu=on pcie_acs_override=downstream,multifunction ..."
If your system hangs after rebooting, check your BIOS and IOMMU settings.
Then apply the changes:
root #
grub-mkconfig -o /boot/grub/grub.cfg
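After the next reboot, verify that the new parameters actually made it onto the kernel command line:
user $
cat /proc/cmdline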
Check that the IOMMU is turned on and working:
user $
dmesg | grep 'IOMMU enabled'
[ 0.000000] DMAR: IOMMU enabled
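The message above comes from Intel's DMAR code; on AMD systems look for AMD-Vi messages instead. A broader check covering both vendors:
user $
dmesg | grep -i -e DMAR -e IOMMU -e AMD-Vi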
To check whether your CPU and chipset expose ACS (typically found on Xeon-class hardware), you may run:
user $
lspci -vv | grep -i 'Access Control Services'
IOMMU groups
Passing through PCI or VGA devices requires you to pass through all devices within an IOMMU group. The exception to this rule is PCI root devices that reside in the same IOMMU group as the device(s) you want to pass through. These root devices cannot be passed through, as they often perform important tasks for the host. A number of (Intel) CPUs, usually consumer-grade CPUs with integrated graphics (IGD), share a root device in the same IOMMU group as the first PCIe 16x slot.
user $
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"; done;
...
IOMMU Group 13 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
IOMMU Group 15 02:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100] [1002:67c4]
IOMMU Group 16 02:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]
...
The NVIDIA card is in IOMMU group 13 and the AMD video card is in IOMMU groups 15 and 16. Everything looks fine. But if you have buggy IOMMU support and all devices end up within one IOMMU group, the hardware cannot guarantee good device isolation. Unfortunately, it is not possible to fix that properly. The only workaround is the ACS override patch, which ignores the IOMMU hardware check. See ACS override patch.
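You can also inspect a single device's group through sysfs; for example, for the NVIDIA card above at PCI address 0000:01:00.0:
user $
ls /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices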
Other devices in my IOMMU group
ACS override patch
root #
git clone https://github.com/feniksa/gentoo_ACS_override_patch.git /etc/portage/patches
Next, re-emerge the kernel:
root #
emerge gentoo-sources
 * Applying 4400_alpha-sysctl-uac.patch (-p1) ...                       [ ok ]
 * Applying 4567_distro-Gentoo-Kconfig.patch (-p1) ...                  [ ok ]
>>> Source unpacked in /var/tmp/portage/sys-kernel/gentoo-sources-4.14.52/work
>>> Preparing source in /var/tmp/portage/sys-kernel/gentoo-sources-4.14.52/work/linux-4.14.52-gentoo ...
 * Applying override_for_missing_acs_capabilities.patch ...             [ ok ]
 * User patches applied.
VFIO
Kernel drivers
Device Drivers  --->
    <M> VFIO Non-Privileged userspace driver framework  --->
        [*]   VFIO No-IOMMU support  ----
        <M>   VFIO support for PCI devices
        [*]     VFIO PCI support for VGA devices
        < > Mediated device driver framework
Search for your VGA card IDs:
root #
lspci -nn
...
04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1)
04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aaf8]
...
Add the VGA PCI IDs to VFIO:
/etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:687f,1002:aaf8
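After a reboot, confirm that vfio-pci has claimed both functions (the address comes from the lspci output above; expect a "Kernel driver in use: vfio-pci" line):
user $
lspci -nnk -s 04:00.0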
Libvirt
Windows
Create a Windows 10 guest as usual via virt-manager. Edit the virtual machine, click Add Hardware, select the AMD/ATI Vega 64 video device and the AMD/ATI audio device, and click Apply.
Now you can start the Windows 10 guest OS.
The AMD card exposes two devices on the PCIe bus: one is the video function and the other is the HDMI audio function. The Windows drivers work only if KVM passes both AMD devices through to the Windows guest.
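If you prefer the command line over virt-manager, the same devices can be attached with virsh, given an XML file that contains a <hostdev> block like the one shown further below (gpu-hostdev.xml is just an example name):
root #
virsh attach-device win10 gpu-hostdev.xml --config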
Fixing the Vega 56/64 reset bug
The AMD Vega 56/64 is unable to initialize itself after a guest shutdown or reboot, because the drivers leave the card in a "garbage" state. As a workaround for this bug, VFIO should load the AMD card's ROM at guest startup. To do that:
- Install a clean Windows 10 somewhere (not in libvirt; we need a real, bare-metal Windows 10)
- Install all the latest Windows 10 updates
- Install the AMD VGA drivers
- Reboot
- Boot into the real Windows 10 again
- Install GPU-Z
- In GPU-Z's main tab, next to the BIOS version, there is a small "Save ROM" button. Click it and save the ROM somewhere. We will need this ROM for our Gentoo host and libvirt. For example, for my Vega 64 I saved the ROM as Vega64.rom
- Boot back into Gentoo
- mkdir /etc/firmware
- Copy your ROM file to /etc/firmware (in my case it is Vega64.rom)
- Go to /etc/libvirt/qemu
- Edit the XML file describing the Windows 10 guest
- Find the section with the AMD video card device (not the AMD HDMI audio device; you can always re-check with lspci)
In my case:
/etc/libvirt/qemu/win10.xml
<syntaxhighlight lang="xml">... <hostdev mode='subsystem' type='pci' managed='yes'> <source> <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> </hostdev> ...</syntaxhighlight>
Add the path to the VGA ROM:
<rom bar='on' file='/etc/firmware/Vega64.rom'/>
So, it should be:
/etc/libvirt/qemu/win10.xml
<syntaxhighlight lang="xml">... <hostdev mode='subsystem' type='pci' managed='yes'> <source> <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> <rom bar='on' file='/etc/firmware/Vega64.rom'/> </hostdev> ...</syntaxhighlight>
Fixing the Navi reset bug
AMD Navi 10 series GPUs require a vendor-specific reset procedure. According to AMD, a PSP mode 2 reset should be enough; however, at this time the details of how to perform this are not available.
Instead, the kernel can signal the SMU to enter and exit BACO (Bus Active, Chip Off), which has the same desired effect.
To apply the workaround (shown here for kernel 4.19.72; for a newer kernel, replace 4.19.72 with its version number):
- Download the patchset: https://github.com/feniksa/gentoo_ACS_override_patch/blob/master/sys-kernel/gentoo-sources-4.19.72/navi_reset.patch
- Put the patchset into /etc/portage/patches/sys-kernel/gentoo-sources-4.19.72
- Re-emerge the gentoo-sources package:
root #
emerge gentoo-sources
- Recompile the kernel
The applied patchset contains custom logic for resetting the GPU.
Sound
To get guest sound working through PulseAudio, give the qemu user a copy of your PulseAudio configuration:
root #
mkdir -p /home/qemu/.config
root #
cp -r /home/<user>/.config/pulse /home/qemu/.config/
root #
chown -R qemu:qemu /home/qemu
Change the home directory for the qemu user:
root #
usermod -d /home/qemu qemu
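With the qemu user's home directory in place, PulseAudio can be selected as QEMU's audio backend when launching it by hand. On older QEMU versions this is done via an environment variable (newer versions use the -audiodev option instead):
user $
export QEMU_AUDIO_DRV=pa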
Qemu
In case you want to use QEMU directly, here are some configurations to get you started. Since a typical QEMU call requires many command-line flags, it is usually advisable to place the call in a bash script and run it that way. Don't forget to make the file executable.
Minimal
This minimal configuration will simply boot into the BIOS: there are no drives connected, so there is nothing else for QEMU to do. However, this allows us to verify that the GPU passthrough is actually working.
MinimalPassthrough.sh
<syntaxhighlight lang="bash">#!/bin/bash exec qemu-system-x86_64 \ -nodefaults \ -enable-kvm \ -cpu host,kvm=off \ -m 8G \ -name "BlankVM" \ -smp cores=4 \ -device vfio-pci,host=09:00.0,x-vga=on,multifunction=on,romfile=GP107_patched.rom \ -device vfio-pci,host=09:00.1 \ -monitor stdio \ -nographic \ -vga none \ $@</syntaxhighlight>
Here's an explanation of each line:
- `-nodefaults` stops qemu from creating some default devices. Specifically, it creates a vga device by default, which interferes with our attempt to pass through the video card (if you have a multi-video card host this may not be an issue for you)
- `-enable-kvm` enables acceleration
- `-cpu host,kvm=off` makes the virtual machine match the CPU model of the host. `kvm=off` hides the KVM hypervisor signature from the guest; NVIDIA's driver checks for that signature and refuses to load inside a VM
- `-m 8G` gives the guest 8 gigabytes of RAM
- `-name "BlankVM"` gives the virtual machine a name, used to identify it (for example in the window title)
- `-smp cores=4` how many cores the guest should have. I'm matching the host.
- `-device vfio-pci,host=09:00.0...` add a device using vfio-pci kernel module, from the host's address "09:00.0"
- `...x-vga=on` is a vfio-pci device option that enables legacy VGA support for the assigned device, needed when it acts as the guest's primary display
- `...multifunction=on` since our card exposes both audio and video functions, the device is marked as multifunction
- `...romfile=GP107_patched.rom` due to known issues on NVIDIA cards, it may be necessary to use a modified vbios. This is how you make qemu use that modified vbios.
- `-device vfio-pci,host=09:00.1` just like above - this is the audio device that is in the same IOMMU group as the video device.
- `-monitor stdio` this will drop you into a qemu "command line" (they call it a monitor) once you launch the vm, allowing you to do things.
- `-vga none` disables the emulated VGA adapter; this is probably redundant given `-nodefaults`, but it does not hurt to be explicit
As noted above, there are certain known issues with NVIDIA drivers. I patched my vBIOS with a vBIOS patching tool, after first dumping the vBIOS in Windows 10 using GPU-Z.
Linux Guest
Here is a slightly more complicated QEMU call that actually boots a Gentoo VM.
GentooPassthrough.sh
<syntaxhighlight lang="bash">#!/bin/bash exec qemu-system-x86_64 \ -nodefaults \ -enable-kvm \ -cpu host,kvm=off,hv_vendor_id=1234567890ab \ -m 8G \ -name "Gentoo VM" \ -smp cores=4 \ -boot order=d \ -drive file=Gentoo_VM.img,if=virtio \ -monitor stdio \ -serial none \ -net nic \ -net user,hostfwd=tcp::50000-:22,hostfwd=tcp::50001-:5900,hostname=gentoo_qemu \ -nographic \ -vga none \ -device vfio-pci,host=09:00.0,x-vga=on,multifunction=on,romfile=GP107_patched.rom \ -device vfio-pci,host=09:00.1 \ -usb \ -device usb-host,vendorid=0x1532,productid=0x0101,id=mouse \ -device usb-host,vendorid=0x04f2,productid=0x0833,id=keyboard \ $@</syntaxhighlight>
Here's an explanation of the new configuration options:
- `...hv_vendor_id=...` despite the patched vBIOS, the NVIDIA driver can still recognize that it is being run in a virtual machine and refuse to load. This spoofs the Hyper-V vendor ID exposed to the guest and tricks the driver
- `-boot order=d` sets the boot order (note that d selects the first CD-ROM drive, while c is the first hard disk)
- `-drive file=Gentoo_VM.img,if=virtio` attaches an emulated drive to the VM. The "Gentoo_VM.img" file is a qcow-style QEMU virtual disk image
- `-serial none` disables the default serial port
- `-net nic` creates an Ethernet NIC in the guest VM
- `-net user,hostfwd...` forwards host ports 50000 and 50001 to guest ports 22 and 5900. From the host, you can now ssh into the guest using `ssh -p 50000 myuser@127.0.0.1`, and if you have a VNC server running in the guest on port 5900, you can access it through port 50001 on the host
- `-nographic` this may not be needed if you have a dedicated graphics card for the guest
- `-usb` enables a USB controller in the guest
- `-device usb-host,...` these two lines forward the keyboard and mouse from the host to the guest. The vendorid and productid can be found using lsusb on the host
Please note that without the `hv_vendor_id` portion, you can boot and use the console in the guest with the passed-through graphics card, but whenever you launch X, which initializes the proprietary NVIDIA driver, it will fail.
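Once the guest is up, you can confirm from inside it that the card was passed through; for an NVIDIA card like the one used here:
user $
lspci -nn | grep -i nvidia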
See also
- QEMU — a generic, open source hardware emulator and virtualization suite.
External resources
- https://heiko-sieger.info/iommu-groups-what-you-need-to-consider/#What_is_IOMMU_and_why_do_I_need_it
- https://wiki.installgentoo.com/index.php/PCI_passthrough - PCI passthrough on gentoo
- https://www.reddit.com/r/VFIO/comments/ahg1ta/bsod_when_launching_gpuz/
- https://forum.level1techs.com/t/navi-reset-kernel-patch/147547
- https://forum.level1techs.com/t/linux-host-windows-guest-gpu-passthrough-reinitialization-fix/121097?source_topic_id=121737 - AMD GPU on windows guest
- https://www.reddit.com/r/VFIO/comments/baa8e3/issue_unable_to_power_on_device_stuck_in_d3/