PCI Passthrough error 'group x is not viable'

Failed to build and run instance: libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2022-08-05T12:05:46.630755Z qemu-system-x86_64: -device vfio-pci,host=0000:c3:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio 0000:c3:00.0: group 15 is not viable
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nova/compute/manager.py", line 2398, in _build_and_run_instance
    self.driver.spawn(context, instance, image_meta,
  File "/usr/local/lib/python3.8/dist-packages/nova/virt/libvirt/driver.py", line 4225, in spawn
    self._create_guest_with_network(
  File "/usr/local/lib/python3.8/dist-packages/nova/virt/libvirt/driver.py", line 7293, in _create_guest_with_network
    self._cleanup(
  File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/local/lib/python3.8/dist-packages/nova/virt/libvirt/driver.py", line 7262, in _create_guest_with_network
    guest = self._create_guest(
  File "/usr/local/lib/python3.8/dist-packages/nova/virt/libvirt/driver.py", line 7202, in _create_guest
    guest.launch(pause=pause)
  File "/usr/local/lib/python3.8/dist-packages/nova/virt/libvirt/guest.py", line 168, in launch
    LOG.exception('Error launching a defined domain with XML: %s',
  File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/local/lib/python3.8/dist-packages/nova/virt/libvirt/guest.py", line 165, in launch
    return self._domain.createWithFlags(flags)
  File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 193, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 151, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 132, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 86, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1265, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2022-08-05T12:05:46.630755Z qemu-system-x86_64: -device vfio-pci,host=0000:c3:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio 0000:c3:00.0: group 15 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

Successfully unplugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:7d:97:c6,bridge_name='br-int',has_traffic_filtering=True,id=3860b4d7-48af-4e84-905e-514a7ab8c14f,network=Network(955e9ddc-604d-41dd-b2c5-df54c417615b),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap3860b4d7-48')
default default] [instance: fd3de719-0fa5-44f6-ab75-bf35034d0726] Took 0.25 seconds to deallocate network for instance.
default default] Deleted allocations for instance fd3de719-0fa5-44f6-ab75-bf35034d0726
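The error asks for every device in IOMMU group 15 to be bound to its vfio bus driver. A useful first check is therefore which driver each member of that group is actually using.

The following is a minimal bash sketch, not a complete viability test. A few assumptions apply:
- check_iommu_group and the SYSFS_ROOT override are hypothetical names introduced for illustration.
- It assumes the standard sysfs layout, where each device directory has a driver symlink.
- It is deliberately stricter than the kernel, which also tolerates some other states, such as a device with no driver bound.

```bash
# Hypothetical helper: print each device in an IOMMU group with its bound
# driver, and return non-zero if any device is not on vfio-pci.
# SYSFS_ROOT is a hypothetical override for trying this on a fake tree;
# on a real host it defaults to /sys.
check_iommu_group() {
    local group=$1
    local root=${SYSFS_ROOT:-/sys}
    local dir="$root/kernel/iommu_groups/$group/devices"
    local dev drv status=0

    if [[ ! -d $dir ]]; then
        echo "IOMMU group $group not found under $root" >&2
        return 2
    fi

    for dev in "$dir"/*; do
        [[ -e $dev ]] || continue            # empty group: glob stays literal
        # "driver" is a symlink to the bound driver; it is absent if unbound
        if [[ -L $dev/driver ]]; then
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "${dev##*/}" "$drv"
        # Strict on purpose: anything other than vfio-pci is flagged
        [[ $drv == vfio-pci ]] || status=1
    done
    return "$status"
}
```

Running check_iommu_group 15 on the affected host should print one line per device in the group. Any line that doesn't end in vfio-pci (for example, one ending in snd_hda_intel) marks a device the host still holds, and the function returns non-zero in that case.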
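To see which devices share an IOMMU group, list every group and its members:

#!/bin/bash
# List every PCI device together with the IOMMU group it belongs to.
# Groups are walked in numeric order from 0 to 999; change the 999 if needed.
shopt -s nullglob    # skip group numbers that don't exist instead of using the literal pattern
for d in /sys/kernel/iommu_groups/{0..999}/devices/*; do
    # d looks like /sys/kernel/iommu_groups/15/devices/0000:c3:00.0; extract "15"
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    # Print the device with its numeric [vendor:device] IDs
    lspci -nns "${d##*/}"
done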
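Before the fix, the GPU's audio function (c1:00.1) was still bound to the host's snd_hda_intel driver:

sudo lspci -nnv | grep "c1:00.1" -i -A30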
c1:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:14ad]
	Flags: bus master, fast devsel, latency 0, IRQ 357, NUMA node 0
	Memory at c5080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [160] Data Link Feature <?>
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

In /etc/default/grub, set the line to the following, adding 10de:228b to the vfio-pci.ids list (the value must be quoted because it contains spaces):

GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt kvm.ignore_msrs=1 vfio-pci.ids=10de:1aef,10de:2230,10de:2231,10de:24B0,10de:228b"

In /etc/modprobe.d/vfio.conf, add 10de:228b in the same way:

options vfio-pci ids=10de:1aef,10de:2230,10de:2231,10de:24B0,10de:228b

Then regenerate the GRUB configuration and reboot:

sudo update-grub2
sudo reboot
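A mistake in the GRUB line (for example, missing quotes) can leave vfio-pci.ids off the kernel command line. It can therefore be worth confirming after the reboot that the running kernel actually received the new ID.

A minimal bash sketch follows. cmdline_has_vfio_id is a hypothetical helper. The optional file argument exists only so it can be tried against a saved command line instead of /proc/cmdline.

```bash
# Hypothetical helper: succeed if the vfio-pci.ids kernel parameter
# contains the given vendor:device ID (compared case-insensitively).
# The second argument is a hypothetical override for testing; it
# defaults to the running kernel's /proc/cmdline.
cmdline_has_vfio_id() {
    local want=${1,,} file=${2:-/proc/cmdline}
    local word ids
    local -a words
    # /proc/cmdline is one line of space-separated parameters;
    # read -a splits it without glob expansion
    read -r -a words < "$file"
    for word in "${words[@]}"; do
        # The kernel treats - and _ in parameter names as equivalent
        if [[ $word == vfio[-_]pci.ids=* ]]; then
            ids=${word#*ids=}
            ids=${ids,,}
            # Wrap in commas so 10de:228 cannot match 10de:228b
            [[ ",$ids," == *",$want,"* ]] && return 0
        fi
    done
    return 1
}
```

Usage: cmdline_has_vfio_id 10de:228b && echo present. /proc/cmdline is world-readable, so sudo isn't needed.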

It looks like libvirt can't pass the whole IOMMU group through to the guest VM. The NVIDIA audio device attached to the GPU on the A4000 has a different PCI ID (10de:228b) from the audio devices on the A5000 and A6000, so it hadn't been added to the vfio-pci ID list. QEMU won't use a VFIO group unless all devices within it are bound to their vfio bus driver, as the error message says. So when KVM tries to pass through the GPU and its associated audio device, the launch fails: the audio function is still held by the host's snd_hda_intel driver (see "Kernel driver in use" above). I've made the change to vfio.conf and I'm rolling it out now to test whether it works.
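The root cause here is an ID missing from the vfio-pci list. One way to catch that class of mistake is to compare the vendor:device IDs actually present in a group with the list you intend to use.

A minimal bash sketch follows, assuming the standard sysfs vendor and device files, which hold hex values such as 0x10de. missing_vfio_ids and SYSFS_ROOT are hypothetical names for illustration.

```bash
# Hypothetical helper: print devices in an IOMMU group whose vendor:device
# ID is not in a vfio-pci ids list (same comma-separated format as in
# vfio.conf). SYSFS_ROOT is a hypothetical override for testing.
missing_vfio_ids() {
    local group=$1 ids=${2,,}   # lower-case so 24B0 and 24b0 compare equal
    local root=${SYSFS_ROOT:-/sys}
    local dev vendor device id

    for dev in "$root/kernel/iommu_groups/$group/devices"/*; do
        [[ -e $dev ]] || continue
        # sysfs stores these as hex with a 0x prefix, e.g. "0x10de"
        vendor=$(<"$dev/vendor")
        device=$(<"$dev/device")
        id="${vendor#0x}:${device#0x}"
        id=${id,,}
        # Wrap both sides in commas so a partial ID cannot match
        if [[ ",$ids," != *",$id,"* ]]; then
            printf '%s %s\n' "${dev##*/}" "$id"
        fi
    done
}
```

For example, missing_vfio_ids 15 10de:1aef,10de:2230,10de:2231,10de:24B0 prints each device in group 15 whose ID is not in that list. On an A4000 host, the 10de:228b audio function would be expected to show up in that output.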

After the fix was applied, the audio function is bound to vfio-pci:

ubuntu@ai-pro-72:~$ sudo lspci -nnv | grep "c1:00.1" -A30
c1:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:147e]
	Flags: fast devsel, IRQ 255, NUMA node 0
	Memory at cb080000 (32-bit, non-prefetchable) [disabled] [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [160] Data Link Feature <?>
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel