vGPU - 代码之美 (The Beauty of Code)

Intel GPU virtualization on PVE 8.1

References:
https://gist.github.com/scyto/e4e3de35ee23fdb4ae5d5a3b85c16ed3
https://www.derekseaman.com/2023/11/proxmox-ve-8-1-windows-11-vgpu-vt-d-passthrough-with-intel-alder-lake.html

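Key configuration points:
– Set the vGPU as the primary GPU
– On Windows, download the official Intel graphics driver directly
The driver is called Intel Arc & Iris Xe Graphics; download page: https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
– Update the firmware to the latest version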

# This firmware is for 12th-gen (Alder Lake) CPUs
wget -r -nd -e robots=no -A '*.bin' --accept-regex '/plain/' https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915/adlp_dmc.bin

cp adlp_dmc.bin /lib/firmware/i915/

Set the kernel parameters in /etc/default/grub, then run update-grub and reboot:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt intel_iommu=on i915.enable_guc=2 i915.max_vfs=3"
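After regenerating the GRUB config and rebooting, one way to confirm the parameters actually reached the kernel is to compare them against /proc/cmdline. This is a minimal sketch; the helper name check_cmdline is hypothetical and not part of the referenced guides.

```shell
#!/usr/bin/env bash
# Sketch: report which expected kernel parameters are missing from the
# running kernel's command line. Reads /proc/cmdline by default; the second
# argument (hypothetical convenience) lets you point it at another file.
check_cmdline() {
    local expected="$1"
    local cmdline_file="${2:-/proc/cmdline}"
    local cmdline param missing=0
    # Pad with spaces so "i915.max_vfs=3" does not match "i915.max_vfs=31"
    cmdline=" $(cat "$cmdline_file") "
    for param in $expected; do
        case "$cmdline" in
            *" $param "*) echo "ok:      $param" ;;
            *)            echo "missing: $param"; missing=1 ;;
        esac
    done
    return "$missing"
}

# Example (on the PVE host, after update-grub and a reboot):
# check_cmdline "iommu=pt intel_iommu=on i915.enable_guc=2 i915.max_vfs=3"
```

The function returns non-zero if anything is missing, so it can also be used in a script condition.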
Map the vGPUs

Here I enable 2 vGPUs.
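One common way to create the 2 vGPUs is to enable SR-IOV virtual functions through the standard sysfs attribute sriov_numvfs. A minimal sketch, assuming the iGPU sits at PCI address 0000:00:02.0 (verify with lspci) and that the SR-IOV-capable i915 driver from the referenced guides is loaded; the function name enable_vfs and the SYSFS_ROOT override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable N SR-IOV virtual functions (vGPUs) on the Intel iGPU.
# Assumption: the iGPU is 0000:00:02.0. SYSFS_ROOT (hypothetical) only
# exists so the function can be exercised against a fake sysfs tree.
enable_vfs() {
    local count="$1"
    local dev="${2:-0000:00:02.0}"
    local base="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev"
    local total current
    total=$(cat "$base/sriov_totalvfs") || return 1
    if [ "$count" -gt "$total" ]; then
        echo "requested $count VFs, but $dev supports only $total" >&2
        return 1
    fi
    current=$(cat "$base/sriov_numvfs") || return 1
    [ "$current" = "$count" ] && return 0
    # The kernel refuses to change a non-zero VF count directly (EBUSY), so
    # reset to 0 first; this removes existing VFs, so stop VMs using them
    if [ "$current" != 0 ]; then
        echo 0 > "$base/sriov_numvfs"
    fi
    echo "$count" > "$base/sriov_numvfs"
}
```

For example, run enable_vfs 2 as root. The sysfs value does not survive a reboot; the referenced guides make it persistent with sysfsutils (/etc/sysfs.conf).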
Add the PCI hardware to the VM

November 30, 2023

Suppressing "ignored rdmsr" messages in PVE (KVM)

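This issue is related to NVIDIA GPU virtualization and can be resolved with the following command (the kvm options given in the NVIDIA Tips section of the PVE wiki):
echo "options kvm ignore_msrs=1 report_ignored_msrs=0" > /etc/modprobe.d/kvm.conf && update-initramfs -u -k all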
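The "ignored rdmsr" lines are logged by KVM only when the kvm module has ignore_msrs enabled and report_ignored_msrs left at its default (enabled). A minimal sketch for checking the current values of both parameters; the function name and the SYSFS_ROOT override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the kvm module parameters that control MSR handling.
# Boolean module parameters show up in sysfs as Y or N.
# SYSFS_ROOT (hypothetical) only exists to allow testing on a fake tree.
show_kvm_msr_params() {
    local dir="${SYSFS_ROOT:-/sys}/module/kvm/parameters"
    local p
    for p in ignore_msrs report_ignored_msrs; do
        if [ -r "$dir/$p" ]; then
            echo "$p=$(cat "$dir/$p")"
        else
            echo "$p=unavailable"
        fi
    done
}
```

Both parameters are also writable at runtime as root (for example echo N > /sys/module/kvm/parameters/report_ignored_msrs), but only a file in /etc/modprobe.d keeps the setting across reboots.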

References:
https://pve.proxmox.com/wiki/PCI_Passthrough#NVIDIA_Tips

October 13, 2023

Installing a Tesla P4 in PVE and enabling GPU virtualization

Hardware overview

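The P4's specs are roughly equivalent to a GTX 1080, and it is a low-profile, single-slot card. The P100 (HBM2) and P40 (GDDR5), by contrast, are full-height, dual-slot cards. For a detailed comparison, see: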

https://developer.aliyun.com/article/753454
https://cloud-atlas.readthedocs.io/zh_CN/latest/machine_learning/hardware/nvidia_gpu/tesla_p10.html

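The P4 fits 1U servers, while the P40 and P100 need 2U machines and cost more than twice as much as the P4. On an HP DL360 Gen9 server, the P4 can be installed directly.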

Here is an introduction to the HP DL360 Gen9 server, written by someone else:
https://cloud-atlas.readthedocs.io/zh_CN/latest/linux/server/hardware/hpe/hpe_dl360_gen9.html
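The DL360 Gen9 has limited expansion capacity: it takes at most two GPU cards, and adding one SSD on top of that fills it up.
GPUs that support vGPU:
https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html
Enabling vGPU on consumer (desktop) GPUs: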
https://gitlab.com/polloloco/vgpu-proxmox

Host driver installation

Enable IOMMU
Edit /etc/default/grub, append intel_iommu=on iommu=pt after quiet in GRUB_CMDLINE_LINUX_DEFAULT, then run update-grub and reboot.
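Editing that line by hand works fine; as an alternative, here is a minimal sketch that appends the parameters only if they are not already there, so re-running it does not duplicate them. The function add_grub_params is hypothetical and assumes GNU sed (as on PVE/Debian).

```shell
#!/usr/bin/env bash
# Sketch: append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT, skipping
# any that are already present. Assumes the value uses straight double
# quotes (as in the stock file); parameters containing | or & are not handled.
add_grub_params() {
    local file="$1"; shift
    local line param
    line=$(grep -m1 '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file") || return 1
    # Strip the key and the surrounding quotes to get the current value
    line=${line#GRUB_CMDLINE_LINUX_DEFAULT=\"}
    line=${line%\"}
    for param in "$@"; do
        case " $line " in
            *" $param "*) ;;                      # already present
            *) line="${line:+$line }$param" ;;
        esac
    done
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$line\"|" "$file"
}
```

Usage: add_grub_params /etc/default/grub intel_iommu=on iommu=pt && update-grub. Hosts that boot with systemd-boot read /etc/kernel/cmdline instead and are refreshed with proxmox-boot-tool refresh.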

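Install the driver

apt install pve-headers-$(uname -r)
apt install build-essential dkms pve-headers mdevctl
# Load the VFIO modules at boot (vfio_virqfd is only needed on kernels older than 6.2, i.e. PVE 7; on PVE 8 it is built into vfio)
echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules
# Blacklist the open-source nouveau driver so it does not claim the GPU
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u -k all
# Install the vGPU host (KVM) driver; --dkms rebuilds it automatically for new kernels
chmod +x NVIDIA-Linux-x86_64-535.104.06-vgpu-kvm.run
./NVIDIA-Linux-x86_64-535.104.06-vgpu-kvm.run --dkms
reboot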
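After the reboot, besides nvidia-smi, a quick sanity check is whether nouveau stayed out and the NVIDIA modules loaded. A minimal sketch reading /proc/modules (the data behind lsmod); the function name is hypothetical, and it only checks for module names starting with nvidia.

```shell
#!/usr/bin/env bash
# Sketch: confirm nouveau is not loaded and at least one nvidia* module is.
# Reads /proc/modules by default; the argument allows another file.
check_nvidia_modules() {
    local modules="${1:-/proc/modules}"
    if grep -q '^nouveau ' "$modules"; then
        echo "nouveau is still loaded: check the blacklist and initramfs" >&2
        return 1
    fi
    if ! grep -q '^nvidia' "$modules"; then
        echo "no nvidia module loaded: check the driver install (dkms status)" >&2
        return 1
    fi
    # Print the loaded nvidia* module names (first field of /proc/modules)
    grep '^nvidia' "$modules" | cut -d' ' -f1
}
```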

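Use nvidia-smi to check the GPU's status.
Use mdevctl types to list the supported vGPU instance types and how many instances of each are available. All vGPU instances must use exactly the same profile; the framebuffer (VRAM) is split evenly and cannot be overcommitted. 8K output needs at least 2 GB of framebuffer per instance; for 4K, 1 GB is enough.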
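Because all instances are the same size and cannot be overcommitted, the maximum instance count is a plain integer division. A worked sketch for the Tesla P4's 8 GB framebuffer; the helper is hypothetical and ignores any memory the driver may reserve, so the count reported by mdevctl types remains authoritative.

```shell
#!/usr/bin/env bash
# Sketch: max instances = total framebuffer / per-instance profile size.
# Sizes are in MiB; integer division drops any remainder.
max_vgpu_instances() {
    local total_mib="$1" profile_mib="$2"
    echo $(( total_mib / profile_mib ))
}

# Tesla P4: 8 GiB framebuffer
max_vgpu_instances 8192 1024   # 1 GB profiles (enough for 4K): prints 8
max_vgpu_instances 8192 2048   # 2 GB profiles (needed for 8K): prints 4
```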

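VM configuration

For the vGPU instance types, see:
https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#virtual-gpu-types-grid
– Create the VM
First create the VM and install the virtio-gt and virtio-gm programs. Enable qemu-guest-agent in the VM options, and use SeaBIOS rather than UEFI for the BIOS (with UEFI the driver would have to be re-signed when it is installed).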

Install a remote desktop server
apt install xrdp

Disable or remove existing NVIDIA drivers, including the open-source nouveau driver

apt remove 'nvidia-*'
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist.conf

Otherwise the screen may go black after the vGPU is added.

Add the PCI device
Hardware -> Add -> PCI Device

The list may contain many devices, so look carefully for the right one.
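The same step can also be done from the host shell with qm set and the mdev option of hostpciN. A minimal sketch; VMID 100, address 0000:03:00.0 and the type nvidia-222 are hypothetical placeholders (take the address from lspci and the type name from mdevctl types), and the function only prints the command unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: CLI counterpart of Hardware -> Add -> PCI Device with an mdev
# (vGPU) type. Uses slot hostpci0; pick a free index if the VM already has one.
attach_vgpu() {
    local vmid="$1" pci_addr="$2" mdev_type="$3"
    local cmd=(qm set "$vmid" --hostpci0 "$pci_addr,mdev=$mdev_type")
    if [ "${APPLY:-0}" = 1 ]; then
        "${cmd[@]}"
    else
        echo "${cmd[*]}"       # dry run: show what would be executed
    fi
}

# Hypothetical values; prints: qm set 100 --hostpci0 0000:03:00.0,mdev=nvidia-222
attach_vgpu 100 0000:03:00.0 nvidia-222
```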

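Guest driver installation

Using Debian as an example
– Enable Xorg (disable Wayland)
Edit /etc/gdm3/daemon.conf and add (or uncomment) the following line in the [daemon] section: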
WaylandEnable=false
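Debian's stock daemon.conf usually already contains this line commented out, so the edit is mostly an uncomment. A minimal sketch that handles both cases; the function name is hypothetical and it assumes GNU sed and an existing [daemon] section.

```shell
#!/usr/bin/env bash
# Sketch: force GDM onto Xorg. If a (possibly commented) WaylandEnable=
# line exists, rewrite it; otherwise insert the line right after [daemon].
disable_wayland() {
    local conf="${1:-/etc/gdm3/daemon.conf}"
    if grep -q '^[#[:space:]]*WaylandEnable=' "$conf"; then
        sed -i 's/^[#[:space:]]*WaylandEnable=.*/WaylandEnable=false/' "$conf"
    else
        sed -i '/^\[daemon\]/a WaylandEnable=false' "$conf"
    fi
}
```

Run it as root in the guest; the change takes effect after the reboot in the next step.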

Update the system

sudo apt update
sudo apt upgrade
reboot

Install dependencies

sudo apt install dkms build-essential jq uuid-runtime -y
sudo apt install -y linux-headers-$(uname -r)
sudo apt install pkg-config libglvnd-dev -y

Install the driver

# Switch to text mode (multi-user target) so the display manager releases the GPU; run the rest as root
init 3
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u -k all
chmod +x NVIDIA-Linux-x86_64-535.104.05-grid.run
./NVIDIA-Linux-x86_64-535.104.05-grid.run --dkms
reboot

Licensing

curl --insecure -L -X GET https://dls.hetao.me/-/client-token -o /etc/nvidia/ClientConfigToken/client_configuration_token_$(date '+%d-%m-%Y-%H-%M-%S').tok
service nvidia-gridd restart
nvidia-smi -q | grep "License" # the result appears after about 1 minute; each license is valid for 3 months
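Instead of re-running the check by hand until the lease shows up, a small polling loop can wait for it. A minimal sketch, assuming nvidia-smi -q prints a line of the form "License Status : Licensed ..." once licensed; it matches that line rather than the bare word "Licensed", since section headers such as "vGPU Software Licensed Product" contain it too. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll a license query command until it reports "Licensed".
# Assumed output format: "License Status : Licensed (...)".
# The query command is passed as arguments so it can be swapped out.
wait_for_license() {
    local tries="$1" interval="$2"; shift 2
    local i
    for ((i = 1; i <= tries; i++)); do
        if "$@" | grep -Eq 'License Status[[:space:]]*:[[:space:]]*Licensed'; then
            echo "licensed after $i check(s)"
            return 0
        fi
        sleep "$interval"
    done
    echo "still unlicensed after $tries checks" >&2
    return 1
}

# Example: wait_for_license 12 10 nvidia-smi -q   # up to about 2 minutes
```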

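Remote access
apt install xrdp
In my experience, RDP could not be used while a user was logged in locally, and conversely, local login failed (black screen) while an RDP session was in use.
Debian also ships its own desktop sharing, which is based on the RDP protocol, but I could not get it to connect, so I ended up installing xrdp.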
Driver downloads
Currently working download links:
https://alist.homelabproject.cc/foxipan/vGPU
https://yun.yangwenqing.com/ESXI_PVE/vGPU/NVIDIA
https://pan.hetao.me/s/RHMxwE95QwmGmAG
No longer available:
https://foxi.buduanwang.vip/pan/vGPU/
https://foxi.buduanwang.vip/pan/foxi/Virtualization/vGPU/
https://github.com/justin-himself/NVIDIA-VGPU-Driver-Archive
https://archive.biggerthanshit.com/NVIDIA/

References:
https://gitlab.com/polloloco/vgpu-proxmox/-/tree/master

September 21, 2023

Tags: vGPU