Installing a Tesla P4 on PVE and Enabling GPU Virtualization (vGPU)

Hardware overview

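The P4's specs are roughly equivalent to a GTX 1080 (both are built on the GP104 chip). It is a half-height, single-slot card, whereas the P100 (HBM2) and the P40 (GDDR5) are full-height, dual-slot cards. For a detailed comparison, see: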

https://developer.aliyun.com/article/753454
https://cloud-atlas.readthedocs.io/zh_CN/latest/machine_learning/hardware/nvidia_gpu/tesla_p10.html

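The P4 suits 1U servers, while the P40 and P100 are meant for 2U machines and cost more than twice as much as the P4. The P4 fits directly into an HP DL360 Gen9 server.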

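An introduction to the HP DL360 Gen9 server, written by someone else and linked here:
https://cloud-atlas.readthedocs.io/zh_CN/latest/linux/server/hardware/hpe/hpe_dl360_gen9.html
The DL360 Gen9 has limited expansion: it takes at most two GPU cards, and adding one SSD fills it up.
GPUs that support vGPU:
https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html
Enabling vGPU on consumer (desktop) GPUs:
https://gitlab.com/polloloco/vgpu-proxmox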

Host driver installation

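Enable IOMMU
Edit /etc/default/grub and add intel_iommu=on iommu=pt after quiet on the GRUB_CMDLINE_LINUX_DEFAULT line, then run update-grub so the new kernel parameters are used at the next boot.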
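A minimal sketch of the GRUB edit above, assuming the stock Debian/PVE layout where GRUB_CMDLINE_LINUX_DEFAULT is a single double-quoted line that contains quiet, and assuming GNU sed. The helper name add_iommu_params is hypothetical; it takes an optional file path (default /etc/default/grub) so it can be tried on a copy first, and it leaves the file alone if intel_iommu=on is already present.

```shell
# Hypothetical helper: insert the IOMMU options right after "quiet" in
# GRUB_CMDLINE_LINUX_DEFAULT. Assumes GNU sed (-i), as shipped with Debian/PVE.
add_iommu_params() {
  grub_file="${1:-/etc/default/grub}"
  # Idempotent: nothing to do if the option is already there
  if grep -q 'intel_iommu=on' "$grub_file"; then
    return 0
  fi
  sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*quiet\)/\1 intel_iommu=on iommu=pt/' "$grub_file"
  # Return non-zero if the line did not match (e.g. no "quiet" on it)
  grep -q 'intel_iommu=on' "$grub_file"
}
```

Usage, as root: `add_iommu_params && update-grub`. On a PVE host that boots through systemd-boot (for example a ZFS root on UEFI), the kernel command line lives in /etc/kernel/cmdline and is applied with `proxmox-boot-tool refresh` instead, so a GRUB edit would have no effect there.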

Install the driver

apt install pve-headers-$(uname -r)
apt install build-essential dkms pve-headers mdevctl
echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u -k all
chmod +x NVIDIA-Linux-x86_64-535.104.06-vgpu-kvm.run
./NVIDIA-Linux-x86_64-535.104.06-vgpu-kvm.run --dkms
reboot
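The `echo ... >>` lines above append again every time they run, so repeating the steps leaves duplicate entries. A minimal sketch of an idempotent alternative, assuming POSIX sh and grep; append_once and prepare_host_modules are hypothetical helper names, not part of PVE or the NVIDIA installer. As far as I know, vfio_virqfd was folded into the vfio module in kernel 6.2, so on PVE 8 that entry is only kept to mirror the steps above.

```shell
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain that exact line (-x: whole-line match, -F: fixed string).
append_once() {
  line="$1"
  file="$2"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: the module and blacklist steps above, made idempotent.
# Writes to the real /etc by default; pass a directory prefix to try it on a copy.
prepare_host_modules() {
  root="${1:-}"
  # vfio_virqfd is only needed on older kernels (PVE 7); harmless to list here
  for m in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
    append_once "$m" "$root/etc/modules"
  done
  append_once "blacklist nouveau" "$root/etc/modprobe.d/blacklist.conf"
  append_once "options nouveau modeset=0" "$root/etc/modprobe.d/blacklist.conf"
}
```

Usage, as root: `prepare_host_modules && update-initramfs -u -k all`. Running it twice leaves exactly one copy of each line.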

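Use nvidia-smi to check the card's status.
Use mdevctl types to list the supported vGPU instance types and how many instances of each can be created. All vGPU instances on the card must use exactly the same profile; the memory is split evenly and cannot be over-allocated. For 8K output a profile needs at least 2 GB of memory; for 4K, 1 GB is enough.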
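Because the memory is split evenly, the number of instances is just the card's memory divided by the profile's framebuffer size. A minimal sketch of that arithmetic, assuming the P4's 8 GB (8192 MB) of memory; max_vgpus is a hypothetical helper, and the "Available instances" count reported by mdevctl types remains the authoritative number.

```shell
# Hypothetical helper: instances per GPU = total framebuffer / per-vGPU framebuffer.
# Sizes in MB; integer division, because memory cannot be over-allocated.
max_vgpus() {
  total_mb="$1"
  profile_mb="$2"
  echo $(( total_mb / profile_mb ))
}

# Tesla P4: 8192 MB in total
for p in 1024 2048 4096 8192; do
  echo "${p} MB per vGPU -> $(max_vgpus 8192 "$p") instances"
done
```

So 1 GB profiles (enough for 4K) give 8 VMs per P4, while 2 GB profiles (the minimum for 8K) give 4.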

VM configuration

For the vGPU instance types, see:
https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#virtual-gpu-types-grid
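– Create the VM
Create the VM first and install virtio-gt and virtio-gm in it. In the VM options, enable the QEMU Guest Agent (qemu-guest-agent), and choose SeaBIOS rather than UEFI (OVMF) as the BIOS: with UEFI, the driver module would have to be re-signed when the driver is installed.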
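The same VM options can be set from the PVE host shell with qm set. A minimal sketch that only prints the commands (a dry run) so they can be checked before they are applied; vm_qm_cmds is a hypothetical name and the VM ID is a placeholder.

```shell
# Hypothetical dry-run helper: print the qm commands for one VM instead of
# running them. Pipe the output to sh on the PVE host to apply it.
vm_qm_cmds() {
  vmid="$1"
  echo "qm set $vmid --agent enabled=1"   # enable the QEMU guest agent option
  echo "qm set $vmid --bios seabios"      # legacy BIOS instead of OVMF (UEFI)
}
```

Usage: `vm_qm_cmds 100` prints both commands, and `vm_qm_cmds 100 | sh` applies them. Set the BIOS before installing the guest OS, since switching firmware afterwards usually leaves the installed system unbootable. The guest-side agent still has to be installed inside a Debian guest with `apt install qemu-guest-agent`.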

Install a remote desktop
apt install xrdp

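Disable the open-source nouveau driver and remove any packaged NVIDIA drivers

apt remove 'nvidia-*'
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u

Otherwise the screen may go black after the vGPU is added.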
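After a reboot, one way to confirm the blacklist took effect is to check that nouveau no longer shows up in lsmod. A minimal sketch, assuming lsmod's usual layout (a header line, then the module name in the first column); module_loaded is a hypothetical helper that reads lsmod output on stdin so it can be tried without a GPU.

```shell
# Hypothetical helper: succeed if module NAME appears in lsmod output on stdin.
# NR > 1 skips lsmod's header line ("Module Size Used by").
module_loaded() {
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}
```

Usage: `if lsmod | module_loaded nouveau; then echo "nouveau is still loaded"; fi`.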

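Add the PCI device
Hardware -> Add -> PCI Device

This list can contain many devices, so look carefully for the P4. After selecting it, pick the vGPU profile in the MDev Type field.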
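To narrow down the long list, the P4's PCI address can be looked up on the host first and then matched in the dialog. A minimal sketch, assuming lspci -D -nn output, where NVIDIA devices carry the vendor ID 10de; nvidia_pci_addrs is a hypothetical helper that reads that output on stdin.

```shell
# Hypothetical helper: print the PCI address (first column) of every device
# whose [vendor:device] ID starts with NVIDIA's vendor ID 10de.
nvidia_pci_addrs() {
  grep -i '\[10de:' | awk '{ print $1 }'
}
```

Usage: `lspci -D -nn | nvidia_pci_addrs` prints addresses in the 0000:03:00.0 form shown in the PVE dialog (the address itself depends on the slot). The same assignment can be made from the shell with `qm set <vmid> --hostpci0 <address>,mdev=<type>`, where `<type>` is one of the names listed by mdevctl types.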

Guest driver installation

Using Debian as an example
– Enable Xorg (disable Wayland)
Edit /etc/gdm3/daemon.conf and add, under the [daemon] section:
WaylandEnable=false
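On Debian the line usually already exists in /etc/gdm3/daemon.conf as a commented-out #WaylandEnable=false under [daemon]. A minimal sketch that uncomments it, or adds it right below [daemon] if it is missing; disable_wayland is a hypothetical helper and assumes GNU sed.

```shell
# Hypothetical helper: make "WaylandEnable=false" active in the GDM config.
disable_wayland() {
  conf="${1:-/etc/gdm3/daemon.conf}"
  if grep -q '^[[:space:]]*WaylandEnable=false' "$conf"; then
    return 0                                   # already active
  fi
  if grep -q '^#[[:space:]]*WaylandEnable=false' "$conf"; then
    # Uncomment the line Debian ships commented out
    sed -i 's/^#[[:space:]]*WaylandEnable=false/WaylandEnable=false/' "$conf"
  else
    # Otherwise insert it directly below the [daemon] section header
    sed -i '/^\[daemon\]/a WaylandEnable=false' "$conf"
  fi
  # Fail if neither case applied (e.g. no [daemon] section in the file)
  grep -q '^WaylandEnable=false' "$conf"
}
```

Run it as root; the change takes effect after GDM restarts, for example at the reboot in the next step.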

Update the system

sudo apt update
sudo apt upgrade
sudo reboot

Install dependencies

sudo apt install dkms build-essential jq uuid-runtime -y
sudo apt install -y linux-headers-$(uname -r)
sudo apt install pkg-config libglvnd-dev -y

Install the driver (run as root)

init 3   # switch to text mode so X is not running during the install
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u -k all
chmod +x NVIDIA-Linux-x86_64-535.104.05-grid.run
./NVIDIA-Linux-x86_64-535.104.05-grid.run --dkms
reboot

Licensing

curl --insecure -L -X GET https://dls.hetao.me/-/client-token -o /etc/nvidia/ClientConfigToken/client_configuration_token_$(date '+%d-%m-%Y-%H-%M-%S').tok
service nvidia-gridd restart
nvidia-smi -q | grep "License" # the result takes about 1 minute to appear; each license is valid for 3 months
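Since the license only shows up after a delay, a small polling loop saves re-running the check by hand. A minimal sketch, assuming nvidia-smi -q prints a line of the form "License Status : Licensed (...)" once the guest is licensed and "Unlicensed (...)" before that; is_licensed and wait_for_license are hypothetical helpers, and the default of 12 tries 10 seconds apart is an arbitrary choice.

```shell
# Hypothetical helper: read `nvidia-smi -q` output on stdin and succeed if the
# License Status field starts with "Licensed" ("Unlicensed" does not match).
is_licensed() {
  grep -qE '^[[:space:]]*License Status[[:space:]]*:[[:space:]]*Licensed'
}

# Hypothetical helper: poll until licensed or give up after TRIES attempts.
wait_for_license() {
  tries="${1:-12}"   # number of attempts
  delay="${2:-10}"   # seconds between attempts
  i=0
  while [ "$i" -lt "$tries" ]; do
    if nvidia-smi -q | is_licensed; then
      echo "licensed"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unlicensed after $((tries * delay))s" >&2
  return 1
}
```

Usage, right after restarting nvidia-gridd: `wait_for_license`, which waits up to 2 minutes by default.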

Remote access
apt install xrdp
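In my use, RDP did not work while the same user was logged in locally, and conversely, while an RDP session was active, logging in locally did not work either (black screen).
Debian's built-in desktop sharing is also RDP-based, but I could not get it to connect, so I ended up installing xrdp.
Driver downloads
Links that currently work: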
https://alist.homelabproject.cc/foxipan/vGPU
https://yun.yangwenqing.com/ESXI_PVE/vGPU/NVIDIA
https://pan.hetao.me/s/RHMxwE95QwmGmAG
No longer available:
https://foxi.buduanwang.vip/pan/vGPU/
https://foxi.buduanwang.vip/pan/foxi/Virtualization/vGPU/
https://github.com/justin-himself/NVIDIA-VGPU-Driver-Archive
https://archive.biggerthanshit.com/NVIDIA/

References:
https://gitlab.com/polloloco/vgpu-proxmox/-/tree/master
