Following this post: uninstall the currently installed NVIDIA driver and CUDA packages
nvidia-uninstall
sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"
sudo apt-get autoremove --purge -V
reboot
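After the reboot it can be worth confirming that nothing was left behind. The sketch below is not part of the original post. `leftover_gpu_packages` is a hypothetical helper that reads `dpkg -l` style output on stdin and prints every package whose name contains `nvidia` or `cuda`, together with its dpkg status code.

```shell
# Hypothetical helper (not from the original post).
# Reads `dpkg -l` output on stdin and prints "<status> <package>" for every
# package whose name contains "nvidia" or "cuda".
# The header lines of `dpkg -l` are skipped because their first field is not
# a two- or three-letter status code such as "ii" or "rc".
leftover_gpu_packages() {
  awk '$1 ~ /^[a-z][a-zA-Z][a-zA-Z]?$/ && tolower($2) ~ /nvidia|cuda/ { print $1, $2 }'
}
```

Run it as `dpkg -l | leftover_gpu_packages`. Empty output suggests the purge removed everything. An `rc` entry means the package was removed but its configuration files remain.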
Official Google Cloud Compute Engine instructions for installing GPU drivers:
https://cloud.google.com/compute/docs/gpus/install-drivers-gpu?hl=zh-cn#minimum-driver
Machine series | NVIDIA GPU model | Recommended CUDA toolkit for Linux | Recommended CUDA toolkit for Windows |
---|---|---|---|
A3 | H100 | | N/A |
G2 | L4 | | |
A2 | A100 | | |
N1 | | | |
System version:
6.8.0-1016-gcp #18~22.04.1-Ubuntu
NVIDIA driver, CUDA, and nvcc versions after downloading and installing the matching release:
Driver Version: 550.90.07
CUDA Driver Version: 12.4
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
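To compare these versions in a script rather than by eye, the toolkit release can be pulled out of the `nvcc --version` text. This is a minimal sketch and `nvcc_release` is a hypothetical helper name, not something the original post uses.

```shell
# Hypothetical helper (not from the original post).
# Reads the text printed by `nvcc --version` on stdin and prints the toolkit
# release, e.g. "12.4", taken from the "Cuda compilation tools, release X.Y,"
# line. Lines without "release X.Y," produce no output.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}
```

Typical use is `nvcc --version | nvcc_release`. The driver version can be read separately with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.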
The installed driver libraries also show the expected version (550.90.07):
ls -al /usr/lib/x86_64-linux-gnu/|grep cuda
lrwxrwxrwx 1 root root 12 Jun 3 18:20 libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 20 Jun 3 18:20 libcuda.so.1 -> libcuda.so.550.90.07
-rw-r--r-- 1 root root 28581024 May 31 17:51 libcuda.so.550.90.07
lrwxrwxrwx 1 root root 28 Jun 3 18:20 libcudadebugger.so.1 -> libcudadebugger.so.550.90.07
-rw-r--r-- 1 root root 10524136 May 31 16:56 libcudadebugger.so.550.90.07
lrwxrwxrwx 1 root root 18 Nov 4 2021 libicudata.so.70 -> libicudata.so.70.1
-rw-r--r-- 1 root root 29476472 Nov 4 2021 libicudata.so.70.1
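Note that `grep cuda` also matches the unrelated `libicudata` (ICU) entries. A narrower check is to resolve the `libcuda.so.1` symlink directly. The sketch below assumes the same library directory as the listing above, and `libcuda_version` is a hypothetical helper name.

```shell
# Hypothetical helper (not from the original post).
# Prints the driver version that libcuda.so.1 points to inside the given
# directory. The default directory is the one used in the listing above.
# Note: plain POSIX sh has no `local`, so dir and target are global variables.
libcuda_version() {
  dir=${1:-/usr/lib/x86_64-linux-gnu}
  target=$(readlink "$dir/libcuda.so.1") || return 1
  # The link target looks like "libcuda.so.550.90.07" (possibly with a path
  # in front); strip everything up to and including "libcuda.so.".
  printf '%s\n' "${target##*libcuda.so.}"
}
```

On the machine described here, `libcuda_version` would be expected to print the same version that `nvidia-smi` reports.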
After installing the matching CUDA version, you may also need to install the following dependency packages:
linux-objects-nvidia-550-6.8.0-1016-gcp
linux-signatures-nvidia-6.8.0-1016-gcp
nvidia-firmware-550-server-550.90.07
nvidia-driver-550-server
nvidia-modprobe
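One way to find out which of these packages are actually missing before installing anything is sketched below. `missing_packages` is a hypothetical helper, and the package names in the usage comment are copied from the list above rather than verified against any repository.

```shell
# Hypothetical helper (not from the original post).
# Arguments: package names to check.
# Stdin: names of installed packages, one per line.
# Prints each argument that does not appear in the installed list.
missing_packages() {
  installed=$(cat)
  for pkg in "$@"; do
    # -x matches whole lines only, -F treats the name as a fixed string.
    printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
  done
}

# Usage (kept as a comment so that sourcing this file runs nothing):
#   dpkg-query -W -f='${Status} ${Package}\n' \
#     | awk '$3 == "installed" { print $4 }' \
#     | missing_packages nvidia-modprobe nvidia-driver-550-server
```

Anything it prints can then be installed with `sudo apt-get install`.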
If you use Docker containers, you may also need to install the following package:
nvidia-docker2
Finally, be sure to run the following step, which enables persistence mode (root privileges are required):
nvidia-smi -pm 1
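To confirm the setting took effect on every GPU, the persistence state can be queried per device. This is a sketch under the assumption that `nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader` prints lines such as `0, Enabled`. `gpus_without_persistence` is a hypothetical helper name.

```shell
# Hypothetical helper (not from the original post).
# Reads "index, persistence_mode" CSV lines on stdin (as printed by
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader)
# and prints the index of every GPU whose mode is not "Enabled".
gpus_without_persistence() {
  awk -F', *' '$2 != "Enabled" { print $1 }'
}
```

Empty output from `nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | gpus_without_persistence` means persistence mode is on for all GPUs.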
Pin the kernel version used by the NVIDIA driver (optional). The held kernel packages can be listed with:
dpkg --get-selections |grep -E "linux-image|linux-headers|linux-modules"|grep -i hold
linux-headers-6.8.0-1016-gcp hold
linux-image-6.8.0-1016-gcp hold
linux-modules-6.8.0-1016-gcp hold
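The listing above only shows packages that are already held. The post does not show how the hold was set, so the following is one possible way, assuming `apt-mark hold` is used. `kernel_hold_packages` is a hypothetical helper that builds the three package names from a kernel release string.

```shell
# Hypothetical helper (not from the original post).
# Prints the image, headers and modules package names for a kernel release.
# With no argument it uses the running kernel reported by `uname -r`.
kernel_hold_packages() {
  release=${1:-$(uname -r)}
  for prefix in linux-image linux-headers linux-modules; do
    printf '%s-%s\n' "$prefix" "$release"
  done
}

# Usage (requires root; kept as a comment so sourcing this file changes nothing):
#   sudo apt-mark hold $(kernel_hold_packages)
```

The hold can be reverted later with `sudo apt-mark unhold` on the same package names.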