Installing cuda-driver/cuda-toolkit via apt on Ubuntu 22.04 with an A100

To install the development libraries for an NVIDIA A100 GPU (CUDA, cuDNN, NCCL, and so on) on Ubuntu 22.04.3 with apt, the recommended route is NVIDIA's official APT repository.

Environment

  • Ubuntu 22.04.3 LTS
  • CUDA 12.8, NVIDIA A100

1. Add the NVIDIA APT repository (official source recommended)

sudo apt update
sudo apt install -y wget gnupg lsb-release

distribution=$(. /etc/os-release; echo $ID$VERSION_ID|sed 's/\.//g')

# distribution is ubuntu2204 on this system

wget https://developer.download.nvidia.com/compute/cuda/repos/${distribution}/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
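
The `distribution` line above builds the repository directory name from the `ID` and `VERSION_ID` fields of /etc/os-release. A minimal sketch of the same derivation as a reusable function follows; the name `cuda_repo_id` and its optional file argument are hypothetical, added only so the logic can be tried against a sample file without touching the system.

```shell
# Hypothetical helper: print the NVIDIA repo directory name (e.g. ubuntu2204)
# derived from an os-release style file. Defaults to /etc/os-release.
cuda_repo_id() {
    # Run in a subshell so the sourced variables do not leak into the caller.
    (
        . "${1:-/etc/os-release}"
        printf '%s%s\n' "$ID" "$VERSION_ID" | tr -d '.'
    )
}

# Usage: print the keyring URL for the current machine without downloading it.
# echo "https://developer.download.nvidia.com/compute/cuda/repos/$(cuda_repo_id)/x86_64/cuda-keyring_1.1-1_all.deb"
```

Printing the URL first makes it easy to confirm that the derived name matches a directory that actually exists on the download server before running wget.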

2. Install the driver and the official NVIDIA CUDA Toolkit

2.1. Default: install the latest runtime + driver (compatible versions)

sudo apt install -y cuda  # installs the latest CUDA version by default

This installs:

  • The NVIDIA driver (usually the latest version)
  • cuda-toolkit (the nvcc compiler, headers, runtime, and so on)
  • Assorted development libraries (such as libcudart-dev and libcublas-dev)

2.2. Install a specific version

1. List the available CUDA versions

root@gpu-develop-dev:~#  apt-cache search cuda | grep cuda-12  # loose match
cuda-12-0 - CUDA 12.0 meta-package
cuda-12-1 - CUDA 12.1 meta-package
libnvjpeg2k0-cuda-12 - Runtime libraries for libnvjpeg2k for CUDA 12
libnvjpeg2k0-dev-cuda-12 - Development headers and symlinks for libnvjpeg2k for CUDA 12
libnvjpeg2k0-static-cuda-12 - Static libraries for libnvjpeg2k for CUDA 12
nvjpeg2k-cuda-12 - NVIDIA nvJPEG 2000 for CUDA 12
cuquantum-cuda-12 - NVIDIA cuQuantum SDK for CUDA 12
libcuquantum0-cuda-12 - Runtime libraries for libcuquantum for CUDA 12
libcuquantum0-dev-cuda-12 - Development headers and symlinks for libcuquantum for CUDA 12
libcuquantum0-static-cuda-12 - Static libraries for libcuquantum for CUDA 12
cuda-12-2 - CUDA 12.2 meta-package
libnvtiff0-cuda-12 - Runtime libraries for libnvtiff for CUDA 12
libnvtiff0-dev-cuda-12 - Development headers and symlinks for libnvtiff for CUDA 12
libnvtiff0-static-cuda-12 - Static libraries for libnvtiff for CUDA 12
nvtiff-cuda-12 - NVIDIA nvTIFF for CUDA 12
cuda-12-3 - CUDA 12.3 meta-package
cudss-cuda-12 - NVIDIA cuDSS for CUDA 12
libcudss0-cuda-12 - Runtime libraries for libcudss for CUDA 12
libcudss0-dev-cuda-12 - Development headers and symlinks for libcudss for CUDA 12
libcudss0-static-cuda-12 - Static libraries for libcudss for CUDA 12
libcal0-cuda-12 - Runtime libraries for libcal for CUDA 12
libcal0-dev-cuda-12 - Development headers and symlinks for libcal for CUDA 12
libcal-cuda-12 - NVIDIA cal library for CUDA 12
libnvimgcodec-cuda-12 - Runtime libraries for nvImageCodec for CUDA
cusolvermp-cuda-12 - NVIDIA cuSOLVERMp for CUDA 12
libcusolvermp0-cuda-12 - Runtime libraries for libcusolvermp for CUDA 12
libcusolvermp0-dev-cuda-12 - Development headers and symlinks for libcusolvermp for CUDA 12
cudnn9-cuda-12-3 - NVIDIA cuDNN for CUDA 12.3
cudnn9-cuda-12 - NVIDIA cuDNN for CUDA 12
cuda-12-4 - CUDA 12.4 meta-package
cudnn9-cuda-12-4 - NVIDIA cuDNN for CUDA 12.4
cuda-12-5 - CUDA 12.5 meta-package
cudnn9-cuda-12-5 - NVIDIA cuDNN for CUDA 12.5
libnppplus0-cuda-12 - Runtime libraries for libnppplus for CUDA 12
libnppplus0-dev-cuda-12 - Development headers and symlinks for libnppplus for CUDA 12
libnppplus0-static-cuda-12 - Static libraries for libnppplus for CUDA 12
nppplus-cuda-12 - NVIDIA NPP Plus for CUDA 12
cuda-12-6 - CUDA 12.6 meta-package
cudnn9-cuda-12-6 - NVIDIA cuDNN for CUDA 12.6
libnvcomp4-cuda-12 - Runtime libraries for libnvcomp for CUDA 12
libnvcomp4-dev-cuda-12 - Development headers and symlinks for libnvcomp for CUDA 12
libnvcomp4-static-cuda-12 - Static libraries for libnvcomp for CUDA 12
nvcomp-cuda-12 - NVIDIA nvCOMP for CUDA 12
libnvshmem3-cuda-12 - Runtime libraries for libnvshmem for CUDA 12
libnvshmem3-dev-cuda-12 - Development headers and symlinks for libnvshmem for CUDA 12
libnvshmem3-static-cuda-12 - Static libraries for libnvshmem for CUDA 12
nvshmem-cuda-12 - NVIDIA nvshmem for CUDA 12
cudnn9-cuda-12-8 - NVIDIA cuDNN for CUDA 12.8
cuda-12-8 - CUDA 12.8 meta-package
cufftmp-cuda-12 - NVIDIA cuFFTMp CUDA 12 .
libcufftmp11-cuda-12 - Runtime libraries for libcufftmp for CUDA 12
libcufftmp11-dev-cuda-12 - Development headers and symlinks for libcufftmp for CUDA 12
cuda-12-9 - CUDA 12.9 meta-package
cudnn9-cuda-12-9 - NVIDIA cuDNN for CUDA 12.9
libcudnn9-cuda-12 - cuDNN runtime libraries for CUDA 12.9
cudnn9-jit-cuda-12 - NVIDIA cuDNN-jit for CUDA 12
cudnn9-jit-cuda-12-9 - NVIDIA cuDNN-jit for CUDA 12.9
libcudnn9-dev-cuda-12 - cuDNN development libraries for CUDA 12.9
libcudnn9-headers-cuda-12 - cuDNN header files for CUDA 12.9
libcudnn9-jit-cuda-12 - cuDNN-jit runtime libraries for CUDA 12.9
libcudnn9-jit-dev-cuda-12 - cuDNN-jit development libraries for CUDA 12.9
libcudnn9-static-cuda-12 - cuDNN static libraries for CUDA 12.9
cublasmp-cuda-12 - NVIDIA cuBLASMp library for multi-GPU, multi-node distributed basic dense linear algebra.
libcublasmp0-cuda-12 - NVIDIA cuBLASMp library for multi-GPU, multi-node distributed basic dense linear algebra.
libcublasmp0-dev-cuda-12 - NVIDIA cuBLASMp library for multi-GPU, multi-node distributed basic dense linear algebra.

root@gpu-develop-dev:~# apt-cache search cuda | egrep ^cuda-12  # exact match
cuda-12-0 - CUDA 12.0 meta-package
cuda-12-1 - CUDA 12.1 meta-package
cuda-12-2 - CUDA 12.2 meta-package
cuda-12-3 - CUDA 12.3 meta-package
cuda-12-4 - CUDA 12.4 meta-package
cuda-12-5 - CUDA 12.5 meta-package
cuda-12-6 - CUDA 12.6 meta-package
cuda-12-8 - CUDA 12.8 meta-package
cuda-12-9 - CUDA 12.9 meta-package
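
The exact-match filter above can also be turned into a plain list of version numbers, which is handy when a script needs to check that a wanted release is offered before installing it. This is only a sketch: the function name `cuda_meta_versions` is hypothetical, and it assumes the `cuda-X-Y - ... meta-package` line layout shown in the listing.

```shell
# Hypothetical helper: read `apt-cache search cuda` output on stdin and print
# the CUDA versions that have a cuda-X-Y meta-package, one per line.
cuda_meta_versions() {
    # Only lines that start with cuda-<major>-<minor> are meta-packages;
    # library packages such as cudnn9-cuda-12-8 do not match.
    sed -n 's/^cuda-\([0-9][0-9]*\)-\([0-9][0-9]*\) - .*/\1.\2/p'
}

# Usage on a machine with the repository configured:
# apt-cache search cuda | cuda_meta_versions
```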

2. Install a specific CUDA version

apt install -y cuda-12-8

3. CUDA paths

root@gpu-develop-dev:~# ll /usr/local/cuda
cuda/      cuda-12/   cuda-12.8/ 
root@gpu-develop-dev:~# ll /usr/local/cuda
lrwxrwxrwx 1 root root 22 Jul 25 12:13 /usr/local/cuda -> /etc/alternatives/cuda/
root@gpu-develop-dev:~# ll /usr/local/cuda/
total 128
drwxr-xr-x 3 root root  4096 Jul 25 12:13 bin/
drwxr-xr-x 5 root root  4096 Jul 25 12:12 compute-sanitizer/
drwxr-xr-x 3 root root  4096 Jul 25 12:12 doc/
-rw-r--r-- 1 root root   160 Feb 13 18:43 DOCS
-rw-r--r-- 1 root root 63021 Feb 13 18:43 EULA.txt
drwxr-xr-x 5 root root  4096 Jul 25 12:13 extras/
drwxr-xr-x 3 root root  4096 Jul 25 12:13 gds/
lrwxrwxrwx 1 root root    28 Feb 13 19:31 include -> targets/x86_64-linux/include/
lrwxrwxrwx 1 root root    24 Feb 13 19:29 lib64 -> targets/x86_64-linux/lib/
drwxr-xr-x 7 root root  4096 Jul 25 12:13 libnvvp/
drwxr-xr-x 2 root root  4096 Jul 25 12:13 nsightee_plugins/
drwxr-xr-x 3 root root  4096 Jul 25 12:13 nvml/
drwxr-xr-x 6 root root  4096 Jul 25 12:11 nvvm/
-rw-r--r-- 1 root root   524 Feb 13 18:43 README
drwxr-xr-x 3 root root  4096 Jul 25 12:12 share/
drwxr-xr-x 2 root root  4096 Jul 25 12:12 src/
drwxr-xr-x 3 root root  4096 Jul 25 12:10 targets/
drwxr-xr-x 2 root root  4096 Jul 25 12:13 tools/
-rw-r--r-- 1 root root  3306 Mar  4 13:02 version.

root@gpu-develop-dev:~# ll /usr/local/cuda/bin/
total 196496
-rwxr-xr-x 1 root root    88848 Feb 22 13:51 bin2c*
lrwxrwxrwx 1 root root        4 Feb 22 13:55 computeprof -> nvvp*
-rwxr-xr-x 1 root root      112 Feb 22 13:03 compute-sanitizer*
drwxr-xr-x 2 root root     4096 Jul 25 12:11 crt/
-rwxr-xr-x 1 root root  8718680 Feb 22 13:51 cudafe++*
-rwxr-xr-x 1 root root     1758 Feb 13 20:26 cuda-gdb*
-rwxr-xr-x 1 root root 14072248 Feb 13 20:26 cuda-gdb-minimal*
-rwxr-xr-x 1 root root 14898320 Feb 13 20:26 cuda-gdb-python3.10-tui*
-rwxr-xr-x 1 root root 14897952 Feb 13 20:26 cuda-gdb-python3.11-tui*
-rwxr-xr-x 1 root root 14906976 Feb 13 20:26 cuda-gdb-python3.12-tui*
-rwxr-xr-x 1 root root 14898456 Feb 13 20:26 cuda-gdb-python3.8-tui*
-rwxr-xr-x 1 root root 14898720 Feb 13 20:26 cuda-gdb-python3.9-tui*
-rwxr-xr-x 1 root root   765328 Feb 13 20:26 cuda-gdbserver*
-rwxr-xr-x 1 root root    75928 Feb 13 19:31 cu++filt*
-rwxr-xr-x 1 root root   568992 Feb 13 19:28 cuobjdump*
-rwxr-xr-x 1 root root  1249368 Feb 22 13:51 fatbinary*
-rwxr-xr-x 1 root root     3826 Mar  4 13:02 ncu*
-rwxr-xr-x 1 root root     3616 Mar  4 13:02 ncu-ui*
-rwxr-xr-x 1 root root     1580 Feb 13 20:16 nsight_ee_plugins_manage.sh*
-rwxr-xr-x 1 root root      197 Mar  4 13:02 nsight-sys*
-rwxr-xr-x 1 root root      743 Mar  4 13:02 nsys*
-rwxr-xr-x 1 root root      833 Mar  4 13:02 nsys-ui*
-rwxr-xr-x 1 root root 24828456 Feb 22 13:51 nvcc*
-rwxr-xr-x 1 root root    11032 Feb 22 13:51 __nvcc_device_query*
-rw-r--r-- 1 root root      425 Feb 22 13:51 nvcc.profile
-rwxr-xr-x 1 root root  5898888 Feb 13 19:26 nvdisasm*
-rwxr-xr-x 1 root root 32376952 Feb 22 13:51 nvlink*
-rwxr-xr-x 1 root root  5939552 Feb 14 04:37 nvprof*
-rwxr-xr-x 1 root root   117760 Feb 13 19:25 nvprune*
-rwxr-xr-x 1 root root      285 Feb 22 13:55 nvvp*
-rwxr-xr-x 1 root root 31912192 Feb 22 13:51 ptxas*

4. Configure environment variables and verify

tee -a /etc/profile >/dev/null <<'EOF'
# cuda-runtime
export PATH=/usr/local/cuda-12.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH
EOF

source /etc/profile
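
Appending the export lines unconditionally adds a duplicate entry every time the step is repeated, and when LD_LIBRARY_PATH starts out empty the result ends in a colon, which the dynamic loader treats as the current directory. A minimal sketch of a guard against both follows; `prepend_once` is a hypothetical helper, not something the CUDA packages provide.

```shell
# Hypothetical helper: prepend a directory to a colon-separated list only if
# it is not already present, and print the resulting list.
prepend_once() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;               # already present, unchanged
        *)          printf '%s\n' "$dir${list:+:$list}" ;; # prepend, no trailing colon
    esac
}

# Usage (assumes the 12.8 install path used in this post):
# PATH=$(prepend_once /usr/local/cuda-12.8/bin "$PATH"); export PATH
# LD_LIBRARY_PATH=$(prepend_once /usr/local/cuda-12.8/lib64 "${LD_LIBRARY_PATH:-}"); export LD_LIBRARY_PATH
```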

root@gpu-develop-dev:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
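
When this check runs inside a provisioning script, the release number can be pulled out of the `nvcc --version` text and compared against the expected one. The sketch below assumes the "release X.Y," wording shown in the output above; `nvcc_release` is a hypothetical name.

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# toolkit release number (e.g. 12.8).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage: warn early if the toolkit on PATH is not the expected one.
# [ "$(nvcc --version | nvcc_release)" = "12.8" ] || echo "unexpected CUDA toolkit" >&2
```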

3. Install the CUDA driver + utils

This covers the driver only, not the Toolkit.

NVIDIA driver and CUDA version compatibility chart:

[Image: af61c9ecd7dbdd60bc46aecdfcb3ac33.png]

1. Install nvidia-driver && CUDA 12.8

1.1. Install the driver && utils
apt install -y nvidia-driver-570
apt install -y nvidia-utils-570-server  # provides nvidia-smi

1.2. Check the installed version
root@gpu-develop-dev:~# dpkg -l | grep nvidia-driver
ii  nvidia-driver-570                      570.172.08-0ubuntu1                     amd64        NVIDIA driver metapackage

1.3. Keep the driver loaded persistently (by default it is loaded on demand)
systemctl enable --now nvidia-persistenced

root@gpu-develop-dev:~# systemctl status nvidia-persistenced
● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2025-07-25 12:50:24 CST; 3s ago
    Process: 274819 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=0/SUCCESS)
   Main PID: 274820 (nvidia-persiste)
      Tasks: 1 (limit: 76792)
     Memory: 484.0K
        CPU: 1.196s
     CGroup: /system.slice/nvidia-persistenced.service
             └─274820 /usr/bin/nvidia-persistenced --verbose

Jul 25 12:50:23 gpu-develop-dev systemd[1]: Starting NVIDIA Persistence Daemon...
Jul 25 12:50:23 gpu-develop-dev nvidia-persistenced[274820]: Verbose syslog connection opened
Jul 25 12:50:23 gpu-develop-dev nvidia-persistenced[274820]: Started (274820)
Jul 25 12:50:23 gpu-develop-dev nvidia-persistenced[274820]: device 0000:00:06.0 - registered
Jul 25 12:50:24 gpu-develop-dev nvidia-persistenced[274820]: device 0000:00:06.0 - persistence mode enabled.
Jul 25 12:50:24 gpu-develop-dev nvidia-persistenced[274820]: device 0000:00:06.0 - NUMA memory onlined.
Jul 25 12:50:24 gpu-develop-dev nvidia-persistenced[274820]: Local RPC services initialized
Jul 25 12:50:24 gpu-develop-dev systemd[1]: Started NVIDIA Persistence Daemon.
root@gpu-develop-dev:~#

1.4. Check the driver status
root@gpu-develop-dev:~# nvidia-smi 
Fri Jul 25 14:06:07 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08             Driver Version: 570.172.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  |   00000000:00:06.0 Off |                  Off |
| N/A   37C    P0             35W /  250W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
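
The header row of this table carries two different numbers: the installed driver version and the highest CUDA version that driver supports, which is not necessarily the version of the installed toolkit. A small sketch for extracting both from captured output, assuming the header layout shown above (`smi_header_versions` is a hypothetical name):

```shell
# Hypothetical helper: read `nvidia-smi` output on stdin and print
# "<driver version> <max supported CUDA version>" taken from the header row.
smi_header_versions() {
    sed -n 's/.*Driver Version: \([0-9.]*\) .*CUDA Version: \([0-9.]*\).*/\1 \2/p'
}

# Usage:
# nvidia-smi | smi_header_versions
```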

1.5. DKMS and kernel module status
root@gpu-develop-dev:~# dkms status
nvidia/570.172.08, 5.15.0-119-generic, x86_64: installed

root@gpu-develop-dev:~# lsmod | grep nvidia
nvidia_uvm           1744896  0
nvidia              90243072  1 nvidia_uvm
drm                   622592  7 drm_kms_helper,nvidia,cirrus,drm_ttm_helper,ttm,nouveau
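
After a kernel upgrade it can be worth confirming that DKMS has the module in the "installed" state for the version the driver reports. The sketch below assumes the `name/version, kernel, arch: state` layout shown in the `dkms status` output above (other dkms releases may print a different layout); `dkms_nvidia_installed` is a hypothetical name.

```shell
# Hypothetical helper: read `dkms status` output on stdin and print the version
# of every nvidia module whose state is "installed", one per line.
dkms_nvidia_installed() {
    sed -n 's|^nvidia/\([^,]*\),.*: installed.*|\1|p'
}

# Usage: compare with the version reported by the loaded driver.
# dkms status | dkms_nvidia_installed
# nvidia-smi --query-gpu=driver_version --format=csv,noheader
```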

2. List installable driver versions

root@gpu-develop-dev:~# apt-cache search nvidia-driver |egrep ^nvidia-driver   ### list installable versions
nvidia-driver-390 - NVIDIA driver metapackage
nvidia-driver-418 - Transitional package for nvidia-driver-430
nvidia-driver-418-server - NVIDIA Server Driver metapackage
nvidia-driver-435 - Transitional package for nvidia-driver-455
nvidia-driver-440 - Transitional package for nvidia-driver-450
nvidia-driver-440-server - Transitional package for nvidia-driver-450-server
nvidia-driver-450 - Transitional package for nvidia-driver-460
nvidia-driver-450-server - NVIDIA Server Driver metapackage
nvidia-driver-455 - Transitional package for nvidia-driver-460
nvidia-driver-460 - Transitional package for nvidia-driver-470
nvidia-driver-460-server - Transitional package for nvidia-driver-470-server
nvidia-driver-465 - Transitional package for nvidia-driver-470
nvidia-driver-470 - NVIDIA driver metapackage
nvidia-driver-470-server - NVIDIA Server Driver metapackage
nvidia-driver-495 - Transitional package for nvidia-driver-510
nvidia-driver-510 - Transitional package for nvidia-driver-535
nvidia-driver-510-server - Transitional package for nvidia-driver-515-server
nvidia-driver-515-open - Transitional package for nvidia-driver-535
nvidia-driver-515-server - Transitional package for nvidia-driver-535-server
nvidia-driver-520-open - Transitional package for nvidia-driver-535
nvidia-driver-525-open - NVIDIA driver (open kernel) metapackage (transitional package)
nvidia-driver-525-server - NVIDIA Server Driver metapackage (transitional package)
nvidia-driver-535 - NVIDIA driver metapackage
nvidia-driver-535-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-535-server - NVIDIA Server Driver metapackage
nvidia-driver-535-server-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-545 - NVIDIA driver metapackage
nvidia-driver-545-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-550 - NVIDIA driver metapackage
nvidia-driver-550-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-550-server - NVIDIA Server Driver metapackage
nvidia-driver-550-server-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-565-server - NVIDIA Server Driver metapackage
nvidia-driver-565-server-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-570 - NVIDIA driver metapackage
nvidia-driver-570-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-570-server - NVIDIA Server Driver metapackage
nvidia-driver-570-server-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-575 - NVIDIA driver metapackage
nvidia-driver-575-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-575-server - NVIDIA Server Driver metapackage
nvidia-driver-575-server-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-515 - NVIDIA driver metapackage
nvidia-driver-520 - NVIDIA driver metapackage
nvidia-driver-525 - NVIDIA driver metapackage
nvidia-driver-430 - Transitional package for nvidia-driver-545
nvidia-driver-555 - NVIDIA driver metapackage
nvidia-driver-555-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-assistant - Detect and install the best NVIDIA driver packages for the system
nvidia-driver-530 - Transitional package for nvidia-driver-560
nvidia-driver-530-open - Transitional package for nvidia-driver-560-open
nvidia-driver-560 - NVIDIA driver metapackage
nvidia-driver-560-open - NVIDIA driver (open kernel) metapackage
nvidia-driver-565 - NVIDIA driver metapackage
nvidia-driver-565-open - NVIDIA driver (open kernel) metapackage

4. nvidia-container-toolkit: GPU access for containers

1. Download the mirror's GPG key

curl -fsSL https://mirrors.ustc.edu.cn/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

2. Configure the apt source

curl -s -L https://mirrors.ustc.edu.cn/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://nvidia.github.io#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://mirrors.ustc.edu.cn#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

3. Update and install

apt-get update
apt-get install -y nvidia-container-toolkit

4. Configure the Docker runtime for GPU support

nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
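
Before restarting Docker, a quick look at the daemon configuration shows whether the runtime was actually registered. This is a text-level sketch, not a JSON parse, and it assumes that `nvidia-ctk` wrote its entry to /etc/docker/daemon.json (its usual default); `has_nvidia_runtime` is a hypothetical name.

```shell
# Hypothetical helper: succeed if a Docker daemon.json mentions the
# nvidia-container-runtime binary. Defaults to /etc/docker/daemon.json.
has_nvidia_runtime() {
    grep -q 'nvidia-container-runtime' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

# Usage:
# has_nvidia_runtime && echo "nvidia runtime registered" || echo "nvidia runtime missing" >&2
```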


5. Verify GPU access

docker run -it --gpus all python:3.13 bash
nvidia-smi  # run inside the container; normal output means the driver works and the container can use it


dev@gpu-develop-dev:~$ nvidia-container-cli --version
cli-version: 1.17.8
lib-version: 1.17.8
build date: 2025-05-30T13:47+00:00
build revision: 6eda4d76c8c5f8fc174e4abca83e513fb4dd63b0
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

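5. Install cuDNN (optional, requires a registered account)

  1. Log in at https://developer.nvidia.com/cudnn
  2. Download the .deb package for Ubuntu
  3. Example install:
sudo dpkg -i libcudnn8*.deb
sudo apt install -f # install missing dependencies automatically

Alternatively, configure it from the official repository (if you have a CUDA account); the package listing in section 2.2 includes cudnn9-cuda-12-8.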


6. Verify the installation

nvcc --version          # verify the CUDA compiler
nvidia-smi              # check the GPU driver status

7. Common optional development libraries (APT package names)

Library    APT package names
cuDNN      libcudnn8, libcudnn8-dev
NCCL       libnccl2, libnccl-dev
TensorRT   libnvinfer8, libnvinfer-dev
Thrust     included in cuda-toolkit
OpenCL     nvidia-opencl-dev, ocl-icd-opencl-dev