Ubuntu 24.04 NVIDIA Driver Installation: nvidia-smi Error and Fix
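Ubuntu 24.04: the NVIDIA 595 driver is reported as already up to date, yet nvidia-smi cannot communicate with the driver. A complete fix.

I. Environment

GPU: NVIDIA GeForce RTX 4080
OS: Ubuntu 24.04 LTS

II. Reproducing the error, step by step

1. Manually installing the recommended driver

First, list the drivers the system recommends for the card:

    ubuntu-drivers devices

Output:

    vendor   : NVIDIA Corporation
    model    : AD103 [GeForce RTX 4080]
    driver   : nvidia-driver-595-open - distro non-free
    driver   : nvidia-driver-580-open - distro non-free
    driver   : nvidia-driver-535 - distro non-free
    driver   : nvidia-driver-595-server - distro non-free
    driver   : nvidia-driver-595 - distro non-free recommended
    driver   : nvidia-driver-595-server-open - distro non-free
    driver   : nvidia-driver-580 - distro non-free
    driver   : nvidia-driver-535-server-open - distro non-free
    driver   : nvidia-driver-535-open - distro non-free
    driver   : nvidia-driver-535-server - distro non-free
    driver   : nvidia-driver-580-server - distro non-free
    driver   : nvidia-driver-580-server-open - distro non-free
    driver   : xserver-xorg-video-nouveau - distro free builtin

nvidia-driver-595 is marked as the recommended version, so I installed it by hand:

    sudo apt update
    sudo apt install nvidia-driver-595

Install log (translated from the Chinese-locale apt output):

    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    nvidia-driver-595 is already the newest version (595.58.03-0ubuntu0.24.04.1).
    The following packages were automatically installed and are no longer required:
      libboost-iostreams1.83.0 libfcitx5-qt-data ...
    Use 'sudo apt autoremove' to remove them.
    0 upgraded, 0 newly installed, 0 to remove and 347 not upgraded.

On the surface the driver is installed and current, so everything looks fine.

2. Running nvidia-smi fails immediately

    nvidia-smi

Error message:

    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

3. Checking whether the GPU hardware is detected

    lspci | grep -i nvidia

Output:

    01:00.0 VGA compatible controller: NVIDIA Corporation AD103 [GeForce RTX 4080] (rev a1)
    01:00.1 Audio device: NVIDIA Corporation Device 22bb (rev a1)

The hardware is detected correctly, so this is not a hardware problem.

4. Checking the kernel driver module: no output

    lsmod | grep nvidia

There is no output at all, which means the NVIDIA kernel module was never loaded.

5. Loading the module by hand fails

    sudo modprobe nvidia

Error:

    modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.17.0-14-generic

The core problem: no driver module was ever built for the running kernel.

III. Root cause

- Running apt install nvidia-driver-595 directly did install the driver package, but the matching kernel headers were missing, so DKMS could not build the kernel module automatically.
- The running kernel, 6.17.0-14-generic, therefore has no matching nvidia kernel module, and the system cannot find the driver.
- Leftovers from an older driver or a conflict with the open-source nouveau driver can also prevent the driver from loading.

IV. Complete fix (copy and run each step in order)

Step 1: install the kernel headers that match the running kernel (the key step)

    sudo apt install linux-headers-$(uname -r)

Step 2: purge every old NVIDIA driver package and its dependencies (the patterns are quoted so the shell passes them to apt unexpanded)

    sudo apt purge 'nvidia-*' 'libnvidia-*'
    sudo apt autoremove
    sudo apt clean

Step 3: install DKMS, the tool that builds kernel modules

    sudo apt install dkms

Step 4: let the system pick and install the best driver for this GPU and kernel (recent versions of ubuntu-drivers list autoinstall as deprecated in favor of ubuntu-drivers install)

    sudo ubuntu-drivers autoinstall

Step 5: reboot for the change to take effect

    sudo reboot

V. Verifying after the reboot

After rebooting, run:

    nvidia-smi

Normal output:

    Wed May  6 11:19:13 2026
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 595.58.03              Driver Version: 595.58.03      CUDA Version: 13.2     |
    +-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 4080        Off |   00000000:01:00.0  On |                  N/A |
    | 33%   38C    P0             27W /  320W |      78MiB /  16376MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0   N/A  N/A            1534      G   /usr/lib/xorg/Xorg                       39MiB |
    |    0   N/A  N/A            1727      G   /usr/bin/gnome-shell                     11MiB |
    +-----------------------------------------------------------------------------------------+

The driver, the GPU, and CUDA are all working.

VI. Pitfalls to avoid

- Do not stop at apt install nvidia-driver-xxx. It is easy to end up with the driver package installed but the kernel module missing.
- "Module nvidia not found" means the kernel headers are missing and the driver module was never built.
- The most reliable way to install an NVIDIA driver on Ubuntu is ubuntu-drivers autoinstall, which matches the driver to both the kernel and the GPU.
- On a Lenovo workstation, if the problem persists, enter the BIOS and disable Secure Boot, since Secure Boot can refuse to load DKMS-built modules that are not signed with an enrolled key.
- Always purge leftovers from old drivers before reinstalling, to avoid version conflicts.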
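
The checks walked through above (was a module built for the running kernel, and are the matching headers present) can be bundled into a small shell sketch. This is only an illustration, not part of any NVIDIA or Ubuntu tool: the function names `nvidia_module_built`, `kernel_headers_present`, and `diagnose_nvidia` are hypothetical, and the sketch assumes the usual Ubuntu layout of `/lib/modules/<release>` for modules and `/usr/src/linux-headers-<release>` for headers.

```shell
#!/usr/bin/env bash
# Diagnostic sketch. All function names here are hypothetical helpers.

# Succeeds if a built nvidia kernel module exists under the given directory.
# $1: modules directory, e.g. /lib/modules/$(uname -r)
nvidia_module_built() {
    local modules_dir="$1"
    [ -d "$modules_dir" ] || return 1
    # Modules may be compressed (nvidia.ko.zst, nvidia.ko.xz), so match nvidia.ko*
    [ -n "$(find "$modules_dir" -name 'nvidia.ko*' -print -quit 2>/dev/null)" ]
}

# Succeeds if the header tree for the given kernel release is present.
# $1: kernel release; $2: root of header trees (assumed default: /usr/src)
kernel_headers_present() {
    local release="$1" src_root="${2:-/usr/src}"
    [ -d "${src_root}/linux-headers-${release}" ]
}

# Prints one of three states for the given (or running) kernel.
# $1: kernel release; $2: modules root; $3: header root (all optional)
diagnose_nvidia() {
    local release="${1:-$(uname -r)}"
    local modules_root="${2:-/lib/modules}"
    local src_root="${3:-/usr/src}"
    if nvidia_module_built "${modules_root}/${release}"; then
        echo "module-present"
    elif kernel_headers_present "$release" "$src_root"; then
        echo "module-missing-headers-present"
    else
        echo "module-missing-headers-missing"
    fi
}
```

Calling `diagnose_nvidia` with no arguments inspects the running kernel. A result of `module-missing-headers-missing` corresponds to the situation in this article, where installing `linux-headers-$(uname -r)` is the first step. `module-missing-headers-present` suggests the DKMS build itself did not run or failed, in which case `dkms status` is worth checking next.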