CUDA unavailable after suspend
Problem description
I woke up to find that CUDA was no longer available on my computer:
/home/<your-computer>/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
 
Attempts
export PATH=/usr/local/cuda-11/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11/lib64:$LD_LIBRARY_PATH
- This made no difference, so the cause was not missing environment variables.
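To rule out the environment before digging further, the two exports above can be checked from Python. This is only a sketch: the helper name `missing_cuda_paths` is made up for illustration, and the default `cuda_home` simply mirrors the `/usr/local/cuda-11` prefix used in the exports.

```python
import os


def missing_cuda_paths(env=None, cuda_home="/usr/local/cuda-11"):
    """Return the CUDA directories absent from PATH / LD_LIBRARY_PATH.

    Hypothetical helper; cuda_home is assumed to match the exports above.
    """
    env = os.environ if env is None else env
    expected = {
        "PATH": f"{cuda_home}/bin",
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64",
    }
    missing = {}
    for var, directory in expected.items():
        # Split on ':' and ignore empty entries and trailing slashes.
        entries = [p.rstrip("/") for p in env.get(var, "").split(":") if p]
        if directory not in entries:
            missing[var] = directory
    return missing
```

Calling `missing_cuda_paths()` with no arguments inspects the current process environment. An empty dict means both directories are already on the search paths, which matches what I saw here.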
 
- According to [1], the problem may be related to the computer having been suspended, which led me to [2].
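- This error appears when the system cannot communicate with the GPU. Two possible causes:
 - Cause 1: the driver was updated without a reboot, or there is some other installation problem
 - Cause 2: the computer has been suspended; a reboot makes CUDA work again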
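Before touching anything, it can help to see which NVIDIA kernel modules are currently loaded and whether they are in use. The sketch below parses `/proc/modules` on Linux. The function names are mine, and the output does not prove which of the two causes applies; it only shows the module state.

```python
from pathlib import Path


def nvidia_modules(proc_modules_text):
    """Map each loaded nvidia* kernel module to its use count."""
    modules = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # /proc/modules columns: name, size, use count, used-by list, state, address
        if len(fields) >= 3 and fields[0].startswith("nvidia"):
            modules[fields[0]] = int(fields[2])
    return modules


def read_nvidia_modules(path="/proc/modules"):
    """Read the live module list (Linux only)."""
    return nvidia_modules(Path(path).read_text())
```

If `read_nvidia_modules()` shows a non-zero use count for `nvidia_uvm`, something still holds the module, and unloading it is likely to be refused until those processes exit.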
 
 
Solution
Reload the nvidia_uvm kernel module instead of rebooting:

 sudo rmmod nvidia_uvm
 sudo modprobe nvidia_uvm
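If this has to be done after every suspend, the two commands can be wrapped in a small script. This is a minimal sketch, not a tested tool: `reload_commands` and `reload_module` are names I made up, and it assumes `sudo`, `rmmod` and `modprobe` are on the PATH.

```python
import subprocess


def reload_commands(module="nvidia_uvm", use_sudo=True):
    """Build the unload/load command pair for a kernel module."""
    prefix = ["sudo"] if use_sudo else []
    return [prefix + ["rmmod", module], prefix + ["modprobe", module]]


def reload_module(module="nvidia_uvm", use_sudo=True, dry_run=False):
    """Run rmmod then modprobe, stopping at the first failure."""
    for cmd in reload_commands(module, use_sudo):
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            continue
        # check=True raises CalledProcessError on a non-zero exit,
        # e.g. when rmmod refuses because the module is still in use.
        subprocess.run(cmd, check=True)
```

`reload_module(dry_run=True)` prints the two commands without running them. If `rmmod` reports that the module is in use, close the programs that are using the GPU (for example an open Python session) and try again.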
 
- Quick check that CUDA is available again
 
import torch
torch.cuda.is_available()
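A Python session that already hit the warning may keep reporting the old state, so it can be safer to run the check in a fresh interpreter. The sketch below does that through a subprocess. The helper names are hypothetical, and it assumes `torch` is installed for the interpreter being launched.

```python
import subprocess
import sys

# One-liner executed in the child interpreter.
CHECK = "import torch; print(torch.cuda.is_available())"


def parse_flag(stdout):
    """Return True/False from the last non-empty output line, else None."""
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    if not lines or lines[-1] not in ("True", "False"):
        return None
    return lines[-1] == "True"


def cuda_available_in_fresh_process(python=sys.executable, timeout=120):
    """Run the check in a new interpreter; None means it could not be determined."""
    try:
        result = subprocess.run(
            [python, "-c", CHECK],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # For example, torch is not importable in that interpreter.
        return None
    return parse_flag(result.stdout)
```

`cuda_available_in_fresh_process()` should return `True` once the module reload has taken effect. A `None` result means the check itself failed, not that CUDA is missing.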
 
For an introduction to rmmod and modprobe, see [3].
References
[1] https://blog.csdn.net/weixin_48319333/article/details/128214617
[2] https://discuss.pytorch.org/t/userwarning-cuda-initialization-cuda-unknown-error-this-may-be-due-to-an-incorrectly-set-up-environment-e-g-changing-env-variable-cuda-visible-devices-after-program-start-setting-the-available-devices-to-be-zero/129335/2
[3] https://blog.csdn.net/Ternence_zq/article/details/131068125



















