Original blog: [Doi Technical Team](http://blog.doiduoyi.com/)
Link address: https://blog.doiduoyi.com/authors/1584446358138
Original intention: record the learning experience of an excellent Doi technical team
Table of Contents
- Preface
- Install the graphics driver
- Disable the nouveau driver
- Download the driver
- Uninstall the old driver
- Install the new driver
- Uninstall CUDA
- Install CUDA
- Download and install CUDA
- Test whether the installation was successful
- Download and install CUDNN
- Test the installation result
- Reference
Recently I have been studying how to install and use PaddlePaddle under different graphics card driver versions, so I also learned how to install and uninstall CUDA and CUDNN on Ubuntu, and I am recording the process here. Besides being something others can learn from, writing it down also reinforces my own memory. This article uses uninstalling CUDA 8.0 and CUDNN 7.0.5, and installing CUDA 10.0 and CUDNN 7.4.2, as its example.
First, disable the default nouveau driver. Open the blacklist configuration file:
sudo vim /etc/modprobe.d/blacklist.conf
Add the following at the end of the file:
blacklist nouveau
options nouveau modeset=0
Then execute:
sudo update-initramfs -u
After restarting the machine, execute the following command; if it prints nothing, nouveau has been successfully disabled:
lsmod | grep nouveau
Next, download the driver from the official site: https://www.nvidia.cn/Download/index.aspx?lang=cn. Choose the driver that matches your graphics card; for example, the author's card is an RTX 2070.
When the download finishes you will have an installer; the file name varies with the driver version:
NVIDIA-Linux-x86_64-410.93.run
All of the following operations must be performed from a text console. Press the following key combination to switch to it and log in:
Ctrl + Alt + F1
Execute the following command to stop the X-Window service; otherwise the graphics driver cannot be installed:
sudo service lightdm stop
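This guide assumes the lightdm display manager. If your desktop uses a different display manager, stop that service instead; for example, on Ubuntu releases that ship GNOME the service is usually gdm3 (an assumption about your setup, so adjust it to whatever display manager is actually running):
sudo service gdm3 stop  # only if your system uses gdm3 instead of lightdm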
Execute the following three commands to uninstall the original graphics driver:
sudo apt-get remove --purge nvidia*
sudo chmod +x NVIDIA-Linux-x86_64-410.93.run
sudo ./NVIDIA-Linux-x86_64-410.93.run --uninstall
Install the new driver by running the installer directly; accepting the default options throughout is fine:
sudo ./NVIDIA-Linux-x86_64-410.93.run
Execute the following command to start the X-Window service:
sudo service lightdm start
Finally, restart the system:
reboot
**Note:** If the system keeps bouncing back to the login screen after restarting, in most cases the wrong version of the graphics driver was installed. Download the driver that matches the graphics card actually installed in your machine.
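Once you can log in normally, a quick sanity check (assuming the NVIDIA utilities were installed along with the driver) is to run nvidia-smi, which should list the GPU and the driver version:
nvidia-smi  # should show the RTX 2070 and driver version 410.93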
Why uninstall CUDA in the first place? Because I switched to an RTX 2070 graphics card, the previously installed CUDA 8.0 and CUDNN 7.0.5 no longer work, and I need CUDA 10.0 and CUDNN 7.4.2 instead, so the old CUDA has to be removed first. Note that all of the following commands are run as the root user.
Uninstalling CUDA is very simple; a single command is enough, because it just runs the uninstall script that ships with CUDA. Find the uninstall script that matches your own CUDA version:
sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
After uninstalling, a few leftover folders remain (CUDA 8.0 was installed here before). They can be deleted as well:
sudo rm -rf /usr/local/cuda-8.0/
With that, CUDA is uninstalled.
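As an optional check (not part of the original steps), you can confirm that nothing CUDA-related is left under /usr/local; if a dangling /usr/local/cuda symlink remains, it can be removed too:
ls /usr/local/ | grep cuda  # ideally prints nothing after the cleanup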
The CUDA and CUDNN versions to be installed: CUDA 10.0 and CUDNN 7.4.2.
The following installation steps are all performed as the root user.
Go to the official CUDA 10 download page and download the version of CUDA that matches your system.
After the download is complete, grant execution permissions to the file:
chmod +x cuda_10.0.130_410.48_linux.run
Execute the installation package and start the installation:
./cuda_10.0.130_410.48_linux.run
After the installation starts, the license text is shown first; you can press Ctrl + C to jump straight to the end, or page through it with the spacebar. The installer then asks a series of questions, which are explained below:
accept/decline/quit: accept
(Whether you agree to the license terms; you must accept to continue the installation)
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48? (y)es/(n)o/(q)uit: n
(Do not install the driver here, because the latest driver is already installed; otherwise an older driver might be installed, causing the login loop described above)
Install the CUDA 10.0 Toolkit? (y)es/(n)o/(q)uit: y
(Whether to install the CUDA 10.0 toolkit; this must be installed)
Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]:
(Installation path; keep the default and simply press Enter)
Do you want to install a symbolic link at /usr/local/cuda? (y)es/(n)o/(q)uit: y
(Agree to create the symbolic link)
Install the CUDA 10.0 Samples? (y)es/(n)o/(q)uit: n
(The samples do not need to be installed separately for testing; the toolkit already ships its own copy)
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
(Installation begins)
After the installation finishes, configure the environment variables. Open ~/.bashrc (for example with vim ~/.bashrc) and add the following lines at the end:
export CUDA_HOME=/usr/local/cuda-10.0
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
export PATH=${CUDA_HOME}/bin:${PATH}
Finally, run source ~/.bashrc to make the changes take effect.
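Note that, as written above, LD_LIBRARY_PATH is set to the CUDA library directory only, overwriting any previous value. If you already rely on other entries in LD_LIBRARY_PATH, a common variant (my own suggestion, not part of the original steps) is to append instead:
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}  # keep any existing library paths as well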
You can use the command nvcc -V to view the installed version information:
test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
To test whether the installation was successful, compile and run the deviceQuery sample with the following commands:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
make
./deviceQuery
Output under normal circumstances:
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2070"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 7950 MBytes (8335982592 bytes)
  (36) Multiprocessors, (64) CUDA Cores/MP:      2304 CUDA Cores
  GPU Max Clock rate:                            1620 MHz (1.62 GHz)
  Memory Clock rate:                             7001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
Go to the official cuDNN download page: https://developer.nvidia.com/rdp/cudnn-download, and click Download to choose a version (you have to log in before downloading). On the version-selection page, choose cuDNN Library for Linux:
What you download is a compressed package, like this:
cudnn-10.0-linux-x64-v7.4.2.24.tgz
Then decompress it with the following command:
tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
Decompression produces the following files:
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.4.2
cuda/lib64/libcudnn_static.a
Use the following two commands to copy these files to the CUDA directory:
cp cuda/lib64/* /usr/local/cuda-10.0/lib64/
cp cuda/include/* /usr/local/cuda-10.0/include/
After the copy is complete, you can use the following command to view the version information of CUDNN:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
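For cuDNN 7.4.2 the grep should report CUDNN_MAJOR 7, CUDNN_MINOR 4 and CUDNN_PATCHLEVEL 2. Depending on how the archive was extracted, the copied files may not be world-readable and the dynamic-linker cache may not yet know about the new libraries; if you hit permission or library-loading errors, the following commonly used commands (my addition, not part of the original steps) usually resolve them:
chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
ldconfig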
At this point the installation of CUDA 10.0 and CUDNN 7.4.2 is complete. You can install the corresponding GPU version of PyTorch to check that everything works. Install it as follows:
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.0-cp35-cp35m-linux_x86_64.whl
pip3 install torchvision
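Before running the full training script, a quick check (assuming the two packages installed correctly) is to ask PyTorch itself whether CUDA and cuDNN are visible:
python3 -c "import torch; print(torch.cuda.is_available(), torch.version.cuda, torch.backends.cudnn.version())"  # should print True followed by the CUDA and cuDNN versions PyTorch was built against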
Then use the following program to test the installation:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def main():
    cudnn.benchmark = True
    torch.manual_seed(1)
    device = torch.device("cuda")
    kwargs = {'num_workers': 1, 'pin_memory': True}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))])),
        batch_size=64, shuffle=True, **kwargs)
    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    for epoch in range(1, 11):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == '__main__':
    main()
If output like the following appears, the installation is working correctly:
Train Epoch: 1 [0/60000 (0%)]	Loss: 2.365850
Train Epoch: 1 [640/60000 (1%)]	Loss: 2.305295
Train Epoch: 1 [1280/60000 (2%)]	Loss: 2.301407
Train Epoch: 1 [1920/60000 (3%)]	Loss: 2.316538
Train Epoch: 1 [2560/60000 (4%)]	Loss: 2.255809
Train Epoch: 1 [3200/60000 (5%)]	Loss: 2.224511
Train Epoch: 1 [3840/60000 (6%)]	Loss: 2.216569
Train Epoch: 1 [4480/60000 (7%)]	Loss: 2.181396