Original blog: [Doi Technical Team](http://blog.doiduoyi.com/)
Link address: https://blog.doiduoyi.com/authors/1584446358138
Original intention: record the learning experience of an excellent Doi technical team
Table of Contents
- Preface
- Install the graphics driver
- Disable the nouveau driver
- Download the driver
- Uninstall the old driver
- Install the new driver
- Uninstall CUDA
- Install CUDA
- Download and install CUDA
- Test whether the installation was successful
- Download and install CUDNN
- Test the installation result
- Reference
Recently I have been studying how to install and use PaddlePaddle under different graphics card driver versions, so I also learned how to install and uninstall CUDA and CUDNN on Ubuntu, and I am recording the process here. Besides being something others can learn from, writing it down also reinforces my own memory. This article uses uninstalling CUDA 8.0 and CUDNN 7.0.5, and installing CUDA 10.0 and CUDNN 7.4.2, as its example.
First, disable the default nouveau driver. Open the blacklist configuration file:
sudo vim /etc/modprobe.d/blacklist.conf
Add the following at the end of the file:
blacklist nouveau
options nouveau modeset=0
Then execute:
sudo update-initramfs -u
After restarting the machine, execute the following command; if it prints nothing, nouveau has been successfully disabled:
lsmod | grep nouveau
Next, download the driver from the official site: https://www.nvidia.cn/Download/index.aspx?lang=cn. Choose the driver that matches your graphics card; for example, the author's card is an RTX 2070.
When the download finishes you will have an installer; the file name varies with the driver version:
NVIDIA-Linux-x86_64-410.93.run
All of the following operations must be performed from a text console. Press the following key combination to switch to it and log in:
Ctrl + Alt + F1
Execute the following command to stop the X-Window service; otherwise the graphics driver cannot be installed:
sudo service lightdm stop
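This guide assumes the lightdm display manager. If your desktop uses a different display manager, stop that service instead; for example, on Ubuntu releases that ship GNOME the service is usually gdm3 (an assumption about your setup, so adjust it to whatever display manager is actually running):
sudo service gdm3 stop  # only if your system uses gdm3 instead of lightdm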
Execute the following three commands to uninstall the original graphics driver:
sudo apt-get remove --purge nvidia*
sudo chmod +x NVIDIA-Linux-x86_64-410.93.run
sudo ./NVIDIA-Linux-x86_64-410.93.run --uninstall
Install the new driver by running the installer directly; accepting the default options throughout is fine:
sudo ./NVIDIA-Linux-x86_64-410.93.run
Execute the following command to start the X-Window service:
sudo service lightdm start
Finally, restart the system:
reboot
**Note:** If the system keeps bouncing back to the login screen after restarting, in most cases the wrong version of the graphics driver was installed. Download the driver that matches the graphics card actually installed in your machine.
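Once you can log in normally, a quick sanity check (assuming the NVIDIA utilities were installed along with the driver) is to run nvidia-smi, which should list the GPU and the driver version:
nvidia-smi  # should show the RTX 2070 and driver version 410.93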
Why uninstall CUDA in the first place? Because I switched to an RTX 2070 graphics card, the previously installed CUDA 8.0 and CUDNN 7.0.5 no longer work, and I need CUDA 10.0 and CUDNN 7.4.2 instead, so the old CUDA has to be removed first. Note that all of the following commands are run as the root user.
Uninstalling CUDA is very simple; a single command is enough, because it just runs the uninstall script that ships with CUDA. Find the uninstall script that matches your own CUDA version:
sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
After uninstalling, a few leftover folders remain (CUDA 8.0 was installed here before). They can be deleted as well:
sudo rm -rf /usr/local/cuda-8.0/
With that, CUDA is uninstalled.
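As an optional check (not part of the original steps), you can confirm that nothing CUDA-related is left under /usr/local; if a dangling /usr/local/cuda symlink remains, it can be removed too:
ls /usr/local/ | grep cuda  # ideally prints nothing after the cleanup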
The CUDA and CUDNN versions to be installed: CUDA 10.0 and CUDNN 7.4.2.
The following installation steps are all performed as the root user.
Go to the official CUDA 10 download page and download the version of CUDA that matches your system.
After the download is complete, grant execution permissions to the file:
chmod +x cuda_10.0.130_410.48_linux.run
Execute the installation package and start the installation:
./cuda_10.0.130_410.48_linux.run
After the installation starts, the license text is shown first; you can press Ctrl + C to jump straight to the end, or page through it with the spacebar. The installer then asks a series of questions, which are explained below:
accept/decline/quit: accept
(Whether you agree to the license terms; you must accept to continue the installation)
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48? (y)es/(n)o/(q)uit: n
(Do not install the driver here, because the latest driver is already installed; otherwise an older driver might be installed, causing the login loop described above)
Install the CUDA 10.0 Toolkit? (y)es/(n)o/(q)uit: y
(Whether to install the CUDA 10.0 toolkit; this must be installed)
Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]:
(Installation path; keep the default and simply press Enter)
Do you want to install a symbolic link at /usr/local/cuda? (y)es/(n)o/(q)uit: y
(Agree to create the symbolic link)
Install the CUDA 10.0 Samples? (y)es/(n)o/(q)uit: n
(The samples do not need to be installed separately for testing; the toolkit already ships its own copy)
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
(Installation begins)
After the installation finishes, configure the environment variables. Open ~/.bashrc (for example with vim ~/.bashrc) and add the following lines at the end:
export CUDA_HOME=/usr/local/cuda-10.0
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
export PATH=${CUDA_HOME}/bin:${PATH}
Finally, run source ~/.bashrc to make the changes take effect.
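Note that, as written above, LD_LIBRARY_PATH is set to the CUDA library directory only, overwriting any previous value. If you already rely on other entries in LD_LIBRARY_PATH, a common variant (my own suggestion, not part of the original steps) is to append instead:
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}  # keep any existing library paths as well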
You can use the command nvcc -V to view the installed version information:
test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
To test whether the installation was successful, compile and run the deviceQuery sample with the following commands:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
make
./deviceQuery
Output under normal circumstances:
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2070"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 7950 MBytes (8335982592 bytes)
  (36) Multiprocessors, (64) CUDA Cores/MP:      2304 CUDA Cores
  GPU Max Clock rate:                            1620 MHz (1.62 GHz)
  Memory Clock rate:                             7001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
Go to the official cuDNN download page: https://developer.nvidia.com/rdp/cudnn-download, and click Download to choose a version (you have to log in before downloading). On the version-selection page, choose cuDNN Library for Linux:
What you download is a compressed package, like this:
cudnn-10.0-linux-x64-v7.4.2.24.tgz
Then decompress it with the following command:
tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
Decompression produces the following files:
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.4.2
cuda/lib64/libcudnn_static.a
Use the following two commands to copy these files to the CUDA directory:
cp cuda/lib64/* /usr/local/cuda-10.0/lib64/
cp cuda/include/* /usr/local/cuda-10.0/include/
After the copy is complete, you can use the following command to view the version information of CUDNN:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
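For cuDNN 7.4.2 the grep should report CUDNN_MAJOR 7, CUDNN_MINOR 4 and CUDNN_PATCHLEVEL 2. Depending on how the archive was extracted, the copied files may not be world-readable and the dynamic-linker cache may not yet know about the new libraries; if you hit permission or library-loading errors, the following commonly used commands (my addition, not part of the original steps) usually resolve them:
chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
ldconfig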
At this point the installation of CUDA 10.0 and CUDNN 7.4.2 is complete. You can install the corresponding GPU version of PyTorch to check that everything works. Install it as follows:
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.0-cp35-cp35m-linux_x86_64.whl
pip3 install torchvision
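Before running the full training script, a quick check (assuming the two packages installed correctly) is to ask PyTorch itself whether CUDA and cuDNN are visible:
python3 -c "import torch; print(torch.cuda.is_available(), torch.version.cuda, torch.backends.cudnn.version())"  # should print True followed by the CUDA and cuDNN versions PyTorch was built against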
Then use the following program to test the installation:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def main():
    cudnn.benchmark = True
    torch.manual_seed(1)
    device = torch.device("cuda")
    kwargs = {'num_workers': 1, 'pin_memory': True}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))])),
        batch_size=64, shuffle=True, **kwargs)
    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    for epoch in range(1, 11):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == '__main__':
    main()
If output like the following appears, the installation is working correctly:
Train Epoch: 1 [0/60000 (0%)]	Loss: 2.365850
Train Epoch: 1 [640/60000 (1%)]	Loss: 2.305295
Train Epoch: 1 [1280/60000 (2%)]	Loss: 2.301407
Train Epoch: 1 [1920/60000 (3%)]	Loss: 2.316538
Train Epoch: 1 [2560/60000 (4%)]	Loss: 2.255809
Train Epoch: 1 [3200/60000 (5%)]	Loss: 2.224511
Train Epoch: 1 [3840/60000 (6%)]	Loss: 2.216569
Train Epoch: 1 [4480/60000 (7%)]	Loss: 2.181396