Detailed tutorial for installing CUDA 9.0 on Ubuntu 16.04

**Foreword:**

This article is based on my experience installing CUDA 9.0. CUDA 9.0 currently supports two Ubuntu versions, 16.04 and 17.04, as shown in the following figure (for the last option on the download page, the installer type, we choose the first one, the runfile method):

You can download the CUDA file first, but do not rush to install it. First read NVIDIA's official installation guide closely, then find a few good blog posts to get a general picture of the installation process and of the problems that may come up along the way. Do not reinstall the operating system unless it is truly the last resort.

Installation suggestions:

1) When you download CUDA from the official website, also get the official installation guide, read it carefully, and follow its steps as closely as possible. Don't cut corners. Use a few good blog posts as additional reference so that you know what to expect before you start.

2) Before installing, check your computer's configuration in detail: whether it has a single graphics card or dual graphics cards, whether the graphics card model meets CUDA's requirements, and whether the system meets them.

3) After every operation during the installation, check whether it succeeded.

Installation process:

One, install and get familiar with the Ubuntu 16.04 system

Before installing the software, it is best to have a basic understanding of the Ubuntu command line, such as sudo, cd, ls, nano, cat, and chmod, which saves a lot of unnecessary trouble during the installation. (I recommend searching Baidu for Mo Fan Python. He has videos about Ubuntu commands; each episode is concise and well explained.)

Two, check whether your computer meets the requirements for installing CUDA

Verify that your computer has a GPU that can support CUDA

You can find the specific model of the graphics card in the computer's configuration information. If you dual-boot, you can also find detailed information about the graphics card in the Device Manager under Windows.
You can also enter this command in the Ubuntu terminal: $ lspci | grep -i nvidia. It displays your NVIDIA GPU model, although not in much detail.

My output is (GeForce GT 630M):

01:00.0 3D controller: NVIDIA Corporation GF117M [GeForce 610M/710M/810M/820M / GT 620M/625M/630M/720M] (rev a1)

Then go to the official CUDA website and check whether your GPU model is in the list of CUDA-supported GPUs.
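If you only want the model name out of that lspci line, a small filter like the one below may help. It is just a sketch: the function name nvidia_models is my own invention, and it assumes the model appears in square brackets as in my output above.

```shell
# Hypothetical helper (not part of any NVIDIA tool): read lspci-style lines
# on stdin and print the text inside the square brackets of NVIDIA entries.
nvidia_models() {
    grep -i 'nvidia' | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Sample line copied from the output shown above.
sample='01:00.0 3D controller: NVIDIA Corporation GF117M [GeForce 610M/710M/810M/820M / GT 620M/625M/630M/720M] (rev a1)'
printf '%s\n' "$sample" | nvidia_models

# On a real machine you would pipe lspci into it instead:
#   lspci | nvidia_models
```

The printed names are what you would then look up in the support list.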

Verify that your Linux version supports CUDA (Ubuntu 16.04 is no problem)

Enter the command:

$ uname -m && cat /etc/*release

The output is:

x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
……
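The same check can be written as a small function. This is only a sketch under the assumption stated in the foreword, namely that CUDA 9.0 supports Ubuntu 16.04 and 17.04 on x86_64; the name is_supported is hypothetical.

```shell
# Hypothetical helper: succeed only for the architecture and Ubuntu releases
# that this tutorial says CUDA 9.0 supports (x86_64 with 16.04 or 17.04).
is_supported() {
    arch=$1
    release=$2
    [ "$arch" = "x86_64" ] || return 1
    case "$release" in
        16.04|17.04) return 0 ;;
        *) return 1 ;;
    esac
}

# Values taken from the sample output above.
if is_supported "x86_64" "16.04"; then
    echo "supported"
else
    echo "not supported"
fi

# On a real machine, assuming /etc/lsb-release exists as it does on Ubuntu:
#   . /etc/lsb-release && is_supported "$(uname -m)" "$DISTRIB_RELEASE"
```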

Verify that the system has gcc installed

Type in the terminal: $ gcc --version

The results show that:

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
……

If it is not installed, use the following command to install:

sudo apt-get install build-essential
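If you prefer a yes-or-no answer over reading version output, something like the following may do. It is a sketch: have_cmd is a hypothetical helper, and checking make alongside gcc is my own addition because the samples are built with make later on.

```shell
# Hypothetical helper: succeed if the named command can be found in PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for tool in gcc make; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing (sudo apt-get install build-essential should provide it)"
    fi
done
```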

Verify that the kernel headers and development packages are installed

a. Check the running system kernel version:

Enter in the terminal: $ uname -r

The results show that:

4.10.0-40-generic

b. Enter in the terminal: $ sudo apt-get install linux-headers-$(uname -r)

This installs the kernel headers and development packages that match the running kernel version.

The results show that:

……

0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.

This means the packages are already present on the system, so there is no need to install them again.
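The $(uname -r) part of the install command is a command substitution: the shell runs uname -r first and pastes its output into the package name. A tiny sketch makes this visible; headers_pkg is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: build the header package name for a kernel release.
headers_pkg() {
    echo "linux-headers-$1"
}

# $(uname -r) is replaced by the running kernel release before the name is built.
headers_pkg "$(uname -r)"

# With the release from the sample output above:
headers_pkg "4.10.0-40-generic"   # prints linux-headers-4.10.0-40-generic
```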

If all of the checks above pass, you can proceed with the actual installation. If any of them fails, refer to the official CUDA documentation, which contains detailed solutions for each problem.

Three, choose the installation method

CUDA provides two installation methods: package manager installation and runfile installation. The package manager method is relatively simple, but judging from other people's blog posts, it seems to cause more problems during installation and to fail more often. To avoid unnecessary trouble, I chose the runfile method.
Download the CUDA installation package from the official CUDA website: select the options that match your system, and for the last item, the installer type, choose runfile.

After downloading, verify the file with MD5. If the checksum does not match the one listed for the download, you have to download the file again. (I did not save my own result at the time, so the screenshot I used here was someone else's and shows cuda_8.0; pay attention to the version number.)

Enter the command: $ md5sum cuda_9.0.176_384.81_linux.run
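Comparing long checksums by eye is error-prone, so a scripted comparison may be more comfortable. The sketch below makes some assumptions: verify_md5 and EXPECTED_MD5 are hypothetical names, and the installer file name is the one mentioned later in this article. I do not reproduce the real checksum here; you have to copy it from wherever you downloaded the file.

```shell
# Hypothetical helper: succeed if the MD5 checksum of file $1 equals $2.
verify_md5() {
    actual=$(md5sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# EXPECTED_MD5 is a placeholder: set it to the checksum listed for your
# download. Nothing happens if the file or the variable is missing.
installer=cuda_9.0.176_384.81_linux.run
if [ -f "$installer" ] && [ -n "${EXPECTED_MD5:-}" ]; then
    if verify_md5 "$installer" "$EXPECTED_MD5"; then
        echo "checksum OK"
    else
        echo "checksum mismatch, download the file again"
    fi
fi
```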

Four, runfile install cuda

Disable nouveau driver

Run in the terminal: $ lsmod | grep nouveau. If there is any output, nouveau is currently loaded and has to be disabled manually.

How to disable Ubuntu nouveau:

a. Create the file blacklist-nouveau.conf in /etc/modprobe.d

Enter the command: $ sudo vi /etc/modprobe.d/blacklist-nouveau.conf (use the vi editor to edit and save the file)

Enter the following content in the file:

blacklist nouveau
options nouveau modeset=0

b. Execute:

$ sudo update-initramfs -u

c. Re-execute:

$ lsmod | grep nouveau

If there is no output, nouveau has been disabled successfully. The blacklist only takes effect at boot, so if nouveau is still listed, restart the computer and check again; if it is still listed after that, review the steps above and repeat them.
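Before repeating everything, it may be worth confirming that the blacklist file really contains both lines from step a, since a typo made in vi is easy to miss. A minimal sketch, with blacklist_ok as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if file $1 contains both lines from step a.
blacklist_ok() {
    grep -qx 'blacklist nouveau' "$1" && grep -qx 'options nouveau modeset=0' "$1"
}

conf=/etc/modprobe.d/blacklist-nouveau.conf
if [ -f "$conf" ] && blacklist_ok "$conf"; then
    echo "blacklist file looks complete"
else
    echo "blacklist file is missing or incomplete"
fi
```

The -x option makes grep match whole lines only, so a line with extra characters is not accepted.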

Note: vi is a commonly used editor in the Linux terminal or console. The basic usage is: vi /path/filename
For example, vi /etc/fstab opens the /etc/fstab file. Use the Page Up and Page Down keys to scroll. Press the Insert key and the word "Insert" appears in the lower left corner of the window, which means you are in insert mode and whatever you type is inserted at the cursor position. Press the Insert key again and "Replace" appears in the lower left corner, which means you are in replace mode and whatever you type overwrites the content at the cursor position. When you have finished editing, press the Esc key, type ":wq", and press Enter to save and exit.
If you want to exit without saving, press the Esc key, type ":q!", and press Enter. "wq" stands for write and quit, that is, save and exit; "q!" forces an exit and discards the changes.

The following steps are carried out outside the graphical desktop, so it is a good idea to photograph them with your phone beforehand. I also recommend renaming the downloaded cuda_9.0.176_384.81_linux.run file to cuda.run and moving it to the Home folder (for convenience during installation).

Restart the computer. At the login screen, do not log in to the desktop (the installation may fail if you do; if you log in by accident, restart the computer). Instead, press Ctrl+Alt+F1 to switch to text mode (the command line interface) and log in to your account there.
Enter $ sudo service lightdm stop to shut down the graphical interface.
Switch to the directory that contains the CUDA installation file: $ cd ~

Run $ sudo sh cuda.run
Follow the prompts step by step.

Note: a. Be sure to enter exactly what each prompt asks for; for example, some prompts require accept and others require yes.

b. When asked whether to install the OpenGL libraries, select no if your computer has dual graphics cards like mine and the GPU that drives the display is not the NVIDIA one; otherwise you can select yes. For all other options, choose yes or the default. If your computer has dual graphics cards and you select yes at this step, you will most likely run into a login loop when the graphical interface restarts after the CUDA installation: after you enter the password, you are sent straight back to the password prompt.

This happens because, on a dual-graphics machine where the GPU used for display is not from NVIDIA, installing the NVIDIA OpenGL libraries overwrites the OpenGL libraries of the GPU that is actually in use (the non-NVIDIA one), and the GUI can then no longer work.

After the installation is successful, it will display installed, otherwise it will display failed.

Enter $ sudo service lightdm start to restart the graphical interface.

Press Ctrl+Alt+F7 to return to the graphical login screen and enter your password to log in.
If you can log in, you have avoided the login loop, which basically means that CUDA was installed successfully.

If you do get stuck in a login loop, don't rush to reinstall the system. The official tutorial mentions this problem, and the cause is described in the note in the previous step: you probably selected yes when asked whether to install OpenGL. Uninstall CUDA and install it again.
Uninstallation: we cannot log in to the graphical user interface (GUI), but we can still use the text user interface (TUI).

At the login screen, press Ctrl+Alt+F1 to enter the TUI.
Execute:

$ sudo /usr/local/cuda-9.0/bin/uninstall_cuda_9.0.pl
$ sudo /usr/bin/nvidia-uninstall

Then restart

$ sudo reboot

Then run the .run installer again. This time, when asked whether to install OpenGL, choose n if you have dual graphics cards.

Restart the computer and verify the device nodes.

Execute:

$ ls /dev/nvidia*

There are two possible results, a and b; see which one matches yours.

a. If the result is

/dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm

or something similar, with three entries (including one like /dev/nvidia-uvm), then the installation was successful.

b. For most people the result will probably look like this

ls: cannot access /dev/nvidia*: No such file or directory

or like this, with only

/dev/nvidia0 /dev/nvidiactl

that is, only one or two of the entries from a and no /dev/nvidia-uvm, so the set of files is incomplete.
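To see at a glance which of the three nodes is absent, a small loop can replace the visual comparison. This is only a sketch; missing_nodes is a hypothetical helper, and it assumes a single GPU, so that nvidia0 is the only numbered node.

```shell
# Hypothetical helper: print every expected device node that does not exist
# in directory $1 (default /dev).
missing_nodes() {
    dir=${1:-/dev}
    for node in nvidia0 nvidiactl nvidia-uvm; do
        if [ ! -e "$dir/$node" ]; then
            echo "$node"
        fi
    done
}

missing=$(missing_nodes /dev)
if [ -z "$missing" ]; then
    echo "all three device nodes are present"
else
    echo "missing:" $missing
fi
```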

Don't worry, and don't rush to reinstall the system (this is what happened when I installed it). The official guide has a detailed solution, although my method differs slightly from the official one.

First, add a startup script. (There are roughly two ways to do this. I use the most direct one; the other is to create a file first and then move it into the startup folder with mv, which you can look up on Baidu.)
Execute:

$ sudo vi /etc/rc.local

If you are opening this file for the first time, it should be empty apart from the # comment lines. The first line of this file is

#!/bin/sh -e

Remove -e (this step is very important; otherwise the content added below will not be loaded).
Then copy the following script, except for its #!/bin/bash line, into the file before the exit 0 line, then save and exit.

#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  mknod -m 666 /dev/nvidia-uvm c $D 0
else
  exit 1
fi
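To get a feel for what the first half of the script will do on your machine without touching /dev, you can print the mknod commands instead of running them. The sketch below is a dry run under my own assumptions: plan_nodes is a hypothetical name, and it expects lspci text that has already been narrowed down to NVIDIA entries.

```shell
# Hypothetical dry-run helper: given lspci-style text in $1, print the mknod
# commands the startup script would run for the GPU nodes. Nothing is created.
plan_nodes() {
    n3d=$(printf '%s\n' "$1" | grep '3D controller' | wc -l)
    nvga=$(printf '%s\n' "$1" | grep 'VGA compatible controller' | wc -l)
    n=$((n3d + nvga - 1))
    for i in $(seq 0 "$n"); do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

# The lspci line from my machine, shown earlier in this article.
plan_nodes '01:00.0 3D controller: NVIDIA Corporation GF117M [GeForce 610M/710M/810M/820M / GT 620M/625M/630M/720M] (rev a1)'
```

With one 3D controller and no VGA controller, N is 0, so only /dev/nvidia0 and /dev/nvidiactl are planned. The /dev/nvidia-uvm node is handled separately because its major number has to be read from /proc/devices.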

After the next restart, you should see the three nvidia files directly in the /dev directory.
Input: $ ls /dev/nvidia*
The result shows: /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm

Success!

Set the environment variables.

Enter $ sudo gedit /etc/profile in the terminal.
At the end of the opened file, add the following two lines.

64-bit system:

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

32-bit system:

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Save the file and restart. Running source /etc/profile only takes effect in the current terminal session, whereas after a restart the settings apply permanently.

This differs slightly from the official installation guide and needs some explanation:
The official documentation says that you only need to run the two export statements above in the terminal. But if you do not write them into /etc/profile, the environment variables disappear as soon as you close the terminal and no longer have any effect, so writing them into the file makes them permanent.
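The ${PATH:+:${PATH}} part of the export lines can look cryptic. It appends a colon plus the old value only when the old value is set and not empty, which avoids a dangling colon when LD_LIBRARY_PATH was empty before. A small sketch shows the effect; prepend_path is a hypothetical helper used only for this demonstration.

```shell
# Hypothetical helper: put directory $1 in front of the existing value $2.
# ${2:+:$2} expands to ":$2" only when $2 is set and not empty.
prepend_path() {
    echo "$1${2:+:$2}"
}

prepend_path /usr/local/cuda-9.0/lib64 ""              # prints /usr/local/cuda-9.0/lib64
prepend_path /usr/local/cuda-9.0/bin "/usr/bin:/bin"   # prints /usr/local/cuda-9.0/bin:/usr/bin:/bin
```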

Restart the computer and check whether the above environment variables are set successfully.

a. Verify the driver version

Enter:

$ cat /proc/driver/nvidia/version

The results show that

NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.81 Sat Sep 2 02:43:11 PDT 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)

b. Verify CUDA Toolkit

Enter:

$ nvcc -V

This prints the CUDA version information.

If you see this instead:

The program 'nvcc' is currently not installed. You can install it by typing:
sudo apt-get install nvidia-cuda-toolkit

the environment variables were probably not set correctly; please repeat the environment variable step above.
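When nvcc is not found, the first thing I would check is whether the CUDA bin directory actually made it into PATH. A minimal sketch, assuming the default install location /usr/local/cuda-9.0 used throughout this article; path_contains is a hypothetical helper.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains /usr/local/cuda-9.0/bin "$PATH"; then
    echo "CUDA bin directory is in PATH"
else
    echo "CUDA bin directory is not in PATH, check /etc/profile"
fi
```

Wrapping both sides in colons makes the match exact, so a similar directory such as /usr/local/cuda-9.0/bin64 would not count.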

Five, try to compile the examples provided by cuda

1) Open the terminal and change into the NVIDIA_CUDA-9.0_Samples directory with cd: $ cd /home/xxx/NVIDIA_CUDA-9.0_Samples where xxx is your own user name.

Then enter in the terminal: $ make

The build starts automatically. The whole process takes about ten to twenty minutes, so please be patient. If an error occurs, the build reports it immediately and stops.

The first run may fail with an error message saying that gcc is not present on the system.

The solution is to install gcc again from the command line. Enter in the terminal: $ sudo apt-get install gcc and, once gcc is installed, run make again.

If the compilation succeeds, Finished building CUDA samples is displayed at the end, as shown in the figure below.

2) Run the compiled binaries.
By default, the compiled binaries are stored under NVIDIA_CUDA-9.0_Samples/bin.
In the same terminal, enter: $ cd /home/xxx/NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release where xxx is your own user name.
Then enter in the terminal: $ ./deviceQuery

The result is shown in the following figure. If you see output similar to it, CUDA is installed and configured successfully: Result = PASS means success, and Result = FAIL means failure.
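If you would rather have the shell look for the result line than scroll through the output yourself, a filter along these lines may be enough. It is a sketch: sample_passed is a hypothetical helper, and the demonstration feeds it a literal line instead of real deviceQuery output.

```shell
# Hypothetical helper: read a sample's output on stdin and succeed if it
# contains the text "Result = PASS".
sample_passed() {
    grep -q 'Result = PASS'
}

# Demonstration with a literal line; on a real machine you would run
#   ./deviceQuery | sample_passed
if printf 'Result = PASS\n' | sample_passed; then
    echo "deviceQuery passed"
else
    echo "deviceQuery failed"
fi
```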

3) Finally, check the connection between the system and the CUDA-capable device.
Terminal input: $ ./bandwidthTest
If you see output similar to the following picture, the check succeeded.

Finally, I wish everyone a happy installation and use of CUDA.