Step 1 CUDA installation
To use Caffe with an NVIDIA GPU, you need to install the CUDA Toolkit.
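For example, on Ubuntu you can install it from NVIDIA's repository package. The file name below is only an example for CUDA 8.0 on Ubuntu 16.04; download the .deb that matches your system from the CUDA downloads page:
$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda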
Step 2 cuDNN installation
Download the cuDNN library for Linux; you need to register for the NVIDIA Accelerated Computing Developer Program first.
After downloading, unpack the archive and copy its contents into the CUDA directory. Take cuDNN v5.1 as an example:
tar zvxf cudnn-8.0-linux-x64-v5.1.tgz
cd cuda/
sudo cp lib64/lib* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
- Create soft links and update the linker cache:
cd /usr/local/cuda/lib64/
sudo rm -rf libcudnn.so libcudnn.so.5
sudo ln -s libcudnn.so.5.1.10 libcudnn.so.5
sudo ln -s libcudnn.so.5 libcudnn.so
sudo ldconfig -v
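As a quick sanity check that the links are in place:
$ ls -l /usr/local/cuda/lib64/libcudnn*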
Note: many projects are tied to a specific cuDNN version, so you may need to install an older release; just change the version numbers in the commands above.
Check whether Caffe is linked against cuDNN v5 (after Caffe has been built):
ldd ./build/tools/caffe.bin | grep cudnn
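If cuDNN is linked in, the output should contain a line roughly like the following (the exact path and load address will differ on your machine):
libcudnn.so.5 => /usr/local/cuda/lib64/libcudnn.so.5 (0x00007f...)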
Step 3 install dependencies
$ sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev libatlas-base-dev git
$ sudo apt-get install --no-install-recommends libboost-all-dev
$ sudo apt-get install libatlas-base-dev #Install ATLAS
$ sudo apt-get install libopenblas-dev #Install OpenBLAS
$ sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
Step 4 install NCCL (required for multi-GPU training)
$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ sudo make install -j4
$ sudo ldconfig
The NCCL library and headers will be installed in /usr/local/lib and /usr/local/include.
NCCL is mainly used to speed up collective operations in a multi-GPU environment: when several GPUs train at the same time, their gradients must be synchronized (reduced) across devices, and NCCL accelerates this collective step.
Its core idea is to cut large blocks of data into small chunks and use multiple links in the system at the same time, for example both the upstream and downstream directions of a PCI-E link. If all the data went over a single uplink or downlink at the same time, the resulting contention would greatly reduce transmission efficiency.
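To make the chunking idea concrete, here is a toy, CPU-only sketch of a ring all-reduce in Python/NumPy. It is purely illustrative (the function name and setup are invented for this example, and this is not how NCCL is invoked); it only shows how splitting each buffer into chunks lets every link in the ring carry data at every step.
import numpy as np

def ring_allreduce(buffers):
    """Sum equal-length buffers held by n simulated devices by passing chunks around a ring."""
    n = len(buffers)
    chunks = [np.array_split(np.asarray(b, dtype=float).copy(), n) for b in buffers]
    # Reduce-scatter: after n - 1 steps, device i holds the full sum of chunk (i + 1) % n.
    for s in range(n - 1):
        for i in range(n):
            c = (i - s) % n                        # the chunk device i forwards in step s
            chunks[(i + 1) % n][c] += chunks[i][c]
    # All-gather: circulate the fully reduced chunks so every device ends up with every sum.
    for s in range(n - 1):
        for i in range(n):
            c = (i + 1 - s) % n
            chunks[(i + 1) % n][c] = chunks[i][c].copy()
    return [np.concatenate(c) for c in chunks]

# Example: 4 simulated GPUs, each holding a gradient vector of length 8.
grads = [np.full(8, i + 1.0) for i in range(4)]
print(ring_allreduce(grads)[0])  # every device ends up with the element-wise sum (all 10s)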
Step 5 compile Caffe
Download Caffe
$ git clone https://github.com/BVLC/caffe.git
$ cd caffe/
$ cp Makefile.config.example Makefile.config
- Edit Makefile.config and make the following changes (see the example below):
Uncomment USE_CUDNN := 1 to enable cuDNN acceleration;
Uncomment USE_NCCL := 1 to enable NCCL, which is required to run Caffe on multiple GPUs.
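For reference, the change in Makefile.config turns the commented-out lines
# USE_CUDNN := 1
# USE_NCCL := 1
into
USE_CUDNN := 1
USE_NCCL := 1
(The exact line positions differ between Caffe versions.)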
- Compile and install Caffe
$ make all -j8
$ make test -j8
$ make pycaffe # python API
$ make matcaffe # MATLAB API; requires the MATLAB path (MATLAB_DIR) to be set in Makefile.config
After the build completes, the Caffe binary is available at build/tools/caffe.
ldd build/tools/caffe #View all libraries
ldd build/tools/caffe | grep cudnn #View cudnn version information
ldd build/tools/caffe | grep openblas #View openblas library information
Prepare the ImageNet data so that the training and validation images are laid out like this:
/path/to/imagenet/train/n01440764/n01440764_10026.JPEG
/path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG
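The auxiliary data downloaded by get_ilsvrc_aux.sh below includes train.txt and val.txt, which list each image relative to those roots together with an integer class label, one image per line, roughly like this (the labels shown here are illustrative):
n01440764/n01440764_10026.JPEG 0
ILSVRC2012_val_00000001.JPEG 65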
$ ./data/ilsvrc12/get_ilsvrc_aux.sh
# In the examples/imagenet/create_imagenet.sh script, set TRAIN_DATA_ROOT and VAL_DATA_ROOT to the paths of the unpacked original images
# Set RESIZE=true so that images are resized to the proper size before being added to the database
# Create image database
$ ./examples/imagenet/create_imagenet.sh
# Create the required image mean file
$ ./examples/imagenet/make_imagenet_mean.sh
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:$LD_LIBRARY_PATH
# Edit models/bvlc_alexnet/solver.prototxt (typical fields are shown in the example below)
$ ./build/tools/caffe train --solver=models/bvlc_alexnet/solver.prototxt --gpu 0
# You can specify multiple device IDs (such as 0,1,2,3) or pass "--gpu all" to use all available GPUs in the system and train on multiple GPUs.
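The fields you typically adjust in solver.prototxt look like this; the values are approximately those shipped with bvlc_alexnet and are meant only as an illustration:
net: "models/bvlc_alexnet/train_val.prototxt"
test_iter: 1000              # number of validation batches per test phase
test_interval: 1000          # test every 1000 training iterations
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000             # drop the learning rate every 100k iterations
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_alexnet/caffe_alexnet_train"
solver_mode: GPU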
$ sudo apt-get install python-numpy python-scipy python-matplotlib python-sklearn python-skimage python-h5py python-protobuf python-leveldb python-networkx python-nose python-pandas python-gflags Cython ipython
$ sudo apt-get install python-pip wget
$ sudo apt-get install liblmdb-dev
or
$ sudo pip install lmdb
$ sudo conda install opencv # for Anaconda
or
$ sudo apt-get install python-opencv
$ sudo apt-get install python-skimage
$ sudo apt-get update
$ make pycaffe #Recompile the python API
# Set environment variables
$ sudo gedit /etc/profile
# Add environment variables:
export PYTHONPATH=${HOME}/caffe-master/distribute/python:$PYTHONPATH
export LD_LIBRARY_PATH=${HOME}/caffe-master/build/lib:$LD_LIBRARY_PATH
$ source /etc/profile #Make environment variables take effect
$ echo $<Environment variable name> #View environment variables
(The detailed error message is similar to: Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python)
Open the Makefile.config file in the caffe directory, find the line WITH_PYTHON_LAYER := 1, remove the leading '#', and recompile; it is best to open a new terminal for the compilation.
$ sudo pip install easydict
$ wget https://code.google.com/p/protobuf/wiki/Download?tm=2
# Unzip the downloaded file and enter the folder
$ ./configure
$ make
$ make check
$ make install
$ ./configure && make && cd python && python setup.py test && python setup.py install
Reference 1: http://code.google.com/p/protobuf/issues/detail?id=235
Reference 2: http://blog.csdn.net/qinglu000/article/details/17242011
$ sudo apt-get install python-protobuf
or
Use the Synaptic package manager to search for and install "python-protobuf"
- Step 1: In line 85 of Makefile.config, add /usr/include/hdf5/serial/ to INCLUDE_DIRS, that is, change the first line below into the second:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
- Step 2: In line 173 of the Makefile, change hdf5_hl and hdf5 to hdf5_serial_hl and hdf5_serial, that is, change the first line below into the second:
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
# NVIDIA NCCL is required to run Caffe on multiple GPUs
$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ sudo make install -j4
# The NCCL library and headers will be installed in /usr/local/lib and /usr/local/include
$ sudo ldconfig #If this command is not executed, an error will occur: error while loading shared libraries: libnccl.so.1: cannot open shared object file: No such file or directory
$ sudo apt-get install python-protobuf
or
Alternatively, download the source package and compile and install it yourself; see: http://blog.csdn.net/paynetiger/article/details/8197326
The first method is recommended. The key point is the following:
If you use Anaconda, either of the two methods above installs the protobuf-related files into /usr/local/lib/python2.7/dist-packages; you need to copy the relevant files into Anaconda/lib/python2.7/site-packages before they can be used normally.
$ sudo apt-get install graphviz #Install graphviz
$ sudo pip install pydot #Install pydot
If you use Anaconda, you need to copy the relevant files from /usr/local/lib/python2.7/dist-packages to Anaconda/lib/python2.7/site-packages.
$ sudo ldconfig /usr/local/cuda/lib64
$ sudo ldconfig /usr/local/cuda-7.5/lib64
import sys
sys.path.append("/(Your caffe-master path)/caffe-master/python")
sys.path.append("/(Your caffe-master path)/caffe-master/python/caffe")
import caffe
# Download protobuf-2.5.0:
$ wget http://protobuf.googlecode.com/files/protobuf-2.5.0.zip
$ unzip protobuf-2.5.0.zip
$ cd protobuf-2.5.0
$ chmod 777 configure
$ ./configure
$ make -j4
$ make check -j4
$ make install
# Compile python interface
$ cd ./python
$ python setup.py build
$ python setup.py test
$ python setup.py install
$ protoc --version # Verify with this command
# Then check the Python bindings in a Python shell:
>>> import google.protobuf
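If the import succeeds, you can also print the installed package version (google.protobuf exposes a __version__ attribute):
>>> print(google.protobuf.__version__)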
$ sudo apt-get install automake autoconf libtool
# This issue arises when you have libopenblas-base and libatlas3-base installed but don't have liblapack3 installed. This combination of packages installs conflicting versions of libblas.so (from OpenBLAS) and liblapack.so (from ATLAS).
# Solution 1 (my favorite): You can keep both OpenBLAS and ATLAS on your machine if you also install liblapack3.
$ sudo apt-get install liblapack3
# Solution 2: Uninstall ATLAS (this will actually install liblapack3 for you automatically because of some deb package shenanigans)
$ sudo apt-get remove libatlas3-base
# Solution 3: Uninstall OpenBLAS
$ sudo apt-get remove libopenblas-base
----------------------------------------------------------------------------------------------------------------------------------------------------------------
# Bad configuration
$ dpkg -l | grep 'openblas\|atlas\|lapack'
ii libatlas3-base 3.10.1-4 amd64 Automatically Tuned Linear Algebra Software, generic shared
ii libopenblas-base 0.2.8-6ubuntu1 amd64 Optimized BLAS(linear algebra) library based on GotoBLAS2
$ update-alternatives --get-selections | grep 'blas\|lapack'
libblas.so.3 auto /usr/lib/openblas-base/libblas.so.3
liblapack.so.3 auto /usr/lib/atlas-base/atlas/liblapack.so.3
$ python -c 'import numpy'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/__init__.py", line 153, in <module>
    from . import add_newdocs
  File "/usr/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/usr/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 18, in <module>
    from .polynomial import *
  File "/usr/lib/python2.7/dist-packages/numpy/lib/polynomial.py", line 19, in <module>
    from numpy.linalg import eigvals, lstsq, inv
  File "/usr/lib/python2.7/dist-packages/numpy/linalg/__init__.py", line 50, in <module>
    from .linalg import *
  File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 29, in <module>
    from numpy.linalg import lapack_lite, _umath_linalg
ImportError: /usr/lib/liblapack.so.3: undefined symbol: ATL_chemv
# Solution 1
$ dpkg -l | grep 'openblas\|atlas\|lapack'
ii libatlas3-base 3.10.1-4 amd64 Automatically Tuned Linear Algebra Software, generic shared
ii liblapack3 3.5.0-2ubuntu1 amd64 Library of linear algebra routines 3- shared version
ii libopenblas-base 0.2.8-6ubuntu1 amd64 Optimized BLAS(linear algebra) library based on GotoBLAS2
$ update-alternatives --get-selections | grep 'blas\|lapack'
libblas.so.3 auto /usr/lib/openblas-base/libblas.so.3
liblapack.so.3 auto /usr/lib/lapack/liblapack.so.3
$ python -c 'import numpy'
# Solution 2
$ dpkg -l | grep 'openblas\|atlas\|lapack'
ii liblapack3 3.5.0-2ubuntu1 amd64 Library of linear algebra routines 3- shared version
ii libopenblas-base 0.2.8-6ubuntu1 amd64 Optimized BLAS(linear algebra) library based on GotoBLAS2
$ update-alternatives --get-selections | grep 'blas\|lapack'
libblas.so.3 auto /usr/lib/openblas-base/libblas.so.3
liblapack.so.3 auto /usr/lib/lapack/liblapack.so.3
$ python -c 'import numpy'
# Solution 3
$ dpkg -l | grep 'openblas\|atlas\|lapack'
ii libatlas3-base 3.10.1-4 amd64 Automatically Tuned Linear Algebra Software, generic shared
$ update-alternatives --get-selections | grep 'blas\|lapack'
libblas.so.3 auto /usr/lib/atlas-base/atlas/libblas.so.3
liblapack.so.3 auto /usr/lib/atlas-base/atlas/liblapack.so.3
$ python -c 'import numpy'
# Check environment variable settings
$ echo $PATH
$ echo $LD_LIBRARY_PATH
# Copy some files into the /usr/local/lib folder:
$ sudo cp /usr/local/cuda-7.5/lib64/libcudart.so.7.5 /usr/local/lib/libcudart.so.7.5 && sudo ldconfig
$ sudo cp /usr/local/cuda-7.5/lib64/libcublas.so.7.5 /usr/local/lib/libcublas.so.7.5 && sudo ldconfig
$ sudo cp /usr/local/cuda-7.5/lib64/libcurand.so.7.5 /usr/local/lib/libcurand.so.7.5 && sudo ldconfig
$ sudo apt-get update
# An error occurred: W: GPG error: http://archive.ubuntukylin.com:10006 xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 8D5A09DC9B929006
# It is a missing-key problem; the solution:
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 8D5A09DC9B929006
# Note: use the key shown in your own error message in the command above; it differs from machine to machine.
$ sudo apt-get install python-yaml
Or install from source, e.g. PyYAML 3.11:
$ wget http://pyyaml.org/download/pyyaml/PyYAML-3.11.tar.gz
$ tar -zxvf PyYAML-3.11.tar.gz
$ cd PyYAML-3.11
$ python setup.py install
# When installing a program from the terminal with sudo apt-get install xxx, an error occurs:
# E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
# E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
# This usually happens because another process is holding the lock, often because a previous install or update did not finish cleanly. The solution:
$ sudo rm /var/cache/apt/archives/lock
$ sudo rm /var/lib/dpkg/lock
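If the lock was left behind by an interrupted installation, you may also need to let dpkg finish configuring half-installed packages (a standard companion step, not part of the original note):
$ sudo dpkg --configure -a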
# Problem description:
# On an Ubuntu 14.04 server, the NVIDIA driver was automatically upgraded from 352.39 to 352.63 and the GPU stopped working; running nvidia-smi reports "Failed to initialize NVML: GPU access blocked by the operating system".
# System: Ubuntu 14.04
# CUDA: 7.5
# solution:
# 1. First disable the automatic updates built into the system
$ sudo vim /etc/apt/apt.conf.d/50unattended-upgrades
# (Comment out the update part)
# Reference link: http://www.linuxdiyf.com/Linux/15997.html
# 2. Uninstall the cuda driver and reinstall
# (1) Completely uninstall
$ sudo apt-get remove --purge nvidia*
$ sudo apt-get autoremove
$ sudo apt-get clean
$ dpkg -l |grep ^rc|awk '{print $2}'|sudo xargs dpkg -P
# Reference link:
# https://devtalk.nvidia.com/default/topic/900899/cuda-setup-and-installation/unable-to-detect-cuda-capable-device-after-automatic-forced-nvidia-updated/
# http://zhidao.baidu.com/link?url=smwXar3NPdAi1WxnZJ2_sARCEPoNcxLwB0RwmEnDPiqyrbdz64aVCoabN9azod-AQrJP0OjeiL8-y8mFRHZDma
# (2) Reinstall cuda
# Previously, after configuring the Caffe environment on this Ubuntu 14.04 system, gcc had been downgraded from 4.8 to 4.7 in order to compile the MATLAB interface. Installing CUDA directly then fails with the error: "Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option." Following that hint and adding "--kernel-source-path" still did not work, and going further the installer asks you to downgrade the system kernel.
# Since the earlier gcc downgrade was what led to the error above, gcc was reinstalled and the gcc/g++ symbolic links were recreated:
$ sudo apt-get install gcc-4.7
$ cd /usr/bin
$ sudo mv gcc gcc.bak
$ sudo ln -s gcc-4.7 gcc
$ sudo mv g++ g++.bak
$ sudo ln -s g++-4.7 g++
# Reference link: http://www.mamicode.com/info-detail-876185.html
# Then reinstall the cuda driver according to the conventional method to solve the problem.
# Transfer from: http://blog.csdn.net/u012494820/article/details/52289095
# Power on; at the GRUB selection screen, press E and the screen becomes an editor.
# In the last few lines, find "ro quiet splash".
# Delete "quiet", change "splash" to "text", and then press F10.
# You will now be at a text console (Ctrl+Alt+F1-F6); enter your user name and password to log in.
# Then enter the following code:
$ sudo add-apt-repository ppa:bumblebee/stable
$ sudo apt-get update
$ sudo apt-get install bumblebee bumblebee-nvidia
# Download OpenCV from the official website (http://opencv.org/downloads.html) and unpack it to the installation location, assumed here to be /home/opencv
# Create a compilation folder:
$ cd ~/opencv
$ mkdir build
$ cd build
# Configuration:
$ cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
# Compile:
$ make -j8 # -j8 runs the build in parallel; a plain make also works
# opencv installation
$ sudo make install
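Optionally, refresh the linker cache and confirm the installed version (this assumes OpenCV installed its default pkg-config file):
$ sudo ldconfig
$ pkg-config --modversion opencv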
# The solution is to copy some files into the /usr/local/lib folder:
# Note the CUDA version number
$ sudo cp /usr/local/cuda-8.0/lib64/libcudart.so.8.0 /usr/local/lib/libcudart.so.8.0 && sudo ldconfig
$ sudo cp /usr/local/cuda-8.0/lib64/libcublas.so.8.0 /usr/local/lib/libcublas.so.8.0 && sudo ldconfig
$ sudo cp /usr/local/cuda-8.0/lib64/libcurand.so.8.0 /usr/local/lib/libcurand.so.8.0 && sudo ldconfig
$ sudo apt-get install libmatio-dev
Or install from source:
# Download matio(https://sourceforge.net/projects/matio/)
$ tar zxf matio-X.Y.Z.tar.gz
$ cd matio-X.Y.Z
$ ./configure
$ make
$ make check
$ make install #installation
$ export LD_LIBRARY_PATH=/path/to/matio-X.Y.Z/src/.libs:$LD_LIBRARY_PATH # the directory containing libmatio.so.2
# In Caffe's Makefile.config, add matio's src path to INCLUDE_DIRS and src/.libs to LIBRARY_DIRS, for example:
# INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /path/to/matio-1.5.2/src
# LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /path/to/matio-1.5.2/src/.libs
# Reference: http://blog.csdn.net/houqiqi/article/details/46469981
# cython_bbox and cython_nms problem
$ cd fast_rcnn_root/lib
$ python setup.py install
# After setup.py finishes installing,
$ cd python_root/Lib/site-packages/utils
# you will find two files, cython_bbox.so and cython_nms.so; copy them to fast_rcnn_root/lib/utils and you are done.
# Reference: http://blog.csdn.net/happynear/article/details/46822109
Modify common.cuh as follows; pay attention to the final #endif:
#ifndef CAFFE_COMMON_CUH_
#define CAFFE_COMMON_CUH_

#include <cuda.h>

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
// CUDA: atomicAdd is not defined for doubles
static __inline__ __device__ double atomicAdd(double *address, double val) {
  unsigned long long int* address_as_ull = (unsigned long long int*)address;
  unsigned long long int old = *address_as_ull, assumed;
  if (val == 0.0)
    return __longlong_as_double(old);
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);
}
#endif

#endif  // CAFFE_COMMON_CUH_
At this point, make should basically pass.
problem:
$ nvcc -V
>> The program 'nvcc' is currently not installed. You can install it by typing:
sudo apt-get install nvidia-cuda-toolkit
But CUDA is already installed, and a cuda-XX folder is visible under /usr/local. In this case the following settings are required:
$ sudo gedit ~/.bashrc
For a 64-bit system, paste the following at the end:
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
$ source ~/.bashrc
This makes the change take effect immediately.
problem:
Sometimes, for convenience, you may use C++ features newer than C++98 when writing Caffe code. If you do not modify Caffe's default compilation options, this produces errors such as "xxx is not a member of 'std'", or complaints that some functions do not match their declarations.
solution:
Add the -std=c++11 flag to the g++ compile command in the Makefile.
In Caffe's Makefile, find
CXXFLAGS += -pthread -fPIC $(COMMON_FLAGS) $(WARNINGS)
and append -std=c++11 to it.
From: "Compilation problem using C++11 features in caffe"
problem:
../lib/libcaffe.so.1.0.0: undefined reference to `caffe::BlockingQueue<boost::shared_ptr<caffe::DataReader::QueuePair> >::size() const'
../lib/libcaffe.so.1.0.0: undefined reference to `caffe::BlockingQueue<boost::shared_ptr<caffe::DataReader::QueuePair> >::push(boost::shared_ptr<caffe::DataReader::QueuePair> const&)'
../lib/libcaffe.so.1.0.0: undefined reference to `caffe::BlockingQueue<caffe::Datum*>::push(caffe::Datum* const&)'
../lib/libcaffe.so.1.0.0: undefined reference to `caffe::BlockingQueue<caffe::Datum*>::BlockingQueue()'
../lib/libcaffe.so.1.0.0: undefined reference to `caffe::BlockingQueue<boost::shared_ptr<caffe::DataReader::QueuePair> >::pop(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
../lib/libcaffe.so.1.0.0: undefined reference to `caffe::BlockingQueue<caffe::Datum*>::pop(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
../lib/libcaffe.so.1.0.0: undefined reference to `caffe::BlockingQueue<caffe::Datum*>::try_pop(caffe::Datum**)'
../lib/libcaffe.so.1.0.0: undefined reference to `caffe::BlockingQueue<boost::shared_ptr<caffe::DataReader::QueuePair> >::BlockingQueue()'
solution:
// In blocking_queue.cpp, add explicit template instantiations for these data types:
template class BlockingQueue<Datum*>;
template class BlockingQueue<shared_ptr<DataReader::QueuePair> >;
It may be a problem with the caffe library path:
import sys
sys.path.insert(0, caffe_root + 'python')
import caffe
Uncomment to support layers written in Python (will link against Python libs)
solution:
Edit: Makefile.config
WITH_PYTHON_LAYER :=1
The reason for the error is that the gfortran compiler is not installed:
$ sudo apt-get install gfortran