While learning Python, I came across many web crawlers written in it, which looked like fun, so I started writing my own. Inspired by a few tutorials, I wanted to try it on my school's educational administration system. Unfortunately, logging in requires a verification code (CAPTCHA), so I searched around until I found a solution I could understand.
Much of what is online is duplicated, often copied and pasted verbatim with poor typesetting, and the installation process is somewhat cumbersome. I am organizing and recording the steps here for novices like myself.
Dependencies: libpng, libjpeg, libtiff, zlib1g-dev
Check whether they are installed:
ldconfig -p | grep libpng
ldconfig -p | grep libjpeg
ldconfig -p | grep libtiff
ldconfig -p | grep libz
Install any that are missing:
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev
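The same check can also be done from Python. This is only a minimal sketch, assuming that the standard ctypes.util.find_library helper is an acceptable stand-in for "ldconfig -p | grep"; the function name check_libraries is my own and not part of any library.

```python
from ctypes.util import find_library


def check_libraries(names):
    """Map each short library name (e.g. "png" for libpng) to the
    shared-library name the system reports, or None if it is not found."""
    return {name: find_library(name) for name in names}
```

For example, check_libraries(["png", "jpeg", "tiff", "z"]) returns a dictionary in which any value of None points to a library that probably still needs to be installed.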
Build tools: gcc, g++, automake
sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install automake
Go to the PIL homepage and download the PIL release that matches your Python version: http://www.pythonware.com/products/pil/
I use Python 2.7, for which the download address is: http://effbot.org/downloads/Imaging-1.1.7.tar.gz
Unpack the archive:
sudo tar -zxvf Imaging-1.1.7.tar.gz
Enter the unpacked folder:
cd Imaging-1.1.7
Install:
sudo python setup.py install
sudo apt-get install libjpeg-dev
sudo apt-get install libfreetype6-dev
sudo easy_install PIL
If libjpeg-dev is not installed, a "decoder jpeg not available" error will appear.
If libfreetype6-dev is not installed, the error "The _imagingft C module is not installed" will appear.
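Both problems can be detected right after installation instead of waiting for the error. The sketch below is a rough check under two assumptions of mine: that a PIL build with JPEG support exposes a jpeg_decoder attribute on its core C module, and that FreeType support shows up as an importable PIL._imagingft module. The function name pil_support is hypothetical.

```python
def pil_support():
    """Report whether PIL is importable and whether JPEG and FreeType
    support appear to have been compiled in."""
    report = {"PIL": False, "jpeg": False, "freetype": False}
    try:
        from PIL import Image
    except ImportError:
        return report  # PIL itself is missing
    report["PIL"] = True
    try:
        # Assumption: JPEG-enabled builds expose jpeg_decoder on the core module
        report["jpeg"] = hasattr(Image.core, "jpeg_decoder")
    except ImportError:
        pass  # the core C module could not be loaded at all
    try:
        from PIL import _imagingft  # noqa: F401 (imported only as a probe)
        report["freetype"] = True
    except ImportError:
        pass
    return report
```

If "jpeg" or "freetype" comes back False, install the corresponding -dev package and then reinstall PIL so that it is rebuilt against the library.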
Download the latest version of Tesseract from http://code.google.com/p/tesseract-ocr/downloads/list (I downloaded version 3.02).
Unpack the archive:
sudo tar -zxvf tesseract-ocr-3.02.02.tar.gz
Enter the unpacked folder:
cd tesseract-ocr
Install:
sudo ./configure --prefix=/opt/tesseract
sudo make
sudo make install
After the installation is complete, add Tesseract to the PATH by editing .profile or .bash_profile in your home directory. Here I edit .bash_profile and append the following to the PATH:
:/opt/tesseract/bin
For example:
export PATH=$PATH:/opt/tesseract/bin
Make the configuration take effect:
source ~/.bash_profile
Tips:
1. Use --prefix to specify the installation directory; mine is /opt/tesseract.
2. If you see "./configure: line 1937: config.log: Permission denied", run the command with sudo.
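To confirm that the PATH change took effect, you can search the PATH for the tesseract executable. This is a small sketch written to run on both Python 2.7 and Python 3; the helper name find_on_path is hypothetical (on Python 3, shutil.which does a similar job).

```python
import os


def find_on_path(program, path=None):
    """Return the full path of the first executable called `program`
    found in the PATH-style string `path`, or None if there is none."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

With the installation prefix used above, find_on_path("tesseract") should return /opt/tesseract/bin/tesseract once the new PATH is active; a result of None suggests that the export line was not picked up.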
Download pytesser from http://code.google.com/p/pytesser/downloads/list
There is currently only one version.
sudo unzip pytesser_v0.0.1.zip
It is better to create a folder first and unzip the archive inside it, because unzip extracts the contents directly into the current directory, which is hard to manage.
Alternatively, use -d to extract the zip file into a specified folder, for example:
sudo unzip pytesser_v0.0.1.zip -d /opt/py
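The same "extract into a dedicated folder" step can be done with Python's standard zipfile module. This is only a sketch; the function name extract_to_folder is hypothetical, and writing to a directory such as /opt/py would still require sufficient permissions.

```python
import os
import zipfile


def extract_to_folder(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir (created if needed)
    and return the list of member names."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

The returned list shows whether the archive unpacked its files directly into the folder or into a subfolder of its own, which matters later when the folder is added to sys.path.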
The directory contains two test images, "phototest.tif" and "fnord.tif". Here I use "fnord.tif" and write a Python script in the same directory:
test.py:
# The star import also brings PIL's Image into scope
from pytesser import *
# Open the sample image and run OCR on it
im = Image.open('fnord.tif')
text = image_to_string(im)
print text
At present, pytesser only works inside the unzipped folder. Outside that folder, even code like the following fails:
import sys
sys.path.append("/opt/pythonk/pytesser")
from pytesser import *
im = Image.open('fnord.tif')
text = image_to_string(im)
print text
It still reports the following error:
Traceback (most recent call last):
  File "/home/wind/KuaiPan/text.py", line 11, in <module>
    from pytesser import *
ImportError: No module named pytesser
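One possible cause, which I cannot confirm from the traceback alone, is that the directory appended to sys.path does not actually contain pytesser.py. Note that the unzip example above extracts into /opt/py while the script appends /opt/pythonk/pytesser. The sketch below refuses to add a directory unless the module file is really there; the helper name add_module_dir is hypothetical.

```python
import os
import sys


def add_module_dir(directory, module_file="pytesser.py"):
    """Append `directory` to sys.path only if it contains `module_file`;
    raise IOError otherwise so that a wrong path is noticed immediately."""
    directory = os.path.abspath(directory)
    if not os.path.isfile(os.path.join(directory, module_file)):
        raise IOError("%s not found in %s" % (module_file, directory))
    if directory not in sys.path:
        sys.path.append(directory)
    return directory
```

Calling add_module_dir with the folder where the archive was really unpacked, before the "from pytesser import *" line, either makes the import work or reports at once that the path is wrong.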
pytesser reportedly calls tesseract, so tesseract must be installed, and tesseract in turn requires leptonica; otherwise "configure: error: leptonica not found" appears when building tesseract.
Download and install leptonica
http://www.leptonica.org/download.html
or
http://code.google.com/p/leptonica/downloads/list
The latest version is leptonica-1.69.tar.bz2.
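To illustrate why the tesseract executable has to be installed, here is a rough sketch of how a wrapper can call it. It is not pytesser's actual code: it only assumes the command-line form "tesseract <image> <output base>", which writes the recognized text to <output base>.txt. The function name ocr_with_tesseract and its run parameter are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def ocr_with_tesseract(image_path, run=subprocess.check_call):
    """Run `tesseract <image> <output base>` and return the recognized text.
    `run` is injectable so the function can be tried without tesseract."""
    workdir = tempfile.mkdtemp()
    try:
        output_base = os.path.join(workdir, "result")
        # tesseract itself appends ".txt" to the output base name
        run(["tesseract", image_path, output_base])
        with open(output_base + ".txt") as handle:
            return handle.read()
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

If tesseract is not on the PATH, the subprocess call fails, which is the kind of failure a thin wrapper would hit when tesseract is missing.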
In Linux development it often happens that a certain library cannot be found. After adding the library, how do you check whether its path is recognized? Here is a command:
ldconfig -p | grep lts
Description: ldconfig -p prints the names of all libraries stored in the current cache; piping the output to grep lts checks whether the liblts.so shared library has been added to the cache.
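The same lookup can be scripted. The sketch below parses the output of ldconfig -p, assuming each entry has the form "name (details) => path"; unlike grep, it matches only against the library name, not the whole line. The function names parse_ldconfig_cache and find_in_cache are hypothetical.

```python
import subprocess


def parse_ldconfig_cache(output):
    """Turn the text printed by `ldconfig -p` into a {name: path} dict."""
    libraries = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # skip the header line
        left, path = line.split("=>", 1)
        name = left.split()[0]
        # keep the first entry when a name is listed for several ABIs
        libraries.setdefault(name, path.strip())
    return libraries


def find_in_cache(fragment, output=None):
    """Return the cached libraries whose name contains `fragment`.
    If `output` is None, `ldconfig -p` is run to obtain it."""
    if output is None:
        output = subprocess.check_output(["ldconfig", "-p"]).decode()
    cache = parse_ldconfig_cache(output)
    return {name: path for name, path in cache.items() if fragment in name}
```

An empty result from find_in_cache("lts") means the library is not in the cache yet, so ldconfig still has to be run after installing it.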
The following is the explanation and usage of the ldconfig command: http://blog.163.com/cn_prince/blog/static/638790120078289157270/
ldconfig is the dynamic link library management command. For a dynamic link library to be shared system-wide, ldconfig has to be run. It searches the default directories (/lib and /usr/lib) and the directories listed in the dynamic library configuration file /etc/ld.so.conf for shareable dynamic link libraries (named in the form lib*.so*), and then creates the symbolic links and the cache file required by the dynamic loader (ld.so). The cache file defaults to /etc/ld.so.cache and holds a sorted list of dynamic link library names.
ldconfig usually runs at system startup; after installing a new dynamic link library, you need to run it manually.
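To see which extra directories ldconfig is configured to scan, the configuration file can be read directly. This is a simplified sketch that assumes the common syntax only: one directory per line, "#" comments, and "include <glob pattern>" lines. Rarer syntax is ignored, and the function name library_dirs is hypothetical. The default directories /lib and /usr/lib are searched in addition to whatever this returns.

```python
import glob
import os


def library_dirs(conf_path="/etc/ld.so.conf", _seen=None):
    """List the directories named in an ld.so.conf-style file,
    following `include` lines. Missing files yield an empty list."""
    seen = set() if _seen is None else _seen
    conf_path = os.path.abspath(conf_path)
    if conf_path in seen or not os.path.isfile(conf_path):
        return []
    seen.add(conf_path)  # guard against include loops
    dirs = []
    with open(conf_path) as handle:
        for raw in handle:
            line = raw.split("#", 1)[0].strip()  # drop comments
            if not line:
                continue
            parts = line.split(None, 1)
            if parts[0] == "include" and len(parts) == 2:
                for match in sorted(glob.glob(parts[1])):
                    dirs.extend(library_dirs(match, seen))
            else:
                dirs.append(line)
    return dirs
```

If a library was installed under a prefix such as /opt/tesseract and its lib directory is not in this list, ldconfig will not find it until that directory is added to the configuration and ldconfig is run again.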
The ldconfig command line usage is as follows:
ldconfig [-v|--verbose] [-n] [-N] [-X] [-f CONF] [-C CACHE] [-r ROOT] [-l] [-p|--print-cache] [-c FORMAT] [--format=FORMAT] [-V] [-?|--help|--usage] path...
The options available in ldconfig are described as follows:
(1) -v or --verbose: ldconfig displays the directories being scanned, the dynamic link libraries found, and the names of the symbolic links it creates.
(2) -n: ldconfig scans only the directories given on the command line, not the default directories (/lib, /usr/lib) or those listed in the configuration file /etc/ld.so.conf.
(3) -N: ldconfig does not rebuild the cache file (/etc/ld.so.cache). Unless -X is also given, it still updates the symbolic links as usual.
(4) -X: ldconfig does not update the symbolic links. Unless -N is also given, the cache file is still updated as usual.
(5) -f CONF: use CONF as the dynamic link library configuration file; the system default is /etc/ld.so.conf.
(6) -C CACHE: use CACHE as the generated cache file; the system default is /etc/ld.so.cache, which stores a sorted list of shareable dynamic link libraries.
(7) -r ROOT: change the application's root directory to ROOT (implemented by calling the chroot function). With this option, the default configuration file /etc/ld.so.conf actually refers to ROOT/etc/ld.so.conf. For example, with -r /usr/zzz, opening /etc/ld.so.conf actually opens /usr/zzz/etc/ld.so.conf. This option greatly increases the flexibility of dynamic link library management.
(8) -l: normally ldconfig creates the symbolic links automatically while searching for dynamic link libraries. This option selects expert mode, in which the links must be set manually. Ordinary users do not need it.
(9) -p or --print-cache: print the names of all shared libraries stored in the current cache file.
(10) -c FORMAT or --format=FORMAT: specify the format of the cache file. There are three: old (old format), new (new format), and compat (compatible format, the default).
(11) -V: print the version information of ldconfig and exit.
(12) -? or --help or --usage: these three options have the same effect; they print the help information and exit.