Install PyTesser on Ubuntu 14.04 for OCR

Preface#

While playing with Python, I came across many crawlers written in it. They looked fun, so I started writing my own. Inspired by a few tutorials, I wanted to try crawling my school's educational administration system. Unfortunately, its login requires a verification code (CAPTCHA), so I searched around until I found a solution I could understand.

Many of the articles online repeat one another, and many are straight copy-and-paste jobs. Their typesetting is often hard to read, and the installation process is somewhat cumbersome. I am organizing and recording the steps here for novices like myself.

1. Essential libraries#

Check whether the following libraries are installed on the system:

libpng, libjpeg, libtiff, zlib1g-dev

Commands:

ldconfig -p | grep libpng
ldconfig -p | grep libjpeg
ldconfig -p | grep libtiff
ldconfig -p | grep libz
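The same check can be done from Python. This is only a minimal sketch: it uses ctypes.util.find_library from the standard library, and the helper name check_libraries and the short names in REQUIRED are my own choices for illustration. Note that it only finds the runtime shared libraries, so it cannot tell whether the -dev packages with the headers are present.

```python
from ctypes.util import find_library

# Linker-style short names (no "lib" prefix) for the libraries listed above.
# "z" is zlib; this list is an assumption made for this sketch.
REQUIRED = ["png", "jpeg", "tiff", "z"]


def check_libraries(names):
    """Map each short library name to the shared object found, or None."""
    return dict((name, find_library(name)) for name in names)


for name, found in sorted(check_libraries(REQUIRED).items()):
    print("%-5s %s" % (name, found or "NOT FOUND"))
```

Any entry reported as NOT FOUND is a candidate for the apt-get step.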

If any are missing, install them:

sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev

Some tutorials also require the following build tools:

gcc, g++, automake

sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install automake

2. Install PIL#

Method One##

Go to the PIL homepage and download the PIL release that matches your Python version: http://www.pythonware.com/products/pil/

My Python is version 2.7, so the download address is: http://effbot.org/downloads/Imaging-1.1.7.tar.gz

Unpack the archive:

sudo tar -zxvf Imaging-1.1.7.tar.gz

Enter the unpacked folder (cd is a shell builtin, so it is run without sudo):

cd Imaging-1.1.7

Install:

sudo python setup.py install

Method Two##

sudo apt-get install libjpeg-dev
sudo apt-get install libfreetype6-dev
sudo easy_install PIL

If libjpeg-dev is not installed, a "decoder jpeg not available" error will appear when opening JPEG images.

If libfreetype6-dev is not installed, the error "The _imagingft C module is not installed" will appear when using fonts.
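To see whether the installed PIL was actually built with these two pieces, a quick probe like the one below may help. It is a sketch, not an official PIL check: the function name pil_capabilities is made up here, and it assumes the package is importable as PIL (it returns None if it is not).

```python
def pil_capabilities():
    """Report JPEG and FreeType support of the installed PIL, or None if absent."""
    try:
        from PIL import Image
    except ImportError:
        return None

    # The C core exposes a jpeg_decoder attribute only when built against libjpeg.
    caps = {"jpeg": "jpeg_decoder" in dir(Image.core)}

    # The _imagingft extension exists only when built against FreeType.
    try:
        from PIL import _imagingft  # noqa: F401 (imported just to test presence)
        caps["freetype"] = True
    except ImportError:
        caps["freetype"] = False
    return caps


print(pil_capabilities())
```

If either value is False, install the matching -dev package and then reinstall PIL so that it is rebuilt against it.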

3. Install Tesseract#

Download the latest version of Tesseract from http://code.google.com/p/tesseract-ocr/downloads/list (I downloaded version 3.02).

Unpack the archive:

sudo tar -zxvf tesseract-ocr-3.02.02.tar.gz

Enter the unpacked folder (cd is a shell builtin, so it is run without sudo):

cd tesseract-ocr

Install:

sudo ./configure --prefix=/opt/tesseract
sudo make
sudo make install

After the installation is complete, add Tesseract to PATH by editing .profile or .bash_profile in your home directory. Here I edit .bash_profile. Append the following to PATH:

:/opt/tesseract/bin

For example:

export PATH=$PATH:/opt/tesseract/bin

Reload the configuration file so the change takes effect:

source ~/.bash_profile
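To confirm that the PATH change took effect, a small Python sketch like this can look for the tesseract executable. The helper find_on_path is my own illustration; it just walks the PATH entries and works on both Python 2 and 3.

```python
import os


def find_on_path(program, path=None):
    """Return the full path of an executable named `program` on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # ignore empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_on_path("tesseract") or "tesseract is not on PATH")
```

With the prefix used above, the expected location would be /opt/tesseract/bin/tesseract.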

Tips:

1. --prefix specifies the installation directory; mine is /opt/tesseract.

2. If you see "./configure: line 1937: config.log: Permission denied", run the command with sudo.

4. Install pytesser#

Download pytesser

http://code.google.com/p/pytesser/downloads/list

There is currently only one version.

Unzip the compressed package###

sudo unzip pytesser_v0.0.1.zip

Tips:

  1. It is best to create a folder, put the archive in it, and decompress it there. Running unzip directly extracts the contents into the current directory, which is hard to manage.

  2. Alternatively, use -d to decompress the zip file into a specified folder, for example:

sudo unzip pytesser_v0.0.1.zip -d /opt/py
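The same "extract into a dedicated folder" step can also be done from Python with the standard zipfile module. This is a minimal sketch; extract_into is a name I made up, and the paths in the example call are just the ones from the command above.

```python
import os
import zipfile


def extract_into(archive_path, target_dir):
    """Extract every member of a zip archive into target_dir; return member names."""
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Example call (needs write permission on the target directory):
# extract_into("pytesser_v0.0.1.zip", "/opt/py")
```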

Test###

The directory contains two test images, "phototest.tif" and "fnord.tif". Here I use "fnord.tif" and write a Python script, test.py, directly in that directory:

# pytesser makes PIL's Image module and image_to_string available
from pytesser import *

im = Image.open('fnord.tif')   # load the test image
text = image_to_string(im)     # run OCR through tesseract
print text                     # Python 2 print statement
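If the script fails, it can help to check tesseract on its own, without pytesser in between. The sketch below calls the command-line tool in the form `tesseract <image> <outputbase>` and reads back the text file it writes. The function names and the default output name ocr_result are my own, and this is not pytesser's actual code, only the same general idea.

```python
import subprocess


def build_tesseract_command(image_path, output_base):
    """Build the argument list for `tesseract <image> <outputbase>`."""
    return ["tesseract", image_path, output_base]


def ocr_file(image_path, output_base="ocr_result"):
    """Run tesseract on image_path and return the recognized text."""
    subprocess.check_call(build_tesseract_command(image_path, output_base))
    # tesseract appends ".txt" to the output base name.
    with open(output_base + ".txt") as handle:
        return handle.read()


# Example call, assuming tesseract is on PATH and fnord.tif is in the
# current directory:
# print(ocr_file("fnord.tif"))
```

If this works but the pytesser script does not, the problem is more likely in the Python setup than in Tesseract itself.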

Addendum#

1. .py files outside the pytesser folder cannot call it###

At present, pytesser can only be used from inside the unzipped folder. A script outside that folder fails even when written like this:

import sys
sys.path.append("/opt/pythonk/pytesser")
from pytesser import *
im = Image.open('fnord.tif')
text = image_to_string(im)
print text

It still reports the following error:

Traceback (most recent call last):
  File "/home/wind/KuaiPan/text.py", line 11, in <module>
    from pytesser import *
ImportError: No module named pytesser
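I have not confirmed the cause, but one thing worth ruling out is the path itself: sys.path.append accepts any string silently, so a directory that does not exist or does not contain pytesser.py produces exactly this ImportError. The sketch below (add_module_dir is a hypothetical helper, not part of pytesser) verifies the folder before appending it, so a wrong path fails early with a clearer message.

```python
import os
import sys


def add_module_dir(directory, module_file="pytesser.py"):
    """Append directory to sys.path only if it really contains module_file."""
    directory = os.path.abspath(directory)
    if not os.path.isfile(os.path.join(directory, module_file)):
        raise IOError("%s not found in %s" % (module_file, directory))
    if directory not in sys.path:
        sys.path.append(directory)
    return directory


# Example call (replace with the folder where pytesser was actually unzipped):
# add_module_dir("/opt/py")
# from pytesser import *
```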

2. pytesser depends on PIL, so install the PIL module first###

3. leptonica###

pytesser calls tesseract, so tesseract must be installed, and tesseract in turn requires leptonica. Without it, "configure: error: leptonica not found" appears when running ./configure for tesseract.

Download and install leptonica

http://www.leptonica.org/download.html

or

http://code.google.com/p/leptonica/downloads/list

The latest version at the time of writing is leptonica-1.69.tar.bz2

4. Explanation and usage of the ldconfig command###

During Linux development, it often happens that certain libraries cannot be found. After adding these libraries, how can we check whether their paths are recognized? Here is a command:

ldconfig -p | grep lts

Description: ldconfig -p prints the names of all libraries stored in the current cache. Piping that output to grep lts shows whether the liblts.so shared library has been added to the cache.
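For scripting, the output of ldconfig -p can also be parsed in Python. The sketch below works on a captured string so it can be tried anywhere; the SAMPLE text is an illustrative excerpt I wrote in the usual "name (flags) => path" layout, not real output from my machine, and the function names are my own.

```python
# Illustrative excerpt in the layout printed by `ldconfig -p` (not real output).
SAMPLE = """2 libs found in cache `/etc/ld.so.cache'
\tlibz.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libz.so.1
\tlibpng12.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libpng12.so.0
"""


def parse_ldconfig_cache(output):
    """Turn `ldconfig -p` output into a list of (library name, path) pairs."""
    entries = []
    for line in output.splitlines():
        if "=>" not in line:
            continue  # skip the header line
        left, path = line.split("=>", 1)
        name = left.split("(", 1)[0].strip()
        entries.append((name, path.strip()))
    return entries


def find_libraries(output, keyword):
    """Keep only the entries whose library name contains keyword (like grep)."""
    return [entry for entry in parse_ldconfig_cache(output) if keyword in entry[0]]


print(find_libraries(SAMPLE, "libz"))
```

On a real system, the string would come from running ldconfig -p, for example through subprocess.check_output.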

The following explanation and usage of the ldconfig command comes from: http://blog.163.com/cn_prince/blog/static/638790120078289157270/

ldconfig is the management command for dynamic link (shared) libraries. For a shared library to be usable system-wide, ldconfig must be run. It searches the default directories (/lib and /usr/lib) and the directories listed in the configuration file /etc/ld.so.conf for shared libraries (named lib*.so*), and then creates the symbolic links and the cache file needed by the dynamic loader (ld.so). The cache file defaults to /etc/ld.so.cache and stores a sorted list of shared library names.

ldconfig usually runs at system startup. After installing a new shared library, you need to run it manually.

The ldconfig command line usage is as follows:

ldconfig [-v|--verbose] [-n] [-N] [-X] [-f CONF] [-C CACHE] [-r ROOT] [-l] [-p|--print-cache] [-c FORMAT] [--format=FORMAT] [-V] [-?|--help|--usage] path...

The options available in ldconfig are described as follows:

(1) -v or --verbose: ldconfig displays the directories being scanned, the shared libraries it finds, and the names of the links it creates.

(2) -n: ldconfig scans only the directories given on the command line, not the default directories (/lib, /usr/lib) or the directories listed in the configuration file /etc/ld.so.conf.

(3) -N: ldconfig does not rebuild the cache file (/etc/ld.so.cache). Unless -X is also given, it still updates the links as usual.

(4) -X: ldconfig does not update the links. Unless -N is also given, it still updates the cache file as usual.

(5) -f CONF: uses CONF as the shared library configuration file instead of the default /etc/ld.so.conf.

(6) -C CACHE: uses CACHE as the generated cache file instead of the default /etc/ld.so.cache. This file stores the sorted list of shared libraries.

(7) -r ROOT: changes the root directory to ROOT (implemented by calling the chroot function). With this option, the default configuration file /etc/ld.so.conf actually refers to ROOT/etc/ld.so.conf. For example, with -r /usr/zzz, opening /etc/ld.so.conf actually opens /usr/zzz/etc/ld.so.conf. This option greatly increases the flexibility of shared library management.

(8) -l: normally ldconfig creates the library links automatically while searching for shared libraries. This option switches to expert mode, in which the links must be set manually. Ordinary users do not need it.

(9) -p or --print-cache: ldconfig prints the names of all shared libraries stored in the current cache file.

(10) -c FORMAT or --format=FORMAT: specifies the format of the cache file. There are three: old (old format), new (new format), and compat (compatible format, the default).

(11) -V: prints the version information of ldconfig and then exits.

(12) -? or --help or --usage: these three options have the same effect. They make ldconfig print its help information and then exit.
