Python image recognition OCR

#1 Requirements##

#2 Environment##

macOS / Linux
Python 3.7.6

#3 Installation##

#3.1 macOS

  1. Install tesseract
# Install tesseract only, without the training tools
brew install tesseract

# Install tesseract together with the training tools
brew install --with-training-tools tesseract

# Install tesseract together with all language packs. The language packs are large and slow to download, so skip this unless you need them
brew install --all-languages tesseract

# Install tesseract together with the training tools and all language packs
brew install --all-languages --with-training-tools tesseract

# Note: --with-training-tools and --all-languages are options from older Homebrew releases. If your Homebrew rejects them, use the plain install command above
  2. Download the language pack

Address: https://github.com/tesseract-ocr/tessdata

I installed the Simplified Chinese language pack (chi_sim):

Chinese language pack: https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata

Then copy the downloaded language pack into the tessdata directory of your tesseract installation. The version part of the path depends on the build you installed; for 4.0.0_1 it is:

/usr/local/Cellar/tesseract/4.0.0_1/share/tessdata

  3. List the installed language packs
tesseract --list-langs
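
The same check can be scripted from Python, which is handy before running OCR code that depends on a particular language pack. This is only a minimal sketch: the function names parse_list_langs and installed_languages are made up for illustration, and it assumes that the tesseract executable is on the PATH and that the command prints a header line ending in a colon followed by one language code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends with a colon.
        if not line or line.endswith(":"):
            continue
        langs.append(line)
    return langs


def installed_languages(tesseract_cmd="tesseract"):
    """Run tesseract and return the language codes it reports."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: depending on the version, the list may arrive on stdout
    # or stderr, so both streams are parsed.
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

With this in place, `"chi_sim" in installed_languages()` tells you whether the Chinese language pack was picked up.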

#3.2 Linux (CentOS)

  1. Install the dependencies
yum install autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel
  2. Install leptonica

Download: wget http://www.leptonica.org/source/leptonica-1.74.4.tar.gz

Unpack and install

tar -xzvf leptonica-1.74.4.tar.gz
cd leptonica-1.74.4
./configure --prefix=/usr/local/leptonica
make
sudo make install
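  3. Install tesseract-ocr
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
unzip 3.04.zip
cd tesseract-3.04
# The GitHub source archive has no ready-made configure script, so generate it first
./autogen.sh
./configure
make && sudo make install
sudo ldconfig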

As on macOS, I installed the Simplified Chinese language pack (chi_sim).

Chinese language pack: https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata

Then copy the downloaded Chinese language pack to the following path:

/usr/local/share/tessdata
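
If you prefer to script the copy step, something along these lines should work. It is a sketch under a few assumptions: install_language_pack is a hypothetical helper, not part of tesseract or pytesseract, and the HTML check is only a heuristic for the case where the GitHub page was saved instead of the language pack itself.

```python
import shutil
from pathlib import Path


def install_language_pack(traineddata_file, tessdata_dir):
    """Copy a downloaded .traineddata file into a tessdata directory.

    Returns the path of the installed copy.
    """
    src = Path(traineddata_file)
    dest_dir = Path(tessdata_dir)

    if src.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {src}")
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {dest_dir}")

    # Heuristic (assumption): a saved web page starts with HTML markup,
    # whereas a real language pack is binary data.
    with src.open("rb") as fh:
        head = fh.read(64).lstrip().lower()
    if head.startswith((b"<!doctype", b"<html")):
        raise ValueError(f"{src} looks like an HTML page, not a language pack")

    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return dest
```

For the paths used in this article the call would be `install_language_pack("chi_sim.traineddata", "/usr/local/share/tessdata")`. Writing to that directory normally requires root privileges.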

#4 Usage##

#4.1 Install the pytesseract library###

pip install pytesseract
pip install Pillow
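
To confirm that both packages ended up in the interpreter you are actually running, a quick check like the one below can help. It is a small sketch using only the standard library; missing_modules and REQUIRED are names chosen here for illustration.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Pillow is installed under the name "Pillow" but imported as "PIL".
REQUIRED = ["pytesseract", "PIL"]
```

Running `print(missing_modules(REQUIRED))` prints an empty list when both packages are installed.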

#4.2 Python code###

from PIL import Image
import pytesseract

# Open the image and run OCR, specifying the image path and the recognition language
image = Image.open('/Users/Documents/1.png')
data = pytesseract.image_to_string(image, lang='chi_sim')
print(data)
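
OCR output for Chinese text commonly comes back with stray spaces between characters and with empty lines, so a small clean-up step is often useful before using the result. The helper below is one possible sketch, not something pytesseract provides: tidy_chinese_text is a hypothetical name, and the character range used covers only the common CJK ideographs (U+4E00 to U+9FFF).

```python
import re

# Matches whitespace that sits between two CJK ideographs (U+4E00..U+9FFF).
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])")


def tidy_chinese_text(text):
    """Remove whitespace between Chinese characters and drop blank lines."""
    lines = []
    for line in text.splitlines():
        # Close gaps between ideographs, then trim the ends of the line.
        line = _CJK_GAP.sub("", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)
```

It can be applied directly to the recognition result, for example `print(tidy_chinese_text(data))`. Spaces next to Latin words are left alone, so mixed Chinese and English text keeps its word boundaries.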

#5 Online demo##

Address:

http://admin.minhung.me:20420/#/
