Python crawler: Gerapy crawler management

13. Gerapy

Learning targets
  1. Understand what Gerapy is
  2. Master the installation of Gerapy
  3. Master Gerapy configuration and startup
  4. Master managing scrapy projects through Gerapy

1. Gerapy introduction

Gerapy is a distributed crawler management framework. It supports Python 3 and is built on Scrapy, Scrapyd, Scrapyd-Client, Scrapy-Redis, Scrapyd-API, Scrapy-Splash, Jinja2, Django, and Vue.js. Gerapy can help us:

  1. Control crawler runs more conveniently
  2. View crawler status more intuitively
  3. View crawl results closer to real time
  4. Deploy projects more simply
  5. Manage hosts in a more unified way

2. Gerapy installation

1. Execute the following command and wait for the installation to complete:

pip3 install gerapy

2. Verify that gerapy installed successfully

Run gerapy in a terminal; output like the following should appear:

Usage:
  gerapy init [--folder=<folder>]
  gerapy migrate
  gerapy createsuperuser
  gerapy runserver [<host:port>]

3. Gerapy configuration and startup

1. Create a new project:

gerapy init

After executing this command, a gerapy folder is generated in the current directory; enter it and you will find a folder named projects.

2. Initialize the database by running the following command inside the gerapy directory:

gerapy migrate

After initialization, a SQLite database is generated; it stores host configuration information, deployment versions, and so on.

3. Start the gerapy service:

gerapy runserver

This starts the Gerapy service on port 8000 of the machine running it. Enter http://localhost:8000 in a browser to open the Gerapy management interface, where hosts and projects can be managed. By default the service listens locally; to make it reachable from other machines, pass a host and port explicitly, e.g. gerapy runserver 0.0.0.0:8000.

4. Manage scrapy projects through Gerapy

1. Configure the host: add a Scrapyd host by entering its IP, port, and a name, then click Create to complete the addition. Click Back to see the list of currently added Scrapyd services; after creation succeeds, the new service appears in the list.
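Before adding a host, it can be useful to confirm that the Scrapyd service is actually reachable; Scrapyd exposes a daemonstatus.json endpoint for this. A minimal sketch, assuming Scrapyd's default port 6800 on the local machine (adjust host and port to your setup):

```python
# Sketch: check that a Scrapyd host is reachable before adding it in Gerapy.
# The address below assumes scrapyd's default port 6800 on the local machine.
import json
from urllib.request import urlopen
from urllib.error import URLError


def scrapyd_is_up(host="127.0.0.1", port=6800, timeout=3):
    """Return True if scrapyd's daemonstatus.json endpoint answers with status 'ok'."""
    try:
        with urlopen(f"http://{host}:{port}/daemonstatus.json", timeout=timeout) as resp:
            return json.load(resp).get("status") == "ok"
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return False


print(scrapyd_is_up())  # False unless a scrapyd instance is running locally
```

If this returns False, check that scrapyd is running on the target machine and that the port is not blocked before configuring the host in Gerapy.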

2. To run a crawler, click Schedule and then Run. (This assumes the crawler has already been deployed to the Scrapyd service configured above.)

  1. Configure projects: place the scrapy project directly under /gerapy/projects.

  2. The project then appears in the Gerapy backend.

  3. Click Deploy to package and deploy it. In the lower right corner, enter a description of the package (similar to a Git commit message), then click the Package button; Gerapy will report that packaging succeeded, and the packaged result and package name are shown on the left.

  4. Select a host and click Deploy on the right to deploy the project to it.

  5. After successful deployment, the description and deployment time are displayed.

  6. Go to the Clients page, find the node the project was deployed to, and click Schedule.

  7. In the node's project list, find the project and click Run on the right to run it.

Supplement:

1. How is Gerapy related to Scrapyd?

Scrapyd alone is enough to run scrapy crawls; a crawler can be started from the command line:

curl http://127.0.0.1:6800/schedule.json -d project=<project name> -d spider=<spider name>

Gerapy is a convenience layer over this command-line workflow: once Scrapyd is configured in Gerapy, the command line is no longer needed, and crawlers can be started directly from the graphical interface.
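The curl call above can also be issued from Python. A minimal sketch, assuming scrapyd's default address 127.0.0.1:6800; "myproject" and "myspider" are placeholder names, substitute your own:

```python
# Sketch: call scrapyd's schedule.json from Python, equivalent to the curl
# command above. "myproject" and "myspider" are placeholder names.
from urllib.parse import urlencode
from urllib.request import urlopen

SCRAPYD = "http://127.0.0.1:6800"  # scrapyd's default address and port


def schedule(project, spider):
    """POST project/spider to schedule.json and return scrapyd's raw reply."""
    data = urlencode({"project": project, "spider": spider}).encode()
    with urlopen(f"{SCRAPYD}/schedule.json", data=data) as resp:
        return resp.read()


# The form body below is exactly what curl sends with its two -d options:
body = urlencode({"project": "myproject", "spider": "myspider"})
print(body)  # project=myproject&spider=myspider
```

Calling schedule("myproject", "myspider") is then equivalent to the curl command, provided scrapyd is running and the project has been deployed to it.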

Summary
  1. Understand what Gerapy is
  2. Master the installation of Gerapy
  3. Master Gerapy configuration and startup
  4. Master managing scrapy projects through Gerapy
