Gerapy is a distributed crawler management framework, supports Python 3, based on Scrapy, Scrapyd, Scrapyd-Client, Scrapy-Redis, Scrapyd-API, Scrapy-Splash, Jinjia2, Django, Vue.js development , Gerapy can help us:
1. Execute the following command and wait for the installation to complete
pip3 install gerapy
2. Verify that gerapy is installed successfully
Execute gerapy in the terminal and the following message will appear
“”"
Usage:
gerapy init [–folder=]
gerapy migrate
gerapy createsuperuser
gerapy runserver [host:port]
“”"
1. Create a new project
gerapy init
After executing this command, a gerapy folder will be generated in the current directory, enter the folder, and you will find a folder named projects
2. To initialize the database (operate in the gerapy directory), execute the following command
gerapy migrate
After the database is initialized, a SQLite database will be generated, which stores the host configuration information and deployment version, etc.
3. Start gerapy service
gerapy runserver
At this time, the Gerapy service is enabled on port 8000 of the machine where the gerapy service is started. Enter http://localhost:8000 in the browser to enter the Gerapy management interface. Host management and interface management can be performed on the management interface.
You need to add the IP, port, and name. Click Create to complete the addition. Click Back to see the list of currently added Scrapyd services. After the creation is successful, we can view the added services in the list
2. To execute the crawler, click Schedule. Then run. (The premise is that the crawler has been released in the scrapyd we configured.)
1. Is Gerapy related to scrapyd??
We only use scrapyd to call scrapy for crawling. Just use the command line to start the crawler
curl http://127.0.0.1:6800/schedule.json -d project=project name-d spider=crawler name
Using Greapy is to turn the use of the command line to start the crawler into a "little hand". After we have configured scrapyd in gerapy, there is no need to use the command line, and the crawler can be started directly through the graphical interface.
Recommended Posts