Implementation of Python headless crawler to download files

Some pages generate their content by running JavaScript in the browser, so fetching them with requests alone does not return the data you need. This article covers those pages, for example when a download can only be triggered through a JavaScript call.

Install chrome

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
yum install ./google-chrome-stable_current_x86_64.rpm
yum install mesa-libOSMesa-devel gnu-free-sans-fonts wqy-zenhei-fonts

Install chromedriver

Taobao mirror (recommended)

wget http://npm.taobao.org/mirrors/chromedriver/2.41/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
mv chromedriver /usr/bin/
chmod +x /usr/bin/chromedriver


In the steps above, you can download whichever versions suit you. Note that the Chrome and chromedriver versions must be compatible; each chromedriver release states which Chrome versions it supports.
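A quick way to catch a mismatch is to compare the major version numbers that the two binaries report. The sketch below is only an illustration: the helper names are hypothetical, and it assumes a newer chromedriver release that shares Chrome's version numbering. The 2.x series used in this article (such as 2.41) has its own numbering, so for those releases the supported Chrome range has to be read from the release notes instead.

```python
import re
import subprocess


def major_version(version_output):
    """Extract the leading major version number from `--version` output."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def installed_major(command):
    """Run `<command> --version` and return the major version it reports."""
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=True
    )
    return major_version(result.stdout)


def versions_match(chrome_output, driver_output):
    """True when both version strings share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```

On a machine with both binaries on the PATH, `installed_major("google-chrome") == installed_major("chromedriver")` performs the check directly; the binary names here are assumptions and may differ on your system.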

Putting it into practice

Import the required libraries

from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException

Chrome startup settings

chrome_options = Options()
chrome_options.add_argument('--no-sandbox')  # Fixes the "DevToolsActivePort file doesn't exist" error
chrome_options.add_argument('--window-size=1920,3000')  # Browser window size (width,height)
chrome_options.add_argument('--disable-gpu')  # Google's documentation recommends this flag to avoid bugs
chrome_options.add_argument('--hide-scrollbars')  # Hide scrollbars, which helps with some special pages
chrome_options.add_argument('--blink-settings=imagesEnabled=false')  # Skip loading images to speed things up
chrome_options.add_argument('--headless')  # Run without a visible window; on a Linux system with no display, Chrome fails to start without this flag


Set additional preferences, such as suppressing the download pop-up and setting the default download directory

prefs = {'profile.default_content_settings.popups': 0, 'download.default_directory': './filelist'}
chrome_options.add_experimental_option('prefs', prefs)
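A relative path such as './filelist' is not always honored by Chrome, so it can be safer to resolve it to an absolute path and create the directory up front. The following is a minimal sketch under that assumption; `build_download_prefs` is a hypothetical helper, and the `download.prompt_for_download` entry is an addition that is not part of the original settings.

```python
import os


def build_download_prefs(directory):
    """Return Chrome prefs that save downloads into `directory` without prompting."""
    download_dir = os.path.abspath(directory)
    os.makedirs(download_dir, exist_ok=True)  # make sure the target exists
    return {
        "profile.default_content_settings.popups": 0,
        "download.default_directory": download_dir,
        # Addition (not in the original prefs): never show a "Save as" dialog
        "download.prompt_for_download": False,
    }
```

The returned dictionary can be passed to `chrome_options.add_experimental_option('prefs', ...)` in place of the literal one.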

Initialize the driver (in these snippets, cls is the class whose class methods share the driver)

cls.driver = webdriver.Chrome(options=chrome_options)
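Some headless Chrome versions refuse to save downloads even when the preferences are set. One workaround that is sometimes used is to send the DevTools protocol command `Page.setDownloadBehavior` once the driver exists. The sketch below assumes a Selenium release whose Chrome driver offers `execute_cdp_cmd`, which may not be the case with the older versions installed in this article; `allow_headless_downloads` is a hypothetical helper name.

```python
import os


def allow_headless_downloads(driver, directory):
    """Ask Chrome, via the DevTools protocol, to save downloads into `directory`."""
    params = {
        "behavior": "allow",
        "downloadPath": os.path.abspath(directory),
    }
    # Assumption: `driver` is a Selenium Chrome driver exposing execute_cdp_cmd
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

It would be called right after the driver is created, for example `allow_headless_downloads(cls.driver, './filelist')`.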

Exit the driver

cls.driver.quit()
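If the crawl raises an exception before quit() is reached, the Chrome process can be left running in the background. One way to guard against that is a small context manager; this is a sketch rather than part of the original code, and `managed_driver` and `create_driver` are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Create a driver with `create_driver()` and always quit it afterwards."""
    driver = create_driver()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the crawl raised an exception
```

With the options built earlier, usage would look like `with managed_driver(lambda: webdriver.Chrome(options=chrome_options)) as driver:` followed by the crawl logic inside the block.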

Request a URL

cls.driver.get(url)

Execute the specified JavaScript code

cls.driver.execute_script('console.log("helloworld")')
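When the script triggers a download, execute_script returns before the file is complete. Chrome writes in-progress downloads with a `.crdownload` suffix and renames them when done, so one approach is to poll the download directory. The helper below is a sketch with hypothetical names; it assumes the directory is empty before the download starts, otherwise older files would be reported immediately.

```python
import os
import time


def wait_for_download(directory, timeout=60.0, poll_interval=0.5):
    """Wait until `directory` holds finished files and no partial downloads.

    Returns the sorted list of finished file names, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = sorted(os.listdir(directory)) if os.path.isdir(directory) else []
        partial = [n for n in names if n.endswith(".crdownload")]
        finished = [n for n in names if not n.endswith(".crdownload")]
        if finished and not partial:
            return finished
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no finished download in {directory!r} after {timeout}s"
            )
        time.sleep(poll_interval)
```

Calling `wait_for_download('./filelist')` right after the script that starts the download keeps the driver alive until the file has been written.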

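Find the specified element (the find_element_by_* helpers were removed in Selenium 4.3; on those versions, use find_element(By.CLASS_NAME, "fubiaoti") with By imported from selenium.webdriver.common.by)

subtitle = cls.driver.find_element_by_class_name("fubiaoti").text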
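A single-element lookup raises NoSuchElementException when nothing matches, which is easy to hit on pages that build their content dynamically. The plural find_elements returns an empty list instead, so a tolerant lookup can be sketched as follows. `first_text` is a hypothetical helper, and the locator is passed as a plain string so the snippet needs no imports.

```python
def first_text(driver, class_name, default=None):
    """Return the text of the first element with `class_name`, or `default`."""
    # "class name" is the string value of Selenium's By.CLASS_NAME locator
    elements = driver.find_elements("class name", class_name)
    if not elements:
        return default
    return elements[0].text
```

For the example above this would be `subtitle = first_text(cls.driver, "fubiaoti", default="")`.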
