Web crawler
Basic knowledge: how websites work, HTML, Python, multiprocessing/multithreading/coroutines, etc. (required)
HTML basics and HTTP request modules: requests (required), urllib (good to know)
Common anti-crawling measures and their countermeasures: typical measures include IP rate limiting, User-Agent, Referer, and Origin header checks, cookie restrictions, dynamically loaded content, and CAPTCHAs;
the corresponding countermeasures include IP proxy pools, forged request headers, and cookie storage and handling (basic to advanced)
Web page parsing and extraction: BeautifulSoup or XPath (choose one), regular expressions (required)
Dynamic JS execution, JS encryption, and Selenium; OCR recognition or a CAPTCHA-solving service (optional)
Data storage (file reading and writing, databases, Excel/CSV modules, etc.) (required)
Network packet capture and analysis (optional)
Crawler frameworks: Scrapy (optional), pyspider (optional)
Distributed crawler (optional)
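Several of the crawler items above (forged request headers, regular-expression extraction, CSV storage) can be tied together in a small sketch. It uses only the standard library (urllib in place of requests) and runs offline. The URL, the header values, and the sample HTML are assumptions made up for illustration, and the regular expression only fits this simplified markup.

```python
import csv
import io
import re
import urllib.request

# Hypothetical target and header values, used only for illustration.
URL = "https://example.com/articles"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Referer": "https://example.com/",
}

# Stand-in for a downloaded page so the sketch runs without network access.
SAMPLE_HTML = """
<ul>
  <li><a href="/post/1">First post</a></li>
  <li><a href="/post/2">Second post</a></li>
</ul>
"""

# Matches simple anchors of the form <a href="...">text</a>.
LINK_RE = re.compile(r'<a href="([^"]+)">([^<]+)</a>')


def build_request(url, headers):
    """Build a request object carrying custom headers; nothing is sent."""
    return urllib.request.Request(url, headers=headers)


def extract_links(html):
    """Return (href, title) pairs found by the regular expression."""
    return LINK_RE.findall(html)


def to_csv(rows):
    """Serialize (href, title) rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["href", "title"])
    writer.writerows(rows)
    return buffer.getvalue()


request = build_request(URL, HEADERS)
links = extract_links(SAMPLE_HTML)
csv_text = to_csv(links)

# urllib stores header names capitalized, hence "User-agent".
print(request.get_header("User-agent"))
print(csv_text)
```

To fetch a real page, the request object could be passed to urllib.request.urlopen; for anything beyond simple markup, a parser such as BeautifulSoup or XPath is usually more robust than a regular expression.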
Data analysis and processing
Basic knowledge: Python (functions, modules, object-oriented programming), regular expressions, JSON (required)
Related crawler skills from the section above:
·Basic knowledge: how websites work, HTML, Python, multiprocessing/multithreading/coroutines, etc. (required)
·HTML basics and HTTP request modules: requests (required), urllib (good to know)
·Common anti-crawling measures and their countermeasures: typical measures include IP rate limiting, User-Agent, Referer, and Origin header checks, cookie restrictions, dynamically loaded content, and CAPTCHAs;
the corresponding countermeasures include IP proxy pools, forged request headers, and cookie storage and handling (basic to advanced)
·Web page parsing and extraction: BeautifulSoup or XPath (choose one), regular expressions (required)
·Dynamic JS execution, JS encryption, and Selenium; OCR recognition or a CAPTCHA-solving service (optional)
·Data storage (file reading and writing, databases, Excel/CSV modules, etc.) (required)
Data analysis libraries: pandas, NumPy, SciPy, jieba word segmentation, etc. (required)
Chart drawing and visualization: Matplotlib, word clouds (required)
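As a rough illustration of the analysis steps listed above (JSON, regular expressions, word-frequency counting of the kind that feeds a word cloud), here is a minimal sketch. It deliberately swaps in the standard library for pandas and jieba so that it runs anywhere; the records and field names are invented for the example, and the simple regular-expression tokenizer only suits English text.

```python
import json
import re
import statistics
from collections import Counter

# Hypothetical crawled records; real input would come from stored crawl output.
RAW = """
[
  {"title": "Python crawler basics", "views": 120},
  {"title": "Python data analysis with charts", "views": 80},
  {"title": "Crawler data storage", "views": 100}
]
"""


def load_records(text):
    """Parse a JSON array of records into a list of dicts."""
    return json.loads(text)


def word_counts(records):
    """Count lowercase words across all titles (stand-in for word segmentation)."""
    counts = Counter()
    for record in records:
        counts.update(re.findall(r"[a-z]+", record["title"].lower()))
    return counts


records = load_records(RAW)
counts = word_counts(records)
mean_views = statistics.mean(record["views"] for record in records)

print(counts.most_common(3))
print(mean_views)
```

With pandas available, the same data could be loaded into a DataFrame and the frequencies plotted with Matplotlib or passed to a word-cloud library.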
Big data (data mining, machine learning)
Basic knowledge: Python (basic + advanced) (required)
Finance, statistics, econometrics, investment (required)
Data storage (file reading and writing, databases, Excel/CSV modules, etc.) (required)
Data analysis libraries: pandas, NumPy, SciPy, jieba word segmentation (required)
Chart drawing and visualization: Matplotlib, etc. (required)
Machine learning models and algorithms: Naive Bayes, decision trees, logistic regression, linear regression, the KNN algorithm, SVM,
boosting, clustering, recommendation systems, pLSA, LDA, GBDT, regularization, anomaly detection, the EM algorithm, Apriori,
FP-Growth, etc. (required)
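To give a feel for one of the listed algorithms, the following is a minimal pure-Python sketch of KNN classification: a query point takes the majority label among its k nearest training points by Euclidean distance. The function name knn_predict and the toy data are made up for illustration; in practice a library implementation would normally be used instead.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-cluster data, invented for this example.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["low", "low", "low", "high", "high", "high"]

prediction_a = knn_predict(points, labels, (1.1, 1.0))
prediction_b = knn_predict(points, labels, (5.1, 5.0))
print(prediction_a, prediction_b)
```

The sketch sorts all distances, which is fine for small data; larger datasets usually call for a spatial index or a vectorized implementation.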