A Roundup of Commonly Used Third-Party Python Libraries
Python has more than 120,000 third-party libraries, covering almost every area of information technology. This article briefly introduces commonly used libraries for web crawling, automation, data analysis and visualization, web development, machine learning, and other tasks. If a library interests you, try it out.
1. Web Crawling
- requests: a high-level HTTP client that wraps the HTTP protocol in a simple API and supports a rich set of request features.
- PySpider: a powerful web crawler system with a full-featured web UI, written by a Chinese developer.
- bs4: the Beautiful Soup 4 library (installed as beautifulsoup4, imported as bs4), for parsing and processing HTML and XML.
- Scrapy: a powerful crawler framework for crawling websites and extracting structured data from their pages. It can be used for purposes ranging from data mining to monitoring and automated testing.
- Crawley: crawls website content at high speed, supports relational and non-relational databases, and can export data as JSON, XML, and other formats.
- Portia: crawls web content through a visual interface.
- cola: a distributed crawler framework.
- newspaper: extracts news and articles and performs content analysis.
- lxml: a Python parsing library for HTML and XML that supports XPath queries.
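As a rough illustration of the parsing step that libraries such as bs4 handle, the sketch below pulls link text and URLs out of an HTML string. The HTML snippet, the example.com URLs, and the helper name `extract_links` are made up for this example; it assumes `beautifulsoup4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page that a crawler has downloaded.
HTML = """
<html><body>
  <h1>Python libraries</h1>
  <ul>
    <li><a href="https://example.com/requests">requests</a></li>
    <li><a href="https://example.com/scrapy">Scrapy</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


links = extract_links(HTML)
for text, href in links:
    print(text, href)
```

In a real crawler the HTML would typically come from a call such as `requests.get(url).text` rather than from a hard-coded string.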
2. Automation
- XlsxWriter: writes text, numbers, formulas, charts, and more to Excel worksheets.
- win32com: a comprehensive library for Windows system operations, including reading and writing Office (Word, Excel, etc.) files.
- pymysql: operates MySQL databases.
- pymongo: reads and writes data in MongoDB.
- smtplib: a module for sending email. It is part of the Python standard library, so no installation is needed.
- selenium: a browser automation driver. Through this library, Python code can directly control a browser to complete operations such as entering a verification code; it is often used to automate browser tasks.
- pdfminer: extracts information from PDF documents. Unlike other PDF tools, it focuses entirely on obtaining and analyzing PDF text data.
- PyPDF2: splits, merges, and transforms PDF pages.
- openpyxl: reads and writes Microsoft Excel files in the xlsx, xlsm, xltx, and xltm formats. The legacy xls format is not supported.
- python-docx: processes Microsoft Word documents. It supports reading, querying, and modifying docx files (the legacy doc format is not supported) and can apply common Word styles programmatically.
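To give a flavor of the email automation mentioned for smtplib, here is a minimal sketch that builds a message with the standard library and defines, but does not call, a sending helper. The addresses, the helper names `build_message` and `send_message`, and the choice of port 587 with STARTTLS are assumptions for illustration; a real SMTP server and credentials would be needed to send anything.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Create a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, host, port=587, user=None, password=None):
    """Send msg over SMTP with STARTTLS. Not called here: it needs a real server."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user and password:
            server.login(user, password)
        server.send_message(msg)


# Hypothetical addresses used only to show the message structure.
report = build_message(
    "bot@example.com", "team@example.com", "Daily report", "All jobs finished."
)
print(report["Subject"])
```

Keeping message construction separate from sending makes the automation easy to test without network access.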
3. Data Analysis and Visualization
- matplotlib: a Python 2D plotting library that produces publication-quality figures in a variety of hard-copy formats and in interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells (in the style of MATLAB or Mathematica), web application servers, and various graphical user interface toolkits.
- numpy: NumPy is the fundamental package for scientific computing with Python. It stores and processes large arrays and matrices and supports matrix operations, vector processing, N-dimensional data transformation, and more.
- pyecharts: a library for generating ECharts charts.
- pandas: a powerful toolset for analyzing structured data. Built on NumPy, it provides standard data models and a large number of convenient functions and methods for processing data.
- SciPy: an open-source library for mathematics, science, and engineering, often used as an alternative to MATLAB. It builds on NumPy and adds numerous routines commonly needed in mathematical, scientific, and engineering calculations.
- Plotly: the graphing library from Plotly supports interactive web-based charts and publication-quality graphics, including line charts, scatter plots, area charts, bar charts, error bars, box plots, histograms, heat maps, subplots, multiple axes, polar charts, bubble charts, rose charts, funnel charts, and many more.
- wordcloud: a word cloud generator.
- jieba: a Chinese word segmentation module.
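The matrix operations and structured-data handling described for NumPy and pandas above can be sketched as follows. The sample numbers and the column names `region` and `amount` are invented for this example, and both libraries are assumed to be installed.

```python
import numpy as np
import pandas as pd

# NumPy: multiply a 2x2 matrix by a vector.
matrix = np.array([[1, 2], [3, 4]])
vector = np.array([10, 20])
product = matrix @ vector  # 1*10 + 2*20 = 50 and 3*10 + 4*20 = 110

# pandas: group made-up sales records by region and sum the amounts.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "amount": [100, 150, 200, 50],
    }
)
totals = sales.groupby("region")["amount"].sum()

print(product)
print(totals)
```

The resulting `totals` Series could be passed on to matplotlib, for example with `totals.plot(kind="bar")`, to chart the result.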
4. Web Development
- Django: an open-source web application framework written in Python, and one of the most popular web frameworks in the Python ecosystem. Django follows a Model-Template-View design, known as the MTV pattern.
- Pyramid: a general-purpose, open-source Python web application framework. Its main goal is to make it easier for Python developers to create web applications. Compared with Django, Pyramid is relatively small, fast, and flexible.
- Tornado: an open-source web framework and web server. Tornado differs markedly from most mainstream web frameworks (including most Python frameworks): it is a non-blocking server, and it is quite fast.
- Flask: a lightweight web application framework. Compared with Django and Pyramid, it is often called a microframework. Developing web applications with Flask is very convenient; a few lines of code are enough to build a small website. The Flask core is deliberately simple and does not include layers such as database access; these are provided through extensions.
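The claim that a few lines of Flask code can build a small website can be illustrated with the sketch below. The route paths and response strings are made up for this example, Flask is assumed to be installed, and the built-in test client stands in for a running server so the snippet finishes without blocking.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Plain-text response for the site root.
    return "Hello from Flask"


@app.route("/hello/<name>")
def hello(name):
    # The <name> segment of the URL is passed in as an argument.
    return f"Hello, {name}!"


# Exercise the routes with the test client instead of starting a server.
client = app.test_client()
response = client.get("/hello/world")
print(response.status_code, response.get_data(as_text=True))
```

To serve real requests during development, the same `app` object can be started with `app.run()` or the `flask run` command.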
5. Machine Learning
- NLTK: a natural language processing library widely used in the NLP field. It can build bag-of-words models (word counts) and supports word frequency analysis (word occurrences), pattern recognition, association analysis, sentiment analysis (word frequency analysis plus metrics), visualization (with matplotlib for charts), and more.
- TensorFlow: Google's second-generation machine learning system, an open-source software library that uses data flow graphs for numerical computation.
- Keras: a high-level neural network API written in Python, capable of running on top of TensorFlow, CNTK, or Theano. It aims to enable fast experimentation, turning ideas into results with minimal delay, which is key to doing good research.
- Caffe: a deep learning framework used mainly for computer vision. It performs very well on image classification and recognition tasks.
- Theano: a deep learning library. It is tightly integrated with NumPy and supports GPU computation, unit testing, and self-verification. It is designed for the large-scale neural network computations used in deep learning and is good at processing multi-dimensional arrays.
- scikit-learn: a simple and efficient tool for data mining and data analysis, built on NumPy, SciPy, and matplotlib. Its basic functionality falls into six areas: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing. scikit-learn is also known as sklearn.
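Three of the scikit-learn areas listed above (data preprocessing, classification, and model selection) can be combined in a short sketch. The choice of the bundled iris dataset, logistic regression, and a 25% test split are assumptions made for this example, not recommendations from the list; scikit-learn is assumed to be installed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Model selection: hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and classification chained into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and classifier in one pipeline ensures the scaling statistics are learned only from the training split.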
6. Other Commonly Used Libraries
- IPython: a Python-based interactive shell that is much easier to use than the default Python shell. It supports variable auto-completion, auto-indentation, interactive help, magic commands, system commands, and more, and has many useful functions and features built in.
- PTVS: Python Tools for Visual Studio.
- pydub: supports sound files in multiple formats and can perform signal processing, signal generation, sound effects, silence handling, and more.
- TimeSide: a Python framework for audio analysis, imaging, transcoding, streaming, and labelling.
- dnspython: a DNS toolkit.
- pygame: a set of modules designed for writing video games.
- PyQt5: Python bindings for the Qt5 application framework, used to build application interfaces in Python.
- PIL (Pillow): an important third-party Python library for image processing. It supports image storage, display, and processing, handles almost all image formats, and can scale, crop, and overlay images as well as add lines, images, and text to an image.
- OpenCV: a library for image and video processing.
- py2exe: converts Python scripts into executable programs that run independently on Windows.
- WeRoBot is a WeChat Official Account Development Framework, also known as the WeChat Robot Framework. WeRoBot can parse the messages sent by the WeChat server and convert the messages into Message or Event types.
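Returning to the image operations listed for Pillow above (scaling, cropping, and adding lines and text), a minimal sketch might look like this. The canvas size, colors, and coordinates are arbitrary values chosen for illustration, and Pillow is assumed to be installed.

```python
from PIL import Image, ImageDraw

# Create a blank white canvas in memory instead of loading a file.
canvas = Image.new("RGB", (200, 100), "white")

# Add a diagonal line and some text using the default font.
draw = ImageDraw.Draw(canvas)
draw.line((0, 0, 199, 99), fill="red", width=3)
draw.text((10, 40), "Pillow", fill="black")

# Scale the whole image down and crop its top-left corner.
thumbnail = canvas.resize((100, 50))
corner = canvas.crop((0, 0, 50, 50))

print(thumbnail.size, corner.size)
```

An existing picture could be processed the same way by replacing `Image.new(...)` with `Image.open(path)` and writing the result with `save`.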