Python data analysis

Python data analysis

If you want to do good things, you must first sharpen your tools.” Python is by far the most commonly used programming language for data analysis. We can stand on the shoulders of giants and complete data analysis efficiently.

Let's first understand the development history of Python. Python language was born in the 1980s. It was developed by the Dutchman Guido van Rossum. We call Guido van Rossum the father of Python. It is worth mentioning the origin of the name Python. Python means python, but Guido's name has nothing to do with python. When Guido was implementing Python, he also read the script of Monty Python's Flying Circus, which is from a BBC comedy from the 1970s. Guido thought he needed a short, unique and slightly mysterious name, so he decided to call the language Python.

Python 1.0 version was released in January 1994. The main new features of this version are lambda, map, filter and reduce, but Guido doesn't like this version.

Six and a half years later, in October 2000, Python 2.0 was released. The main new features of this version are memory management and cycle detection garbage collector and support for Unicode. However, the most important change is the change in the development process, and Python now has a more transparent community.

2008 In December of the year, Python3.0 was released. Python3.x is not backward compatible with Python2.x, which means that Python3.x may not be able to run Python2.x code. Python3 represents the future of the Python language.

Today's Python has entered the 3,0 era, and the Python community is also booming. When you ask a Python problem, almost always someone has encountered the same problem and has already solved it.

Features of Python language:

Python is a completely object-oriented language. Functions, modules, numbers, and strings are all objects. In Python, everything is an object. Supports overloaded operators and also supports generic design.

Python has a powerful standard library. The core of the Python language only contains common types and functions such as numbers, strings, lists, dictionaries, and files. The Python standard library provides system management, network communication, text processing, database interfaces, and graphics. System, XML processing and other additional functions.

The Python community provides a large number of third-party modules, which are used in a similar way to the standard library. Their functions cover scientific computing, artificial intelligence, machine learning, Web development, database interfaces, and graphics systems.

Because Python has powerful functions, it is easy to use and easy to get started. We often hear people say "Life is short, I use Python". Research organization Tiobe released an analysis report for October 2020 this week, and the Python language ranked third for two consecutive years. In the latest data for November 2020, Python surpassed Java to become the second place with an unstoppable trend.

It is particularly important to choose a suitable programming language. The Python language is simple, easy to learn, fast, free and open source. It focuses on how to solve problems, a free and open community environment, and a wealth of third-party libraries. There is no need to waste time making wheels: various Web frameworks, crawler frameworks, data analysis frameworks, machine learning frameworks have everything, ready to use. From the perspective of Python’s popularity, it has been on the rise

We are now going to use Python to do data analysis, mainly from two aspects to consider the problem:

First: what development tool to choose.

Second: What knowledge to learn to solve the problem of data analysis.

I recommend Anaconda as a development tool. The specific software can be downloaded from the open source mirror website of Tsinghua University (https://mirror.tuna.tsinghua.edu.cn/help/anaconda/) and download the corresponding version of the installation package according to the software and hardware environment of the computer you are using. After the installation is complete, enter jupyter notebook in the console.

There is a detailed anaconda installation process in this official account. The article link is as follows:

Anaconda installation process
Brother Dabin, public account: Data Valley [Python Anaconda Installation] (http://mp.weixin.qq.com/s?__biz=MzUxMTAzOTkzMQ==&mid=2247484820&idx=2&sn=66db50025112ee7401a3fd8545ea2ca9&chksm=f9788540ce0f0c562938d717e4442f3890f9714273295723ed334a95ece8b9ade33fc7ea5d76#rd)

Knowledge points in Python and common scientific computing libraries used in data analysis also need to be listed for everyone:

Basic syntax: variables, data types, conditions, loops.

Data structure: collection, tuple, dictionary.

Input and output

Module

class

Scientific computing libraries: NumPy, Pandas, Matplotlib, Seaborn.

Python does data analysis mainly to solve the problems of data cleaning and data visualization. It is very important to master the basic grammatical rules of Python and call third-party modules to improve data analysis capabilities. NumPy and Pandas are the best tools for data cleaning, and Matplotlib and Seaborn are toolkits for data visualization. We can learn Python from a practical perspective to improve the ability and efficiency of data analysis.

Recommended Posts

Python data analysis
Python3 crawler data cleaning analysis
Python Data Science: Related Analysis
02. Python data types
Python data model
python data structure
Python data format-CSV
Automatically generate data analysis report with Python
Java or Python for big data analysis
Python data analysis-data update
Python linear interpolation analysis
Python data analysis-apply function
Python data analysis-data selection
Python basic data types
Python data analysis-data establishment
Python analysis of wav files
Python Data Science: Neural Networks
Python common data structure collation
Python parses simple XML data
Python Data Science: Logistic Regression
Analysis of Python Sandbox Escape
Python data structure and algorithm
Analysis of Python object-oriented programming
Python Data Science: Regularization Methods
Python Data Science: Linear Regression
Python Faker data forgery module
Python Data Science: Chi-Square Test
Analysis of usage examples of Python yield
Python Data Science: Linear Regression Diagnosis
Python uses Matlab command process analysis
FM algorithm analysis and Python implementation
Python file operation basic process analysis
Analysis of Python conditional control statements
Python realizes online microblog data visualization
Is python suitable for data mining
​Full analysis of Python module knowledge
Python tcp transmission code example analysis
Python requests module cookie instance analysis
Python multithreading
Python CookBook
Python FAQ
Python3 dictionary
Python process and thread summary case analysis
Python crawler advanced essential | a live data analysis platform encryption parameter sign analysis
Python3 module
python (you-get)
Python string
Python access to npy format data examples
Python descriptor
Python basics 2
Python notes
Detailed analysis of Python garbage collection mechanism
Python3 tuple
CentOS + Python3.6+
Python advanced (1)
Python decorator
Python IO
Python multithreading
Python toolchain
Python3 list
Python multitasking-coroutine