Python data analysis-data selection

With the rapid development of the Internet, more and more data is stored online. By analyzing this data, companies can extract information that supports their decision-making.

For example, by analyzing users' Taobao browsing records, a company can discover what these customers are likely to buy, group them into categories, and place targeted advertisements to increase sales.

Another example is lending. By analyzing an applicant's credit data and building a model that estimates the probability that the applicant will become overdue, a lender can decide whether to grant the loan and put its funds to better use.

Now that data analysis is more and more popular, knowing how to analyze data can count heavily in your favor when it comes to promotions and raises.

This article is the second lesson in the data analysis series and shows you how to select data in Python with pandas.

Contents of this article

  1. Select a column in the data frame

  2. Select multiple columns in the data frame

  3. Select a row in the data frame

  4. Select multiple rows in the data frame

  5. Select a sub data frame

  6. Select data by condition

Note: This article uses the data frame date_frame created in the first lesson of this series, [Python data analysis-data creation].
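The lesson-one data is not reproduced here, so the following is a minimal sketch of a hypothetical stand-in for date_frame that lets you run the examples. The values, the `id` column, and the placement of `age` are assumptions; only the column names and the positions of name (subscript 1), gender (subscript 2) and height (subscript 4) are inferred from the code in this article.

```python
import pandas as pd

# Hypothetical stand-in for date_frame from lesson one.
# The values, the 'id' column, and the position of 'age' are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],  # height in metres
})

print(date_frame)
```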

**1 Select a column in the data frame**

There are four ways to select a column of the data frame.

The first method: data frame name.column name (attribute access).

The second method: data frame name['column name'].

The third method: data frame name.iloc[:, column subscript]. Subscripts start at 0, so a column's subscript is its position minus one; for example, the third column has subscript 2.

The fourth method: data frame name.loc[:, ['column name']].

If I want to select the name column (the second column) of date_frame, I can run any of the following statements in Jupyter:

```python
date_frame.name              # method one
date_frame['name']           # method two
date_frame.iloc[:, 1]        # method three
date_frame.loc[:, ['name']]  # method four
```

Note: The first three methods return the name column as a Series, while the fourth method returns a DataFrame with a single name column, because the column label is passed inside a list.
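To check the return types yourself, here is a minimal sketch that runs all four methods on a small hypothetical date_frame (made-up values, same column layout as in this article) and inspects what comes back.

```python
import pandas as pd

# Hypothetical sample data; the values and the 'id' column are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],
})

col1 = date_frame.name              # method one: attribute access
col2 = date_frame['name']           # method two: a single label in brackets
col3 = date_frame.iloc[:, 1]        # method three: subscript 1 is the second column
col4 = date_frame.loc[:, ['name']]  # method four: a list of labels

print(type(col1).__name__, type(col2).__name__, type(col3).__name__)  # Series Series Series
print(type(col4).__name__, col4.shape)  # DataFrame (4, 1)
print(col1.equals(col2) and col2.equals(col3))  # True
```

Attribute access (method one) only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method; a column named `count` or `first name`, for example, has to be selected with brackets.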

**2 Select multiple columns in the data frame**

If you need to select multiple columns in the data frame, you can use the following three methods:

The first method: data frame name[['column name 1', 'column name 2', ..., 'column name n']].

The second method: data frame name.loc[:, ['column name 1', 'column name 2', ..., 'column name n']].

The third method: data frame name.iloc[:, [column subscript 1, column subscript 2, ...]], or, for a contiguous range of columns, data frame name.iloc[:, start column subscript:end column subscript plus one].

If I want to select the name and height columns of date_frame, I can run the following statements in Jupyter:

```python
date_frame[['height', 'name']]          # method one
date_frame.loc[:, ['height', 'name']]   # method two
date_frame.iloc[:, [1, 4]]              # method three
```

The first two methods return the columns in the order they are listed (height first, then name) rather than in their original order in the data frame, so you can choose any column order you like.

The third method returns the columns at subscripts 1 and 4, that is, name followed by height.
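The column order point is easy to verify. This minimal sketch uses a hypothetical date_frame (made-up values) and shows that both label lists and position lists return columns in the order you write them.

```python
import pandas as pd

# Hypothetical sample data; the values and the 'id' column are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],
})

by_label = date_frame[['height', 'name']]             # order follows the list
by_label_loc = date_frame.loc[:, ['height', 'name']]  # same result with .loc
by_position = date_frame.iloc[:, [1, 4]]              # name (1), then height (4)
reordered = date_frame.iloc[:, [4, 1]]                # .iloc also follows list order

print(list(by_label.columns))     # ['height', 'name']
print(list(by_position.columns))  # ['name', 'height']
print(list(reordered.columns))    # ['height', 'name']
```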

Note: In Python, subscripts start at 0, so a column's subscript is its position minus one. A slice such as 1:3 in date_frame.iloc[:, 1:3] includes the start but not the end: it selects subscripts 1 and 2, but not subscript 3.
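A minimal sketch of the "start included, end excluded" rule, on a hypothetical date_frame with made-up values. For contrast it also shows that a `.loc` slice, which works with labels, includes its end point.

```python
import pandas as pd

# Hypothetical sample data; the values and the 'id' column are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],
})

sliced = date_frame.iloc[:, 1:3]                # subscripts 1 and 2; 3 is excluded
label_sliced = date_frame.loc[:, 'name':'age']  # label slices include the end

print(list(sliced.columns))        # ['name', 'gender']
print(list(label_sliced.columns))  # ['name', 'gender', 'age']
```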

**3 Select a row in the data frame**

If you need to select a row in the data frame, you can use the following three methods:

The first method: data frame name[row subscript:row subscript plus one].

The second method: data frame name.loc[row label, :]. With the default integer index, the row label equals the row subscript.

The third method: data frame name.iloc[row subscript, :].

If I want to select the first row of date_frame (row subscript 0), I can enter the following code in Python:

```python
date_frame[0:1]        # method one
date_frame.loc[0, :]   # method two
date_frame.iloc[0, :]  # method three
```

The first method returns a one-row DataFrame, while the last two methods return a Series whose index is the column names.
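The difference between `.loc` (label) and `.iloc` (position) only shows once the index is no longer 0, 1, 2, .... This minimal sketch uses a hypothetical date_frame (made-up values) and sorts it by age so that labels and positions stop matching.

```python
import pandas as pd

# Hypothetical sample data; the values and the 'id' column are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],
})

one_row_frame = date_frame[0:1]        # DataFrame with one row
first_by_label = date_frame.loc[0, :]  # Series: the row whose index label is 0
first_by_pos = date_frame.iloc[0, :]   # Series: the row at position 0

print(type(one_row_frame).__name__, one_row_frame.shape)  # DataFrame (1, 5)
print(first_by_pos['name'])  # Alice

# After sorting, row labels and positions no longer line up.
shuffled = date_frame.sort_values('age', ascending=False)
print(shuffled.index.tolist())   # [3, 1, 2, 0]
print(shuffled.iloc[0]['name'])  # David (first position)
print(shuffled.loc[0]['name'])   # Alice (label 0)
```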

**4 Select multiple rows in the data frame**

If you need to select a few rows in the data frame, you can use the following three methods:

The first method: data frame name[start row subscript:end row subscript plus one].

The second method: data frame name.iloc[start row subscript:end row subscript plus one, :].

The third method: data frame name.loc[start row label:end row label, :].

If I want to select the first and second rows of date_frame (row subscripts 0 and 1), I can enter the following code in Python:

```python
date_frame[0:2]          # method one
date_frame.iloc[0:2, :]  # method two
date_frame.loc[0:1, :]   # method three
```

All three methods return the same two-row DataFrame.

Note that the third method slices by label, and a loc slice includes both the start and the end, which is why it uses 0:1 rather than 0:2. You can run the code on your computer to see this for yourself.
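A minimal sketch of the inclusive `.loc` slice next to the exclusive positional slices, using a hypothetical date_frame with made-up values.

```python
import pandas as pd

# Hypothetical sample data; the values and the 'id' column are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],
})

a = date_frame[0:2]          # positions 0 and 1 (end excluded)
b = date_frame.iloc[0:2, :]  # same rows
c = date_frame.loc[0:1, :]   # labels 0 and 1 (end included)
print(a.equals(b) and b.equals(c))  # True

print(len(date_frame.iloc[0:2, :]))  # 2
print(len(date_frame.loc[0:2, :]))   # 3, because label 2 is included
```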

**5 Select a sub data frame**

So far we have selected rows and columns separately. What if we want the sub data frame with row subscripts 1 and 2 and column subscripts 1 and 2 (the green part in the figure)?

You can chain a row selection and a column selection, in either order.

For example, use the following code:

```python
date_frame[1:3][['name', 'gender']]
```

The result is a DataFrame holding the name and gender values of the rows with subscripts 1 and 2.
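This minimal sketch, on a hypothetical date_frame with made-up values, checks that the chaining order does not matter and that a single `.iloc` or `.loc` call selects the same block in one step.

```python
import pandas as pd

# Hypothetical sample data; the values and the 'id' column are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],
})

rows_then_cols = date_frame[1:3][['name', 'gender']]
cols_then_rows = date_frame[['name', 'gender']][1:3]
one_step_iloc = date_frame.iloc[1:3, 1:3]               # positions, end excluded
one_step_loc = date_frame.loc[1:2, ['name', 'gender']]  # labels, end included

print(rows_then_cols.equals(cols_then_rows))  # True
print(rows_then_cols.equals(one_step_iloc))   # True
print(rows_then_cols.equals(one_step_loc))    # True
print(rows_then_cols['name'].tolist())        # ['Bob', 'Cathy']
```

Chaining is fine for reading data. If you want to assign values to a sub data frame, a single `.loc` call is the safer choice, since chained assignment may modify a temporary copy instead of date_frame (pandas warns about this with `SettingWithCopyWarning`).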

**6 Select data by condition**

Suppose we want to select the rows of date_frame whose gender is male. We can enter the following code in Jupyter:

```python
date_frame[date_frame.gender == 'male']
```

This returns only the rows whose gender column equals 'male'.
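Under the hood, the comparison builds a boolean Series (a "mask") with one True/False per row, and indexing with it keeps the True rows. A minimal sketch on a hypothetical date_frame with made-up values:

```python
import pandas as pd

# Hypothetical sample data; the values and the 'id' column are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],
})

mask = date_frame.gender == 'male'  # boolean Series, one value per row
print(mask.tolist())                # [False, True, False, True]

males = date_frame[mask]            # same as date_frame[date_frame.gender == 'male']
print(males['name'].tolist())       # ['Bob', 'David']

# A mask can be combined with a column selection in one .loc call.
print(date_frame.loc[mask, ['name', 'age']])
```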

Suppose we want to select the students in date_frame who are older than 17 and taller than 1.7 (that is, 170 cm, since height is stored in metres). We can enter the following code in Jupyter:

```python
date_frame[(date_frame.height > 1.7) & (date_frame.age > 17)]
```

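This returns only the rows that satisfy both conditions. The & operator means "and", and each condition must be wrapped in its own parentheses.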
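The parentheses are needed because in Python `&` binds more tightly than `>`. This minimal sketch, on a hypothetical date_frame with made-up values, names each mask first, contrasts `&` ("and") with `|` ("or"), and shows `DataFrame.query` as one alternative way to write the same filter.

```python
import pandas as pd

# Hypothetical sample data; the values and the 'id' column are assumptions.
date_frame = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Cathy', 'David'],
    'gender': ['female', 'male', 'female', 'male'],
    'age': [16, 18, 17, 19],
    'height': [1.62, 1.78, 1.65, 1.69],  # metres
})

taller = date_frame.height > 1.7  # boolean mask
older = date_frame.age > 17       # boolean mask

both = date_frame[taller & older]    # & : both conditions hold
either = date_frame[taller | older]  # | : at least one condition holds
print(both['name'].tolist())    # ['Bob']
print(either['name'].tolist())  # ['Bob', 'David']

# DataFrame.query expresses the same filter as a string.
via_query = date_frame.query('height > 1.7 and age > 17')
print(via_query.equals(both))   # True
```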

That covers the basic operations for selecting data in Python. Practice them, and think about whether there are better ways to select data.
