Use of Pandas in Python development

1. Introduction

2. Create a Pandas Series

Use pd.Series(data, index) to create a Pandas Series, where data is the input data and index holds the label for each element. The optional dtype parameter sets the data type of the values.

python

import pandas as pd  # conventional alias
pd.Series(data=[30, 6, 7, 5], index=['eggs', 'apples', 'milk', 'bread'], dtype=float)

out:
eggs      30.0
apples     6.0
milk       7.0
bread      5.0
dtype: float64

Besides a list, data can also be a dictionary or a single scalar value.

python

pd.Series(data={'name':'michong','age':18})

out:
name    michong
age          18
dtype: object
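The dictionary and scalar cases can be sketched as follows. This is a minimal illustration, and the labels 'x', 'y', 'z' are made up for the example: with a scalar, the index tells pandas how many times to repeat the value, while with a dictionary the keys become the index.

```python
import pandas as pd

# A scalar is repeated once for every label in the index.
scalar_series = pd.Series(data=5, index=['x', 'y', 'z'])

# Dictionary keys become the index labels.
dict_series = pd.Series(data={'name': 'michong', 'age': 18})

print(scalar_series.tolist())    # [5, 5, 5]
print(list(dict_series.index))   # ['name', 'age']
```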

3. Access and delete elements in a Series

3.1 Access

There are two ways to access an element: by position, as with a list, or by label, as with a dictionary key.

python

s = pd.Series(data=8,index=['apple','milk','bread'])

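s.iloc[0]       # by position; s[0] on a label index is deprecated in recent pandas
out:8

s['apple']      # by label
out:8

s.loc['apple']  # explicit label-based access
s.iloc[1]       # explicit position-based access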
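
3.2 Modify

Assign a new value through a label or a position, for example s['apple'] = 10. Methods that return a new Series leave s unchanged, so assign the result back to s to keep the change.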
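A small sketch of both kinds of modification, reusing the s defined above. The new values 10, 12 and 0 are arbitrary, and replace() is only one example of a method that returns a new Series.

```python
import pandas as pd

s = pd.Series(data=8, index=['apple', 'milk', 'bread'])

# Assigning through a label or a position changes s directly.
s['apple'] = 10
s.iloc[1] = 12

# replace() returns a new Series; assign it back to keep the result.
s = s.replace(8, 0)

print(s.tolist())  # [10, 12, 0]
```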

3.3 Delete

python

s.drop(['apple'])
out:
 milk     8
 bread    8
 dtype: int64

The drop() function does not modify the original data. To change the original, either pass inplace=True or assign the result back, as in s = s.drop(label).

python

s.drop(['apple'],inplace=True)
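To make the difference concrete, here is a short sketch using the reassignment style. The variable name dropped is chosen only for this illustration.

```python
import pandas as pd

s = pd.Series(data=8, index=['apple', 'milk', 'bread'])

dropped = s.drop(['apple'])    # returns a new Series
print(len(s), len(dropped))    # 3 2  (s itself is unchanged)

s = s.drop(['apple'])          # reassign to keep the result
print(list(s.index))           # ['milk', 'bread']
```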

4. Using DataFrame

4.1 Create a DataFrame

Use pd.DataFrame(data, index, columns) to create a DataFrame:

data is the input data: an ndarray, a dictionary (whose values can be Series or arrays), or another DataFrame;

index is the list of row labels; if omitted, rows are numbered from 0 by default;

columns is the list of column names; if omitted, columns are numbered from 0 by default.

python

d =[[1,2],[3,4]]
df = pd.DataFrame(data=d,index=['a','b'],columns=['one','two'])
df

out:
   one  two
a    1    2
b    3    4
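The dictionary input and the default labels described above can be sketched like this. The variable names df_from_dict and df_default are illustrative only.

```python
import pandas as pd

# Dictionary keys become column names; index supplies the row labels.
df_from_dict = pd.DataFrame(data={'one': [1, 3], 'two': [2, 4]}, index=['a', 'b'])

# Without index and columns, both default to 0, 1, 2, ...
df_default = pd.DataFrame([[1, 2], [3, 4]])

print(list(df_from_dict.columns))                        # ['one', 'two']
print(list(df_default.index), list(df_default.columns))  # [0, 1] [0, 1]
```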

4.2 Access elements in a DataFrame

python

df.loc['a']   # select a row by label
df.iloc[0]    # select the same row by position

out:
one    1
two    2
Name: a, dtype: int64

python

df.loc[['a','b']]
df.iloc[[0,1]]

out:
   one  two
a    1    2
b    3    4

python

df.one
df['one']
df.iloc[:,0]

out:
 a    1
 b    3
 Name: one, dtype: int64

python

df[['one','two']]
df.iloc[:, 0:2]  # columns 0 and 1; the end position 2 (the third column) is excluded

out:
   one  two
a    1    2
b    3    4

python

df.iloc[0, 1]   # by position: row first, then column
df['two']['a']  # by label: column first, then row

out:2
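As a possible alternative to chaining two lookups, a single cell can also be read in one step. The sketch below rebuilds the same df and uses loc, at and iat, which are standard pandas accessors; which one to prefer is a matter of taste.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [3, 4]], index=['a', 'b'], columns=['one', 'two'])

# One-step label-based access: row label first, then column label.
print(df.loc['a', 'two'])   # 2

# at and iat are scalar-only accessors for a single cell.
print(df.at['b', 'one'])    # 3
print(df.iat[1, 1])         # 4
```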

4.3 Delete and add elements

Use the drop() function to delete elements. By default it deletes rows; pass axis=1 to delete columns.

python

df.drop(['a'])

out:
   one  two
b    3    4

python

df.drop('one',axis=1)

out:
 	two
 a	2
 b	4

Note that drop() does not modify the original data. To change the original directly, either pass inplace=True or assign the result back to the original variable name.

python

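# insert() modifies df in place and returns None, so display df afterwards
df.insert(2, 'T', 8)  # new column T at position 2, every row set to 8
df

out:
   one  two  T
a    1    2  8
b    3    4  8

df.insert(2, 'F', [9, 10])  # new column F at position 2, one value per row
df

out:
   one  two   F  T
a    1    2   9  8
b    3    4  10  8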

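python

data2 = pd.DataFrame([[8, 9, 10, 11], [6, 7, 8, 9]],
                     columns=['one', 'two', 'F', 'T'], index=['c', 'd'])
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
pd.concat([df, data2], ignore_index=True)

out:
   one  two   F   T
0    1    2   9   8
1    3    4  10   8
2    8    9  10  11
3    6    7   8   9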

4.4 Rename

python

df.rename(columns={'one': 'first column'})

out:
   first column  two   F  T
a             1    2   9  8
b             3    4  10  8

python

df.rename(index={'a':'first row'})
out:
           one  two   F  T
first row    1    2   9  8
b            3    4  10  8

4.5 Change the index

Use set_index(index_label) to make the column index_label the index of the data set.

Use reset_index() to restore the default integer index, which counts up from 0.
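A minimal sketch of both calls follows. The columns name and price and their values are invented for the illustration; both functions return a new DataFrame rather than changing the original.

```python
import pandas as pd

df = pd.DataFrame({'name': ['eggs', 'milk'], 'price': [30, 7]})

# Use the 'name' column as the row index.
indexed = df.set_index('name')
print(indexed.loc['milk', 'price'])   # 7

# reset_index() moves the index back into a column and restores 0, 1, ...
restored = indexed.reset_index()
print(list(restored.columns))         # ['name', 'price']
print(list(restored.index))           # [0, 1]
```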

4.6 Missing value (NaN) processing

Use isnull() and notnull() to check whether the data set contains missing data. Append sum() to count the missing values in each column. The count() function counts the non-NaN values instead.
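These checks can be sketched on a tiny data set. The columns x and y and their values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'y': [np.nan, np.nan, 6.0]})

missing_per_column = df.isnull().sum()    # x: 1, y: 2
total_missing = df.isnull().sum().sum()   # 3 missing values overall
non_missing = df.count()                  # x: 2, y: 1

print(missing_per_column.to_dict())  # {'x': 1, 'y': 2}
print(total_missing)                 # 3
print(non_missing.to_dict())         # {'x': 2, 'y': 1}
```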

Note that fillna() returns a new object and does not modify the original data.

python

df.fillna(0)

out:
     0    1     F    T  one  two
a  0.0  0.0   9.0  8.0  1.0  2.0
b  0.0  0.0  10.0  8.0  3.0  4.0
0  5.0  6.0   0.0  0.0  0.0  0.0

The fillna() function replaces NaN with a given value. Its parameters are as follows:
 value: the value used to replace NaN

 method: the fill strategy; the two common choices are ffill (forward fill, which copies the previous valid value) and backfill or bfill (backward fill, which copies the next valid value)

 axis: 0 fills along the rows (down each column), 1 fills along the columns (across each row)

 inplace: whether to modify the original data; the default is False

 limit: an int that caps how many NaN values are filled
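The effect of these parameters can be sketched on a short Series with made-up values. The sketch calls ffill() and bfill() directly because recent pandas releases deprecate the method argument of fillna(); the results are the same as the forward and backward fill described above.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

filled_zero = s.fillna(0)           # every NaN becomes 0
forward = s.ffill()                 # copy the previous valid value
backward = s.bfill()                # copy the next valid value
limited = s.fillna(0, limit=1)      # only the first NaN is filled

print(filled_zero.tolist())   # [1.0, 0.0, 0.0, 4.0]
print(forward.tolist())       # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())      # [1.0, 4.0, 4.0, 4.0]
print(limited.isna().sum())   # 1
```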

5. Data analysis process and Pandas application

5.1 Open a file

python

# Open a CSV file
pd.read_csv('filename')
# Open an Excel file
pd.read_excel('filename')
# Open a TSV file; set the encoding explicitly for non-ASCII text such as Chinese
pd.read_csv('filename', sep='\t', encoding='utf-8')
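To try the sep parameter without any files on disk, the sketch below feeds in-memory text to read_csv through io.StringIO. The sample rows are invented, and a real script would pass a file path instead.

```python
import io
import pandas as pd

csv_text = "name,price\neggs,30\nmilk,7\n"
tsv_text = "name\tprice\neggs\t30\nmilk\t7\n"

df_csv = pd.read_csv(io.StringIO(csv_text))            # comma is the default separator
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep='\t')  # tab-separated input

print(df_csv.shape)            # (2, 2)
print(df_csv.equals(df_tsv))   # True
```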

5.2 View data

python

# View the first five rows
df.head()
# View the last five rows
df.tail()
# View one random row
df.sample()

5.3 View data information

python

# Number of rows and columns
df.shape
# Column names, data types, and non-null counts (shows where data is missing)
df.info()
# Basic statistics
df.describe()
# Column names
df.columns
# Number of missing values in each column
df.isnull().sum()
# Rows where a given column is missing
df[df['col_name'].isnull()]
# Number of duplicated rows
df.duplicated().sum()
# The duplicated rows themselves
df[df.duplicated()]
# How often each value occurs in a column
df['col_name'].value_counts()
# Unique values in a column
df['col_name'].unique()
# Number of unique values in a column
df['col_name'].nunique()
# Sort the data set by a column
df.sort_values(by='col_name', ascending=False)  # ascending=False sorts from largest to smallest
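A few of these calls can be checked on a toy data set. The columns fruit and price and their values are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'pear', 'apple', 'apple'],
                   'price': [3, 5, 3, 4]})

print(df.shape)                               # (4, 2)
print(df.duplicated().sum())                  # 1  (row 2 repeats row 0)
print(df['fruit'].nunique())                  # 2
print(df['fruit'].value_counts()['apple'])    # 3

by_price = df.sort_values(by='price', ascending=False)
print(by_price['price'].tolist())             # [5, 4, 3, 3]
```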

5.4 Filter data

python

# Extract one row
df.iloc[row_index]
df.loc['row_name']
# Extract a range of rows
df.iloc[row_index_1:row_index_2]
# Extract one column
df['col_name']
# Extract several columns
df[['col_name_1', 'col_name_2']]
# Extract the value at a given row and column
df.iloc[row_index, col_index]
df.loc['row_name', 'col_name']
# Filter rows that meet a condition on a column
df[df['col_name'] == value]  # rows equal to value; the other comparison operators work the same way
df.query('col_name == @value')  # same effect; @ refers to a Python variable
df[(df['col_name_1'] >= value_1) & (df['col_name_2'] != value_2)]  # & means and, | means or
df.query('(col_name_1 >= @value_lower) & (col_name_2 <= @value_upper)')
df.groupby('col_name').groups  # group rows by col_name; .groups maps each key to its row labels
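The boolean-mask and query() forms can be compared side by side. This is only a sketch: the data, the threshold min_price, and the variable names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'pear', 'apple', 'kiwi'],
                   'price': [3, 5, 4, 2]})

min_price = 3

# Boolean mask: each condition goes in parentheses, combined with &.
mask_result = df[(df['price'] >= min_price) & (df['fruit'] != 'pear')]

# query(): the same filter as a string; @ reads the Python variable.
query_result = df.query("price >= @min_price and fruit != 'pear'")

print(mask_result.equals(query_result))   # True
print(mask_result['price'].tolist())      # [3, 4]

# .groups maps each group key to the row labels in that group.
groups = df.groupby('fruit').groups
print(list(groups['apple']))              # [0, 2]
```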

5.5 Data cleaning

python

# Delete a row
df.drop(['row_name'], inplace=True)  # inplace=True modifies df itself instead of returning a copy
# Delete a column
df.drop(['col_name'], axis=1)
# Handle missing values
df.fillna(mean_value)  # replace missing values
df.dropna()  # delete rows that contain missing values
df.dropna(axis=1, how='all')  # delete only the columns in which every value is missing
# Remove duplicate rows
df.drop_duplicates(inplace=True)
# Change a row, a column, or a single cell by assigning through loc or iloc
df.loc['row_name', 'col_name'] = new_value  # new_value is a placeholder
# Change data types
df['datetime_col'] = pd.to_datetime(df['datetime_col'])
df['col_name'] = df['col_name'].astype(str)  # int, float, ... also work; astype returns a new Series
# Rename columns
df.rename(columns={'A': 'a', 'C': 'c'}, inplace=True)
# Apply a function to every value in the col_name column;
# this is usually faster than an explicit for loop
df['col_name'].apply(function)
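Put together, a cleaning pass might look like the sketch below. The columns day and sales, their values, and the doubling function are all hypothetical; the order of the steps is one reasonable choice, not a fixed recipe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'day': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-03'],
                   'sales': [10.0, 10.0, np.nan, 30.0]})

df = df.drop_duplicates()                              # drops the repeated first row
df['sales'] = df['sales'].fillna(df['sales'].mean())   # mean of 10 and 30 is 20
df['day'] = pd.to_datetime(df['day'])                  # strings become datetimes
df['doubled'] = df['sales'].apply(lambda v: v * 2)     # element-wise function

print(df['sales'].tolist())     # [10.0, 20.0, 30.0]
print(df['doubled'].tolist())   # [20.0, 40.0, 60.0]
```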
