When analyzing large data sets, we often need to convert text data into numeric data so that it can be fed into a model.
We may also need to process numeric data in segments, for example when converting variables to WOE (weight of evidence) values. Both kinds of operation can be done with the pandas apply function in Python.
This is the fourth lesson in the data analysis series. It shows how to use apply to perform more complex operations on a data frame in Python.
Note: This article uses the data frame date_frame in the first lesson of data analysis [Python Data Analysis—Data Creation]:
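The first lesson is not reproduced here, so the snippet below is only a stand-in that lets the examples in this article run. It assumes that date_frame has at least a gender column holding "male"/"female" and a numeric height column. The name column and all of the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the date_frame built in the first lesson.
# Only the column names gender and height come from this article;
# the name column and every value are invented for illustration.
date_frame = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "gender": ["male", "female", "female", "male"],
    "height": [1.82, 1.60, 1.70, 1.65],
})

print(date_frame)
```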
**1 Process character data into numeric data**
Suppose you want to recode the gender column of the original data frame as a new column, replacing "male" with 1 and "female" with 0.
First, define a replacement function:
def replace_gender_to_num(val):
    # Return 1 for 'male' and 0 for any other value
    if val == 'male':
        return 1
    else:
        return 0
Then pass the function to apply, which calls it on every value in the column:
date_frame.gender.apply(replace_gender_to_num)
The results are as follows:
The call returns a new Series in which "male" is replaced with 1 and "female" with 0. The original gender column itself is not modified.
To add the result to the original data frame as a new column, assign it:
date_frame['new_gender'] = date_frame.gender.apply(replace_gender_to_num)
The results are as follows:
In the result, rows where gender is male have the value 1 in new_gender, and rows where gender is female have the value 0.
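Note that replace_gender_to_num returns 0 for anything that is not exactly "male", including missing values and differently cased strings such as "Male". If that is not the behavior you want, one possible variant is sketched below. The function name replace_gender_strict and the sample data are made up for illustration. It returns None for unexpected values so that they show up as missing in the new column instead of being counted as 0.

```python
import pandas as pd

def replace_gender_strict(val):
    """Map 'male' to 1 and 'female' to 0; anything else becomes None."""
    if val == 'male':
        return 1
    if val == 'female':
        return 0
    return None  # unexpected or missing value

# Made-up sample data, not the frame from the first lesson.
sample = pd.DataFrame({'gender': ['male', 'female', None, 'Male']})
sample['new_gender'] = sample.gender.apply(replace_gender_strict)

# The last two rows are flagged as missing rather than silently set to 0.
print(sample['new_gender'].isna().tolist())
```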
**2 Process numerical data in segments**
In modeling, values that fall into different segments often need to be converted to the corresponding WOE value. The apply function can handle this kind of segment-wise processing.
As a simpler example of the same idea, suppose students with a height of at least 1.8 form the first category, students with a height of at least 1.65 (but below 1.8) form the second category, and all remaining students form the third category.
You can define a conversion function as follows:
def height_to_class(val):
    # The conditions are checked in order, so the second one only sees values below 1.8
    if val >= 1.8:
        return 1
    elif val >= 1.65:
        return 2
    else:
        return 3
Then pass the function to apply and save the result as a new column of the original data frame:
date_frame['height_class'] = date_frame.height.apply(height_to_class)
The results are as follows:
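To check how the boundaries behave, here is a small self-contained run on made-up heights (the values are illustrative and not taken from the first lesson). Because the function compares with >=, a height of exactly 1.8 lands in category 1 and a height of exactly 1.65 lands in category 2.

```python
import pandas as pd

def height_to_class(val):
    """Assign a category from 1 (tallest) to 3 based on height."""
    if val >= 1.8:
        return 1
    elif val >= 1.65:
        return 2
    else:
        return 3

# Made-up heights chosen to sit on and around the two thresholds.
heights = pd.DataFrame({'height': [1.85, 1.80, 1.79, 1.65, 1.64]})
heights['height_class'] = heights.height.apply(height_to_class)

print(heights['height_class'].tolist())  # [1, 1, 2, 2, 3]
```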
This concludes the introduction to using the apply function for data processing in Python. Try it out yourself, and think about what else apply could be used for.
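As one possible answer to that question, apply also works on a whole data frame: with axis=1 the function receives one row at a time, so it can combine several columns. The sketch below uses made-up data and a hypothetical helper named describe_row. It is only an illustration and not part of the original lessons.

```python
import pandas as pd

# Made-up sample data with the two columns used in this article.
sample = pd.DataFrame({
    'gender': ['male', 'female', 'male'],
    'height': [1.82, 1.70, 1.60],
})

def describe_row(row):
    """Hypothetical helper: build a label from two columns of one row."""
    size = 'tall' if row['height'] >= 1.8 else 'not_tall'
    return row['gender'] + '_' + size

# axis=1 passes each row to the function as a Series.
sample['label'] = sample.apply(describe_row, axis=1)

print(sample['label'].tolist())  # ['male_tall', 'female_not_tall', 'male_not_tall']
```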