Compare Excel, learn Python window functions

Total Article 245/Zhang Junhong

Students who are familiar with Sql should have heard of the window function in Sql, and feel that if you master the window function, you can say that you are proficient in Sql, and there are similar window functions in Python.

Let’s take a look at an example. The following is a table of sales per day on a certain platform over a period of time. Those who are familiar with the business should have common sense, that is, the data between the sky and the sky are often not directly comparable, such as electricity The order volume on Monday and Sunday on the trading platform. Day to day is not comparable, but most situations are comparable between week and week, because 7 days a week basically covers all business situations.

The longer time between week and week involves how the week should be calculated. If it is calculated according to the natural week, a year will be about 52 weeks. For the so-called big data era, the data is definitely a bit small, so what should be done? Can it not only cover a relatively complete business scenario, but also increase data points? One way is to slide to add and slide to add for 7 days. For example, this is 7 days from Tuesday to next Monday, and covers a relatively complete business scenario, and for example, this Friday to next Thursday is also 7 days, and it also covers More comprehensive business scenarios.

We now have data on a daily basis, how can we achieve 7-day sliding summation? In fact, it is relatively simple in Excel. You can directly write the formula for the addition of the first 7 lines of the formula in the 7th line, and then fill the formula down to complete the 7-day sliding addition. As shown below:

If we want to implement this 7-day sliding sum function in Python, we can directly use the rolling function. One of the more important parameters in the rolling function is window, which is used to indicate the sliding days. The specific implementation code is as follows:

import pandas as pd
df = pd.read_csv(r''python_rolling.csv')
df.rolling(window =7).sum()

Running the above code will get the following results:

You can see that the results are completely consistent with the results obtained in Excel.

If you only perform rolling on df, df will only slide the specified number of days, but will not do any calculation on the number of days after the slide, and then call the sum function on the sliding data to perform the sum operation on the sliding data. After rolling df, you can perform other operations besides summation. The more commonly used operations are:

Count: count()
Find the mean: men()
Find the best value: min(), max()
Find the variance: var()
Find the standard deviation: std()

In addition to these commonly used summary operations, you can also use custom functions, such as summing the values of sliding for 7 days and then adding 1, which can be achieved in the following ways:

import numpy as np
def div(x):return np.sum(x)+1
 df.rolling(window =7).agg(div)

Run the above code to get the following results:

As you can see, the first accumulated value is 453, which adds 1 to 452.

Customize a function first, and then call the function through agg to realize the function of the custom function.

The above is a basic introduction to the usage of Python sliding window function.

Recommended Posts

Compare Excel, learn Python window functions
Python functions
Python beginners learn decorators
Learn about Python3 coroutine
python excel multi-line merge
Python implements udp chat window
How to learn python quickly
python functions, classes, modules, packages