Unlike pandas, NumPy has no direct rolling method, but a stride trick lets NumPy run the rolling loop inside its C code.
This is achieved by adding an extra dimension whose size equals the window size, together with an appropriate stride.
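import numpy as np
data = np.arange(20)
def rolling_window(a, window):
    # New shape: one row per window position, plus a trailing axis of length `window`.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    # Reuse the last stride so each row starts one element after the previous one.
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
rolling_window(data, 10)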
Out[12]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 7,  8,  9, 10, 11, 12, 13, 14, 15, 16],
       [ 8,  9, 10, 11, 12, 13, 14, 15, 16, 17],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
np.mean(rolling_window(data, 10))
Out[13]: 9.5
np.mean(rolling_window(data, 10), -1)
Out[14]: array([ 4.5,  5.5,  6.5,  7.5,  8.5,  9.5, 10.5, 11.5, 12.5, 13.5, 14.5])
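As a side note, as_strided does no bounds checking, so a wrong shape or stride can read memory outside the array. Newer NumPy releases (1.20 and later, to my knowledge) provide np.lib.stride_tricks.sliding_window_view, which builds the same kind of windowed view but validates the window size and returns a read-only array. A minimal sketch, assuming such a NumPy version is installed:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(20)

# Read-only view of shape (11, 10): one row per window position.
windows = sliding_window_view(data, window_shape=10)

# Rolling mean along the last (window) axis.
rolling_mean = windows.mean(axis=-1)
print(rolling_mean)
```

This should give the same means as np.mean(rolling_window(data, 10), -1) above, without hand-written shape and stride arithmetic.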
Supplementary knowledge: the rolling() and expanding() window functions in pandas
In data analysis, especially time series analysis, it is often necessary to slide a fixed-length window over a sequence and compute something for each position, such as a moving average. Whenever a new sequence has to be derived from neighboring values of an existing one, a rolling window is usually involved. In pandas, both DataFrame and Series have a method for rolling windows called rolling(). Its parameters are: DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
The window parameter can be a positive integer or an offset (which can be thought of as the length of a time interval); it sets the window length.
min_periods is the minimum number of observations required in a window. If a window holds fewer members than this, its result is NaN. For example, if min_periods is 3 but the current window has only two members, the corresponding position in the result is null.
center controls where the window sits relative to the current label. If True, the current label is the center and the window extends to both sides; if False (the default), the current label is the right edge and the window extends to the left. With center=True, an odd window length has an unambiguous middle position, but for an even length the center is the position just right of the middle.
win_type selects the window type, which assigns different weights to the window members; the default is equal weights.
on specifies a column to roll over instead of the index. Note that when on is specified, the column should be datetime-like (a time series); in particular, an offset window on a column that is not datetime-like makes rolling fail.
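The window and on parameters can be illustrated with a small sketch. The column names "date" and "x" and the data are made up for illustration; the example assumes a reasonably recent pandas in which an offset string such as "2D" is accepted as the window:

```python
import pandas as pd

# Hypothetical data: one observation per day for five days.
df_time = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=5, freq="D"),
    "x": [1, 2, 3, 4, 5],
})

# Two-day offset window, computed on the "date" column rather than the index.
two_day_sum = df_time.rolling("2D", on="date").sum()
print(two_day_sum)
```

Each row sums the values whose date falls within the two days ending at that row's date, so the first row contains only its own value.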
Let's look at a simple example. With a window length of 3 and min_periods set to 2, the first element of the result is NaN: the first window contains only one value, 1, while min_periods requires at least two. So the first value is null, and non-null values start from the second element. This is the meaning of the min_periods parameter. With center set to True and an even window length such as 4, the center of a window [a,b,c,d] is the position just right of the middle, namely c. The first window therefore covers only the elements 1 and 2, so its sum is 3, as shown below.
import pandas as pd
import numpy as np
df = pd.DataFrame([1, 2, 3, 5], columns=['a'])
df
   a
0  1
1  2
2  3
3  5
df.rolling(3, min_periods=2).sum()
      a
0   NaN
1   3.0
2   6.0
3  10.0
df.rolling(4, min_periods=2, center=True).sum()
      a
0   3.0
1   6.0
2  11.0
3  10.0
The rolling function returns a Window or Rolling object. The value for each window is computed by calling the object's methods such as mean(), sum(), std() and count(), or by calling apply(func) to compute each window's value with a custom function; see the pandas documentation for details.
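As an illustration of apply(func), the following sketch computes the range (maximum minus minimum) of each window. The helper name value_range is made up for this example; raw=True is passed so that the function receives a NumPy array rather than a Series:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 5], columns=["a"])

def value_range(window_values):
    # window_values is a NumPy array because raw=True is passed below.
    return window_values.max() - window_values.min()

# The first two rows are NaN: min_periods defaults to the window length (3).
result = df.rolling(3).apply(value_range, raw=True)
print(result)
```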
As the above shows, a rolling window can extend toward earlier rows (the default) or to both sides (center=True), but there is no option to extend toward later rows. To get that behavior, reverse the original sequence, roll over it as usual, and reverse the result back. Now for the expanding function, DataFrame.expanding(min_periods=1, center=False, axis=0). Its parameters have the same meaning as in rolling, except that the window length is not fixed: the window starts at the first row and keeps growing to include every row up to the current one.
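Both ideas can be sketched on the same small data used earlier. The variable names are chosen for illustration, and min_periods=1 is an assumption made here so that the shorter windows at the end still produce a value:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 5])

# Window over the current row and the next two rows:
# reverse the series, roll as usual, then reverse the result back.
forward_sum = s.iloc[::-1].rolling(3, min_periods=1).sum().iloc[::-1]

# Expanding window: all values from the first row up to the current row.
expanding_sum = s.expanding().sum()

print(forward_sum)
print(expanding_sum)
```

Here expanding_sum is simply the cumulative sum, while forward_sum at each row adds the current value and up to two values after it.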
That concludes this example of implementing rolling windows with NumPy in Python. I hope it serves as a useful reference.