How to calculate the slope of a pandas DataFrame column based on the previous N rows

I have the following example dataframe:

import pandas as pd

d = {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}

df = pd.DataFrame(data=d)
print(df)

Output:

    col1
0      2
1      5
2      6
3      5
4      4
5      6
6      7
7      8
8      9
9      7
10     5

I need to calculate the slope of col1 over the previous N rows and save it in a separate column (call it slope). The desired output might look like the following (the slope values below are just random numbers for the sake of example):

       col1  slope
0      2
1      5
2      6
3      5
4      4     3
5      6     4
6      7     5
7      8     2
8      9     4
9      7     6
10     5     5

So, in the row with index 4, the slope is 3, and it is the slope of [2, 5, 6, 5, 4] (that is, N = 5).

Is there an elegant way of doing this without using a for loop?

CodePudding user response:

You can use rolling apply and scipy.stats.linregress:

from scipy.stats import linregress

# Fit a line to each 5-row window: x is the window's index, y its values.
df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(s.index, s).slope)

print(df)

output:

    col1  slope
0      2    NaN
1      5    NaN
2      6    NaN
3      5    NaN
4      4    0.4
5      6    0.0
6      7    0.3
7      8    0.9
8      9    1.2
9      7    0.4
10     5   -0.5
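
As a sanity check on the first non-NaN value, the least-squares slope of a single window can be worked out by hand. This is only a minimal sketch: it assumes the x values are simply the positions 0..4 inside the window, and the helper name window_slope is made up for illustration.

```python
import numpy as np

def window_slope(y):
    """Least-squares slope of y against positions 0..len(y)-1 (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    x_centered = x - x.mean()
    # slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
    return float((x_centered * (y - y.mean())).sum() / (x_centered ** 2).sum())

# First full window of col1, i.e. rows 0..4.
first_window_slope = window_slope([2, 5, 6, 5, 4])
print(round(first_window_slope, 6))  # 0.4
```

That agrees with the 0.4 shown at index 4 above.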

CodePudding user response:

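Let's do it with numpy. Note that np.polyfit returns the coefficients from the highest degree down, so for a degree-1 fit the slope is the first element (indexing np.poly1d(fit) with [0] returns the intercept instead):

import numpy as np

def slope_numpy(x, y):
    # A degree-1 fit returns [slope, intercept].
    slope, intercept = np.polyfit(x, y, 1)
    return slope

df.col1.rolling(5).apply(lambda y: slope_numpy(range(5), y))

Result, rounded to one decimal place (the raw values may carry floating-point noise in the last digits):

0     NaN
1     NaN
2     NaN
3     NaN
4     0.4
5     0.0
6     0.3
7     0.9
8     1.2
9     0.4
10   -0.5
Name: col1, dtype: float64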
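
If calling a Python function once per window turns out to be slow, the same slope can probably be computed in one vectorized step. Because the x positions are the same for every window, the least-squares slope reduces to a dot product of each window with a fixed weight vector. The sketch below is not from either answer: it assumes NumPy 1.20 or newer (for sliding_window_view), evenly spaced rows, and a column at least as long as the window. The name rolling_slope is made up for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20

def rolling_slope(values, window):
    """Rolling least-squares slope against positions 0..window-1 (hypothetical helper)."""
    y = np.asarray(values, dtype=float)
    # Centered x positions; they sum to zero, so mean(y) drops out of the formula.
    x_centered = np.arange(window) - (window - 1) / 2
    weights = x_centered / (x_centered ** 2).sum()
    # One row per full window, shape (len(y) - window + 1, window).
    windows = sliding_window_view(y, window)
    out = np.full(len(y), np.nan)  # the first window-1 rows have no full window
    out[window - 1:] = windows @ weights
    return out

df = pd.DataFrame({'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]})
df['slope'] = rolling_slope(df['col1'], 5)
print(df)
```

Up to floating-point rounding, this should give the same slope column as the linregress answer, with NaN in the first four rows.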