I have the following example dataframe:
import pandas as pd
d = {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}
df = pd.DataFrame(data=d)
print(df)
Output:
    col1
0      2
1      5
2      6
3      5
4      4
5      6
6      7
7      8
8      9
9      7
10     5
I need to calculate the slope of the previous N rows of col1 and save it in a separate column (call it slope). The desired output would look like the following (the slope values below are random placeholders, just for the sake of example):
    col1  slope
0      2
1      5
2      6
3      5
4      4      3
5      6      4
6      7      5
7      8      2
8      9      4
9      7      6
10     5      5
So, in the row with index 4, the slope is 3, and it is the slope of [2, 5, 6, 5, 4] (that is, N = 5).
Is there an elegant way of doing this without using a for loop?
CodePudding user response:
You can use rolling.apply and scipy.stats.linregress:
from scipy.stats import linregress
# x = the window's index labels, y = the window's values; keep only the slope
df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(s.index, s.values).slope)
print(df)
Output:
    col1  slope
0      2    NaN
1      5    NaN
2      6    NaN
3      5    NaN
4      4    0.4
5      6    0.0
6      7    0.3
7      8    0.9
8      9    1.2
9      7    0.4
10     5   -0.5
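As a sanity check on the 0.4 reported at index 4, the slope of a single window can be worked out directly from the ordinary least-squares formula, slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2). This is only a sketch: it assumes the x values are the positions 0..4, which gives the same slope as using the index labels because they are consecutive integers.

```python
import numpy as np

# First full window from the question: rows 0..4 of col1.
y = np.array([2, 5, 6, 5, 4], dtype=float)
x = np.arange(len(y), dtype=float)  # assumption: positions 0..4 stand in for the index

# Ordinary least-squares slope of y against x.
x_centred = x - x.mean()
slope = (x_centred * (y - y.mean())).sum() / (x_centred ** 2).sum()
print(round(slope, 6))
```

The result agrees with the value in the slope column for the row with index 4.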
CodePudding user response:
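You can also do it with numpy:
import numpy as np
def slope_numpy(x, y):
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(x, y, 1)
    return slope
df.col1.rolling(5).apply(lambda y: slope_numpy(range(5), y))
Output (rounded to one decimal):
0     NaN
1     NaN
2     NaN
3     NaN
4     0.4
5     0.0
6     0.3
7     0.9
8     1.2
9     0.4
10   -0.5
Name: col1, dtype: float64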
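Both answers above still call a Python function once per window through rolling(...).apply, which can be slow on long columns. If that matters, one possible fully vectorized alternative is to build all windows at once and apply the closed-form least-squares slope. The sketch below is not from either answer: the helper name rolling_slope is made up for illustration, it assumes evenly spaced x values 0..window-1, and it relies on numpy.lib.stride_tricks.sliding_window_view, which needs NumPy 1.20 or newer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def rolling_slope(values, window):
    """Least-squares slope of each full window against x = 0..window-1.

    Hypothetical helper for illustration; the first window-1 entries are NaN.
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(window, dtype=float)
    x_centred = x - x.mean()                   # shared by every window
    windows = sliding_window_view(y, window)   # shape: (len(y) - window + 1, window)
    # slope = sum(x_centred * y) / sum(x_centred**2);
    # centring y is unnecessary because x_centred sums to zero.
    slopes = windows @ x_centred / (x_centred ** 2).sum()
    out = np.full(len(y), np.nan)
    out[window - 1:] = slopes
    return out

df = pd.DataFrame({'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]})
df['slope'] = rolling_slope(df['col1'], 5)
print(df)
```

Note that sliding_window_view raises a ValueError when the column is shorter than the window, so a length check may be needed for very small inputs.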