I'm preparing a dataset for ML and want to create features like "rain in the last 3 hours", "rain in the last 12 hours", "rain in the last 24 hours" and so on. I know how to use the shift function (like "temperature yesterday at same time"), but how do I sum the values from shift ranges (like shift-1 shift-2 shift-3) in a proper and efficient way.
e.g. lets use this simplified dataframe:
np.random.seed(3)
rng = pd.date_range('2021-01-01', periods=10, freq='D')
val = np.random.randint(10, size=(1,10)).ravel()
test = pd.DataFrame({ 'Date': rng, 'Val' : val})
test
Date Val
0 2021-01-01 8
1 2021-01-02 9
2 2021-01-03 3
3 2021-01-04 8
4 2021-01-05 8
5 2021-01-06 0
6 2021-01-07 5
7 2021-01-08 3
8 2021-01-09 9
9 2021-01-10 9
Now lets say I want to create a new column, where I sum up the Val
from the three days before (in finale data there will be multiple time ranges). The column last3d
would look like:
Date Val last3d
0 2021-01-01 8 NaN
1 2021-01-02 9 NaN
2 2021-01-03 3 NaN
3 2021-01-04 8 20.0
4 2021-01-05 8 20.0
5 2021-01-06 0 19.0
6 2021-01-07 5 16.0
7 2021-01-08 3 13.0
8 2021-01-09 9 8.0
9 2021-01-10 9 17.0
CodePudding user response:
test["last3d"]=test["Val"].rolling(3).sum()
test["last3d"]=test["last3d"].shift(1)
test
Date Val last3d
0 2021-01-01 8 NaN
1 2021-01-02 9 NaN
2 2021-01-03 3 NaN
3 2021-01-04 8 20.0
4 2021-01-05 8 20.0
5 2021-01-06 0 19.0
6 2021-01-07 5 16.0
7 2021-01-08 3 13.0
8 2021-01-09 9 8.0
9 2021-01-10 9 17.0
CodePudding user response:
This solves it in one line and returns the dataframe object with the column.
test.assign(last3d=lambda d: d["Val"].rolling(3).sum().shift())