Home > Blockchain >  Pandas create new column with sum from last x days
Pandas create new column with sum from last x days

Time:12-22

I'm preparing a dataset for ML and want to create features like "rain in the last 3 hours", "rain in the last 12 hours", "rain in the last 24 hours" and so on. I know how to use the shift function (like "temperature yesterday at same time"), but how do I sum the values from shift ranges (like shift-1 shift-2 shift-3) in a proper and efficient way.

e.g. lets use this simplified dataframe:

np.random.seed(3)
rng = pd.date_range('2021-01-01', periods=10, freq='D')
val = np.random.randint(10, size=(1,10)).ravel()
test = pd.DataFrame({ 'Date': rng, 'Val' : val}) 

test

    Date    Val
0   2021-01-01  8
1   2021-01-02  9
2   2021-01-03  3
3   2021-01-04  8
4   2021-01-05  8
5   2021-01-06  0
6   2021-01-07  5
7   2021-01-08  3
8   2021-01-09  9
9   2021-01-10  9

Now lets say I want to create a new column, where I sum up the Val from the three days before (in finale data there will be multiple time ranges). The column last3d would look like:

    Date    Val     last3d
0   2021-01-01  8   NaN
1   2021-01-02  9   NaN
2   2021-01-03  3   NaN
3   2021-01-04  8   20.0
4   2021-01-05  8   20.0
5   2021-01-06  0   19.0
6   2021-01-07  5   16.0
7   2021-01-08  3   13.0
8   2021-01-09  9   8.0
9   2021-01-10  9   17.0

CodePudding user response:

test["last3d"]=test["Val"].rolling(3).sum()
test["last3d"]=test["last3d"].shift(1)


test

    Date       Val  last3d
0   2021-01-01  8   NaN
1   2021-01-02  9   NaN
2   2021-01-03  3   NaN
3   2021-01-04  8   20.0
4   2021-01-05  8   20.0
5   2021-01-06  0   19.0
6   2021-01-07  5   16.0
7   2021-01-08  3   13.0
8   2021-01-09  9   8.0
9   2021-01-10  9   17.0

CodePudding user response:

This solves it in one line and returns the dataframe object with the column.

test.assign(last3d=lambda d: d["Val"].rolling(3).sum().shift())
  • Related