I have such a dataframe:
DATE | VALUE
23/07/21 01 | 20
23/07/21 02 | 10
23/07/21 03 | 15
23/07/21 04 | 200
What I want to get is new dataframe which sums up the entries of two consecutive rows of column VALUE
and keeps the entry of the second row for column DATE
.
It should look like this one:
DATE | VALUE
23/07/21 02 | 30
23/07/21 04 | 215
How can I do this?
CodePudding user response:
Try:
df.groupby(np.arange(len(df))//2).agg({'DATE':'last', 'VALUE':'sum'})
CodePudding user response:
You can try with rolling
:
>>> df.set_index("DATE").rolling(2).sum().iloc[1::2].reset_index()
DATE VALUE
0 2021-07-21 02 30.0
1 2021-07-21 04 115.0
This becomes considerably faster with a larger DataFrame:
df = pd.concat([df]*1000)
>>> %timeit df.set_index("DATE").rolling(2).sum().iloc[1::2].reset_index()
1.33 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df.groupby(np.arange(len(df))//2).agg({'DATE':'last', 'VALUE':'sum'})
2.15 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)