How to count entries of two consecutive rows and keep only second row in a new dataframe?

Time:10-28

I have such a dataframe:

DATE |  VALUE
23/07/21 01 | 20
23/07/21 02 | 10
23/07/21 03 | 15
23/07/21 04 | 200
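To reproduce the examples, the sample table can be built as below. This is a minimal sketch that assumes DATE is stored as plain strings; the question does not say whether the column has been parsed into datetimes.

```python
import pandas as pd

# Sample data from the question.
# Assumption: DATE is kept as plain strings, not parsed datetimes.
df = pd.DataFrame({
    "DATE": ["23/07/21 01", "23/07/21 02", "23/07/21 03", "23/07/21 04"],
    "VALUE": [20, 10, 15, 200],
})
print(df)
```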

What I want is a new DataFrame that sums the VALUE entries of each pair of consecutive rows (rows 1 and 2, rows 3 and 4, and so on) and keeps the DATE of the second row in each pair.

The result should look like this:

DATE |  VALUE
23/07/21 02 |   30
23/07/21 04 |   215

How can I do this?

Answer:

Try:

import numpy as np
df.groupby(np.arange(len(df)) // 2).agg({'DATE': 'last', 'VALUE': 'sum'})  # group rows 0-1, 2-3, ...
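Why this works: integer-dividing the row positions by 2 gives every pair of consecutive rows the same group label, and the aggregation dictionary then keeps the last DATE and sums VALUE within each group. A self-contained sketch on the question's data (DATE assumed to be plain strings):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "DATE": ["23/07/21 01", "23/07/21 02", "23/07/21 03", "23/07/21 04"],
    "VALUE": [20, 10, 15, 200],
})

# Positions 0,1,2,3 // 2 -> labels 0,0,1,1: one label per pair of rows
labels = np.arange(len(df)) // 2
print(labels)  # [0 0 1 1]

# Per pair: keep the second row's DATE, add the two VALUEs
out = df.groupby(labels).agg({"DATE": "last", "VALUE": "sum"})
print(out)
```

With an odd number of rows, the final row ends up alone in its own group and is kept with its own VALUE, and VALUE stays an integer column.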

Answer:

You can also try rolling:

>>> df.set_index("DATE").rolling(2).sum().iloc[1::2].reset_index()

          DATE  VALUE
0  23/07/21 02   30.0
1  23/07/21 04  215.0
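Step by step: rolling(2).sum() adds each row to the one before it (the first row has no predecessor, so it becomes NaN), and iloc[1::2] then keeps positions 1, 3, ..., which are exactly the non-overlapping pairs. A self-contained sketch on the question's data, assuming DATE is stored as strings:

```python
import pandas as pd

df = pd.DataFrame({
    "DATE": ["23/07/21 01", "23/07/21 02", "23/07/21 03", "23/07/21 04"],
    "VALUE": [20, 10, 15, 200],
})

# Sliding two-row window; row 0 has no predecessor, so its sum is NaN
rolled = df.set_index("DATE").rolling(2).sum()
print(rolled["VALUE"].tolist())  # [nan, 30.0, 25.0, 215.0]

# Keep every second row (positions 1, 3, ...): the non-overlapping pairs
out = rolled.iloc[1::2].reset_index()
print(out)
```

The NaN in the first window is why VALUE comes back as float. Unlike the groupby version, with an odd number of rows the last, unpaired row is dropped rather than kept on its own.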

On a larger DataFrame (the sample repeated 1,000 times), the rolling approach was about 1.6 times faster than the groupby approach in this benchmark:

df = pd.concat([df]*1000)

>>> %timeit df.set_index("DATE").rolling(2).sum().iloc[1::2].reset_index()
1.33 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit df.groupby(np.arange(len(df))//2).agg({'DATE':'last', 'VALUE':'sum'})
2.15 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
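%timeit is an IPython magic. Outside IPython, roughly the same comparison can be run with the standard timeit module. This is a sketch: the function names with_rolling and with_groupby are hypothetical wrappers around the two answers, and the timings it prints depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "DATE": ["23/07/21 01", "23/07/21 02", "23/07/21 03", "23/07/21 04"],
    "VALUE": [20, 10, 15, 200],
})
# Repeat the sample 1000 times (4000 rows), as in the answer
df = pd.concat([base] * 1000)

def with_rolling(frame):
    # Hypothetical wrapper around the rolling answer
    return frame.set_index("DATE").rolling(2).sum().iloc[1::2].reset_index()

def with_groupby(frame):
    # Hypothetical wrapper around the groupby answer
    return frame.groupby(np.arange(len(frame)) // 2).agg({"DATE": "last", "VALUE": "sum"})

for name, func in [("rolling", with_rolling), ("groupby", with_groupby)]:
    # Best of 7 repeats of 100 calls each, converted to time per call
    best = min(timeit.repeat(lambda: func(df), number=100, repeat=7)) / 100
    print(f"{name}: {best * 1e3:.2f} ms per call")
```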