How do you groupby with pandas that sums values in the same column but is offset by a set number of-CodePudding

I have a table that looks like the table below. I want to group by id, start_time, and approach so I can add the right, thru, left, and u-turn for each similar timestamp.

intersection_id	start_time	approach	movement	volume
799028	12:00:00 AM	Southbound	Right	2
799028	12:15:00 AM	Southbound	Right	4
799028	12:30:00 AM	Southbound	Right	6
799028	12:00:00 AM	Southbound	Thru	4
799028	12:15:00 AM	Southbound	Thru	6
799028	12:30:00 AM	Southbound	Thru	8
799028	12:00:00 AM	Southbound	Left	6
799028	12:15:00 AM	Southbound	Left	8
799028	12:30:00 AM	Southbound	Left	10
799028	12:00:00 AM	Southbound	U-turn	10
799028	12:15:00 AM	Southbound	U-turn	12
799028	12:30:00 AM	Southbound	U-turn	14

Example results:

intersection_id	start_time	approach	movement	volume
799028	12:00:00 AM	Southbound	Right	24
799028	12:15:00 AM	Southbound	Right	30
799028	12:30:00 AM	Southbound	Right	38
799028	12:00:00 AM	Southbound	Thru	nan
799028	12:15:00 AM	Southbound	Thru	nan
799028	12:30:00 AM	Southbound	Thru	nan
799028	12:00:00 AM	Southbound	Left	nan
799028	12:15:00 AM	Southbound	Left	nan
799028	12:30:00 AM	Southbound	Left	nan
799028	12:00:00 AM	Southbound	U-turn	nan
799028	12:15:00 AM	Southbound	U-turn	nan
799028	12:30:00 AM	Southbound	U-turn	nan

This will repeat itself until it goes through all the IDs and approaches.

I have tried a few different ways:

df['app_sum'] = df.groupby(['intersection_id','start_time','approach'], as_index=False)['volume'].transform('sum')

However, this code will not group correctly and will not provide nan values; it will repeat the values once it gets through the initial set of timestamps.

The second code I tried was

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4, offset=-3)
df['app_sum'] = (
    df.groupby(['intersection_id','start_time','approach'])['volume'].rolling(window=indexer).sum().droplevel(0))

I know this is wrong because there is no window.

Any suggestions?

CodePudding user response：

You could use:

df['volume'] = (df.groupby(['intersection_id', 'start_time', 'approach'])
                  ['volume'].transform('sum')
                  .where(df['movement'].eq('Right'))
                )

output:

    intersection_id    start_time     approach movement  volume
0            799028  12:00:00 AM   Southbound    Right     22.0
1            799028  12:15:00 AM   Southbound    Right     30.0
2            799028  12:30:00 AM   Southbound    Right     38.0
3            799028  12:00:00 AM   Southbound     Thru      NaN
4            799028  12:15:00 AM   Southbound     Thru      NaN
5            799028  12:30:00 AM   Southbound     Thru      NaN
6            799028  12:00:00 AM   Southbound     Left      NaN
7            799028  12:15:00 AM   Southbound     Left      NaN
8            799028  12:30:00 AM   Southbound     Left      NaN
9            799028  12:00:00 AM   Southbound   U-turn      NaN
10           799028  12:15:00 AM   Southbound   U-turn      NaN
11           799028  12:30:00 AM   Southbound   U-turn      NaN