Home > OS >  How do you groupby with pandas that sums values in the same column but is offset by a set number of
How do you groupby with pandas that sums values in the same column but is offset by a set number of

Time:10-19

I have a table that looks like the table below. I want to group by id, start_time, and approach so I can add the right, thru, left, and u-turn for each similar timestamp.

intersection_id start_time approach movement volume
799028 12:00:00 AM Southbound Right 2
799028 12:15:00 AM Southbound Right 4
799028 12:30:00 AM Southbound Right 6
799028 12:00:00 AM Southbound Thru 4
799028 12:15:00 AM Southbound Thru 6
799028 12:30:00 AM Southbound Thru 8
799028 12:00:00 AM Southbound Left 6
799028 12:15:00 AM Southbound Left 8
799028 12:30:00 AM Southbound Left 10
799028 12:00:00 AM Southbound U-turn 10
799028 12:15:00 AM Southbound U-turn 12
799028 12:30:00 AM Southbound U-turn 14

Example results:

intersection_id start_time approach movement volume
799028 12:00:00 AM Southbound Right 24
799028 12:15:00 AM Southbound Right 30
799028 12:30:00 AM Southbound Right 38
799028 12:00:00 AM Southbound Thru nan
799028 12:15:00 AM Southbound Thru nan
799028 12:30:00 AM Southbound Thru nan
799028 12:00:00 AM Southbound Left nan
799028 12:15:00 AM Southbound Left nan
799028 12:30:00 AM Southbound Left nan
799028 12:00:00 AM Southbound U-turn nan
799028 12:15:00 AM Southbound U-turn nan
799028 12:30:00 AM Southbound U-turn nan

This will repeat itself until it goes through all the IDs and approaches.

I have tried a few different ways:

df['app_sum'] = df.groupby(['intersection_id','start_time','approach'], as_index=False)['volume'].transform('sum')

However, this code will not group correctly and will not provide nan values; it will repeat the values once it gets through the initial set of timestamps.

The second code I tried was

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4, offset=-3)
df['app_sum'] = (
    df.groupby(['intersection_id','start_time','approach'])['volume'].rolling(window=indexer).sum().droplevel(0))

I know this is wrong because there is no window.

Any suggestions?

CodePudding user response:

You could use:

df['volume'] = (df.groupby(['intersection_id', 'start_time', 'approach'])
                  ['volume'].transform('sum')
                  .where(df['movement'].eq('Right'))
                )

output:

    intersection_id    start_time     approach movement  volume
0            799028  12:00:00 AM   Southbound    Right     22.0
1            799028  12:15:00 AM   Southbound    Right     30.0
2            799028  12:30:00 AM   Southbound    Right     38.0
3            799028  12:00:00 AM   Southbound     Thru      NaN
4            799028  12:15:00 AM   Southbound     Thru      NaN
5            799028  12:30:00 AM   Southbound     Thru      NaN
6            799028  12:00:00 AM   Southbound     Left      NaN
7            799028  12:15:00 AM   Southbound     Left      NaN
8            799028  12:30:00 AM   Southbound     Left      NaN
9            799028  12:00:00 AM   Southbound   U-turn      NaN
10           799028  12:15:00 AM   Southbound   U-turn      NaN
11           799028  12:30:00 AM   Southbound   U-turn      NaN
  • Related