I have a table that looks like the table below. I want to group by id, start_time, and approach so I can add the right, thru, left, and u-turn for each similar timestamp.
intersection_id | start_time | approach | movement | volume |
---|---|---|---|---|
799028 | 12:00:00 AM | Southbound | Right | 2 |
799028 | 12:15:00 AM | Southbound | Right | 4 |
799028 | 12:30:00 AM | Southbound | Right | 6 |
799028 | 12:00:00 AM | Southbound | Thru | 4 |
799028 | 12:15:00 AM | Southbound | Thru | 6 |
799028 | 12:30:00 AM | Southbound | Thru | 8 |
799028 | 12:00:00 AM | Southbound | Left | 6 |
799028 | 12:15:00 AM | Southbound | Left | 8 |
799028 | 12:30:00 AM | Southbound | Left | 10 |
799028 | 12:00:00 AM | Southbound | U-turn | 10 |
799028 | 12:15:00 AM | Southbound | U-turn | 12 |
799028 | 12:30:00 AM | Southbound | U-turn | 14 |
Example results:
intersection_id | start_time | approach | movement | volume |
---|---|---|---|---|
799028 | 12:00:00 AM | Southbound | Right | 24 |
799028 | 12:15:00 AM | Southbound | Right | 30 |
799028 | 12:30:00 AM | Southbound | Right | 38 |
799028 | 12:00:00 AM | Southbound | Thru | nan |
799028 | 12:15:00 AM | Southbound | Thru | nan |
799028 | 12:30:00 AM | Southbound | Thru | nan |
799028 | 12:00:00 AM | Southbound | Left | nan |
799028 | 12:15:00 AM | Southbound | Left | nan |
799028 | 12:30:00 AM | Southbound | Left | nan |
799028 | 12:00:00 AM | Southbound | U-turn | nan |
799028 | 12:15:00 AM | Southbound | U-turn | nan |
799028 | 12:30:00 AM | Southbound | U-turn | nan |
This will repeat itself until it goes through all the IDs and approaches.
I have tried a few different ways:
df['app_sum'] = df.groupby(['intersection_id','start_time','approach'], as_index=False)['volume'].transform('sum')
However, this code will not group correctly and will not provide nan values; it will repeat the values once it gets through the initial set of timestamps.
The second code I tried was
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=4, offset=-3)
df['app_sum'] = (
df.groupby(['intersection_id','start_time','approach'])['volume'].rolling(window=indexer).sum().droplevel(0))
I know this is wrong because there is no window.
Any suggestions?
CodePudding user response:
You could use:
df['volume'] = (df.groupby(['intersection_id', 'start_time', 'approach'])
['volume'].transform('sum')
.where(df['movement'].eq('Right'))
)
output:
intersection_id start_time approach movement volume
0 799028 12:00:00 AM Southbound Right 22.0
1 799028 12:15:00 AM Southbound Right 30.0
2 799028 12:30:00 AM Southbound Right 38.0
3 799028 12:00:00 AM Southbound Thru NaN
4 799028 12:15:00 AM Southbound Thru NaN
5 799028 12:30:00 AM Southbound Thru NaN
6 799028 12:00:00 AM Southbound Left NaN
7 799028 12:15:00 AM Southbound Left NaN
8 799028 12:30:00 AM Southbound Left NaN
9 799028 12:00:00 AM Southbound U-turn NaN
10 799028 12:15:00 AM Southbound U-turn NaN
11 799028 12:30:00 AM Southbound U-turn NaN