I have this code to get data from a dataset:
df = get(datetime(2022, 4, 19), 'validations')
df.head()
The result is:
      card_id  route_id           timestamp
0  3941139920        34 2022-04-19 04:00:03
1    32111423      1305 2022-04-19 04:00:15
I need to count the number of observations per route_id within one week, from April 19 to April 26.
I assume data from other days will also be needed, for example:
      card_id  route_id           timestamp
0    40629176      2040 2022-04-20 04:00:07
1  1961463432      4003 2022-04-20 04:00:11
so the final dataframe that I need looks like:
route_id  trips
1         N
2         N
3         N
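A minimal sketch of the direct approach, assuming `df` has a datetime `timestamp` column shaped like the output above (the sample rows below are made up to mimic the question's data; the asker's `get` loader is not reproduced): filter the one-week window with a boolean mask, then count rows per `route_id`.

```python
import pandas as pd

# sample data shaped like the question's dataframe
df = pd.DataFrame({
    "card_id": [3941139920, 32111423, 40629176, 1961463432],
    "route_id": [34, 1305, 2040, 4003],
    "timestamp": pd.to_datetime([
        "2022-04-19 04:00:03", "2022-04-19 04:00:15",
        "2022-04-20 04:00:07", "2022-04-20 04:00:11",
    ]),
})

# keep only rows in the week 2022-04-19 .. 2022-04-26 (end exclusive)
mask = (df["timestamp"] >= "2022-04-19") & (df["timestamp"] < "2022-04-26")

# one row per route_id with the number of observations in that window
trips = (df.loc[mask]
           .groupby("route_id")
           .size()
           .reset_index(name="trips"))
print(trips)
```

This avoids any weekly resampling: the window is fixed by the mask, and `groupby(...).size()` counts rows per route.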
CodePudding user response:
If you need counts per route_id, with weeks ending on Sunday (the default for freq='W'), first get counts per (route_id, week) with GroupBy.size, and then aggregate per route_id with sum:
print (df)
      card_id  route_id           timestamp
0  3941139920        34 2022-04-19 04:00:03
1    32111423      1305 2022-04-29 04:00:15
2  3941139920        34 2022-04-23 04:00:03
3    32111423      1305 2022-04-25 04:00:15
4  3941139920        34 2022-04-26 04:00:03
5    32111423      1305 2022-04-27 04:00:15
6  3941139920        34 2022-04-25 04:00:03
7    32111423      1305 2022-04-21 04:00:15
print (df.groupby(['route_id', pd.Grouper(freq='W', key='timestamp')]).size())
route_id timestamp
34 2022-04-24 2
2022-05-01 2
1305 2022-04-24 1
2022-05-01 3
dtype: int64
df = (df.groupby(['route_id', pd.Grouper(freq='W', key='timestamp')])
.size()
.groupby(level=0).sum()
.reset_index(name='count'))
print (df)
route_id count
0 34 4
1 1305 4
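If the per-week breakdown is not actually needed, the same totals fall out of a single groupby over the whole frame; a sketch on the same sample data (only the two columns the count depends on are reproduced here):

```python
import pandas as pd

df = pd.DataFrame({
    "route_id": [34, 1305, 34, 1305, 34, 1305, 34, 1305],
    "timestamp": pd.to_datetime([
        "2022-04-19 04:00:03", "2022-04-29 04:00:15",
        "2022-04-23 04:00:03", "2022-04-25 04:00:15",
        "2022-04-26 04:00:03", "2022-04-27 04:00:15",
        "2022-04-25 04:00:03", "2022-04-21 04:00:15",
    ]),
})

# one groupby, no weekly split: total observations per route
out = df.groupby("route_id").size().reset_index(name="count")
print(out)  # route 34 -> 4, route 1305 -> 4, same totals as above
```

The weekly `pd.Grouper` step only matters if you want the intermediate per-week counts; summing them back per route is equivalent to counting directly.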
CodePudding user response:
You can group by and aggregate by time using pd.Grouper(); the pandas documentation on date offset aliases describes the other available time frequencies.
df.groupby(pd.Grouper(key='timestamp', freq='W'))['route_id'].count()
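Note that the line above counts rows per week across all routes. To get counts per route within each week, which is what the question asks for, route_id can be added to the grouping keys; a sketch on made-up sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "route_id": [34, 1305, 34],
    "timestamp": pd.to_datetime([
        "2022-04-19 04:00:03", "2022-04-19 04:00:15", "2022-04-20 04:00:07",
    ]),
})

# counts per (week, route_id) pair; freq='W' ends weeks on Sunday
weekly = (df.groupby([pd.Grouper(key="timestamp", freq="W"), "route_id"])
            .size()
            .reset_index(name="trips"))
print(weekly)
```

All three sample rows fall in the week ending 2022-04-24, so the result has one row per route for that week.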