aggregate within a week

Time:04-29

I have this code to get data from a dataset:

df = get(datetime(2022, 4,19), 'validations')
df.head()

The result is:

    card_id     route_id    timestamp
0   3941139920  34      2022-04-19 04:00:03
1   32111423    1305    2022-04-19 04:00:15

I need to count the observations per route_id within one week, from 19.4 to 26.4.

I guess data from another day might be useful, so here it is:

    card_id     route_id    timestamp
0   40629176    2040    2022-04-20 04:00:07
1   1961463432  4003    2022-04-20 04:00:11

So the final dataframe that I need is:

route id    trips
1           N  
2           N
3           N

CodePudding user response:

If you need counts per route_id and per week (with freq='W' each weekly bin ends on a Sunday and is labelled with that date), first get the counts per route and week, then aggregate per route_id with sum:

print (df)
      card_id  route_id           timestamp
0  3941139920        34 2022-04-19 04:00:03
1    32111423      1305 2022-04-29 04:00:15
2  3941139920        34 2022-04-23 04:00:03
3    32111423      1305 2022-04-25 04:00:15
4  3941139920        34 2022-04-26 04:00:03
5    32111423      1305 2022-04-27 04:00:15
6  3941139920        34 2022-04-25 04:00:03
7    32111423      1305 2022-04-21 04:00:15

print (df.groupby(['route_id', pd.Grouper(freq='W', key='timestamp')]).size())
route_id  timestamp 
34        2022-04-24    2
          2022-05-01    2
1305      2022-04-24    1
          2022-05-01    3
dtype: int64

df = (df.groupby(['route_id', pd.Grouper(freq='W', key='timestamp')])
   .size()
   .groupby(level=0).sum()
   .reset_index(name='count'))

print (df)
   route_id  count
0        34      4
1      1305      4
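If only the single week from the question matters, filtering on the date range before grouping may be simpler than weekly bins. A minimal sketch, assuming the week is the half-open interval from 2022-04-19 up to (but not including) 2022-04-26, and reusing the sample rows above; the names start, end, in_week and trips are only for illustration:

```python
import pandas as pd

# Sample rows copied from the printed dataframe above.
df = pd.DataFrame({
    "card_id": [3941139920, 32111423, 3941139920, 32111423,
                3941139920, 32111423, 3941139920, 32111423],
    "route_id": [34, 1305, 34, 1305, 34, 1305, 34, 1305],
    "timestamp": pd.to_datetime([
        "2022-04-19 04:00:03", "2022-04-29 04:00:15",
        "2022-04-23 04:00:03", "2022-04-25 04:00:15",
        "2022-04-26 04:00:03", "2022-04-27 04:00:15",
        "2022-04-25 04:00:03", "2022-04-21 04:00:15",
    ]),
})

# Assumption: "one week from 19.4 to 26.4" means [start, end), end excluded.
start = pd.Timestamp("2022-04-19")
end = start + pd.Timedelta(days=7)

# Keep only the rows inside the window, then count rows per route.
in_week = df[(df["timestamp"] >= start) & (df["timestamp"] < end)]
trips = in_week.groupby("route_id").size().reset_index(name="trips")

print(trips)
```

With these sample rows, route 34 gets 3 trips and route 1305 gets 2, because the rows stamped 26.4, 27.4 and 29.4 fall outside the window. If 26.4 should be included, move end one day later.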

CodePudding user response:

You can group and aggregate by time using pd.Grouper(); the pandas documentation lists the other time frequencies you can pass as freq.

 df.groupby(pd.Grouper(key='timestamp', freq='W'))['route_id'].count()
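Note that the line above counts rows per week across all routes. A small sketch of how it could be extended to one count per route and week, assuming a tiny made-up dataframe with the same column names as in the question (the rows are illustrative, not the asker's data):

```python
import pandas as pd

# Illustrative rows only; column names follow the question.
df = pd.DataFrame({
    "route_id": [34, 1305, 34, 1305],
    "timestamp": pd.to_datetime([
        "2022-04-19 04:00:03", "2022-04-21 04:00:15",
        "2022-04-25 04:00:03", "2022-04-27 04:00:15",
    ]),
})

# Rows per weekly bin, all routes combined (same call as above).
per_week = df.groupby(pd.Grouper(key="timestamp", freq="W"))["route_id"].count()

# Rows per (route, week): routes as rows, week-ending Sundays as columns.
per_route_week = (
    df.groupby(["route_id", pd.Grouper(key="timestamp", freq="W")])
      .size()
      .unstack(fill_value=0)
)

print(per_week)
print(per_route_week)
```

freq='W' is the same as 'W-SUN', so the bins here are labelled 2022-04-24 and 2022-05-01. A week running from Tuesday 19.4 does not line up with those bins, which is worth keeping in mind when reading the counts.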