Home > Software design >  How to use pandas groupby, grouper with ngroup?
How to use pandas groupby, grouper with ngroup?

Time:04-07

I have a dataframe with a date time column. I am trying to group based on 24 hour window but I am not sure what I'm missing. Please let me know where am i going wrong.

For example my dataframe is as below

                Dates
0 2021-07-26 07:30:00
1 2021-07-26 13:05:00
2 2021-07-28 08:00:00
3 2021-07-29 00:36:00
4 2021-07-29 16:15:00

I am trying to group the dataframe and give them a unique number if it falls under a 24 hours window based on the first date in each group.

Meaning it should group and assign a unique number as below. It picks the first value and groups all the row values coming after it where the time falls within the 24 hours window. So in this example it should group everything which fall between (2021-07-26 07:30:00 to 2021-07-27 07:30:00) as 1 and (2021-07-28 08:00:00 to 2021-07-29 08:00:00) as 2 and (2021-07-29 16:15:00 to 2021-07-30 16:15:00) as 3

Expected O/P

                 date  groupedbytime
0 2021-07-26 07:30:00   1
1 2021-07-26 13:05:00   1
2 2021-07-28 08:00:00   2
3 2021-07-29 00:36:00   2
4 2021-07-29 16:15:00   3

I am using groupby and grouper but I am getting the o/p as below where its grouping as per the days but not as per the 24 hour window. Kindly advise how to approach this

tempdf['groupedbytime'] = tempdf.groupby(pd.Grouper(key="Dates",freq='24H')).ngroup() 1

O/P

                 date  groupedbytime
0 2021-07-26 07:30:00   1
1 2021-07-26 13:05:00   1
2 2021-07-28 08:00:00   2
3 2021-07-29 00:36:00   3
4 2021-07-29 16:15:00   3

CodePudding user response:

You can working with timedeltas created by subtract first value with integer division, for consecutive order is added factorize:

s = df['Dates'].sub(df['Dates'].iat[0]).dt.total_seconds() // (3600 * 24)
df['groupedbytime'] = pd.factorize(s)[0]   1
print (df)
                Dates  groupedbytime
0 2021-07-26 07:30:00              1
1 2021-07-26 13:05:00              1
2 2021-07-28 08:00:00              2
3 2021-07-29 00:36:00              2
4 2021-07-29 16:15:00              3

With Grouper:

s = df['Dates'].sub(df['Dates'].iat[0])
s = s.to_frame().groupby(pd.Grouper(key="Dates",freq='24H'))['Dates'].ngroup()
df['groupedbytime'] = pd.factorize(s)[0]   1
print (df)
0 2021-07-26 07:30:00              1
1 2021-07-26 13:05:00              1
2 2021-07-28 08:00:00              2
3 2021-07-29 00:36:00              2
4 2021-07-29 16:15:00              3
  • Related