I want to count the eligible minutes per hour, using the hour as the group. As I am working with a large dataset, is it possible to get this output with a pandas aggregate function?
user minutes hour
1 778 12
1 779 12
1 780 13
1 781 13
1 782 13
1 783 13
I tried using df.groupby('hour').count(), but the output is not what I want:
user hour minutes
1 12 2
1 13 4
The output I actually want is below, because the 780th minute should be counted in hour 12 as well:
user hour minutes
1 12 3
1 13 4
Your help is greatly appreciated.
CodePudding user response:
Use numpy.where and GroupBy.count:
In [2396]: import numpy as np
# Check if the minute is divisible by 60, then subtract 1 from the hour
In [2397]: df['hour'] = np.where(df['minutes'] % 60 == 0, df['hour'] - 1, df['hour'])
# Group on hour and count
In [2398]: df.groupby('hour').count()
Out[2398]:
user minutes
hour
12 3 3
13 3 3
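For reference, here is a self-contained sketch of the same idea, assuming the sample frame from the question (the DataFrame construction is my own reconstruction of that data). Note that numpy.where reassigns minute 780 to hour 12 rather than counting it in both hours, which is why both groups come out as 3.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed layout)
df = pd.DataFrame({
    "user": [1] * 6,
    "minutes": [778, 779, 780, 781, 782, 783],
    "hour": [12, 12, 13, 13, 13, 13],
})

# Move rows whose minute is a multiple of 60 into the previous hour
df["hour"] = np.where(df["minutes"] % 60 == 0, df["hour"] - 1, df["hour"])

# Count the minutes in each hour
counts = df.groupby("hour")["minutes"].count()
print(counts)
```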
CodePudding user response:
IIUC, you can use divmod to get the quotient and remainder of the division by 60, and add the missing rows with pandas.concat:
import pandas as pd
# s1 is minutes // 60, s2 is minutes % 60
s1, s2 = df['minutes'].divmod(60)
(pd
.concat([df,
# select rows where minutes is a multiple of 60
# and matching hour is not already the previous hour
# decrement hour
df[s1.ne(df['hour']-1)&s2.eq(0)].eval('hour = hour-1')
])
.groupby('hour')
.agg({'hour': 'first', 'minutes': 'count'})
)
Or, if you already know that hour and minutes are consistent, simplify by using only mod:
mask = df['minutes'].mod(60).eq(0)
(pd
.concat([df,
df[mask].eval('hour = hour-1')
])
.groupby('hour')
.agg({'hour': 'first', 'minutes': 'count'})
)
Output:
hour minutes
hour
12 12 3
13 13 4
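As a self-contained sketch of the concat approach, assuming the sample frame from the question (the DataFrame construction is my own reconstruction, and I use assign instead of eval and group on user as well, which are choices of mine rather than part of the answer above):

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout)
df = pd.DataFrame({
    "user": [1] * 6,
    "minutes": [778, 779, 780, 781, 782, 783],
    "hour": [12, 12, 13, 13, 13, 13],
})

# Rows sitting exactly on an hour boundary (minute is a multiple of 60)
boundary = df["minutes"].mod(60).eq(0)

# Copy those rows into the previous hour so they are counted in both hours
extra = df[boundary].assign(hour=lambda d: d["hour"] - 1)

result = (
    pd.concat([df, extra], ignore_index=True)
    .groupby(["user", "hour"], as_index=False)["minutes"]
    .count()
)
print(result)
```

This keeps everything vectorized, so it should scale to a large frame; the only extra cost is the duplicated boundary rows, which is at most one row per user per hour.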