Counting eligible minutes with hours as a group

I want to count the number of eligible minutes per hour, using the hour as the group. Since I am working with a large dataset, is it possible to get this output with a pandas aggregate function?

user minutes hour
1    778     12
1    779     12
1    780     13
1    781     13
1    782     13
1    783     13

I tried using df.groupby('hour').count(), but the output is not what I want:

user hour minutes
1    12   2
1    13   4

The output I want is shown below, because the 780th minute should be counted in hour 12 as well as in hour 13.

user hour minutes
1    12   3
1    13   4

Your help is greatly appreciated.

CodePudding user response：

Use numpy.where and Groupby.count:

In [2396]: import numpy as np

# If the minute is divisible by 60, move that row to the previous hour
In [2397]: df['hour'] = np.where(df['minutes'] % 60 == 0, df['hour'] - 1, df['hour'])

# Group on hour and count
In [2398]: df.groupby('hour').count()
Out[2398]: 
      user  minutes
hour               
12       3        3
13       3        3
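For reference, here is a minimal, self-contained sketch of the same numpy.where idea run on the sample data from the question. The names on_boundary, moved and counts are hypothetical, and assign is used so the original df is left unchanged. Note that this approach reassigns minute 780 to hour 12 rather than counting it in both hours, which is why hour 13 ends up with 3 and not the 4 asked for in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user": [1] * 6,
    "minutes": [778, 779, 780, 781, 782, 783],
    "hour": [12, 12, 13, 13, 13, 13],
})

# True for rows whose minute falls exactly on an hour boundary (e.g. 780)
on_boundary = df["minutes"] % 60 == 0

# Move boundary rows to the previous hour without modifying df in place
moved = df.assign(hour=np.where(on_boundary, df["hour"] - 1, df["hour"]))

# Count the minutes in each hour
counts = moved.groupby("hour")["minutes"].count()
print(counts)
```

With the sample data, hour 12 and hour 13 each get a count of 3.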

CodePudding user response：

IIUC, you can use divmod to get the division and remainder by 60 and add missing rows with pandas.concat:

import pandas as pd

# quotient and remainder of minutes divided by 60
s1, s2 = df['minutes'].divmod(60)
(pd
 .concat([df,
          # select rows where minutes is a multiple of 60
          # and the hour is not already the previous hour,
          # then decrement the hour on those copies
          df[s1.ne(df['hour'] + 1) & s2.eq(0)].eval('hour = hour - 1')
          ])
 .groupby('hour')
 .agg({'hour': 'first', 'minutes': 'count'})
)
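To make the divmod step concrete, here is a small sketch of what Series.divmod(60) returns for the minute values in the question. The variable names quotient and remainder are hypothetical. The quotient is the hour implied by the minute, and a remainder of 0 marks a minute that sits exactly on an hour boundary.

```python
import pandas as pd

# Minute values copied from the question
minutes = pd.Series([778, 779, 780, 781, 782, 783])

# divmod returns a tuple of two Series: floor division and remainder
quotient, remainder = minutes.divmod(60)

print(quotient.tolist())   # [12, 12, 13, 13, 13, 13]
print(remainder.tolist())  # [58, 59, 0, 1, 2, 3]
```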

Or, if you already know that hour and minutes are consistent with each other, simplify by checking only the remainder:

mask = df['minutes'].mod(60).eq(0)
(pd
 .concat([df,
          df[mask].eval('hour = hour-1')
          ])
 .groupby('hour')
 .agg({'hour': 'first', 'minutes': 'count'})
)

output:

      hour  minutes
hour               
12      12        3
13      13        4
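A self-contained sketch of the duplication approach on the sample data follows. It departs from the snippets above in two ways that are assumptions on my part: it groups by user as well as hour (the desired output in the question has a user column), and it uses named aggregation instead of aggregating the grouping key. The names on_boundary, extra and result are hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user": [1] * 6,
    "minutes": [778, 779, 780, 781, 782, 783],
    "hour": [12, 12, 13, 13, 13, 13],
})

# Rows whose minute falls exactly on an hour boundary (e.g. 780)
on_boundary = df["minutes"].mod(60).eq(0)

# Copies of the boundary rows, assigned to the previous hour
extra = df[on_boundary].assign(hour=lambda d: d["hour"] - 1)

# Append the copies so boundary minutes are counted in both hours,
# then count minutes per (user, hour)
result = (
    pd.concat([df, extra], ignore_index=True)
      .groupby(["user", "hour"], as_index=False)
      .agg(minutes=("minutes", "count"))
)
print(result)
```

On the sample data this gives 3 minutes for hour 12 and 4 for hour 13, matching the output requested in the question.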