Groupby specific hour interval per day

Time:06-01

I have a data frame that looks something like:

df = 

                     val1     val2
date
----------------------------------
2021-06-01 00:00     1        3
2021-06-01 01:00     2        4
2021-06-01 02:00     3        5
2021-06-01 03:00     4        6
2021-06-01 04:00     5        7
2021-06-02 00:00     6        8
2021-06-02 01:00     7        9
2021-06-02 02:00     8        10
2021-06-02 03:00     9        11
2021-06-02 04:00     10       12
2021-06-03 00:00     11       13
2021-06-03 01:00     12       14
2021-06-03 02:00     13       15
2021-06-03 03:00     14       16
2021-06-03 04:00     15       17
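To try the answers below, the sample frame can be rebuilt with a short sketch like this. It assumes the index is a DatetimeIndex named `date`, which the printed table suggests but the question does not state explicitly.

```python
import pandas as pd

# Rebuild the sample frame: June 1-3, 2021, one row per hour from 00:00 to 04:00.
# Assumption: the index is a DatetimeIndex named "date".
idx = pd.DatetimeIndex(
    [f"2021-06-0{d} 0{h}:00" for d in (1, 2, 3) for h in range(5)],
    name="date",
)
df = pd.DataFrame({"val1": range(1, 16), "val2": range(3, 18)}, index=idx)
print(df.head())
```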

If I then do something like:

df_new = df.groupby(df.index.date).mean()

The resulting data frame will be:

df_new = 

                     val1     val2
date
----------------------------------
2021-06-01           3        5
2021-06-02           8        10
2021-06-03           13       15

However, I would like the mean to be taken only over a specific time interval within each day, for example from 00:00 to 03:00, excluding the 04:00 hour. The resulting data frame should then be:

df_true = 

                     val1     val2
date
----------------------------------
2021-06-01           2.5      4.5
2021-06-02           7.5      9.5
2021-06-03           12.5     14.5

Answer:

One approach is to filter the rows with boolean indexing first, then group by day with pd.Grouper in groupby, or use DataFrame.resample:

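import pandas as pd

# Keep only rows before 04:00, then average per calendar day
df_new = df[df.index.hour < 4].groupby(pd.Grouper(freq='D')).mean()
# Equivalent, using resample
df_new = df[df.index.hour < 4].resample('D').mean()
print(df_new)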
            val1  val2
date                  
2021-06-01   2.5   4.5
2021-06-02   7.5   9.5
2021-06-03  12.5  14.5
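Both lines in this answer give the same result for the sample data because every day has rows in the 00:00-03:00 window. One difference worth knowing when choosing between resample and the question's groupby(df.index.date): resample('D') creates a bin for every calendar day between the first and last timestamp, so a day with no data shows up as a NaN row, whereas grouping by the date values only produces days that are present. A minimal sketch, using a hypothetical copy of the data with 2021-06-02 removed:

```python
import pandas as pd

# Hypothetical variant of the sample data in which 2021-06-02 is missing entirely.
idx = pd.DatetimeIndex(
    [f"2021-06-0{d} 0{h}:00" for d in (1, 3) for h in range(5)], name="date"
)
df = pd.DataFrame({"val1": range(1, 11), "val2": range(3, 13)}, index=idx)

filtered = df[df.index.hour < 4]
# resample('D') makes one bin per calendar day in the range, so the gap becomes a NaN row.
by_resample = filtered.resample("D").mean()
# Grouping by the date values only creates groups for days that actually have rows.
by_date = filtered.groupby(filtered.index.date).mean()
print(by_resample)
print(by_date)
```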

Or use DataFrame.between_time, which selects rows by time of day:

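# between_time includes both endpoints by default, so the 03:00 row is kept
df_new = df.between_time('0:00', '3:00').groupby(pd.Grouper(freq='D')).mean()
# Equivalent, using resample
df_new = df.between_time('0:00', '3:00').resample('D').mean()
print(df_new)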
            val1  val2
date                  
2021-06-01   2.5   4.5
2021-06-02   7.5   9.5
2021-06-03  12.5  14.5
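The two filters in this answer differ slightly when timestamps are not on the hour: df.index.hour < 4 keeps everything before 04:00 (including, say, 03:30), while between_time('0:00', '3:00') stops at exactly 03:00. The sketch below uses a hypothetical extra reading at 03:30 to show this, plus a half-open window via between_time's `inclusive` argument (available from pandas 1.4) that matches the hour-based filter.

```python
import pandas as pd

# Hypothetical single day with an extra reading at 03:30 (value 10).
idx = pd.DatetimeIndex(
    ["2021-06-01 00:00", "2021-06-01 01:00", "2021-06-01 02:00",
     "2021-06-01 03:00", "2021-06-01 03:30", "2021-06-01 04:00"],
    name="date",
)
df = pd.DataFrame({"val1": [1, 2, 3, 4, 10, 5]}, index=idx)

# hour < 4 keeps every row before 04:00, so 03:30 is included.
by_hour = df[df.index.hour < 4].resample("D").mean()
# between_time('0:00', '3:00') ends at 03:00 (both ends included by default), so 03:30 is dropped.
closed = df.between_time("0:00", "3:00").resample("D").mean()
# Half-open window [00:00, 04:00); the inclusive argument requires pandas >= 1.4.
half_open = df.between_time("0:00", "4:00", inclusive="left").resample("D").mean()
print(by_hour, closed, half_open, sep="\n")
```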

Answer:

I like @jezrael's method better, but sticking close to your current code, you can do this:

# Boolean mask for hours 00-03 (hour >= 0 is always true; it only makes the window explicit)
m = (df.index.hour >= 0) & (df.index.hour <= 3)
df[m].groupby(df[m].index.date).mean()

Output:

            val1  val2
2021-06-01   2.5   4.5
2021-06-02   7.5   9.5
2021-06-03  12.5  14.5
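One practical difference between this approach and the resample-based one is the type of the resulting index: grouping by df.index.date yields an object index of plain datetime.date values, while resample keeps a DatetimeIndex. If a DatetimeIndex is preferred while staying close to this code, grouping by index.normalize() (midnight of each timestamp) is one option; this is a swapped-in alternative, not part of the original answers. A minimal sketch, rebuilding the sample frame under the assumption that its index is a DatetimeIndex:

```python
import datetime

import pandas as pd

# Sample frame from the question (assumed DatetimeIndex named "date").
idx = pd.DatetimeIndex(
    [f"2021-06-0{d} 0{h}:00" for d in (1, 2, 3) for h in range(5)], name="date"
)
df = pd.DataFrame({"val1": range(1, 16), "val2": range(3, 18)}, index=idx)

m = (df.index.hour >= 0) & (df.index.hour <= 3)
# Grouping by .date gives an index of datetime.date objects (object dtype).
by_date = df[m].groupby(df[m].index.date).mean()
# Grouping by .normalize() keeps a DatetimeIndex, like resample('D') does.
by_day = df[m].groupby(df[m].index.normalize()).mean()
print(by_date.index.dtype, by_day.index.dtype)
```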