I have a data frame that looks something like:
df =
                  val1  val2
date
----------------------------------
2021-06-01 00:00     1     3
2021-06-01 01:00     2     4
2021-06-01 02:00     3     5
2021-06-01 03:00     4     6
2021-06-01 04:00     5     7
2021-06-02 00:00     6     8
2021-06-02 01:00     7     9
2021-06-02 02:00     8    10
2021-06-02 03:00     9    11
2021-06-02 04:00    10    12
2021-06-03 00:00    11    13
2021-06-03 01:00    12    14
2021-06-03 02:00    13    15
2021-06-03 03:00    14    16
2021-06-03 04:00    15    17
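To make the example reproducible, here is a minimal sketch that rebuilds the sample frame. It assumes the index is a DatetimeIndex named date with five hourly rows (00:00 to 04:00) per day; the question only shows the printed frame, so the construction itself is an assumption.

```python
import pandas as pd

# Assumed reconstruction of the question's data: 3 days x 5 hourly rows.
days = pd.date_range("2021-06-01", periods=3, freq="D")
index = pd.DatetimeIndex(
    [day + pd.Timedelta(hours=h) for day in days for h in range(5)], name="date"
)
df = pd.DataFrame({"val1": range(1, 16), "val2": range(3, 18)}, index=index)

print(df)
# Plain per-day mean over all five hours (the df_new from the question).
print(df.groupby(df.index.date).mean())
```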
If I then do something like:
df_new = df.groupby(df.index.date).mean()
The resulting data frame will be:
df_new =
            val1  val2
date
----------------------------------
2021-06-01     3     5
2021-06-02     8    10
2021-06-03    13    15
However, I would like this to happen only for a specific time interval on each day, in this case only from 00:00 to 03:00, excluding the 04:00 hour. Hence, the resulting data frame should instead be:
df_true =
            val1  val2
date
----------------------------------
2021-06-01   2.5   4.5
2021-06-02   7.5   9.5
2021-06-03  12.5  14.5
Answer:
The first idea is to filter rows with boolean indexing and then use Grouper in groupby, or DataFrame.resample:
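import pandas as pd
# keep only rows whose hour is 0-3, then average per calendar day
df_new = df[df.index.hour < 4].groupby(pd.Grouper(freq='D')).mean()
# equivalent, using resample on the DatetimeIndex
df_new = df[df.index.hour < 4].resample('D').mean()
print(df_new)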
            val1  val2
date
2021-06-01   2.5   4.5
2021-06-02   7.5   9.5
2021-06-03  12.5  14.5
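The boolean-indexing idea generalizes to any whole-hour window. A minimal sketch, where daily_window_mean is a hypothetical helper (not a pandas function), the index is assumed to be a DatetimeIndex, and the window is assumed half-open, i.e. [start_hour, end_hour):

```python
import pandas as pd


def daily_window_mean(frame: pd.DataFrame, start_hour: int, end_hour: int) -> pd.DataFrame:
    """Hypothetical helper: per-day column means using only rows whose hour
    lies in [start_hour, end_hour). Assumes frame has a DatetimeIndex."""
    hours = frame.index.hour
    mask = (hours >= start_hour) & (hours < end_hour)
    return frame[mask].resample("D").mean()


# Same sample data as in the question (assumed construction).
days = pd.date_range("2021-06-01", periods=3, freq="D")
index = pd.DatetimeIndex(
    [day + pd.Timedelta(hours=h) for day in days for h in range(5)], name="date"
)
df = pd.DataFrame({"val1": range(1, 16), "val2": range(3, 18)}, index=index)

result = daily_window_mean(df, 0, 4)  # 00:00 up to, but not including, 04:00
print(result)
```

Passing the bounds as parameters keeps the "which hours count" decision in one place instead of repeating the comparison in each call.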
Or use DataFrame.between_time:
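# between_time keeps rows from 00:00 to 03:00, both endpoints included by default
df_new = df.between_time('0:00', '3:00').groupby(pd.Grouper(freq='D')).mean()
# equivalent, using resample on the DatetimeIndex
df_new = df.between_time('0:00', '3:00').resample('D').mean()
print(df_new)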
            val1  val2
date
2021-06-01   2.5   4.5
2021-06-02   7.5   9.5
2021-06-03  12.5  14.5
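The two filters above agree on the hourly sample data but can differ if the timestamps are sub-hourly: between_time keeps both endpoints by default, so a window ending at 03:00 stops exactly at 03:00, while hour < 4 also keeps 03:30. A small sketch on hypothetical half-hourly data:

```python
import pandas as pd

# Hypothetical half-hourly timestamps on one day, 00:00 through 04:30.
idx = pd.date_range("2021-06-01 00:00", "2021-06-01 04:30", freq="30min")
s = pd.Series(range(len(idx)), index=idx)

# between_time includes both endpoints by default: 03:00 is kept, 03:30 is not.
by_time = s.between_time("00:00", "03:00")

# The hour-based mask keeps every timestamp whose hour is 0-3, including 03:30.
by_hour = s[s.index.hour < 4]

print(by_time.index.strftime("%H:%M").tolist())
print(by_hour.index.strftime("%H:%M").tolist())
```

If the intended window is 00:00 up to but not including 04:00, pandas 1.4 and later also accept between_time('00:00', '04:00', inclusive='left').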
Answer:
I like @jezrael's method better, but sticking close to your current code, you can do this:
# hour >= 0 is always true, so only the <= 3 bound actually filters
m = (df.index.hour >= 0) & (df.index.hour <= 3)
df[m].groupby(df[m].index.date).mean()
Output:
            val1  val2
2021-06-01   2.5   4.5
2021-06-02   7.5   9.5
2021-06-03  12.5  14.5
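The resample/Grouper versions and the groupby on index.date version also differ when a day has no rows inside the window: resample creates a bin for every day between the first and last remaining row, while grouping by date only creates groups for dates that survive the filter. A sketch with hypothetical data where June 2 only has a 05:00 reading; it also filters once and reuses the result instead of evaluating df[m] twice:

```python
import pandas as pd

# Hypothetical data: June 2 has a reading only at 05:00, outside the 00:00-03:00 window.
idx = pd.to_datetime(["2021-06-01 01:00", "2021-06-02 05:00", "2021-06-03 02:00"])
df = pd.DataFrame({"val1": [1.0, 2.0, 3.0]}, index=idx)

m = df.index.hour <= 3
filtered = df[m]  # filter once, reuse below

# resample emits a row for June 2 with NaN, since that daily bin is empty.
via_resample = filtered.resample("D").mean()

# Grouping by the date values simply has no group for June 2.
via_date = filtered.groupby(filtered.index.date).mean()

print(via_resample)
print(via_date)
```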