I have a data frame with a column, 'start_date', that records the date and time from January to December. I want to group the data by hour and find the mean. When I use .resample('H') it groups into hours separately for each month, but I want each hour to be grouped across all the months.
CodePudding user response:
I'm not sure what you are asking for. If you include an actual example of the data you are working with next time, you are more likely to get the help you need.
My guess is that you have something like the following:
import pandas as pd

df = pd.DataFrame({
'start_time': ['2022-01-01 08:17:23.12', '2022-02-01 08:22:58.76', '2022-02-01 08:19:02.57', '2022-01-01 08:55:43.99','2022-01-01 08:41:23.10', '2022-01-01 09:14:59.99', '2022-02-01 09:15:02.02', '2022-01-01 09:44:43.30','2022-02-01 09:54:23.71', '2022-02-01 10:15:00.00', '2022-01-01 10:15:02.99', '2022-01-01 10:19:43.52'],
'score': [2, 1, 3, 3, 5, 4, 6, 6, 4, 10, 9, 14],
})
and that you want the averages to be per hour regardless of month. Then I'd do something like
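# Round each timestamp to the nearest hour, then keep only the time of day.
df["start_hour"] = pd.to_datetime(df["start_time"]).dt.round("1h").dt.time
# Average per hour; numeric_only=True skips the string column start_time.
df = df.groupby("start_hour").mean(numeric_only=True)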
This first rounds each timestamp to the nearest hour (so 08:55 counts as 09:00) and then keeps only the time of day. The result is
            score
start_hour
08:00:00      2.0
09:00:00      4.5
10:00:00      8.6
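Note that rounding puts 08:55 into the 09:00 group. If by "hour" you instead mean the clock-hour bucket (everything from 08:00 to 08:59 together), a minimal sketch could look like the following. It assumes the same guessed sample data as above; the names start_hour and hourly_mean are only for illustration.

```python
import pandas as pd

# Same guessed sample data as in the answer above.
df = pd.DataFrame({
    "start_time": [
        "2022-01-01 08:17:23.12", "2022-02-01 08:22:58.76",
        "2022-02-01 08:19:02.57", "2022-01-01 08:55:43.99",
        "2022-01-01 08:41:23.10", "2022-01-01 09:14:59.99",
        "2022-02-01 09:15:02.02", "2022-01-01 09:44:43.30",
        "2022-02-01 09:54:23.71", "2022-02-01 10:15:00.00",
        "2022-01-01 10:15:02.99", "2022-01-01 10:19:43.52",
    ],
    "score": [2, 1, 3, 3, 5, 4, 6, 6, 4, 10, 9, 14],
})

# Take the clock hour (0-23) of each timestamp.
# .dt.hour truncates, so 08:55 stays in the 8 o'clock bucket.
start_hour = pd.to_datetime(df["start_time"]).dt.hour.rename("start_hour")

# Average score per clock hour, pooling all dates and months together.
hourly_mean = df.groupby(start_hour)["score"].mean()
print(hourly_mean)
```

With this data the means come out as 2.8 for hour 8, 5.0 for hour 9 and 11.0 for hour 10, which differs from the rounded version, so pick whichever definition of "hour" matches what you need.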