Is it possible to select all the rows within the same date and hour of a datetime, then take the mean of each of the columns of all these rows and hereby create a new row with the minutes and seconds rounded to 00:00 and where the columns contains the mean values?
Our dataframe looks like the following:
datetime name temperature humidity wind speed
2020-02-01 00:04:12 air 8.2 87 0.4
2020-02-01 00:22:54 air 7.8 84 0.3
2020-02-01 00:47:32 air 7.5 84 0.4
2020-02-01 01:09:35 air 7.2 86 0.1
2020-02-01 01:58:02 air 6.9 80 0.0
2020-02-01 02:14:51 air 6.8 81 0.2
...
What we want and therefore what the dataframe should be changed to:
datetime name temperature humidity wind speed
2020-02-01 00:00:00 air 7.83 85 0.36
2020-02-01 01:00:00 air 7.05 83 0.05
2020-02-01 02:00:00 air 6.8 81 0.2
...
We imagine that a proper solution would be to use the groupby method, where we can group by date hour, but we can't find a proper way of getting the output described above
df = pd.read_csv("file.csv")
df.groupby({date hour}).mean()
CodePudding user response:
df.groupby([df.datetime.astype(str).str[:13] ":00:00", df.name]).mean().reset_index()
CodePudding user response:
First you'd need to have your datetime field as a DatetimeIndex so it can by split to intervals automatically, since relying on string manipulations is error-prone in cases your timestamp field is malformed or the format changes.
You can do this by running:
df['datetime'] = pd.to_datetime(df['datetime'])
Then groupby and resample your dataframe like this (H
will do by hour):
df.groupby('name').resample('H', on='datetime').mean()
example:
data = {
'datetime':['2020-02-01 00:04:12', '2020-02-01 00:22:54', '2020-02-01 01:22:54'],
'temp':[10,20,30],
'other_metric':[100,200,300],
'name':['air','air','air']
}
df = pd.DataFrame(data)
df['datetime'] = pd.to_datetime(df['datetime'])
df.groupby('name').resample('H', on='datetime').mean()
will turn this:
datetime temp other_metric name
2020-02-01 00:04:12 10 100 air
2020-02-01 00:22:54 20 200 air
2020-02-01 01:22:54 30 300 air
into that:
temp other_metric
name datetime
air 2020-02-01 00:00:00 15.0 150.0
2020-02-01 01:00:00 30.0 300.0