Home > Mobile >  How to group datetime in date hour and take mean of the rows using Python
How to group datetime in date hour and take mean of the rows using Python

Time:12-18

Is it possible to select all the rows within the same date and hour of a datetime, then take the mean of each of the columns of all these rows and hereby create a new row with the minutes and seconds rounded to 00:00 and where the columns contains the mean values?

Our dataframe looks like the following:

datetime               name    temperature    humidity    wind speed
2020-02-01 00:04:12    air     8.2            87          0.4
2020-02-01 00:22:54    air     7.8            84          0.3
2020-02-01 00:47:32    air     7.5            84          0.4
2020-02-01 01:09:35    air     7.2            86          0.1
2020-02-01 01:58:02    air     6.9            80          0.0
2020-02-01 02:14:51    air     6.8            81          0.2
...

What we want and therefore what the dataframe should be changed to:

datetime               name    temperature    humidity    wind speed
2020-02-01 00:00:00    air     7.83           85          0.36
2020-02-01 01:00:00    air     7.05           83          0.05
2020-02-01 02:00:00    air     6.8            81          0.2
...

We imagine that a proper solution would be to use the groupby method, where we can group by date hour, but we can't find a proper way of getting the output described above

df = pd.read_csv("file.csv")
df.groupby({date   hour}).mean()

CodePudding user response:

df.groupby([df.datetime.astype(str).str[:13]   ":00:00", df.name]).mean().reset_index()

CodePudding user response:

First you'd need to have your datetime field as a DatetimeIndex so it can by split to intervals automatically, since relying on string manipulations is error-prone in cases your timestamp field is malformed or the format changes.

You can do this by running:

df['datetime'] = pd.to_datetime(df['datetime'])

Then groupby and resample your dataframe like this (H will do by hour):

df.groupby('name').resample('H', on='datetime').mean()

example:

data = {
    'datetime':['2020-02-01 00:04:12', '2020-02-01 00:22:54', '2020-02-01 01:22:54'],
    'temp':[10,20,30], 
    'other_metric':[100,200,300],
    'name':['air','air','air']
}
df = pd.DataFrame(data)
df['datetime'] = pd.to_datetime(df['datetime'])
df.groupby('name').resample('H', on='datetime').mean()

will turn this:

             datetime  temp  other_metric name
2020-02-01 00:04:12    10           100  air
2020-02-01 00:22:54    20           200  air
2020-02-01 01:22:54    30           300  air

into that:

                          temp  other_metric
name datetime
air  2020-02-01 00:00:00  15.0         150.0
     2020-02-01 01:00:00  30.0         300.0
  • Related