I have some data where measurements are taken every 15 minutes during the day. There could be times where the measurements are not available, but generally the equipment should measure every 15 minutes.
The dataset is as follows:
timestamp,concentrate
2019-08-01 00:00:00,872.0
2019-08-01 00:15:00,668.0
2019-08-01 00:30:00,604.0
2019-08-01 00:45:00,788.0
2019-08-01 01:00:00,608.0
2019-08-01 01:15:00,692.0
2019-08-01 01:30:00,716.0
2019-08-01 01:45:00,692.0
2019-08-01 02:00:00,672.0
2019-08-01 02:15:00,636.0
2019-08-01 02:30:00,596.0
2019-08-01 02:45:00,748.0
2019-08-01 03:00:00,424.0
2019-08-01 03:15:00,596.0
2019-08-01 03:30:00,936.0
2019-08-01 03:45:00,976.0
2019-08-01 04:00:00,912.0
2019-08-01 04:15:00,1100.0
2019-08-01 04:30:00,1312.0
2019-08-01 04:45:00,1904.0
2019-08-01 05:00:00,2232.0
2019-08-01 05:15:00,3104.0
2019-08-01 05:30:00,4100.0
2019-08-01 05:45:00,4836.0
2019-08-01 06:00:00,4476.0
2019-08-01 06:15:00,4032.0
2019-08-01 06:30:00,1744.0
2019-08-01 06:45:00,2416.0
2019-08-01 07:00:00,2396.0
2019-08-01 07:15:00,1400.0
2019-08-01 07:30:00,5336.0
2019-08-01 07:45:00,4872.0
2019-08-01 08:00:00,5820.0
2019-08-01 08:15:00,5376.0
2019-08-01 08:30:00,5528.0
2019-08-01 08:45:00,5344.0
2019-08-01 09:00:00,5356.0
2019-08-01 09:15:00,5036.0
2019-08-01 09:30:00,5116.0
...
Now what I want to do is compute the mean and variance of the data grouped by time of day, i.e. across all the days, the average at 00:00:00, 00:15:00, 00:30:00, 00:45:00, 01:00:00, etc.
How can I group and average based on the different times of the day?
CodePudding user response:
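You can pass df.timestamp.dt.time to df.groupby. Note that df.timestamp.dt.minute would only give four groups (0, 15, 30, 45), merging every hour together, whereas dt.time keeps each time of day (00:00:00, 00:15:00, ...) as its own group:
import pandas as pd
df.timestamp = pd.to_datetime(df.timestamp)
# one row per time of day, with the mean and variance across all days
df.groupby(df.timestamp.dt.time)["concentrate"].agg(["mean", "var"])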
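For a self-contained check, here is a minimal sketch of the same idea. It assumes the column names from the question (timestamp, concentrate). The two rows for 2019-08-01 come from the question, while the two rows for 2019-08-02 are made-up values added only so that each time of day has more than one observation.

```python
import io

import pandas as pd

# First two rows are from the question; the 2019-08-02 rows are
# hypothetical values, added so each time of day has two samples.
csv_text = """timestamp,concentrate
2019-08-01 00:00:00,872.0
2019-08-01 00:15:00,668.0
2019-08-02 00:00:00,900.0
2019-08-02 00:15:00,700.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Group by the time-of-day component (datetime.time objects), so that
# 00:00:00 on every day falls into the same group.
stats = (
    df.groupby(df["timestamp"].dt.time)["concentrate"]
      .agg(["mean", "var"])
)

print(stats)
```

Missing measurements should not need special handling here: a time of day that was skipped on some days is simply averaged over fewer rows. Note that var is the sample variance (ddof=1) by default, so a time of day with a single observation gives NaN.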