Group by timestamp and get mean Dataframe

Time:12-13

I generate a CSV file for each week. The weekly files are then merged into one. Loaded into a DataFrame, the merged CSV looks like this:

  machineId    |       id     |     mean      |   min         |    max
  machine1     |        2     | 00:00:03.47   | 00:00:00.02   |   00:00:06.11
  machine1     |        1     | 00:00:01.30   | 00:00:00.74   |   00:00:01.86
  machine1     |        2     | 00:00:00.35   | 00:00:00.01   |   00:00:00.99
  machine1     |        2     | 00:00:01.63   | 00:00:00.67   |   00:00:02.60
  machine1     |        3     | 00:00:00.66   | 00:00:00.03   |   00:00:01.91

Then I want to group the rows that share the same machineId and id and calculate the mean of each column. The first, third and fourth rows should be grouped together and the average of their columns calculated.

I already used this method:

df = df.groupby(['machineId','id']).agg({'mean': 'mean','min':'mean','max':'mean'})

but there is an error:

TypeError: Could not convert 00:00:03.47 to numeric

CodePudding user response:

To aggregate datetimes you need the native Unix format, so first convert the values to numeric, aggregate, and then convert back to the original format:

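import pandas as pd

cols = ['mean','min','max']

# parse the time strings and convert them to integer nanoseconds
df[cols] = df[cols].apply(pd.to_datetime).astype('int64')

# average the numeric values per machineId/id group
df = (df.groupby(['machineId','id'], as_index=False)
         .agg({'mean': 'mean','min':'mean','max':'mean'}))

# convert the averaged nanoseconds back to the original string format
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x).dt.strftime('%H:%M:%S.%f'))
print(df)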

  machineId  id             mean              min              max
0  machine1   1  00:00:01.300000  00:00:00.740000  00:00:01.860000
1  machine1   2  00:00:01.816666  00:00:00.233333  00:00:03.233333
2  machine1   3  00:00:00.660000  00:00:00.030000  00:00:01.910000
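Since the values look like durations rather than points in time, a possible alternative is to parse them with pd.to_timedelta and average the seconds. The following is only a sketch, assuming the columns hold strings in HH:MM:SS.ff format; the sample frame is rebuilt from the table in the question, and the helper name mean_durations is made up for illustration:

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "machineId": ["machine1"] * 5,
    "id": [2, 1, 2, 2, 3],
    "mean": ["00:00:03.47", "00:00:01.30", "00:00:00.35", "00:00:01.63", "00:00:00.66"],
    "min": ["00:00:00.02", "00:00:00.74", "00:00:00.01", "00:00:00.67", "00:00:00.03"],
    "max": ["00:00:06.11", "00:00:01.86", "00:00:00.99", "00:00:02.60", "00:00:01.91"],
})


def mean_durations(frame, keys, cols):
    """Hypothetical helper: average duration strings per group, in seconds."""
    out = frame.copy()
    # Parse each column as a timedelta and express it as float seconds,
    # so the group mean is an ordinary numeric mean.
    out[cols] = out[cols].apply(lambda s: pd.to_timedelta(s).dt.total_seconds())
    return out.groupby(keys, as_index=False)[cols].mean()


cols = ["mean", "min", "max"]
result = mean_durations(df, ["machineId", "id"], cols)
print(result)

# Optional: turn the averaged seconds back into Timedelta values.
as_timedelta = result.copy()
as_timedelta[cols] = as_timedelta[cols].apply(lambda s: pd.to_timedelta(s, unit="s"))
print(as_timedelta)
```

This keeps the averages as plain seconds, which avoids attaching a calendar date to the values; whether that is preferable to the formatted strings above depends on how the result is used afterwards.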