I generate a CSV file for each week. The weekly files are then merged into one. The merged CSV, loaded into a DataFrame, looks like this:
machineId | id | mean | min | max
machine1 | 2 | 00:00:03.47 | 00:00:00.02 | 00:00:06.11
machine1 | 1 | 00:00:01.30 | 00:00:00.74 | 00:00:01.86
machine1 | 2 | 00:00:00.35 | 00:00:00.01 | 00:00:00.99
machine1 | 2 | 00:00:01.63 | 00:00:00.67 | 00:00:02.60
machine1 | 3 | 00:00:00.66 | 00:00:00.03 | 00:00:01.91
Then I want to group the rows that share the same machineId and id and calculate the mean of each column. The first, third and fourth rows should be grouped together, and the average of their columns should be calculated.
I already used this method:
df = df.groupby(['machineId','id']).agg({'mean': 'mean','min':'mean','max':'mean'})
but there is an error:
TypeError: Could not convert 00:00:03.47 to numeric
CodePudding user response:
To aggregate datetimes, you need the native Unix (integer) format, so first convert the values to numeric, aggregate, and then convert back to the original format:
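import pandas as pd

cols = ['mean','min','max']
# parse the time strings as datetimes, then convert them to integer nanoseconds
df[cols] = df[cols].apply(pd.to_datetime).astype('int64')
# average the integer values per (machineId, id) group
df = (df.groupby(['machineId','id'], as_index=False)
        .agg({'mean': 'mean','min':'mean','max':'mean'}))
# convert the averaged values back to time strings
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x).dt.strftime('%H:%M:%S.%f'))
print(df)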
  machineId  id             mean              min              max
0  machine1   1  00:00:01.300000  00:00:00.740000  00:00:01.860000
1  machine1   2  00:00:01.816666  00:00:00.233333  00:00:03.233333
2  machine1   3  00:00:00.660000  00:00:00.030000  00:00:01.910000
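Since the values look like durations rather than clock times, a possible alternative is to parse them with pd.to_timedelta and average the total seconds. The sketch below is only an illustration under that assumption: it rebuilds the sample data from the question, and the helper name fmt and the rounding to two decimals (to match the input format) are my own choices, not part of the original approach.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "machineId": ["machine1"] * 5,
    "id": [2, 1, 2, 2, 3],
    "mean": ["00:00:03.47", "00:00:01.30", "00:00:00.35", "00:00:01.63", "00:00:00.66"],
    "min": ["00:00:00.02", "00:00:00.74", "00:00:00.01", "00:00:00.67", "00:00:00.03"],
    "max": ["00:00:06.11", "00:00:01.86", "00:00:00.99", "00:00:02.60", "00:00:01.91"],
})
cols = ["mean", "min", "max"]

# Parse the strings as durations and express them as float seconds
secs = df[cols].apply(lambda s: pd.to_timedelta(s).dt.total_seconds())

# Average the seconds per (machineId, id) group
out = (df[["machineId", "id"]].join(secs)
         .groupby(["machineId", "id"], as_index=False)[cols].mean())

def fmt(seconds):
    # Hypothetical helper: float seconds -> 'HH:MM:SS.ff', rounded to centiseconds
    total_cs = int(round(float(seconds) * 100))
    h, rem = divmod(total_cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h:02d}:{m:02d}:{s:02d}.{cs:02d}"

# Convert the averaged seconds back to strings in the input format
out[cols] = out[cols].apply(lambda s: s.map(fmt))
print(out)
```

This avoids attaching today's date to the values, which is what pd.to_datetime does when it parses a time-only string. The results are rounded, so they can differ in the last digit from the truncated microsecond output above.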