Indexes and levels in Pandas still drive me nuts when using it. My dataframe structure looks like that:
Indexstruktur Datensatz Workload Selectivity Dimensionen Anzahl Datenpunkte Operation Measure Wert
I now group by all columns except the last one ("Wert"). This one will be used to aggregate, where I calculate average and standard deviation:
mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"]).agg({"Wert":['mean','std']})
mean_df.reset_index(inplace=True)
The result are two levels:
index Indexstruktur Datensatz Workload Selectivity Dimensionen Anzahl Datenpunkte Operation Measure Wert
mean std
How can I get rid of "Wert" and just make "avg" and "std" two columns in the same level as the rest?
mean_df.columns returns:
MultiIndex([( 'Indexstruktur', ''),
( 'Datensatz', ''),
( 'Workload', ''),
( 'Selectivity', ''),
( 'Dimensionen', ''),
('Anzahl Datenpunkte', ''),
( 'Operation', ''),
( 'Measure', ''),
( 'Wert', 'mean'),
( 'Wert', 'std')],
)
I tried reset_index, droplevel, as_index=False, but nothing changes anything. What is the solution for this?
CodePudding user response:
Maybe this should work for you
mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"])["Wert"].agg(['mean','std']).reset_index()
This would still have
"Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure" as columns though, but also agg and std. If you don't want that just do
mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"])["Wert"].agg(['mean','std']).reset_index(drop=True)