Home > Blockchain >  Don't make a second level when aggregating in Pandas
Don't make a second level when aggregating in Pandas

Time:12-24

Indexes and levels in Pandas still drive me nuts when using it. My dataframe structure looks like that:

Indexstruktur   Datensatz   Workload    Selectivity     Dimensionen     Anzahl Datenpunkte  Operation   Measure     Wert

I now group by all columns except the last one ("Wert"). This one will be used to aggregate, where I calculate average and standard deviation:

mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"]).agg({"Wert":['mean','std']})
mean_df.reset_index(inplace=True)

The result are two levels:

index   Indexstruktur   Datensatz   Workload    Selectivity     Dimensionen     Anzahl Datenpunkte  Operation   Measure     Wert
                                                                                                                       mean     std

How can I get rid of "Wert" and just make "avg" and "std" two columns in the same level as the rest?

mean_df.columns returns:

MultiIndex([(     'Indexstruktur',     ''),
            (         'Datensatz',     ''),
            (          'Workload',     ''),
            (       'Selectivity',     ''),
            (       'Dimensionen',     ''),
            ('Anzahl Datenpunkte',     ''),
            (         'Operation',     ''),
            (           'Measure',     ''),
            (              'Wert', 'mean'),
            (              'Wert',  'std')],
           )

I tried reset_index, droplevel, as_index=False, but nothing changes anything. What is the solution for this?

CodePudding user response:

Maybe this should work for you mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"])["Wert"].agg(['mean','std']).reset_index() This would still have "Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure" as columns though, but also agg and std. If you don't want that just do mean_df = df.groupby(["Indexstruktur", "Datensatz","Workload", "Selectivity", "Dimensionen", "Anzahl Datenpunkte", "Operation", "Measure"])["Wert"].agg(['mean','std']).reset_index(drop=True)

  • Related