Home > Software engineering >  Convert a GroupBy object with no aggregation function into a dataframe while keeping the grouping
Convert a GroupBy object with no aggregation function into a dataframe while keeping the grouping

Time:12-15

Say I have the following dataframe df

gender | country | age
------- --------- -----
M         DK       20
M         ES       19
M         ES       21
M         DK       30
F         DK       20
F         ES       19
F         ES       21
F         DK       12

Is there a way to get the Grouped data as a dataframe e.g

import pandas as pd
df_grp = df.group_by(["gender","country"])["age"] #Group it
df_grouped_df = pd.DataFrame(df_grp) #Convert to ordinary dataframe

print(df_grouped_df)

                   age
gender  country
------ ---------- 
                   20
 M      DK         30
        -------------
        ES         19
                   21

---------------------
         DK        20
F                  30
         ------------
         ES        19
                   21

EDIT:

Using set_index does not seem to work, since it does not group the second columns specified (e.g country);

df = pd.DataFrame({"gender":["M"]*4 ["F"]*4,"country":["DK","ES","ES","DK"]*2,"age":[1,2,3,4,5,6,7,8]})

 gender country age
    M   DK  1
    M   ES  2
    M   ES  3
    M   DK  4
    F   DK  5
    F   ES  6
    F   ES  7
    F   DK  8

df.set_index(["gender","country"])


                     age
   gender   country 
       M      DK        1
              ES        2
              ES        3
              DK        4
       F      DK        5
              ES        6
              ES        7
              DK        8

As noted it is not grouped for each country as required

CodePudding user response:

Probably not clean but has the visual aspect:

out = df.sort_values(['gender', 'country'], ascending=[False, True]).reset_index(drop=True)
out.loc[out.duplicated(['gender', 'country']), 'country'] = ''
out.loc[out.duplicated('gender'), 'gender'] = ''
print(out.set_index(['gender', 'country'])

# Output:
                age
gender country     
M      DK        20
                 30
       ES        19
                 21
F      DK        20
                 12
       ES        19
                 21

CodePudding user response:

First groupby in pandas is like iterator, so is necessary specify aggregate function(s) after it.

df1 = df.groupby(["gender","country"])["age"].sum().to_frame()

If need MultiIndex:

df1 = df.set_index(["gender","country"]).sort_index()

EDIT: You can check docs:

We’ve “sparsified” the higher levels of the indexes to make the console output a bit easier on the eyes. Note that how the index is displayed can be controlled using the multi_sparse option in pandas.set_options():

df1 = df.set_index(["gender","country"]).sort_index()

print (df1)
                age
gender country     
F      DK         5
       DK         8
       ES         6
       ES         7
M      DK         1
       DK         4
       ES         2
       ES         3

with pd.option_context("display.multi_sparse", False):
    print (df1)
                age
gender country     
F      DK         5
F      DK         8
F      ES         6
F      ES         7
M      DK         1
M      DK         4
M      ES         2
M      ES         3
  • Related