Home > Enterprise >  Group by on the columns of a data frame and on the values ​of a dictionary
Group by on the columns of a data frame and on the values ​of a dictionary

Time:12-22

I have a data frame, df, like:

Year    Month      Country    Organizer      Participation
2020      1         China     FAO                True
2020      1         Japan     FAO                True
2020      1         France    EU                 False
2020      2         France    FAO                False
2020      2         Japan     FAO                True
2020      2         Germany   EU                 True
2020      2         Finland   EU                 False
2020      2         India     FAO                True
2020      3         Senegal   FAO                True

and a dictionary, codes:

codes = {
    'Asia': ['China', 'Japan', 'India'],
    'Europe': ['France', 'Germany', 'Finland'],
    'Africa': ['Senegal']
}

My goal is to perform a group by considering the fields 'Year', 'Month', 'Organizer' and the continent (in the df I have 'Country'), counting the number of rows. 'Participation' is not important.

To be more precise my final output should be:

Year    Month      Organizer    Continent    Number
2020      1         FAO          Asia          2
2020      1         EU           Europe        1
2020      2         FAO          Europe        1
2020      2         FAO          Asia          2
2020      2         EU           Europe        2
2020      3         FAO          Africa        1  

Is there a compact way to do this? The alternative way would be to add a column 'Continent' and also use this column in the group by, but I'm afraid this is inefficient and inelegant solution.

CodePudding user response:

You can use:

new_codes={i:k for k,v in codes.items() for i in v} #{'China': 'Asia', 'Japan': 'Asia', 'India': 'Asia', 'France': 'Europe', 'Germany': 'Europe', 'Finland': 'Europe', 'Senegal': 'Africa'}
df['Continent'] = df['Country'].replace(new_codes)
dfx = df.groupby(['Year','Month','Organizer','Continent']).agg(number=('Continent',"count"))

Output:

   Year  Month Organizer Continent  number
0  2020      1        EU    Europe       1
1  2020      1       FAO      Asia       2
2  2020      2        EU    Europe       2
3  2020      2       FAO      Asia       2
4  2020      2       FAO    Europe       1
5  2020      3       FAO    Africa       1

CodePudding user response:

Do a map using the dictionary to create a column 'Continent' and then do value_counts.

out = df.assign(
    Continent=df.Country.map({v: key for key, values in codes.items() for v in values})
).drop(
    ["Country", "Participation"], axis=1
).value_counts().to_frame().reset_index().rename(
    {0: "Number"}, axis="columns"
)

print(out)

   Year  Month Organizer Continent  Number
0  2020      1       FAO      Asia       2
1  2020      2        EU    Europe       2
2  2020      1        EU    Europe       1
3  2020      2       FAO      Asia       1
4  2020      2       FAO    Europe       1
5  2020      3       FAO    Africa       1
  • Related