How to aggregate values in one column for the same values in a different one in pandas?

Time:09-08

I have a dataset looking like this:

country  company_name  company_size  company_activity
DE       McDonalds     50            food
FR       McDonalds     50            food
NL       7 eleven      5             food

I want to get it into this format:

country  company_name  company_size  company_activity
DE,FR    McDonalds     50            food
NL       7 eleven      5             food

I have this code:

df_cross = (
    df.groupby(["company_name"])
      .agg({"country": ",".join, "company_activity": ",".join, "company_size": "first"})
      .reset_index()
      .groupby(["country"])
      .agg({"company_name": "first", "company_activity": ",".join, "company_size": "first"})
      .reset_index()
)

This does not return my full dataset, though, and the code feels too long. Does anyone have a more elegant solution?

CodePudding user response:

Thanks to @RanA, this is the solution (note that the column names here differ from those in the sample above):

df_cross = (
    df.groupby(["organization_name"])
      .agg({
          "country": ",".join,
          "source_website": "first",
          "advertiser_type": "first",
          "organization_activity": ",".join,
          "organization_size": "first",
      })
      .reset_index()
)
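
One likely reason the chained version in the question drops rows: the second groupby on "country" keeps only one row per joined country string, so companies that share the same set of countries collapse into a single row. A minimal sketch of that effect, using the sample rows from the question plus one hypothetical company ("Example Shop") added purely for illustration:

```python
import pandas as pd

# Sample rows from the question, plus a hypothetical "Example Shop"
# that operates in the same country as "7 eleven".
df = pd.DataFrame({
    "country": ["DE", "FR", "NL", "NL"],
    "company_name": ["McDonalds", "McDonalds", "7 eleven", "Example Shop"],
    "company_size": [50, 50, 5, 20],
    "company_activity": ["food", "food", "food", "food"],
})

# First step: one row per company, countries joined with commas.
per_company = df.groupby("company_name", as_index=False).agg(
    {"country": ",".join, "company_size": "first", "company_activity": "first"}
)

# Second step from the question: one row per joined country string.
# "7 eleven" and "Example Shop" both have country == "NL", so they merge.
per_country = per_company.groupby("country", as_index=False).agg(
    {"company_name": "first", "company_size": "first", "company_activity": "first"}
)

print(len(per_company))  # 3 companies
print(len(per_country))  # 2 rows: "Example Shop" is gone
```

Dropping the second groupby, as in the solution above, keeps one row per company.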

CodePudding user response:

(df.groupby(["company_name"])
   .agg({"country": lambda x: ",".join(map(str, x.tolist())),
         "company_size": "first",
         "company_activity": "first"})
   .reset_index())

Output:

    company_name    country company_size    company_activity
0   7               NL      5               food
1   Mc              DE,FR   50              food
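
For reference, here is a self-contained sketch that rebuilds the sample data from the question and produces the requested layout. The helper name join_unique is made up for this example, and the choices to drop duplicates and missing values before joining and to keep the original row order (sort=False) are assumptions rather than requirements stated in the question:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "country": ["DE", "FR", "NL"],
    "company_name": ["McDonalds", "McDonalds", "7 eleven"],
    "company_size": [50, 50, 5],
    "company_activity": ["food", "food", "food"],
})

def join_unique(values):
    """Join the distinct non-null values of a Series, in order of first appearance."""
    return ",".join(values.dropna().astype(str).drop_duplicates())

result = (
    df.groupby("company_name", as_index=False, sort=False)
      .agg(
          country=("country", join_unique),
          company_size=("company_size", "first"),
          company_activity=("company_activity", join_unique),
      )
      [df.columns.tolist()]  # restore the original column order
)

print(result)
```

Named aggregation keeps each output column next to the function that builds it, and using join_unique for company_activity avoids results such as "food,food" when the same activity appears on several rows.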