How to aggregate values in one column for the same values in a different one in pandas?

I have a dataset that looks like this:

country	company_name	company_size	company_activity
DE	McDonalds	50	food
FR	McDonalds	50	food
NL	7 eleven	5	food

I want to get it into this format:

country	company_name	company_size	company_activity
DE,FR	McDonalds	50	food
NL	7 eleven	5	food

I have this code:

df_cross = (df.groupby(["company_name"])
              .agg({"country": ",".join, "company_activity": ",".join, "company_size": "first"})
              .reset_index()
              .groupby(["country"])
              .agg({"company_name": "first", "company_activity": ",".join, "company_size": "first"})
              .reset_index())

This does not return my full dataset, though, and the code feels too long. Does anyone have a more elegant solution?

CodePudding user response：

Thanks to @RanA, this is the solution. Note that the column names here differ from the simplified example in the question:

df_cross = (df.groupby(["organization_name"])
              .agg({"country": ",".join,
                    "source_website": "first",
                    "advertiser_type": "first",
                    "organization_activity": ",".join,
                    "organization_size": "first"})
              .reset_index())
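
A likely reason the original attempt returned fewer rows is the second groupby on "country": once the countries are joined into one string per company, any two companies with the same country string are merged into a single row. The sketch below reproduces this on the question's sample data plus one invented company ("Spar", a hypothetical row added only for illustration).

```python
import pandas as pd

# Hypothetical sample: the question's rows plus one invented company ("Spar")
# that, like 7 eleven, appears only in NL.
df = pd.DataFrame({
    "country": ["DE", "FR", "NL", "NL"],
    "company_name": ["McDonalds", "McDonalds", "7 eleven", "Spar"],
    "company_size": [50, 50, 5, 20],
    "company_activity": ["food", "food", "food", "food"],
})

# First step from the question: one row per company.
per_company = (
    df.groupby("company_name")
      .agg({"country": ",".join, "company_activity": ",".join, "company_size": "first"})
      .reset_index()
)

# Second step from the question: grouping again by the joined country string
# merges every company that shares the same country list.
per_country = (
    per_company.groupby("country")
      .agg({"company_name": "first", "company_activity": ",".join, "company_size": "first"})
      .reset_index()
)

print(len(per_company))  # one row per company
print(len(per_country))  # fewer rows: "7 eleven" and "Spar" both map to "NL"
```

Stopping after the first groupby, as in the solution above, keeps one row per company.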

CodePudding user response：

(df.groupby(['company_name'])
          .agg({'country': lambda x: ','.join(map(str, x.tolist())), "company_size": "first" , "company_activity": "first"})
          .reset_index())

Output:

  company_name country  company_size company_activity
0     7 eleven      NL             5             food
1    McDonalds   DE,FR            50             food
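
For anyone who wants to run this end to end, here is a minimal self-contained sketch on the question's sample data. It uses pandas named aggregation instead of a dict, and a small helper, join_unique (a name made up for this example), so that repeated values such as "food,food" collapse to "food". Passing sort=False keeps the companies in their original order; the final column reordering is only there to match the layout requested in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "country": ["DE", "FR", "NL"],
    "company_name": ["McDonalds", "McDonalds", "7 eleven"],
    "company_size": [50, 50, 5],
    "company_activity": ["food", "food", "food"],
})


def join_unique(values):
    """Join values with commas, dropping repeats but keeping first-seen order."""
    return ",".join(dict.fromkeys(map(str, values)))


result = (
    df.groupby("company_name", as_index=False, sort=False)
      .agg(
          country=("country", join_unique),
          company_size=("company_size", "first"),
          company_activity=("company_activity", join_unique),
      )
)

# Restore the column order used in the question.
result = result[["country", "company_name", "company_size", "company_activity"]]
print(result)
```

If every value should be kept, duplicates included, replace join_unique with ",".join as in the answers above.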