I have a dataset looking like this:
country | company_name | company_size | company_activity |
---|---|---|---|
DE | McDonalds | 50 | food |
FR | McDonalds | 50 | food |
NL | 7 eleven | 5 | food |
I want to get it into this format:
country | company_name | company_size | company_activity |
---|---|---|---|
DE,FR | McDonalds | 50 | food |
NL | 7 eleven | 5 | food |
I have this code:
df_cross = (
    df.groupby(["company_name"])
    .agg({"country": ",".join, "company_activity": ",".join, "company_size": "first"})
    .reset_index()
    .groupby(["country"])
    .agg({"company_name": "first", "company_activity": ",".join, "company_size": "first"})
    .reset_index()
)
This does not return my full dataset, though, and the code feels too long. Does anyone have a more elegant solution?
CodePudding user response:
Thanks to @RanA, this is the solution:
df_cross = (
    df.groupby(["organization_name"])
    .agg({"country": ",".join, "source_website": "first", "advertiser_type": "first", "organization_activity": ",".join, "organization_size": "first"})
    .reset_index()
)
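A likely reason the original two-step chain did not return the full dataset: the second groupby on the joined country string merges every company that operates in the same set of countries. Here is a minimal sketch of that effect, using the sample data from the question plus one hypothetical company ("Spar", invented purely for illustration) that also operates only in NL:

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row ("Spar")
# added only to show the row loss.
df = pd.DataFrame({
    "country": ["DE", "FR", "NL", "NL"],
    "company_name": ["McDonalds", "McDonalds", "7 eleven", "Spar"],
    "company_size": [50, 50, 5, 20],
    "company_activity": ["food", "food", "food", "food"],
})

# A single groupby on the company yields one row per company.
per_company = (
    df.groupby("company_name")
    .agg({"country": ",".join, "company_size": "first", "company_activity": "first"})
    .reset_index()
)

# A second groupby on the joined country string collapses companies
# that share the same countries ("7 eleven" and "Spar" both map to "NL").
collapsed = (
    per_company.groupby("country")
    .agg({"company_name": "first", "company_size": "first", "company_activity": "first"})
    .reset_index()
)

print(len(per_company))  # 3
print(len(collapsed))    # 2
```

With the second groupby removed, every company keeps its own row, which is why a single groupby on the name column is enough.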
CodePudding user response:
(df.groupby(["company_name"])
   .agg({"country": lambda x: ",".join(map(str, x.tolist())), "company_size": "first", "company_activity": "first"})
   .reset_index())
Output:
  company_name country  company_size company_activity
0     7 eleven      NL             5             food
1    McDonalds   DE,FR            50             food
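As a possible alternative, here is a sketch that groups by every column except country, so no "first" aggregations are needed. It assumes company_size and company_activity are identical across all rows of the same company, as they are in the sample data; the names key_cols and result are made up for this example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "country": ["DE", "FR", "NL"],
    "company_name": ["McDonalds", "McDonalds", "7 eleven"],
    "company_size": [50, 50, 5],
    "company_activity": ["food", "food", "food"],
})

# Assumption: these columns are constant within a company,
# so they can serve as the grouping key together.
key_cols = ["company_name", "company_size", "company_activity"]

result = (
    df.groupby(key_cols, as_index=False, sort=False)  # keep keys as columns, keep input order
    .agg({"country": ",".join})
)

# Restore the column order used in the question.
result = result[["country"] + key_cols]
print(result)
```

Note that groupby drops rows whose key columns contain missing values by default; passing dropna=False (available in pandas 1.1 and later) keeps them. If the non-country columns can differ within a company, the "first" approach above is the safer choice.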