I know there are a lot of similar questions already, but none seem to apply to my problem (or I don't know how to apply them).
So I have a Pandas DataFrame with duplicated entries in the first column, but different values in the rest of the columns.
Like so:
   location  year  GDP ...
0       AUT  1998  14...
1       AUT  2018  98...
2       AUS  1998  24...
3       AUS  2018  83...
...
I would like to end up with only unique entries in 'location', keeping all the other columns with their values collected into lists:
   location          year             GDP ...
0       AUT  (1998, 2018)  (14..., 98...)
1       AUS  (1998, 2018)  (24..., 83...)
...
I tried:
grouped = df_gdp_pop_years.groupby("location")['year'].apply(list).reset_index(name="year")
and I tried to do the same with a lambda function, but I always end up with the same problem: I only keep the years.
How can I get the results I want?
CodePudding user response:
You can try something like:
df_gdp_pop_years.groupby("location").agg({"year": list, "GDP": list})
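For reference, here is a minimal, self-contained sketch of that answer. The frame below is a hypothetical stand-in for df_gdp_pop_years, with made-up GDP figures in place of the truncated values from the question:

import pandas as pd

# Hypothetical stand-in for df_gdp_pop_years (GDP values are invented).
df_gdp_pop_years = pd.DataFrame({
    "location": ["AUT", "AUT", "AUS", "AUS"],
    "year": [1998, 2018, 1998, 2018],
    "GDP": [14000, 98000, 24000, 83000],
})

# Collapse duplicate locations, collecting the other columns into lists.
grouped = df_gdp_pop_years.groupby("location").agg({"year": list, "GDP": list})
print(grouped)
#                   year             GDP
# location
# AUS       [1998, 2018]  [24000, 83000]
# AUT       [1998, 2018]  [14000, 98000]

Note that groupby moves 'location' into the index; chain .reset_index() if you want it back as an ordinary column.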
CodePudding user response:
If your other columns may change, or more may be added later, you can accomplish this with a generalized .agg() over all of the non-grouped columns:
df_gdp_pop_years.groupby('location').agg(lambda x: x.tolist())
I found this by searching for 'opposite of pandas explode', which led me to a different SO question: How to implode (reverse of pandas explode) based on a column.
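For illustration, a sketch of the generalized form, using a hypothetical frame modeled on the question's sample. The extra 'population' column (values invented) shows that every non-grouped column is picked up automatically, without naming any of them:

import pandas as pd

# Hypothetical data; the extra 'population' column demonstrates that
# the lambda implodes every non-grouped column, whatever it is called.
df_gdp_pop_years = pd.DataFrame({
    "location": ["AUT", "AUT", "AUS", "AUS"],
    "year": [1998, 2018, 1998, 2018],
    "GDP": [14000, 98000, 24000, 83000],
    "population": [8.0, 8.8, 18.7, 25.0],
})

# Every column other than the grouping key becomes a list per location.
imploded = df_gdp_pop_years.groupby("location").agg(lambda x: x.tolist()).reset_index()
print(imploded)
#   location          year             GDP    population
# 0      AUS  [1998, 2018]  [24000, 83000]  [18.7, 25.0]
# 1      AUT  [1998, 2018]  [14000, 98000]    [8.0, 8.8]

Passing the built-in list directly, .agg(list), does the same thing and is slightly more concise.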