I have this dataframe:
df = pd.DataFrame()
df["A"] = [2,2,4,4,4,8,9]
df["B"] = [2,2,4,4,4,7,9]
df["C"] = list("axcdxef")
print(df.to_string(index=False))
A B C
2 2 a
2 2 x
4 4 c
4 4 d
4 4 x
8 7 e
9 9 f
I want to convert to this, unstacking column C for rows where columns A and B are duplicates:
df2 = pd.DataFrame()
df2["A"] = [2,4,8,9]
df2["B"] = [2,4,7,9]
df2["C"] = ["a,x", "c,d,x", "e", "f"]
print()
print(df2.to_string(index=False))
A B C
2 2 a,x
4 4 c,d,x
8 7 e
9 9 f
I've looked at pivot() and unstack(), but haven't found the right recipe yet. Any ideas?
CodePudding user response:
use .groupby
and .agg(','.join)
df1 = df.groupby(['A','B'],as_index=False)['C'].agg(','.join)
print(df1)
A B C
0 2 2 a,x
1 4 4 c,d,x
2 8 7 e
3 9 9 f