I have a DataFrame with a few columns and many rows.
I want to create a new column (C) that joins the string values of another column (A) for all rows that share the same value in a third column (B).
Every row in the same group (same value in B) should end up with the same string in C, and that string should differ from the strings of the other groups.
| A | B | New Column C |
|---|---|---|
| First | 1 | First_Third |
| Second | 22 | Second_Fourth |
| Third | 1 | First_Third |
| Fourth | 22 | Second_Fourth |
Something like this pseudocode:
for x in df[B]:
    if (x "is identical to" x "of another row"):
        df[C] = df[C].cat(df[A])
How do I code an algorithm that can do this?
CodePudding user response:
Try this:
df['C'] = df.groupby('B')['A'].transform(lambda x: '_'.join(x))
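For reference, here is a minimal self-contained sketch of that one-liner. It assumes the sample data from the question's table and that column A holds only strings. `groupby('B')` collects the rows that share a value in B, and `transform` broadcasts the joined string back to every row of its group, so no explicit loop is needed.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "A": ["First", "Second", "Third", "Fourth"],
    "B": [1, 22, 1, 22],
})

# For each group of rows sharing a value in B, join the A values with "_".
# transform() returns a result aligned to the original index, so every row
# receives the string built for its own group.
df["C"] = df.groupby("B")["A"].transform(lambda x: "_".join(x))

print(df)
```

If A might contain non-string values, converting first with `df['A'].astype(str)` is one way to avoid a `TypeError` from `str.join`.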
CodePudding user response:
You can use:
df['C'] = df.groupby('B')['A'].transform('_'.join)
Or, if you want to keep only unique values:
df['C'] = df.groupby('B')['A'].transform(lambda x: '_'.join(x.unique()))
Output (identical for both versions on the sample data, since A has no repeated values within a group):
        A   B              C
0   First   1    First_Third
1  Second  22  Second_Fourth
2   Third   1    First_Third
3  Fourth  22  Second_Fourth
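The two variants only differ when a value in A repeats within a group. A small sketch of that difference, using hypothetical data that is not from the question ("First" appears twice where B is 1) and illustrative column names `C_all` and `C_unique`:

```python
import pandas as pd

# Hypothetical data: "First" occurs twice in the group where B == 1.
df = pd.DataFrame({
    "A": ["First", "Second", "First", "Third"],
    "B": [1, 22, 1, 1],
})

# Plain join: repeated values are kept.
df["C_all"] = df.groupby("B")["A"].transform("_".join)

# Join of unique values: repeats are dropped, order of first appearance is kept.
df["C_unique"] = df.groupby("B")["A"].transform(lambda x: "_".join(x.unique()))

print(df)
```

Here the rows with B equal to 1 get `First_First_Third` in `C_all` but `First_Third` in `C_unique`.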