I have a data like below:
# df
Col1 Col2 Col3 Col4
AAA
AAA AAA BBB
AAA CCC
I want to concat them without duplicates.
Col1 Col2 Col3 Col4 COMBINE
AAA AAA
AAA AAA BBB AAA/BBB
AAA CCC AAA/CCC
The code I tried:
df["COMBINE"] = df[df.filter(regex='^Col',axis=1).columns].apply(lambda x: '/'.join(pd.unique(x)),axis=1)
But I got:
Col1 Col2 Col3 Col4 COMBINE
AAA AAA/
AAA AAA BBB AAA/BBB/
AAA CCC AAA/CCC/
CodePudding user response:
Try:
df['Combine'] = df.apply(lambda x: '/'.join(x.dropna().unique()), axis=1)
Note: if you have empty string instead of NaN
, then do: df.replace('', np.nan).apply(...)
.
Output:
Col1 Col2 Col3 Col4 Combine
0 AAA None None NaN AAA
1 AAA AAA BBB NaN AAA/BBB
2 AAA CCC None NaN AAA/CCC
CodePudding user response:
You could cleanup after:
df['Combine'] = df.fillna('').apply('/'.join, axis=1).str.replace('/ (?=/|$)', '', regex=True)
How it works:
If one or several /
are followed by another /
or the end of line -> remove
[AAA, None, None, NaN] -> [AAA, '', '', ''] -> AAA/// -> AAA
output:
Col1 Col2 Col3 Col4 Combine
0 AAA None None NaN AAA
1 AAA AAA BBB NaN AAA/AAA/BBB
2 AAA CCC None NaN AAA/CCC
CodePudding user response:
This happen because you have empty string in your columns, you can change code like below and get waht you want:
>>> df.replace('', np.nan, inplace=True)
>>> df["COMBINE"] = df[df.filter(regex='^Col',axis=1).columns].apply(lambda x: '/'.join(x.dropna().unique()),axis=1)
>>> df
Col1 Col2 Col3 Col4 COMBINE
AAA AAA
AAA AAA BBB AAA/BBB
AAA CCC AAA/CCC