Home > other >  Concat strings in multiple columns without duplicates
Concat strings in multiple columns without duplicates

Time:10-22

I have a data like below:

# df
Col1   Col2   Col3   Col4
AAA
AAA    AAA    BBB
AAA    CCC

I want to concat them without duplicates.

Col1   Col2   Col3   Col4   COMBINE
AAA                         AAA
AAA    AAA    BBB           AAA/BBB
AAA    CCC                  AAA/CCC

The code I tried:

df["COMBINE"] = df[df.filter(regex='^Col',axis=1).columns].apply(lambda x: '/'.join(pd.unique(x)),axis=1)

But I got:

Col1   Col2   Col3   Col4   COMBINE
AAA                         AAA/
AAA    AAA    BBB           AAA/BBB/
AAA    CCC                  AAA/CCC/

CodePudding user response:

Try:

df['Combine'] = df.apply(lambda x: '/'.join(x.dropna().unique()), axis=1)

Note: if you have empty string instead of NaN, then do: df.replace('', np.nan).apply(...).

Output:

  Col1  Col2  Col3  Col4  Combine
0  AAA  None  None   NaN      AAA
1  AAA   AAA   BBB   NaN  AAA/BBB
2  AAA   CCC  None   NaN  AAA/CCC

CodePudding user response:

You could cleanup after:

df['Combine'] = df.fillna('').apply('/'.join, axis=1).str.replace('/ (?=/|$)', '', regex=True)

How it works: If one or several / are followed by another / or the end of line -> remove

[AAA, None, None, NaN] -> [AAA, '', '', ''] -> AAA/// -> AAA

output:

  Col1  Col2  Col3  Col4      Combine
0  AAA  None  None   NaN          AAA
1  AAA   AAA   BBB   NaN  AAA/AAA/BBB
2  AAA   CCC  None   NaN      AAA/CCC

CodePudding user response:

This happen because you have empty string in your columns, you can change code like below and get waht you want:

>>> df.replace('', np.nan, inplace=True)
>>> df["COMBINE"] = df[df.filter(regex='^Col',axis=1).columns].apply(lambda x: '/'.join(x.dropna().unique()),axis=1)

>>> df

Col1   Col2   Col3   Col4   COMBINE
AAA                         AAA
AAA    AAA    BBB           AAA/BBB
AAA    CCC                  AAA/CCC
  • Related