Remove duplicates from lists in columns

Time:12-30

I have two columns, Col_A and Col_B, in a dataframe, df.


Col_A                   Col_B
[1.222, 1.222, 1.333]   [cla:pl:dr, cla:pl:dr]
[]                      [clp:dp, xr.ld, xr.ld]
[1.29.1, 1.1, 1.1]      [ru:pun, ru:pun, hm:dm]

I want to remove the duplicated values in each list of each row for Col_A and Col_B.

type(df['Col_A'][0]) returns list

Examples I've tried return unhashable type errors. Ways I've tried to avoid this error to no avail include:

df['Col_A'].map(lambda x: tuple(set(x)))

How can I solve this problem?

Edit: Copy pasted data.

CodePudding user response:

It looks like you're using strings as data.

 import pandas as pd

 data = {'col_A': [['1.222', '1.222', '1.333'], [], ['1.29.1', '1.1', '1.1']] ,
         'col_B': [['cla:pl:dr', 'cla:pl:dr'], ['clp:dp', 'xr.ld', 'xr.ld'], ['ru:pun', 'ru:pun', 'hm:dm']] }

 df = pd.DataFrame(data)
 # set() drops the duplicates within each list; note that it does not guarantee the original order
 df['col_A'] = df['col_A'].apply(lambda x: list(set(x)))
 df['col_B'] = df['col_B'].apply(lambda x: list(set(x)))

Output DF (before removing duplicates)

       col_A                 col_B
0   [1.222, 1.222, 1.333]   [cla:pl:dr, cla:pl:dr]
1   []                      [clp:dp, xr.ld, xr.ld]
2   [1.29.1, 1.1, 1.1]      [ru:pun, ru:pun, hm:dm]

Output DF (after removing duplicates)

    col_A           col_B
0   [1.222, 1.333]  [cla:pl:dr]
1   []              [clp:dp, xr.ld]
2   [1.29.1, 1.1]   [ru:pun, hm:dm]
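
One caveat with set(): it does not promise to keep the elements in their original order, so the lists may come back shuffled. If the order matters, a minimal sketch using dict.fromkeys (dicts keep insertion order in Python 3.7+) could look like the following. It assumes the list elements are hashable strings as in the sample data; dedupe_keep_order is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Sample data copied from the answer above (elements assumed to be strings).
data = {
    "col_A": [["1.222", "1.222", "1.333"], [], ["1.29.1", "1.1", "1.1"]],
    "col_B": [["cla:pl:dr", "cla:pl:dr"], ["clp:dp", "xr.ld", "xr.ld"], ["ru:pun", "ru:pun", "hm:dm"]],
}
df = pd.DataFrame(data)


def dedupe_keep_order(values):
    """Hypothetical helper: drop duplicates, keeping the first occurrence of each value."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(values))


# Apply the helper to every list in both columns
for col in ["col_A", "col_B"]:
    df[col] = df[col].apply(dedupe_keep_order)

print(df)
```

This only works when every element inside the lists is hashable (strings, numbers, tuples). If the lists themselves contain lists, dict.fromkeys raises the same unhashable type error as set().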