I have two columns, Col_A and Col_B, in a dataframe, df.
Col_A Col_B
[1.222, 1.222, 1.333] [cla:pl:dr, cla:pl:dr]
[] [clp:dp, xr.ld, xr.ld]
[1.29.1, 1.1, 1.1] [ru:pun, ru:pun, hm:dm]
I want to remove the duplicated values within each row's list, for both Col_A and Col_B.
type(df['Col_A'][0])
returns list
The examples I've tried raise unhashable type errors. One thing I tried to avoid this error, to no avail:
df['Col_A'].map(lambda x: tuple(set(x)))
How can I solve this problem?
Edit: Copy pasted data.
CodePudding user response:
It looks like the values in your lists are strings.
import pandas as pd

data = {'col_A': [['1.222', '1.222', '1.333'], [], ['1.29.1', '1.1', '1.1']],
        'col_B': [['cla:pl:dr', 'cla:pl:dr'], ['clp:dp', 'xr.ld', 'xr.ld'], ['ru:pun', 'ru:pun', 'hm:dm']]}
df = pd.DataFrame(data)
# set() drops the duplicates within each list; it does not guarantee the original element order
df['col_A'] = df['col_A'].apply(lambda x: list(set(x)))
df['col_B'] = df['col_B'].apply(lambda x: list(set(x)))
Original DF (before removing duplicates)
col_A col_B
0 [1.222, 1.222, 1.333] [cla:pl:dr, cla:pl:dr]
1 [] [clp:dp, xr.ld, xr.ld]
2 [1.29.1, 1.1, 1.1] [ru:pun, ru:pun, hm:dm]
Output DF (after removing duplicates)
col_A col_B
0 [1.222, 1.333] [cla:pl:dr]
1 [] [clp:dp, xr.ld]
2 [1.29.1, 1.1] [ru:pun, hm:dm]
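Since set() does not guarantee element order, the lists may come back in a different order than shown above. If the order matters, one possible variant is to deduplicate with dict.fromkeys, which keeps the first occurrence of each value. This is only a sketch: the helper name dedupe_keep_order is made up for illustration, and it assumes every list element is hashable (strings, as in the sample data).

```python
import pandas as pd

# Same sample data as in the answer; values are assumed to be strings.
data = {
    'col_A': [['1.222', '1.222', '1.333'], [], ['1.29.1', '1.1', '1.1']],
    'col_B': [['cla:pl:dr', 'cla:pl:dr'], ['clp:dp', 'xr.ld', 'xr.ld'], ['ru:pun', 'ru:pun', 'hm:dm']],
}
df = pd.DataFrame(data)


def dedupe_keep_order(values):
    """Hypothetical helper: drop duplicates, keeping first occurrences in order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(values))


# Apply the helper to each list-valued column
for col in ['col_A', 'col_B']:
    df[col] = df[col].apply(dedupe_keep_order)

print(df)
```

Empty lists pass through unchanged, so the second row of col_A stays [].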