Python: Remove duplicate values from a dataframe column of lists

I've got a dataframe column containing lists, and I want to remove duplicate values from the individual lists.

import pandas as pd
d = {'colA': [['UVB', 'NER', 'GGR', 'NER'], ['KO'], ['ERK1', 'ERK1', 'ERK2'], []]}
df = pd.DataFrame(data=d)

I want to remove the duplicate 'NER' and 'ERK1' from the lists.

I've tried:

df['colA'] = set(tuple(df['colA']))

I get the error message: TypeError: unhashable type: 'list'

CodePudding user response:

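The problem is that tuple(df['colA']) gives you a tuple of lists. set() needs hashable elements, and lists are not hashable, which is why the call raises the TypeError. Instead, iterate over the tuple and convert each list to a set on its own.

ans = tuple(df['colA'])
# Convert every list to a set, then assign the whole column in one step
df['colA'] = [set(ans[i]) for i in range(len(ans))]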
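To make the cause of the error concrete, here is a minimal sketch in plain Python (no pandas needed). The names rows, error, and per_row are only illustrative and do not come from the question.

```python
# Two of the rows from the question, held in a tuple like tuple(df['colA'])
rows = (['UVB', 'NER', 'GGR', 'NER'], ['KO'])

try:
    # set() tries to hash each element; the elements here are lists
    set(rows)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # the message mentions: unhashable type: 'list'

# Converting each inner list separately works, because strings are hashable
per_row = [set(row) for row in rows]
```

The outer container is not the issue; it is the list elements inside it that cannot be hashed.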

CodePudding user response:

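You can remove duplicate values from each list with the pandas Series.apply() method, as follows. Note that set() does not preserve the original order of the elements, and that you need to assign the result back to df['colA'] if you want to keep it.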

import pandas as pd
d = {'colA': [['UVB', 'NER', 'GGR', 'NER'], ['KO'], ['ERK1', 'ERK1', 'ERK2'], []]}
df = pd.DataFrame(data=d)

df['colA'].apply(lambda x: list(set(x)))

#output
0    [NER, UVB, GGR]
1               [KO]
2       [ERK2, ERK1]
3                 []
Name: colA, dtype: object
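If the original order of the elements matters, one possible alternative is dict.fromkeys(), which keeps the first occurrence of each value. This is a sketch rather than part of the answer above; the helper name dedupe_keep_order is made up for illustration.

```python
def dedupe_keep_order(values):
    """Return a new list without duplicates, keeping first occurrences."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(values))

# The same rows as in the question, as plain lists
rows = [['UVB', 'NER', 'GGR', 'NER'], ['KO'], ['ERK1', 'ERK1', 'ERK2'], []]
deduped = [dedupe_keep_order(row) for row in rows]
print(deduped)
# [['UVB', 'NER', 'GGR'], ['KO'], ['ERK1', 'ERK2'], []]
```

On the dataframe from the question, the same helper could be applied with df['colA'] = df['colA'].apply(dedupe_keep_order).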