I have a dataframe with duplicate values in either list or string format.
df = Name Email years score
john [[email protected],[email protected], [email protected]] 8 good
[devan,smith ,devan] [[email protected]] [8,6,8] good
I want to remove duplicate values within each individual cell, without comparing values across different cells.
df_updated = Name Email years score
john [[email protected],[email protected]] 8 good
[devan,smith] [[email protected]] [8,6] good
CodePudding user response:
Use DataFrame.applymap for elementwise processing with a custom function that removes duplicates when the cell value is a list:
import pandas as pd

df = pd.DataFrame({'Name':['John', ['aa','devan','smith','devan']],
                   'years':[8, [8,6,8]]})
print (df)
Name years
0 John 8
1 [aa, devan, smith, devan] [8, 6, 8]
df1 = df.applymap(lambda x: list(dict.fromkeys(x)) if isinstance(x, list) else x)
print (df1)
Name years
0 John 8
1 [aa, devan, smith] [8, 6]
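On pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map, so the call above emits a warning there. A minimal sketch that works on either side of that change, assuming the same sample frame; the names dedupe_cell and elementwise are made up for this example:

```python
import pandas as pd

def dedupe_cell(value):
    """Drop repeated items from a list cell, keeping first-seen order."""
    if isinstance(value, list):
        # dict keys are unique and preserve insertion order
        return list(dict.fromkeys(value))
    # scalars and strings are returned unchanged
    return value

df = pd.DataFrame({'Name': ['John', ['aa', 'devan', 'smith', 'devan']],
                   'years': [8, [8, 6, 8]]})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
df1 = elementwise(dedupe_cell)
print(df1)
```

Using a named function instead of a lambda also makes it easy to reuse the same logic on a single column.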
If ordering is not important, use sets (the order of the items in each resulting list is arbitrary):
df2 = df.applymap(lambda x: list(set(x)) if isinstance(x, list) else x)
print (df2)
Name years
0 John 8
1 [devan, smith, aa] [8, 6]
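The question also mentions cells in string format and shows an item with stray whitespace ("smith "). If some cells are really strings such as "[devan,smith ,devan]" rather than Python lists, the isinstance(x, list) check skips them. A rough sketch under that assumption; the bracketed, comma-separated string layout and the helper name dedupe_text_cell are guesses, not something the question confirms:

```python
import pandas as pd

def dedupe_text_cell(value):
    """Dedupe a cell that is a list or a bracketed, comma-separated string."""
    if isinstance(value, list):
        items = value
    elif isinstance(value, str) and value.startswith('[') and value.endswith(']'):
        # assumed layout: "[a,b ,a]" -> ['a', 'b ', 'a']
        items = value[1:-1].split(',')
    else:
        # plain scalars and ordinary strings are left alone
        return value
    # strip surrounding whitespace so 'smith ' and 'smith' count as the same item
    cleaned = [item.strip() if isinstance(item, str) else item for item in items]
    return list(dict.fromkeys(cleaned))

df = pd.DataFrame({'Name': ['john', '[devan,smith ,devan]'],
                   'score': ['good', 'good']})

# Series.map applies the function to every cell of each column
df_clean = df.apply(lambda col: col.map(dedupe_text_cell))
print(df_clean)
```

Items parsed out of a string stay strings, so a cell like "[8,6,8]" would come back as ['8', '6'] unless you convert the items yourself.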
CodePudding user response:
Without the actual dataframe, it is hard to tell how your data is structured. Anyway, here is what you probably need:
df["Email"].apply(set)
Note that every cell in the Email column must be a list, and that the result holds sets rather than lists (use lambda x: list(set(x)) if you need lists back). If you want to remove duplicates from another column, say Name, replace Email with Name in the line above.
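A plain apply(set) breaks down on mixed columns: set(8) raises a TypeError, and set('john') splits the string into characters. A hedged per-column sketch that guards against both and keeps the original order; the email addresses are invented placeholders and unique_in_order is a made-up helper name:

```python
import pandas as pd

def unique_in_order(value):
    """Return a list cell without repeats; pass any other value through."""
    return list(dict.fromkeys(value)) if isinstance(value, list) else value

# placeholder data, not the asker's real values
df = pd.DataFrame({
    'Email': [['a@example.com', 'b@example.com', 'a@example.com'],
              ['c@example.com']],
    'years': [8, [8, 6, 8]],
})

# process only the columns you care about, one at a time
for column in ['Email', 'years']:
    df[column] = df[column].apply(unique_in_order)

print(df)
```

This assigns the result back to the dataframe, which the one-liner above does not do on its own.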