How to remove duplicate elements from numpy arrays in columns in pandas?

Time:11-16

I have the following dataset with a numpy array in column B, and I want to create "new Column" by removing the duplicated elements from each array in column B, as shown.

A   B                                            new Column
1   ["A","a","123","123","A"]                    ["A","a","123"]  
2   ["abc","a","1234","123","abc"]               ["abc","a","1234","123"]
3   ["abcd","abcd","abcd"]                       ["abcd"]
4   ["hello","mello"]                            ["hello","mello"]
5   ["hi","hi","why"]                            ["hi","why"]

I am using the following code, but neither function gives the desired output. Please help.

import numpy as np

def u_value(a):
    return np.unique(a)

or

def ddpe(a):
    a=list(dict.fromkeys(a))
    return a

CodePudding user response:

The problem here is that the values are strings, not lists, so use ast.literal_eval to convert them to lists first:

import ast

def ddpe(a):
    return list(dict.fromkeys(ast.literal_eval(a)))

df['new Column'] = df['B'].apply(ddpe)
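
For reference, here is a minimal self-contained sketch of the approach above. It assumes column B holds string representations of lists, as the answer suggests. The sample DataFrame is rebuilt from the table in the question, and the isinstance check is an added assumption so the same function also works if the values are already lists:

```python
import ast

import pandas as pd

# Sample data rebuilt from the question; B holds strings, not lists
# (an assumption based on the answer above).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [
        '["A","a","123","123","A"]',
        '["abc","a","1234","123","abc"]',
        '["abcd","abcd","abcd"]',
        '["hello","mello"]',
        '["hi","hi","why"]',
    ],
})


def ddpe(a):
    # Parse the string into a list first; skip parsing if the value
    # is already a list-like (hypothetical extra case, not in the question).
    if isinstance(a, str):
        a = ast.literal_eval(a)
    # dict.fromkeys keeps the first occurrence of each element, in order.
    return list(dict.fromkeys(a))


df["new Column"] = df["B"].apply(ddpe)
print(df["new Column"].tolist())
```

Note that np.unique returns its result sorted, so even on real arrays it would not keep the original order of first appearance, whereas dict.fromkeys does.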