Check for identical lists row-wise in a pandas DataFrame

Time:11-03

Hi, I have a column of lists and I want to return the rows whose lists are identical, i.e. contain the same elements in the same order.


import pandas as pd

d = {'id': [1, 2, 3], 'lst': [['GG', 'PP', 'DD'], ['DD', 'PP', 'GG'], ['GG', 'PP', 'DD']]}

dd = pd.DataFrame(d)
print(dd)
    id       lst
0   1   [GG, PP, DD]
1   2   [DD, PP, GG]
2   3   [GG, PP, DD]

I tried this, but I get the wrong output:

dd[dd.apply(lambda row: row.lst==row.lst, axis=1)]
    id       lst
0   1   [GG, PP, DD]
1   2   [DD, PP, GG]
2   3   [GG, PP, DD]

My desired output is:

   id       lst
0   1   [GG, PP, DD]
2   3   [GG, PP, DD]

CodePudding user response:

Lists are not hashable, so convert each list to a tuple first, then use Series.duplicated with keep=False to flag every row whose tuple occurs more than once:

df = dd[dd['lst'].apply(tuple).duplicated(keep=False)]
print(df)
   id           lst
0   1  [GG, PP, DD]
2   3  [GG, PP, DD]
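
As a self-contained sketch of the same idea, the snippet below wraps the tuple conversion in a small helper. The name rows_with_repeated_lists is hypothetical and not part of pandas; the data is the question's example. Because tuples compare element by element, [DD, PP, GG] is not treated as equal to [GG, PP, DD].

```python
import pandas as pd


def rows_with_repeated_lists(df, col):
    """Return the rows whose list in `col` also appears in another row.

    Hypothetical helper for illustration only; not a pandas function.
    """
    # Tuples are hashable and keep element order, so two lists match
    # only if they hold the same items in the same positions.
    keys = df[col].apply(tuple)
    # keep=False flags every occurrence of a repeated key, including the first.
    return df[keys.duplicated(keep=False)]


dd = pd.DataFrame({
    'id': [1, 2, 3],
    'lst': [['GG', 'PP', 'DD'], ['DD', 'PP', 'GG'], ['GG', 'PP', 'DD']],
})

result = rows_with_repeated_lists(dd, 'lst')
print(result)
```

The original index labels (0 and 2) are preserved; chain .reset_index(drop=True) if a fresh index is wanted.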

CodePudding user response:

Interesting: when lst holds lists, DataFrame.duplicated fails because the values are not hashable. I think pandas should let duplicated handle such columns by converting the lists to tuples, as the accepted answer does. When the column holds hashable values, duplicated works directly:

import pandas as pd

def find_duplicate(df, col):
    """
    df: dataframe
    col: column name
    """
    # keep=False marks every occurrence of a repeated value, not only the later ones
    df_dup = df[df.duplicated(subset=col, keep=False)]
    return df_dup

d = {'id': [1, 2, 3], 'lst': ['c', 'b', 'c']}

dd = pd.DataFrame(d)
print(dd)

print(find_duplicate(dd, 'lst'))

Output:

   id lst
0   1   c
2   3   c
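
The conversion suggested above can be folded into the helper so that it accepts both kinds of column. This is only a sketch: find_duplicate_any is a hypothetical name, and it assumes the column holds either hashable values or flat lists of hashable items (nested lists would still fail).

```python
import pandas as pd


def find_duplicate_any(df, col):
    """Return rows whose value in `col` occurs more than once.

    Hypothetical variant of find_duplicate: list values are turned into
    tuples first, other values are left unchanged.
    """
    keys = df[col].map(lambda v: tuple(v) if isinstance(v, list) else v)
    return df[keys.duplicated(keep=False)]


scalars = pd.DataFrame({'id': [1, 2, 3], 'lst': ['c', 'b', 'c']})
lists = pd.DataFrame({
    'id': [1, 2, 3],
    'lst': [['GG', 'PP', 'DD'], ['DD', 'PP', 'GG'], ['GG', 'PP', 'DD']],
})

scalar_dups = find_duplicate_any(scalars, 'lst')
list_dups = find_duplicate_any(lists, 'lst')
print(scalar_dups)
print(list_dups)
```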