How to check whether entry in one row appears in another row but order of column values changed?-CodePudding

If I have some dataframe with Firs Name and Last Name columns, how can I check whether below case exist in dataframe and therefore extract all of the such cases:

First Name	Last Name
Albert	Einstein
Einstein	Albert

In short, how can I check whether the criss cross cases exist in dataframe? The task at hand that I am trying to tackle is to locate all of the duplicates in terms of names in a dataframe. I am consider other cases (below), but they can be tackled separately.

Different last names, same first name
Same last name, different first name
Same last name, same first name

Thanks!

CodePudding user response：

You can create an ordered tuple of your columns:

data = {'First Name': ['John', 'John', 'Kennedy'],
        'Middle Name': ['Fitzgerald', 'Kennedy', 'John'],
        'Last Name': ['Kennedy', 'Fitzgerald', 'Fitzgerald']}
df = pd.DataFrame(data)

df['key'] = df.apply(lambda x: tuple(sorted(x)), axis=1)

Output result:

>>> df
  First Name Middle Name   Last Name                          key
0       John  Fitzgerald     Kennedy  (Fitzgerald, John, Kennedy)
1       John     Kennedy  Fitzgerald  (Fitzgerald, John, Kennedy)
2    Kennedy        John  Fitzgerald  (Fitzgerald, John, Kennedy)

Now you have a key to use duplicated or drop_duplicates or whatever you want.

CodePudding user response：

Doing the np.sort and check them with duplicated

df1 = df.copy()
df1[:] = np.sort(df1.values,1)
df1.duplicated(keep=False)
Out[144]: 
0    True
1    True
dtype: bool