I have two string columns in a Pandas dataframe.
I would like to check that whenever two rows have the same value in one column, they also have the same value in the other column.
idx col1 col2
1 A X
2 B Y
3 B Y
4 A X
5 C Z
In the above example, col1 and col2 hold different values, but the two columns are effectively equivalent because both partition the rows into the same index groups: {1,4}, {2,3}, and {5}.
idx col1 col2
1 A X
2 B X
3 B Y
4 A X
5 C Z
The table above does not meet the requirement: rows 1 and 2 share the value X in col2 but have different values in col1. How can I check whether two columns meet this requirement in Pandas or another Python library?
CodePudding user response:
Compare the factorized columns and check whether all the comparisons are True:
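import numpy as np
import pandas as pd
# factorize numbers the values in order of first appearance, so equal codes mean equal row groupings
same = np.all(pd.factorize(df['col1'])[0] == pd.factorize(df['col2'])[0])
print(same)
True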
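
For reference, here is a minimal, self-contained sketch that runs the same factorize comparison on both tables from the question. The helper name same_partition and the frame names df_ok and df_bad are made up for illustration and are not part of the original answer; np.array_equal is used in place of np.all(... == ...) and should give the same result here.

```python
import numpy as np
import pandas as pd


def same_partition(df, a, b):
    """Return True if columns a and b group the rows identically.

    Hypothetical helper: pd.factorize assigns integer codes in order of
    first appearance, so two columns that split the rows into the same
    groups get identical code arrays.
    """
    codes_a = pd.factorize(df[a])[0]
    codes_b = pd.factorize(df[b])[0]
    return bool(np.array_equal(codes_a, codes_b))


# First table from the question: both columns give groups {1,4}, {2,3}, {5}.
df_ok = pd.DataFrame(
    {"col1": list("ABBAC"), "col2": list("XYYXZ")}, index=range(1, 6)
)

# Second table: row 2 has X in col2, so the groupings differ.
df_bad = pd.DataFrame(
    {"col1": list("ABBAC"), "col2": list("XXYXZ")}, index=range(1, 6)
)

print(same_partition(df_ok, "col1", "col2"))   # True
print(same_partition(df_bad, "col1", "col2"))  # False
```

For the second table the codes are [0, 1, 1, 0, 2] for col1 and [0, 0, 1, 0, 2] for col2, which is why the check fails.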