Check if two categorical variables are virtually the same

Time:11-25

I have two string columns in a Pandas dataframe.

What I would like to check is whether any two rows that have the same value in one column also have the same value in the other column, and vice versa.

idx  col1  col2
1    A     X
2    B     Y
3    B     Y
4    A     X
5    C     Z

In the above example, col1 and col2 hold different values, but the two columns are virtually the same thing, because each of them splits the rows into the same groups of indices: {1,4}, {2,3}, and {5}.
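To make the grouping concrete, here is a minimal sketch that builds the index groups for each column of the first table and compares them. The helper name `partition` is hypothetical and only used for illustration.

```python
import pandas as pd

# First example table, indexed 1..5 as in the question.
df = pd.DataFrame(
    {"col1": ["A", "B", "B", "A", "C"], "col2": ["X", "Y", "Y", "X", "Z"]},
    index=[1, 2, 3, 4, 5],
)

def partition(frame, column):
    """Return the set of index groups induced by the values of `column`."""
    # groupby(...).groups maps each distinct value to the index labels holding it.
    return {frozenset(labels) for labels in frame.groupby(column).groups.values()}

# Both columns split the rows into {1,4}, {2,3}, {5}.
print(partition(df, "col1") == partition(df, "col2"))
```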

idx  col1  col2
1    A     X
2    B     X
3    B     Y
4    A     X
5    C     Z

The above table does not meet the requirement: rows 1 and 2 share the value X in col2 but have different values (A and B) in col1. How can I check whether two columns meet this requirement in Pandas or other Python libraries?

CodePudding user response:

Factorize both columns and check that the resulting codes are equal in every row:

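import numpy as np
import pandas as pd
# pd.factorize numbers each distinct value in order of first appearance,
# so equal code arrays mean both columns group the rows identically
same = np.all(pd.factorize(df['col1'])[0] == pd.factorize(df['col2'])[0])
print(same)  # True for the first table, False for the second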
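As a self-contained illustration, the sketch below applies the same factorize comparison to both tables from the question. The function name `same_partition` and the frame names `good` and `bad` are made up for this example.

```python
import numpy as np
import pandas as pd

def same_partition(a, b):
    """True if columns a and b group the rows identically."""
    # factorize returns (codes, uniques); codes follow order of first appearance.
    codes_a, _ = pd.factorize(a)
    codes_b, _ = pd.factorize(b)
    return bool(np.array_equal(codes_a, codes_b))

# First table from the question: the columns match up to relabeling.
good = pd.DataFrame({"col1": list("ABBAC"), "col2": list("XYYXZ")})
# Second table: row 2 has B in col1 but X in col2, which breaks the match.
bad = pd.DataFrame({"col1": list("ABBAC"), "col2": list("XXYXZ")})

print(same_partition(good["col1"], good["col2"]))  # True
print(same_partition(bad["col1"], bad["col2"]))    # False
```

Note that by default pd.factorize gives missing values the code -1, so rows that are missing in both columns count as matching; handle NaN separately if that is not what you want.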