Home > Net >  Check consistent mapping between two columns in python
Check consistent mapping between two columns in python

Time:11-19

I have a dataframe df, which contains two columns, word and color. There are only two possible entries for color ("red" or "blue"), but there are many words. Every word appears twice in df

It looks something like:

  word   color
0 pasta  red
1 soup   red
2 salad  blue
3 candy  blue
...
57 candy blue
58 soup  red
59 pasta red
60 salad blue

Is there a way to check for consistent mapping between the two columns - i.e., can I check that each word always corresponds to the same color? For example, do both entries of "pasta" correspond to "red", do both entries of "salad" correspond to blue, etc?

CodePudding user response:

You can get all the rows where there is a difference in color using:

>>> df.loc[~df.sort_values(["word", "color"]).duplicated(keep=False)]
Full working example:
#changed your example dataframe to have different colors for salad
>>> df
    word color
0  pasta   red
1   soup   red
2  salad  blue
3  candy  blue
4  candy  blue
5   soup   red
6  pasta   red
7  salad   red

>>> df.loc[~df.sort_values(["word", "color"]).duplicated(keep=False)]
    word color
2  salad  blue
7  salad   red

CodePudding user response:

You can use crosstab:

out = pd.crosstab(df['word'], df['color'])
out['is_consistent'] = (out['blue'].eq(2) & out['red'].eq(0)) \
                       | (out['blue'].eq(0) & out['red'].eq(2))
print(out)

# Output:

color  blue  red  is_consistent
word                           
candy     2    0           True
pasta     0    2           True
salad     1    1          False
soup      0    2           True

CodePudding user response:

You can try the .all() method to check if all counts of the values are equal to 2:

import pandas as pd

df = pd.DataFrame({"word": ["pasta", "soup", "salad", "candy", "candy", "soup", "pasta", "salad"],
                   "color": ["red", "red", "blue", "blue", "blue", "red", "red", "blue"]})

if (df.value_counts() == 2).all():
    print("consistent mapping")
else:
    print("inconsistent mapping")

Output:

consistent mapping
  • Related