I have a dataframe df
, which contains two columns, word
and color
. There are only two possible entries for color
("red" or "blue"), but there are many words. Every word appears twice in df
It looks something like:
word color
0 pasta red
1 soup red
2 salad blue
3 candy blue
...
57 candy blue
58 soup red
59 pasta red
60 salad blue
Is there a way to check for consistent mapping between the two columns - i.e., can I check that each word always corresponds to the same color? For example, do both entries of "pasta" correspond to "red", do both entries of "salad" correspond to blue, etc?
CodePudding user response:
You can get all the rows where there is a difference in color using:
>>> df.loc[~df.sort_values(["word", "color"]).duplicated(keep=False)]
Full working example:
#changed your example dataframe to have different colors for salad
>>> df
word color
0 pasta red
1 soup red
2 salad blue
3 candy blue
4 candy blue
5 soup red
6 pasta red
7 salad red
>>> df.loc[~df.sort_values(["word", "color"]).duplicated(keep=False)]
word color
2 salad blue
7 salad red
CodePudding user response:
You can use crosstab
:
out = pd.crosstab(df['word'], df['color'])
out['is_consistent'] = (out['blue'].eq(2) & out['red'].eq(0)) \
| (out['blue'].eq(0) & out['red'].eq(2))
print(out)
# Output:
color blue red is_consistent
word
candy 2 0 True
pasta 0 2 True
salad 1 1 False
soup 0 2 True
CodePudding user response:
You can try the .all()
method to check if all counts of the values are equal to 2
:
import pandas as pd
df = pd.DataFrame({"word": ["pasta", "soup", "salad", "candy", "candy", "soup", "pasta", "salad"],
"color": ["red", "red", "blue", "blue", "blue", "red", "red", "blue"]})
if (df.value_counts() == 2).all():
print("consistent mapping")
else:
print("inconsistent mapping")
Output:
consistent mapping