I have a dataframe like
animal, small_animal, count
cat, dog, 1
cat, moo, 2
dog, cat, 3
moo, moo, 5
squirrel, moo, 1
moo, cat, 3
I would like to have stored together
cat, dog
and dog, cat
So I need to check if something both appears but in different "orders" in the column and keep the third column. I thought of multiple different dataframes or a dictionary.
So far I did a groupby
but I still can't work around other things.
CodePudding user response:
You could create a new column with the label
df["label_col"] = df[["animal", "small_animal"]].apply(
lambda x: "-".join(sorted(x)), axis=1
)
"""
Output
animal small_animal count label_col
0 cat dog 1 cat-dog
1 cat moo 2 cat-moo
2 dog cat 3 cat-dog
3 moo moo 5 moo-moo
4 squirrel moo 1 moo-squirrel
5 moo cat 3 cat-moo
"""
Then you can group by or do whatever you want with the ordered key label_col
CodePudding user response:
You can try to compare the two columns joined in different order and filter out the same animals in two columns.
m = (df['animal'] df['small_animal']).isin(df['small_animal'] df['animal'])
out = df[m & df['animal'].ne(df['small_animal'])]
print(out)
animal small_animal count
0 cat dog 1
1 cat moo 2
2 dog cat 3
5 moo cat 3
CodePudding user response:
Records with different names which occur in reflected form
names = ['animal', 'small_animal']
# include all pairs of animal names which occur in reflected form
is_reflected = pd.Index(df[names]).isin(pd.Index(df[reversed(names)]))
# exclude records where names are duplicated, sort of ('moo', 'moo') pairs
is_different = df.animal != df.small_animal
# extract counts for records with reflected and different names
df[is_reflected & is_different]['count']