I have a DataFrame in Pandas where I want to check if any of several columns are equal to several others. For example:
mydf = pd.DataFrame({'a1': [1, 2, 3, 4, 5], 'a2': [6, 7, 8, 9, 10], 'a3': [2, 3, 4, 5, 6], 'b1': [1, 7, 2, 3, 4], 'b2': [8, 9, 4, 2, 2], 'b3': [1, 3, 3, 3, 5]})
a1 a2 a3 b1 b2 b3
0 1 6 2 1 8 1
1 2 7 3 7 9 3
2 3 8 4 2 4 3
3 4 9 5 3 2 3
4 5 10 6 4 2 5
I want to check if any of the 'a' columns are equal to any of the 'b' columns producing this DataFrame:
a1 a2 a3 b1 b2 b3 x
0 1 6 2 1 8 1 1
1 2 7 3 7 9 3 1
2 3 8 4 2 4 3 1
3 4 9 5 3 2 3 0
4 5 10 6 4 2 5 1
Is there a nice way to do this? I only currently know quite a nasty way with loops. Thanks a lot.
CodePudding user response:
One way is to use list comprehension:
print ([any(set(x)&set(y)) for x, y in zip(df.filter(like="a").values, df.filter(like="b").values)])
[True, True, True, False, True]
CodePudding user response:
We can use numpy broadcasting to compare the values:
a = mydf.filter(like='a').to_numpy()
b = mydf.filter(like='b').to_numpy()
mydf['x'] = (a[None, :].T == b).any(2).any(0)
a1 a2 a3 b1 b2 b3 x
0 1 6 2 1 8 1 True
1 2 7 3 7 9 3 True
2 3 8 4 2 4 3 True
3 4 9 5 3 2 3 False
4 5 10 6 4 2 5 True