Check if values in a column exist elsewhere in a dataframe row

Time:12-20

Suppose I have a dataframe as below:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 3, 4, 5], 'c': [3, 4, 5, 6], 'd': [5, 3, 2, 4]})

I want to check whether each element in column d also appears elsewhere in its row. So the outcome I want is

[False, True, False, True]

Towards that end, I used

df.apply(lambda x: x['d'] in x[['a','b','c']], axis=1)

but this is somehow giving me [False, False, False, False].

CodePudding user response:

Try:

    out = (df[['a','b','c']].T==df['d']).any()

Output:

    0    False
    1     True
    2    False
    3     True
    dtype: bool
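
As for why the original attempt returns all False: `in` applied to a pandas Series tests the index labels, not the values, so x['d'] in x[['a','b','c']] asks whether the number is one of the labels 'a', 'b', 'c'. Below is a minimal sketch of that behaviour, together with a row-wise fix that checks the values instead. The names row, label_hit, value_miss, value_hit and fixed are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 3, 4, 5],
                   'c': [3, 4, 5, 6], 'd': [5, 3, 2, 4]})

# Second row of the frame: a=2, b=3, c=4
row = df.loc[1, ['a', 'b', 'c']]

label_hit = 'a' in row            # membership is tested against the index labels
value_miss = 3 in row             # 3 is a value in the row, not a label
value_hit = 3 in row.to_numpy()   # testing the underlying array checks the values

# Same apply as in the question, but comparing against the values
fixed = df.apply(lambda x: x['d'] in x[['a', 'b', 'c']].to_numpy(), axis=1)
print(fixed.tolist())
```

Row-wise apply gets slow on large frames, so the vectorised comparison is still the better choice once the frame grows.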

CodePudding user response:

Taking advantage of numpy broadcasting, you can do it easily:

(df[['d']].to_numpy() == df.to_numpy())[:, :-1].any(axis=1)

Output:

array([False,  True, False,  True])

Working directly on the underlying numpy arrays skips the pandas index alignment, so this is typically the fastest of the options here.
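
To see what the broadcasting is doing, here is a small sketch that selects the comparison columns by name, so it does not rely on d being the last column. The names others, target and mask are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 3, 4, 5],
                   'c': [3, 4, 5, 6], 'd': [5, 3, 2, 4]})

others = df[['a', 'b', 'c']].to_numpy()   # shape (4, 3): the columns to search
target = df['d'].to_numpy()[:, None]      # shape (4, 1): one d value per row

# Comparing (4, 3) with (4, 1) stretches each d value across its own row
mask = (others == target).any(axis=1)
print(mask)
```

The [:, None] step is what turns the flat d column into a column vector; without it numpy would try to match d against the columns instead of the rows.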

CodePudding user response:

Use eq to broadcast equality comparison across the columns. Then check if there are any matches in the row:

df.drop(columns='d').eq(df['d'], axis=0).any(axis=1).array

Output:

<PandasArray>
[False, True, False, True]
Length: 4, dtype: bool

Speedwise, a numpy option is best.
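
If you want to check the speed claim on your own data, a rough timing sketch along these lines should do. The frame size, the random data and the helper names with_eq and with_numpy are all assumptions for illustration; actual timings depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 rows of small random integers
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 10, size=(100_000, 4)), columns=list('abcd'))

def with_eq(frame):
    # pandas: compare every other column against d, aligned on the row index
    return frame.drop(columns='d').eq(frame['d'], axis=0).any(axis=1).to_numpy()

def with_numpy(frame):
    # numpy: broadcast the d column against the remaining columns
    others = frame.drop(columns='d').to_numpy()
    return (others == frame['d'].to_numpy()[:, None]).any(axis=1)

# Both approaches should agree before their speed is compared
same = np.array_equal(with_eq(big), with_numpy(big))

t_eq = timeit.timeit(lambda: with_eq(big), number=20)
t_np = timeit.timeit(lambda: with_numpy(big), number=20)
print(f"eq: {t_eq:.4f}s  numpy: {t_np:.4f}s  same result: {same}")
```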

CodePudding user response:

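Try with

df.eq(df.pop('d'), axis=0).any(axis=1)  # .values

Note that pop removes column d from df in place, so work on a copy if you still need it. The axis is passed by keyword because pandas 2.0 and later no longer accept any(1).

Output:

0    False
1     True
2    False
3     True
dtype: bool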