Pandas Create mask conditioned on 3 columns-CodePudding

I have a Dataframe of 3M rows. I would like to create a mask which returns the indices of rows and values where 3 columns are of equal value.

I thought this should work:

mask =  df.loc[ df['column_1'] == df['column_2'] & df['column_2'] == df['column_3'] & df['column_1'] == df['column_3'] ]

However I get the truth value of a series is ambiguous

I tried this but get the same error:

mask = df.loc[ np.any(df['column_1'] == df['column_2'] & df['column_2'] == df['column_3'] & df['column_1'] == df['column_3']) ]

mask = df.loc[ np.where(df['column_1'] == df['column_2'] & df['column_2'] == df['column_3'] & df['column_1'] == df['column_3']) ]

I appreciate the help with this

CodePudding user response：

For the dataframe:

df = (pd.DataFrame()
     .assign(a=['hej', 'ful'],
             b=['hej', 'ful'],
             c=['hej', 'ful']))

You can use:

(df
 .assign(same=lambda x: (x.a == x.b) & (x.b == x.c))
 .loc[lambda x: x.same == True]
)

CodePudding user response：

place the conditions within parenthesis to ensure precedence of operations i.e., column1 and colums2 are compared for equality instead of column 2 "and" column2

mask =  df.loc[ 
    (df['column_1'] == df['column_2']) & 
    (df['column_2'] == df['column_3']) & 
    (df['column_1'] == df['column_3']) ]

CodePudding user response：

Another possible solution:

df.loc[df.apply(lambda x: (x == x.min()).all(), axis=1)]

Output:

   col1  col2  col3
1     5     5     5
3     3     3     3

Data:

from io import StringIO

text = """
col1    col2    col3
1   3   2
5   5   5
2   5   1
3   3   3
3   2   5
"""

df = pd.read_csv(StringIO(text), sep='\s ')
df = df.astype(str)

Complete code:

import pandas as pd
import numpy as np
from io import StringIO

text = """
col1    col2    col3
1   3   2
5   5   5
2   5   1
3   3   3
3   2   5
"""

df = pd.read_csv(StringIO(text), sep='\s ')
df = df.astype(str)

df.loc[df.apply(lambda x: (x == x.min()).all(), axis=1)]