I have a Dataframe of 3M rows. I would like to create a mask which returns the indices of rows and values where 3 columns are of equal value.
I thought this should work:
mask = df.loc[ df['column_1'] == df['column_2'] & df['column_2'] == df['column_3'] & df['column_1'] == df['column_3'] ]
However I get the truth value of a series is ambiguous
I tried this but get the same error:
mask = df.loc[ np.any(df['column_1'] == df['column_2'] & df['column_2'] == df['column_3'] & df['column_1'] == df['column_3']) ]
mask = df.loc[ np.where(df['column_1'] == df['column_2'] & df['column_2'] == df['column_3'] & df['column_1'] == df['column_3']) ]
I appreciate the help with this
CodePudding user response:
For the dataframe:
df = (pd.DataFrame()
.assign(a=['hej', 'ful'],
b=['hej', 'ful'],
c=['hej', 'ful']))
You can use:
(df
.assign(same=lambda x: (x.a == x.b) & (x.b == x.c))
.loc[lambda x: x.same == True]
)
CodePudding user response:
place the conditions within parenthesis to ensure precedence of operations i.e., column1 and colums2 are compared for equality instead of column 2 "and" column2
mask = df.loc[
(df['column_1'] == df['column_2']) &
(df['column_2'] == df['column_3']) &
(df['column_1'] == df['column_3']) ]
CodePudding user response:
Another possible solution:
df.loc[df.apply(lambda x: (x == x.min()).all(), axis=1)]
Output:
col1 col2 col3
1 5 5 5
3 3 3 3
Data:
from io import StringIO
text = """
col1 col2 col3
1 3 2
5 5 5
2 5 1
3 3 3
3 2 5
"""
df = pd.read_csv(StringIO(text), sep='\s ')
df = df.astype(str)
Complete code:
import pandas as pd
import numpy as np
from io import StringIO
text = """
col1 col2 col3
1 3 2
5 5 5
2 5 1
3 3 3
3 2 5
"""
df = pd.read_csv(StringIO(text), sep='\s ')
df = df.astype(str)
df.loc[df.apply(lambda x: (x == x.min()).all(), axis=1)]