I would like to assign True/False
value to each row in my pandas dataframe if the row contains certain values for columns that I specified. For example, suppose I had the following dataframe:
d = {
"col1": ["alpha", "beta", "alpha", "gamma", "alpha"],
"col2": [1, 2, 2, 3, 1],
"col3": ["a", "a", "b", "c", "d"],
}
df = pd.DataFrame(d)
df
col1 col2 col3
0 alpha 1 a
1 beta 2 a
2 alpha 2 b
3 gamma 3 c
4 alpha 1 d
I know that if I wanted to create a row mask using values for a single column I can use pandas's isin
. For example, if I want all rows that contain alpha
in col1
, I can do the following: df['col1'].isin(["alpha"])
.
0 True
1 False
2 False
3 False
4 True
How do I get similar behaviour but with multiple columns? What if I wanted to set all rows that have alpha
in col1
AND 1
in col2
to True
and everything else False
?
I would like to have a function that takes in a dataframe and a dictionary as an input. The dictionary would have keys corresponding to column names and values as lists of values that I want. The function returns a pandas Series of booleans where each row is True
if the corresponding row in the dataframe contains values in the dictionary and False
otherwise. For example:
def multi_column_isin(df, d):
<implementation>
s = pandas Series where each row is True if it has values in d else False
return s
s = multi_column_isin(df, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
s
0 True
1 False
2 False
3 True
4 True
CodePudding user response:
You can script a function to do this:
def multi_column_isin(df, d):
# get rid of the index to avoid index alignment
booleans = [df[col].isin(arr).array for col, arr in d.items()]
return np.logical_and.reduce(booleans)
multi_column_isin(df, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
array([ True, False, False, True, True])
# alternative route with pipe
df.pipe(multi_column_isin, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
array([ True, False, False, True, True])
A simpler way would be to pass a dictionary to isin
- sometimes though, this can result in incorrect/unwanted output, depending on index alignment:
(df
.drop(columns='col3')
.isin({"col1": ["gamma", "alpha"], "col2": [1, 3]})
.all(1)
)
0 True
1 False
2 False
3 True
4 True
dtype: bool
CodePudding user response:
IIUC:
(df.col1 == 'alpha') & (df.col2 == 1)
Should get you there.