Tring to conditionally execute a query, only when ColumnA = 'New' and ColumnB = 'Left' (in each individual row). I know that str.contains() works when I only have 1 condition, however, I'm getting a ValueError ("ValueError: The truth value of a Series is ambiguous..."). Can this approach not be successfully applied, for my given scenario? Alternatively, is there a better approach?
Current code:
if df1['ColumnA'].str.contains('New') and df1['ColumnB'].str.contains('Left'):
do something...
Very basic example of the dataframe:
ColumnA | Column B |
---|---|
New | Left |
Used | Right |
Scrap | Down |
New | Right |
First row would be the desired row to carry forward (since it meets the criteria).
CodePudding user response:
You have the right idea, however it doesn't appear like your code is expressing exactly what you want to do.
df1['ColumnA'].str.contains('New')
will return a Series with true and false values corresponding to the indices where the condition is true, not a true or false value for whether the entire column contains 'new'. To accomplish this consider doing something like the following:
'new' in df['ColumnA'].values
If you are trying to do it on a row by row basis then you must use the bitwise operator to compare truth values across Series (&).
This will return a boolean like you expected, hopefully this helps (:
CodePudding user response:
Use bitwise &
on two mask arrays and generate another column.
>>> import pandas as pd
>>> df = pd.DataFrame({'A':['New','Used','Scrap','New'], 'B':['Left','Right','Down','Right']})
>>> df
A B
0 New Left
1 Used Right
2 Scrap Down
3 New Right
>>> df['C'] = df['A'].str.contains('New') & df['B'].str.contains('Left')
>>> df
A B C
0 New Left True
1 Used Right False
2 Scrap Down False
3 New Right False
>>>