There is this dataframe called frame
with columns: Age
, Maturity
, Gender
, Height
which has values 'PASSED
' or 'FAILED
'.
I want to create a new column called result
and count the number of a subset of the overall columns as seen in check_columns
which has the value PASSED
.
I tried to use a comprehension expression which as seen in this case I wanted it to be evaluated to 2 since the columns Age
and Maturity
have PASSED
while Gender
has FAILED
which are present in a subset of columns in check_columns
.
frame = pd.DataFrame(
data = {'Age':['PASSED'], 'Maturity':['PASSED'],'Gender':['FAILED'], 'Height':['PASSED']}
)
check_columns = ['Age','Maturity','Gender']
frame['result'] = sum([1 if frame[column] =='PASSED' else 0 for column in check_columns ])
I tried to used a comprehension with a list but it says this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
Select only the desired columns, check which ones are equal to 'PASSED' using DataFrame.eq
, and count the True values using DataFrame.sum
(pass axis=1
to count column-wise)
import pandas as pd
frame = pd.DataFrame(
data = {'Age':['PASSED'], 'Maturity':['PASSED'],'Gender':['FAILED'], 'Height':['PASSED']}
)
check_columns = ['Age','Maturity','Gender']
frame['results'] = frame[check_columns].eq('PASSED').sum(axis=1)
Output:
>>> frame
Age Maturity Gender Height results
0 PASSED PASSED FAILED PASSED 2