I want to count the number of rows of a pandas DataFrame where the values in certain columns are True.
For example, in the following sample DataFrame:
import pandas as pd
from pandas import DataFrame
names = {'First_name': ['Jon','Bill','Maria','Emma'], 'Last_name': ['Bobs', 'Vest', 'Gong', 'Hill'],
'Roll': ['Absent', 'Present', 'Present', 'Absent']}
df = DataFrame(names)
keys = ['Jon', 'Maria', 'Gong', 'Hill', 'Present', 'No']
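# "(?i)" makes the whole pattern case-insensitive; the + is needed, otherwise "(?i)" "|" merge into the join separator
pattern = r"(?i)" + "|".join(keys)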
df['bool1'] = df['First_name'].str.contains(pattern)
df['bool2'] = df['Last_name'].str.contains(pattern)
df
Output:
  First_name Last_name     Roll  bool1  bool2
0        Jon      Bobs   Absent   True  False
1       Bill      Vest  Present  False  False
2      Maria      Gong  Present   True   True
3       Emma      Hill   Absent  False   True
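A variant of the setup above, as a sketch: instead of an inline (?i) flag, Series.str.contains accepts case=False, which makes the match case-insensitive without modifying the pattern string. The data and keys are the ones from the question; only the pattern construction differs.

```python
import pandas as pd

names = {
    'First_name': ['Jon', 'Bill', 'Maria', 'Emma'],
    'Last_name': ['Bobs', 'Vest', 'Gong', 'Hill'],
    'Roll': ['Absent', 'Present', 'Present', 'Absent'],
}
df = pd.DataFrame(names)

keys = ['Jon', 'Maria', 'Gong', 'Hill', 'Present', 'No']
# Plain alternation of the keys; case-insensitivity comes from case=False below
pattern = "|".join(keys)

# Each call returns one True/False per row
df['bool1'] = df['First_name'].str.contains(pattern, case=False)
df['bool2'] = df['Last_name'].str.contains(pattern, case=False)
print(df)
```

The keys here contain no regex metacharacters, so joining them with | is safe; if they could, escaping each one with re.escape before joining would be the safer choice.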
I want the total number of rows where either 'bool1' or 'bool2' is True. For this sample, the expected count is 3.
I have tried the following code, but it returns a sum for each column of the filtered rows instead of a single row count.
df.loc[(df['bool1'] == True) | (df['bool2'] == True)].sum()
I have also tried an if statement, but it raises an error:
if (df['bool1'] == True) and (df['bool2'] == True):
    len(df.index)
I would really appreciate it if someone could help me fix it. Thank you in advance.
Answer:
What you want is probably the length of the filtered DataFrame:
len(df[(df['bool1'] == True) | (df['bool2'] == True)])
# or, since the columns are already boolean, without the == True comparisons
len(df[df['bool1'] | df['bool2']])
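Another option, sketched below with the sample frame rebuilt from the output in the question: sum the boolean mask itself, since True counts as 1. The second part illustrates why the .sum() attempt in the question did not give a single number: summing a filtered DataFrame totals each column separately (restricted here to the two bool columns).

```python
import pandas as pd

# Sample frame rebuilt from the output shown in the question
df = pd.DataFrame({
    'First_name': ['Jon', 'Bill', 'Maria', 'Emma'],
    'Last_name': ['Bobs', 'Vest', 'Gong', 'Hill'],
    'Roll': ['Absent', 'Present', 'Present', 'Absent'],
    'bool1': [True, False, True, False],
    'bool2': [False, False, True, True],
})

# Element-wise OR: one True/False per row
mask = df['bool1'] | df['bool2']

# Summing a boolean Series counts its True values (True == 1)
count = int(mask.sum())
print(count)

# Summing the filtered DataFrame instead yields one total per column,
# which is the behaviour behind the .sum() attempt in the question
per_column = df.loc[mask, ['bool1', 'bool2']].sum()
print(per_column)
```

Counting via the mask avoids building the filtered DataFrame at all, which is slightly cheaper than len(df[mask]) on large frames.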
You can also try any() across the columns whose names contain 'bool':
out = df.filter(like='bool').any(axis=1).sum()
print(out)  # 3
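The if attempt in the question fails for a different reason: and needs a single True/False from each side, but a comparison on a Series gives one value per row, so pandas raises a ValueError saying the truth value of a Series is ambiguous. A minimal sketch using only the two bool columns from the sample, showing that error and the element-wise operators | (either is True) and & (both are True):

```python
import pandas as pd

# Only the two bool columns from the sample output
df = pd.DataFrame({
    'bool1': [True, False, True, False],
    'bool2': [False, False, True, True],
})

# `and` asks Python for one truth value per Series,
# which pandas refuses because a Series holds one value per row
try:
    if (df['bool1'] == True) and (df['bool2'] == True):
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)

# `|` combines the columns row by row: rows where either is True
either = df['bool1'] | df['bool2']
# `&` would count rows where BOTH are True, which is not what the question asks
both = df['bool1'] & df['bool2']
print(int(either.sum()), int(both.sum()))
```

Note that the original if statement also used and, which would correspond to & (both True) rather than the "either" condition the question describes.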