How to count number of rows of a pandas DataFrame given conditions

Time:08-11

I want to count the number of rows of a pandas DataFrame where the values in certain columns are True.

For example in the following sample DataFrame:

import pandas as pd
from pandas import DataFrame

names = {'First_name': ['Jon','Bill','Maria','Emma'], 'Last_name': ['Bobs', 'Vest', 'Gong', 'Hill'],
        'Roll': ['Absent', 'Present', 'Present', 'Absent']}

df = DataFrame(names)
keys = ['Jon', 'Maria', 'Gong', 'Hill', 'Present', 'No']

pattern = r"(?i)" + "|".join(keys)
df['bool1'] = df['First_name'].str.contains(pattern)
df['bool2'] = df['Last_name'].str.contains(pattern)
df

output:

    First_name  Last_name   Roll    bool1   bool2
0   Jon         Bobs        Absent  True    False
1   Bill        Vest        Present False   False
2   Maria       Gong        Present True    True
3   Emma        Hill        Absent  False   True

I want to get a total count of the rows where the value in either column 'bool1' or column 'bool2' is True. That is, the final count should be 3.

I have tried the following code, but it sums each column of the filtered rows separately instead of counting the rows.

df.loc[(df['bool1'] == True) | (df['bool2'] == True)].sum()

I have also tried an if statement, but it doesn't seem to be correct.

if (df['bool1'] == True) and (df['bool2'] == True):
        len(df.index)

I would really appreciate it if someone could help fix it. Thank you in advance.

CodePudding user response:

What you want is probably the length of the filtered DataFrame:

len(df[(df['bool1'] == True) | (df['bool2'] == True)])
# or
len(df[(df['bool1']) | (df['bool2'])])
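Since True counts as 1 when a boolean Series is summed, the count can also be taken from the mask directly, without building the filtered DataFrame. A minimal sketch, assuming the same sample data as in the question; the bool1 and bool2 values are hard-coded here for brevity rather than computed with str.contains, and the names mask and count are only illustrative:

```python
import pandas as pd

# Sample data mirroring the question; bool1/bool2 are hard-coded (assumption for brevity).
df = pd.DataFrame({
    'First_name': ['Jon', 'Bill', 'Maria', 'Emma'],
    'Last_name': ['Bobs', 'Vest', 'Gong', 'Hill'],
    'bool1': [True, False, True, False],
    'bool2': [False, False, True, True],
})

# Element-wise OR produces one boolean per row.
mask = df['bool1'] | df['bool2']

# True counts as 1, so the sum is the number of rows where either column is True.
count = int(mask.sum())
print(count)
```

Note that the element-wise operator | is needed here. Python's and / or (as in the if statement from the question) ask for a single truth value of a whole Series, which pandas refuses with a ValueError.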

Alternatively, you can apply any across the bool-like columns and sum the result:

out = df.filter(like='bool').any(axis=1).sum()
print(out)

3
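To see why the one-liner gives 3, it may help to split it into its three steps. This is only a sketch on a cut-down copy of the sample data (bool1 and bool2 hard-coded as an assumption; the intermediate names flags and row_has_true are made up for illustration):

```python
import pandas as pd

# Cut-down copy of the question's data; bool1/bool2 are hard-coded (assumption).
df = pd.DataFrame({
    'Roll': ['Absent', 'Present', 'Present', 'Absent'],
    'bool1': [True, False, True, False],
    'bool2': [False, False, True, True],
})

# Step 1: keep only the columns whose name contains 'bool'.
flags = df.filter(like='bool')

# Step 2: for each row, check whether at least one of those columns is True.
row_has_true = flags.any(axis=1)

# Step 3: count the True values (True sums as 1).
out = int(row_has_true.sum())

print(list(flags.columns))
print(out)
```

Keep in mind that filter(like='bool') selects columns by name, not by dtype, so it relies on the naming convention used in the question. If the column names are arbitrary, df.select_dtypes(include='bool') is one possible way to pick up the boolean columns instead.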