Home > Mobile >  Count occurrences of stings in a row Pandas
Count occurrences of stings in a row Pandas

Time:09-01

I'm trying to count the number of instances of a certain sting in a row in a pandas dataframe.

In the example here I utilized a lambda function and pandas .count() to try and count the number of times 'True' exists in each row.

Though instead of a count of 'True' it is just returning a boolean whether or not it exists in the row...

#create dataframe 
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], 
'Result': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result1': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result2': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
#count instances of Trus or False in each row
df['Count'] = df.apply(lambda row: row.astype(str).str.count('True').any(), axis=1)
print(df)

The desired outcome is:

Period  Result  Result1 Result2 Count
   1    True    True    True      3
   2    None    None    None      0
   3    False   False   False     0
   4    True    True    True      3
   1    False   False   False     0
   2    True    True    True      3
   3    False   False   False     0
  ...    ...     ...     ...    ......

CodePudding user response:

You can use np.where:

df['count'] = np.where(df == 'True', 1, 0).sum(axis=1)

Regarding why your apply returns a boolean: both any and all returns boolean, not numbers

Edit: You can include df.isin for multiple conditions:

df['count'] = np.where(df.isin(['True', 'False']), 1, 0).sum(axis=1)

CodePudding user response:

Use eq with sum:

df.eq("True").sum(axis=1)

Use apply with lambda function.

df.apply(lambda x: x.eq("True").sum(), axis=1)

For more than 1 text matching try

df.iloc[:,1:].apply(lambda x: x.str.contains("True|False")).sum(axis=1)

CodePudding user response:

Avoiding using the apply function, as it can be slow:

df[["Result", "Result1", "Result2"]].sum(axis=1).str.count("True")

This also will work for when you have strings that are like:

"this sentence contains True"

CodePudding user response:

Your lambda is not working correctly, try this:

import pandas as pd

#create dataframe 
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], 
'Result': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result1': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result2': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
#count instances of Trues or False in each row
df['Count'] = df.apply(lambda row: sum(row[1:4] == 'True') ,axis=1)
print(df)

# Output:
# >>     Period Result Result1 Result2  Count
# >> 0        1   True    True    True      3
# >> 1        2   None    None    None      0
# >> 2        3  False   False   False      0
# >> 3        4   True    True    True      3
# >> 4        1  False   False   False      0
# >> 5        2   True    True    True      3
# >> 6        3  False   False   False      0
# >> 7        4   True    True    True      3
# >> 8        1   True    True    True      3
# >> 9        2  False   False   False      0
# >> 10       3  False   False   False      0
# >> 11       4   True    True    True      3
# >> 12       1  False   False   False      0
# >> 13       2   True    True    True      3
# >> 14       3  False   False   False      0
# >> 15       4  False   False   False      0
  • Related