I'm trying to count the number of instances of a certain sting in a row in a pandas dataframe.
In the example here I utilized a lambda
function and pandas .count()
to try and count the number of times 'True' exists in each row.
Though instead of a count of 'True' it is just returning a boolean whether or not it exists in the row...
#create dataframe
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
'Result': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result1': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result2': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
#count instances of Trus or False in each row
df['Count'] = df.apply(lambda row: row.astype(str).str.count('True').any(), axis=1)
print(df)
The desired outcome is:
Period Result Result1 Result2 Count
1 True True True 3
2 None None None 0
3 False False False 0
4 True True True 3
1 False False False 0
2 True True True 3
3 False False False 0
... ... ... ... ......
CodePudding user response:
You can use np.where
:
df['count'] = np.where(df == 'True', 1, 0).sum(axis=1)
Regarding why your apply
returns a boolean: both any and all
returns boolean, not numbers
Edit: You can include df.isin
for multiple conditions:
df['count'] = np.where(df.isin(['True', 'False']), 1, 0).sum(axis=1)
CodePudding user response:
Use eq
with sum
:
df.eq("True").sum(axis=1)
Use apply
with lambda
function.
df.apply(lambda x: x.eq("True").sum(), axis=1)
For more than 1 text matching try
df.iloc[:,1:].apply(lambda x: x.str.contains("True|False")).sum(axis=1)
CodePudding user response:
Avoiding using the apply function, as it can be slow:
df[["Result", "Result1", "Result2"]].sum(axis=1).str.count("True")
This also will work for when you have strings that are like:
"this sentence contains True"
CodePudding user response:
Your lambda is not working correctly, try this:
import pandas as pd
#create dataframe
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
'Result': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result1': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False'],
'Result2': ['True','None','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
#count instances of Trues or False in each row
df['Count'] = df.apply(lambda row: sum(row[1:4] == 'True') ,axis=1)
print(df)
# Output:
# >> Period Result Result1 Result2 Count
# >> 0 1 True True True 3
# >> 1 2 None None None 0
# >> 2 3 False False False 0
# >> 3 4 True True True 3
# >> 4 1 False False False 0
# >> 5 2 True True True 3
# >> 6 3 False False False 0
# >> 7 4 True True True 3
# >> 8 1 True True True 3
# >> 9 2 False False False 0
# >> 10 3 False False False 0
# >> 11 4 True True True 3
# >> 12 1 False False False 0
# >> 13 2 True True True 3
# >> 14 3 False False False 0
# >> 15 4 False False False 0