Identify instances where string exists more than once in a row Python, Pandas, Dataframe

Time:02-18

I want to write a script that identifies rows of a pandas dataframe in which a word (string) appears more than once.

Using a lambda function I can check whether a string exists in a row, but I can't find any information on how to identify two or more instances of the string. This is what I have currently:

import pandas as pd

df = pd.DataFrame({'ID':[1,2,3],'Ans1':['Yes','Yes','Yes'],'Ans2':['No','Yes','No'],'Ans3':['No','No','No']})
# True for any row that contains 'Yes' at least once
df['Result'] = df.apply(lambda row: row.astype(str).str.contains('Yes').any(), axis=1)

df

Pseudocode for what I'm trying to get:

if 'Yes' isin row > 1:
   df['Results'] == True

Desired result:

ID  Ans1    Ans2    Ans3    Result
1   Yes     No      No      False
2   Yes     Yes     No      True
3   Yes     No      No      False

CodePudding user response:

Try this. If you don't want to check the entire dataframe for 'Yes', you can filter the columns first. Otherwise, use eq (equal to) to compare every cell with 'Yes', sum with axis=1 to count the matches along each row, then gt (greater than) to check whether that count exceeds 1:

df['Result'] = df.eq('Yes').sum(1).gt(1)

Output:

   ID Ans1 Ans2 Ans3  Result
0   1  Yes   No   No   False
1   2  Yes  Yes   No    True
2   3  Yes   No   No   False
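
The column filtering mentioned above could look something like the sketch below. It assumes the answer columns are the ones whose names contain 'Ans' (true for the example frame, but an assumption about your real data), and the names answer_cols and yes_per_row are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Ans1': ['Yes', 'Yes', 'Yes'],
    'Ans2': ['No', 'Yes', 'No'],
    'Ans3': ['No', 'No', 'No'],
})

# Assumption: only columns whose names contain 'Ans' should be checked.
answer_cols = df.filter(like='Ans')

# eq('Yes') gives a boolean frame; summing along axis=1 counts the True cells per row.
yes_per_row = answer_cols.eq('Yes').sum(axis=1)

# A row qualifies when 'Yes' appears more than once.
df['Result'] = yes_per_row.gt(1)
print(df)
```

Restricting the comparison to the answer columns keeps unrelated columns such as ID out of the count.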

CodePudding user response:

You could also do:

df['Result'] = df[df == 'Yes'].count(axis=1).gt(1)
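
To see why this works, the one-liner can be split into steps. This is just a sketch on the example frame from the question; the intermediate names masked and yes_per_row are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Ans1': ['Yes', 'Yes', 'Yes'],
    'Ans2': ['No', 'Yes', 'No'],
    'Ans3': ['No', 'No', 'No'],
})

# Indexing with a boolean frame keeps the matching cells and
# replaces every other cell with NaN.
masked = df[df == 'Yes']

# count() skips NaN, so this is the number of 'Yes' cells in each row.
yes_per_row = masked.count(axis=1)

df['Result'] = yes_per_row.gt(1)
print(df)
```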

CodePudding user response:

This code should do the trick for your specific case. It quite literally applies your pseudocode to every row.

def check_row(row):
    count = 0
    for i in row:
        if i == 'Yes':
            count += 1
    if count > 1:
        return True
    else:
        return False
df['Results'] = df.apply(check_row, axis=1)
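
If you would rather stay close to the lambda from the question, one possible adaptation is to count the matches instead of calling any(). This is only a sketch: count_matches is a hypothetical helper, and because it uses str.contains it matches substrings (a cell such as 'Yesterday' would count too), unlike the exact comparison in check_row:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Ans1': ['Yes', 'Yes', 'Yes'],
    'Ans2': ['No', 'Yes', 'No'],
    'Ans3': ['No', 'No', 'No'],
})

def count_matches(row, word='Yes'):
    # Hypothetical helper: number of cells in the row that contain `word`.
    # regex=False treats `word` as a plain substring rather than a pattern.
    return row.astype(str).str.contains(word, regex=False).sum()

# Apply the helper to each row, then flag rows with more than one match.
df['Result'] = df.apply(count_matches, axis=1).gt(1)
print(df)
```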