Python Pandas: Comparing two columns and returning string value if first column include character in

Time:07-07

I have two columns: the first contains strings made up of X or F characters, and the second is empty. If there is any X in Column1, I want to assign 'YES' to Column2; if there is no X, I want to assign 'NO'. Every time I run my code, it assigns 'YES' to all of the rows.

This is an example of how it should look:

My code:

for row in df['Column2']:
    if df['Column1'].str.contains('X').any():
        df['Column2'] = 'YES'
    else:
        df['Column2'] = 'NO'

CodePudding user response:

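You are executing a vectorized operation on every pass through the loop. df['Column1'].str.contains('X').any() checks the whole column, so it is True as long as any row contains an X, and df['Column2'] = 'YES' then assigns 'YES' to the entire Column2.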
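To see the difference between the per-row result and the collapsed one, here is a minimal sketch. The two-row frame is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Hypothetical two-row frame: only the first row contains an 'X'.
df = pd.DataFrame({"Column1": ["..X..", "..F.."]})

per_row = df["Column1"].str.contains("X")  # one boolean per row
any_row = per_row.any()                    # a single boolean for the whole column

print(per_row.tolist())  # [True, False]
print(any_row)           # True

# Assigning a scalar broadcasts it to every row, which is what the loop does.
df["Column2"] = "YES" if any_row else "NO"
print(df["Column2"].tolist())  # ['YES', 'YES']
```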

Using numpy you could do:

import numpy as np

df['Column2'] = np.where(df['Column1'].str.contains('X'), 'YES', 'NO')
print(df)

Result

             Column1 Column2
0  ....X.X.X.X..X.X.     YES
1  ....X.X.X.X..X.X.     YES
2  ....X.X.X.X..X.X.     YES
3  ....X.X.X.X..X.X.     YES
4          ....F.F.F      NO
5          ....F.F.F      NO
6          ....F.F.F      NO
7          ....F.F.F      NO
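
If Column1 can contain missing values, it may be worth making the match explicit. The sketch below is a self-contained variant on made-up sample data. It assumes missing rows should count as 'NO' and that 'X' should be treated as a literal substring rather than a regular expression.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({"Column1": ["....X.X.", "....F.F.F", None, "X..."]})

# regex=False matches 'X' as a literal substring;
# na=False makes missing rows count as "no match" instead of NaN.
has_x = df["Column1"].str.contains("X", regex=False, na=False)

df["Column2"] = np.where(has_x, "YES", "NO")
print(df)
```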

CodePudding user response:

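You can use str.find to locate 'X'. It does plain substring matching (not regex) and returns the index of the first match, or -1 if there is none. Inside a loop over the individual strings, the test for one string (called value here) would be:

if value.find('X') >= 0:

You can even avoid the loop as follows:

df['Column2'] = (df['Column1'].str.find('X') >= 0).map({True: 'YES', False: 'NO'})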
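
As a quick check of what str.find returns, here is a small sketch on a made-up Series. A missing 'X' gives -1 and a match at the very start gives 0, so a test such as `> 1` would skip a match at position 0 or 1.

```python
import pandas as pd

# Hypothetical data: 'X' at position 0, 'X' at position 4, and no 'X' at all.
s = pd.Series(["X...", "....X", "....F"])

positions = s.str.find("X")
print(positions.tolist())  # [0, 4, -1]

# Any non-negative position means the string contains an 'X'.
labels = (positions >= 0).map({True: "YES", False: "NO"})
print(labels.tolist())  # ['YES', 'YES', 'NO']
```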
