Using If statement with string values in Python

Time:12-30

I have a DataFrame df where column A is either blank or contains a string (all columns are strings). I tried to write the if statement below. If there is any value in df[A], the new column should be the concatenation of columns A, B and C. If there is no value in df[A], it should be the concatenation of columns B and C only.

The "if df[A]" part returns a True or False value, right? Just like if I were to write bool(df[A]). So if the value is True, it should execute the first block; if not, it should execute the 'else' block.

if df[A]:
    df[new_column] = df[column_A] + df[column_B] + df[column_C]
else:
    df[new_column] = df[column_B] + df[column_C]

I get this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

CodePudding user response:

As far as I understand your question, you want to evaluate the if-condition for each element. The "+" appears to be string concatenation, since df['A'] contains strings.

In this case, you don't need the if-condition at all, because concatenating an empty string gives the same result as leaving it out.

import pandas as pd

d = {'A': ['Mr ', '', 'Mrs '], 'B': ['Max ', 'John ', 'Marie '], 'C': ['Power', 'Doe', 'Curie']}
df = pd.DataFrame(data=d)

df['new'] = df['A'] + df['B'] + df['C']

Results in:

>>> df
      A       B      C              new
0   Mr     Max   Power     Mr Max Power
1         John     Doe         John Doe
2  Mrs   Marie   Curie  Mrs Marie Curie

In the case that "blank" refers to NaN and not to an empty string you can do the following:

df['new'] = df.apply(lambda x: ''.join(x.dropna().astype(str)), axis=1)
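For the NaN case, an alternative is to fill the missing values with an empty string before concatenating. This is only a minimal sketch: the sample data is made up for illustration, and it assumes that only column A can contain NaN.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed): 'A' holds NaN instead of an empty string.
df = pd.DataFrame({
    'A': ['Mr ', np.nan, 'Mrs '],
    'B': ['Max ', 'John ', 'Marie '],
    'C': ['Power', 'Doe', 'Curie'],
})

# Replace NaN in 'A' with '' so the concatenation reduces to B + C for that row.
df['new'] = df['A'].fillna('') + df['B'] + df['C']

print(df['new'].tolist())
```

Unlike the apply version, this only touches the named columns, so it is unaffected by any other columns that happen to be in the frame.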

Have a look at this question, which seems to be similar: questions 33098383

CodePudding user response:

This happens because df['A'] returns a Series object. An ordinary non-empty Python container such as [0, 0, 0] or [None] is always truthy, but a pandas Series does not allow itself to be evaluated as a single boolean, because it is ambiguous whether you mean any element or all elements. It raises a ValueError instead.

So try this:
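The ambiguity can be reproduced in a few lines. This is a small sketch with a made-up Series (the name s and its values are illustrative only). It shows that using a Series directly as a condition raises the error, while an element-wise comparison yields one boolean per row.

```python
import pandas as pd

# Illustrative Series (assumed values): one entry is an empty string.
s = pd.Series(['Mr ', '', 'Mrs '])

try:
    if s:  # pandas refuses to reduce a whole Series to a single bool
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# An element-wise comparison gives a boolean mask instead of a single value.
mask = s != ''
print(mask.tolist())  # [True, False, True]
```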

if df[A].any():
    df[new_column] = df[column_A] + df[column_B] + df[column_C]
else:
    df[new_column] = df[column_B] + df[column_C]

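This code takes the first branch if any value in the whole column is truthy (here, a non-empty string). Use df[A].all() if you need every element in the column to be truthy. Note that both checks test the column as a whole rather than row by row, so the same branch is applied to every row.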
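If the choice between the two concatenations has to be made per row, one possible approach is a boolean mask combined with Series.where. The following is a sketch under assumptions: the sample data and the name has_a are hypothetical, and "blank" is taken to mean either an empty string or NaN.

```python
import pandas as pd

# Illustrative data (assumed): row 1 has a blank 'A'.
df = pd.DataFrame({
    'A': ['Mr ', '', 'Mrs '],
    'B': ['Max ', 'John ', 'Marie '],
    'C': ['Power', 'Doe', 'Curie'],
})

# True for rows where 'A' actually contains something.
has_a = df['A'].notna() & (df['A'] != '')

with_a = df['A'] + df['B'] + df['C']
without_a = df['B'] + df['C']

# Keep with_a where has_a is True, otherwise fall back to without_a.
df['new'] = with_a.where(has_a, without_a)

print(df['new'].tolist())
```

Here the condition is evaluated for each row separately, which is what the original if/else was trying to express.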
