I have a df where column A is either blank or has a string in it. I tried to write the if statement (all columns are strings) below. Basically, if there is something (any value) in df[A], then the new column value will be a concatenation of columns A, B and C. If there is no value in df[A], then it will concatenate columns B and C.
The part where it says if df[A] returns a True or False value, right? Just like if I were to write bool(df[A]). So if the value is true, it should execute the first block; if not, it should execute the 'else' block.
if df[A]:
    df[new_column] = df[column_A] + df[column_B] + df[column_C]
else:
    df[new_column] = df[column_B] + df[column_C]
I get this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
As far as I understand your question, you want to apply the IF-condition to each element. The "+" seems to be string concatenation, since df['A'] holds strings.
In this case, you don't need the IF-condition at all, because adding an empty string to another leads to the same result as not adding the string.
import pandas as pd
d = {'A': ['Mr ', '', 'Mrs '], 'B': ['Max ', 'John ', 'Marie '], 'C': ['Power', 'Doe', 'Curie']}
df = pd.DataFrame(data=d)
df['new'] = df['A'] + df['B'] + df['C']
Results in:
>>> df
      A       B      C              new
0   Mr     Max   Power     Mr Max Power
1         John     Doe         John Doe
2  Mrs   Marie   Curie  Mrs Marie Curie
In the case that "blank" refers to NaN and not to an empty string you can do the following:
df['new'] = df.apply(lambda x: ''.join(x.dropna().astype(str)), axis=1)
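If only column A can contain NaN, a vectorized alternative is to fill the missing values with an empty string first. This is just a minimal sketch: the sample data is made up for illustration, and it assumes B and C never contain NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the blank in column A is NaN, not ''.
df = pd.DataFrame({
    'A': ['Mr ', np.nan, 'Mrs '],
    'B': ['Max ', 'John ', 'Marie '],
    'C': ['Power', 'Doe', 'Curie'],
})

# Replace NaN with '' so the concatenation simply skips the missing title.
df['new'] = df['A'].fillna('') + df['B'] + df['C']

print(df['new'].tolist())
# ['Mr Max Power', 'John Doe', 'Mrs Marie Curie']
```

Without the fillna('') step, the row with NaN in A would come out as NaN in the new column, because NaN plus a string is NaN.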
Have a look at this question, which seems to be similar: questions 33098383
CodePudding user response:
This happens because df['A'] returns a Series object. An ordinary non-empty Python container, such as [0, 0, 0] or [None], is always truthy, whatever it contains. pandas does not apply that rule to a Series: it cannot tell whether you mean "any element is true" or "all elements are true", so it raises the ambiguity error instead of guessing.
So try this:
if df[A].any():
    df[new_column] = df[column_A] + df[column_B] + df[column_C]
else:
    df[new_column] = df[column_B] + df[column_C]
What this code does: df[A].any() returns True if at least one value in the whole column is truthy, so the test is made once for the entire column, not row by row. You can use df[A].all() if you need every element in the column to be true.
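To make the difference between a whole-column check and a per-row check concrete, here is a small sketch. The sample data and the has_value name are assumptions for illustration, and numpy.where is one possible way to choose per row, not necessarily what the asker needs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the blank in column A is an empty string.
df = pd.DataFrame({
    'A': ['Mr ', '', 'Mrs '],
    'B': ['Max ', 'John ', 'Marie '],
    'C': ['Power', 'Doe', 'Curie'],
})

# bool() on a Series with more than one element raises ValueError.
try:
    bool(df['A'])
    raised = False
except ValueError:
    raised = True

# Element-wise test: a boolean Series with one entry per row.
has_value = df['A'] != ''

# Whole-column reductions: a single boolean each.
any_value = bool(has_value.any())   # at least one row has a value
all_values = bool(has_value.all())  # every row has a value

# Per-row choice: use A + B + C where A has a value, otherwise B + C.
df['new'] = np.where(has_value,
                     df['A'] + df['B'] + df['C'],
                     df['B'] + df['C'])

print(raised, any_value, all_values)
print(df['new'].tolist())
```

With this data, any_value is True and all_values is False, so an if/else built on .any() would take the first branch for the whole DataFrame, while np.where decides separately for each row.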