I want to create a DataFrame column from a function that uses if/elif/else. A simplified example of my code is below. The real code has at least 20 elif branches, so although I could probably use nested np.where() or np.select(), I would prefer not to.
def func(row):
    if row['condition']==True:
        if row['summary'].str.contains('hi|there',case=False):
            return 'hi_there'
        else:
            return 'Other'
    else:
        if row['summary'].str.contains('goodbye|you',case=False):
            return 'goodbye_you'
        else:
            return 'Other'
df['newcolumn'] = df.apply(lambda row: func(row), axis=1)
I get this error message:
AttributeError: 'str' object has no attribute 'str'.
Is it possible to create my column using this method, but with a few additional tweaks? If it's not possible in Python, why?
Answer:
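The value stored in row at key 'summary' is a plain Python string (str), which has no .str attribute; .str is the accessor pandas provides on a Series. Removing .str alone is not enough, though, because str has no contains method either, so use the in operator or re.search instead.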
Answer:
The .str accessor gives you vectorized string methods on a whole column (a Series). With apply(axis=1) you iterate through the rows one at a time, so row['summary'] is a single string, not a column.
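To illustrate the vectorized side, here is a sketch on a hypothetical summary column: Series.str.contains evaluates every row at once and returns a boolean Series that can be used directly as a mask.

```python
import pandas as pd

# Hypothetical column of summaries.
df = pd.DataFrame({"summary": ["Hi everyone", "Goodbye for now", "See you soon"]})

# Vectorized: one call tests every row and returns a boolean Series.
hi_mask = df["summary"].str.contains("hi|there", case=False)
```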
You can rewrite the function you pass to apply. In this case you need something like this:
'hi' in row['summary'].lower()
or a regex, as you did:
import re
...
re.search('hi|there', row['summary'], re.IGNORECASE)
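Putting that together, a minimal sketch of the question's function with re.search in place of .str.contains, run on hypothetical sample rows. The if/else structure is kept as in the question, so adding more elif branches works the same way.

```python
import re

import pandas as pd

# Hypothetical sample data covering each branch.
df = pd.DataFrame({
    "condition": [True, True, False, False],
    "summary": ["Hi everyone", "Status update", "Goodbye for now", "Thanks again"],
})

def func(row):
    summary = row["summary"]  # a plain str inside apply(axis=1)
    if row["condition"]:
        if re.search("hi|there", summary, re.IGNORECASE):
            return "hi_there"
        return "Other"
    else:
        if re.search("goodbye|you", summary, re.IGNORECASE):
            return "goodbye_you"
        return "Other"

df["newcolumn"] = df.apply(func, axis=1)
```

Note that re.search raises TypeError if a summary is NaN, so missing values would need a guard first.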
Better, I think, is to use the vectorized functions as you started to, but on the DataFrame columns directly rather than inside apply:
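# Default for rows that match no rule (the final else)
df['new_column'] = 'Other'
# Use df[...] here: row only exists inside a function passed to apply
df.loc[(df['condition'] == True) & (df['summary'].str.contains('hi|there', case=False)), 'new_column'] = 'hi_there'
df.loc[(df['condition'] == False) & (df['summary'].str.contains('goodbye|you', case=False)), 'new_column'] = 'goodbye_you'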
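With 20 or more branches, one way to keep the vectorized approach manageable is to drive the df.loc assignments from a list of rules. This is a sketch under assumptions: the RULES table, its (condition value, pattern, label) layout, and the extra "status" rule are hypothetical. To mimic an if/elif chain, a row only takes the first rule it matches, and na=False treats missing summaries as non-matches.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "condition": [True, True, False, False],
    "summary": ["Hi everyone", "Status update", "Goodbye for now", "Thanks again"],
})

# Hypothetical rule table: (required 'condition' value, regex, label).
# Order matters: like if/elif, the first matching rule wins.
RULES = [
    (True, "hi|there", "hi_there"),
    (True, "status", "status"),
    (False, "goodbye|you", "goodbye_you"),
]

df["new_column"] = "Other"  # default, like the final else
unassigned = pd.Series(True, index=df.index)  # rows no rule has claimed yet

for cond_value, pattern, label in RULES:
    mask = (
        unassigned
        & (df["condition"] == cond_value)
        & df["summary"].str.contains(pattern, case=False, na=False)
    )
    df.loc[mask, "new_column"] = label
    unassigned &= ~mask  # later rules cannot overwrite this row
```

Adding a branch then means adding one tuple to RULES rather than another elif.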