using if, elif, else logic in a function to define a dataframe column--why can't I use str.cont

I want to create a dataframe column with a function that uses if/elif/else logic. A simplified example of my code is below. The main difference is that the real code has at least 20 more elif branches, so although I might be able to use nested np.where() or np.select(), I would prefer not to.

def func(row):
    if row['condition']==True:
        if row['summary'].str.contains('hi|there',case=False):
            return 'hi_there'
        else:
            return 'Other'
    else:
        if row['summary'].str.contains('goodbye|you',case=False):
            return 'goodbye_you'
        else:
            return 'Other'


df['newcolumn'] = df.apply(lambda row: func(row), axis=1)

I get this error message:

AttributeError: 'str' object has no attribute 'str'.

Is it possible to create my column using this method, but with a few additional tweaks? If it's not possible in Python, why?

CodePudding user response：

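The value stored in row at key 'summary' is a plain Python string, which has no .str attribute. Removing .str alone is not enough, though, because a Python string has no .contains method either. Use the in operator or re.search on the string instead.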

CodePudding user response：

The .str accessor gives you vectorized string functions on a whole column (a Series). With apply and axis=1 you iterate through the rows one at a time, so row['summary'] is a single string, not a Series.

You can rewrite the function you pass to apply so that it works on plain strings. In this case you need something like this:

'hi' in row['summary'].lower()

or a regex as you did

import re
...
re.search('hi|there', row['summary'], re.IGNORECASE)
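
For reference, here is a minimal sketch of the question's function rewritten with re.search so that it runs under apply. The sample dataframe is made up for illustration; only the column names 'condition', 'summary' and 'newcolumn' come from the question, and it assumes 'summary' holds strings with no missing values.

```python
import re

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "condition": [True, True, False, False],
    "summary": ["Hi everyone", "no match", "Goodbye now", "Hi everyone"],
})


def func(row):
    # Under apply(axis=1), row["summary"] is a plain Python str,
    # so match it with re.search instead of the .str accessor.
    summary = row["summary"]
    if row["condition"]:
        if re.search("hi|there", summary, re.IGNORECASE):
            return "hi_there"
        return "Other"
    if re.search("goodbye|you", summary, re.IGNORECASE):
        return "goodbye_you"
    return "Other"


df["newcolumn"] = df.apply(func, axis=1)
print(df)
```

Further elif branches can be added to func in the same way, each with its own re.search call.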

Better, I think, would be to use those vectorized functions as you started, just not within apply but on the dataframe columns directly:

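    # start with the default, then overwrite the rows that match each mask
    df['new_column'] = 'Other'
    # use df[...] (whole columns) here; there is no row variable outside apply
    df.loc[(df['condition']==True)&(df['summary'].str.contains('hi|there',case=False)),'new_column'] = 'hi_there'
    df.loc[(df['condition']==False)&(df['summary'].str.contains('goodbye|you',case=False)),'new_column'] = 'goodbye_you'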
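
A self-contained sketch of this vectorized approach follows, with the masks pulled out into named variables so that many conditions stay readable. The sample data and the mask names (is_true, has_hi, has_bye) are made up for illustration. It assumes 'condition' is a boolean column with no missing values; na=False makes str.contains treat a missing summary as "no match".

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "condition": [True, True, False, False],
    "summary": ["Hi everyone", "no match", "Goodbye now", "Hi everyone"],
})

# Boolean masks computed once for the whole column.
is_true = df["condition"]
has_hi = df["summary"].str.contains("hi|there", case=False, na=False)
has_bye = df["summary"].str.contains("goodbye|you", case=False, na=False)

# Default first, then overwrite the rows selected by each combined mask.
df["new_column"] = "Other"
df.loc[is_true & has_hi, "new_column"] = "hi_there"
df.loc[~is_true & has_bye, "new_column"] = "goodbye_you"
print(df)
```

With 20 or more rules, each rule becomes one extra df.loc line. If masks overlap, later assignments overwrite earlier ones, so the order of the lines matters.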