I am trying to create a new column using the logic in the function below. When I apply this function to a DataFrame using a lambda, I get the error shown after the code. I tried removing the "str" in front of "contains", but that did not work. Could anyone assist or advise? Thanks.
def new_col(x):
    if pd.isna(x):
        return ''
    elif x.str.contains('Watch',case=False):
        return 'Product A'
    elif x.str.contains('Glasses',case=False):
        return 'Product B'
    elif x.str.contains('Table',case=False):
        return 'Product C'
    elif x.str.contains('Computer',case=False):
        return 'Product D'
    elif x.str.contains('Beauty',case=False):
        return 'Product E'
    elif x.str.contains(','):
        return x.split(',')[0]
    else:
        return x
df['new column'] = df.apply(lambda x: new_col(x['product']),axis=1)
AttributeError: 'str' object has no attribute 'contains'
CodePudding user response:
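str.contains is a method of the pandas .str accessor, so it applies to a Series as a whole. Inside apply with axis=1, x['product'] is a plain Python string, which has neither a .str accessor nor a contains method, hence the AttributeError. In your case, instead of using pandas functions, you can use a simple for loop to do the trick.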
result = []
for i in df['product']:
    if pd.isna(i):
        result.append('')
    elif 'watch' in i.lower():
        result.append('Product A')
    elif 'glasses' in i.lower():
        result.append('Product B')
    elif 'table' in i.lower():
        result.append('Product C')
    elif 'computer' in i.lower():
        result.append('Product D')
    elif 'beauty' in i.lower():
        result.append('Product E')
    elif ',' in i:
        result.append(i.split(',')[0])
    else:
        result.append(i)
df['new column'] = result
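If you would rather keep Series.str.contains, it has to be called on the whole column rather than on each cell. The following is only a sketch of that vectorized alternative, combining the per-keyword masks with numpy.select. The sample DataFrame and the keywords dict are made up for illustration; only the 'product' column name and the labels come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name 'product' comes from the question.
df = pd.DataFrame({'product': ['Smart Watch', 'Sun GLASSES', 'Desk, oak', None, 'Chair']})
s = df['product']

# Keyword -> label mapping, checked in this order (first match wins).
keywords = {
    'Watch': 'Product A',
    'Glasses': 'Product B',
    'Table': 'Product C',
    'Computer': 'Product D',
    'Beauty': 'Product E',
}

# One boolean mask per rule; missing values are handled first.
conditions = [s.isna()] + [
    s.str.contains(k, case=False, na=False, regex=False) for k in keywords
]
choices = [''] + list(keywords.values())

# Fallback: the text before the first comma (the whole string if there is none).
default = s.fillna('').str.split(',').str[0]

df['new column'] = np.select(conditions, choices, default=default)
```

numpy.select takes the first condition that is true for each row, which mirrors the order of the if/elif chain.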
CodePudding user response:
Use the in operator with str.casefold() to perform a case-insensitive substring search:
def new_col(x):
    # x is a single cell value (a plain str or NaN), not a Series
    if pd.isna(x):
        return ''
    elif 'Watch'.casefold() in x.casefold():
        return 'Product A'
    elif 'Glasses'.casefold() in x.casefold():
        return 'Product B'
    elif 'Table'.casefold() in x.casefold():
        return 'Product C'
    elif 'Computer'.casefold() in x.casefold():
        return 'Product D'
    elif 'Beauty'.casefold() in x.casefold():
        return 'Product E'
    elif ',' in x:
        return x.split(',')[0]
    else:
        return x
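The same idea can be written more compactly by driving the checks from a dict and applying the function to the column with Series.map. This is a minimal sketch, not a drop-in replacement: the names LABELS and label_product and the sample DataFrame are hypothetical, and only the 'product' column and the labels come from the question.

```python
import pandas as pd

# Hypothetical lookup table: casefolded keyword -> label, checked in order.
LABELS = {
    'watch': 'Product A',
    'glasses': 'Product B',
    'table': 'Product C',
    'computer': 'Product D',
    'beauty': 'Product E',
}

def label_product(value):
    """Label a single cell value (a plain str or a missing value)."""
    if pd.isna(value):
        return ''
    folded = value.casefold()
    for keyword, label in LABELS.items():
        if keyword in folded:
            return label
    # No keyword matched: keep the text before the first comma
    # (split returns the whole string when there is no comma).
    return value.split(',')[0]

# Hypothetical sample data for illustration.
df = pd.DataFrame({'product': ['Smart Watch', 'Gaming COMPUTER', 'Desk, oak', None, 'Chair']})
df['new column'] = df['product'].map(label_product)
```

Mapping over df['product'] directly avoids the row-wise df.apply(..., axis=1) call, since only one column is needed.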