Date,Amount,Subcategory,Memo,
29/10/2021,953.76,DIRECTDEP,Stripe Payments UK STRIPE BGC,
29/10/2021,-1260.44,FT,DIESEL INJECTORS U TRANSFER FT,
29/10/2021,-509.15,FT,TNT 002609348 FT,
Above is some accounts data that I need to group, and later apply labels to.
First I tried this:
df['Suppliers'] = [re.search(r'\b[a-zA-Z]{3,}\b', item).group(0) for item in df['Memo'] if item is not None]
But I get AttributeError: 'NoneType' object has no attribute 'group'
I understand that this is because the pattern was not found in some of the data.
So I tried removing the .group(0), and I get a match object for each item, e.g. <re.Match object; span=(0, 6), match='Stripe'>
Question: I am not sure why if item is not None doesn't skip over those items where no match is found. And why, if I am returned a match object, can't I access it with .group(0)?
I have figured out a solution with a loop, but I would really like to understand what the problem is with the list comp approach.
my_list = []
for item in df['Memo']:
    match = re.search(r'\b[a-zA-Z]{3,}\b', item)
    try:
        my_list.append(match.group(0).lower())
    except AttributeError:
        # re.search returned None, so there is no .group to call
        my_list.append('na')
# assign once, after the loop, so the column has one entry per row
df['Suppliers'] = my_list
CodePudding user response:
When you use if item is not None, you check whether item is not None, not the result of the re.search(r'\b[a-zA-Z]{3,}\b', item) operation.
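To make that concrete, here is a minimal sketch of a list comprehension that tests the search result rather than the memo string. It assumes Python 3.8+ for the := operator. The first three memos come from the data in the question; the fourth is a made-up memo with no word of three or more letters, added only to show the no-match case.

```python
import re

# Memos from the question, plus one hypothetical memo that cannot match.
memos = [
    "Stripe Payments UK STRIPE BGC",
    "DIESEL INJECTORS U TRANSFER FT",
    "TNT 002609348 FT",
    "12 34",  # hypothetical: no run of three or more letters
]

pattern = re.compile(r"\b[a-zA-Z]{3,}\b")

# The memo string is never None here, so `if item is not None` filters nothing.
# What can be None is the return value of pattern.search, so test that instead.
# A conditional expression (rather than a trailing `if` filter) keeps one
# output entry per input row, which matters when assigning to a DataFrame column.
suppliers = [
    m.group(0).lower() if (m := pattern.search(item)) else "na"
    for item in memos
]

print(suppliers)  # ['stripe', 'diesel', 'tnt', 'na']
```

This also explains the second part of the question: removing .group(0) only appears to return a match object for every item because None entries are easy to miss in the output. The AttributeError is raised at the first item where re.search returns None.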
Just use Series.str.extract directly:
df['Suppliers'] = df['Memo'].str.extract(r'\b([a-zA-Z]{3,})\b')
Note that you need a pair of unescaped parentheses to form a capturing group in the pattern when you use it with Series.str.extract.
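As a quick illustration of what the parentheses change, the sketch below uses only the re module to compare the two patterns. As far as I know, Series.str.extract raises a ValueError when the pattern contains no capturing group, which is why the second form is needed there.

```python
import re

plain = re.compile(r"\b[a-zA-Z]{3,}\b")      # pattern from the question
grouped = re.compile(r"\b([a-zA-Z]{3,})\b")  # same pattern with one capturing group

# .groups reports how many capturing groups a compiled pattern has.
print(plain.groups)    # 0
print(grouped.groups)  # 1

# group(0) is the whole match; group(1) is the first capturing group.
# Here they are the same text, because the group wraps the whole word.
m = grouped.search("TNT 002609348 FT")
print(m.group(0), m.group(1))  # TNT TNT
```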
If you want the string na for the cases where no match was found, add .fillna:
df['Suppliers'] = df['Memo'].str.extract(r'\b([a-zA-Z]{3,})\b').fillna('na')