I have account names that look like GH85036, LG95639, etc. in a column. I want to check the format of the entire column so I can edit the ones that don't follow it. This is my first time using regex.
So far I have:
for i in Reports['Account Name']:
    match = re.findall(r'\[A-Z]{2}[0-9][0-9][0-9][0-9][0-9]', Reports['Account Name']) is None
The error message I get:
<ipython-input-77-86f17b9d34ff> in <module>()
      1 for i in Reports['Account Name']:
----> 2     match = re.findall(r'\[A-Z]{2}[0-9][0-9][0-9][0-9][0-9]', Reports['Account Name']) is None

C:\Program Files\Anaconda3\lib\re.py in findall(pattern, string, flags)
    221
    222     Empty matches are included in the result."""
--> 223     return _compile(pattern, flags).findall(string)
    224
    225 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object
Answer:
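The TypeError occurs because re.findall expects a single string, but the loop passes it the whole column (Reports['Account Name']) instead of the loop variable i. The pattern also starts with \[, which matches a literal [ rather than opening a character class.
Assuming a valid account name is two capital letters followed by five digits, we can use str.contains on the entire column to flag any non-matching values: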
Reports[~Reports["Account Name"].str.contains(r'^[A-Z]{2}[0-9]{5}$', regex=True)]
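To try this out end to end, here is a minimal sketch on made-up data. The sample values are hypothetical; only the Reports name and the "Account Name" column come from the question. It adds na=False, on the assumption that missing values should be flagged as invalid too, and shows a loop version that passes each value (not the whole column) to the re module.

```python
import re

import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
Reports = pd.DataFrame(
    {"Account Name": ["GH85036", "LG95639", "gh85036", "AB1234", "XY123456", None]}
)

# Two capital letters followed by exactly five digits, anchored at both ends.
pattern = r"^[A-Z]{2}[0-9]{5}$"

# Vectorised check; na=False makes missing values count as non-matching.
is_valid = Reports["Account Name"].str.contains(pattern, regex=True, na=False)
invalid_rows = Reports[~is_valid]
print(invalid_rows)

# Loop equivalent: test one value at a time and skip anything that is not a string.
invalid_names = [
    name
    for name in Reports["Account Name"]
    if not (isinstance(name, str) and re.fullmatch(r"[A-Z]{2}[0-9]{5}", name))
]
print(invalid_names)
```

The vectorised form is usually the better choice on a DataFrame. The loop is there mainly to show what the original attempt was aiming for.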