How to check the string format of an entire column in Python using regex

Time:01-21

I have Account Names that look like GH85036, LG95639, etc. in a column. I want to check the format of the entire column so I can edit the ones that don't follow it. This is my first time using regex.

So far I have:

for i in Reports['Account Name']:

 match = re.findall(r'\[A-Z]{2}[0-9][0-9][0-9][0-9][0-9]', Reports['Account Name']) is None

The error message I get:

<ipython-input-77-86f17b9d34ff> in <module>()
      1 for i in Reports['Account Name']:
----> 2     match = re.findall(r'\[A-Z]{2}[0-9][0-9][0-9][0-9][0-9]', Reports['Account Name']) is None

C:\Program Files\Anaconda3\lib\re.py in findall(pattern, string, flags)
    221 
    222     Empty matches are included in the result."""
--> 223     return _compile(pattern, flags).findall(string)
    224 
    225 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object

CodePudding user response:

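The TypeError occurs because re.findall expects a single string but was passed the whole column Reports['Account Name']. In addition, the leading \[ in the pattern matches a literal bracket instead of opening a character class. Assuming a valid account name is two capital letters followed by five digits, we can use str.contains on the entire column to flag any non-matching values: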

Reports[~Reports["Account Name"].str.contains(r'^[A-Z]{2}[0-9]{5}$', regex=True)]
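For anyone who wants to try this end to end, here is a minimal sketch under a few assumptions: the sample values in Reports are made up for illustration, and it uses str.fullmatch (available in pandas 1.1 and later) instead of str.contains with ^ and $. The na=False argument is there so that missing account names count as invalid rather than breaking the ~ negation. The last part shows roughly what the original loop was aiming for: the re module checks one string at a time, not a whole column.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the asker's Reports frame.
Reports = pd.DataFrame(
    {"Account Name": ["GH85036", "LG95639", "gh85036", "LG9563", "XGH85036", None]}
)

# Two capital letters followed by exactly five digits.
pattern = r"[A-Z]{2}[0-9]{5}"

# str.fullmatch requires the whole string to match, so no ^ or $ anchors are needed.
# na=False treats missing values as non-matching instead of propagating NaN.
is_valid = Reports["Account Name"].str.fullmatch(pattern, na=False)

# Rows whose account name needs editing.
bad_rows = Reports[~is_valid]
print(bad_rows)

# The same check with the re module, applied to one value at a time.
compiled = re.compile(pattern)
is_valid_re = Reports["Account Name"].map(
    lambda name: isinstance(name, str) and compiled.fullmatch(name) is not None
)
```

One difference to be aware of: with ^...$ in str.contains, the $ anchor also matches just before a trailing newline, so a value like "GH85036\n" would pass, whereas fullmatch rejects it.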