Improve regex in Python (2.7) to find short / incomplete UK postcodes

Time:07-04

I have a little function that finds full UK postcodes (e.g. DE2 7TT) in strings and returns them accordingly.

However, I'd like to change it to ALSO return incomplete postcodes, which consist of one or two letters followed by one or two digits (e.g. SE3, E2, SE45, E34).

i.e. it must collect BOTH forms of UK postcode (incomplete and complete).

The code is:

import re

def pcsearch(postcode):
    # Look for a full UK postcode such as "DE2 7TT"; return "na" if none is found.
    if bool(re.search('(?i)[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}', postcode)):
        postcode = re.search('(?i)[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}', postcode)
        postcode = postcode.group()
        return postcode
    else:
        postcode = "na"
        return postcode

What tweaks are needed to make this ALSO work with those shorter, incomplete postcodes?

Answer:

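You can write the pattern as an alternation of the full form and the short form, wrapped in word boundaries so that longer tokens such as E123 are not partially matched.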

(?i)\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b
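To see what each part of that alternation does, here is a sketch that rewrites the same pattern with re.VERBOSE so every piece can carry a comment, and compares it with a copy that has the \b anchors removed. The names POSTCODE_RE, NO_BOUNDARY_RE and first_match are illustrative only, and re.IGNORECASE is used in place of the inline (?i) flag (they behave the same). It should run on both Python 2.7 and 3.

```python
import re

# The answer's pattern rewritten with re.VERBOSE so each part can be commented.
# re.IGNORECASE replaces the inline (?i) flag; the matching behaviour is the same.
# In verbose mode plain whitespace is ignored, so the literal space between the
# two halves of a full postcode is written as "\ ".
POSTCODE_RE = re.compile(r"""
    \b                               # start on a word boundary
    (?:
        [A-Z]{1,2}[0-9R][0-9A-Z]?    # outward code, e.g. DE2 or SW1A
        \ [0-9][A-Z]{2}              # space + inward code, e.g. 7TT
      |
        [A-Z]{1,2}\d{1,2}            # outward-only form, e.g. SE3 or E34
    )
    \b                               # end on a word boundary
""", re.IGNORECASE | re.VERBOSE)

# The same alternation without the \b anchors, for comparison.
NO_BOUNDARY_RE = re.compile(
    r"(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})",
    re.IGNORECASE,
)


def first_match(regex, text):
    """Return the first match of regex in text as a string, or None."""
    m = regex.search(text)
    return m.group() if m else None


for text in ["E34", "E123", "SE222", "DE2 7TT"]:
    print("%-8s with \\b: %-8s without \\b: %s"
          % (text, first_match(POSTCODE_RE, text), first_match(NO_BOUNDARY_RE, text)))
```

With the anchors, E123 and SE222 produce no match; without them, the short branch returns the prefixes E12 and SE22, which is why the alternation is wrapped in \b.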


The code can be refactored so the pattern is used only once, by checking the match object directly:

import re

def pcsearch(postcode):
    pattern = r"(?i)\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b"
    match = re.search(pattern, postcode)
    if match:
        return match.group()
    else:
        return "na"

strings = [
    "SE3",
    "E2",
    "SE45",
    "E34",
    "DE2 7TT",
    "E123",
    "SE222"
]

for s in strings:
    print(pcsearch(s))

Output

SE3
E2
SE45
E34
DE2 7TT
na
na
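The question mentions finding postcodes in strings. If a single string can contain several postcodes, re.findall returns every non-overlapping match rather than only the first. A minimal sketch, where pcsearch_all is a hypothetical helper name and the pattern is the one used in pcsearch() above:

```python
import re

# Same pattern as in pcsearch(). Its only group is non-capturing (?:...),
# so re.findall returns the full text of each match rather than a group.
PATTERN = r"(?i)\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b"


def pcsearch_all(text):
    """Hypothetical helper: return every full or outward-only postcode in text."""
    return re.findall(PATTERN, text)


print(pcsearch_all("Deliveries to DE2 7TT, SE3 and E34, but not E123."))
# ['DE2 7TT', 'SE3', 'E34']
```

Keep in mind that the outward-only branch matches anything shaped like one or two letters plus one or two digits, so tokens such as MP3 or A1 will also be returned.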