I have a little function that finds full UK postcodes (e.g. DE2 7TT) in strings and returns them accordingly.
However, I'd like to change it to ALSO return partial postcodes made up of one or two letters followed by one or two digits (e.g. SE3, E2, SE45, E34).
i.e. it must collect BOTH forms of UK postcode (incomplete and complete).
The code is:
import re

def pcsearch(postcode):
    if bool(re.search('(?i)[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}', postcode)):
        postcode = re.search('(?i)[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}', postcode)
        postcode = postcode.group()
        return postcode
    else:
        postcode = "na"
        return postcode
What tweaks are needed to get this to ALSO work with those shorter, incomplete, postcodes?
CodePudding user response:
You could write the pattern as an alternation wrapped in word boundaries, with the full form listed first so that it is tried before the short form:
(?i)\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b
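If the one-line pattern is hard to read, the same expression can be spelled out in verbose mode. This is only a sketch: the name POSTCODE_RE and the sample sentences are made up for illustration, and the flags are passed to re.compile instead of using the inline (?i).

```python
import re

# Hypothetical name; same pattern as above, split across lines for readability.
POSTCODE_RE = re.compile(
    r"""
    \b
    (?:
        [A-Z]{1,2}[0-9R][0-9A-Z]?   # outward part, e.g. DE2
        [ ]                         # literal space (verbose mode ignores bare whitespace)
        [0-9][A-Z]{2}               # inward part, e.g. 7TT
      |
        [A-Z]{1,2}\d{1,2}           # short form, e.g. SE3, E2, SE45
    )
    \b
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(POSTCODE_RE.search("Ship to DE2 7TT today").group())  # DE2 7TT
print(POSTCODE_RE.search("somewhere in SE45").group())      # SE45
print(POSTCODE_RE.search("ref E123"))                       # None
```

The trailing \b is what rejects E123: after E12 the next character is another digit, so there is no word boundary and the short form cannot match.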
The code could be refactored using the pattern only once by checking the match:
import re

def pcsearch(postcode):
    pattern = r"(?i)\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b"
    match = re.search(pattern, postcode)
    if match:
        return match.group()
    else:
        return "na"

strings = [
    "SE3",
    "E2",
    "SE45",
    "E34",
    "DE2 7TT",
    "E123",
    "SE222"
]

for s in strings:
    print(pcsearch(s))
Output:
SE3
E2
SE45
E34
DE2 7TT
na
na
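Note that re.search stops at the first hit, so pcsearch returns at most one postcode per string. If a string may contain several, something along these lines might work; pcsearch_all is a hypothetical helper name, not part of the original function, and it returns an empty list where pcsearch would return "na".

```python
import re

PATTERN = re.compile(
    r"\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b",
    re.IGNORECASE,
)

def pcsearch_all(text):
    """Return every postcode (full or short) found in text, in order."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return PATTERN.findall(text)

print(pcsearch_all("From DE2 7TT to SE3, then E2."))  # ['DE2 7TT', 'SE3', 'E2']
```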