Getting city abbreviations from text with Python regex

Time:07-26

I'm trying to get pairs of uppercase letters from text using the following pattern:

import re

pattern = re.compile('[A-Z][A-Z]')

text = []
text.append('CY, Cityname')
text.append('CY Cityname')
text.append('Cityname, CY')
text.append('Cityname CY')

for item in text:
  result = pattern.match(item)
  print(result)

The result I get is:

<re.Match object; span=(0, 2), match='CY'>
<re.Match object; span=(0, 2), match='CY'>
None
None

As the code snippet above shows, I expect the text to consist of [pair of uppercase letters] and [some string], in either order, separated by a comma or whitespace.

Why does that regex work in the first two cases, where the string begins with the abbreviation, but not in the cases where it ends with the abbreviation?

CodePudding user response:

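This happens because match only looks for a match at the very beginning of the string, so it returns None whenever the abbreviation appears later in the text. Using search, which scans the whole string and returns the first match it finds anywhere, solves the problem.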

import re

pattern = re.compile('[A-Z][A-Z]')

text = []
text.append('CY, Cityname')
text.append('CY Cityname')
text.append('Cityname, CY')
text.append('Cityname CY')

for item in text:
  result = pattern.search(item)
  print(result)
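
As a possible refinement, here is a minimal sketch that puts the two methods side by side and tightens the pattern with word boundaries, so that two capitals inside a longer uppercase run (for example 'NYC') are not picked up. The names ABBREV and find_abbreviation, and the stricter pattern itself, are assumptions made for illustration and are not part of the original question.

```python
import re

# Assumed stricter pattern: exactly two uppercase letters standing alone as a word.
ABBREV = re.compile(r'\b[A-Z]{2}\b')

samples = ['CY, Cityname', 'CY Cityname', 'Cityname, CY', 'Cityname CY']


def find_abbreviation(text):
    """Return the first standalone two-letter uppercase code in text, or None."""
    found = ABBREV.search(text)  # search() scans the whole string
    return found.group(0) if found else None


for sample in samples:
    # match() only succeeds if the pattern matches starting at index 0.
    matched_at_start = ABBREV.match(sample) is not None
    print(sample, '| match at start:', matched_at_start,
          '| search result:', find_abbreviation(sample))
```

If a string could contain several abbreviations, ABBREV.findall(text) would return all of them as a list instead of only the first.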