Differentiating two hyphens in string Regex python [closed]

Time:09-28

I would like to differentiate between the following addresses:

addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']

When I use the following pattern, both addresses match:

import re

r = re.compile(r"\d{1,4}[-]\d{1,4}")
x = list(filter(r.match, addrs))  # both addresses end up in x
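Both strings pass because `re.match` only requires the pattern to match at the beginning of the string, and both addresses begin with "100-123". A minimal sketch contrasting it with `fullmatch`, which requires the pattern to consume the whole string (`[-]` and `-` are equivalent here):

```python
import re

addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']
r = re.compile(r"\d{1,4}-\d{1,4}")

# re.match only needs the pattern to match at the START of the string,
# and both addresses start with "100-123", so both pass the filter.
prefix_hits = list(filter(r.match, addrs))

# fullmatch needs the pattern to consume the WHOLE string,
# so a bare "digits-digits" pattern matches neither address.
full_hits = list(filter(r.fullmatch, addrs))

print(prefix_hits)
print(full_hits)
```

To tell the two addresses apart, the pattern therefore has to describe everything up to the end of the string, not just its start.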

How do I match each address separately (i.e., only the item at index 0, or only the item at index 1)? For example, to get only the second address I tried:

r = re.compile(r"\d{1,4}[-]\d{1,4}\s\d{1,4}[-]\d{1,4}\s\w")

But that gives me an empty list.
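The list is empty because the pattern expects whitespace and a second number range immediately after "100-123", while the string actually continues with " Main Street". A small sketch showing this, plus a hypothetical fix that allows a street name between the two ranges (assuming street names contain only ASCII letters and spaces):

```python
import re

addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']

# The question's pattern: after "100-123" it expects whitespace and then
# another number, but the string continues with " Main Street".
question_re = re.compile(r"\d{1,4}[-]\d{1,4}\s\d{1,4}[-]\d{1,4}\s\w")
question_hits = list(filter(question_re.match, addrs))

# Hypothetical fix: allow a run of letters and spaces (the street name)
# between the two number ranges.
fixed_re = re.compile(r"\d{1,4}-\d{1,4}\s[A-Za-z ]+\s\d{1,4}-\d{1,4}\s\w")
fixed_hits = list(filter(fixed_re.match, addrs))

print(question_hits)
print(fixed_hits)
```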

Objective:

I am looking for two regex patterns that will give me either '100-123 Main Street' or '100-123 Main Street 239-241 Second Street' but not both together.

Answer 1:

The findall function in the re module should get the job done.

I tried the regex below and it worked:

re.findall(r"\d+-\d+\s[a-zA-Z ]*", input_string)

This returns a list of the addresses.

For example, if input_string is '100-123 Main Street 239-241 Second Street', the output is:

['100-123 Main Street ', '239-241 Second Street']
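A minimal sketch of how this findall approach could meet the question's objective: count the addresses found in each string and keep the strings with one or two. The helper name `split_addresses` is hypothetical, and the pattern assumes street names contain only ASCII letters and spaces:

```python
import re

addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']

# The answer's pattern as a raw string with "+" quantifiers:
# digits, hyphen, digits, one whitespace character, then letters/spaces.
address_re = re.compile(r"\d+-\d+\s[a-zA-Z ]*")

def split_addresses(s):
    # strip() drops the trailing space that [a-zA-Z ]* picks up
    # just before the next house number.
    return [m.strip() for m in address_re.findall(s)]

# Keep strings by how many addresses they contain.
single = [s for s in addrs if len(split_addresses(s)) == 1]
double = [s for s in addrs if len(split_addresses(s)) == 2]

print(split_addresses(addrs[1]))
print(single)
print(double)
```

Counting matches avoids writing a separate pattern for each case, at the cost of scanning each string with findall.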

Answer 2:

You could try:

>>> import re
>>> addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']
>>>  
>>> r = re.compile(r"\d{1,4}[-]\d{1,4}(\s[A-Za-z]+)+$") # 1st regex pattern
>>> x = list(filter(r.match, addrs))
>>> x
['100-123 Main Street']
>>>
>>> r = re.compile(r"(\d{1,4}[-]\d{1,4}(\s[A-Za-z]+)+\s?){2}") # 2nd regex pattern
>>> x = list(filter(r.match, addrs))
>>> x
['100-123 Main Street 239-241 Second Street']
>>>
>>> r.match('100-123 Main Street 239-241 Second Street').group(1)
'239-241 Second Street'
>>>
>>> r = re.compile(r"(\d{1,4}[-]\d{1,4}(\s\w+)+){2}") # hack suggested in the comments; don't use it
>>> x = list(filter(r.match, addrs))
>>> x
['100-123 Main Street 239-241 Second Street']
>>> r.match('100-123 Main Street 239-241 Second Street').group(1)
'9-241 Second Street' 
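The hack returns '9-241 Second Street' because \w also matches digits. A small sketch reproducing the transcript's last two lookups, with the `+` quantifiers written out, to show where the engine ends up after backtracking:

```python
import re

addr = '100-123 Main Street 239-241 Second Street'

# \w matches letters, digits and "_", so (\s\w+)+ first consumes " 239" too.
# The second repetition then cannot start at "-241", so the engine backtracks
# to " 23" and starts the second repetition at "9-241".
hack_re = re.compile(r"(\d{1,4}-\d{1,4}(\s\w+)+){2}")

# [A-Za-z] cannot match digits, so the first repetition stops after "Street".
letters_re = re.compile(r"(\d{1,4}-\d{1,4}(\s[A-Za-z]+)+\s?){2}")

hack_last = hack_re.match(addr).group(1)
letters_last = letters_re.match(addr).group(1)

print(hack_last)
print(letters_last)
```

group(1) holds only the last repetition of the outer group, which is why it shows the second address rather than the first.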

Explanation:

  1. \d{1,4}[-]\d{1,4} in both regex patterns matches 1-4 digits, followed by a hyphen, followed by 1-4 digits
  2. (\s[A-Za-z]+)+ in both regex patterns matches a whitespace character followed by one or more letters, repeated one or more times (e.g., " Main Street")
  3. $ in the first regex pattern anchors the match to the end of the string; since r.match already anchors it to the start, the whole string must be exactly one address as described in 1. and 2. (e.g., 100-123 Main Street)
  4. {2} in the second regex pattern repeats what is described in 1. and 2. twice, and \s? allows an optional whitespace character after each repetition, so the two addresses can be separated by a space
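One possible variation on the same idea, sketched under the assumption that street names are ASCII words: define the single-address piece once and reuse it, and use fullmatch so both patterns are anchored at both ends. The name `ONE` is hypothetical, and (?:...) is a non-capturing group used only for grouping:

```python
import re

addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']

# One address: a 1-4 digit range followed by one or more words.
ONE = r"\d{1,4}-\d{1,4}(?:\s[A-Za-z]+)+"

# fullmatch anchors at both ends, doing the job of the answer's "$"
# (re.match already anchors at the start).
single_re = re.compile(ONE)                  # exactly one address
double_re = re.compile(ONE + r"\s" + ONE)    # exactly two addresses

single = list(filter(single_re.fullmatch, addrs))
double = list(filter(double_re.fullmatch, addrs))

print(single)
print(double)
```

Unlike the {2} pattern used with r.match, the fullmatch version also rejects strings that contain a third address after the first two.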