I would like to differentiate between the following addresses:
addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']
When I use the following pattern, both addresses match:
r = re.compile(r"\d{1,4}[-]\d{1,4}")
x = list(filter(r.match, addrs))
How do I match each address separately (i.e., only the address at index 0, or only the address at index 1)? For example, to get only the second address, I tried the following:
r = re.compile(r"\d{1,4}[-]\d{1,4}\s\d{1,4}[-]\d{1,4}\s\w")
But that gives me an empty list.
Objective:
I am looking for two regex patterns that will give me either '100-123 Main Street' or '100-123 Main Street 239-241 Second Street' but not both together.
CodePudding user response:
The findall method in the re library should get the job done.
I tried the regex below and it worked:
re.findall(r"\d+-\d+\s[a-zA-Z ]*", input_string)
This returns a list of the addresses. For example, if input_string is '100-123 Main Street 239-241 Second Street', the output will be:
['100-123 Main Street ', '239-241 Second Street']
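Note the trailing space on the first result: the character class [a-zA-Z ] also accepts spaces, so the match runs up to the next digit. As a minimal sketch (the helper name split_addresses is made up for illustration, not part of the answer above), the results can be stripped after matching:

```python
import re

# Same pattern as above: digits, a hyphen, digits, one whitespace
# character, then any run of letters and spaces.
pattern = re.compile(r"\d+-\d+\s[a-zA-Z ]*")

def split_addresses(text):
    """Return every address found in text, with surrounding whitespace removed."""
    return [match.strip() for match in pattern.findall(text)]

single = split_addresses('100-123 Main Street')
double = split_addresses('100-123 Main Street 239-241 Second Street')
print(single)
print(double)
```

With this approach, the length of the returned list tells you whether the input held one address or two.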
CodePudding user response:
You could try:
>>> import re
>>> addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']
>>>
>>> r = re.compile(r"\d{1,4}[-]\d{1,4}(\s[A-Za-z]+)+$") # 1st regex pattern
>>> x = list(filter(r.match, addrs))
>>> x
['100-123 Main Street']
>>>
>>> r = re.compile(r"(\d{1,4}[-]\d{1,4}(\s[A-Za-z]+)+\s?){2}") # 2nd regex pattern
>>> x = list(filter(r.match, addrs))
>>> x
['100-123 Main Street 239-241 Second Street']
>>>
>>> r.match('100-123 Main Street 239-241 Second Street').group(1)
'239-241 Second Street'
>>>
>>>
>>> r = re.compile(r"(\d{1,4}[-]\d{1,4}(\s\w+)+){2}") # hack in the comment, don't use it
>>> x = list(filter(r.match, addrs))
>>> x
['100-123 Main Street 239-241 Second Street']
>>> r.match('100-123 Main Street 239-241 Second Street').group(1)
'9-241 Second Street'
Explanation:
1. \d{1,4}[-]\d{1,4} in both regex patterns matches 1-4 digits, followed by -, followed by 1-4 digits.
2. (\s[A-Za-z]+)+ in both regex patterns matches a whitespace character followed by one or more alphabet characters, and all of that one or more times.
3. $ in the first regex pattern results in matching what is described in 1. and 2. exactly once (e.g., 100-123 Main Street).
4. \s? with {2} in the second regex pattern results in matching what is described in 1. and 2. twice, separated by a whitespace character.
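The same idea can be sketched without relying on $ or on group numbering. This is an alternative formulation under stated assumptions, not the answer's exact code: it swaps in re.fullmatch, which succeeds only when the whole string fits the pattern, and the names ADDRESS, one_address, and two_addresses are made up for illustration.

```python
import re

# One address: a number range followed by one or more alphabetic words.
# (?:...) is a non-capturing group, so no group numbers are involved.
ADDRESS = r"\d{1,4}-\d{1,4}(?:\s[A-Za-z]+)+"

one_address = re.compile(ADDRESS)
# Two addresses: the same sub-pattern twice, separated by one whitespace character.
two_addresses = re.compile(ADDRESS + r"\s" + ADDRESS)

addrs = ['100-123 Main Street', '100-123 Main Street 239-241 Second Street']

# fullmatch rejects strings with leftover text, so each address
# is accepted by exactly one of the two patterns.
only_one = [a for a in addrs if one_address.fullmatch(a)]
only_two = [a for a in addrs if two_addresses.fullmatch(a)]
print(only_one)
print(only_two)
```

Building the two-address pattern from the single-address one keeps both in sync if the address format ever changes.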