I'm trying to use Python's re library to analyze a string containing a street name and one or more numbers separated by forward slashes.
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
I want to match all digits, including positions after the dot and adjacent alpha characters. If a hyphen connects two numbers with an alpha character, they should also be considered as one match.
Expected output:
['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
I'm trying the following:
numbers = re.findall(r'\d+\.*\d*\w[-\w]*', example)
This finds everything except single digits with no decimal part or letter (i.e. '1'):
print(numbers)
['2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
How do I need to tweak my regex in order to achieve the desired output?
CodePudding user response:
The pattern does not match the single 1 because \d+\.*\d*\w[-\w]* expects at least 2 characters: at least 1 digit for \d+ and 1 word character for \w.
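To make the two-character minimum concrete, here is a small sketch. It assumes the question's pattern is \d+\.*\d*\w[-\w]* (with a + after the first \d) and uses re.fullmatch on isolated tokens purely for illustration:

```python
import re

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
original = r'\d+\.*\d*\w[-\w]*'

# '1' alone: \d+ consumes the only digit, so the mandatory \w has nothing to match.
single_digit = re.fullmatch(original, '1')
# '10': \d+ can give one digit back so that \w matches the '0'.
two_digits = re.fullmatch(original, '10')

print(single_digit)                    # None
print(two_digits is not None)          # True
print(re.findall(original, example))   # '1' is missing from the result
```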
If the address should not end with - and only the characters a-z can follow the digits, you could use the following with a case-insensitive match:
\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*
\b : A word boundary
\d+(?:\.\d+)? : Match digits with an optional decimal part
[a-z]* : Match optional chars a-z
(?:-\w+)* : Optionally repeat matching - and 1 or more word characters
Note that matching an address can be hard, as there can be many different notations; this pattern only matches the format given in the example string.
import re
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
print(re.findall(pattern, example))
Output
['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
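The example string contains only lowercase letters, so the call above works without a flag. For input with uppercase suffixes, a minimal sketch of the case-insensitive variant might look like this (the sample string is made up for illustration and is not from the question):

```python
import re

# Hypothetical input with uppercase suffixes (not from the question).
sample = 'Examplestreet 3A/12a-12C'
pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"

# Without the flag, [a-z]* cannot consume the 'A' after '3'.
without_flag = re.findall(pattern, sample)
# With re.IGNORECASE, [a-z] also matches uppercase letters.
with_flag = re.findall(pattern, sample, flags=re.IGNORECASE)

print(without_flag)  # ['3', '12a-12C']
print(with_flag)     # ['3A', '12a-12C']
```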
CodePudding user response:
This works:
numbers = re.findall(r'\d[0-9a-z\-\.]*', example)
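As a quick check, the sketch below runs this character-class pattern on the question's string. It also shows, on a made-up malformed input, that the pattern is more permissive than the expected format, which may or may not matter for your data:

```python
import re

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
pattern = r'\d[0-9a-z\-\.]*'

numbers = re.findall(pattern, example)
print(numbers)  # ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

# Hypothetical malformed input: trailing hyphens and dots are accepted too.
loose = re.findall(pattern, 'Examplestreet 5-/6..')
print(loose)  # ['5-', '6..']
```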
CodePudding user response:
Using Regex
Working example : https://regex101.com/r/PDYSgH/1
import re
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
numbers = re.findall(r'\d[a-z0-9.\-]*', example)
Using Split
You can probably split the string on the space and then on /.
numbers = example.split(" ")[-1].split("/")
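One caveat with the split approach is stray whitespace. The sketch below assumes a hypothetical input with a multi-word street name and a trailing space, and compares split(" ") with the argument-free split(), which splits on any run of whitespace:

```python
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
numbers = example.split()[-1].split("/")
print(numbers)

# Hypothetical input: multi-word street name and a trailing space.
padded = 'Long Example Street 1/2a '

# split(" ") yields an empty last element because of the trailing space.
with_space_arg = padded.split(" ")[-1].split("/")
# split() ignores leading and trailing whitespace.
without_arg = padded.split()[-1].split("/")

print(with_space_arg)  # ['']
print(without_arg)     # ['1', '2a']
```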
CodePudding user response:
Another solution, which seems simpler:
>>> re.findall(r'\d[^/]*', example)
['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
You can confirm that it works here (although I had to escape the slash (/) character).
\d[^/]* : Matches any string that starts with a digit and is followed by any characters except / (it stops at that character).
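Because [^/] accepts anything that is not a slash, this pattern also captures spaces and other text between slashes. A small sketch, using a made-up input that is not from the question:

```python
import re

pattern = r'\d[^/]*'

# Hypothetical input with extra text inside one segment.
sample = 'Examplestreet 1/2.1 rear building/3a'
result = re.findall(pattern, sample)

print(result)  # ['1', '2.1 rear building', '3a']
```

If segments can contain such extra text, one of the stricter patterns above may be a better fit.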