Using Python re and findall to match complex combination of digits in string

I'm trying to use Python's re library to analyze a string containing a street name and one or more numbers separated by a forward slash.

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'

I want to match all numbers, including any digits after a decimal point and any adjacent letters. If a hyphen joins two parts that carry letters (such as 12a-12c or 13a-c), the whole range should count as a single match.

Expected output:

['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

I'm trying the following:

numbers = re.findall(r'\d+\.*\d*\w[-\w]*', example)

This finds every number except the single-digit one ('1'):

print(numbers)

['2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

How do I tweak my regex to get the desired output?

Answer:

The pattern does not match the single 1 because \d+\.*\d*\w[-\w]* needs at least 2 characters: at least 1 digit for \d+ and then 1 word character for \w.
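To see that explanation in action, here is a small sketch. It assumes the question's pattern originally read \d+\.*\d*\w[-\w]* (the scraped copy appears to have lost the + after the first \d) and compares a lone '1' with a two-digit '10'.

```python
import re

# The question's pattern, assuming the "+" after the first \d is restored.
question_pattern = r'\d+\.*\d*\w[-\w]*'
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'

# A lone digit fails: \d+ takes the "1", and \w has nothing left to match.
print(re.fullmatch(question_pattern, '1'))  # None

# "10" succeeds only because \d+ backtracks to "1" and \w matches the "0".
print(re.fullmatch(question_pattern, '10'))

# Reproduces the question's output: everything except the single "1".
print(re.findall(question_pattern, example))
```

So any number of two or more characters survives, while a one-character number can never satisfy both \d+ and \w.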

If the address should not end with - and only the characters a-z may follow the digits, you can use this pattern with a case-insensitive match:

\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*

\b A word boundary
\d+(?:\.\d+)? Match 1 or more digits with an optional decimal part
[a-z]* Match optional characters a-z
(?:-\w+)* Optionally repeat matching - followed by 1 or more word characters
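The breakdown above can also live inside the pattern itself by compiling it with re.VERBOSE, where unescaped whitespace is ignored and # starts a comment. This is only a sketch: the name address_numbers is illustrative, and re.IGNORECASE is included because the answer calls for a case-insensitive match.

```python
import re

# Same pattern as in this answer, one piece per line with its explanation.
address_numbers = re.compile(
    r"""
    \b              # word boundary before the number
    \d+             # one or more digits
    (?:\.\d+)?      # optional decimal part, e.g. ".1"
    [a-z]*          # optional letters directly after the digits
    (?:-\w+)*       # zero or more "-" plus word characters, e.g. "-12c"
    """,
    re.VERBOSE | re.IGNORECASE,
)

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
print(address_numbers.findall(example))
# ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
```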


Note that matching an address can be hard because there are many different notations; this pattern matches the format used in the example string.

import re

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
# re.IGNORECASE makes [a-z] accept upper-case letters as well
pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
print(re.findall(pattern, example, re.IGNORECASE))

Output

['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
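The case-insensitive flag only matters when the letters are upper-case, which never happens in the question's example. A quick comparison on a hypothetical upper-case input (not from the question) shows what the flag changes:

```python
import re

pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
# Hypothetical input with upper-case letters.
upper = 'Examplestreet 3A/12A-12C'

# Without the flag, [a-z]* stops before "A", so letters and ranges are lost.
print(re.findall(pattern, upper))                 # ['3', '12', '12']

# With the flag, each number keeps its letters and its range.
print(re.findall(pattern, upper, re.IGNORECASE))  # ['3A', '12A-12C']
```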

Answer:

This works:

numbers = re.findall(r'\d[0-9a-z\-\.]*', example)
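This character-class pattern gives the same result on the example string. A sketch comparing it with the stricter pattern from the first answer, on a hypothetical input whose numbers end in - or ., shows where the two differ:

```python
import re

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
loose = r'\d[0-9a-z\-\.]*'                     # this answer
strict = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"     # first answer

print(re.findall(loose, example))
# ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

# Hypothetical input: the class lets a match end in "-" or ".",
# while the stricter pattern drops the dangling character.
odd = 'Examplestreet 5-/6.'
print(re.findall(loose, odd))   # ['5-', '6.']
print(re.findall(strict, odd))  # ['5', '6']
```

For well-formed input like the question's, the simpler class is enough; the stricter pattern only pays off when malformed numbers must be rejected or trimmed.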

Answer:

Using Regex

Working example: https://regex101.com/r/PDYSgH/1

import re
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
# A digit, then any run of lower-case letters, digits, dots and hyphens
numbers = re.findall(r'\d[a-z0-9.\-]*', example)
print(numbers)

Using Split

Alternatively, you can split the string on spaces, keep the last part, and split that on /.

numbers = example.split(" ")[-1].split("/")
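A sketch checking that the split approach and the regex from this answer agree on the example. It assumes the numbers part itself contains no spaces; the second input, with a multi-word street name, is hypothetical:

```python
import re

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'

# Last space-separated chunk, cut at each "/".
by_split = example.split(" ")[-1].split("/")
by_regex = re.findall(r'\d[a-z0-9.\-]*', example)

print(by_split)
# ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
print(by_split == by_regex)  # True

# Hypothetical multi-word street name: [-1] still picks the numbers part.
print('Example Street 1/2'.split(" ")[-1].split("/"))  # ['1', '2']
```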

Answer:

Another solution, which seems simpler:

>>> re.findall(r'\d[^/]*', example)
['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

You can confirm that it works here (although I had to escape the slash (/) character there; Python's re does not need it escaped).

\d[^/]*: matches a digit followed by any number of characters other than /, so each match stops at the next slash (or at the end of the string).
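Because [^/] accepts any character other than /, this pattern relies on the slashes and the end of the string to delimit the numbers. A sketch with a hypothetical trailing note (not from the question) shows what happens when other text follows the last number:

```python
import re

pattern = r'\d[^/]*'
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
print(re.findall(pattern, example))
# ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

# Hypothetical input: spaces are not "/", so the last match runs to the end.
print(re.findall(pattern, 'Examplestreet 1/2 rear building'))
# ['1', '2 rear building']
```

That makes it the simplest option when the string is exactly "street numbers/numbers/...", and the least forgiving when it is not.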