I am unsure how to tell a regular expression in Python to stop after finding the first match.
Apparently you can make a regex lazy (see "RegEx - stop after first match"). I tried placing (.*?) at the end of my expression, but that just broke it. I just want it to stop after finding the first complete address and return that.
Sample code with data: https://regexr.com/6okuv
In the sample data all addresses are accepted by the expression except "Hindenburgdamm 27, Hygiene-Institut" where it should stop after "27" and return "Hindenburgdamm 27" and "Peschkestr. 5a/Holsteinische Str. 44" where it should stop after "5a" and return "Peschkestr. 5a".
Regular expression:
^([A-Za-zÄäÖöÜüß\s\d.-]+?)\s*([\d\s]+(?:\s?[-+/]\s?\d+)?\s*[A-Za-z]?-?[A-Za-z]?)?$
Sample data:
Berliner Str. 74
Hindenburgdamm 27, Hygiene-Institut
Peschkestr. 5a/Holsteinische Str. 44
Lankwitzer Str. 13-17a
Fidicinstr. 15A
Haudegen Weg 15/17
Johanna-Stegen-Strasse 14a-d
Friedrichshaller Str. 7
Südwestkorso 9
CodePudding user response:
You could make the pattern a bit more specific for the digits and the trailing characters, and match at least a single digit using a case insensitive match:
^([A-ZÄäÖöÜüß.\s-]+?)\s*(\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?)\b
Explanation
^    Start of string
([A-ZÄäÖöÜüß.\s-]+?)    Capture group 1: match 1+ of the listed characters, as few as possible
\s*    Match optional whitespace chars
(    Capture group 2
\d+    Match 1+ digits
(?:[/-]\d+)?    Optionally match / or - and 1+ digits
(?:[A-Z](?:-[A-Z])?)?    Optionally match A-Z followed by an optional - and A-Z
)    Close group 2
\b    A word boundary
If you want a match only and don't need the capture groups you can omit them.
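As a minimal sketch of how the pattern above could be used from Python: the helper name first_address is made up for illustration, and re.IGNORECASE stands in for the case-insensitive match mentioned above.

```python
import re

# Pattern from this answer, compiled case-insensitively.
ADDRESS = re.compile(
    r"^([A-ZÄäÖöÜüß.\s-]+?)\s*(\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?)\b",
    re.IGNORECASE,
)


def first_address(line):
    """Return the leading street + house number, or None if there is no match.

    first_address is a hypothetical helper name, not part of the question.
    """
    m = ADDRESS.search(line)
    return m.group(0) if m else None


samples = [
    "Berliner Str. 74",
    "Hindenburgdamm 27, Hygiene-Institut",
    "Peschkestr. 5a/Holsteinische Str. 44",
    "Lankwitzer Str. 13-17a",
    "Johanna-Stegen-Strasse 14a-d",
]

for s in samples:
    print(first_address(s))
# The two problem lines from the question come back as
# "Hindenburgdamm 27" and "Peschkestr. 5a".
```

Because the pattern no longer ends in $, the match simply stops at the word boundary after the house number, so nothing extra is needed to "stop after the first match".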
Note that the leading character class contains characters such as ., - and \s
If the match should not start with any of these characters, you can start with a character class without them, followed by an optionally repeated character class, so that at least 1 character is still matched.
^[A-ZÄäÖöÜüß][A-ZÄäÖöÜüß.\s-]*?\s*\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?\b
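A small sketch comparing the two variants in Python. The names LOOSE and STRICT and the input "- Berliner Str. 74" are made up for illustration; they are not part of the sample data.

```python
import re

# First variant: the leading class also allows ".", "-" and whitespace.
LOOSE = re.compile(
    r"^([A-ZÄäÖöÜüß.\s-]+?)\s*(\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?)\b",
    re.IGNORECASE,
)

# Second variant: the first character must be a letter.
STRICT = re.compile(
    r"^[A-ZÄäÖöÜüß][A-ZÄäÖöÜüß.\s-]*?\s*\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?\b",
    re.IGNORECASE,
)

line = "- Berliner Str. 74"  # hypothetical input starting with a hyphen

print(bool(LOOSE.search(line)))   # True: the hyphen is accepted as a leading char
print(bool(STRICT.search(line)))  # False: the line does not start with a letter

# A regular address still matches the stricter variant.
print(STRICT.search("Berliner Str. 74").group(0))  # Berliner Str. 74
```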
CodePudding user response:
You can try this pattern
^([A-Za-zÄäÖöÜüß\s\d.-]+?\s[0-9a-zA-zÄäÖöÜüß-]+?)[\s\/,]?
In any case, if you don't expect to match the full line, don't use $, since it requires the regular expression to reach the end of the line.
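To illustrate the effect of $, here is a rough Python sketch. It borrows the pattern from the other answer purely as an example, and the names BODY, WITH_END and WITHOUT_END are made up.

```python
import re

# Pattern borrowed from the other answer, used here only for illustration.
BODY = r"^([A-ZÄäÖöÜüß.\s-]+?)\s*(\d+(?:[/-]\d+)?(?:[A-Z](?:-[A-Z])?)?)\b"

WITH_END = re.compile(BODY + r"$", re.IGNORECASE)  # must reach the end of the line
WITHOUT_END = re.compile(BODY, re.IGNORECASE)      # may stop after the house number

line = "Hindenburgdamm 27, Hygiene-Institut"

print(WITH_END.search(line))              # None: ", Hygiene-Institut" blocks the $
print(WITHOUT_END.search(line).group(0))  # Hindenburgdamm 27
```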