Maybe some regex master can solve my problem.
I have a big list of addresses with no separators (such as , or ;). Each address string contains the following information:
- The first group is the street name
- The second group is the street number
- The third group is the zipcode (optional)
- The last group is the town name (optional)
In my test strings, the last two do not match. I need the last two regex groups to be optional, and the third group should accept either 4 or 5 digits.
I tried (\d{4,5}) to allow both 4 and 5 digits, but this only works halfway: it sometimes mixes the street number and zipcode together.
I also tried (?:\d{5})? to make the third and fourth groups optional, but this breaks my whole group layout...
This is my current regex:
/^([a-zäöüÄÖÜß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+\/]\s?\d+)?\s*[a-z]?)?\s*(\d{5})\s*(.+)?$/im
Try it out yourself: https://regex101.com/r/zC8NCP/1
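A small Python sketch of the two problems described above. It assumes the scrape dropped every + from the regex (so [\d\s] was [\d\s]+, \d was \d+, and the space inside [-| \/] was +), drops the \/ escape that is only needed inside /.../ delimiters, and uses re.IGNORECASE in place of the i flag. The helper names op_pattern and op_parse and the sample addresses are hypothetical.

```python
import re

def op_pattern(zip_pattern):
    """The question's regex, reconstructed by assuming the scrape dropped every plus sign.

    zip_pattern is the sub-pattern for group 3, so both variants can be compared.
    """
    return (
        r"^([a-zäöüÄÖÜß\s\d.,-]+?)\s*"
        r"([\d\s]+(?:\s?[-|+/]\s?\d+)?\s*[a-z]?)?"
        r"\s*(" + zip_pattern + r")\s*(.+)?$"
    )

def op_parse(address, zip_pattern):
    """Return the four groups, or None if the address does not match."""
    m = re.match(op_pattern(zip_pattern), address, re.IGNORECASE)
    return m.groups() if m else None

# With a mandatory five-digit zipcode, an address without a zipcode and one
# with a four-digit zipcode are both rejected.
print(op_parse("Musterstraße 12", r"\d{5}"))         # None
print(op_parse("Hauptstr. 5a 1010 Wien", r"\d{5}"))  # None

# With \d{4,5}, group 2's greedy [\d\s]+ gives back characters only until
# group 3 can match; four digits are already enough, so the zipcode's first
# digit stays in group 2.
print(op_parse("Musterstraße 12 12345 Berlin", r"\d{4,5}"))
# ('Musterstraße', '12 1', '2345', 'Berlin')
```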
My brain is just farting at this point and I can't think straight anymore.
Please help me fix this problem so I can die in peace.
Answer:
You can use
^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?(?:\s+(\d{4,5})(?:\s+(.*))?)?$
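A minimal Python sketch applying this regex to one address at a time. It assumes the + characters shown in the regex above, writes / instead of \/ because Python patterns need no delimiters, and uses a hypothetical helper parse_address with made-up sample addresses (not the question's test strings).

```python
import re

# The answer's regex; "\/" becomes "/" because Python needs no delimiters.
ADDRESS_RE = re.compile(
    r"^(.*?)(?:\s+(\d+(?:\s*[-|+/]\s*\d+)*\s*[a-z]?\b))?"
    r"(?:\s+(\d{4,5})(?:\s+(.*))?)?$"
)

def parse_address(line):
    """Split one address into (street, number, zipcode, town); missing parts are None."""
    m = ADDRESS_RE.match(line)
    return m.groups() if m else None

# Hypothetical sample addresses.
for line in [
    "Musterstraße 12 12345 Berlin",
    "Hauptstr. 5a 1010 Wien",
    "Bahnhofweg 3-5 12345",
    "Musterstraße 12",
]:
    print(parse_address(line))
# ('Musterstraße', '12', '12345', 'Berlin')
# ('Hauptstr.', '5a', '1010', 'Wien')
# ('Bahnhofweg', '3-5', '12345', None)
# ('Musterstraße', '12', None, None)
```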
See the regex demo (note that all \s are replaced with \h there, so that only horizontal whitespace is matched).
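Python's re module has no \h, so this sketch assumes [^\S\r\n] (any whitespace except carriage return and line feed) as a close stand-in; unlike \h it also matches form feed and vertical tab. With re.MULTILINE, ^ and $ work per line, so the whole list can be scanned at once, and restricting the whitespace keeps a group from running into the next line, which \s (it also matches \n) would allow. The two-line sample text is hypothetical.

```python
import re

# Assumed stand-in for PCRE's \h: any whitespace except CR and LF.
H = r"[^\S\r\n]"

ADDRESS_LINE_RE = re.compile(
    rf"^(.*?)(?:{H}+(\d+(?:{H}*[-|+/]{H}*\d+)*{H}*[a-z]?\b))?"
    rf"(?:{H}+(\d{{4,5}})(?:{H}+(.*))?)?$",  # {{4,5}} is a literal {4,5} in an f-string
    re.MULTILINE,
)

addresses = """Musterstraße 12 12345 Berlin
Hauptstr. 5a 1010 Wien"""

for m in ADDRESS_LINE_RE.finditer(addresses):
    print(m.groups())
# ('Musterstraße', '12', '12345', 'Berlin')
# ('Hauptstr.', '5a', '1010', 'Wien')
```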
Details:
- ^ - start of string
- (.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
- (?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))? - an optional non-capturing group matching:
  - \s+ - one or more whitespaces
  - (\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b) - Group 2:
    - \d+ - one or more digits
    - (?:\s*[-|+\/]\s*\d+)* - zero or more sequences of zero or more whitespaces, one of -, |, + or /, zero or more whitespaces and one or more digits
    - \s* - zero or more whitespaces
    - [a-z]?\b - an optional lowercase ASCII letter and a word boundary
- (?:\s+(\d{4,5})(?:\s+(.*))?)? - an optional non-capturing group matching:
  - \s+ - one or more whitespaces
  - (\d{4,5}) - Group 3: four or five digits
  - (?:\s+(.*))? - an optional sequence of one or more whitespaces and then Group 4: any zero or more chars other than line break chars, as many as possible
- $ - end of string.
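The breakdown above can be kept next to the pattern itself. This is a sketch of the same regex written with Python's re.VERBOSE flag, where unescaped whitespace outside character classes is ignored and # starts a comment; the name ADDRESS_RE_VERBOSE and the sample address are illustrative.

```python
import re

ADDRESS_RE_VERBOSE = re.compile(
    r"""
    ^
    (.*?)                          # Group 1: street name, as few chars as possible
    (?:                            # optional: whitespace + street number
        \s+
        (                          # Group 2: street number
            \d+                    #   one or more digits
            (?:\s*[-|+/]\s*\d+)*   #   a -, |, + or / plus more digits, e.g. 3-5
            \s*[a-z]?\b            #   optional letter suffix, e.g. 5a
        )
    )?
    (?:                            # optional: whitespace + zipcode (+ town)
        \s+
        (\d{4,5})                  # Group 3: four or five digits
        (?:\s+(.*))?               # Group 4: town, only allowed after a zipcode
    )?
    $
    """,
    re.VERBOSE,
)

print(ADDRESS_RE_VERBOSE.match("Hauptstr. 5a 1010 Wien").groups())
# ('Hauptstr.', '5a', '1010', 'Wien')
```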
Please note that the (?:\s+(.*))? optional group must be inside the (?:\s+(\d{4,5})...)? group for this to work.
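A small Python sketch of why the nesting matters. It compares the answer's layout (NESTED) with a hypothetical variant (FLAT) in which the town group is an independent sibling of the zipcode group; "Am Hang" is a made-up street name that contains a space.

```python
import re

# Optional street-number part, shared by both variants.
NUMBER = r"(?:\s+(\d+(?:\s*[-|+/]\s*\d+)*\s*[a-z]?\b))?"

# Town group nested inside the zipcode group, as in the answer.
NESTED = re.compile(r"^(.*?)" + NUMBER + r"(?:\s+(\d{4,5})(?:\s+(.*))?)?$")

# Hypothetical variant: town group placed after the zipcode group as a sibling.
FLAT = re.compile(r"^(.*?)" + NUMBER + r"(?:\s+(\d{4,5}))?(?:\s+(.*))?$")

address = "Am Hang 5 12345 Town"
print(NESTED.match(address).groups())  # ('Am Hang', '5', '12345', 'Town')
print(FLAT.match(address).groups())    # ('Am', None, None, 'Hang 5 12345 Town')
```

In the flat variant the lazy group 1 stops at the first whitespace, because the stand-alone town group can absorb the rest of the line and still reach $. When the town group is nested, it can only match after a zipcode, so group 1 keeps growing until a street number and zipcode actually follow.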