Regex optional groups and digit length

Time:02-16

Maybe some regex-Master can solve my problem.

I have a big list of addresses with no separators ( , ; ). Each address string contains the following information:

  • The first group is the street name
  • The second group is the street number
  • The third group is the zipcode (optional)
  • The last group is the town name (optional)

regex_png

As you can see on the image above the last two test strings are not matching. I need the last two regex groups to be optional and the third group should be either 4 or 5 digits.

I tried (\d{4,5}) to allow 4 or 5 digits. But this only works halfway, as you can see here: regex_4_5_digits (it sometimes mixes the street number and the zipcode together)
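The mixing looks like a backtracking effect. Here is a minimal Python sketch, assuming a reduced two-group pattern (hypothetical, not the full regex) and a made-up input. The greedy [\d\s]+ first takes everything and then gives back only as many characters as the zipcode group needs, so with {4,5} it hands over four digits instead of five:

```python
import re

# Reduced, hypothetical stand-ins for the street-number and zipcode groups.
FIVE_ONLY = re.compile(r"([\d\s]+)\s*(\d{5})$")
FOUR_OR_FIVE = re.compile(r"([\d\s]+)\s*(\d{4,5})$")

tail = "12 12345"  # made-up input: street number 12, zipcode 12345

five = FIVE_ONLY.match(tail).groups()
four_or_five = FOUR_OR_FIVE.match(tail).groups()

print(five)          # ('12 ', '12345')
print(four_or_five)  # ('12 1', '2345')  <- number and zipcode mixed
```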

I also tried (?:\d{5})? to make the third and fourth group optional. But this destroys my whole group layout... regex_optional

This is my current regex:

/^([a-zäöüÄÖÜß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+\/]\s?\d+)?\s*[a-z]?)?\s*(\d{5})\s*(.+)?$/im

Try it out yourself: https://regex101.com/r/zC8NCP/1

My brain is only farting at this moment and I can't think straight anymore.

Please help me fix this problem so I can die in peace.

CodePudding user response:

You can use

^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?(?:\s+(\d{4,5})(?:\s+(.*))?)?$

See the regex demo (note all \s are replaced with \h to only match horizontal whitespaces).

Details:

  • ^ - start of string
  • (.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
  • (?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))? - an optional non-capturing group matching
    • \s+ - one or more whitespaces
    • (\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b) - Group 2:
      • \d+ - one or more digits
      • (?:\s*[-|+\/]\s*\d+)* - zero or more sequences of zero or more whitespaces, then -, |, + or /, then zero or more whitespaces, then one or more digits
      • \s* - zero or more whitespaces
      • [a-z]?\b - an optional lowercase ASCII letter and a word boundary
  • (?:\s+(\d{4,5})(?:\s+(.*))?)? - an optional non-capturing group matching
    • \s+ - one or more whitespaces
    • (\d{4,5}) - Group 3: four or five digits
    • (?:\s+(.*))? - an optional sequence of one or more whitespaces and then Group 4: any zero or more chars other than line break chars, as many as possible
  • $ - end of string.
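A quick way to check the group layout is a small Python sketch. It assumes the pattern with its one-or-more quantifiers written as \s+ and \d+, as the details describe, and carries the i flag over from the question. The helper name parse_address and the sample addresses are made up for illustration. Python does not support \h, so \s is kept, which is fine for single-line strings:

```python
import re

# Pattern from the answer, split over three lines for readability.
# "/" needs no escaping inside a Python character class.
ADDRESS = re.compile(
    r"^(.*?)"                                        # Group 1: street name (lazy)
    r"(?:\s+(\d+(?:\s*[-|+/]\s*\d+)*\s*[a-z]?\b))?"  # Group 2: street number
    r"(?:\s+(\d{4,5})(?:\s+(.*))?)?$",               # Group 3: zipcode, Group 4: town
    re.IGNORECASE,
)

def parse_address(line):
    """Return (street, number, zipcode, town); missing parts are None."""
    m = ADDRESS.match(line)
    return m.groups() if m else None

# Made-up sample addresses, one per case.
samples = [
    "Musterstraße 12 12345 Berlin",  # ('Musterstraße', '12', '12345', 'Berlin')
    "Bahnhofstraße 7 8001 Zürich",   # ('Bahnhofstraße', '7', '8001', 'Zürich')
    "Am Ring 12-14 12345 Köln",      # ('Am Ring', '12-14', '12345', 'Köln')
    "Hauptstr. 5a",                  # ('Hauptstr.', '5a', None, None)
]
for s in samples:
    print(parse_address(s))
```

The zipcode and town come back as None when they are absent, so the four-group layout stays the same for every line.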

Please note that the (?:\s+(.*))? optional group must be inside the (?:\s+(\d{4,5})...)? group to work.
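To see why the nesting matters, here is a small Python comparison. The names NESTED and FLAT and the sample line are made up for illustration. FLAT is a hypothetical variant with the town group moved out of the zipcode group. Because Group 1 is lazy, the un-nested town group grabs the rest of the line as soon as it sees the first whitespace:

```python
import re

# Shared optional street-number part (Group 2).
NUMBER = r"(?:\s+(\d+(?:\s*[-|+/]\s*\d+)*\s*[a-z]?\b))?"

# Town group nested inside the zipcode group, as in the answer.
NESTED = re.compile(
    r"^(.*?)" + NUMBER + r"(?:\s+(\d{4,5})(?:\s+(.*))?)?$", re.IGNORECASE
)
# Hypothetical variant: town group on its own, outside the zipcode group.
FLAT = re.compile(
    r"^(.*?)" + NUMBER + r"(?:\s+(\d{4,5}))?(?:\s+(.*))?$", re.IGNORECASE
)

line = "Am Ring 12"  # made-up address without zipcode and town

nested = NESTED.match(line).groups()
flat = FLAT.match(line).groups()

print(nested)  # ('Am Ring', '12', None, None)
print(flat)    # ('Am', None, None, 'Ring 12')  <- street name cut short
```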
