I'm parsing a file and need to extract the street and the house number into separate capture groups.
A file could look like this:
Start of String
Straße HNR : example street
More example data
Currently my expression looks like this:
/\s*Straße\s*HNR\s*:\s*(?<loc_street>\D+)(?<loc_streetnumber>\d*\s{0,1}[a-z]*){0,1}/g
which matches things like:
Straße HNR : example street 1 a
Straße HNR : example street 12
correctly. But if there is no house number, the (?<loc_street>\D+) group matches everything up to the end of the file, whereas I want it to stop at the newline. Any hints?
CodePudding user response:
I would use this:
/\h*Straße\s*HNR\s*:\s*(?<loc_street>[^\d\n]+)(?<loc_streetnumber>\d*\s?[a-z]*)?/g
One key point is to match only horizontal whitespace (\h) at the beginning of the line; otherwise the match could pick up any newlines that come before it. \s is equivalent to [\r\n\t\f\v ].
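A minimal sketch of the difference, written in Python as an illustration. Python's re module has no \h, so [ \t] is used here as a stand-in for horizontal whitespace; that substitution and the sample text (with an extra blank line) are assumptions, not part of the original pattern.

```python
import re

# Hypothetical sample with a blank line before the line of interest.
text = "Start of String\n\nStraße HNR : example street\n"

# \s* is allowed to consume the newlines that precede "Straße".
with_s = re.search(r"\s*Straße", text)

# [ \t]* (standing in for \h*) only consumes spaces and tabs,
# so the match starts on the "Straße" line itself.
with_h = re.search(r"[ \t]*Straße", text)

print(repr(with_s.group()))  # includes the leading newlines
print(repr(with_h.group()))  # starts at "Straße"
```

With \s* the match begins on the previous line break, which matters if you later rely on the match position or on the full matched text.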
Another point is to make sure you don't match newlines in the loc_street group. If you use \D you will match anything that is not a digit, including a newline. By using [^\d\n] you explicitly match anything that is neither a digit nor a newline.
I replaced {0,1} with ?, but that is not important, just personal preference.
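To see the points above working together, here is a rough Python sketch. It is an adaptation rather than the exact pattern: Python needs (?P<name>...) for named groups and has no \h, so [ \t] is used instead. It also swaps \s? for [ \t]? inside the house-number group, on the assumption that the number should not reach across the line break either. The helper name parse_address is hypothetical.

```python
import re

# Pattern from the question: \D+ also matches "\n", so it runs past the line.
GREEDY = re.compile(r"Straße\s*HNR\s*:\s*(?P<loc_street>\D+)")

# Adapted pattern: [ \t] replaces \h, and [^\d\n]+ keeps the street on one line.
# [ \t]? in the number group is an assumption so it cannot consume the newline.
FIXED = re.compile(
    r"[ \t]*Straße\s*HNR\s*:\s*"
    r"(?P<loc_street>[^\d\n]+)"
    r"(?P<loc_streetnumber>\d*[ \t]?[a-z]*)?"
)


def parse_address(text):
    """Return (street, house_number) for the first match, or None."""
    m = FIXED.search(text)
    if m is None:
        return None
    # The street group may end with the space before the number, so strip it.
    street = m.group("loc_street").strip()
    number = (m.group("loc_streetnumber") or "").strip()
    return street, number


sample = "Start of String\nStraße HNR : example street\nMore example data\n"

print(parse_address(sample))
print(parse_address("Straße HNR : example street 1 a\n"))
print(repr(GREEDY.search(sample).group("loc_street")))
```

The last line prints the street captured by the \D+ version, which runs on into the following line of the file, while parse_address stops at the end of the address line.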