RegEx stop at linebreak

Time:11-28

I'm parsing a file and need to extract street and house numbers in separate capture groups.

A file could look like this:

Start of String

Straße HNR : example street
 
More example data

Currently my expression looks like this:

/\s*Straße\s*HNR\s*:\s*(?<loc_street>\D+)(?<loc_streetnumber>\d*\s{0,1}[a-z]*){0,1}/g

which matches things like:

Straße HNR : example street 1 a
Straße HNR : example street 12

correctly. But if there is no house number, (?<loc_street>\D+) matches everything up to the end of the file, whereas I want it to stop at the end of the line. Any hints?

CodePudding user response:

I would use this:

/\h*Straße\s*HNR\s*:\s*(?<loc_street>[^\d\n]+)(?<loc_streetnumber>\d*\s?[a-z]*)?/g

One key point is to match only horizontal whitespace (\h) at the beginning of the line. \s would also pick up any newlines that come before it, because \s is equivalent to [\r\n\t\f\v ].

Another point is to make sure you don't match newlines in the loc_street group. \D matches anything that is not a digit, and that includes a newline. [^\d\n] explicitly matches anything that is neither a digit nor a newline.
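
To see the difference between the two character classes in isolation, here is a small sketch in Python. The sample text is made up from the question's example, and Python is only an assumption here, since the thread does not say which regex engine is in use:

```python
import re

# Hypothetical sample: a street line without a house number, followed by more lines.
text = "example street\n \nMore example data"

# \D means "any non-digit". A newline is a non-digit, so the match
# keeps going across line breaks until it hits a digit or the end of the text.
greedy = re.match(r"\D+", text).group()

# Excluding \n as well makes the match stop at the end of the first line.
line_only = re.match(r"[^\d\n]+", text).group()

print(repr(greedy))
print(repr(line_only))
```

Because the sample contains no digits at all, the first match covers the whole text, while the second one covers only the first line.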

I replaced {0,1} with ?, but that is not important, just personal preference.
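
If you want to try the whole thing outside a PCRE-style engine, this is a rough Python adaptation, not the exact expression from above. Two things are changed as assumptions: Python's re module spells named groups (?P<name>...), and it has no \h, so [ \t] stands in for horizontal whitespace. The helper name extract_addresses is made up for the example.

```python
import re

# Python adaptation of the pattern (assumptions: (?P<...>) instead of (?<...>),
# and [ \t] in place of \h, which Python's re module does not support).
PATTERN = re.compile(
    r"[ \t]*Straße\s*HNR\s*:\s*"
    r"(?P<loc_street>[^\d\n]+)"
    r"(?P<loc_streetnumber>\d*\s?[a-z]*)?"
)

def extract_addresses(text):
    """Return (street, house number) pairs found in text."""
    results = []
    for m in PATTERN.finditer(text):
        # strip() drops the space left at the end of the street and any
        # newline that the \s? in the number group may have picked up.
        street = m.group("loc_street").strip()
        number = (m.group("loc_streetnumber") or "").strip()
        results.append((street, number))
    return results

without_number = "Start of String\n\nStraße HNR : example street\n \nMore example data\n"
with_numbers = "Straße HNR : example street 1 a\nStraße HNR : example street 12\n"

print(extract_addresses(without_number))
print(extract_addresses(with_numbers))
```

Note that the \s? inside the number group can still match a single newline, which is why the sketch strips the captured values. If that matters for your data, replacing it with [ \t]? would keep the number group on one line as well.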
