Regex to find instances of a string pattern that is not preceded by a carriage return/line feed

Time:09-30

I'm working in Notepad

In the file I'm working with, every line should start with the string pattern [0-9][0-9]-[0-9][0-9][0-9][0-9], immediately followed by a pipe. (One caveat: up to three capital letters can follow the four digits, e.g. 00-1324A| or 12-3456STR|.)

There are instances in the file where that pattern is in the middle of a line, and needs to be moved to the next line.

Example:

00-1234REV|The quick brown fox jumped over the lazy dog|Test
11-6544|FooBar|text99-8656ST|This needs to be on the next line|some text
45-8737|Peter pipe picked a peck of pickled peppers|TEST2

As I noted within the example, 99-8656ST needs to be moved to the next line, resulting in this:

00-1234REV|The quick brown fox jumped over the lazy dog|Test
11-6544|FooBar|text
99-8656ST|This needs to be on the next line|some text
45-8737|Peter pipe picked a peck of pickled peppers|TEST2

I currently have this regex: (?<=[^\d\r\n])\d{2}-\d{4}(?!\d) but that is matching on parts of social security numbers in the middle of a line:

123-45-6789

My regex will match on 45-6789.

CodePudding user response:

Since purely numeric boundaries do not work here, you can add a check for a digit followed by a hyphen on the left. The right-hand boundary is clear: it is zero to three uppercase letters followed by a pipe.

That means, you can use

(?<=[^\d\r\n])(?<!\d-)\d{2}-\d{4}(?=[A-Z]{0,3}\|)

See the regex demo. Details:

  • (?<=[^\d\r\n]) - immediately on the left, there must be a char other than a digit, CR, LF
  • (?<!\d-) - immediately on the left, there must be no digit followed by -
  • \d{2}-\d{4} - two digits, -, four digits
  • (?=[A-Z]{0,3}\|) - immediately on the right, there must be 0 to 3 uppercase letters and then a literal | char.
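
For anyone who wants to check the pattern outside the editor, here is a minimal sketch in Python. Python's re module accepts these fixed-width lookbehinds, so the regex is used unchanged. The helper name split_midline_ids and the idea of inserting a plain "\n" before each match are assumptions for illustration; in an editor you would replace each match with a line break followed by the whole match.

```python
import re

# Pattern from the answer: not at a line start, not preceded by "digit-",
# and followed by 0-3 uppercase letters plus a literal pipe.
MIDLINE_ID = re.compile(r"(?<=[^\d\r\n])(?<!\d-)\d{2}-\d{4}(?=[A-Z]{0,3}\|)")


def split_midline_ids(text: str) -> str:
    """Insert a line break before every mid-line ID (hypothetical helper)."""
    return MIDLINE_ID.sub(lambda m: "\n" + m.group(0), text)


# Sample data copied from the question.
sample = (
    "00-1234REV|The quick brown fox jumped over the lazy dog|Test\n"
    "11-6544|FooBar|text99-8656ST|This needs to be on the next line|some text\n"
    "45-8737|Peter pipe picked a peck of pickled peppers|TEST2\n"
)

fixed = split_midline_ids(sample)
print(fixed)

# The SSN fragment 45-6789 is skipped because "3-" precedes it.
ssn_hit = MIDLINE_ID.search("11-6544|SSN 123-45-6789|x")
```

With the sample from the question, only 99-8656ST is moved to its own line, and the SSN line produces no match.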

If any single hyphen or digit immediately on the left should block a match, replace (?<=[^\d\r\n])(?<!\d-) with the single lookbehind (?<=[^\r\n\d-]).
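
The two variants differ only when a hyphen that is not itself preceded by a digit sits right before the ID. A small Python sketch of that difference follows; the names TWO_LOOKBEHINDS and SINGLE_CLASS and both test lines are made up for illustration.

```python
import re

# Variant 1: two lookbehinds, rejects only "digit-" on the left.
TWO_LOOKBEHINDS = re.compile(
    r"(?<=[^\d\r\n])(?<!\d-)\d{2}-\d{4}(?=[A-Z]{0,3}\|)"
)
# Variant 2: one lookbehind, rejects any digit or hyphen on the left.
SINGLE_CLASS = re.compile(r"(?<=[^\r\n\d-])\d{2}-\d{4}(?=[A-Z]{0,3}\|)")

# Hypothetical inputs: an SSN-like field, and an ID glued to "text-".
ssn_line = "11-6544|123-45-6789|x"
hyphen_line = "11-6544|FooBar|text-99-8656ST|moved|some text"

# Both variants skip the SSN fragment 45-6789.
ssn_two = TWO_LOOKBEHINDS.search(ssn_line)
ssn_single = SINGLE_CLASS.search(ssn_line)

# Only the two-lookbehind variant matches an ID preceded by "letter-".
hyphen_two = TWO_LOOKBEHINDS.search(hyphen_line)
hyphen_single = SINGLE_CLASS.search(hyphen_line)

print(ssn_two, ssn_single)
print(hyphen_two.group(0) if hyphen_two else None, hyphen_single)
```

So the shorter form is stricter: pick it only if an ID can never legitimately follow a bare hyphen in your data.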
