I'm working in Notepad.
In the file that I'm working with, every line should start with a string matching [0-9][0-9]-[0-9][0-9][0-9][0-9], immediately followed by a pipe. (A caveat: up to three capital letters can follow the four digits, e.g. 00-1324A| or 12-3456STR|.)
There are instances in the file where that pattern is in the middle of a line, and needs to be moved to the next line.
Example:
00-1234REV|The quick brown fox jumped over the lazy dog|Test
11-6544|FooBar|text99-8656ST|This needs to be on the next line|some text
45-8737|Peter pipe picked a peck of pickled peppers|TEST2
As I noted within the example, 99-8656ST needs to be moved to the next line, resulting in this:
00-1234REV|The quick brown fox jumped over the lazy dog|Test
11-6544|FooBar|text
99-8656ST|This needs to be on the next line|some text
45-8737|Peter pipe picked a peck of pickled peppers|TEST2
I currently have this regex:
(?<=[^\d\r\n])\d{2}-\d{4}(?!\d)
but it also matches parts of social security numbers in the middle of a line. For example, in 123-45-6789 my regex will match on 45-6789.
CodePudding user response:
Since purely numeric boundaries do not work here, you can add a check for a digit followed by a hyphen on the left. The right-hand boundary is clear: it is zero to three uppercase letters followed by a pipe.
That means, you can use
(?<=[^\d\r\n])(?<!\d-)\d{2}-\d{4}(?=[A-Z]{0,3}\|)
See the regex demo. Details:
(?<=[^\d\r\n]) - immediately on the left, there must be a char other than a digit, CR or LF
(?<!\d-) - immediately on the left, there must be no digit followed by -
\d{2}-\d{4} - two digits, -, four digits
(?=[A-Z]{0,3}\|) - immediately followed by 0 to 3 uppercase letters and then a literal | char.
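To try the pattern outside the editor, here is a minimal Python sketch. It assumes Python's re module treats this pattern the same way the editor's regex engine does, which holds for these fixed-width lookbehinds but is still an assumption about your editor. The function name split_records and the sample variable are made up for illustration; the replacement simply puts a line break in front of each match.

```python
import re

# Pattern from the answer, unchanged.
PATTERN = re.compile(r"(?<=[^\d\r\n])(?<!\d-)\d{2}-\d{4}(?=[A-Z]{0,3}\|)")


def split_records(text: str) -> str:
    """Insert a line break before every record ID found in the middle of a line."""
    return PATTERN.sub(lambda m: "\n" + m.group(0), text)


# Sample data taken from the question.
sample = (
    "00-1234REV|The quick brown fox jumped over the lazy dog|Test\n"
    "11-6544|FooBar|text99-8656ST|This needs to be on the next line|some text\n"
    "45-8737|Peter pipe picked a peck of pickled peppers|TEST2\n"
)

result = split_records(sample)
print(result)
```

In an editor's Replace dialog the equivalent replacement would presumably be a line break followed by a reference to the whole match, but the exact syntax depends on the editor.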
If it is enough to reject any match that is preceded by a single hyphen or digit, then replace (?<=[^\d\r\n])(?<!\d-) with (?<=[^\r\n\d-]).
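The two left-hand boundaries are not equivalent, and a quick comparison makes the difference visible. This is only a sketch in Python: the names TAIL, two_lookbehinds, single_lookbehind, cases and results are invented for illustration, and the third test string is a made-up case that is not in the question.

```python
import re

# Shared right-hand part: the ID itself plus the lookahead for letters and a pipe.
TAIL = r"\d{2}-\d{4}(?=[A-Z]{0,3}\|)"

# Variant 1: reject a digit, CR or LF on the left, and also a "digit-hyphen" pair.
two_lookbehinds = re.compile(r"(?<=[^\d\r\n])(?<!\d-)" + TAIL)

# Variant 2: reject any single digit, hyphen, CR or LF on the left.
single_lookbehind = re.compile(r"(?<=[^\r\n\d-])" + TAIL)

cases = [
    "text99-8656ST|x",   # ID glued to preceding text
    "123-45-6789|x",     # social security number
    "text-99-8656ST|x",  # hypothetical: ID preceded by a hyphen that follows a letter
]

results = {
    s: (bool(two_lookbehinds.search(s)), bool(single_lookbehind.search(s)))
    for s in cases
}
for s, (first, second) in results.items():
    print(f"{s!r}: two lookbehinds={first}, single lookbehind={second}")
```

Both variants find the ID in the first string and skip the social security number. Only the third string separates them: the two-lookbehind version still matches there, while the single-lookbehind version does not, so the shorter form is safe only if an ID is never directly preceded by a hyphen.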