Why is my RegEx pattern working for one string, but not the others?

Time:08-16

This feels like such a simple request, but I cannot figure out what is going on here, and have been messing with different RegEx testers for a while now.

My RegEx pattern: \b(?=<GTOL-[A-Z]*>)

If it matters, the command I am calling in my code (C#): Regex.Split(Text, @"\b(?=<GTOL-[A-Z]*>)").ToList();

The string that will successfully be split: <GTOL-POSI>.010<MOD-MMC>B-C<GTOL-POSI>.002<MOD-FMC>; Returns: <GTOL-POSI>.010<MOD-MMC>B-C ; <GTOL-POSI>.002<MOD-FMC>

The string that will not split as expected: <GTOL-POSI><MOD-DIAM>.004<MOD-MMC>HC<MOD-MMC><GTOL-POSI>.030<MOD-MMC>D-E<MOD-FMC>; Should return (but doesn't): <GTOL-POSI><MOD-DIAM>.004<MOD-MMC>HC<MOD-MMC> ; <GTOL-POSI>.030<MOD-MMC>D-E<MOD-FMC>

Another string that will not split as expected: <GTOL-POSI><MOD-DIAM>.005<MOD-MMC>AD-E<MOD-MMC><GTOL-PERP>.001A; Should return (but doesn't): <GTOL-POSI><MOD-DIAM>.005<MOD-MMC>AD-E<MOD-MMC> ; <GTOL-PERP>.001A

CodePudding user response:

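There is no word boundary between > and <, because \b only matches at a position where a word character (letter, digit or underscore) meets a non-word character. That is why the first string splits (the C in B-C is followed by <) while the other two do not (there, > is followed directly by <). Removing \b from the pattern fixes this. Also, since there seems to be at least one uppercase character after GTOL-, the quantifier can be + (one or more times) instead of *.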
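As a rough illustration of the point above, the sketch below checks where \b matches and then splits the third string from the question with \b removed. The class and method names (WordBoundaryDemo, SplitWithoutBoundary) are made up for this example and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Hypothetical helper: same split as in the question, but without \b
    // and with + instead of * after GTOL-.
    public static string[] SplitWithoutBoundary(string text) =>
        Regex.Split(text, @"(?=<GTOL-[A-Z]+>)");

    static void Main()
    {
        // \b needs a word character on exactly one side of the position.
        Console.WriteLine(Regex.IsMatch("C<", @"C\b<"));  // True: 'C' is a word character, '<' is not
        Console.WriteLine(Regex.IsMatch("><", @">\b<"));  // False: neither '>' nor '<' is a word character

        string text = "<GTOL-POSI><MOD-DIAM>.005<MOD-MMC>AD-E<MOD-MMC><GTOL-PERP>.001A";

        // The lookahead now also matches at index 0, so the first entry is an empty string.
        foreach (string part in SplitWithoutBoundary(text))
            Console.WriteLine($"[{part}]");
    }
}
```

Without \b the string splits in the right place, but the result starts with an empty entry because the lookahead also matches at the very start of the input.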

If you don't want to split at the start of the string, which would create an empty entry in the result list, you can add a negative lookbehind (?<!^\s*). It asserts that the text to the left of the current position is not just the start of the string followed by optional whitespace chars.

(?<!^\s*)(?=<GTOL-[A-Z]+>)
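A minimal sketch of how this pattern could be used with the Regex.Split call from the question, run against the three sample strings. It uses the + quantifier described in the answer, and the names GtolSplitter and SplitAtGtol are hypothetical, chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class GtolSplitter
{
    // Hypothetical helper: splits before every <GTOL-XXXX> tag except one at the
    // start of the string (optionally preceded by whitespace).
    public static List<string> SplitAtGtol(string text) =>
        Regex.Split(text, @"(?<!^\s*)(?=<GTOL-[A-Z]+>)").ToList();

    static void Main()
    {
        // The three sample strings from the question.
        string[] samples =
        {
            "<GTOL-POSI>.010<MOD-MMC>B-C<GTOL-POSI>.002<MOD-FMC>",
            "<GTOL-POSI><MOD-DIAM>.004<MOD-MMC>HC<MOD-MMC><GTOL-POSI>.030<MOD-MMC>D-E<MOD-FMC>",
            "<GTOL-POSI><MOD-DIAM>.005<MOD-MMC>AD-E<MOD-MMC><GTOL-PERP>.001A"
        };

        foreach (string sample in samples)
            Console.WriteLine(string.Join(" ; ", SplitAtGtol(sample)));
    }
}
```

Each sample should come back as two entries, with no empty entry at the front. The variable-length lookbehind relies on the .NET regex engine; many other regex flavours do not accept \s* inside a lookbehind.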

