This feels like such a simple request, but I cannot figure out what is going on here, and have been messing with different RegEx testers for a while now.
My RegEx pattern, which should split the string before every <GTOL-...> tag: \b(?=<GTOL-[A-Z]*>)
If it matters, the command I am calling in my code (C#): Regex.Split(Text, @"\b(?=<GTOL-[A-Z]*>)").ToList();
The string that will successfully be split: <GTOL-POSI>.010<MOD-MMC>B-C<GTOL-POSI>.002<MOD-FMC>
; Returns: <GTOL-POSI>.010<MOD-MMC>B-C
; <GTOL-POSI>.002<MOD-FMC>
The string that will not split as expected: <GTOL-POSI><MOD-DIAM>.004<MOD-MMC>HC<MOD-MMC><GTOL-POSI>.030<MOD-MMC>D-E<MOD-FMC>
; Should return (but doesn't): <GTOL-POSI><MOD-DIAM>.004<MOD-MMC>HC<MOD-MMC>
; <GTOL-POSI>.030<MOD-MMC>D-E<MOD-FMC>
Another string that will not split as expected: <GTOL-POSI><MOD-DIAM>.005<MOD-MMC>AD-E<MOD-MMC><GTOL-PERP>.001A
; Should return (but doesn't): <GTOL-POSI><MOD-DIAM>.005<MOD-MMC>AD-E<MOD-MMC>
; <GTOL-PERP>.001A
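Below is a minimal C# repro of the behavior described above. It is a sketch that assumes .NET 6 or later with top-level statements, and the names samples, pieces and pieceCounts are only illustrative. It runs the original pattern over the three sample strings, and only the first one should come back in two pieces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The pattern from the question: a word boundary followed by a lookahead for a <GTOL-...> tag.
const string original = @"\b(?=<GTOL-[A-Z]*>)";

string[] samples =
{
    "<GTOL-POSI>.010<MOD-MMC>B-C<GTOL-POSI>.002<MOD-FMC>",
    "<GTOL-POSI><MOD-DIAM>.004<MOD-MMC>HC<MOD-MMC><GTOL-POSI>.030<MOD-MMC>D-E<MOD-FMC>",
    "<GTOL-POSI><MOD-DIAM>.005<MOD-MMC>AD-E<MOD-MMC><GTOL-PERP>.001A",
};

// Split every sample once and record how many pieces each one produced.
var pieces = samples.Select(s => Regex.Split(s, original)).ToArray();
int[] pieceCounts = pieces.Select(p => p.Length).ToArray();

for (int i = 0; i < pieces.Length; i++)
{
    Console.WriteLine($"Sample {i + 1}: {pieceCounts[i]} piece(s)");
    foreach (var part in pieces[i])
        Console.WriteLine($"  {part}");
}
```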
Answer:
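There is no word boundary between the > and the < in ...<MOD-MMC><GTOL-POSI>...: \b only matches at a position that has a word character on exactly one side, and both > and < are non-word characters. In the string that works, the second tag is preceded by C, which is a word character, so \b matches there.
So you can remove the \b, and as there seem to be one or more uppercase characters, the quantifier can be + instead of * to match 1 or more times.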
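A quick sketch to check both points, again assuming .NET 6 or later with top-level statements (the variable names are made up for illustration). \b matches between C and <, but not between > and <. Once \b is dropped, the lookahead alone does split the second sample, but it also matches at index 0, so Regex.Split returns an empty first element.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \b needs a word character on exactly one side of the position.
bool afterWordChar = Regex.IsMatch("C<", @"C\b<");   // 'C' then '<': boundary
bool betweenTags = Regex.IsMatch("><", @">\b<");     // '>' then '<': no boundary
Console.WriteLine($"afterWordChar={afterWordChar}, betweenTags={betweenTags}");

// Pattern with \b removed and * changed to +, but no lookbehind yet.
const string noBoundary = @"(?=<GTOL-[A-Z]+>)";
string input = "<GTOL-POSI><MOD-DIAM>.004<MOD-MMC>HC<MOD-MMC><GTOL-POSI>.030<MOD-MMC>D-E<MOD-FMC>";

// The lookahead also succeeds at index 0, so the first element is an empty string.
var parts = Regex.Split(input, noBoundary).ToList();
foreach (var part in parts)
    Console.WriteLine($"[{part}]");
```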
If you don't want to split at the start of the string, which would create an empty entry at the beginning of the result list, you can add a negative lookbehind (?<!^\s*) that asserts the position is not preceded by only optional whitespace characters from the start of the string:
(?<!^\s*)(?=<GTOL-[A-Z]+>)
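Putting it together in the same Regex.Split(...).ToList() call as the question gives the sketch below. It assumes .NET 6 or later with top-level statements, and samples and results are illustrative names. Each of the three strings should come back as two pieces, with no empty leading entry. .NET supports variable-length lookbehind, so (?<!^\s*) is valid there. Many other regex engines reject it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Split before every <GTOL-...> tag, except one at the very start of the string
// (optionally preceded by whitespace), so no empty first entry is produced.
const string pattern = @"(?<!^\s*)(?=<GTOL-[A-Z]+>)";

string[] samples =
{
    "<GTOL-POSI>.010<MOD-MMC>B-C<GTOL-POSI>.002<MOD-FMC>",
    "<GTOL-POSI><MOD-DIAM>.004<MOD-MMC>HC<MOD-MMC><GTOL-POSI>.030<MOD-MMC>D-E<MOD-FMC>",
    "<GTOL-POSI><MOD-DIAM>.005<MOD-MMC>AD-E<MOD-MMC><GTOL-PERP>.001A",
};

// Same call shape as in the question: Regex.Split(...).ToList().
List<List<string>> results = samples
    .Select(s => Regex.Split(s, pattern).ToList())
    .ToList();

foreach (var parts in results)
{
    foreach (var part in parts)
        Console.WriteLine(part);
    Console.WriteLine();
}
```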