Non Capturing group lazy when pattern repeats

Time:03-24

I am trying to capture one or two groups per line. If a line contains a dash, I want one group for the text before the dash and one for the text after it. If it does not, I want a single group containing the whole line.

However, occasionally a line will start with 'Remove - ', which is a phrase I would like to ignore.

Example data:

| Strings |
| -------- |
| Remove - Precision Speed - Recap |
| Precision Speed - Recap |
| Remove - Precision Speed |
| Precision Speed |

The first two should each capture group 1: 'Precision Speed' and group 2: 'Recap', while the last two should capture only one group: 'Precision Speed'.

Right now I have ^(?:Remove - )?(.+)(?:\s*-\s*)(.*) and it works correctly for the first two (because there is a second dash in there, I believe). For the third one it captures 'Remove' and 'Precision Speed', and for the fourth one it doesn't capture anything.

CodePudding user response:

You may use the following pattern:

^(?:Remove - )?([^-]+)(?: - ([^-]+))?$

And if you're dealing with multiline text, simply add \r\n to both negated character classes so that a match cannot span multiple lines:

^(?:Remove - )?([^-\r\n]+)(?: - ([^-\r\n]+))?$
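As a quick check, the pattern above can be run against the four sample strings from the question. This is only a sketch: it assumes Python's re module (the question does not name a regex flavor), it assumes the re.MULTILINE flag so that ^ and $ match at line boundaries, and split_line is a hypothetical helper name used for illustration.

```python
import re

# Pattern from the answer above. re.MULTILINE is an assumption so that
# ^ and $ also match at the start and end of each line of a longer text.
PATTERN = re.compile(
    r"^(?:Remove - )?([^-\r\n]+)(?: - ([^-\r\n]+))?$",
    re.MULTILINE,
)

samples = [
    "Remove - Precision Speed - Recap",
    "Precision Speed - Recap",
    "Remove - Precision Speed",
    "Precision Speed",
]


def split_line(line):
    """Hypothetical helper: return (group 1, group 2) or None if no match.

    Group 2 is None when the line has no second part.
    """
    match = PATTERN.match(line)
    return match.groups() if match else None


for line in samples:
    print(repr(line), "->", split_line(line))

# The same pattern applied to all lines at once, joined with "\n".
pairs = [m.groups() for m in PATTERN.finditer("\n".join(samples))]
```

In this sketch the first two samples give ('Precision Speed', 'Recap') and the last two give ('Precision Speed', None), because the optional second group does not participate when there is no second dash.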


CodePudding user response:

Make the second - and surrounding whitespace optional.

^(?:Remove - )?([^-]+)(?:\s*-\s*)?(.*)
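A short sketch of how this pattern behaves, again assuming Python's re module, with split_line as a hypothetical helper name. Two details seem worth handling in code: because [^-]+ is greedy and nothing forces it to backtrack, group 1 can keep the space that precedes the dash, and group 2 always participates, so it is an empty string rather than missing when there is no second part. The helper strips whitespace and maps the empty string to None.

```python
import re

# Pattern from the answer above: the dash separator is optional and
# group 2 always participates in the match (possibly as an empty string).
PATTERN = re.compile(r"^(?:Remove - )?([^-]+)(?:\s*-\s*)?(.*)")


def split_line(line):
    """Hypothetical helper: return (first, second) with whitespace trimmed.

    second is None when the line has no part after a dash.
    """
    match = PATTERN.match(line)
    if match is None:
        return None
    first, second = (part.strip() for part in match.groups())
    # An empty second group means there was nothing after a dash.
    return first, second or None


# Raw groups, before trimming, for a line that contains a dash.
raw = PATTERN.match("Precision Speed - Recap").groups()

for line in [
    "Remove - Precision Speed - Recap",
    "Precision Speed - Recap",
    "Remove - Precision Speed",
    "Precision Speed",
]:
    print(repr(line), "->", split_line(line))
```

Here raw is ('Precision Speed ', 'Recap'), with a trailing space in the first group, which is why the helper calls strip() before returning.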