Regular Expressions - every unmatched to separated group

Time:10-20

Is it possible to group items that are left/unmatched?

Total 123,11
TEST TEST 313131231 132,131 31231,3123
TEST TEST TEST 12313 123,123
TEST 1 1231,4123
TEST 131234,211
TEST 78
TEST   3 2,13 

My pattern: (?P<Element_Name>[a-zA-Z].*[a-zA-Z]+).*?(?P<first_price>\d+,\d+)(?:[^\d\n]*(?P<sec_price>\d+,\d+))?

Expected result:

Total 123,11
TEST TEST 313131231 132,131 31231,3123   <--- add 313131231 into new group
TEST TEST TEST 12313 123,123   <--- add 12313 into new group
TEST 1 1231,4123   <--- add 1 into new group
TEST 131234,211   <--- do not add empty string into new group
TEST 78
TEST   3 2,13 <--- add 3 into new group

Demo: https://regex101.com/r/iktCov/1

I know that I can add a new group with its own pattern for the elements that should fall into it, but can this be matched differently or faster?

CodePudding user response:

If there cannot be other digits in between, you can use an optional non-capturing group containing a capture group for digits only: (?:\s*\b(\d+)\b)?

Then you can use the negated character class again [^\d\n]* matching any char except newlines and digits.

The .* at the start of the pattern can be quite expensive, as it first matches until the end of the line and then has to backtrack to fit in the rest of the pattern.

If you know that there are no digits in between, you can consider using [a-zA-Z][^\d\n]*[a-zA-Z]

Note that this requires at least 2 chars in the range a-zA-Z, so a single-letter name will not match.
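To illustrate the two-character minimum, here is a small Python sketch. The second pattern, which wraps the tail in an optional group so that a single letter is also accepted, is only a suggestion and not part of the original answer; the names `NAME` and `NAME_SINGLE_OK` are made up for this example.

```python
import re

# Name part as suggested above: a letter, then non-digit chars, then a letter.
NAME = re.compile(r"[a-zA-Z][^\d\n]*[a-zA-Z]")

# Hypothetical variant (an assumption, not from the answer): the tail is
# optional, so a single-letter name is accepted as well.
NAME_SINGLE_OK = re.compile(r"[a-zA-Z](?:[^\d\n]*[a-zA-Z])?")

print(NAME.match("A 12,3"))                    # no match: only one letter
print(NAME.match("AB 12,3").group())           # matches the two letters
print(NAME_SINGLE_OK.match("A 12,3").group())  # matches the single letter
```

Whether single-letter names should match depends on your data; the sample lines in the question all have names of at least four letters.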

If the matches should be at the start of the string, you can prepend an anchor ^

^(?P<Element_Name>[a-zA-Z][^\d\n]*[a-zA-Z])(?:\s*\b(\d+)\b)?[^\d\n]*(?P<first_price>\d+,\d+)(?:[^\d\n]*(?P<sec_price>\d+,\d+))?

See a regex demo.
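As a rough check, the pattern can be run against the sample lines from the question in Python. This is a minimal sketch: it assumes the lost quantifiers were `+`, uses `re.MULTILINE` so that `^` matches at the start of every line, and reads the new unnamed group by its index (2). The helper name `extract` is made up for this example.

```python
import re

PATTERN = re.compile(
    r"^(?P<Element_Name>[a-zA-Z][^\d\n]*[a-zA-Z])"  # name without digits
    r"(?:\s*\b(\d+)\b)?"                            # optional standalone number (group 2)
    r"[^\d\n]*(?P<first_price>\d+,\d+)"             # first price
    r"(?:[^\d\n]*(?P<sec_price>\d+,\d+))?",         # optional second price
    re.MULTILINE,  # let ^ match at the start of each line
)

# Sample lines from the question.
SAMPLE = "\n".join([
    "Total 123,11",
    "TEST TEST 313131231 132,131 31231,3123",
    "TEST TEST TEST 12313 123,123",
    "TEST 1 1231,4123",
    "TEST 131234,211",
    "TEST 78",
    "TEST   3 2,13 ",
])


def extract(text):
    """Return (name, extra_number, first_price, sec_price) for each matching line."""
    return [
        (m["Element_Name"], m.group(2), m["first_price"], m["sec_price"])
        for m in PATTERN.finditer(text)
    ]


for row in extract(SAMPLE):
    print(row)
```

Groups that did not take part in a match come back as `None`, so "TEST 131234,211" yields no extra number rather than an empty string, and "TEST 78" produces no match at all because it contains no price.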
