Regex two separate nested capturing groups

Time:02-10

I have a regex and test case on

https://regex101.com/r/5Z5Lop/1

^(?<KEY>CONF|ESD|TRACKING)[:;'\s]\s*(?<DATA>.*?)\s*(?:L[:;'\s]\s*\K(?<LINE_DATA>.*?))?(?<INITIALS>\*[a-zA-Z]+)?\s*$

See the LINE_DATA named group.

Is it possible to split that group up into two separate groups?

I want one group LINE_NUMBERS to hold all integers not contained in parentheses. Then, 1 group called QTYS to hold all integers that are contained in parentheses.

So currently LINE_DATA yields "1,2,3(4),5(12) "

Is it possible to have LINE_NUMBERS be [1,2,3,5] (either an array or some kind of string) and QTYS be [(4),(12)]? Note: I do still want to capture the parentheses.

I would like to do this in the current regex if it's possible and doesn't overly complicate what I currently have.

Right now, I'm obtaining this data through post-processing with separate regexes. I'm using PHP:

preg_match_all('/\d+(?!\s*\))/i', $ret_data['LINE_DATA'], $ret_data['LINE_NUMBERS']);
preg_match_all('/\(\s*\d+\s*\)/i', $ret_data['LINE_DATA'], $ret_data['QUANTITIES']);

Thanks!

Answer:

You can use a single pattern in the post-processing for both QUANTITIES and LINE_NUMBERS, using an alternation | and then removing the empty entries from the result.

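// First pass: extract KEY, DATA, LINE_DATA and INITIALS from the whole line
$re = '/^(?<KEY>CONF|ESD|TRACKING)[:;\'\s]\s*(?<DATA>.*?)\s*(?:L[:;\'\s]\s*\K(?<LINE_DATA>.*?))?(?<INITIALS>\*[a-zA-Z]+)?\s*$/i';
$str = 'esd:      here is my data      L:       1,2,3(4),5(12)   *sm          ';
preg_match($re, $str, $matches);

// Second pass: each match fills either QUANTITIES or LINE_NUMBERS, the other stays empty
preg_match_all('/(?<QUANTITIES>\(\d+\))|(?<LINE_NUMBERS>\d+)/', $matches["LINE_DATA"], $numbers);

// array_filter drops the empty entries and keeps the original keys
print_r(array_filter($numbers["QUANTITIES"]));
print_r(array_filter($numbers["LINE_NUMBERS"]));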

Output

Array
(
    [3] => (4)
    [5] => (12)
)
Array
(
    [0] => 1
    [1] => 2
    [2] => 3
    [4] => 5
)
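The keys in the output above have gaps because array_filter preserves the original indices, and with its default callback it would also drop a line number of "0". If that matters, the same alternation can be wrapped in a small helper that reindexes the results. This is only a sketch: the function name splitLineData is hypothetical, and it assumes PHP 7.4+ for the arrow function.

```php
<?php
// Hypothetical helper (not part of the original answer): split a LINE_DATA
// string into line numbers and parenthesised quantities.
function splitLineData(string $lineData): array
{
    preg_match_all(
        '/(?<QUANTITIES>\(\d+\))|(?<LINE_NUMBERS>\d+)/',
        $lineData,
        $numbers
    );

    // Keep every non-empty entry (so "0" survives), unlike the default
    // array_filter callback, which removes all falsy values.
    $nonEmpty = fn(string $v): bool => $v !== '';

    return [
        // array_values reindexes the filtered arrays from 0
        'LINE_NUMBERS' => array_values(array_filter($numbers['LINE_NUMBERS'], $nonEmpty)),
        'QUANTITIES'   => array_values(array_filter($numbers['QUANTITIES'], $nonEmpty)),
    ];
}

$result = splitLineData('1,2,3(4),5(12)   ');
print_r($result);
```

For the example data this should give LINE_NUMBERS as 1, 2, 3, 5 and QUANTITIES as (4), (12), both indexed from 0.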

Another option is to use the \G anchor to get 2 separate groups for the given example data in a single pattern, but that makes the INITIALS part after the numbers optional:

^(?<KEY>CONF|ESD|TRACKING)[:;'\s]\s*(?<DATA>.*?)\s*L[:;'\s]\s*|\G(?!^)(?:(?<QUANTITIES>\(\d+\))|(?<LINE_NUMBERS>\d+)),?(?:\s*(?<INITIALS>\*[a-zA-Z]+)\s*$)?
  • ^ Start of string
  • (?<KEY>CONF|ESD|TRACKING)[:;'\s]\s* The KEY group with its alternatives, then a single char from the character class and optional whitespace chars
  • (?<DATA>.*?)\s* The DATA group: any chars, non-greedy, followed by optional whitespace chars
  • L[:;'\s]\s* Match L, then any one of the listed chars and optional whitespace chars
  • | Or
  • \G(?!^) Assert the position at the end of the previous match, not at the start of the string
  • (?: Non-capture group
    • (?<QUANTITIES>\(\d+\)) Group QUANTITIES, match 1+ digits between parentheses
    • | Or
    • (?<LINE_NUMBERS>\d+) Group LINE_NUMBERS, match 1+ digits
  • ) Close non-capture group
  • ,? Match an optional comma
  • (?:\s*(?<INITIALS>\*[a-zA-Z]+)\s*$)? Optional non-capture group containing the INITIALS group, followed by optional whitespace chars and the end of the string
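Because the \G pattern produces one match per number, the groups have to be collected across matches. A minimal sketch of how that could look in PHP follows. The $parsed array and its layout are assumptions made for illustration, and the /i flag is carried over from the first example.

```php
<?php
// The \G pattern, split over three string pieces for readability only.
$re = '/^(?<KEY>CONF|ESD|TRACKING)[:;\'\s]\s*(?<DATA>.*?)\s*L[:;\'\s]\s*'
    . '|\G(?!^)(?:(?<QUANTITIES>\(\d+\))|(?<LINE_NUMBERS>\d+)),?'
    . '(?:\s*(?<INITIALS>\*[a-zA-Z]+)\s*$)?/i';
$str = 'esd:      here is my data      L:       1,2,3(4),5(12)   *sm          ';

// PREG_SET_ORDER yields one array per match: first the KEY/DATA prefix,
// then one match per number or quantity.
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Hypothetical result layout, chosen for this sketch.
$parsed = [
    'KEY' => null, 'DATA' => null, 'INITIALS' => null,
    'LINE_NUMBERS' => [], 'QUANTITIES' => [],
];

foreach ($matches as $m) {
    // Groups that did not take part in a match are missing or empty,
    // so guard every lookup with ?? ''.
    foreach (['KEY', 'DATA', 'INITIALS'] as $name) {
        if (($m[$name] ?? '') !== '') {
            $parsed[$name] = $m[$name];
        }
    }
    foreach (['LINE_NUMBERS', 'QUANTITIES'] as $name) {
        if (($m[$name] ?? '') !== '') {
            $parsed[$name][] = $m[$name];
        }
    }
}

print_r($parsed);
```

Note that in this variant the L part is no longer optional in the first alternative, so a line without it produces no matches at all.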
