I have a regex and test case on
https://regex101.com/r/5Z5Lop/1
^(?<KEY>CONF|ESD|TRACKING)[:;'\s]\s*(?<DATA>.*?)\s*(?:L[:;'\s]\s*\K(?<LINE_DATA>.*?))?(?<INITIALS>\*[a-zA-Z]+)?\s*$
See the LINE_DATA named group.
Is it possible to split that group into two separate groups?
I want one group, LINE_NUMBERS, to hold all integers not contained in parentheses,
and one group, QTYS, to hold all integers that are contained in parentheses.
Currently LINE_DATA yields "1,2,3(4),5(12) ".
Is it possible to have LINE_NUMBERS be [1,2,3,5] (either an array or some kind of string) and QTYS be [(4),(12)]?
Note: I do still want to capture the parentheses.
I would like to do this in the current regex if it's possible and doesn't overly complicate what I currently have.
Right now, I'm obtaining this data through post-processing with separate regexes. I'm using PHP:
preg_match_all('/\d+(?!\s*\))/i', $ret_data['LINE_DATA'], $ret_data['LINE_NUMBERS']);
preg_match_all('/\(\s*\d\s*\)/i', $ret_data['LINE_DATA'], $ret_data['QUANTITIES']);
Thanks!
CodePudding user response:
You can use a single pattern in the post-processing for both QUANTITIES and LINE_NUMBERS, using an alternation |
and then removing the empty entries from the result.
$re = '/^(?<KEY>CONF|ESD|TRACKING)[:;\'\s]\s*(?<DATA>.*?)\s*(?:L[:;\'\s]\s*\K(?<LINE_DATA>.*?))?(?<INITIALS>\*[a-zA-Z]+)?\s*$/i';
$str = 'esd: here is my data L: 1,2,3(4),5(12) *sm ';
preg_match($re, $str, $matches);
preg_match_all('/(?<QUANTITIES>\(\d+\))|(?<LINE_NUMBERS>\d+)/', $matches["LINE_DATA"], $numbers);
print_r(array_filter($numbers["QUANTITIES"]));
print_r(array_filter($numbers["LINE_NUMBERS"]));
Output
Array
(
[3] => (4)
[5] => (12)
)
Array
(
[0] => 1
[1] => 2
[2] => 3
[4] => 5
)
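As the output shows, array_filter keeps the original match indexes, and without a callback it also drops the string "0" because PHP treats it as falsy. If either matters, the same alternation can be wrapped in a small helper that filters only truly empty strings and reindexes the result. This is just a sketch; the function name split_line_data is made up for illustration and is not part of the question's code.

```php
<?php
// Hypothetical helper: splits a LINE_DATA string such as "1,2,3(4),5(12) "
// into plain line numbers and parenthesized quantities.
function split_line_data(string $lineData): array
{
    // Same alternation as above: try "(digits)" first, otherwise bare digits.
    preg_match_all('/(?<QUANTITIES>\(\d+\))|(?<LINE_NUMBERS>\d+)/', $lineData, $numbers);

    // Keep everything except the empty-string placeholders that
    // preg_match_all inserts for the group that did not participate.
    $notEmpty = static function (string $s): bool {
        return $s !== '';
    };

    return [
        'LINE_NUMBERS' => array_values(array_filter($numbers['LINE_NUMBERS'], $notEmpty)),
        'QUANTITIES'   => array_values(array_filter($numbers['QUANTITIES'], $notEmpty)),
    ];
}

$parts = split_line_data('1,2,3(4),5(12) ');
print_r($parts);
```

With the sample string this gives LINE_NUMBERS as 1, 2, 3, 5 and QUANTITIES as (4), (12), both indexed from 0.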
Another option is to use the \G
anchor to get 2 separate groups for the given example data, but it makes the INITIALS part after it optional:
^(?<KEY>CONF|ESD|TRACKING)[:;'\s]\s*(?<DATA>.*?)\s*L[:;'\s]\s*|\G(?!^)(?:(?<QUANTITIES>\(\d+\))|(?<LINE_NUMBERS>\d+)),?(?:\s*(?<INITIALS>\*[a-zA-Z]+)\s*$)?
^  Start of string
(?<KEY>CONF|ESD|TRACKING)[:;'\s]\s*  The KEY group with alternatives, then a single char listed in the character class and optional whitespace chars
(?<DATA>.*?)\s*  The DATA group: any char, non-greedy, followed by optional whitespace chars
L[:;'\s]\s*  Match L, then any of the listed chars and optional whitespace chars
|  Or
\G(?!^)  Assert the position at the end of the previous match, not at the start
(?:  Non-capture group
(?<QUANTITIES>\(\d+\))  Group QUANTITIES, match 1+ digits between parentheses
|  Or
(?<LINE_NUMBERS>\d+)  Group LINE_NUMBERS, match 1+ digits
)  Close non-capture group
,?  Match an optional comma
(?:\s*(?<INITIALS>\*[a-zA-Z]+)\s*$)?  Optional non-capture group with group INITIALS
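A minimal sketch of how the \G pattern might be used from PHP, assuming the same sample string and the /i flag as in the first example. Because every \G step is a separate match, preg_match_all returns one entry per match for each named group, with empty strings where a group did not participate, so the results still need to be filtered. The $result array and the $notEmpty closure are illustrative names only.

```php
<?php
$re = '/^(?<KEY>CONF|ESD|TRACKING)[:;\'\s]\s*(?<DATA>.*?)\s*L[:;\'\s]\s*'
    . '|\G(?!^)(?:(?<QUANTITIES>\(\d+\))|(?<LINE_NUMBERS>\d+)),?'
    . '(?:\s*(?<INITIALS>\*[a-zA-Z]+)\s*$)?/i';
$str = 'esd: here is my data L: 1,2,3(4),5(12) *sm ';

// The first match consumes the KEY/DATA/L prefix; each following match
// (anchored by \G) consumes one number or one parenthesized quantity.
preg_match_all($re, $str, $m);

// Drop the empty-string placeholders and reindex each named group.
$notEmpty = static function (string $s): bool {
    return $s !== '';
};
$result = [];
foreach (['KEY', 'DATA', 'LINE_NUMBERS', 'QUANTITIES', 'INITIALS'] as $name) {
    $result[$name] = array_values(array_filter($m[$name], $notEmpty));
}

print_r($result);
```

For the sample string, LINE_NUMBERS holds 1, 2, 3, 5 and QUANTITIES holds (4), (12), while KEY, DATA and INITIALS each hold a single entry.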