I have a string:
{value1} {value2}-{value3}*{value...n}
Using a regular expression, I want to capture each of the bracketed values as well as the operators between them. I do not know how many bracketed values there will be.
I tried:
/(\{.*\}).*([\ |\-|\*|\/])*/mgU
but that is just getting me the values and not the operators. Where did I go wrong?
CodePudding user response:
You can validate the string first with
/\A ({ [^{}]* }) (?: [\/ *-] (?1))* \z/x
Details:
\A - start of string
({[^{}]*}) - Group 1: a { char, then zero or more chars other than { and }, and then a } char
(?:[\/ *-](?1))* - zero or more occurrences of a /, space, * or - char followed by the Group 1 pattern
\z - end of string.
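To see what the validation step accepts and rejects, here is a minimal sketch. The helper name is_valid_expression and the sample strings are made up for illustration; the pattern is the one described above, with the braces escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the whole string is a sequence of {...}
# values separated by exactly one of "/", " ", "*" or "-".
sub is_valid_expression {
    my ($text) = @_;
    return $text =~ /\A (\{ [^{}]* \}) (?: [\/ *-] (?1) )* \z/x ? 1 : 0;
}

# Illustrative inputs (not from the question): the first should pass,
# the others should fail (missing operator, dangling operator, unknown operator).
for my $candidate ('{a} {b}-{c}', '{a}{b}', '{a}-', '{a}+{b}') {
    printf "%-12s %s\n", $candidate,
        is_valid_expression($candidate) ? 'valid' : 'invalid';
}
```

Note that (?1) re-runs the Group 1 subpattern rather than matching the same text again, which is why each value may differ.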
Then, you may collect individual matches with
/ { [^{}]* } | [\/ *-] /gx
This regex matches all occurrences of any substring between { and } (with {[^{}]*}), or a /, space, * or - char (with [\/ *-]).
See a complete demo script:
#!/usr/bin/perl
use strict;
use warnings;
my $text = "{value1} {value2}-{value3}*{value...n}";
if ($text =~ /\A ({ [^{}]* }) (?: [\/ *-] (?1))* \z/x) {
    while($text =~ / { [^{}]* } | [\/ *-] /gx) {
        print "$&\n";
    }
}
Output (the second line contains a single space, the separator matched between the first two values):
{value1}
 
{value2}
-
{value3}
*
{value...n}
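If you would rather have the tokens in arrays than printed, the same alternation can be used in list context. This is only a sketch: the names @tokens, @values and @operators are my own, and it assumes the string has already passed the validation above.

```perl
use strict;
use warnings;

my $text = "{value1} {value2}-{value3}*{value...n}";

# In list context, /g returns every match of the alternation at once.
my @tokens = $text =~ / \{ [^{}]* \} | [\/ *-] /gx;

# Tokens that start with "{" are values; everything else is an operator.
my @values    = grep {  /\A\{/ } @tokens;
my @operators = grep { !/\A\{/ } @tokens;

print scalar(@tokens), " tokens, ", scalar(@values), " values, ",
      scalar(@operators), " operators\n";
```

The order of @tokens is the order in the string, so values and operators alternate as long as the input was valid.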
CodePudding user response:
Another idea is to use the \G anchor and two capture groups, where the curly values are in group 1 and the operator is in group 2:
\G(?=.*{[^{}]*}\z)({[^{}]*})([ *\/-])?
The pattern matches
\G - Assert the position at the end of the previous match, or at the start of the string (in this case)
(?=.*{[^{}]*}\z) - Positive lookahead, assert that the string ends with a curly part
({[^{}]*}) - Capture the curly braces in group 1
([ *\/-])? - Optionally capture an operator in group 2
Example
my $str = "{value1} {value2}-{value3}*{value...n}";
while ($str =~ /\G(?=.*\{[^{}]*}\z)({[^{}]*})([ *\/-])?/g) {
    print "Curly value: $1 Operator: $2\n";
}
Output
Curly value: {value1} Operator:
Curly value: {value2} Operator: -
Curly value: {value3} Operator: *
Curly value: {value...n} Operator:
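Two things are worth handling if you use this under "use warnings": $2 is undef for the last value, and the loop simply stops at the first position that does not match. A possible variant, as a sketch only (the names @pairs and $consumed_all are mine), adds the /c flag so the final position can be checked afterwards:

```perl
use strict;
use warnings;

my $str = "{value1} {value2}-{value3}*{value...n}";
my @pairs;   # each element: [ curly value, operator that follows it, or '' ]

while ($str =~ /\G(?=.*\{[^{}]*\}\z)(\{[^{}]*\})([ *\/-])?/gc) {
    push @pairs, [ $1, $2 // '' ];   # $2 is undef when no operator follows
}

# With /c the position survives the final failed match, so it tells us
# whether the loop consumed the whole string.
my $consumed_all = (pos($str) // 0) == length($str);

printf "%s [%s]\n", @$_ for @pairs;
print $consumed_all ? "complete\n" : "stopped early\n";
```

Without /c, a failed match resets pos, so there would be no way to tell a clean finish from an early stop.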
CodePudding user response:
The tokenizer approach:
my @tokens;
for ($str) {
    while (1) {
        /\G \s+ /xgc;    # skip any run of whitespace
        /\G \{ ( [^{}]* ) \} /xgc
            and do { push @tokens, [ VALUE => $1 ]; next; };
        # "-" goes first in the class; " -*" would be a range from space to "*" that excludes "-"
        /\G ( [-*\/] ) /xgc
            and do { push @tokens, [ OP => $1 ]; next; };
        /\G \Z /xgc
            and last;
        die( "Unexpected character at pos ".( pos )."\n" );
    }
}
It might be overkill, but it's easier to extend.
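As an illustration of the "easier to extend" point, the loop can be wrapped in a sub and reused. This is a sketch under my own assumptions: the sub name tokenize is made up, whitespace is skipped rather than reported as an operator, and the operator class is written as [-*\/] so that "-" is a literal character and not part of a range.

```perl
use strict;
use warnings;

# Hypothetical wrapper; returns a list of [ TYPE => text ] pairs.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    for ($str) {
        while (1) {
            /\G \s+ /xgc;                       # skip whitespace
            /\G \{ ( [^{}]* ) \} /xgc
                and do { push @tokens, [ VALUE => $1 ]; next; };
            /\G ( [-*\/] ) /xgc                 # "-" first, so it is literal
                and do { push @tokens, [ OP => $1 ]; next; };
            /\G \z /xgc
                and last;
            die "Unexpected character at pos " . (pos($_) // 0) . "\n";
        }
    }
    return @tokens;
}

print "$_->[0]\t$_->[1]\n" for tokenize("{value1} {value2}-{value3}*{value...n}");
```

Adding a new token type is then one more /\G .../xgc line before the end-of-string check.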
CodePudding user response:
If you only have non-nested blocks, separated by a known list of operators, you can use split to very easily separate a statement into values and operators.
use strict;
use warnings;
use Data::Dumper;
my @val = split m#([- /*])#, <DATA>; # parens will prevent operators from being consumed
print Dumper \@val;
__DATA__
{value1} {value2}-{value3}*{valuen}/{value4} {value5}-{value6}*{valuen} {value7} {value8}-{value9}
This will print:
$VAR1 = [
'{value1}',
' ',
'{value2}',
'-',
'{value3}',
'*',
'{valuen}',
'/',
'{value4}',
' ',
'{value5}',
'-',
'{value6}',
'*',
'{valuen}',
' ',
'{value7}',
' ',
'{value8}',
'-',
'{value9}
'
];
From there, it should be a simple task to validate and clean up the values, as well as identify the operators.
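One way that cleanup could look, as a rough sketch: the variable names and the shortened input line are my own, and it assumes that values never contain one of the operator characters inside the braces (split would cut such a value in two, and the check below would then die).

```perl
use strict;
use warnings;

my $line = "{value1} {value2}-{value3}*{valuen}\n";
chomp $line;                                # drop the trailing newline first

my (@values, @operators);
my @parts = split m#([- /*])#, $line;
for my $i (0 .. $#parts) {
    if ($i % 2 == 0) {                      # even positions should be {...} values
        $parts[$i] =~ /\A\{([^{}]*)\}\z/
            or die "Malformed value '$parts[$i]'\n";
        push @values, $1;                   # keep the text without the braces
    }
    else {                                  # odd positions are the captured separators
        push @operators, $parts[$i];
    }
}

# split drops trailing empty fields, so a dangling operator shows up here.
die "Dangling operator\n" unless @values == @operators + 1;

print "values:    @values\n";
print "operators: ", join(',', map { "'$_'" } @operators), "\n";
```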