Some capture groups seem lost when matching group repeatedly

Time:09-17

While trying to parse the output of monitoring plugins, I ran into a match result I did not expect.

First consider this debugger session with Perl 5.18.2:

 DB<6> x $_
0  'last=0.508798;;;0'
  DB<7> x $RE
0  (?^u:^((?^u:\'[^\'=]+\'|[^\'= ]+))=((?^u:\\d+(?:\\.\\d*)?|\\.\\d+))(s|%|[KMT]?B)?(;(?^u:\\d+(?:\\.\\d*)?|\\.\\d+)?){0,4}$)
   -> qr/(?^u:^((?^u:'[^'=]+'|[^'= ]+))=((?^u:\d+(?:\.\d*)?|\.\d+))(s|%|[KMT]?B)?(;(?^u:\d+(?:\.\d*)?|\.\d+)?){0,4}$)/
  DB<8> @m = /$RE/

  DB<9> x @m
0  'last'
1  0.508798
2  undef
3  ';0'
  DB<10>

OK, the regex $RE (intended to match "'label'=value[UOM];[warn];[crit];[min];[max]") looks terrifying at first glance, so let me show how it is constructed:

my $RE_label = qr/'[^'=]+'|[^'= ]+/;
my $RE_simple_float = qr/\d+(?:\.\d*)?|\.\d+/;
my $RE_numeric = qr/[-+]?$RE_simple_float(?:[eE][-+]?\d+)?/;
my $RE = qr/^($RE_label)=($RE_simple_float)(s|%|[KMT]?B)?(;$RE_simple_float?){0,4}$/;

The relevant part is (;$RE_simple_float?){0,4}$ intended to match ";[warn];[crit];[min];[max]" (still not perfect), so for ";;;0" I'd expect @m to end with ';', ';', ';0'. However it seems the matches are lost, except for the last one.

Did I misunderstand something, or is it a Perl bug?

CodePudding user response:

When you use {<number>} (or + or *, for that matter) after a capture group, only the last value matched by the capture group is stored. This explains why you end up with only ;0 instead of ;;;0 in your fourth capture group: (;$RE_simple_float?){0,4} sets the fourth capture group to the last element it matches.
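The behaviour is easy to reproduce in isolation. The following is a minimal sketch with a made-up input string ('a1b2c3' is not from the question), contrasting a quantified capture group with a global match in list context:

```perl
use strict;
use warnings;

# A quantified capture group is still a single numbered group: each
# iteration overwrites the previous capture, so only the last survives.
my @last_only = 'a1b2c3' =~ /^([a-z]\d){3}$/;
print scalar(@last_only), " capture: @last_only\n";

# To collect every repetition, match globally in list context instead.
my @all = 'a1b2c3' =~ /([a-z]\d)/g;
print scalar(@all), " captures: @all\n";
```

The first match returns a single element, 'c3', even though the group matched three times; the global match returns all three pieces.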

To fix that, I would recommend matching the whole end of the string in one group and splitting it afterwards:

my $RE = qr/...((?:;$RE_simple_float?){0,4})$/;
my @m = /$RE/;
my @end = split /;/, $m[3]; # use /(?<=;)/ to keep the semicolons
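For reference, here is one way the abbreviated snippet might look when written out in full. It is a sketch that assumes the elided part of the regex is the same prefix as in the question, and it splits with a lookahead, /(?=;)/, so that each piece keeps its leading semicolon. The variable names are illustrative.

```perl
use strict;
use warnings;

my $RE_label        = qr/'[^'=]+'|[^'= ]+/;
my $RE_simple_float = qr/\d+(?:\.\d*)?|\.\d+/;

# One outer capture group wraps the whole repeated tail; the inner
# group is non-capturing, so nothing is overwritten per iteration.
my $RE = qr/^($RE_label)=($RE_simple_float)(s|%|[KMT]?B)?((?:;$RE_simple_float?){0,4})$/;

my $input = 'last=0.508798;;;0';
my ($label, $value, $uom, $tail) = $input =~ $RE
    or die "no match for '$input'";

# Split before each semicolon so every piece keeps its leading ';'.
my @thresholds = split /(?=;)/, $tail;

print "label=$label value=$value\n";
print "threshold: '$_'\n" for @thresholds;
```

For the sample input this yields the three pieces ';', ';' and ';0', which is what the question expected to find in @m.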

Another solution is to repeat the capture group: replace (;$RE_simple_float?){0,4} with

(;$RE_simple_float?)?(;$RE_simple_float?)?(;$RE_simple_float?)?(;$RE_simple_float?)?

The capture groups that do not match will be set to undef. The issue with this approach is that it's a bit verbose, and it only works for a bounded {} quantifier, not for + or *.
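The verbosity can be reduced by building the repeated part with Perl's x repetition operator. This is only a sketch; $tail is a name introduced here for illustration:

```perl
use strict;
use warnings;

my $RE_label        = qr/'[^'=]+'|[^'= ]+/;
my $RE_simple_float = qr/\d+(?:\.\d*)?|\.\d+/;

# Repeat the optional group four times with the x operator instead of
# typing it out; each copy becomes its own numbered capture group.
my $tail = "(;$RE_simple_float?)?" x 4;
my $RE   = qr/^($RE_label)=($RE_simple_float)(s|%|[KMT]?B)?$tail$/;

my @m = 'last=0.508798;;;0' =~ $RE;

# Groups 4..7 (indices 3..6) hold the individual threshold pieces.
print join(', ', map { defined $_ ? "'$_'" : 'undef' } @m[3 .. 6]), "\n";
```

With the sample input, the first three of these groups capture ';', ';' and ';0', and the fourth is undef because there is nothing left for it to match.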

CodePudding user response:

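The following demo code uses split to extract the fields of interest; check whether it fits your problem. Note that the postfix dereference slice ($record->@{...}) requires Perl 5.24 or later, and that the unit stays attached to each value.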

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

while( <DATA> ) {
    chomp;
    say;
    my $record;
    $record->@{qw/label value warn crit min max/} = split(/[=;]/,$_);
    say Dumper($record);
}

exit 0;

#'label'=value[UOM];[warn];[crit];[min];[max]

__DATA__
'label 1'=0.3345s;0.8s;1.2s;0.2s;3.2s
'label 2'=10%;7%;18%;2%;28%
'label 3'=0.5us;2.3us

Output

'label 1'=0.3345s;0.8s;1.2s;0.2s;3.2s
$VAR1 = {
          'crit' => '1.2s',
          'warn' => '0.8s',
          'value' => '0.3345s',
          'label' => '\'label 1\'',
          'max' => '3.2s',
          'min' => '0.2s'
        };

'label 2'=10%;7%;18%;2%;28%
$VAR1 = {
          'min' => '2%',
          'max' => '28%',
          'label' => '\'label 2\'',
          'value' => '10%',
          'warn' => '7%',
          'crit' => '18%'
        };

'label 3'=0.5us;2.3us
$VAR1 = {
          'min' => undef,
          'max' => undef,
          'label' => '\'label 3\'',
          'warn' => '2.3us',
          'value' => '0.5us',
          'crit' => undef
        };
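Applied to the sample line from the question, the same split idea might look like the sketch below. It uses a plain hash slice so that it also runs on older Perls such as 5.18, and passes a limit of -1 so that trailing empty fields are not dropped. It assumes the label itself contains no '=' or ';'.

```perl
use strict;
use warnings;

my @fields = qw/label value warn crit min max/;

# A negative limit keeps trailing empty fields; fields that are
# missing entirely are left undef by the slice assignment.
my %record;
@record{@fields} = split /[=;]/, 'last=0.508798;;;0', -1;

for my $name (@fields) {
    my $v = $record{$name};
    printf "%-5s => %s\n", $name, defined $v ? "'$v'" : 'undef';
}
```

Here warn and crit come out as empty strings, min is '0', and max is undef, so "present but empty" and "absent" remain distinguishable.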