How can I partition a line into code and comment using a single regex in Perl?

I want to read through a text file and partition each line into the following three variables. Each variable must be defined, although it might be equal to the empty string.

$a1code: all characters up to and not including the first non-escaped percent sign. If there is no non-escaped percent sign, this is the entire line. As we see in the example below, this also could be the empty string in a line where the following two variables are non-empty.
$a2boundary: the first non-escaped percent sign, if there is one.
$a3cmnt: any characters after the first non-escaped percent sign, if there is one.

The script below accomplishes this, but it requires several lines of code, two hashes, and a composite regex, that is, two regexes joined by |. The composite seems necessary because the first clause,

(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)

does not match a line that is pure code with no comment. Is there a more elegant way, using a single regex and fewer steps? In particular, is there a way to dispense with the %match hash and somehow fill the %+ hash with all three variables in a single step?
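To make the limitation concrete, here is a minimal sketch that applies only the first clause to two sample lines. The helper name first_clause_matches is hypothetical and exists only for this illustration.

```perl
use strict; use warnings;

# Hypothetical helper: applies only the first clause of the composite regex.
sub first_clause_matches {
    my ($line) = @_;
    return $line =~ /^(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)/ ? 1 : 0;
}

# Single-quoted strings keep the backslash before % literally.
for my $line ('abba 5\% %comment', 'This is 100\% text') {
    printf "|%s| %s\n", $line,
        first_clause_matches($line) ? 'matches' : 'does not match';
}
```

The second line contains only an escaped percent sign, so the mandatory boundary group has nothing to match and the clause fails as a whole.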

#!/usr/bin/env perl
use strict; use warnings;
print join('', 'perl ', $^V, "\n",);
use Data::Dumper qw(Dumper); $Data::Dumper::Sortkeys = 1;

my $count=0;
while(<DATA>)
{
    $count++;
    print "$count\t";
    chomp;
    my %match=(
        a2boundary=>'',
        a3cmnt=>'',
    );
    print "|$_|\n";
    if($_=~/^(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)|(?<a1code>.*)/)
    {
        print "from regex:\n";
        print Dumper \%+;
        %match=(%match,%+,);
    }
    else
    {
        die "no match? coding error, should never get here";
    }
    if(scalar keys %+ != scalar keys %match)
    {
        print "from multiple lines of code:\n";
        print Dumper \%match;
    }
    print "------------------------------------------\n";
}

__DATA__
This is 100\% text and below you find an empty line.

abba 5\% %comment 9\% %Borgia
%all comment
%

Result:

perl v5.34.0
1   |This is 100\% text and below you find an empty line.   |
from regex:
$VAR1 = {
          'a1code' => 'This is 100\\% text and below you find an empty line.   '
        };
from multiple lines of code:
$VAR1 = {
          'a1code' => 'This is 100\\% text and below you find an empty line.   ',
          'a2boundary' => '',
          'a3cmnt' => ''
        };
------------------------------------------
2   ||
from regex:
$VAR1 = {
          'a1code' => ''
        };
from multiple lines of code:
$VAR1 = {
          'a1code' => '',
          'a2boundary' => '',
          'a3cmnt' => ''
        };
------------------------------------------
3   |abba 5\% %comment 9\% %Borgia|
from regex:
$VAR1 = {
          'a1code' => 'abba 5\\% ',
          'a2boundary' => '%',
          'a3cmnt' => 'comment 9\\% %Borgia'
        };
------------------------------------------
4   |%all comment|
from regex:
$VAR1 = {
          'a1code' => '',
          'a2boundary' => '%',
          'a3cmnt' => 'all comment'
        };
------------------------------------------
5   |%|
from regex:
$VAR1 = {
          'a1code' => '',
          'a2boundary' => '%',
          'a3cmnt' => ''
        };
------------------------------------------

CodePudding user response：

You can use the following:

my ($a1code, $a2boundary, $a3cmnt) =
   /
      ^
      (  (?: [^\\%]  | \\. )* )
      (?: (%) (.*) )?
      \z
   /sx;
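
Because the boundary and comment sit in an optional group, the list assignment leaves $a2boundary and $a3cmnt undefined on a line with no unescaped percent sign. The sketch below is one possible way to meet the question's requirement that all three be defined. The wrapper partition_line is a hypothetical name, and returning an empty list on a failed match is an assumption made for this example.

```perl
use strict; use warnings;

# Hypothetical wrapper around the pattern: returns three defined strings,
# or an empty list if the line does not match at all.
sub partition_line {
    my ($line) = @_;
    my @parts = $line =~
       /
          ^
          (  (?: [^\\%]  | \\. )* )
          (?: (%) (.*) )?
          \z
       /sx
       or return;
    # Captures from the optional group are undef when it did not take part.
    return map { $_ // '' } @parts;
}

# Sample lines taken from the question's DATA section.
for my $line ('This is 100\% text', '', 'abba 5\% %comment 9\% %Borgia',
              '%all comment', '%') {
    my ($a1code, $a2boundary, $a3cmnt) = partition_line($line);
    print "|$line| -> code=|$a1code| boundary=|$a2boundary| cmnt=|$a3cmnt|\n";
}
```

One edge case to be aware of: a line that ends in a lone backslash does not match, since \\. needs a character after the backslash, so the wrapper returns an empty list there. The defaults are applied after the match, so the pattern itself is unchanged.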

It does not consider % escaped in abc\\
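
The remark above appears to concern a percent sign that follows an escaped backslash. Assuming that reading, the sketch below compares the pattern from this answer with the lookbehind regex from the question on such a line. Both helper names are hypothetical, and each returns only the code part.

```perl
use strict; use warnings;

# Hypothetical helper: code part according to the pattern from this answer.
sub code_via_answer {
    my ($line) = @_;
    my ($code) = $line =~ / ^ ( (?: [^\\%] | \\. )* ) (?: (%) (.*) )? \z /sx;
    return $code;
}

# Hypothetical helper: code part according to the question's composite regex.
sub code_via_lookbehind {
    my ($line) = @_;
    $line =~ /^(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)|(?<a1code>.*)/;
    return $+{a1code};
}

# In single quotes this is: abc, two backslashes, a percent sign, def.
my $line = 'abc\\\\%def';
print "answer:     |", code_via_answer($line), "|\n";
print "lookbehind: |", code_via_lookbehind($line), "|\n";
```

The answer's pattern consumes the two backslashes as one escaped pair and then treats the percent sign as the boundary, so its code part is abc\\. The lookbehind only checks the single character before the percent sign, finds a backslash, and therefore keeps the whole line as code.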