I want to read through a text file and partition each line into the following three variables. Each variable must be defined, although it might be equal to the empty string.
$a1code
: all characters up to and not including the first non-escaped percent sign. If there is no non-escaped percent sign, this is the entire line. As the example below shows, this can also be the empty string on a line where the following two variables are non-empty.
$a2boundary
: the first non-escaped percent sign, if there is one.
$a3cmnt
: any characters after the first non-escaped percent sign, if there is one.
The script below accomplishes this but requires several lines of code, two hashes, and a composite regex, that is, two regexes combined by |.
The composite seems necessary because the first clause,
(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)
does not match a line that is pure code with no comment.
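To illustrate the point, here is a minimal sketch that applies only the first clause to two made-up lines (the sample strings and the variable name $first_clause are assumptions for illustration, not part of the script below). The line without an unescaped percent sign should fall through to the else branch.

```perl
use strict;
use warnings;

# Only the first clause of the composite regex, without the |(?<a1code>.*) fallback.
my $first_clause = qr/^(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)/;

# Hypothetical sample lines: one with a comment, one that is pure code.
for my $line ('abba 5\% %comment', 'pure code, 100\% text') {
    if ($line =~ $first_clause) {
        print "matched:  |$+{a1code}|$+{a2boundary}|$+{a3cmnt}|\n";
    }
    else {
        # The clause requires an unescaped %, so a comment-free line fails here.
        print "no match: |$line|\n";
    }
}
```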
Is there a more elegant way, using a single regex and fewer steps?
In particular, is there a way to dispense with the %match hash and somehow fill the %+ hash with all three variables in a single step?
#!/usr/bin/env perl
use strict; use warnings;
print join('', 'perl ', $^V, "\n",);
use Data::Dumper qw(Dumper); $Data::Dumper::Sortkeys = 1;
my $count=0;
while(<DATA>)
{
    $count++;
    print "$count\t";
    chomp;
    # defaults for the captures that %+ omits on a comment-free line
    my %match=(
        a2boundary=>'',
        a3cmnt=>'',
    );
    print "|$_|\n";
    if($_=~/^(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)|(?<a1code>.*)/)
    {
        print "from regex:\n";
        print Dumper \%+;
        # merge the named captures over the defaults
        %match=(%match,%+,);
    }
    else
    {
        die "no match? coding error, should never get here";
    }
    if(scalar keys %+ != scalar keys %match)
    {
        print "from multiple lines of code:\n";
        print Dumper \%match;
    }
    print "------------------------------------------\n";
}
__DATA__
This is 100\% text and below you find an empty line.

abba 5\% %comment 9\% %Borgia
%all comment
%
Result:
perl v5.34.0
1 |This is 100\% text and below you find an empty line. |
from regex:
$VAR1 = {
'a1code' => 'This is 100\\% text and below you find an empty line. '
};
from multiple lines of code:
$VAR1 = {
'a1code' => 'This is 100\\% text and below you find an empty line. ',
'a2boundary' => '',
'a3cmnt' => ''
};
------------------------------------------
2 ||
from regex:
$VAR1 = {
'a1code' => ''
};
from multiple lines of code:
$VAR1 = {
'a1code' => '',
'a2boundary' => '',
'a3cmnt' => ''
};
------------------------------------------
3 |abba 5\% %comment 9\% %Borgia|
from regex:
$VAR1 = {
'a1code' => 'abba 5\\% ',
'a2boundary' => '%',
'a3cmnt' => 'comment 9\\% %Borgia'
};
------------------------------------------
4 |%all comment|
from regex:
$VAR1 = {
'a1code' => '',
'a2boundary' => '%',
'a3cmnt' => 'all comment'
};
------------------------------------------
5 |%|
from regex:
$VAR1 = {
'a1code' => '',
'a2boundary' => '%',
'a3cmnt' => ''
};
------------------------------------------
CodePudding user response:
You can use the following:
my ($a1code, $a2boundary, $a3cmnt) =
/
^
( (?: [^\\%] | \\. )* )
(?: (%) (.*) )?
\z
/sx;
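As a minimal sketch of how this regex could be used to get three defined values per line, the snippet below wraps it in a helper and defaults the unmatched captures to the empty string with the // operator. The helper name split_line and the inline sample lines are assumptions for illustration; the regex itself is the one shown above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# split_line is a hypothetical helper name, used here only for illustration.
# It expects a line that has already been chomped.
sub split_line {
    my ($line) = @_;
    my @parts = $line =~ /
        ^
        ( (?: [^\\%] | \\. )* )   # code: plain characters or backslash escapes
        (?: (%) (.*) )?           # optional boundary and comment
        \z
    /sx
        or return;                # no match, e.g. a line ending in a lone backslash
    # Captures of the optional group are undef when there is no comment,
    # so default them to the empty string.
    return map { $_ // '' } @parts;
}

# Sample lines modelled on the question's DATA section.
for my $line ('This is 100\% text', '', 'abba 5\% %comment 9\% %Borgia',
              '%all comment', '%') {
    my ($a1code, $a2boundary, $a3cmnt) = split_line($line);
    print "|$a1code|$a2boundary|$a3cmnt|\n";
}
```

Because the three values come back as a plain list, no hash merge is needed. The sketch returns an empty list when the regex does not match at all, so a caller that may see lines ending in a single backslash would need to check for that case.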
It does not consider %
escaped in abc\\