Perl Regex substitution - capturing and executing at same time $1

Time:01-28

Using Perl to spit out HTML text, I'd like to highlight in yellow an asterisk and leading/trailing letter if there's no space in between. Otherwise, if there are spaces around the asterisk, I want to highlight the asterisk in cyan. Here's what I'd like as far as inputs / outputs.

Input 1 : word*
Output 1: wor<span class='bcy'>d*</span>

Input 2 : *word
Output 2: <span class='bcy'>*w</span>ord

Input 3 : word * word
Output 3: word <span class='bcc'>*</span> word

I'm using the following, but I lose the leading/trailing letter and the asterisk, since $1 is only evaluated as the condition and never makes it into the output. Is there a way to do both?

$phr =~ s/(\*\S|\S\*)|\*/$1 ? '<span class=\'bcy\'>$1<\/span>' : '<span class=\'bcc\'>*<\/span>'/eg;

CodePudding user response:

With the /e modifier, the replacement side must be valid Perl code, and I am getting a little lost trying to trace through your (well-meant) quotes and escapes.

Here is a fresh take on it:

use warnings;
use strict;
use feature 'say';

my $span_beg_1 = q(<span class='bcy'>);
my $span_beg_2 = q(<span class='bcc'>);
my $span_end   = q(</span>);

while (<DATA>) {
    chomp;

    s{ (\*\S|\S\*) | \* }
     { $1 ? $span_beg_1 . $1 . $span_end : $span_beg_2 . q(*) . $span_end }egx;

    say;
}

__DATA__
word*
*word
word * word

DATA is Perl's built-in filehandle for reading, line by line, the data that appears after the __DATA__ token.

q() is the operator form of single quotes. It is very handy for cleanly quoting string literals, which may then freely contain quote characters.
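To illustrate the point about q(), here is a minimal sketch comparing it with a backslash-escaped single-quoted literal. The variable names $escaped and $clean are made up for this example.

```perl
use strict;
use warnings;

# Both literals hold exactly the same text; q() just avoids the backslashes.
my $escaped = '<span class=\'bcy\'>';
my $clean   = q(<span class='bcy'>);

print $escaped eq $clean ? "same\n" : "different\n";   # prints "same"
```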

The exact structure of constants I set up above can surely be simplified and/or improved.

This prints

wor<span class='bcy'>d*</span>
<span class='bcy'>*w</span>ord
word <span class='bcc'>*</span> word

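As for the code in the question: '<span class=\'bcy\'>$1<\/span>' won't do what is expected, because $1 isn't interpolated inside single quotes (consider 'a$1c', which stays the literal text a$1c). Double quotes do interpolate, so "a${1}c" would do it, and since a variable name that starts with a digit can only consist of digits, even plain "a$1c" works.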
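The quoting difference can be checked directly. The following is a small sketch, assuming a sample match against 'word*'; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Populate $1 with a sample capture ("d*").
'word*' =~ /(\S\*)/ or die "no match";

my $single = 'a$1c';     # single quotes: $1 is NOT interpolated
my $braced = "a${1}c";   # double quotes, braces delimit the variable name
my $bare   = "a$1c";     # also interpolates: a name starting with a digit is digits only

print "$single\n";   # prints: a$1c
print "$braced\n";   # prints: ad*c
print "$bare\n";     # prints: ad*c
```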

So it appears that if you just replace the single quotes around the HTML elements with double quotes, so that the $1 inside is interpolated, the original code works. You can then also remove the backslashes in front of the single quotes inside those elements.

Still, it is better to extract those constants anyway. That makes the code much easier to read, and easier to check that it does what you want.
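Applied to the substitution from the question, that change might look like the sketch below. The highlight() wrapper is a hypothetical name, added only so the three sample inputs can be run through the same statement.

```perl
use strict;
use warnings;

# highlight() is a hypothetical helper, not part of the original code.
sub highlight {
    my ($phr) = @_;
    # Same statement as in the question, with double quotes around the HTML
    # so that $1 is interpolated; \/ is still needed because / is the delimiter.
    $phr =~ s/(\*\S|\S\*)|\*/$1 ? "<span class='bcy'>$1<\/span>" : "<span class='bcc'>*<\/span>"/eg;
    return $phr;
}

print highlight($_), "\n" for 'word*', '*word', 'word * word';
# wor<span class='bcy'>d*</span>
# <span class='bcy'>*w</span>ord
# word <span class='bcc'>*</span> word
```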

CodePudding user response:

I find separate regexes for such problems much easier to understand and maintain, as a case distinction is necessary anyway. Depending on your use case, the [a-z] character class may need to be replaced by [A-Za-z] or \w.

The e modifier is also not needed in the following code:

$phr =~ s{([a-z])\b\*} {<span class='bcy'>$1*</span>}g;
$phr =~ s{\*\b([a-z])} {<span class='bcy'>*$1</span>}g;
$phr =~ s{\b \* \b}   { <span class='bcc'>*</span> }g;
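To try the three substitutions against the inputs from the question, they can be wrapped in a small function. This is only a sketch: highlight_steps() is a hypothetical name, and the regexes are copied unchanged from above.

```perl
use strict;
use warnings;

# highlight_steps() is a hypothetical wrapper around the three substitutions.
sub highlight_steps {
    my ($phr) = @_;
    $phr =~ s{([a-z])\b\*} {<span class='bcy'>$1*</span>}g;   # letter directly before *
    $phr =~ s{\*\b([a-z])} {<span class='bcy'>*$1</span>}g;   # letter directly after *
    $phr =~ s{\b \* \b}   { <span class='bcc'>*</span> }g;    # * with a space on each side
    return $phr;
}

print highlight_steps($_), "\n" for 'word*', '*word', 'word * word';
# wor<span class='bcy'>d*</span>
# <span class='bcy'>*w</span>ord
# word <span class='bcc'>*</span> word
```

Note that this variant is narrower than the single-regex version: it only highlights lowercase letters next to the asterisk, and the cyan case requires a word character on both sides of the surrounding spaces.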