I am modifying some HTML pages and want to increase the font size dynamically with a regex. In my script below, I want the '8' and '3' to turn into '9' and '4', but I get '8++' and '3++', respectively. I have the following:
#!/usr/bin/perl
use warnings;
use LWP::Simple;
my $content = "<TD><FONT STYLE=\"font-family:Verdana, Geneva, sans-serif\" SIZE=\"8\">this is just a bunch of text</FONT></TD>";
$content .= "<TD><FONT STYLE=\"font-family:Verdana, Geneva, sans-serif\" SIZE=\"3\">more text</FONT></TD>";
$content=~s/SIZE="(\d+)">/SIZE="$1++">/g;
print $content;
CodePudding user response:
I'll just skip the part about how regexps are a bad way to parse HTML, because sometimes a quick-and-dirty solution is good enough.
You can't use an operator inside a string like that. The ++ is just treated as plain text (as you found). You have to use the /e flag to indicate that the replacement should be evaluated as Perl code, and then use the appropriate expression, like:
$content =~ s/SIZE="(\d+)">/'SIZE="' . ($1 + 1) . '">'/eg;
You can't use $1++ for two reasons. First, it would do the increment after returning the value, so you'd be replacing 8 with 8 instead of 9. Second, $1 is a read-only value, and the increment would want to modify it.
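If you want to reuse this, here is a minimal sketch that wraps the same /e substitution in a sub. The name bump_font_sizes and the $step parameter are my own, not anything from the question, and it assumes the attribute is always written exactly as SIZE="n"> (uppercase, double-quoted, last attribute in the tag).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the question): add $step to every SIZE="n".
sub bump_font_sizes {
    my ($html, $step) = @_;

    # /e evaluates the replacement as a Perl expression,
    # /g repeats the substitution for every match.
    # $1 is only read here, never modified, so the read-only issue is avoided.
    $html =~ s/SIZE="(\d+)">/'SIZE="' . ($1 + $step) . '">'/eg;

    return $html;
}

print bump_font_sizes('<FONT SIZE="8">a</FONT><FONT SIZE="3">b</FONT>', 1), "\n";
```

Because the capture is \d+ and the arithmetic happens on the number, a multi-digit size such as 19 becomes 20 rather than being bumped digit by digit.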
CodePudding user response:
You should consider using an HTML parser such as HTML::TokeParser::Simple:
#!/usr/bin/perl
use strict; use warnings;
use HTML::TokeParser::Simple;
my $content = "<TD><FONT STYLE=\"font-family:Verdana, Geneva, sans-serif\" SIZE=\"8\">this is just a bunch of text</FONT></TD>";
$content .= "<TD><FONT STYLE=\"font-family:Verdana, Geneva, sans-serif\" SIZE=\"3\">more text</FONT></TD>";
my $parser = HTML::TokeParser::Simple->new( \$content );
while ( my $token = $parser->get_token ) {
    if ( $token->is_start_tag('font') ) {
        my $font_size = $token->get_attr('size');
        if ( defined $font_size ) {
            $font_size++;
            $token->set_attr(size => $font_size);
        }
    }
    print $token->rewrite_tag->as_is;
}
Output:
<td><font style="font-family:Verdana, Geneva, sans-serif" size="9">this is just
a bunch of text</font></td><td><font style="font-family:Verdana, Geneva,
sans-serif" size="4">more text</font></td>
CodePudding user response:
Use the /e modifier so that the replacement is evaluated as Perl code, e.g.
$content=~s/SIZE="(\d+)">/'SIZE="'.($1+1).'">'/ge;
CodePudding user response:
#!/usr/bin/perl -w
use strict;
sub main{
    my $c = qq{<TD><FONT STYLE="font-family:Verdana, Geneva, sans-serif" SIZE="8">this is just a bunch of text</FONT></TD>\n}
        . '<TD><FONT STYLE="font-family:Verdana, Geneva, sans-serif" SIZE="3">more text</FONT></TD>';
    $c =~ s/(SIZE=\")(\d+)(\")/$_=$2+1;"$1$_$3"/eg;
    print "$c\n";
    #<TD><FONT STYLE="font-family:Verdana, Geneva, sans-serif" SIZE="9">this is just a bunch of text</FONT></TD>
    #<TD><FONT STYLE="font-family:Verdana, Geneva, sans-serif" SIZE="4">more text</FONT></TD>
}
main();