I have some .html files in a directory to which I want to add one line of CSS code. Using Perl, I can locate the position with a regex and add the CSS code, and this works very well.
However, my first .html file contains an accented letter, é, and the resulting .html file has an encoding problem: it prints \xE9 instead.
In the Perl file, I have been careful to specify UTF-8 encoding when opening the files, as shown in the MWE below, but that does not solve the problem. How can I solve this encoding error?
MWE
use strict;
use warnings;
use File::Spec::Functions qw/ splitdir rel2abs /; # To get the current directory name
# Define variables
my ($outputfile, $dir);
# Initialize variables
$dir = '.';
# Open current directory
opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
# Scan all files in directory
while (my $inputfile = readdir($dh)) {
    # Process only regular files whose names end in _not_centered.html
    next unless (-f "$dir/$inputfile");
    next unless ($inputfile =~ m/_not_centered\.html$/);
    # Name output file based on input file
    $outputfile = $inputfile;
    $outputfile =~ s/_not_centered//;
    # Open input file
    open(my $ifh, '<:encoding(UTF-8)', "$dir/$inputfile")
        or die "Cannot open $inputfile: $!";
    # Open output file (only after filtering, so unrelated files are not truncated)
    open(my $ofh, '>:encoding(UTF-8)', "$dir/$outputfile")
        or die "Cannot open $outputfile: $!";
    # Read input file
    while (<$ifh>) {
        # Add the centering style to the <h2> tag and keep the rest of the line
        s/<h2/<h2 style="text-align: center;"/;
        print $ofh $_;
    }
    # Close input and output files
    close $ifh;
    close $ofh;
}
# Close directory
closedir($dh);
Problematic file named "Chapter_001_not_centered.html"
<html >
<head></head>
<body>
<h2 ><span >Chapter 1</span><br /><a id="x1-10001"></a>Brocéliande</h2>
Brocéliande
</body></html>
CodePudding user response:
The following demo script performs the required injection, using the glob function to select the files.
Note: the script writes a new file. Uncomment the rename call to replace the original file with the new one.
use strict;
use warnings;
# Default I/O layer for every open() in this file: read and write Latin-1
use open ":encoding(Latin1)";
my $dir = '.';
process($_) for glob("$dir/*_not_centered.html");
sub process {
    my $fname_in  = shift;
    my $fname_new = $fname_in . '.new';
    open my $in, '<', $fname_in
        or die "Couldn't open $fname_in: $!";
    open my $out, '>', $fname_new
        or die "Couldn't open $fname_new: $!";
    while( <$in> ) {
        s/<h2/<h2 style="text-align: center;"/;
        print $out $_;
    }
    close $in;
    close $out;
    # rename $fname_new, $fname_in
    #     or die "Couldn't rename $fname_new to $fname_in: $!";
}
If you do not mind running the script on one file at a time, as script.pl in_file > out_file, the following is enough:
use strict;
use warnings;
while (<>) {
    s/<h2/<h2 style="text-align: center;"/;
    print;
}
If the task only comes up occasionally, a one-liner is enough:
perl -pe "s/<h2/<h2 style='text-align: center;'/" in_file
CodePudding user response:
This question was answered in the comments by @Shawn and @sticky bit:
Changing the encoding used to open the files from UTF-8 to ISO 8859-1 solves the problem, which suggests the .html files are actually stored as ISO 8859-1 rather than UTF-8. If one of you wants to post the answer, I will accept it.
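If some files in the directory turn out to be UTF-8 and others ISO 8859-1, one possible approach is to read the raw bytes, try a strict UTF-8 decode first, and fall back to ISO 8859-1 when that fails. The sketch below is only an illustration under that assumption: decode_guess and center_h2 are hypothetical helper names, not part of the scripts above, and the fallback is a guess that is only reasonable when ISO 8859-1 is the sole alternative encoding. It works on in-memory strings so that the two byte representations of é are visible.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode raw bytes as UTF-8 when they are valid UTF-8,
# and fall back to ISO 8859-1 otherwise (an assumption about the input files).
sub decode_guess {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $text ? $text : decode('ISO-8859-1', $bytes);
}

# Hypothetical helper: add the centering style to the first <h2 on a line.
sub center_h2 {
    my ($line) = @_;
    $line =~ s/<h2/<h2 style="text-align: center;"/;
    return $line;
}

# The same heading stored as ISO 8859-1 (one byte for é) and as UTF-8 (two bytes).
my $latin1_bytes = "<h2 >Broc\xE9liande</h2>\n";
my $utf8_bytes   = "<h2 >Broc\xC3\xA9liande</h2>\n";

for my $bytes ($latin1_bytes, $utf8_bytes) {
    my $line = center_h2(decode_guess($bytes));
    print encode('UTF-8', $line);    # always write UTF-8 bytes
}
```

With real files, the same idea would mean opening the input with the ':raw' layer and the output with ':encoding(UTF-8)'. Note that writing UTF-8 changes the encoding of the output files, so any charset declared in the HTML would need to match.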