I am trying to locate some special characters within sentences, marked as "\v{G}" or "\v{g}", in an HTML file, replace them with "Ǧ" and "ǧ" respectively, and save the corrected sentences in a new HTML file.
My regex (.*)\\v\{(\w)\}(.*)
locates the character to replace, but I cannot replace it according to its case. The resulting file contains:
This is a sentence ǧ with a upper case G.
This is a sentence ǧ with a lower case g.
instead of:
This is a sentence Ǧ with a upper case G.
This is a sentence ǧ with a lower case g.
MWE
The HTML input file contains:
This is a sentence \v{G} with a upper case G.
This is a sentence \v{g} with a lower case g.
The Perl file contains:
use strict;
use warnings;
# Define variables
my ($inputfile, $outputfile, $inputone, $inputtwo, $part1, $specialcharacter, $part2);
# Initialize variables
$inputfile = "TestFile.html";
$outputfile = 'Results.html';
# Open output file
open(my $ofh, '>:encoding(UTF-8)', "$outputfile");
# Open input file
open(my $ifh, '<:encoding(UTF-8)', "$inputfile");
# Read input file
while(<$ifh>) {
# Analyse _temp.html file to identify special characters
($part1, $specialcharacter, $part2) = ($_ =~ /(.*)\\v\{(\w)\}(.*)/);
if ($specialcharacter == "g") {
$specialcharacter = "ǧ";
}elsif ($specialcharacter == "G") {
$specialcharacter = "Ǧ";# PROBLEM
}
say $ofh "\t\t<p>$part1$specialcharacter$part2";
}
# Close input and output files
close $ifh;
close $ofh;
Answer:
As mentioned in the comments, == is the wrong operator. It compares its operands numerically, and both "g" and "G" convert to 0, so the first branch always matches. Compare non-numeric scalars with eq instead.
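To make the difference concrete, here is a minimal sketch. The helper names numeric_equal and string_equal are hypothetical and exist only for this illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub numeric_equal {
    my ($x, $y) = @_;
    no warnings 'numeric';   # suppress "Argument ... isn't numeric" for this demo
    return $x == $y;         # both strings are converted to numbers first
}

sub string_equal {
    my ($x, $y) = @_;
    return $x eq $y;         # compares the strings themselves, case-sensitively
}

print "==: ", (numeric_equal('G', 'g') ? "equal" : "different"), "\n";   # prints "==: equal"
print "eq: ", (string_equal('G', 'g')  ? "equal" : "different"), "\n";   # prints "eq: different"
```

With use warnings enabled and no local suppression, the == comparison also emits an "isn't numeric" warning, which is a useful hint that the wrong operator is in play.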
An alternative is to create a lookup table (a hash) and look up your special characters in it.
# A map between the special characters and the replacement you want in their place.
# Fill it with more if you've got them.
# Note: literal non-ASCII characters in the source require `use utf8;` at the top of the script.
my %SpecialMap = (
'g' => 'ǧ',
'G' => 'Ǧ',
);
# Read input file
while(<$ifh>) {
# loop for as long as \v{character} is found in $_
while(/\\v\{(\w)\}/) {
# Look up the character in the dictionary.
# Fallback if it's not in the map: use the character as-is instead.
# (// is the defined-or operator, available since Perl 5.10.)
my $ch = $SpecialMap{$1} // $1;
# Rebuild $_
$_ = $` . $ch . $';
}
# print the result
print $ofh $_;
}
For the input:
Both \v{g} and \v{G} in here.
This is a sentence \v{g} with a lower case g.
This is a sentence \v{H} with a upper case H which is not in the map.
This contains nothing special.
It'll produce this output:
Both ǧ and Ǧ in here.
This is a sentence ǧ with a lower case g.
This is a sentence H with a upper case H which is not in the map.
This contains nothing special.
Inspired by Polar Bear's comment, you could use s///ge to execute a mapping function instead and get the same result:
my %SpecialMap = (
'g' => 'ǧ',
'G' => 'Ǧ',
);
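sub mapfunc {
# Use the argument rather than relying on the global $1.
my ($ch) = @_;
# Fallback if the character is not in the map: use it as-is.
return $SpecialMap{$ch} // $ch;
}
# Read input file
while(<$ifh>) {
# /g: substitute every match on the line
# /e: evaluate the replacement as code, calling mapfunc($1) for each match
s/\\v\{(\w)\}/mapfunc($1)/ge;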
print $ofh $_;
}
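For trying the substitution without any input or output files, the same idea can be sketched as a self-contained script. The helper name fix_carons is hypothetical, the /r modifier assumes Perl 5.14 or later, and `!` is used as the delimiter only so that the // operator needs no escaping.

```perl
use strict;
use warnings;
use utf8;                                # the source below contains literal ǧ and Ǧ
binmode(STDOUT, ':encoding(UTF-8)');     # encode wide characters on output

# Map from the letter inside \v{...} to its caron form.
my %SpecialMap = (
    'g' => 'ǧ',
    'G' => 'Ǧ',
);

# fix_carons is a hypothetical helper name, not part of the original answer.
sub fix_carons {
    my ($line) = @_;
    # /g replaces every match, /e evaluates the replacement as code,
    # /r returns the modified copy instead of changing $line in place.
    # Letters that are not in the map are kept as-is.
    return $line =~ s!\\v\{(\w)\}!$SpecialMap{$1} // $1!ger;
}

print fix_carons('Both \v{g} and \v{G} in here.'), "\n";          # prints "Both ǧ and Ǧ in here."
print fix_carons('A sentence \v{H} that is not mapped.'), "\n";   # prints "A sentence H that is not mapped."
```

Wrapping the substitution in a function that takes and returns a string keeps it easy to test separately from the file handling in the original script.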