Perl regex not case-sensitive when it should be

Time:06-01

I am trying to find special characters marked as "\v{G}" or "\v{g}" within sentences in an HTML file, replace them with "Ǧ" and "ǧ" respectively, and save the corrected sentences in a new HTML file.

My regex (.*)\\v\{(\w)\}(.*) locates the character to replace, but the replacement ignores the character's case. The resulting file contains:

This is a sentence ǧ with a upper case G.
This is a sentence ǧ with a lower case g. 

instead of:

This is a sentence Ǧ with a upper case G.
This is a sentence ǧ with a lower case g.

MWE

The HTML input file contains:

This is a sentence \v{G} with a upper case G.
This is a sentence \v{g} with a lower case g.

The perl file contains:

use strict;
use warnings;
use feature 'say';    # needed for say()

# Define variables
my ($inputfile, $outputfile, $part1, $specialcharacter, $part2);

# Initialize variables
$inputfile = "TestFile.html";
$outputfile = 'Results.html';

# Open output file
open(my $ofh, '>:encoding(UTF-8)', $outputfile) or die "Cannot open $outputfile: $!";

# Open input file
open(my $ifh, '<:encoding(UTF-8)', $inputfile) or die "Cannot open $inputfile: $!";

# Read input file
while(<$ifh>) {
    # Split each line into: text before, the letter inside \v{...}, text after
    ($part1, $specialcharacter, $part2) = ($_ =~ /(.*)\\v\{(\w)\}(.*)/);
    if ($specialcharacter == "g") {
        $specialcharacter = "&#487;";
    } elsif ($specialcharacter == "G") {
        $specialcharacter = "&#486;"; # PROBLEM
    }
    say $ofh "\t\t<p>$part1$specialcharacter$part2";
}

# Close input and output files
close $ifh;
close $ofh;

Answer:

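As mentioned in the comments, == is the wrong operator here. It compares numerically, so both "g" and "G" are converted to the number 0 (use warnings reports this as 'Argument "G" isn't numeric in numeric eq (==)'), and the first test, $specialcharacter == "g", is true for either letter. Compare non-numeric scalars with eq instead.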
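A minimal standalone sketch of the difference, with illustrative variable names: under ==, both strings become 0, so "G" == "g" is 0 == 0 and therefore true, while eq compares the actual characters.

```perl
use strict;
use warnings;

# Silence the "isn't numeric" warnings for this demo only.
no warnings 'numeric';

my $letter = 'G';

# Numeric comparison: both sides numify to 0, so this is true.
my $numeric_match = ($letter == 'g') ? 1 : 0;

# String comparison: 'G' and 'g' are different strings, so this is false.
my $string_match = ($letter eq 'g') ? 1 : 0;

print "== says: $numeric_match\n";   # prints "== says: 1"
print "eq says: $string_match\n";    # prints "eq says: 0"
```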

An alternative is to build a lookup table (a hash) and look up each special character in it.

# A map between the special characters and the html code you want in its place.
# Fill it with more if you've got them.
my %SpecialMap = (
    'g' => '&#487;',
    'G' => '&#486;',
);

# Read input file
while(<$ifh>) {
    # loop for as long as \v{character} is found in $_
    while(/\\v\{(\w)\}/) {
        # Look up the character in the dictionary.
        # Fallback if it's not in the map: Use the character as-is instead.
        my $ch = $SpecialMap{$1} || $1;
        # Rebuild $_
        $_ = $` . $ch . $';
    }
    # print the result
    print $ofh $_;
}

For the input

Both \v{g} and \v{G} in here.
This is a sentence \v{g} with a lower case g.
This is a sentence \v{H} with a upper case H which is not in the map.
This contains nothing special.

It'll produce this output:

Both &#487; and &#486; in here.
This is a sentence &#487; with a lower case g.
This is a sentence H with a upper case H which is not in the map.
This contains nothing special.
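To check the lookup-table loop without creating any files, it can be wrapped in a small function and run on the sample lines. This is a minimal sketch: replace_specials is a hypothetical helper name, and the lines come from an array instead of being read from $ifh.

```perl
use strict;
use warnings;

# Lookup table copied from the answer; extend as needed.
my %SpecialMap = (
    'g' => '&#487;',
    'G' => '&#486;',
);

# Hypothetical helper: applies the answer's while-loop to one line
# and returns the rewritten line.
sub replace_specials {
    my ($line) = @_;
    # Keep going while a \v{X} marker remains in the line.
    while ($line =~ /\\v\{(\w)\}/) {
        # Letters missing from the map fall back to the letter itself.
        my $ch = $SpecialMap{$1} || $1;
        # Rebuild the line from the text before and after the match.
        $line = $` . $ch . $';
    }
    return $line;
}

# Sample lines (single quotes keep the backslash in \v literal).
my @input = (
    'Both \v{g} and \v{G} in here.',
    'This is a sentence \v{H} with a upper case H which is not in the map.',
    'This contains nothing special.',
);

for my $line (@input) {
    my $fixed = replace_specials($line);
    print "$fixed\n";
}
```

For these three lines it prints the same text as the corresponding lines of the output shown above.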

Inspired by Polar Bear's comment, you could use s///ge to execute a mapping function instead and get the same result:

my %SpecialMap = (
    'g' => '&#487;',
    'G' => '&#486;',
);

sub mapfunc {
    # Use the argument passed in rather than relying on the global $1
    my ($ch) = @_;
    return $SpecialMap{$ch} || $ch;
}

# Read input file
while(<$ifh>) {
    # /g substitute all matches on the line
    # /e by executing mapfunc($1) for each
    s/\\v\{(\w)\}/mapfunc($1)/ge;
    print $ofh $_;
}
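Since the question asks for the characters "Ǧ" and "ǧ" themselves, the same s///ge idea can map to Unicode code points instead of HTML entities. A minimal sketch, assuming the output is written as UTF-8: U+01E6 is Ǧ and U+01E7 is ǧ (decimal 486 and 487, the same numbers as in &#486; and &#487;). %CaronMap and with_caron are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical map from the letter inside \v{...} to the
# corresponding letter with caron: U+01E6 and U+01E7.
my %CaronMap = (
    'G' => "\x{1E6}",
    'g' => "\x{1E7}",
);

# Hypothetical helper returning a copy of $text with every
# \v{X} marker replaced.
sub with_caron {
    my ($text) = @_;
    # /g: replace every marker on the line;
    # /e: evaluate the replacement side as Perl code.
    # Letters missing from the map are kept as-is.
    $text =~ s/\\v\{(\w)\}/exists $CaronMap{$1} ? $CaronMap{$1} : $1/ge;
    return $text;
}

# Encode STDOUT as UTF-8 so the accented letters print correctly.
binmode STDOUT, ':encoding(UTF-8)';

my $line = with_caron('This is a sentence \v{G} with a upper case G.');
print "$line\n";
```

In the question's script, $ofh is already opened with '>:encoding(UTF-8)', so calling print $ofh with_caron($_) inside the read loop would write the characters directly.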