I am struggling to convert genome read qualities from a fasta.qual file (40, 39, 38, etc.) to ASCII using Phred+33 in Perl, but I can't get it to work. I am trying to do it with the s///g operator. My qualities are stored in a hash, and I am running the following loop:
foreach $key (keys %qual) {
$value = $qual{$key};
$qual{$key} =~ s/($value)/$map{$1}/g;
}
%map contains:
%map = ("0 " => "\!",
"1 " => "\"",
"2 " => "\#",
"3 " => "\$",
"4 " => "\%",
"5 " => "\&",
"6 " => "\'",
"7 " => "\(",
"8 " => "\)",
"9 " => "\*",
"10 " => "\+",
"11 " => "\,",
"12 " => "\-",
"13 " => "\.",
"14 " => "\/",
"15 " => "0",
"16 " => "1",
"17 " => "2",
"18 " => "3",
"19 " => "4",
"20 " => "5",
"21 " => "6",
"22 " => "7",
"23 " => "8",
"24 " => "9",
"25 " => "\:",
"26 " => "\;",
"27 " => "\<",
"28 " => "\=",
"29 " => "\>",
"30 " => "\?",
"31 " => "\@",
"32 " => "A",
"33 " => "B",
"34 " => "C",
"35 " => "D",
"36 " => "E",
"37 " => "F",
"38 " => "G",
"39 " => "H",
"40 " => "I",);
However, it transforms this:
>FR5ON5F01DQM9C
37 37 37 37 37 37 40 40 40 40 40 40 40 40 40 40 40 35 35 35 40 40 40 40 40 40 40 40 40 40
40 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 36 36 30 30 30 30
30 38 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37
into this:
>FR5ON5F01DQM9C
It happens for all the elements inside the hash. Is there anything I am doing wrong while applying the s/// operator?
The goal is to convert everything into a .fastq file.
Thank you!
CodePudding user response:
my $alt = join "|", map quotemeta, sort { length($b) <=> length($a) } keys %map;
my $re = qr/($alt)/;
$str =~ s/$re/$map{$1}/g;
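In retrospect, that doesn't take care of the spaces between the scores. It would make more sense to read in the quality string, split it on whitespace, and look up each score. Note that this assumes the keys of %map have no trailing space ("37" rather than "37 "), because split ' ' strips the whitespace: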
$seq = join "", map $map{$_}, split ' ', $seq;
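As for why the original loop empties every value: the pattern is built from $value, which is the entire quality string, so the regex matches the whole string at once, $1 is that whole string, and $map{$1} is undefined, leaving nothing behind. A minimal sketch of an alternative that avoids the lookup table entirely, relying on the fact that a Phred+33 character is just chr(score + 33). The sub name phred_to_ascii and the shortened sample data are made up for illustration, and the sketch assumes every token in the string is a plain integer score:

```perl
use strict;
use warnings;

# Convert a whitespace-separated list of Phred scores into a Phred+33 string.
# split ' ' skips leading whitespace and treats newlines as separators,
# so multi-line quality blocks work too.
sub phred_to_ascii {
    my ($scores) = @_;
    return join '', map { chr($_ + 33) } split ' ', $scores;
}

# Hypothetical sample data: read ID => quality scores (shortened).
my %qual = (
    'FR5ON5F01DQM9C' => "37 37 40 40 35\n36 30 38",
);

for my $id (sort keys %qual) {
    $qual{$id} = phred_to_ascii($qual{$id});
    print "$id\t$qual{$id}\n";    # prints FR5ON5F01DQM9C, a tab, then FFIIDE?G
}
```

Because the conversion is pure arithmetic, scores above 40 are handled without having to extend a hash by hand.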