How can I find a protein sequence from a FASTA file using perl?

Time:11-28

So I have an exercise in which I have to print the first three lines of a FASTA file as well as the protein sequence. I have tried to run a script I wrote, but Cygwin doesn't seem to print the sequence out. My code is as follows:

#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {
    if($_=~ m/^ID/) {
        print $_ ;
    }
    if($_=~ m/^AC/) {
        print $_ ;
    }
    if ($_=~ m/^SQ/) {
        print $_;
    }
    if ($_=~ m/\^s+(\w+)/) { #this is the part I have trouble with
        $a.=$1;
        $a=~s/\s//g; #this is for removing the spaces inside the sequence
        print $a;
    }

The FASTA file looks like this:

SQ   SEQUENCE   474 AA;  55345 MW;  0D9FA81230B282D9 CRC64;
     MRFTFTSRCL ALFLLLNHPT PILPAFSNQT YPTIEPKPFL YVVGRKKMMD AQYKCYDRMQ
     QLPAYQGEGP YCNRTWDGWL CWDDTPAGVL SYQFCPDYFP DFDPSEKVTK YCDEKGVWFK
     HPENNRTWSN YTMCNAFTPE KLKNAYVLYY LAIVGHSLSI FTLVISLGIF VFFRSLGCQR
     VTLHKNMFLT YILNSMIIII HLVEVVPNGE LVRRDPVSCK ILHFFHQYMM ACNYFWMLCE
     GIYLHTLIVV AVFTEKQRLR WYYLLGWGFP LVPTTIHAIT RAVYFNDNCW LSVETHLLYI
     IHGPVMAALV VNFFFLLNIV RVLVTKMRET HEAESHMYLK AVKATMILVP LLGIQFVVFP
     WRPSNKMLGK IYDYVMHSLI HFQGFFVATI YCFCNNEVQT TVKRQWAQFK IQWNQRWGRR
     PSNRSARAAA AAAEAGDIPI YICHQELRNE PANNQGEESA EIIPLNIIEQ ESSA
//

To match the sequence, I used the fact that each sequence line starts with several spaces followed only by letters. It doesn't seem to do the trick in Cygwin. Here is the link for the sequence: https://www.uniprot.org/uniprot/P30988.txt

CodePudding user response:

The problem is with this line

    if ($_=~ m/\^s+(\w+)/) { #this is the part I have trouble with

The backslash is in the wrong place in \^s. It escapes the ^, so the pattern looks for a literal caret followed by the letter s instead of whitespace at the start of the line. The line in your code should be

    if ($_=~ m/^\s+(\w+)/) { #this is the part I have trouble with
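
To see the difference between the two patterns, here is a small sketch that runs both against one sample sequence line (the sample line is shortened from the file above; the variable names are only for illustration). Note that even the corrected pattern captures only the first block of ten residues, because \w+ stops at the next space.

```perl
use strict;
use warnings;

# A shortened sequence line: leading spaces, then blocks of residues
my $line = "     MRFTFTSRCL ALFLLLNHPT PILPAFSNQT\n";

# \^ is a literal caret, so this pattern looks for "^s", "^ss", ... in the text
my $broken_matches = ($line =~ /\^s+(\w+)/) ? 1 : 0;

# ^ anchors at the start of the line, \s+ matches the leading spaces
my ($first_block) = $line =~ /^\s+(\w+)/;

print "broken pattern matched: $broken_matches\n";   # prints 0
print "first capture: $first_block\n";               # prints MRFTFTSRCL
```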

I'd write that block of code like this

    if ($_=~ m/^\s/) {
        s/\s+//g; #this is for removing the spaces inside the sequence
        print $_;
    }
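
Putting the pieces together, the whole script might look like the sketch below. It is not the only way to do this. It assumes the file follows the layout shown in the question, where the ID, AC and SQ lines start with their two-letter code and only sequence lines start with whitespace. The sub name parse_entry and passing the file name on the command line are choices made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read an entry from a filehandle and return
# (array ref of ID/AC/SQ lines, sequence as one string).
sub parse_entry {
    my ($fh) = @_;
    my @headers;
    my $sequence = '';
    while (my $line = <$fh>) {
        if ($line =~ /^(?:ID|AC|SQ)\b/) {
            # Keep the header lines exactly as they appear
            push @headers, $line;
        }
        elsif ($line =~ /^\s+[A-Za-z]/) {
            # Sequence line: strip all whitespace, including the newline
            (my $residues = $line) =~ s/\s+//g;
            $sequence .= $residues;
        }
    }
    return (\@headers, $sequence);
}

# Example invocation (file name taken from the question):
#   perl script.pl P30988.txt
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my ($headers, $sequence) = parse_entry($in);
    close($in);
    print @$headers;
    print "$sequence\n";
}
```

Checking the return value of open is worth keeping. If the file cannot be found from the directory where Cygwin runs the script, the original code fails silently and prints nothing.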