How to replace characters at specific positions from a file list?-CodePudding

I have a file containing a sequence:

    >sequence
TAGGACTGAGGGCTGGACAGGGCTGCGGGAG

and another one containing numbers refering to positions:

3
6
11

I would like to get a new file containing 'N' instead of A,C,G,T at the positions defined in the second file such as:

    >sequence
TANGANTGAGNGCTGGACAGGGCTGCGGGAG

Is there a way to do it using bash awk/sed or should I use a python script with SeqIO from biopython?

EDIT:

Here is a start for python script:

from Bio import SeqIO
import sys
import string
unput1=raw_input("enter sequence:")
unput2=raw_input("enter position file:")
fasta_file=unput1
position_file=unput2
result_file="outfile.fasta"
nb_list=list()
with open(position_file) as f:
    for line in f:
        line=line.strip()
        headerline = line.split()
        position=headerline[0]
        position_list.add(position)
for record in SeqIO.parse(StringIO(data), "fasta"):
    if record.id in nb_list:
        seq_record[position_list]="N"
        SeqIO.write([seq_record], f, "fasta")

CodePudding user response：

Using awk with empty FS. This may not work with every awk version or with arbitrarily long sequences:

$ awk 'BEGIN {
    FS=OFS=""               # process each char as an individual field
}
NR==FNR {                   # process the numbers file
    a[$0]                   # hash numbers to a hash
    next
}
/^[ACGT]/ {                 # process sequence file
    for(i=1;i<=NF;i  )      # itetate every field
        if(i in a)          # if i found in a
            $i="N"          # replace char with N
}1' no-file seq-file

Output:

    >sequence
TANGANTGAGNGCTGGACAGGGCTGCGGGAG

CodePudding user response：

Using POSIX awk and substr() to address string indexes:

awk '
FNR==NR {a[c  ] = $0}
FNR!=NR && !/^[[:space:]]*[;>]|^[[:space:]]*$/ {
        for (i in a) {
            n=a[i]
            $0 = substr($0, 1, n-1) "N" substr($0, n 1)
        }

}
FNR!=NR' indexes.txt sequence.fasta