I have a list of sequences (simplified here to the following):
seqList=["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
and I want to use a for loop to replace every character that is not one of ["A","C","G","T"] with "N".
My code so far:
seqList=["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
for x in range(len(seqList)):
    for i in range(len(seqList[x])):
        if seqList[x][i] not in ["A","C","G","T"]:
            seqList[x][i].replace(seqList[x][i],"N")
print(seqList)
The problem is that the nucleotides are not replaced: the original sequences stay unchanged, and I can't figure out why.
CodePudding user response:
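Strings in Python are immutable: str.replace returns a new string and leaves the original untouched, and your code discards that return value. You can make it work by editing a list of characters and joining it back into a string: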
seqList = ["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
for x in range(len(seqList)):
    stringl = list(seqList[x])  # mutable list of the string's characters
    for i in range(len(seqList[x])):
        if seqList[x][i] not in ["A","C","G","T"]:
            stringl[i] = "N"
    seqList[x] = "".join(stringl)  # store the corrected string back in the list
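To see why the original loop had no effect, here is a minimal sketch (the variable names are only for illustration, they are not from the question): str.replace hands back a new string, so the result has to be assigned somewhere to be kept.

```python
seq = "ACCTGCCSSSTTTCCT"

# str.replace returns a new string; seq itself is not modified.
result = seq.replace("S", "N")
unchanged = seq

# Assigning the result back is what actually updates the list entry.
seq_list = [seq]
seq_list[0] = seq_list[0].replace("S", "N")

print(unchanged)    # ACCTGCCSSSTTTCCT
print(seq_list[0])  # ACCTGCCNNNTTTCCT
```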
CodePudding user response:
An approach that avoids an explicit loop over every position is to replace each distinct letter that is not one of ACGT:
def replace_bad(seq):
    # Distinct letters in seq that are not valid nucleotides
    unique = [
        letter
        for letter in set(seq)
        if letter not in "ACGT"
    ]
    for each in unique:
        seq = seq.replace(each, "N")
    return seq

if __name__ == '__main__':
    seqList = ["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"]
    for seq in seqList:
        print(replace_bad(seq))
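For comparison, the same replacement can probably be done in one call with a regular expression. This is only a sketch of an alternative, not what the answer above uses; the helper name mask_non_acgt is made up for illustration, and the match is case-sensitive, so lowercase letters would also become "N".

```python
import re


def mask_non_acgt(seq):
    """Replace every character other than A, C, G, or T with N."""
    # [^ACGT] matches any single character that is not A, C, G, or T.
    return re.sub(r"[^ACGT]", "N", seq)


seq_list = ["ACCTGCCSSSTTTCCT", "ACCTGCCFFFTTTCCT"]
masked = [mask_non_acgt(s) for s in seq_list]
print(masked)  # ['ACCTGCCNNNTTTCCT', 'ACCTGCCNNNTTTCCT']
```

The list comprehension builds a new list and leaves seq_list as it was; assign the result back to the same name if the original list should be replaced.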