There will be a bit of molecular biology here.
So I need to generate 1000 mutant sequences bsed on a primary 1000-nucleotide sequence. Every following mutant sequence must have one random nucleotide switched to one of the same class (A to G and vice-versa; T to C and vice-versa) compared to the preceding sequence. Also, random.randint
and random.seed(1)
must be used.
Here's what I have so far:
import random
# below is the initial sequence
seq = 'CGCCTGTAATCCCAGCACTCTGGGAGGCAGAGGTGGGCCGATCACTTGAGGTCAGGAGTTCGAGACCAGCCTGGGCAACATGGTGAAACACCATCTCTACTAAAAACACAAAAATTAGCCAGGTGTGGTGGCAGGCACCTGCAGTCCCAGCTACTCCGGAGGCTGAGGCAGGAGAATTGCTCGAACCTGGGAGGCAGGGGTTGCAGTGAGCCGACATGGCGCCACTGCACTCCAGTCTGGGCGACAGAGTGAGACCCTATCTCAAAAAAAAAAAAAAAAAAAAAAGACCCAACTCAAGTATCATCTCCAGGAAGCCTTCCCCTACTCCCAGCAATTAAATGCTCCTCAGAGAATTCCCATTTTTGGTTTACTCTTTGGTTTACCTCCAGACAGGAAGCCCCCACTGACACTGTTGTAGTCCCAGGGTGCAACACAAAGCAGAGATCACAAGCTGAGTTTAATAATTGCTTGTGGAATACATGTCCCAAGCCACCTCCTGCAGGAAGCCCTTCCAGATGCCCATTCTAGCCAGTCTGGCTCTTTGCTTCCATACCTTCACAACACTTGTGCCTCCCCCAGGGCCTCTTTCTCATCTTGCTTTCTGGGGCAGCTGTGTGCACATTTGTCTGTGTGCAGCAACTCTCTAAGGCAGGGATTTTTACTCCTATTTTTGATGAGGGGAGCTGTGGCTCAGAGAGGTTGAATAACCTAAGGCCACACAGTGAGTGGCAGAGCCAGGAATGTGACTTGGGTCCATTTGAATCCAAAGTCCCTGTACTTTCCACTGCCCTACCTAGATGTCCCTGTACCTCCTATAAAATCAGCATGGAGCCTGGTGCCTGGTAGTCCCTACAAATATTCACAAATTGGAGCTTAGCTCAGCTCTCAGGCAAGGCCCAGGTCAAAAGGGCAGATACAGCTTTGGGACCTTAGTTGCCACCACATGCCATACCTTCTTCCCAGCAGAAGGACTCCCTCCAAGACAGGGTAGGGGTGGAGG'
n = 0
while n <= 1000: # setting up a cycle for 1000 mutations
i = random.randint(0, 1001) # choosing random nucleotide to switch
if seq[i] == 'A':
print(seq.replace('A', 'G', 1)) # the third argumunt is supposed to show how many times a nucleotide must be switched but it does't work for some reason
elif seq[i] == 'G':
print(seq.replace('G', 'A', 1))
elif seq[i] == 'C':
print(seq.replace('C', 'T', 1))
elif seq[i] == 'T':
print(seq.replace('T', 'C', 1))
n = n 1
The main problems I encountered is getting the program to generate new mutations based on the previous sequence, not the original one and only substituting one nucleotide.
CodePudding user response:
You need to update the sequence of nucleotides in the loop. Strings cannot be changed, so I'd recommend just using a list of letters to start:
import random
# Since string cannot be changed/mutated,
# break up sequence into a list of strings
seq = list('CGCCTGTAATCCCAGCACTCTGGGAGGCAGAGGTGGGCCGATCACTTGAGGTCAGGAGTTCGAGACCAGCCTGGGCAACATGGTGAAACACCATCTCTACTAAAAACACAAAAATTAGCCAGGTGTGGTGGCAGGCACCTGCAGTCCCAGCTACTCCGGAGGCTGAGGCAGGAGAATTGCTCGAACCTGGGAGGCAGGGGTTGCAGTGAGCCGACATGGCGCCACTGCACTCCAGTCTGGGCGACAGAGTGAGACCCTATCTCAAAAAAAAAAAAAAAAAAAAAAGACCCAACTCAAGTATCATCTCCAGGAAGCCTTCCCCTACTCCCAGCAATTAAATGCTCCTCAGAGAATTCCCATTTTTGGTTTACTCTTTGGTTTACCTCCAGACAGGAAGCCCCCACTGACACTGTTGTAGTCCCAGGGTGCAACACAAAGCAGAGATCACAAGCTGAGTTTAATAATTGCTTGTGGAATACATGTCCCAAGCCACCTCCTGCAGGAAGCCCTTCCAGATGCCCATTCTAGCCAGTCTGGCTCTTTGCTTCCATACCTTCACAACACTTGTGCCTCCCCCAGGGCCTCTTTCTCATCTTGCTTTCTGGGGCAGCTGTGTGCACATTTGTCTGTGTGCAGCAACTCTCTAAGGCAGGGATTTTTACTCCTATTTTTGATGAGGGGAGCTGTGGCTCAGAGAGGTTGAATAACCTAAGGCCACACAGTGAGTGGCAGAGCCAGGAATGTGACTTGGGTCCATTTGAATCCAAAGTCCCTGTACTTTCCACTGCCCTACCTAGATGTCCCTGTACCTCCTATAAAATCAGCATGGAGCCTGGTGCCTGGTAGTCCCTACAAATATTCACAAATTGGAGCTTAGCTCAGCTCTCAGGCAAGGCCCAGGTCAAAAGGGCAGATACAGCTTTGGGACCTTAGTTGCCACCACATGCCATACCTTCTTCCCAGCAGAAGGACTCCCTCCAAGACAGGGTAGGGGTGGAGG')
for n in range(1000): # setting up a cycle for 1000 mutations
i = random.randint(0, len(seq)-1) # choosing random nucleotide to switch
print(i)
print(seq[i]) # i-th nucleotide before the mutation
if seq[i] == 'A':
seq[i] = 'G'
elif seq[i] == 'G':
seq[i] = 'A'
elif seq[i] == 'C':
seq[i] = 'T'
elif seq[i] == 'T':
seq[i] = 'C'
print(seq[i]). # i-th nucleotide after the mutation
print(''.join(seq)) # join nucleotides into a string for printing
CodePudding user response:
random.randrange()
is more appropriate to choose a random index in a fixed list, and str.maketrans()
and str.translate()
are a faster and more straightforward way to translate one letter to another.
Make sure to replace the character in the previous iteration each time. Strings are immutable, so use a mutable list:
import random
xlat = str.maketrans('AGCT','GATC')
#seq = list('CGCCTGTAATCCCAGCACTCTGGGAGGCAGAGGTGGGCCGATCACTTGAGGTCAGGAGTTCGAGACCAGCCTGGGCAACATGGTGAAACACCATCTCTACTAAAAACACAAAAATTAGCCAGGTGTGGTGGCAGGCACCTGCAGTCCCAGCTACTCCGGAGGCTGAGGCAGGAGAATTGCTCGAACCTGGGAGGCAGGGGTTGCAGTGAGCCGACATGGCGCCACTGCACTCCAGTCTGGGCGACAGAGTGAGACCCTATCTCAAAAAAAAAAAAAAAAAAAAAAGACCCAACTCAAGTATCATCTCCAGGAAGCCTTCCCCTACTCCCAGCAATTAAATGCTCCTCAGAGAATTCCCATTTTTGGTTTACTCTTTGGTTTACCTCCAGACAGGAAGCCCCCACTGACACTGTTGTAGTCCCAGGGTGCAACACAAAGCAGAGATCACAAGCTGAGTTTAATAATTGCTTGTGGAATACATGTCCCAAGCCACCTCCTGCAGGAAGCCCTTCCAGATGCCCATTCTAGCCAGTCTGGCTCTTTGCTTCCATACCTTCACAACACTTGTGCCTCCCCCAGGGCCTCTTTCTCATCTTGCTTTCTGGGGCAGCTGTGTGCACATTTGTCTGTGTGCAGCAACTCTCTAAGGCAGGGATTTTTACTCCTATTTTTGATGAGGGGAGCTGTGGCTCAGAGAGGTTGAATAACCTAAGGCCACACAGTGAGTGGCAGAGCCAGGAATGTGACTTGGGTCCATTTGAATCCAAAGTCCCTGTACTTTCCACTGCCCTACCTAGATGTCCCTGTACCTCCTATAAAATCAGCATGGAGCCTGGTGCCTGGTAGTCCCTACAAATATTCACAAATTGGAGCTTAGCTCAGCTCTCAGGCAAGGCCCAGGTCAAAAGGGCAGATACAGCTTTGGGACCTTAGTTGCCACCACATGCCATACCTTCTTCCCAGCAGAAGGACTCCCTCCAAGACAGGGTAGGGGTGGAGG')
#mutations = 1000
seq = list('CGCCTGTAAT')
mutations = 5
random.seed(1)
for _ in range(mutations):
i = random.randrange(len(seq)) # equivalent to random.randint(0,len(seq)-1)
seq[i] = seq[i].translate(xlat) # translate and replace at index
print(''.join(seq))
Output:
CGTCTGTAAT
CGTCTGTAAC
CATCTGTAAC
CATCCGTAAC
CGTCCGTAAC