Home > Software design >  How to print out a 1000 new strings with a single replaced element based on a 1000-symbol string?
How to print out a 1000 new strings with a single replaced element based on a 1000-symbol string?

Time:12-27

There will be a bit of molecular biology here.

So I need to generate 1000 mutant sequences bsed on a primary 1000-nucleotide sequence. Every following mutant sequence must have one random nucleotide switched to one of the same class (A to G and vice-versa; T to C and vice-versa) compared to the preceding sequence. Also, random.randint and random.seed(1) must be used.

Here's what I have so far:

import random
# below is the initial sequence
seq = 'CGCCTGTAATCCCAGCACTCTGGGAGGCAGAGGTGGGCCGATCACTTGAGGTCAGGAGTTCGAGACCAGCCTGGGCAACATGGTGAAACACCATCTCTACTAAAAACACAAAAATTAGCCAGGTGTGGTGGCAGGCACCTGCAGTCCCAGCTACTCCGGAGGCTGAGGCAGGAGAATTGCTCGAACCTGGGAGGCAGGGGTTGCAGTGAGCCGACATGGCGCCACTGCACTCCAGTCTGGGCGACAGAGTGAGACCCTATCTCAAAAAAAAAAAAAAAAAAAAAAGACCCAACTCAAGTATCATCTCCAGGAAGCCTTCCCCTACTCCCAGCAATTAAATGCTCCTCAGAGAATTCCCATTTTTGGTTTACTCTTTGGTTTACCTCCAGACAGGAAGCCCCCACTGACACTGTTGTAGTCCCAGGGTGCAACACAAAGCAGAGATCACAAGCTGAGTTTAATAATTGCTTGTGGAATACATGTCCCAAGCCACCTCCTGCAGGAAGCCCTTCCAGATGCCCATTCTAGCCAGTCTGGCTCTTTGCTTCCATACCTTCACAACACTTGTGCCTCCCCCAGGGCCTCTTTCTCATCTTGCTTTCTGGGGCAGCTGTGTGCACATTTGTCTGTGTGCAGCAACTCTCTAAGGCAGGGATTTTTACTCCTATTTTTGATGAGGGGAGCTGTGGCTCAGAGAGGTTGAATAACCTAAGGCCACACAGTGAGTGGCAGAGCCAGGAATGTGACTTGGGTCCATTTGAATCCAAAGTCCCTGTACTTTCCACTGCCCTACCTAGATGTCCCTGTACCTCCTATAAAATCAGCATGGAGCCTGGTGCCTGGTAGTCCCTACAAATATTCACAAATTGGAGCTTAGCTCAGCTCTCAGGCAAGGCCCAGGTCAAAAGGGCAGATACAGCTTTGGGACCTTAGTTGCCACCACATGCCATACCTTCTTCCCAGCAGAAGGACTCCCTCCAAGACAGGGTAGGGGTGGAGG'
n = 0
while n <= 1000: # setting up a cycle for 1000 mutations
    i = random.randint(0, 1001) # choosing  random nucleotide to switch
    if seq[i] == 'A':
        print(seq.replace('A', 'G', 1)) # the third argumunt is supposed to show how many times a nucleotide must be switched but it does't work for some reason
    elif seq[i] == 'G':
        print(seq.replace('G', 'A', 1))
    elif seq[i] == 'C':
        print(seq.replace('C', 'T', 1))
    elif seq[i] == 'T':
        print(seq.replace('T', 'C', 1))
    n = n   1

The main problems I encountered is getting the program to generate new mutations based on the previous sequence, not the original one and only substituting one nucleotide.

CodePudding user response:

You need to update the sequence of nucleotides in the loop. Strings cannot be changed, so I'd recommend just using a list of letters to start:

import random

# Since string cannot be changed/mutated, 
# break up sequence into a list of strings
seq = list('CGCCTGTAATCCCAGCACTCTGGGAGGCAGAGGTGGGCCGATCACTTGAGGTCAGGAGTTCGAGACCAGCCTGGGCAACATGGTGAAACACCATCTCTACTAAAAACACAAAAATTAGCCAGGTGTGGTGGCAGGCACCTGCAGTCCCAGCTACTCCGGAGGCTGAGGCAGGAGAATTGCTCGAACCTGGGAGGCAGGGGTTGCAGTGAGCCGACATGGCGCCACTGCACTCCAGTCTGGGCGACAGAGTGAGACCCTATCTCAAAAAAAAAAAAAAAAAAAAAAGACCCAACTCAAGTATCATCTCCAGGAAGCCTTCCCCTACTCCCAGCAATTAAATGCTCCTCAGAGAATTCCCATTTTTGGTTTACTCTTTGGTTTACCTCCAGACAGGAAGCCCCCACTGACACTGTTGTAGTCCCAGGGTGCAACACAAAGCAGAGATCACAAGCTGAGTTTAATAATTGCTTGTGGAATACATGTCCCAAGCCACCTCCTGCAGGAAGCCCTTCCAGATGCCCATTCTAGCCAGTCTGGCTCTTTGCTTCCATACCTTCACAACACTTGTGCCTCCCCCAGGGCCTCTTTCTCATCTTGCTTTCTGGGGCAGCTGTGTGCACATTTGTCTGTGTGCAGCAACTCTCTAAGGCAGGGATTTTTACTCCTATTTTTGATGAGGGGAGCTGTGGCTCAGAGAGGTTGAATAACCTAAGGCCACACAGTGAGTGGCAGAGCCAGGAATGTGACTTGGGTCCATTTGAATCCAAAGTCCCTGTACTTTCCACTGCCCTACCTAGATGTCCCTGTACCTCCTATAAAATCAGCATGGAGCCTGGTGCCTGGTAGTCCCTACAAATATTCACAAATTGGAGCTTAGCTCAGCTCTCAGGCAAGGCCCAGGTCAAAAGGGCAGATACAGCTTTGGGACCTTAGTTGCCACCACATGCCATACCTTCTTCCCAGCAGAAGGACTCCCTCCAAGACAGGGTAGGGGTGGAGG')

for n in range(1000):  # setting up a cycle for 1000 mutations
    i = random.randint(0, len(seq)-1)  # choosing  random nucleotide to switch

    print(i)
    print(seq[i])  # i-th nucleotide before the mutation

    if seq[i] == 'A':
        seq[i] = 'G'
    elif seq[i] == 'G':
        seq[i] = 'A'
    elif seq[i] == 'C':
        seq[i] = 'T'
    elif seq[i] == 'T':
        seq[i] = 'C'

    print(seq[i]). # i-th nucleotide after the mutation

    print(''.join(seq))  # join nucleotides into a string for printing

CodePudding user response:

random.randrange() is more appropriate to choose a random index in a fixed list, and str.maketrans() and str.translate() are a faster and more straightforward way to translate one letter to another.

Make sure to replace the character in the previous iteration each time. Strings are immutable, so use a mutable list:

import random

xlat = str.maketrans('AGCT','GATC')
#seq = list('CGCCTGTAATCCCAGCACTCTGGGAGGCAGAGGTGGGCCGATCACTTGAGGTCAGGAGTTCGAGACCAGCCTGGGCAACATGGTGAAACACCATCTCTACTAAAAACACAAAAATTAGCCAGGTGTGGTGGCAGGCACCTGCAGTCCCAGCTACTCCGGAGGCTGAGGCAGGAGAATTGCTCGAACCTGGGAGGCAGGGGTTGCAGTGAGCCGACATGGCGCCACTGCACTCCAGTCTGGGCGACAGAGTGAGACCCTATCTCAAAAAAAAAAAAAAAAAAAAAAGACCCAACTCAAGTATCATCTCCAGGAAGCCTTCCCCTACTCCCAGCAATTAAATGCTCCTCAGAGAATTCCCATTTTTGGTTTACTCTTTGGTTTACCTCCAGACAGGAAGCCCCCACTGACACTGTTGTAGTCCCAGGGTGCAACACAAAGCAGAGATCACAAGCTGAGTTTAATAATTGCTTGTGGAATACATGTCCCAAGCCACCTCCTGCAGGAAGCCCTTCCAGATGCCCATTCTAGCCAGTCTGGCTCTTTGCTTCCATACCTTCACAACACTTGTGCCTCCCCCAGGGCCTCTTTCTCATCTTGCTTTCTGGGGCAGCTGTGTGCACATTTGTCTGTGTGCAGCAACTCTCTAAGGCAGGGATTTTTACTCCTATTTTTGATGAGGGGAGCTGTGGCTCAGAGAGGTTGAATAACCTAAGGCCACACAGTGAGTGGCAGAGCCAGGAATGTGACTTGGGTCCATTTGAATCCAAAGTCCCTGTACTTTCCACTGCCCTACCTAGATGTCCCTGTACCTCCTATAAAATCAGCATGGAGCCTGGTGCCTGGTAGTCCCTACAAATATTCACAAATTGGAGCTTAGCTCAGCTCTCAGGCAAGGCCCAGGTCAAAAGGGCAGATACAGCTTTGGGACCTTAGTTGCCACCACATGCCATACCTTCTTCCCAGCAGAAGGACTCCCTCCAAGACAGGGTAGGGGTGGAGG')
#mutations = 1000
seq = list('CGCCTGTAAT')
mutations = 5

random.seed(1)
for _ in range(mutations):
    i = random.randrange(len(seq))   # equivalent to random.randint(0,len(seq)-1)
    seq[i] = seq[i].translate(xlat)  # translate and replace at index
    print(''.join(seq))

Output:

CGTCTGTAAT
CGTCTGTAAC
CATCTGTAAC
CATCCGTAAC
CGTCCGTAAC
  • Related