I have created a mutate_v1 function that generates random mutations in a DNA sequence.
import random

def mutate_v1(sequence, mutation_rate):
    dna_list = list(sequence)
    # one random draw per base in the sequence
    for i in range(len(sequence)):
        r = random.random()
        if r < mutation_rate:
            # note: the mutated site is picked at random, it is not position i
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice(list('ATCG'))
    return ''.join(dna_list)
If I apply my function to all elements of G0, I get a new generation (G1) of mutants (a list of mutated sequences).
G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
G1 = [mutate_v1(s,0.01) for s in G0]
#G1
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
How can I repeat my function up to G20 (20 generations)?
I can do it manually, like this:
G1 = [mutate_v1(s,0.01) for s in G0]
G2 = [mutate_v1(s,0.01) for s in G1]
G3 = [mutate_v1(s,0.01) for s in G2]
G4 = [mutate_v1(s,0.01) for s in G3]
G5 = [mutate_v1(s,0.01) for s in G4]
G6 = [mutate_v1(s,0.01) for s in G5]
G7 = [mutate_v1(s,0.01) for s in G6]
But I'm sure a for loop would be better. I have tried several approaches, but none of them worked.
Can someone help, please?
CodePudding user response:
Use range to iterate over the number of generations and store each generation in a list. Each generation is the result of mutating the previous one:
G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generations = [G0]
for _ in range(20):
    previous_generation = generations[-1]
    generations.append([mutate_v1(s, 0.01) for s in previous_generation])
# then you can access any generation by index
print(generations[1]) # access generation 1
print(generations[20]) # access generation 20
Output
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAT']
CodePudding user response:
Dani's answer is a nice simple solution, but I wanted to demonstrate another approach using a slightly more advanced programming technique in Python, generator functions:
def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]
Right now, mutation_generator is an infinite sequence generator, meaning that you could theoretically continue evolving your sequence indefinitely. If you want to grab 20 generations:
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
twenty_generations = [next(generation) for _ in range(20)]
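One detail worth spelling out: the first value the generator yields is g0 itself, so twenty calls to next() give you G0 through G19. If you want the list to run up to G20, you need 21 items. Here is a minimal sketch using itertools.islice from the standard library, which does the same job as the list comprehension above. The functions are repeated only so the snippet runs on its own.

```python
import random
from itertools import islice

def mutate_v1(sequence, mutation_rate):
    # Same behavior as the function in the question.
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

# islice stops the infinite generator after 21 items:
# index 0 is G0 and index 20 is G20.
generations = list(islice(mutation_generator(g0), 21))

print(len(generations))  # 21
print(generations[20])   # G20 (random, so the content varies between runs)
```

With this layout, generations[n] is generation n, which matches the indexing in the list-based answer.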
The nice thing about this generator is that we can start it back up where it left off at any point. Say you've done some analysis on the first twenty generations, and now you want to see what happens over the next hundred:
next_hundred = [next(generation) for _ in range(100)]
Now, we could've initialized a new generator, using the last generation from twenty_generations as the initial generation of the new generator, but that's not necessary, since our generation generator simply left off at 20 generations and is ready to go on mutating whenever you call next(generation).
This opens up a LOT of possibilities, including sending new mutation rate parameters, or even, if you want, entirely new mutation functions. Really, anything you want.
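To make the "sending new mutation rate parameters" idea concrete, here is one possible sketch that uses the generator's send() method. The name tunable_mutation_generator and its mutation_rate parameter are my own additions for illustration; they are not part of the code above.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Same behavior as the function in the question.
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def tunable_mutation_generator(g0, mutation_rate):
    # Hypothetical variant: the rate can be changed while the generator runs.
    g = g0.copy()
    while True:
        # With next(), the yield expression evaluates to None.
        # With send(x), it evaluates to x.
        new_rate = yield g
        if new_rate is not None:
            mutation_rate = new_rate
        g = [mutate_v1(seq, mutation_rate) for seq in g]

gen = tunable_mutation_generator(['CTGAA'] * 5, 0.0)
first = next(gen)      # G0; a generator must be advanced once before send()
unchanged = next(gen)  # rate 0.0, so nothing can mutate
after = gen.send(1.0)  # switch to rate 1.0 and get the next generation back

print(unchanged)  # ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
print(after)      # mutated at the new rate (random, varies between runs)
```

Note that send() both passes the new value in and returns the next yielded generation, so it advances the generator by one step just like next() does.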
The other benefit here is that you can run multiple generators on the same initial sequence and observe how they diverge. Note this is totally possible with the more traditional approach of using a for loop in a function, but the benefit of using the generators is that you don't have to generate an entire sequence at once; it only mutates when you tell it to (via next()). For example:
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
universe_1 = mutation_generator(g0)
universe_2 = mutation_generator(g0)
universe_3 = mutation_generator(g0)
# The first generation is always the same as g0, but this can be modified if you desire
next(universe_1)
next(universe_2)
next(universe_3)
# Compare the first mutation without having to calculate twenty generations in each 'universe' before getting back results
first_mutation_u1 = next(universe_1)
first_mutation_u2 = next(universe_2)
first_mutation_u3 = next(universe_3)
Again, you can also modify the generator function mutation_generator to accept other parameters, like custom mutation functions, or even make it possible to change the mutation rate at any time, etc.
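As a rough sketch of what accepting other parameters could look like, the variant below takes the mutation function and the rate as arguments. Both mutation_generator_with and the toy function mutate_to_a are names I made up for this example. The toy function turns every base into 'A' at rate 1.0, which makes the result easy to check.

```python
import random

def mutate_v1(sequence, mutation_rate):
    # Same behavior as the function in the question.
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator_with(g0, mutate=mutate_v1, mutation_rate=0.01):
    # Hypothetical variant: any function with the signature
    # mutate(sequence, mutation_rate) -> str can be plugged in.
    g = list(g0)
    while True:
        yield g
        g = [mutate(seq, mutation_rate) for seq in g]

def mutate_to_a(sequence, mutation_rate):
    # Toy mutation function, for illustration only:
    # each base becomes 'A' with probability mutation_rate.
    return ''.join('A' if random.random() < mutation_rate else base
                   for base in sequence)

gen = mutation_generator_with(['CTGAA'] * 5, mutate=mutate_to_a, mutation_rate=1.0)
g0_again = next(gen)  # the first value is always the starting generation
g1 = next(gen)

print(g1)  # ['AAAAA', 'AAAAA', 'AAAAA', 'AAAAA', 'AAAAA']
```

Leaving the defaults in place reproduces the behavior of the original mutation_generator, so existing calls would not need to change.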
Finally, just as a side note, using a generator makes it very easy to skip thousands of generations without needing to store more than one generation in memory:
g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)
for _ in range(10000):
    next(generation)
print(g0) # first gen
print(next(generation)) # ten thousand generations later
Output:
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
['TTGGA', 'CTTCG', 'TGTGA', 'TAACA', 'CATCG']
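If you prefer not to write the skipping loop by hand, itertools.islice can do the same thing. This is just an alternative sketch, not something the approach requires. The functions are repeated so the snippet runs on its own.

```python
import random
from itertools import islice

def mutate_v1(sequence, mutation_rate):
    # Same behavior as the function in the question.
    dna_list = list(sequence)
    for _ in range(len(sequence)):
        if random.random() < mutation_rate:
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice('ATCG')
    return ''.join(dna_list)

def mutation_generator(g0):
    g = g0.copy()
    while True:
        yield g
        g = [mutate_v1(seq, 0.01) for seq in g]

g0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
generation = mutation_generator(g0)

# islice(iterator, 10000, None) consumes and discards items 0..9999,
# so next() returns item 10000, i.e. ten thousand generations after g0.
later = next(islice(generation, 10000, None))

print(later)  # random, varies between runs
```

Only the current generation is held in memory while islice steps through the skipped ones.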
With a for loop-based approach, you would've had to either create and store all 10000 generations (wasting a lot of memory), or modify the code in Dani's answer to behave more like a generator (but without the benefits!).
Real Python has a good article on generators if you want to learn more. And of course, check out the docs as well.