I have a scenario in which I have a peptide frame having 9 AA. I want to generate all possible peptides by replacing a maximum of 3 AA on this frame ie by replacing only 1 or 2 or 3 AA.
The frame is CKASGFTFS and I want to see all the mutants by replacing a maximum of 3 AA from the pool of 20 AA.
we have a pool of 20 different AA (A,R,N,D,E,G,C,Q,H,I,L,K,M,F,P,S,T,W,Y,V).
I am new to coding so Can someone help me out with how to code for this in Python or Biopython.
output is supposed to be a list of unique sequences like below:
CKASGFTFT, CTTSGFTFS, CTASGKTFS, CTASAFTWS, CTRSGFTFS, CKASEFTFS ....so on so forth getting 1, 2, or 3 substitutions from the pool of AA without changing the existing frame.
CodePudding user response:
Ok so after my code finished, I worked the calculations backwards,
Case1, is 9c1 x 19 = 171
Case2, is 9c2 x 19 x 19 = 12,996
Case3, is 9c3 x 19 x 19 x 19 = 576,156
That's a total of 589,323 combinations.
Here is the code for all 3 cases, you can run them sequentially.
counter=1
# case 1 prints 171 cases
for i in range(len(original)):
for x in range(20):
temp = copy.deepcopy(original)
if temp[i] == possibilities[x]:
pass
else:
temp[i] = possibilities[x]
print(counter, temp)
counter = 1
# case 2 prints 12,996 cases
for i in range(len(original)):
for j in range(i 1,len(original)):
for x in range(len(possibilities)):
for y in range(len(possibilities)):
temp = copy.deepcopy(original)
if temp[i] == possibilities[x] or temp[j] == possibilities[y]:
pass
else:
temp[i] = possibilities[x]
temp[j] = possibilities[y]
print(counter,temp)
counter = 1
# case 3 prints 576,156 cases
for i in range(len(original)):
for j in range(i 1,len(original)):
for k in range(i 2,len(original)):
for x in range(len(possibilities)):
for y in range(len(possibilities)):
for z in range(len(possibilities)):
temp = copy.deepcopy(original)
if temp[i] == possibilities[x] or temp[j] == possibilities[y] or temp[k] == possibilities[z]:
pass
else:
temp[i] = possibilities[x]
temp[j] = possibilities[y]
temp[k] = possibilities[z]
print(counter,temp)
counter = 1
The outputs look like this,
1 ['A', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']
2 ['R', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']
3 ['N', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']
4 ['D', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']
5 ['E', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']
6 ['G', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']
7 ['Q', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']
8 ['H', 'K', 'A', 'S', 'G', 'F', 'T', 'F', 'S']
....
....
....
It takes around 10 - 20 minutes to run depending on your computer.
It will display all the combinations, skipping over changing AAs if any one is same as the original in case1 or 2 in case2 or 3 in case 3.
Currently, this code only prints them in python so it is not storage or memory intensive, just CPU intensive, it doesn't save them, saving them is another issue,
You could increase efficiency by replacing the letters with numbers cause they might take lesser space, you could even consider using something like pandas or appending to a csv file in storage.
If you want to save it let me know, I will help you out.
CodePudding user response:
Let's compute the total number of mutations that you are looking for.
Say you want to replace a single AA. Firstly, there are 9 AAs in your frame, each of which can be changed into one of 19 other AA. That's 9 * 19 = 171
If you want to change two AA, there are 9c2 = 36 combinations of AA in your frame, and 19^2 permutations of two of the pool. That gives us 36 * 19^2 = 12996
Finally, if you want to change three, there are 9c3 = 84 combinations and 19^3 permutations of three of the pool. That gives us 84 * 19^3 = 576156
Put it all together and you get 171 12996 576156 = 589323 possible mutations. Hopefully, this helps illustrate the scale of the task you are trying to accomplish!
CodePudding user response:
If an upper bound is enough, I would try to take a different approach:
Instead of computing all possible permutations, just get all possible permutations from the 20-element pool:
- for 1 replacement: 20! / (20 - 1)! = 20
- for 2 replacements: 20! / (20 - 2)! = 380
- for 3 replacements: 20! / (20 - 3)! = 6840
Then, for each permutation, compute the possible arrangements given 9 'positions'. We should respect the relative order of the elements of each tuple. For instance, if the first letter (of a 3-letter tuple) is in the first place, then the second can be between the 2nd place and eighth ([2, 8]). Therefore, for each of those arrangements, the 3rd letter can take ((index of the second letter in the tuple) - 1) positions. Finally, this would give us:
- for 3 replacements: 1 * 2 / 2 2 * 3 / 2 3 * 4 / 2 ... 7 * 8 / 2 = 84
- for 2 replacements: 1 2 ... 8 = 36
- for 1 replacement: 9
We have now 84 possible arrangements for each permutation with 3 replacements in the 20-element tool, 36 for permutations with 2 replacements and only 9 for permutations with one. Then, we would have a total of 6840 * 84 380 * 36 20 * 9 = 588420.
However, this is meant to be taken with a grain of salt since some letters from the 9-element-long string are repeated in the pool.