Python: replace each letter with all possible values in a dictionary item

Time:06-04

I want to make a list of strings by replacing each letter in Amino with every string in that letter's list in the following dictionary:

Amino = "mvkhdlsr"

dict = {
'f' : ['UUU', 'UUC'],
'l' : ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'],
'i' : ['AUU', 'AUC', 'AUA'],
'm' : ['AUG'],
'v' : ['GUU', 'GUC', 'GUA', 'GUG'],
's' : ['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'],
'p' : ['CCU', 'CCC', 'CCA', 'CCG'],
't' : ['ACU', 'ACC', 'ACA', 'ACG'],
'a' : ['GCU', 'GCC', 'GCA', 'GCG'],
'y' : ['UAU', 'UAC'],
'x' : ['UAA', 'UAG', 'UGA'],
'h' : ['CAU', 'CAC'],
'q' : ['CAA', 'CAG'],
'n' : ['AAU', 'AAC'],
'k' : ['AAA', 'AAG'],
'd' : ['GAU', 'GAC'],
'e' : ['GAA', 'GAG'],
'c' : ['UGU', 'UGC'],
'w' : ['UGG'],
'r' : ['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
'g' : ['GGU', 'GGC', 'GGA', 'GGG']
}

For example, if Amino is "mfy", the desired output is

AUGUUUUAU
AUGUUUUAC
AUGUUCUAU
AUGUUCUAC

since m has only one codon (AUG), f has two (UUU, UUC), and y also has two (UAU, UAC), which gives 1 × 2 × 2 = 4 combinations.
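Because the number of outputs is the product of the list lengths, it can be worth checking how many strings to expect before generating them. A small sketch; `CODONS` and `count_combinations` are hypothetical names, and the table below contains only the letters used here (with 'l' taken as the six distinct leucine codons):

```python
from math import prod

# Subset of the question's codon table: only the letters used below.
CODONS = {
    'm': ['AUG'],
    'v': ['GUU', 'GUC', 'GUA', 'GUG'],
    'k': ['AAA', 'AAG'],
    'h': ['CAU', 'CAC'],
    'd': ['GAU', 'GAC'],
    'l': ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'],
    's': ['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'],
    'r': ['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'f': ['UUU', 'UUC'],
    'y': ['UAU', 'UAC'],
}

def count_combinations(amino):
    """Number of RNA strings the amino string expands to (product of list sizes)."""
    return prod(len(CODONS[aa]) for aa in amino)

print(count_combinations("mfy"))       # 4  (1 * 2 * 2)
print(count_combinations("mvkhdlsr"))  # 6912
```

`math.prod` needs Python 3.8 or later.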

I've tried something like

for word in Amino.split():
    if word in dict:
        for key, value in dict.items():
            for i in (0,len(value) - 1):
                for idx in value:

(unfinished code) but could not figure it out.

Answer:

Try this

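from itertools import product
from operator import itemgetter

# dct is the codon dictionary from the question (named `dict` there,
# which shadows the built-in dict type)
# itemgetter gets the values where letters in Amino are keys
# product creates the Cartesian product of those lists
# "".join turns each tuple of codons into a single string
list(map("".join, product(*itemgetter(*Amino)(dct))))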

# ['AUGGUUAAACAUGAUUUAUCUCGU',
#  'AUGGUUAAACAUGAUUUAUCUCGC',
#  'AUGGUUAAACAUGAUUUAUCUCGA',
#  'AUGGUUAAACAUGAUUUAUCUCGG',
#  'AUGGUUAAACAUGAUUUAUCUAGA',
#  ...]

For Amino = "mfy", the steps are

itemgetter(*Amino)(dct)
# (['AUG'], ['UUU', 'UUC'], ['UAU', 'UAC'])

list(product(*itemgetter(*Amino)(dct)))
# [('AUG', 'UUU', 'UAU'), ('AUG', 'UUU', 'UAC'), ('AUG', 'UUC', 'UAU'), ('AUG', 'UUC', 'UAC')]

list(map("".join, product(*itemgetter(*Amino)(dct))))
# ['AUGUUUUAU', 'AUGUUUUAC', 'AUGUUCUAU', 'AUGUUCUAC']
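One edge case worth checking: when Amino has a single letter, `itemgetter('m')(dct)` returns the bare list `['AUG']` rather than a one-element tuple, so `product(*...)` unpacks the codon string itself and the result becomes `['A', 'U', 'G']`. Looking each letter up directly avoids this. A sketch, where `expand` is a hypothetical helper and `dct` is a cut-down stand-in for the question's table:

```python
from itertools import product

# Cut-down stand-in for the question's codon table.
dct = {
    'm': ['AUG'],
    'f': ['UUU', 'UUC'],
    'y': ['UAU', 'UAC'],
}

def expand(amino, table):
    # Always build a list of codon lists (one per letter),
    # so a single-letter amino string still works.
    codon_lists = [table[aa] for aa in amino]
    return ["".join(combo) for combo in product(*codon_lists)]

print(expand("mfy", dct))  # ['AUGUUUUAU', 'AUGUUUUAC', 'AUGUUCUAU', 'AUGUUCUAC']
print(expand("m", dct))    # ['AUG']
```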

Answer:

Using itertools.product would likely be faster, but this plain-Python version works as well. Here d is the codon dictionary from the question.


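def output(text):
    # Start with the codons for the first letter
    lst = list(d[text[0]])
    for char in text[1:]:
        val = d[char]
        # Extend every sequence built so far with every codon of the next letter
        lst = [seq1 + seq2 for seq1 in lst for seq2 in val]
    return lst


for seq in output('mfy'):
    print(seq)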

Output:

AUGUUUUAU
AUGUUUUAC
AUGUUCUAU
AUGUUCUAC
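The same left-to-right accumulation can also be written with `functools.reduce`. Starting from `[""]` (a single empty prefix) instead of the first letter's codons means an empty input returns `[""]` rather than raising `IndexError`. A sketch; `expand_with_reduce` is a hypothetical name and `d` is a cut-down stand-in for the question's table:

```python
from functools import reduce

# Cut-down stand-in for the question's codon table.
d = {
    'm': ['AUG'],
    'f': ['UUU', 'UUC'],
    'y': ['UAU', 'UAC'],
}

def expand_with_reduce(text):
    # Extend every prefix built so far with every codon of the next letter.
    return reduce(
        lambda prefixes, char: [p + codon for p in prefixes for codon in d[char]],
        text,
        [""],
    )

for seq in expand_with_reduce("mfy"):
    print(seq)
```

This prints the same four lines, in the same order, as the loop version.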

Answer:

One fairly intuitive way to do this is with recursion: take each codon for the first letter of the target amino string, recursively build all combinations for the rest of the string, and join the two.

You could try something like this:

Amino = "mvkhdlsr"

conversions = {
    'f' : ['UUU', 'UUC'],
    'l' : ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'],
    'i' : ['AUU', 'AUC', 'AUA'],
    'm' : ['AUG'],
    'v' : ['GUU', 'GUC', 'GUA', 'GUG'],
    's' : ['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'],
    'p' : ['CCU', 'CCC', 'CCA', 'CCG'],
    't' : ['ACU', 'ACC', 'ACA', 'ACG'],
    'a' : ['GCU', 'GCC', 'GCA', 'GCG'],
    'y' : ['UAU', 'UAC'],
    'x' : ['UAA', 'UAG', 'UGA'],
    'h' : ['CAU', 'CAC'],
    'q' : ['CAA', 'CAG'],
    'n' : ['AAU', 'AAC'],
    'k' : ['AAA', 'AAG'],
    'd' : ['GAU', 'GAC'],
    'e' : ['GAA', 'GAG'],
    'c' : ['UGU', 'UGC'],
    'w' : ['UGG'],
    'r' : ['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'g' : ['GGU', 'GGC', 'GGA', 'GGG']
}

class AminoSolver:
    def __init__(self, amino_conversions):
        self.amino_conversions = amino_conversions

    def get_amino_combinations(self, target_amino):
        # Base cases. With 0 letters we return an empty list; with 1 letter we just
        # return that letter's codons (e.g. if target_amino is 'a', we return
        # ['GCU', 'GCC', 'GCA', 'GCG']).
        if len(target_amino) == 0:
            return []
        if len(target_amino) == 1:
            return list(self.amino_conversions[target_amino])

        # Generate all possible combinations for the target amino minus the first
        # letter once, rather than once per codon of the first letter
        suffix_combinations = self.get_amino_combinations(target_amino[1:])

        new_potential_combinations = set()
        # Iterate through each possible codon of the first letter of the current substring
        for possible_codon in self.amino_conversions[target_amino[0]]:
            for previous_combination in suffix_combinations:
                # Prepend the codon for the first letter to each combination of the rest
                new_potential_combinations.add(possible_codon + previous_combination)
        return list(new_potential_combinations)
        
a = AminoSolver(conversions)
print(a.get_amino_combinations("mfy"))

(returns ['AUGUUUUAC', 'AUGUUCUAU', 'AUGUUUUAU', 'AUGUUCUAC']; the order can differ between runs because the results are collected in a set)

The recursion works as follows:

  1. Say we have a target amino of length 1. The possible combinations are just the codons for that letter (e.g. "v" gives ['GUU', 'GUC', 'GUA', 'GUG']), so we return that list.
  2. Now take an amino of length 2 (e.g. "hq"). We start with one codon for the first letter (say "CAU") and append each codon of the second letter to it, giving "CAU" + "CAA" and "CAU" + "CAG". Repeating this for every codon of the first letter gives all combinations for the length-2 amino string: "CAUCAA", "CAUCAG", "CACCAA", and "CACCAG".
  3. For an amino of length n, the process is the same: for each codon of the first letter, we find all combinations for the substring from the second letter to the end (a string of length n-1) and prepend the codon to each of them, just as in the length-2 case.

Thus, this recursion would get us all possible combinations.
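Since the number of results grows multiplicatively (6,912 strings for "mvkhdlsr"), it can help to yield sequences one at a time instead of building the whole list. A sketch of the same first-letter-plus-rest recursion written as a generator; `iter_combinations` is a hypothetical name and the table is cut down to a few letters:

```python
from typing import Dict, Iterator, List

# Cut-down stand-in for the conversions table above.
conversions: Dict[str, List[str]] = {
    'm': ['AUG'],
    'f': ['UUU', 'UUC'],
    'y': ['UAU', 'UAC'],
    'h': ['CAU', 'CAC'],
    'q': ['CAA', 'CAG'],
}

def iter_combinations(target_amino: str) -> Iterator[str]:
    # Base case: an empty string has exactly one expansion, the empty string.
    if not target_amino:
        yield ""
        return
    # Recursive case: each codon of the first letter, followed by
    # each expansion of the remaining letters.
    for codon in conversions[target_amino[0]]:
        for rest in iter_combinations(target_amino[1:]):
            yield codon + rest

print(list(iter_combinations("hq")))
# ['CAUCAA', 'CAUCAG', 'CACCAA', 'CACCAG']
```

Using `""` as the base case (rather than returning an empty list) removes the need for a separate length-1 case, and the output comes out in a fixed order because no set is involved.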
