Code can't find keys in dictionary when prompted with string sequence

Time:11-01

Newbie at coding, doing this for university. I have written a dictionary which translates codons into single letter amino acids. However, my function can't find the keys in the dict and just adds an X to the list I've made. See code below:

codon_table = {('TTT', 'TTC'): 'F',
               ('TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'): 'L',
               ('ATT', 'ATC', 'ATA'): 'I',
               ('ATG'): 'M',
               ('GTT', 'GTC', 'GTA', 'GTG'): 'V',
               ('TCT', 'TCC', 'TCA', 'TCG'): 'S',
               ('CCT', 'CCC', 'CCA', 'CCG'): 'P',
               ('ACT', 'ACC', 'ACA', 'ACG'): 'T',
               ('GCT', 'GCC', 'GCA', 'GCG'): 'A',
               ('TAT', 'TAC'): 'Y',
               ('CAT', 'CAC'): 'H',
               ('CAA', 'CAG'): 'Q',
               ('AAT', 'AAC'): 'N',
               ('AAA', 'AAG'): 'K',
               ('GAT', 'GAC'): 'D',
               ('GAA', 'GAG'): 'E',
               ('TGT', 'TGC'): 'C',
               ('TGG'): 'W',
               ('CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'): 'R',
               ('AGT', 'AGC'): 'S',
               ('GGT', 'GGC', 'GGA', 'GGG'): 'G',
               ('TAA', 'TAG', 'TGA'): '*',
               }
AA_seq = []

input_DNA = str(input('Please input a DNA string: '))

def translate_dna():
    list(input_DNA)
    global AA_seq
    for codon in range(0, len(input_DNA), 3):
        if codon in codon_table:
            AA_seq = codon_table[codon]
            AA_seq.append(codon_table[codon])
        else:
            AA_seq.append('X')
    print(str(' '.join(AA_seq)).strip('[]').replace("'", ""))

translate_dna()

I entered a DNA sequence, e.g. TGCATGCTACGTAGCGGACCTGG, and it only returned XXXXXXX. What I expected is a string of single letters corresponding to the amino acids in the dict.

I've been staring at it for the best part of an hour, so I figured it's time to ask the experts. Thanks in advance.

CodePudding user response:

You need a codon dictionary keyed on single codons.

Then you need to iterate over the input sequence in groups of 3.

You also need to decide what the output should look like if a triplet is not found in your lookup dictionary.

For example:

codon_table = {('TTT', 'TTC'): 'F',
               ('TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'): 'L',
               ('ATT', 'ATC', 'ATA'): 'I',
               ('ATG'): 'M',
               ('GTT', 'GTC', 'GTA', 'GTG'): 'V',
               ('TCT', 'TCC', 'TCA', 'TCG'): 'S',
               ('CCT', 'CCC', 'CCA', 'CCG'): 'P',
               ('ACT', 'ACC', 'ACA', 'ACG'): 'T',
               ('GCT', 'GCC', 'GCA', 'GCG'): 'A',
               ('TAT', 'TAC'): 'Y',
               ('CAT', 'CAC'): 'H',
               ('CAA', 'CAG'): 'Q',
               ('AAT', 'AAC'): 'N',
               ('AAA', 'AAG'): 'K',
               ('GAT', 'GAC'): 'D',
               ('GAA', 'GAG'): 'E',
               ('TGT', 'TGC'): 'C',
               ('TGG'): 'W',
               ('CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'): 'R',
               ('AGT', 'AGC'): 'S',
               ('GGT', 'GGC', 'GGA', 'GGG'): 'G',
               ('TAA', 'TAG', 'TGA'): '*',
               }
lookup = {}

for k, v in codon_table.items():
    if isinstance(k, tuple):
        for c in k:
            lookup[c] = v
    else:
        lookup[k] = v

sequence = 'TGCATGCTACGTAGCGGACCTGG'

AA_Seq = []

for i in range(0, len(sequence), 3):
    AA_Seq.append(lookup.get(sequence[i:i + 3], '?'))

print(AA_Seq)

Output:

['C', 'M', 'L', 'R', 'S', 'G', 'P', '?']

Note:

The ? appears because the last item extracted from the input sequence is 'GG' which is not a valid codon.

Also note that the codon_table entry ('ATG'): 'M' does not have a tuple as its key. ('ATG') is just a string (the parentheses are irrelevant). You could write it as ('ATG',): 'M' to make the key a 1-tuple. The same applies to ('TGG'): 'W'.
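A quick way to see the difference is to check the types directly. This is only a small illustration; the names not_a_tuple, one_tuple and table are made up for the demo and are not part of the original code.

```python
# Parentheses alone do not create a tuple; the trailing comma does.
not_a_tuple = ('ATG')
one_tuple = ('ATG',)

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple

# A string and a 1-tuple holding that string are different dictionary keys.
table = {('ATG',): 'M'}
print('ATG' in table)     # False
print(('ATG',) in table)  # True
```

This is why the flattening loop above checks isinstance(k, tuple): with the table as written, two of the keys are plain strings.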

CodePudding user response:

Your for loop goes through the input, never finds a match in the dictionary, and so appends "X" to AA_seq on every iteration.

This is because

  • your loop variable is an integer position produced by range(), not a three-character slice of the input string

  • your dictionary keys are tuples, and a tuple such as ("TTT", "TTC") is not equal to the string "TTT", so a lookup by a single codon never matches
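Both causes can be reproduced in a few lines. The snippet below is a minimal sketch using a shortened input and a one-entry dictionary; the names input_dna, positions, codons and grouped are illustrative only.

```python
input_dna = 'TGCATGCTA'

# range() yields integer positions, not three-letter strings.
positions = list(range(0, len(input_dna), 3))
print(positions)  # [0, 3, 6]

# Slicing the string at each position is what produces the codons.
codons = [input_dna[i:i + 3] for i in positions]
print(codons)  # ['TGC', 'ATG', 'CTA']

# A tuple key does not match one of its own members.
grouped = {('TGT', 'TGC'): 'C'}
print('TGC' in grouped)           # False
print(('TGT', 'TGC') in grouped)  # True
```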

To fix this:

  • You have to restructure your dictionary so that each key is a single codon string instead of a tuple.
  • You have to slice your input inside the loop, e.g. input_DNA[i:i + 3], to get a string of length three.
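Putting both fixes together, one possible version looks like the sketch below. It keeps the grouped table from the question (with trailing commas added to the two single-codon keys), flattens it once, and returns the result instead of using a global. The names GROUPED, LOOKUP and the unknown parameter are assumptions made for this example, and marking an incomplete trailing codon with 'X' is just one choice.

```python
# Grouped table from the question; ('ATG',) and ('TGG',) are now real 1-tuples.
GROUPED = {
    ('TTT', 'TTC'): 'F',
    ('TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'): 'L',
    ('ATT', 'ATC', 'ATA'): 'I',
    ('ATG',): 'M',
    ('GTT', 'GTC', 'GTA', 'GTG'): 'V',
    ('TCT', 'TCC', 'TCA', 'TCG'): 'S',
    ('CCT', 'CCC', 'CCA', 'CCG'): 'P',
    ('ACT', 'ACC', 'ACA', 'ACG'): 'T',
    ('GCT', 'GCC', 'GCA', 'GCG'): 'A',
    ('TAT', 'TAC'): 'Y',
    ('CAT', 'CAC'): 'H',
    ('CAA', 'CAG'): 'Q',
    ('AAT', 'AAC'): 'N',
    ('AAA', 'AAG'): 'K',
    ('GAT', 'GAC'): 'D',
    ('GAA', 'GAG'): 'E',
    ('TGT', 'TGC'): 'C',
    ('TGG',): 'W',
    ('CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'): 'R',
    ('AGT', 'AGC'): 'S',
    ('GGT', 'GGC', 'GGA', 'GGG'): 'G',
    ('TAA', 'TAG', 'TGA'): '*',
}

# Flatten to one entry per codon so a three-letter string can be looked up.
LOOKUP = {codon: aa for codons, aa in GROUPED.items() for codon in codons}


def translate_dna(dna, unknown='X'):
    """Translate a DNA string codon by codon; unmatched triplets become `unknown`."""
    dna = dna.upper()
    return ''.join(
        LOOKUP.get(dna[i:i + 3], unknown)  # slice three characters per step
        for i in range(0, len(dna), 3)
    )


print(translate_dna('TGCATGCTACGTAGCGGACCTGG'))  # CMLRSGPX
```

The final X comes from the leftover 'GG', which is not a complete codon. If you would rather drop an incomplete tail, you could trim the input to a multiple of three before translating.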