Update key name in a dictionary python

Time:03-29

I have the following fasta file in a dictionary, in the following shape:

from Bio import SeqIO

alignment_file = '/Users/dissertation/Desktop/Alignment 4 sequences.fasta'

seq_dict = {rec.id : rec.seq for rec in SeqIO.parse(alignment_file, "fasta")}

Which gives me the following output:

{'NC_000962.3': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
 'NC_008596.1': Seq('------------------------------------------------------...ccg'),
 'NC_009525.1': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
 'NC_002945.4': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN')}

The only issue is that I would like to replace the key names with others that are easier to identify when comparing the sequences in other parts of my code. So I have tried the following:

name_list = ['Tuberculosis', 'Smegmatis', 'H37Ra', 'Bovis']

for key in seq_dict:
    for name in name_list:
        seq_dict[name[x]]= seq_dict[key]
    
seq_dict

However I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/var/folders/pq/ghtv3wj159j681vy0ny3tz9w0000gp/T/ipykernel_47822/1486954832.py in <module>
      9
---> 10 for key in seq_dict:
     11     for name in name_list:
     12         seq_dict[name[x]]= seq_dict[key]

RuntimeError: dictionary changed size during iteration

I understand that there's no easy, straightforward way of renaming keys in a dictionary, but I don't understand the error. Is there a way of doing something similar?

I have also tried this:

seq_dict.update({'NC_000962.3': 'Tuberculosis', 'NC_008596.1': 'Smegmatis', 'NC_009525.1': 'H37Ra', 'NC_002945.4': 'Bovis'})

But this gives me the following output:

{'NC_000962.3': 'Tuberculosis',
 'NC_008596.1': 'Smegmatis',
 'NC_009525.1': 'H37Ra',
 'NC_002945.4': 'Bovis'}

My desired output would look like this:

{'Tuberculosis': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
 'Smegmatis': Seq('------------------------------------------------------...ccg'),
 'H37Ra': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
 'Bovis': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN')}

Does anybody have an idea on how to update these?

CodePudding user response:

Construct a new dictionary and then assign it to seq_dict in a single operation, rather than mutating seq_dict as you're in the process of iterating over it. I think this is what you're aiming for:

seq_dict = dict(zip(name_list, seq_dict.values()))

although I'd personally want to have an explicit mapping from sequence IDs to names rather than relying on the ordering being the same.
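
For what it's worth, here is a minimal sketch of the explicit-mapping approach. The ID-to-name pairs are the ones from the question; the name id_to_name is only illustrative, and short plain strings stand in for the Seq objects (placeholders, not the real sequences) so the snippet runs without Biopython. It builds a new dictionary rather than adding keys to seq_dict inside a loop over it, which is what raises the RuntimeError.

```python
# Stand-in data: short placeholder strings take the place of the
# Bio.Seq.Seq objects from the question (these are NOT the real sequences).
seq_dict = {
    "NC_000962.3": "ctgttaccgag",
    "NC_008596.1": "-----------",
    "NC_009525.1": "ctgttaccgag",
    "NC_002945.4": "ctgttaccgag",
}

# Explicit mapping from sequence ID to a readable name
# (pairs taken from the question; the variable name is illustrative).
id_to_name = {
    "NC_000962.3": "Tuberculosis",
    "NC_008596.1": "Smegmatis",
    "NC_009525.1": "H37Ra",
    "NC_002945.4": "Bovis",
}

# Build a new dictionary instead of mutating seq_dict while iterating it.
# Any ID that is missing from id_to_name keeps its original key.
seq_dict = {
    id_to_name.get(seq_id, seq_id): seq
    for seq_id, seq in seq_dict.items()
}

print(list(seq_dict))
```

Because each value is looked up by its own ID, the result does not depend on the order in which the records appear in the FASTA file.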
