I have loaded a FASTA file into a dictionary like this:
from Bio import SeqIO
alignment_file = '/Users/dissertation/Desktop/Alignment 4 sequences.fasta'
seq_dict = {rec.id : rec.seq for rec in SeqIO.parse(alignment_file, "fasta")}
Which gives me the following output:
{'NC_000962.3': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
'NC_008596.1': Seq('------------------------------------------------------...ccg'),
'NC_009525.1': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
'NC_002945.4': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN')}
The only issue is that I would like to replace the key names with others that are easier to identify when comparing the sequences in other parts of my code. So I have tried the following:
name_list = ['Tuberculosis', 'Smegmatis', 'H37Ra', 'Bovis']
for key in seq_dict:
    for name in name_list:
        seq_dict[name[x]]= seq_dict[key]
seq_dict
However, I get the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/var/folders/pq/ghtv3wj159j681vy0ny3tz9w0000gp/T/ipykernel_47822/1486954832.py in <module>
9
---> 10 for key in seq_dict:
11 for name in name_list:
12 seq_dict[name[x]]= seq_dict[key]
RuntimeError: dictionary changed size during iteration
I understand that there's no easy, straightforward way of renaming keys in a dictionary, but I don't understand the error. Is there a way of doing something similar?
I have also tried this:
seq_dict.update({'NC_000962.3': 'Tuberculosis', 'NC_008596.1': 'Smegmatis', 'NC_009525.1': 'H37Ra', 'NC_002945.4': 'Bovis'})
But this gives me the following output:
{'NC_000962.3': 'Tuberculosis',
'NC_008596.1': 'Smegmatis',
'NC_009525.1': 'H37Ra',
'NC_002945.4': 'Bovis'}
My desired output would look like this:
{'Tuberculosis': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
'Smegmatis': Seq('------------------------------------------------------...ccg'),
'H37Ra': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
'Bovis': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN')}
Does anybody have an idea on how to update these?
CodePudding user response:
Construct a new dictionary and then assign it to seq_dict in a single operation, rather than mutating seq_dict while you're iterating over it (adding keys to a dictionary inside a loop over that same dictionary is what raises the RuntimeError). I think this is what you're aiming for:
seq_dict = dict(zip(name_list, seq_dict.values()))
although I'd personally want to have an explicit mapping from sequence IDs to names rather than relying on the ordering being the same.
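As a rough sketch of that explicit-mapping approach: the ID-to-name pairs below are taken from the update() call in the question, while the variable names id_to_name and renamed are just illustrative. Plain strings stand in for the Seq objects so the snippet runs without Biopython; the same comprehension should work unchanged on the real seq_dict.

```python
# Stand-in data: short plain strings replace the Bio.Seq.Seq values
# so that this sketch is runnable on its own.
seq_dict = {
    "NC_000962.3": "ctgtta",
    "NC_008596.1": "------",
    "NC_009525.1": "ctgtta",
    "NC_002945.4": "ctgtta",
}

# Explicit mapping from sequence ID to a readable name
# (pairs as given in the question's update() attempt).
id_to_name = {
    "NC_000962.3": "Tuberculosis",
    "NC_008596.1": "Smegmatis",
    "NC_009525.1": "H37Ra",
    "NC_002945.4": "Bovis",
}

# Build a brand-new dictionary instead of mutating seq_dict while looping.
# IDs missing from id_to_name keep their original key.
renamed = {id_to_name.get(seq_id, seq_id): seq for seq_id, seq in seq_dict.items()}

# Rebind the name once the new dictionary is complete.
seq_dict = renamed
```

Because each name is looked up by ID, the result does not depend on the order of the records in the FASTA file, which is the weak point of the zip version.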