I have a list of dictionaries in which keys are "group_names" and values are gene_lists.
I want to update each dictionary with a new list of genes by looping through a species_list.
Here is my pseudocode:
groups = ["group1", "group2"]
species_list = ["spA", "spB"]

def get_genes(group, sp):
    # (pseudocode) look up the genes for this group and species
    return gene_list

for sp in species_list:
    for group in groups:
        gene_list[group] = get_genes(group, sp)
        gene_list.update(get_genes(group, sp))
The problem with this code is that the genes already stored for a group are overwritten by the new ones instead of the new ones being added to the existing list. My question is where I should put the following line, although I'm not sure this is the only problem.
gene_list.update(get_genes(group,sp))
The data I have looks like this dataframe:
import pandas as pd

data = {"group1": ["geneA1", "geneA2"],
        "group2": ["geneB1", "geneB2"]}
pd.DataFrame.from_dict(data).T
The data I want to create should look like this:
data = {"group1": ["geneA1", "geneA2", "geneX"],
        "group2": ["geneB1", "geneB2", "geneX"]}
pd.DataFrame.from_dict(data).T
So in this case, "geneX" stands for the new genes returned by the get_genes function for each species, which should be added to the existing lists in the dictionary.
Any help would be much appreciated!!
CodePudding user response:
You need to append to the list in the dictionary entry, not assign it.
Use setdefault() to provide a default empty list if the dictionary key doesn't exist yet.
for sp in species_list:
    for group in groups:
        gene_list.setdefault(group, []).extend(get_genes(group, sp))
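For anyone who wants to try this, here is a minimal runnable sketch of the loop. The get_genes stub is hypothetical (the question does not show its body); it just returns one made-up gene per group and species so the accumulation is visible. The starting dictionary is also an assumption, with "group2" left out on purpose to show setdefault() creating the entry.

```python
# Hypothetical stand-in for the asker's get_genes(); the real lookup is not shown.
def get_genes(group, sp):
    return [f"gene_{group}_{sp}"]

groups = ["group1", "group2"]
species_list = ["spA", "spB"]

# Assumed starting data; "group2" is deliberately absent.
gene_list = {"group1": ["geneA1", "geneA2"]}

for sp in species_list:
    for group in groups:
        # setdefault() returns the existing list, or inserts and returns [],
        # and extend() adds the new genes to that same list in place.
        gene_list.setdefault(group, []).extend(get_genes(group, sp))

print(gene_list)
```

extend() is used rather than append() because get_genes returns a list; append() would add the whole list as a single nested element.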
CodePudding user response:
From what I understand, you want to append a new gene to the list under each key. To do that:
new_gene = "gene_x"
data={"group1":["geneA1", "geneA2"], "group2":[ "geneB1","geneB2"]}
for value in data.values():
    value.append(new_gene)
print(data)
You can also use collections.defaultdict(list), which creates an empty list for any missing key so you can append directly (see the docs for details).
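A small sketch of the defaultdict variant, assuming the same sample data as above; "group2" is left out of the starting data on purpose to show that a missing key does not raise a KeyError.

```python
from collections import defaultdict

# defaultdict(list) creates an empty list the first time a missing key is accessed.
data = defaultdict(list, {"group1": ["geneA1", "geneA2"]})

new_gene = "gene_x"
for group in ["group1", "group2"]:
    data[group].append(new_gene)  # no KeyError for "group2"

print(dict(data))
```

Converting back with dict(data) is optional; it only gives a plain dictionary for printing or for passing to pandas.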