I am having a hard time creating a nested dictionary from a DataFrame in Python and would really appreciate your help. This is what the data looks like:
 | tag | id | cat |
---|---|---|---|
0 | PA14_00030 | GO:0005524 | F |
1 | PA14_00030 | GO:0006281 | P |
2 | PA14_00050 | GO:0003677 | F |
3 | PA14_00050 | GO:0003918 | F |
4 | PA14_00050 | GO:0005524 | F |
5 | PA14_00050 | GO:0006265 | P |
6 | PA14_00060 | GO:0016746 | F |
7 | PA14_00070 | GO:0005975 | C |
8 | PA14_00080 | GO:0009055 | C |
I want to create a nested dictionary that will look something like this:
{'C': {'PA14_00080': {'GO:0009055'}, 'PA14_00070': {'GO:0005975'}},
 'F': {'PA14_00030': {'GO:0005524'}, 'PA14_00050': {'GO:0003677', 'GO:0003918', 'GO:0005524'}},
 'P': {'PA14_00050': {'GO:0006265'}, 'PA14_00030': {'GO:0006281'}}}
Thank you for your help.
CodePudding user response:
If I understand you correctly, you can do:
out = {
    k: {
        row["tag"]: set(g.loc[g["tag"] == row["tag"], "id"].to_list())
        for _, row in g.iterrows()
    }
    for k, g in df.groupby("cat")
}
print(out)
Prints:
{
    "C": {"PA14_00070": {"GO:0005975"}, "PA14_00080": {"GO:0009055"}},
    "F": {
        "PA14_00030": {"GO:0005524"},
        "PA14_00050": {"GO:0003918", "GO:0005524", "GO:0003677"},
        "PA14_00060": {"GO:0016746"},
    },
    "P": {"PA14_00030": {"GO:0006281"}, "PA14_00050": {"GO:0006265"}},
}
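For what it's worth, here is an alternative sketch that avoids iterating over rows: group on both `cat` and `tag` at once, collect the `id` values of each pair into a set, then fold the result into a nested dict. The `df` below is rebuilt by hand from the table in the question, and the name `grouped` is only illustrative, so treat this as one possible approach under those assumptions rather than the definitive way to do it.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question (an assumption
# about how the real DataFrame is laid out: columns "tag", "id", "cat").
df = pd.DataFrame(
    {
        "tag": ["PA14_00030", "PA14_00030", "PA14_00050", "PA14_00050",
                "PA14_00050", "PA14_00050", "PA14_00060", "PA14_00070",
                "PA14_00080"],
        "id": ["GO:0005524", "GO:0006281", "GO:0003677", "GO:0003918",
               "GO:0005524", "GO:0006265", "GO:0016746", "GO:0005975",
               "GO:0009055"],
        "cat": ["F", "P", "F", "F", "F", "P", "F", "C", "C"],
    }
)

# One set of ids per (cat, tag) pair; the result is a Series with a
# two-level index (cat, tag).
grouped = df.groupby(["cat", "tag"])["id"].agg(set)

# Fold the two-level index into a nested dict: cat -> tag -> set of ids.
out = {}
for (cat, tag), ids in grouped.items():
    out.setdefault(cat, {})[tag] = ids

print(out)
```

Grouping once on both keys means each row is visited a single time, whereas filtering `g` again for every row repeats work when a tag appears many times within a category.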