From dataframe to a nested dictionary

Time:05-22

I am having a hard time creating a nested dictionary from a dataframe in Python and would really appreciate your help. This is what the data looks like:

          tag          id cat
0  PA14_00030  GO:0005524   F
1  PA14_00030  GO:0006281   P
2  PA14_00050  GO:0003677   F
3  PA14_00050  GO:0003918   F
4  PA14_00050  GO:0005524   F
5  PA14_00050  GO:0006265   P
6  PA14_00060  GO:0016746   F
7  PA14_00070  GO:0005975   C
8  PA14_00080  GO:0009055   C

I want to create a nested dictionary that will look something like this:

{'C': {'PA14_00080': {'GO:0009055'}, 'PA14_00070': {'GO:0005975'}},
 'F': {'PA14_00030': {'GO:0005524'}, 'PA14_00050': {'GO:0003677', 'GO:0003918', 'GO:0005524'}},
 'P': {'PA14_00050': {'GO:0006265'}, 'PA14_00030': {'GO:0006281'}}}

Thank you for your help.

CodePudding user response:

If I understand you correctly, you can do:

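# Outer level: one entry per category ("cat").
# Inner level: for each tag in that category, the set of its ids.
out = {
    k: {
        row["tag"]: set(g.loc[g["tag"] == row["tag"], "id"].to_list())
        for _, row in g.iterrows()
    }
    for k, g in df.groupby("cat")
}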

print(out)

Prints:

{
    "C": {"PA14_00070": {"GO:0005975"}, "PA14_00080": {"GO:0009055"}},
    "F": {
        "PA14_00030": {"GO:0005524"},
        "PA14_00050": {"GO:0003918", "GO:0005524", "GO:0003677"},
        "PA14_00060": {"GO:0016746"},
    },
    "P": {"PA14_00030": {"GO:0006281"}, "PA14_00050": {"GO:0006265"}},
}
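
For anyone who wants to run this end to end, here is a self-contained sketch. It rebuilds the sample dataframe from the question (the construction code is my own; only the column names tag, id and cat come from the post) and, as an alternative to the comprehension above, groups on both columns at once so that each (cat, tag) pair is visited only once. Treat it as one possible variant, not the only way to do it.

```python
import pandas as pd

# Sample data copied from the question; how the real df is built is an assumption.
df = pd.DataFrame(
    {
        "tag": ["PA14_00030", "PA14_00030", "PA14_00050", "PA14_00050",
                "PA14_00050", "PA14_00050", "PA14_00060", "PA14_00070",
                "PA14_00080"],
        "id": ["GO:0005524", "GO:0006281", "GO:0003677", "GO:0003918",
               "GO:0005524", "GO:0006265", "GO:0016746", "GO:0005975",
               "GO:0009055"],
        "cat": ["F", "P", "F", "F", "F", "P", "F", "C", "C"],
    }
)

# Group on (cat, tag) together; each group is the Series of ids for that pair.
out = {}
for (cat, tag), ids in df.groupby(["cat", "tag"])["id"]:
    # Create the inner dict for this category on first use, then store the id set.
    out.setdefault(cat, {})[tag] = set(ids)

print(out)
```

The result should contain the same keys and sets as the output shown above; the order of elements inside each set is arbitrary, so the printed form may differ.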