Nested Dictionary using Pandas DataFrame-CodePudding

I have some data with duplicates that looks like this:

Each webpage can have more than 1 ID associated with it and each ID can have more than one value associated with it.

I'd like to ideally have a nested dictionary with lists to handle the multiple IDs and multiple values:

{WEBPAGE: {ID 1: [value 1, value 2, value 3], ID 2: [value 4, value 5]}}

I've tried using to_dict and group_by but I can't seem to find the right syntax to create a nested dictionary within those.

CodePudding user response：

Try:

out = {}
for _, x in df.iterrows():
    out.setdefault(x["WEBPAGE"], {}).setdefault(x["ID"], []).append(x["VALUE"])

print(out)

Prints:

{
    "Webpage 1": {
        "ID 1": ["Value 1", "Value 2", "Value 3"],
        "ID 2": ["Value 4", "Value 5"],
    }
}

CodePudding user response：

For a pandas approach, you just need to use a nested groupby:

d = (df.groupby('WEBPAGE')
       .apply(lambda g: g.groupby('ID')['VALUE'].agg(list).to_dict())
       .to_dict()
    )

output:

{'Webpage 1': {'ID 1': ['Value 1', 'Value 2', 'Value 3'],
               'ID 2': ['Value 4', 'Value 5']}}