I have some data with duplicates that looks like this:
WEBPAGE | ID | VALUE |
---|---|---|
Webpage 1 | ID 1 | Value 1 |
Webpage 1 | ID 1 | Value 2 |
Webpage 1 | ID 1 | Value 3 |
Webpage 1 | ID 2 | Value 4 |
Webpage 1 | ID 2 | Value 5 |
Each webpage can have more than 1 ID associated with it and each ID can have more than one value associated with it.
I'd like to ideally have a nested dictionary with lists to handle the multiple IDs and multiple values:
{WEBPAGE: {ID 1: [value 1, value 2, value 3], ID 2: [value 4, value 5]}}
I've tried using to_dict and group_by but I can't seem to find the right syntax to create a nested dictionary within those.
CodePudding user response:
Try:
out = {}
for _, x in df.iterrows():
out.setdefault(x["WEBPAGE"], {}).setdefault(x["ID"], []).append(x["VALUE"])
print(out)
Prints:
{
"Webpage 1": {
"ID 1": ["Value 1", "Value 2", "Value 3"],
"ID 2": ["Value 4", "Value 5"],
}
}
CodePudding user response:
For a pandas approach, you just need to use a nested groupby
:
d = (df.groupby('WEBPAGE')
.apply(lambda g: g.groupby('ID')['VALUE'].agg(list).to_dict())
.to_dict()
)
output:
{'Webpage 1': {'ID 1': ['Value 1', 'Value 2', 'Value 3'],
'ID 2': ['Value 4', 'Value 5']}}