I have a list of nested dictionaries (Python 3.9) that looks something like this:
records = [
    {'Total:': {'Owner:': {'Available:': {'15 to 34 years': 1242}}}},
    {'Total:': {'Owner:': {'Available:': {'35 to 64 years': 5699}}}},
    {'Total:': {'Owner:': {'Available:': {'65 years and over': 2098}}}},
    {'Total:': {'Owner:': {'No Service:': {'15 to 34 years': 43}}}},
    {'Total:': {'Owner:': {'No Service:': {'35 to 64 years': 64}}}},
    {'Total:': {'Owner:': {'No Service:': {'65 years and over': 5}}}},
    {'Total:': {'Renter:': {'Available:': {'15 to 34 years': 1403}}}},
    {'Total:': {'Renter:': {'Available:': {'35 to 64 years': 2059}}}},
    {'Total:': {'Renter:': {'Available:': {'65 years and over': 395}}}},
    {'Total:': {'Renter:': {'No Service:': {'15 to 34 years': 16}}}},
    {'Total:': {'Renter:': {'No Service:': {'35 to 64 years': 24}}}},
    {'Total:': {'Renter:': {'No Service:': {'65 years and over': 0}}}},
]
The depth of nesting is not always consistent. The example above has 4 levels (total, owner/renter, available/no service, age group), but some records have a single level and others have as many as 5.
I would like to merge the data in a way that doesn't overwrite the nested dictionaries the way update() or {**dict_a, **dict_b} does.
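To illustrate the problem, here is a minimal sketch using two of the records above (the names dict_a, dict_b, merged and updated are just for this example). Both forms merge only the top-level key, so the second record's nested dict replaces the first one's:

```python
# Two records that share every key except the innermost one.
dict_a = {'Total:': {'Owner:': {'Available:': {'15 to 34 years': 1242}}}}
dict_b = {'Total:': {'Owner:': {'Available:': {'35 to 64 years': 5699}}}}

# Both forms are shallow: the value under 'Total:' is replaced wholesale.
merged = {**dict_a, **dict_b}

updated = dict(dict_a)   # shallow copy so dict_a itself is left untouched
updated.update(dict_b)

print(merged)             # the '15 to 34 years' entry is gone
print(updated == merged)  # both approaches give the same result
```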
The final output should look something like this:
combined = {
    'Total': {
        'Owner': {
            'Available': {
                '15 to 34 years': 1242,
                '35 to 64 years': 5699,
                '65 years and over': 2098
            },
            'No Service:': {
                '15 to 34 years': 43,
                '35 to 64 years': 64,
                '65 years and over': 5
            }
        },
        'Renter': {
            'Available': {
                '15 to 34 years': 1403,
                '35 to 64 years': 2059,
                '65 years and over': 395
            },
            'No Service:': {
                '15 to 34 years': 16,
                '35 to 64 years': 24,
                '65 years and over': 0
            }
        },
    }
}
Answer:
Recursion is an easy way to navigate and operate on arbitrarily nested structures:
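def combine_into(d: dict, combined: dict) -> None:
    """Recursively merge d into combined, modifying combined in place."""
    for k, v in d.items():
        if isinstance(v, dict):
            # Descend into the matching sub-dict, creating it if needed.
            combine_into(v, combined.setdefault(k, {}))
        else:
            # Leaf value: copy it over as-is.
            combined[k] = v

combined = {}
for record in records:
    combine_into(record, combined)
print(combined)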
{'Total:': {'Owner:': {'Available:': {'15 to 34 years': 1242, '35 to 64 years': 5699, '65 years and over': 2098}, 'No Service:': {'15 to 34 years': 43, '35 to 64 years': 64, '65 years and over': 5}}, 'Renter:': {'Available:': {'15 to 34 years': 1403, '35 to 64 years': 2059, '65 years and over': 395}, 'No Service:': {'15 to 34 years': 16, '35 to 64 years': 24, '65 years and over': 0}}}}
The general idea here is that each call to combine_into takes one dict and merges it into the combined dict: each value that is itself a dict triggers another recursive call on the matching sub-dict of combined, while any other value is copied into combined as-is.
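Because the recursion simply follows whatever nesting each record has, records of different depths can be merged in the same pass. A small sketch with made-up records (the keys and values in mixed are hypothetical, not taken from the question's data):

```python
def combine_into(d: dict, combined: dict) -> None:
    for k, v in d.items():
        if isinstance(v, dict):
            combine_into(v, combined.setdefault(k, {}))
        else:
            combined[k] = v

# Hypothetical records with one, two, and five levels of nesting.
mixed = [
    {'Households': 100},
    {'Total:': {'Owner:': 60}},
    {'Total:': {'Renter:': 40}},
    {'A': {'B': {'C': {'D': {'E': 1}}}}},
    {'A': {'B': {'C': {'D': {'F': 2}}}}},
]

result = {}
for record in mixed:
    combine_into(record, result)

print(result)
```

Each branch ends up exactly as deep as the records that contributed to it.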
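Note that this breaks if the records disagree about whether a particular node is a leaf. If a key already holds a leaf value and a later record has a dict there, the recursive call typically raises an exception (a TypeError or AttributeError, since the leaf is not a dict); in the opposite order, the leaf silently overwrites the sub-dict that was already merged.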
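If you would rather fail loudly in both cases, one option is a stricter variant along these lines. This is only a sketch: combine_into_strict, its path parameter, and the choice of ValueError are my own additions, not something the question requires.

```python
def combine_into_strict(d: dict, combined: dict, path: tuple = ()) -> None:
    """Like combine_into, but refuse to mix leaves and dicts at the same key."""
    for k, v in d.items():
        here = path + (k,)  # key path so far, used only for error messages
        if isinstance(v, dict):
            child = combined.setdefault(k, {})
            if not isinstance(child, dict):
                raise ValueError(f"{here} is a leaf in one record and a dict in another")
            combine_into_strict(v, child, here)
        elif isinstance(combined.get(k), dict):
            raise ValueError(f"{here} is a dict in one record and a leaf in another")
        else:
            combined[k] = v  # plain leaf: the latest value wins


combined = {}
combine_into_strict({'Total:': {'Owner:': 5}}, combined)
try:
    # 'Owner:' was a leaf above but is a dict here, so this is rejected.
    combine_into_strict({'Total:': {'Owner:': {'Available:': 1}}}, combined)
except ValueError as exc:
    print(exc)
```

Note that a rejected record may already have been partially merged by the time the error is raised, so you may want to build into a fresh dict and discard it on failure.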