I'm trying to write some simple Python code to merge a list of dictionaries into a condensed list, as I have lots of duplicates at the moment.
From this:
[
{
"module": "RECEIPT BISCUITS",
"product_range": "ULKER BISCUITS",
"receipt_category": "BISCUITS"
},
{
"module": "RECEIPT BISCUITS",
"product_range": "ULKER",
"receipt_category": "BISCUITS"
},
{
"module": "RECEIPT BISCUITS",
"product_range": "ULKER BISCUITS GOLD",
"receipt_category": "BISCUITS GOLD"
},
{
"module": "RECEIPT COFFEE",
"product_range": "BLACK GOLD",
"receipt_category": "BLACK GOLD"
}
]
To this:
[
{
"module": "RECEIPT BISCUITS",
"product_range": ["ULKER BISCUITS", "ULKER"],
"receipt_category": ["BISCUITS", "BISCUITS GOLD"]
},
{
"module": "RECEIPT COFFEE",
"product_range": ["BLACK GOLD"],
"receipt_category": ["BLACK GOLD"]
}
]
The module value is the key to group by, and the other two fields should be stored as lists even if there's only one value. This is JSON, by the way.
CodePudding user response:
collections.defaultdict to the rescue for your data regrouping needs!
import collections
data = [
{"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
{"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
{"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
{"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]
grouped = collections.defaultdict(lambda: collections.defaultdict(list))
group_key = "module"
for datum in data:
    datum = datum.copy()  # Copy so we can .pop without consequence
    group = datum.pop(group_key)  # Get the key (`module` value)
    for key, value in datum.items():  # Loop over the rest and put them in the group
        grouped[group][key].append(value)
collated = [
    {
        group_key: group,
        **values,
    }
    for (group, values) in grouped.items()
]
print(collated)
This prints out:
[
{'module': 'RECEIPT BISCUITS', 'product_range': ['ULKER BISCUITS', 'ULKER', 'ULKER BISCUITS GOLD'], 'receipt_category': ['BISCUITS', 'BISCUITS', 'BISCUITS GOLD']},
{'module': 'RECEIPT COFFEE', 'product_range': ['BLACK GOLD'], 'receipt_category': ['BLACK GOLD']}
]
Note that this doesn't deduplicate the values within each list (receipt_category contains BISCUITS twice in the output above), since I wasn't sure whether the order of the values is important for you, and so whether to use sets (which do not retain order).
Changing list to set (in the inner defaultdict) and append to add will make the values unique.
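If the order does matter, one possible alternative to sets is to collect the values as dict keys, since dicts keep insertion order (Python 3.7+) and drop repeated keys. The sketch below is only an illustration of that idea: the collate helper and its parameter names are made up for this example, and it assumes every value is hashable (plain strings, as in your data). It also converts back to lists at the end, because json.dumps cannot serialize sets.

```python
import collections
import json


def collate(data, group_key="module"):
    """Group dicts by group_key, keeping unique values in first-seen order.

    Hypothetical helper for illustration; assumes all values are hashable.
    """
    # The innermost dict is used as an insertion-ordered set: its keys are
    # the collected values, and the associated None values are ignored.
    grouped = collections.defaultdict(lambda: collections.defaultdict(dict))
    for datum in data:
        group = datum[group_key]
        for key, value in datum.items():
            if key != group_key:
                grouped[group][key][value] = None
    # Turn each ordered "set" back into a plain list so the result is JSON-friendly.
    return [
        {group_key: group, **{key: list(seen) for key, seen in values.items()}}
        for group, values in grouped.items()
    ]


data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]

collated = collate(data)
print(json.dumps(collated, indent=2))
```

With the sample data, receipt_category for RECEIPT BISCUITS should contain BISCUITS only once, followed by BISCUITS GOLD, while product_range keeps all three values in their original order.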