I would appreciate your help to aggregate dataset. I have a dataset something link below, and I want to aggregate ids if the other values are the same.
Current Data:
dataset= [
{'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_id' : 'instance A', 'instance_name' : 'name A'},
{'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_id' : 'instance B', 'instance_name' : 'name B'},
{'title' : 'ABC', 'Description' : 'ABC Description', 'instance_id' : 'instance B', 'instance_name' : 'name B'},
{'title' : 'ABC', 'Description' : 'ABC Description', 'instance_id' : 'instance C', 'instance_name' : 'name C'}
]
I want to get instance details for each title and description. Desired result should be as following:
dataset= [
{'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_info': {'instance_id' :['instance A', 'instance B'],'instance_name' : ['name A', 'name B']}},
{'title' : 'ABC', 'Description' : 'ABC Description', 'instance_info': {'instance_id' :['instance B', 'instance C'],'instance_name' : ['name B', 'name C']}}
]
I hope I could explain myself.
I tried to do it with aggregate item module but could not manage to work.
CodePudding user response:
You can do it using with this way;
dataset = [
{'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': 'instance A', 'instance_name': 'name A'},
{'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': 'instance B', 'instance_name': 'name B'},
{'title': 'ABC', 'Description': 'ABC Description', 'instance_id': 'instance B', 'instance_name': 'name B'},
{'title': 'ABC', 'Description': 'ABC Description', 'instance_id': 'instance C', 'instance_name': 'name C'}
]
new_dataset = {}
for row in dataset:
if row['title'] not in new_dataset.keys():
new_dataset[row['title']] = {
'title': row['title'],
'Description': row['Description'],
'instance_id': [row['instance_id']],
'instance_name': [row['instance_name']]
}
else:
new_dataset[row['title']]['instance_id'].append(row['instance_id'])
new_dataset[row['title']]['instance_name'].append(row['instance_name'])
print(list(new_dataset.values()))
output:
# output:
# [{'Description': 'XYZ Description',
# 'instance_id': ['instance A', 'instance B'],
# 'instance_name': ['name A', 'name B'],
# 'title': 'XYZ'},
# {'Description': 'ABC Description',
# 'instance_id': ['instance B', 'instance C'],
# 'instance_name': ['name B', 'name C'],
# 'title': 'ABC'}]
CodePudding user response:
This is a great use case for pandas:
import pandas as pd
pd.DataFrame(dataset).groupby(['title', 'Description'], as_index=False).agg(list).to_dict(orient="records")
Output:
[{'title': 'ABC', 'Description': 'ABC Description', 'instance_id': ['instance B', 'instance C'], 'instance_name': ['name B', 'name C']},
{'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': ['instance A', 'instance B'], 'instance_name': ['name A', 'name B']}]