Flatten a list of dictionaries in Python

Time:09-28

I have a list of dictionaries:

data = [{"average": 2, "day": "2022-01-01", "name": "joe", "employee_id": 1},
        {"average": 3, "day": "2022-01-02", "name": "joe", "employee_id": 1},
        {"average": 9, "day": "2022-01-03", "name": "joe", "employee_id": 1},
        {"sum": 13,    "day": "2022-01-01", "name": "joe", "employee_id": 1},
        {"sum": 15,    "day": "2022-01-02", "name": "joe", "employee_id": 1},
        {"sum": 0,     "day": "2022-01-03", "name": "joe", "employee_id": 1},
        {"average": 1, "day": "2022-01-01", "name": "bob", "employee_id": 2},
        {"average": 3, "day": "2022-01-02", "name": "bob", "employee_id": 2},
        {"sum":     9, "day": "2022-01-01", "name": "bob", "employee_id": 2},
        {"sum":     8, "day": "2022-01-02", "name": "bob", "employee_id": 2}]

I want my output as:

output = [{"name": "joe", "employee_id": 1, "day": "2022-01-01", "average": 2, "sum": 13},
          {"name": "joe", "employee_id": 1, "day": "2022-01-02", "average": 3, "sum": 15},
          {"name": "joe", "employee_id": 1, "day": "2022-01-03", "average": 9, "sum": 0},
          {"name": "bob", "employee_id": 2, "day": "2022-01-01", "average": 1, "sum": 9},
          {"name": "bob", "employee_id": 2, "day": "2022-01-02", "average": 3, "sum": 8}]

The goal is to merge the records that share the same day, name, and employee_id into a single dictionary.

I've tried:

output = {}
for item in data:
    if item["day"] not in output:
        output[item["day"]] = item
    else:
        output[item["day"]].update(item)
print(list(output.values()))

This works for getting the "average", "sum", and "day" values together, but the result does not include all of the employees and their IDs.

Any help is appreciated
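
The loop in the question groups on item["day"] alone, so two employees who share a day land in the same slot and one overwrites the other. A minimal sketch of the same loop keyed on the (day, name, employee_id) triple is shown here; it assumes every record carries those three fields, and merge_records is a hypothetical helper name, not something from the question. The sample is a four-record subset of the data above.

```python
def merge_records(records):
    """Merge dicts that share the same (day, name, employee_id)."""
    merged = {}
    for item in records:
        # Group on all three identifying fields, not just the day.
        key = (item["day"], item["name"], item["employee_id"])
        # setdefault starts each group with a fresh dict, so the
        # input dictionaries themselves are never modified.
        merged.setdefault(key, {}).update(item)
    return list(merged.values())


# Subset of the question's data: two employees on the same day.
sample = [
    {"average": 2, "day": "2022-01-01", "name": "joe", "employee_id": 1},
    {"sum": 13, "day": "2022-01-01", "name": "joe", "employee_id": 1},
    {"average": 1, "day": "2022-01-01", "name": "bob", "employee_id": 2},
    {"sum": 9, "day": "2022-01-01", "name": "bob", "employee_id": 2},
]

result = merge_records(sample)
print(result)
```

Unlike the original loop, which stores the input dictionary itself and then calls update on it, this version leaves the entries of the input list untouched.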

CodePudding user response:

Use collections.defaultdict with dict as the default factory. Take the ('day', 'name') pair of each dictionary as the key and merge the dictionaries that share it with update:

>>> from collections import defaultdict
>>> res = defaultdict(dict)
>>> for mp in data:
...     res[mp['day'], mp['name']].update(mp)
...
>>> keys = ('name', 'employee_id', 'day', 'average', 'sum')
>>> [{k: mp[k] for k in keys} for mp in res.values()]
[{'name': 'joe', 'employee_id': 1, 'day': '2022-01-01', 'average': 2, 'sum': 13},
 {'name': 'joe', 'employee_id': 1, 'day': '2022-01-02', 'average': 3, 'sum': 15},
 {'name': 'joe', 'employee_id': 1, 'day': '2022-01-03', 'average': 9, 'sum': 0},
 {'name': 'bob', 'employee_id': 2, 'day': '2022-01-01', 'average': 1, 'sum': 9},
 {'name': 'bob', 'employee_id': 2, 'day': '2022-01-02', 'average': 3, 'sum': 8}]
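
One thing to watch: the list comprehension above indexes mp[k] directly, so it raises KeyError if some (day, name) group never received an "average" or a "sum" record. A hedged variant that fills such gaps with None might look like the following; flatten and KEYS are hypothetical names, and the sample is a trimmed subset of the question's data with joe's "sum" record for 2022-01-03 left out on purpose.

```python
from collections import defaultdict

KEYS = ("name", "employee_id", "day", "average", "sum")


def flatten(records, keys=KEYS):
    """Group records by (day, name) and emit one dict per group."""
    groups = defaultdict(dict)
    for rec in records:
        groups[rec["day"], rec["name"]].update(rec)
    # dict.get returns None for a missing field instead of raising KeyError.
    return [{k: group.get(k) for k in keys} for group in groups.values()]


# Trimmed input: no "sum" record exists for joe on 2022-01-03.
sample = [
    {"average": 9, "day": "2022-01-03", "name": "joe", "employee_id": 1},
    {"average": 1, "day": "2022-01-01", "name": "bob", "employee_id": 2},
    {"sum": 9, "day": "2022-01-01", "name": "bob", "employee_id": 2},
]

rows = flatten(sample)
print(rows)
```

Whether None is the right placeholder depends on what consumes the output; dropping the missing key instead would be an equally reasonable choice.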

CodePudding user response:

Here is one way to do it with pandas:

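import pandas as pd

df = pd.DataFrame(data)  # missing "average"/"sum" values become NaN

# Use loc to select the rows where average is not null and the rows where
# sum is not null, as two separate frames, then merge the two frames.
# Finally, output the result as a list of dicts (orient='records').
# The NaN placeholders make both columns float, hence 2.0 rather than 2 below.
pd.merge(df.loc[df['average'].notna(), ['name', 'day', 'employee_id', 'average']],
         df.loc[df['sum'].notna(), ['name', 'day', 'employee_id', 'sum']],
         how='outer'
).to_dict(orient='records')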

[{'name': 'joe',
  'day': '2022-01-01',
  'employee_id': 1,
  'average': 2.0,
  'sum': 13.0},
 {'name': 'joe',
  'day': '2022-01-02',
  'employee_id': 1,
  'average': 3.0,
  'sum': 15.0},
 {'name': 'joe',
  'day': '2022-01-03',
  'employee_id': 1,
  'average': 9.0,
  'sum': 0.0},
 {'name': 'bob',
  'day': '2022-01-01',
  'employee_id': 2,
  'average': 1.0,
  'sum': 9.0},
 {'name': 'bob',
  'day': '2022-01-02',
  'employee_id': 2,
  'average': 3.0,
  'sum': 8.0}]