Python Group and aggregate unidirectionally a list of dictionaries by multiple keys

Time:12-11

I am building a tree selector and need to structure my data as a tree of grouped items. The input below is a list of dictionaries.

data = [
        {'region': 'R1', 'group': 'G1', 'category': 'C1', 'item': 'I2'},
        {'region': 'R1', 'group': 'G1', 'category': 'C1', 'item': 'I1'},
        {'region': 'R1', 'group': 'G2', 'category': 'C2', 'item': 'I3'},
        {'region': 'R2', 'group': 'G1', 'category': 'C1', 'item': 'I1'},
        {'region': 'R2', 'group': 'G2', 'category': 'C2', 'item': 'I3'},
        {'region': 'R2', 'group': 'G2', 'category': 'C2', 'item': 'I4'},
        {'region': 'R2', 'group': 'G2', 'category': 'C3', 'item': 'I5'},
    ]

I want to get the following output:

result = {
  "regions": [
    {
      "name": "R1",
      "groups": [
        {
          "name": "G1",
          "categories": [
            {"name": "C1","items": [{ "name": "I2"},{"name": "I1"}]}
          ]
        },
        {
          "name": "G2",
          "categories": [
            {"name": "C2", "items": [{"name": "I3"}]}
          ]
        }
      ]
    },
    {
      "name": "R2",
      "groups": [
        {
          "name": "G1",
          "categories": [
            {"name": "C1","items": [{"name": "I1"}]}
          ]
        },
        {
          "name": "G2",
          "categories": [
            {"name": "C2","items": [{"name": "I3"},{"name": "I4"}]},
            {"name": "C3", "items": [{"name": "I5"}]}
          ]
        }
      ]
    }
  ]
}

After some research, I came up with this solution:

from collections import OrderedDict

# Level 1: collect items under each (region, group, category) key
d = OrderedDict()
for aggr in data:
    d.setdefault(
        (aggr['region'], aggr['group'], aggr['category']), []
    ).append({"name": aggr['item']})

# Level 2: collect categories under each (region, group) key
d1 = OrderedDict()
for k, v in d.items():
    d1.setdefault((k[0], k[1]), []).append({"name": k[2], "items": v})

# Level 3: collect groups under each region
d2 = OrderedDict()
for k, v in d1.items():
    d2.setdefault(k[0], []).append({"name": k[1], "categories": v})

result = {"regions": [{"name": k, "groups": v} for k, v in d2.items()]}

It works, but I don't think it's the most Pythonic solution, and I haven't managed to simplify it.

Any alternative solution or improvement to the code above would be appreciated.

Answer:

As long as records that share the same keys are adjacent, as in your example, you can use groupby from itertools in a recursive function:

from itertools import groupby
from operator import itemgetter

def plural(word):
    return f"{word}s" if word[-1] != 'y' else f"{word[:-1]}ies"

def grouping(records, *keys):
    if len(keys) == 1:
        return [{"name": record[keys[0]]} for record in records]
    return [
        {"name": key, plural(keys[1]): grouping(group, *keys[1:])}
        for key, group in groupby(records, itemgetter(keys[0]))
    ]

result = {"regions": grouping(data, "region", "group", "category", "item")}
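
The adjacency requirement comes from how itertools.groupby works: it starts a new group every time the key value changes, so equal keys that are not next to each other end up in separate groups. A minimal illustration, using made-up sample values rather than the question's data:

```python
from itertools import groupby

# Hypothetical sample: the two "R1" entries are not adjacent.
regions = ["R1", "R2", "R1"]

# groupby only merges consecutive equal keys, so "R1" shows up twice.
unsorted_keys = [key for key, _ in groupby(regions)]

# After sorting, equal keys are adjacent and each one appears once.
sorted_keys = [key for key, _ in groupby(sorted(regions))]

print(unsorted_keys)  # ['R1', 'R2', 'R1']
print(sorted_keys)    # ['R1', 'R2']
```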

If that ordering isn't guaranteed, you can adjust grouping to sort the records at each level:

def grouping(records, *keys):
    if len(keys) == 1:
        return [{"name": record[keys[0]]} for record in records]
    key_func = itemgetter(keys[0])
    records = sorted(records, key=key_func)
    return [
        {"name": key, plural(keys[1]): grouping(group, *keys[1:])}
        for key, group in groupby(records, key_func)
    ]
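
Note that sorting at each level returns the groups in key order, not in the order they first appear in the input. If first-seen order matters, one possible alternative (a sketch, not part of the answer above) is to bucket the records in a plain dict, which keeps insertion order in Python 3.7+. The name grouping_unsorted and the sample data are made up for illustration:

```python
def plural(word):
    return f"{word}s" if word[-1] != "y" else f"{word[:-1]}ies"

def grouping_unsorted(records, *keys):
    # Hypothetical helper: same output shape as grouping(), but no sorting.
    if len(keys) == 1:
        return [{"name": record[keys[0]]} for record in records]
    buckets = {}  # plain dicts preserve insertion order (Python 3.7+)
    for record in records:
        buckets.setdefault(record[keys[0]], []).append(record)
    return [
        {"name": key, plural(keys[1]): grouping_unsorted(group, *keys[1:])}
        for key, group in buckets.items()
    ]

# Made-up sample in which the two R2 records are not adjacent.
data = [
    {"region": "R2", "group": "G1", "category": "C1", "item": "I1"},
    {"region": "R1", "group": "G1", "category": "C1", "item": "I2"},
    {"region": "R2", "group": "G1", "category": "C1", "item": "I3"},
]
result = {"regions": grouping_unsorted(data, "region", "group", "category", "item")}
print([region["name"] for region in result["regions"]])  # ['R2', 'R1']
```

Here R2 stays ahead of R1 because it is seen first, and both R2 records land in the same group.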

Alternatively, sort the data beforehand:

keys = ["region", "group", "category", "item"]
data = sorted(data, key=itemgetter(*keys))
result = {"regions": grouping(data, *keys)}
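
One side effect worth knowing: because "item" is among the sort keys, pre-sorting this way also reorders the items within each category. A small check on the first two records from the question (the names rows, sort_key and ordered are only for illustration):

```python
from operator import itemgetter

keys = ["region", "group", "category", "item"]

# First two records of the question's data: same region, group and category.
rows = [
    {"region": "R1", "group": "G1", "category": "C1", "item": "I2"},
    {"region": "R1", "group": "G1", "category": "C1", "item": "I1"},
]

# itemgetter(*keys) returns a tuple, so records are compared key by key.
sort_key = itemgetter(*keys)
print(sort_key(rows[0]))  # ('R1', 'G1', 'C1', 'I2')

ordered = sorted(rows, key=sort_key)
print([row["item"] for row in ordered])  # ['I1', 'I2']
```

So with pre-sorting, I1 would come before I2 in the result, unlike the expected output in the question. Sorting only on keys[:-1] should keep the original item order, since sorted is stable.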

Result of first version for data as provided in the question:

result = {
   "regions": [
      {
         "name": "R1",
         "groups": [
            {
               "name": "G1",
               "categories": [
                  {"name": "C1", "items": [{"name": "I2"}, {"name": "I1"}]}
               ]
            },
            {
               "name": "G2",
               "categories": [
                  {"name": "C2", "items": [{"name": "I3"}]}
               ]
            }
         ]
      },
      {
         "name": "R2",
         "groups": [
            {
               "name": "G1",
               "categories": [
                  {"name": "C1", "items": [{"name": "I1"}]}
               ]
            },
            {
               "name": "G2",
               "categories": [
                  {"name": "C2", "items": [{"name": "I3"}, {"name": "I4"}]},
                  {"name": "C3", "items": [{"name": "I5"}]}
               ]
            }
         ]
      }
   ]
}