Group a list of dicts by two keys and count the grouped values

I have a list of dicts with ID numbers. I need to group it by main_id and second_id and count the items in each group. What is the best way to do this in Python?

I tried pandas, but I don't get a list of dicts with the groups and their counts:

df = pd.DataFrame(data_list)
df2 = df.groupby('main_id').apply(lambda x: x.set_index('main_id')['second_id']).to_dict()
print(df2)

The list looks like:

[
    {
        "main_id":34,
        "second_id":"2149"
    },
    {
        "main_id":82,
        "second_id":"174"
    },
    {
        "main_id":24,
        "second_id":"4QCp"
    },
    {
        "main_id":34,
        "second_id":"2149"
    },
    {
        "main_id":29,
        "second_id":"126905"
    },
    {
        "main_id":34,
        "second_id":"2764"
    },
    {
        "main_id":43,
        "second_id":"16110"
    }
]

I need a result like:

[
{
    "main_id":43,
    "second_id":"16110",
    "count": 1
},
{
    "main_id":34,
    "second_id":"2149",
    "count": 2
}
]

CodePudding user response：

You could use collections.Counter (from the standard library) instead of pandas. I assigned the list of dicts to xs:

import collections

# create a list of tuples; each is (main_id, second_id)
ids = [(x['main_id'], x['second_id']) for x in xs]

# count occurrences of each tuple
result = collections.Counter(ids)

Finally, result is a Counter (a dict subclass) that maps each (main_id, second_id) tuple to its count; it can readily be converted to the final form. Here it is for the sample list:

Counter({(34, '2149'): 2,
         (82, '174'): 1,
         (24, '4QCp'): 1,
         (29, '126905'): 1,
         (34, '2764'): 1,
         (43, '16110'): 1})
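
The conversion to the list-of-dicts form asked for in the question could look something like the sketch below. It assumes the sample list from the question is bound to xs, as in the answer above; the names counts and grouped are only illustrative.

```python
import collections

# Sample input from the question, assumed to be bound to the name xs.
xs = [
    {"main_id": 34, "second_id": "2149"},
    {"main_id": 82, "second_id": "174"},
    {"main_id": 24, "second_id": "4QCp"},
    {"main_id": 34, "second_id": "2149"},
    {"main_id": 29, "second_id": "126905"},
    {"main_id": 34, "second_id": "2764"},
    {"main_id": 43, "second_id": "16110"},
]

# Count how often each (main_id, second_id) pair occurs.
counts = collections.Counter((x["main_id"], x["second_id"]) for x in xs)

# Unpack each tuple key back into a dict and attach its count.
grouped = [
    {"main_id": main_id, "second_id": second_id, "count": n}
    for (main_id, second_id), n in counts.items()
]

print(grouped)
```

On Python 3.7 and later a Counter keeps its keys in first-seen order, so grouped follows the order in which each pair first appears in xs. Use counts.most_common() in place of counts.items() if the most frequent groups should come first.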

CodePudding user response：

You can group by both columns, use the groupby size() method to count the rows in each group, and convert the result back to a list of dicts:

out = pd.DataFrame(data_list).groupby(['main_id', 'second_id']).size().reset_index(name='count').to_dict('records')

Output:

[{'main_id': 24, 'second_id': '4QCp', 'count': 1},
 {'main_id': 29, 'second_id': '126905', 'count': 1},
 {'main_id': 34, 'second_id': '2149', 'count': 2},
 {'main_id': 34, 'second_id': '2764', 'count': 1},
 {'main_id': 43, 'second_id': '16110', 'count': 1},
 {'main_id': 82, 'second_id': '174', 'count': 1}]