Home > Net >  Pandas GroupBy to Aggregate Values into Dictionary with Keys from one column and Values based on ano
Pandas GroupBy to Aggregate Values into Dictionary with Keys from one column and Values based on ano

Time:07-26

I'm trying to create an "aggregated" dictionary/JSON-like object inside a cell of a DataFrame where the keys are the values according to another column (for a particular group). Will illustrate this with an example below.

To begin, I'm starting with an table I've already called .explode on.

Starting Table: start_df

child_id parent_id parent_name tag_id tag_name
12345 100 'Zack' 99 'Builder'
12345 101 'Cody' 67 'Plumber'
12345 102 'Jill' 43 'Doctor'
12345 102 'Jill' 47 'Harvard'
67890 108 'Emily' 31 'Artist'
67890 102 'Jill' 43 'Doctor'
67890 102 'Jill' 47 'Harvard'

For context, parents contain tags (properties) that a child will inherit. My goal is to create

Goal Table: output_df

child_id parent_id_list parent_desc_dict full_tag_id_list tag_ancestry
12345 [100, 101, 102] {100: 'Zack', 101: 'Cody', 102: 'Jill' } [99, 67, 43, 47] {100: [99], 101: [67], 102: [43, 47] }
67890 [102, 108] {102: 'Jill', 108: 'Emily' } [43, 47, 31] {102: [43, 47], 108: [31] }

For the lists parent_id_list and full_tag_id_list, I learned from here that we can do something like

start_df.groupby([collection_id], as_index = False).agg({'parent_id': list, 'tag_id': list})

...but how do we generate the dictionaries parent_desc_dict and tag_ancestry? Simply replacing list with dict won't do the job since that takes the row number as the dictionary's Key instead of the parent_id. I figured the way to do this is with lambda functions somehow but not sure how to integrate it with the .agg.

CodePudding user response:

Let us use custom agg func with groupby apply:

def agg(g):
    return pd.Series({
        'parent_id_list': [*g['parent_id'].unique()],
        'parent_desc_dict': dict(zip(g['parent_id'], g['parent_name'])),
        'full_tag_id_list': [*g['tag_id']],
        'tag_ancestry': g.groupby('parent_id')['tag_id'].agg(list).to_dict()
    })


df.groupby('child_id').apply(agg)

           parent_id_list                        parent_desc_dict  full_tag_id_list                           tag_ancestry
child_id                                                                                                                         
12345     [100, 101, 102]  {100: 'Zack', 101: 'Cody', 102: 'Jill'}  [99, 67, 43, 47]  {100: [99], 101: [67], 102: [43, 47]}
67890          [108, 102]              {108: 'Emily', 102: 'Jill'}      [31, 43, 47]             {102: [43, 47], 108: [31]}
  • Related