Would like to create a nested list of lists within one table according to a matching key in another table. To illustrate this, I've anonymized the PII data and built a small example of what I'm trying to do.
I'm starting with a table that lists all parent_id's for a given child_id stored as a list within a cell in the parent_id_list column of a Pandas dataframe. For context, parents contain tags that a child will inherit.
Table 1 (child_parent_relations_df)
child_id | parent_id_list |
---|---|
12345 | [100, 101, 102] |
67890 | [102, 108] |
In another table that has had .explode applied to it and been joined to another table that maps parent_id to the tags that they contain (as a list in parent_tag_ids).
Table 2 (parent_id_with_tags_df)
My goal is to create a clean table that has the parent_id kept as a list (as in Table 1) but now to condense the tag_id information in Table 2 into a nested list.
Goal Table
child_id | parent_id_list | inherited_tag_list |
---|---|---|
12345 | [100, 101, 102] | [[99], [67], [43, 47, 48]] |
67890 | [102, 108] | [[43, 47, 48], [31, 32, 33, 34]] |
I've thought about how to do this "from scratch" using loops but it seems really inefficient to have to do this for each child and then for each parent, concatenate the list of tags. Is there an easy way to do this using pandas functions?
CodePudding user response:
You can explode child_parent_relations_df
parent_id_list
column to rows then merge
parent_tag_ids
column from parent_id_with_tags_df
and groupby.agg
child_parent_relations_df = child_parent_relations_df.explode('parent_id_list', ignore_index=True)
out = child_parent_relations_df.merge(parent_id_with_tags_df,
left_on=['child_id', 'parent_id_list'],
right_on=['child_id', 'parent_id'],
how='left')
out = out.groupby(['child_id'], as_index=False).agg({'parent_id_list': list, 'parent_tag_ids': list})