I am not quite sure how to explain the desired steps, but the result I know. I have two data frames;
df1 is a data frame with two columns one is Unique IDs the other is unique descriptions 24 rows in total
id | description_1 |
---|---|
0 | desc a |
1 | desc b |
2 | desc c |
3 | desc d |
4 | desc e |
5 | desc f |
the table goes on
The next data frame is also two columns, the first being different descriptions that need to which the descriptions of df1 need to be matched. The next column consists of the matching IDs from df1 this column has non-unique values and also tuples, 190 rows in total.
description_2 | match |
---|---|
desc aa | 2 |
desc bb | 3 |
desc cc | 0 |
desc dd | (1, 4, 5) |
desc ee | 1 |
desc ff | (0, 3, 5) |
the desired result would be df3 as follows and would also have 190 rows.
description_2 | match | match_description |
---|---|---|
desc aa | 2 | desc c |
desc bb | 3 | desc d |
desc cc | 0 | desc a |
desc dd | (1, 4, 5) | desc b, desc e, desc f |
desc ee | 1 | desc b |
desc ff | (0, 3, 5) | desc a, desc d, desc f |
CodePudding user response:
Use merge
and explode
import ast
df2.assign(match=df2['match'].apply(ast.literal_eval))\
.explode('match', ignore_index=True)\
.merge(d, how='left', left_on= 'match', right_on='id')\
.groupby('description_2').agg({'match': tuple, 'description_1': ', '.join})
match description_1
desc aa (2,) desc c
desc bb (3,) desc d
desc cc (0,) desc a
desc dd (1, 4, 5) desc b, desc e, desc f
desc ee (1,) desc b
desc ff (0, 3, 5) desc a, desc d, desc f
The ast.literal_eval
step is to ensure you have tuples
in your cells in your match
column. This enabled explode
to be used.