I have two pandas dataframes and I need to do a join by the external_id
field to get the product_id
and to insert in the first dataframe according to the column external_id
, the detail is that I need to create a tuple, because a column external_id
can have more than one product.
That is, the expected output of this new column should be, for example: (1, 2, 3)
So I'll know that that external_id
has more than one product.
Dataframe 1 - who will receive the join:
id | external_id | collumn1 | collumn2 |
---|---|---|---|
1 | a43505 | Example | 1 |
2 | 11b737 | Example | 1 |
3 | 3 | Example | 1 |
4 | lb22 | Example | 1 |
5 | 2 | Example | 1 |
Dataframe 2 - Dimension:
product_id | external_id | product_name |
---|---|---|
1 | a43505 | Product 1 |
2 | c911d8 | Product 2 |
3 | 11b737 | Product 3 |
4 | a43505 | Product 4 |
5 | 5b1381 | Product 5 |
6 | a43505 | Product 6 |
Expected:
id | external_id | collumn1 | collumn2 | product_id |
---|---|---|---|---|
1 | a43505 | Example | 1 | (1, 4, 6) |
2 | 11b737 | Example | 1 | (3) |
3 | 5b1381 | Example | 1 | (5) |
4 | lb22 | Example | 1 | () |
5 | 2 | Example | 1 | () |
CodePudding user response:
here is one way :
df_grp_dim = df_dim.groupby('external_id')['product_id'].aggregate(list)
out = df.join(df_grp_dim,on='external_id')
output:
>>>
id external_id collumn1 collumn2 product_id
0 1 a43505 Example 1 [1, 4, 6]
1 2 11b737 Example 1 [3]
2 3 3 Example 1 NaN
3 4 lb22 Example 1 NaN
4 5 2 Example 1 NaN
CodePudding user response:
You could use map
:
df1["product_id"] = df1["external_id"].map(df2.groupby("external_id")["product_id"].agg(tuple))
>>> df1
id external_id collumn1 collumn2 product_id
0 1 a43505 Example 1 (1, 4, 6)
1 2 11b737 Example 1 (3,)
2 3 3 Example 1 NaN
3 4 lb22 Example 1 NaN
4 5 2 Example 1 NaN