Home > Software engineering >  Join with Concact to Create a Tuple Column in Pandas
Join with Concact to Create a Tuple Column in Pandas

Time:05-02

I have two pandas dataframes and I need to do a join by the external_id field to get the product_id and to insert in the first dataframe according to the column external_id, the detail is that I need to create a tuple, because a column external_id can have more than one product.

That is, the expected output of this new column should be, for example: (1, 2, 3)

So I'll know that that external_id has more than one product.

Dataframe 1 - who will receive the join:

id external_id collumn1 collumn2
1 a43505 Example 1
2 11b737 Example 1
3 3 Example 1
4 lb22 Example 1
5 2 Example 1

Dataframe 2 - Dimension:

product_id external_id product_name
1 a43505 Product 1
2 c911d8 Product 2
3 11b737 Product 3
4 a43505 Product 4
5 5b1381 Product 5
6 a43505 Product 6

Expected:

id external_id collumn1 collumn2 product_id
1 a43505 Example 1 (1, 4, 6)
2 11b737 Example 1 (3)
3 5b1381 Example 1 (5)
4 lb22 Example 1 ()
5 2 Example 1 ()

CodePudding user response:

here is one way :

df_grp_dim = df_dim.groupby('external_id')['product_id'].aggregate(list)
out = df.join(df_grp_dim,on='external_id')

output:

>>>
   id external_id collumn1  collumn2 product_id
0   1      a43505  Example         1  [1, 4, 6]
1   2      11b737  Example         1        [3]
2   3           3  Example         1        NaN
3   4        lb22  Example         1        NaN
4   5           2  Example         1        NaN

CodePudding user response:

You could use map:

df1["product_id"] = df1["external_id"].map(df2.groupby("external_id")["product_id"].agg(tuple))

>>> df1
   id external_id collumn1  collumn2 product_id
0   1      a43505  Example         1  (1, 4, 6)
1   2      11b737  Example         1       (3,)
2   3           3  Example         1        NaN
3   4        lb22  Example         1        NaN
4   5           2  Example         1        NaN
  • Related