About Pandas Dataframe-CodePudding

I have a question related to Pandas.

In df1 I have a data frame with the id of each seller and their respective names.

In df2 I have the id of the salesmen and their respective sales.

I would like to have in the df2, two new columns with the first name and last names of the salesmen.

PS. in df2 one of the sales is shared between two vendors.

import pandas as pd

vendors = {'first_name': ['Montgomery', 'Dagmar', 'Reeba', 'Shalom', 'Broddy', 'Aurelia'],
         'last_name': ['Humes', 'Elstow', 'Wattisham', 'Alen', 'Keningham', 'Brechin'],
         'id_vendor': [127, 241, 329, 333, 212, 233]}

sales = {'id_vendor': [['127'], ['241'], ['329, 333'], ['212'], ['233']],
         'sales': [1233, 25000, 8555, 4333, 3222]}

df1 = pd.DataFrame(vendors)
df2 = pd.DataFrame(sales)

I attach the code. any suggestions?`

Thank you in advance.

CodePudding user response：

You can merge df1 with df2 with the exploded id_vendors column and use DataFrame.GroupBy.agg when grouping by sales to obtain the columns as you want:

transform_names = lambda x: ', '.join(list(x))

res = (df1.merge(df2.explode('id_vendor')).
       groupby('sales').
       agg({'first_name': transform_names, 'last_name': transform_names, 
            'id_vendor': list})
      )

print(res)

          first_name        last_name   id_vendor
sales                                            
1233      Montgomery            Humes       [127]
3222         Aurelia          Brechin       [233]
4333          Broddy        Keningham       [212]
8555   Reeba, Shalom  Wattisham, Alen  [329, 333]
25000         Dagmar           Elstow       [241]

Note:

In your example, id_vendors in df2 is populated by lists of strings, but since id_vendor in df1 is of integer type, I assume that it was a typo. If id_vendors is indeed containing lists of strings, you need to also convert the strings to integers:

transform_names = lambda x: ', '.join(list(x))

# Notice the .astype(int) call.
res = (df1.merge(df2.explode('id_vendor').astype(int)).
       groupby('sales').
       agg({'first_name': transform_names, 'last_name': transform_names, 
            'id_vendor': list})
      )

print(res)