*I edited the question a little based on the answers I received!
This is my first data frame. In reality, it has over 30K columns = df
cl_id | gained_from_sup | lost_to_sup | fixed_term |
---|---|---|---|
1001 | 3 | 5 | NaN |
1002 | 1 | 1 | NaN |
1003 | 4 | 2 | 12 |
1004 | 1 | 3 | 6 |
This is my second dataframe which has the supplier ID - this matches the numbers in gained from Sup and lost to Sup to the corresponding supplier name = df_supp
supplier_id | supplier_name |
---|---|
1 | KFC |
2 | Costa |
3 | Nandos |
4 | Starbucks |
5 | Burger King |
After merging them using pd.merge and how’left’, left on gained from… right on supp Id..
It outputs the first df with df_supp to the end. It does match the keys correctly but is there a way to replace the iDS in df with the names from df2 since the table will not make sense when 2 Columns (gained_from and lost_from in df1 the supp_id when merged only matches the gained on.
This is my desired output:
cl_id | gained_from_sup | lost_to_sup | fixed_term |
---|---|---|---|
1001 | Nandos | Burger King. | NaN |
1002 | KFC | KFC. | NaN |
1003 | Starbucks | Costa. | 12 |
1004 | KFC | Nandos. | 6 |
Or is there a better way to reformat the table to show the corresponding IDs and names for both gained_from_sup and lost_from_sup?
CodePudding user response:
DataFrame.replace
df1.replace(df2.set_index('supplier_id')['supplier_name'])
If you want to perform replacement in only subset of columns:
cols = ['gained_from_sup', 'lost_to_sup']
df1.assign(**df1[cols].replace(df2.set_index('supplier_id')['supplier_name']))
cl_id gained_from_sup lost_to_sup
0 1001 Nandos Burger King
1 1002 KFC KFC
2 1003 Starbucks Costa
3 1004 KFC Nandos
CodePudding user response:
You can use a map
here, creating a dictionary of df_supp
:
df["gained_from_sup"] = df["gained_from_sup"].map(df_supp.set_index("supplier_id")["supplier_name"].to_dict())
df["lost_to_sup"] = df["lost_to_sup"].map(df_supp.set_index("supplier_id")["supplier_name"].to_dict())
CodePudding user response:
# setup
import pandas as pd
import numpy as np
df = pd.DataFrame({'cl_id' : [1001, 1002, 1003, 1004], 'gained_from_sup': [3, 1, 4, 1], 'lost_to_sup': [5, 1, 2, np.nan], 'fixed_term': [np.nan, np.nan, 12,6]})
df_supp = pd.DataFrame({'supplier_id': [1, 2, 3, 4, 5], 'supplier_name': ['KFC', 'Costa', 'Nandos', 'Starbucks', 'Burger King']})
# first merge
intermediate = df.merge(df_supp, left_on='gained_from_sup', right_on='supplier_id', how='left').drop(columns=['gained_from_sup', 'supplier_id'])
intermediate = intermediate.rename(columns= {'supplier_name': 'gained_from_sup'})
# second merge
final = intermediate.merge(df_supp, left_on='lost_to_sup', right_on='supplier_id', how='left').drop(columns=['lost_to_sup', 'supplier_id'])
final = final.rename(columns= {'supplier_name': 'lost_from_sup'})
# display result
final
CodePudding user response:
Best way to convert 2nd dataframe to dictionary and then map it.
cols = ['gained_from_sup', 'lost_to_sup']
lookup_name = df2.set_index('supplier_id')['supplier_name'].to_dict()
df1[cols] = df1[cols].replace(lookup_name)
Another method by not selecting columns
lookup_name = df2.set_index('supplier_id')['supplier_name'].to_dict()
df1 = df1.replace(lookup_name)