Is there a way to replace columns in dataframes?

Time:05-29

*I edited the question a little based on the answers I received!

This is my first dataframe, df (in reality it has over 30K columns):

cl_id  gained_from_sup  lost_to_sup  fixed_term
1001   3                5            NaN
1002   1                1            NaN
1003   4                2            12
1004   1                3            6

This is my second dataframe, df_supp, which holds the supplier IDs; it matches the numbers in gained_from_sup and lost_to_sup to the corresponding supplier name:

supplier_id  supplier_name
1            KFC
2            Costa
3            Nandos
4            Starbucks
5            Burger King

After merging them using pd.merge with how='left', left_on='gained_from_sup', right_on='supplier_id':

It outputs the first df with df_supp appended at the end. It does match the keys correctly, but is there a way to replace the IDs in df with the names from df_supp? As it stands the table will not make sense: df has two ID columns (gained_from_sup and lost_to_sup), and the merge only matches on gained_from_sup.

This is my desired output:

cl_id  gained_from_sup  lost_to_sup  fixed_term
1001   Nandos           Burger King  NaN
1002   KFC              KFC          NaN
1003   Starbucks        Costa        12
1004   KFC              Nandos       6

Or is there a better way to reformat the table to show the corresponding IDs and names for both gained_from_sup and lost_to_sup?

CodePudding user response:

You can use DataFrame.replace, passing a Series indexed by supplier_id with supplier_name as the values:

df1.replace(df2.set_index('supplier_id')['supplier_name'])

If you want to perform the replacement in only a subset of columns:

cols = ['gained_from_sup', 'lost_to_sup']    
df1.assign(**df1[cols].replace(df2.set_index('supplier_id')['supplier_name']))

   cl_id gained_from_sup  lost_to_sup
0   1001          Nandos  Burger King
1   1002             KFC          KFC
2   1003       Starbucks        Costa
3   1004             KFC       Nandos
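For completeness, here is a self-contained sketch of the whole-frame replace (data reconstructed from the question). One caveat: replace scans every column, so an ID value that also happened to appear in cl_id or fixed_term would be swapped as well, which is why restricting to a subset of columns is safer.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the question
df1 = pd.DataFrame({
    "cl_id": [1001, 1002, 1003, 1004],
    "gained_from_sup": [3, 1, 4, 1],
    "lost_to_sup": [5, 1, 2, 3],
    "fixed_term": [np.nan, np.nan, 12, 6],
})
df2 = pd.DataFrame({
    "supplier_id": [1, 2, 3, 4, 5],
    "supplier_name": ["KFC", "Costa", "Nandos", "Starbucks", "Burger King"],
})

# A Series passed to replace() acts as an {old: new} mapping:
# its index (the supplier IDs) are the values to find,
# its data (the names) are the replacements.
out = df1.replace(df2.set_index("supplier_id")["supplier_name"])
print(out)
```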

CodePudding user response:

You can use map here, building a dictionary from df_supp once:

mapping = df_supp.set_index("supplier_id")["supplier_name"].to_dict()
df["gained_from_sup"] = df["gained_from_sup"].map(mapping)
df["lost_to_sup"] = df["lost_to_sup"].map(mapping)
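One behavioural difference worth knowing between map and replace (a small made-up example, not from the question's data): map returns NaN for any ID missing from the lookup, whereas replace leaves unmatched values as they are.

```python
import pandas as pd

lookup = {1: "KFC", 2: "Costa"}   # hypothetical partial lookup
s = pd.Series([1, 2, 99])         # 99 has no entry in the lookup

mapped = s.map(lookup)        # missing key -> NaN
replaced = s.replace(lookup)  # missing key -> left unchanged (99)
```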

CodePudding user response:

# setup
import pandas as pd
import numpy as np
df = pd.DataFrame({'cl_id': [1001, 1002, 1003, 1004], 'gained_from_sup': [3, 1, 4, 1], 'lost_to_sup': [5, 1, 2, 3], 'fixed_term': [np.nan, np.nan, 12, 6]})
df_supp = pd.DataFrame({'supplier_id': [1, 2, 3, 4, 5], 'supplier_name': ['KFC', 'Costa', 'Nandos', 'Starbucks', 'Burger King']})
# first merge
intermediate = df.merge(df_supp, left_on='gained_from_sup', right_on='supplier_id', how='left').drop(columns=['gained_from_sup', 'supplier_id'])
intermediate = intermediate.rename(columns= {'supplier_name': 'gained_from_sup'})
# second merge
final = intermediate.merge(df_supp, left_on='lost_to_sup', right_on='supplier_id', how='left').drop(columns=['lost_to_sup', 'supplier_id'])
final = final.rename(columns={'supplier_name': 'lost_to_sup'})
# display result
final
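The two merges follow the same pattern, so if more ID columns shared the same lookup they could be folded into a loop (a sketch reusing the setup above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cl_id": [1001, 1002, 1003, 1004],
    "gained_from_sup": [3, 1, 4, 1],
    "lost_to_sup": [5, 1, 2, 3],
    "fixed_term": [np.nan, np.nan, 12, 6],
})
df_supp = pd.DataFrame({
    "supplier_id": [1, 2, 3, 4, 5],
    "supplier_name": ["KFC", "Costa", "Nandos", "Starbucks", "Burger King"],
})

# Merge once per ID column, dropping the numeric column and renaming
# the joined supplier_name back to the original column name.
for col in ["gained_from_sup", "lost_to_sup"]:
    df = (
        df.merge(df_supp, left_on=col, right_on="supplier_id", how="left")
          .drop(columns=[col, "supplier_id"])
          .rename(columns={"supplier_name": col})
    )
```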

CodePudding user response:

The best way is to convert the 2nd dataframe to a dictionary and then replace with it:

cols = ['gained_from_sup', 'lost_to_sup']
lookup_name = df2.set_index('supplier_id')['supplier_name'].to_dict()
df1[cols] = df1[cols].replace(lookup_name)

Another method, applied to the whole frame rather than selected columns:

lookup_name = df2.set_index('supplier_id')['supplier_name'].to_dict()
df1 = df1.replace(lookup_name)