Home > Blockchain >  Merging data between two dataframes for multiple columns
Merging data between two dataframes for multiple columns

Time:10-01

I've got two Excel Spreadsheets. The first has multiple columns, of which three (named differently) have common data. My second spreadsheet contains a key along with the data I want to bring into the first spreadsheet. For example:

Table 1:

Target Company ID Subsidiary ID Parent Company ID
1 2 3

Table 2:

Company ID Company Size
1 Small
2 Medium
3 Large

With Excel I would do a series of Vlookups, but Table 1 is huge and is taking a long time to calculate. I would like to generate a Table such as:

Target Company ID Target Size Subsidiary ID Sub Size Parent Company ID Parent Size
1 Small 2 Medium 3 Large

I've tried pd.merge, but it is not as targeted as a vlookup would be in assign a value to the specific column

CodePudding user response:

You can make a mapping of Company ID to Company Size from df2 by .set_index() and specify the column.

Then, use .map() to map (like looking up) the Company ID in df1 to the mapping for each column, as follows:

mapping = df2.drop_duplicates('Company ID').set_index('Company ID')['Company Size']

for col in df1.columns:
    df1[f'{col.split()[0]} Size'] = df1[col].map(mapping)

Result:

print(df1)

   Target Company ID  Subsidiary ID  Parent Company ID Target Size Subsidiary Size Parent Size
0                  1              2                  3       Small          Medium       Large

In case you have other columns in the dataframes, you can select the related columns by .filter() for all columns with column names end with ID, as follows:

mapping = df2.drop_duplicates('Company ID').set_index('Company ID')['Company Size']

# select column automatically by `.filter` or enter manually as a list
cols = df1.filter(regex=r'ID$').columns    

for col in cols:
    df1[f'{col.split()[0]} Size'] = df1[col].map(mapping)

In case you want to put the related columns together, you can further use .sort_index(), as follows:

df1 = df1.sort_index(axis=1)

Result:

print(df1)

   Parent Company ID Parent Size  Subsidiary ID Subsidiary Size  Target Company ID Target Size
0                  3       Large              2          Medium                  1       Small

CodePudding user response:

basic you want to marge the tables by left join 3 times each time tack a different column or just treat the 2 table as 3 tables like the following

table_1 = #read_file
table_2 = #read_file
targets = table_2[['Target Company ID','Target Size']]
Subsidiarys = table_2[['Subsidiary Company ID','Sub Size']]
Parents = table_2[['Parent Company ID','Parent Size']]
comb_table = pd.merge(table_1,targets,on=['Target Company ID'],how='left')
comb_table = pd.merge(comb_table ,Subsidiarys ,on=['Subsidiary Company ID'],how='left')
comb_table = pd.merge(comb_table ,Parents , on=['Parent Company ID'],how='left')
  • Related