I've got two Excel Spreadsheets. The first has multiple columns, of which three (named differently) have common data. My second spreadsheet contains a key along with the data I want to bring into the first spreadsheet. For example:
Table 1:
Target Company ID | Subsidiary ID | Parent Company ID |
---|---|---|
1 | 2 | 3 |
Table 2:
Company ID | Company Size |
---|---|
1 | Small |
2 | Medium |
3 | Large |
With Excel I would do a series of Vlookups, but Table 1 is huge and is taking a long time to calculate. I would like to generate a Table such as:
Target Company ID | Target Size | Subsidiary ID | Sub Size | Parent Company ID | Parent Size |
---|---|---|---|---|---|
1 | Small | 2 | Medium | 3 | Large |
I've tried pd.merge
, but it is not as targeted as a vlookup
would be in assign a value to the specific column
CodePudding user response:
You can make a mapping of Company ID
to Company Size
from df2
by .set_index()
and specify the column.
Then,
use .map()
to map (like looking up) the Company ID
in df1
to the mapping for each column, as follows:
mapping = df2.drop_duplicates('Company ID').set_index('Company ID')['Company Size']
for col in df1.columns:
df1[f'{col.split()[0]} Size'] = df1[col].map(mapping)
Result:
print(df1)
Target Company ID Subsidiary ID Parent Company ID Target Size Subsidiary Size Parent Size
0 1 2 3 Small Medium Large
In case you have other columns in the dataframes, you can select the related columns by .filter()
for all columns with column names end with ID
, as follows:
mapping = df2.drop_duplicates('Company ID').set_index('Company ID')['Company Size']
# select column automatically by `.filter` or enter manually as a list
cols = df1.filter(regex=r'ID$').columns
for col in cols:
df1[f'{col.split()[0]} Size'] = df1[col].map(mapping)
In case you want to put the related columns together, you can further use .sort_index()
, as follows:
df1 = df1.sort_index(axis=1)
Result:
print(df1)
Parent Company ID Parent Size Subsidiary ID Subsidiary Size Target Company ID Target Size
0 3 Large 2 Medium 1 Small
CodePudding user response:
basic you want to marge the tables by left join 3 times each time tack a different column or just treat the 2 table as 3 tables like the following
table_1 = #read_file
table_2 = #read_file
targets = table_2[['Target Company ID','Target Size']]
Subsidiarys = table_2[['Subsidiary Company ID','Sub Size']]
Parents = table_2[['Parent Company ID','Parent Size']]
comb_table = pd.merge(table_1,targets,on=['Target Company ID'],how='left')
comb_table = pd.merge(comb_table ,Subsidiarys ,on=['Subsidiary Company ID'],how='left')
comb_table = pd.merge(comb_table ,Parents , on=['Parent Company ID'],how='left')