I want to update dataframe 01 with the help of dataframe 02 and try to achieve the result as shown in the Output table.
Right now I am doing this with the help of Excel by applying vlookup inside vlookup [u can see in attached image on formula bar ] but it's a lengthy and time consuming process so I want to achieve this result using python.
step 01 : I apply vlookup inside dataframe 01 within B3 cell and map the value from dataframe 02 from column 3, and then once I got the value then I again apply the vlookup on the same value and map the value from dataframe 02 form column 2, and pull the employee code and then same. process for renaming columns to pull employee ids.
How I can achieve this using python.enter image description here
|Dataframe 01|
DPNO | UNIT MANAGER'S EMP_ID | Sales Manager's EMP_ID |
---|---|---|
DP - 1183800 | ||
DP - 1183800 | ||
DP - 1143367 | ||
DP - 1171318 | ||
DP - 808400 | ||
DP - 1143367 | ||
DP - 1204783 |
|Dataframe 02|
DP No | Employee Code | Unit Manager's Dp No | Sales Manager's Dp No |
---|---|---|---|
DP - 1080416 | INV879 | ||
DP - 1115172 | INV944 | DP - 1080416 | |
DP - 1851410 | INV1426 | DP - 1080416 | |
DP - 808400 | DP - 1851410 | DP - 1080416 | |
DP - 1143367 | DP - 1115172 | DP - 1080416 | |
DP - 1171318 | DP - 1115172 | DP - 1080416 | |
DP - 1183800 | DP - 1115172 | DP - 1080416 | |
DP - 1204783 | DP - 1115172 | DP - 1080416 |
|I want this type of OUTPUT Result|
DPNO | UNIT MANAGER'S EMP_ID | Sales Manager's EMP_ID |
---|---|---|
DP - 1183800 | INV944 | INV879 |
DP - 1183800 | INV944 | INV879 |
DP - 1143367 | INV944 | INV879 |
DP - 1171318 | INV944 | INV879 |
DP - 808400 | INV1426 | INV879 |
DP - 1143367 | INV944 | INV879 |
DP - 1204783 | INV944 | INV879 |
CodePudding user response:
You can solve this problem by doing a self-merge between dataframe #2 and itself. Specifically, you ask Pandas to match values in the "Unit Manager's Dp No" column with values in the "DP No" column. Once you have this mapping of DP No to managers employee ID, you can merge between dataframe #1 and the mapping.
It could also be done in the other order, but this seemed simpler to me.
Here's what the data looks like when imported into Pandas:
df1:
DPNO UNIT MANAGER'S EMP_ID Sales Manager's EMP_ID
0 DP - 1183800 NaN NaN
1 DP - 1183800 NaN NaN
2 DP - 1143367 NaN NaN
3 DP - 1171318 NaN NaN
4 DP - 808400 NaN NaN
5 DP - 1143367 NaN NaN
6 DP - 1204783 NaN NaN
df2:
DP No Employee Code Unit Manager's Dp No Sales Manager's Dp No
0 DP - 1080416 INV879 NaN NaN
1 DP - 1115172 INV944 NaN DP - 1080416
2 DP - 1851410 INV1426 NaN DP - 1080416
3 DP - 808400 NaN DP - 1851410 DP - 1080416
4 DP - 1143367 NaN DP - 1115172 DP - 1080416
5 DP - 1171318 NaN DP - 1115172 DP - 1080416
6 DP - 1183800 NaN DP - 1115172 DP - 1080416
7 DP - 1204783 NaN DP - 1115172 DP - 1080416
Here's the code which does the lookup:
def lookup_manager(manager_df, output_col_name, type_="unit"):
if type_ == "unit":
lookup_col = "Unit Manager's Dp No"
elif type_ == "sales":
lookup_col = "Sales Manager's Dp No"
else:
raise Exception(f"Unknown manager type {type_}")
manager_df = manager_df.merge(manager_df, how="left", left_on=lookup_col, right_on="DP No")
manager_df = manager_df[["DP No_x", "Employee Code_y"]]
manager_df.columns = ["DPNO", output_col_name]
return manager_df
unit_managers = lookup_manager(df2, "UNIT MANAGER'S EMP_ID", "unit")
sales_managers = lookup_manager(df2, "Sales Manager's EMP_ID", "sales")
df_combined = df1[['DPNO']]
df_combined = df_combined.merge(unit_managers, how="left", left_on="DPNO", right_on="DPNO")
df_combined = df_combined.merge(sales_managers, how="left", left_on="DPNO", right_on="DPNO")
Here's the output:
DPNO UNIT MANAGER'S EMP_ID Sales Manager's EMP_ID
0 DP - 1183800 INV944 INV879
1 DP - 1183800 INV944 INV879
2 DP - 1143367 INV944 INV879
3 DP - 1171318 INV944 INV879
4 DP - 808400 INV1426 INV879
5 DP - 1143367 INV944 INV879
6 DP - 1204783 INV944 INV879