Home > Software engineering >  how to apply Vlookup inside Vlookup formula using Python
how to apply Vlookup inside Vlookup formula using Python

Time:08-02

I want to update dataframe 01 with the help of dataframe 02 and try to achieve the result as shown in the Output table.

Right now I am doing this with the help of Excel by applying vlookup inside vlookup [u can see in attached image on formula bar ] but it's a lengthy and time consuming process so I want to achieve this result using python.

step 01 : I apply vlookup inside dataframe 01 within B3 cell and map the value from dataframe 02 from column 3, and then once I got the value then I again apply the vlookup on the same value and map the value from dataframe 02 form column 2, and pull the employee code and then same. process for renaming columns to pull employee ids.

How I can achieve this using python.enter image description here

|Dataframe 01|

DPNO UNIT MANAGER'S EMP_ID Sales Manager's EMP_ID
DP - 1183800
DP - 1183800
DP - 1143367
DP - 1171318
DP - 808400
DP - 1143367
DP - 1204783

|Dataframe 02|

DP No Employee Code Unit Manager's Dp No Sales Manager's Dp No
DP - 1080416 INV879
DP - 1115172 INV944 DP - 1080416
DP - 1851410 INV1426 DP - 1080416
DP - 808400 DP - 1851410 DP - 1080416
DP - 1143367 DP - 1115172 DP - 1080416
DP - 1171318 DP - 1115172 DP - 1080416
DP - 1183800 DP - 1115172 DP - 1080416
DP - 1204783 DP - 1115172 DP - 1080416

|I want this type of OUTPUT Result|

DPNO UNIT MANAGER'S EMP_ID Sales Manager's EMP_ID
DP - 1183800 INV944 INV879
DP - 1183800 INV944 INV879
DP - 1143367 INV944 INV879
DP - 1171318 INV944 INV879
DP - 808400 INV1426 INV879
DP - 1143367 INV944 INV879
DP - 1204783 INV944 INV879

CodePudding user response:

You can solve this problem by doing a self-merge between dataframe #2 and itself. Specifically, you ask Pandas to match values in the "Unit Manager's Dp No" column with values in the "DP No" column. Once you have this mapping of DP No to managers employee ID, you can merge between dataframe #1 and the mapping.

It could also be done in the other order, but this seemed simpler to me.

Here's what the data looks like when imported into Pandas:

df1:

           DPNO  UNIT MANAGER'S EMP_ID  Sales Manager's EMP_ID
0  DP - 1183800                    NaN                     NaN
1  DP - 1183800                    NaN                     NaN
2  DP - 1143367                    NaN                     NaN
3  DP - 1171318                    NaN                     NaN
4   DP - 808400                    NaN                     NaN
5  DP - 1143367                    NaN                     NaN
6  DP - 1204783                    NaN                     NaN

df2:

          DP No Employee Code Unit Manager's Dp No Sales Manager's Dp No
0  DP - 1080416        INV879                  NaN                   NaN
1  DP - 1115172        INV944                  NaN          DP - 1080416
2  DP - 1851410       INV1426                  NaN          DP - 1080416
3   DP - 808400           NaN         DP - 1851410          DP - 1080416
4  DP - 1143367           NaN         DP - 1115172          DP - 1080416
5  DP - 1171318           NaN         DP - 1115172          DP - 1080416
6  DP - 1183800           NaN         DP - 1115172          DP - 1080416
7  DP - 1204783           NaN         DP - 1115172          DP - 1080416

Here's the code which does the lookup:

def lookup_manager(manager_df, output_col_name, type_="unit"):
    if type_ == "unit":
        lookup_col = "Unit Manager's Dp No"
    elif type_ == "sales":
        lookup_col = "Sales Manager's Dp No"
    else:
        raise Exception(f"Unknown manager type {type_}")
    manager_df = manager_df.merge(manager_df, how="left", left_on=lookup_col, right_on="DP No")
    manager_df = manager_df[["DP No_x", "Employee Code_y"]]
    manager_df.columns = ["DPNO", output_col_name]
    return manager_df

unit_managers = lookup_manager(df2, "UNIT MANAGER'S EMP_ID", "unit")
sales_managers = lookup_manager(df2, "Sales Manager's EMP_ID", "sales")
df_combined = df1[['DPNO']]
df_combined = df_combined.merge(unit_managers, how="left", left_on="DPNO", right_on="DPNO")
df_combined = df_combined.merge(sales_managers, how="left", left_on="DPNO", right_on="DPNO")

Here's the output:

           DPNO UNIT MANAGER'S EMP_ID Sales Manager's EMP_ID
0  DP - 1183800                INV944                 INV879
1  DP - 1183800                INV944                 INV879
2  DP - 1143367                INV944                 INV879
3  DP - 1171318                INV944                 INV879
4   DP - 808400               INV1426                 INV879
5  DP - 1143367                INV944                 INV879
6  DP - 1204783                INV944                 INV879
  • Related