Home > Mobile >  How to get 2 data frames matched based on relation between two columns of the two dataframe
How to get 2 data frames matched based on relation between two columns of the two dataframe

Time:02-17

I have two dataframes with data in the form of

Date     Col1 Col2 Col3
1/1/2021  a    b    c
2/1/2021  d    e    f
3/1/2021  g    h    1

Date     Col4 Col5 Col6 
1/1/2021  a    b   c
2/1/2021  d    e   f
3/1/2021  g    h   i

I have a relation that says

Cola Colb
Col1 Col4
Col2 Col5
Col3 Col6

What I am trying to achieve is

Date     Cola Colb 
1/1/2021   a    a
2/1/2021   d    d
3/1/2021   g    g
1/1/2021   b    b
2/1/2021   e    e
3/1/2021   h    h
1/1/2021   c    c
2/1/2021   f    f
3/1/2021   l    l

My original plan was to use unpivot the initial data frame and use merge to get the data, but since the data set is huge, it is not a viable option. Can anyone suggest an efficient way to combine the data frames based on the relation between the columns?

CodePudding user response:

You could convert the mapper_df:

Cola    Colb
0   Col1    Col4
1   Col2    Col5
2   Col3    Col6

to a dictionary and modify the column names of df2. Then stack the DataFrames and join on "Date":

d = mapper_df.set_index('Cola')['Colb'].to_dict()
df2.columns = df2.columns.map({k: d.get(k, k) for k in df2.columns})
out = df1.set_index('Date').stack().to_frame().join(df2.set_index('Date').stack().to_frame(), lsuffix='_Cola', rsuffix='_Colb').droplevel(1)
out = out.rename(columns={c:c.split('_')[1] for c in out.columns})

Output:

         Cola Colb
Date              
1/1/2021    a    a
1/1/2021    b    b
1/1/2021    c    c
2/1/2021    d    d
2/1/2021    e    e
2/1/2021    f    f
3/1/2021    g    g
3/1/2021    h    h
3/1/2021    1    i
  • Related