I have two dataframes with data in the form of
Date Col1 Col2 Col3
1/1/2021 a b c
2/1/2021 d e f
3/1/2021 g h 1
Date Col4 Col5 Col6
1/1/2021 a b c
2/1/2021 d e f
3/1/2021 g h i
I have a relation that says
Cola Colb
Col1 Col4
Col2 Col5
Col3 Col6
What I am trying to achieve is
Date Cola Colb
1/1/2021 a a
2/1/2021 d d
3/1/2021 g g
1/1/2021 b b
2/1/2021 e e
3/1/2021 h h
1/1/2021 c c
2/1/2021 f f
3/1/2021 l l
My original plan was to use unpivot the initial data frame and use merge to get the data, but since the data set is huge, it is not a viable option. Can anyone suggest an efficient way to combine the data frames based on the relation between the columns?
CodePudding user response:
You could convert the mapper_df
:
Cola Colb
0 Col1 Col4
1 Col2 Col5
2 Col3 Col6
to a dictionary and modify the column names of df2
. Then stack
the DataFrames and join
on "Date":
d = mapper_df.set_index('Cola')['Colb'].to_dict()
df2.columns = df2.columns.map({k: d.get(k, k) for k in df2.columns})
out = df1.set_index('Date').stack().to_frame().join(df2.set_index('Date').stack().to_frame(), lsuffix='_Cola', rsuffix='_Colb').droplevel(1)
out = out.rename(columns={c:c.split('_')[1] for c in out.columns})
Output:
Cola Colb
Date
1/1/2021 a a
1/1/2021 b b
1/1/2021 c c
2/1/2021 d d
2/1/2021 e e
2/1/2021 f f
3/1/2021 g g
3/1/2021 h h
3/1/2021 1 i