Merge two pandas dataframes with multiple columns per row-CodePudding

I have a dataframe "df1" that look like this:

company id	company name	dealid_1	dealyear_1	dealid_2	dealyear_2
C1	ABC
C2	DEF

Where I want to fill the blank cells with data from another dataframe "df2" which looks like this:

deal id	deal year	company id	company name
D1	2010	C1	ABC
D2	2015	C1	ABC
D3	2012	C2	DEF
D4	2017	C2	DEF

So the final result for "df1" should be as follows:

company id	company name	dealid_1	dealyear_1	dealid_2	dealyear_2
C1	ABC	D1	2010	D2	2015
C2	DEF	D3	2012	D4	2017

Can anyone please help me with this?

Thank you!

CodePudding user response：

You can use:

df3 = (df2.drop(columns='company name')
          .assign(col=df2.groupby('company name').cumcount().add(1).astype(str))
          .pivot(index='company id', columns='col')
       )
df3.columns = df3.columns.map('_'.join)

out = df1[['company id', 'company name']].merge(df3, on='company id')

output:

  company id company name deal id_1 deal id_2  deal year_1  deal year_2
0         C1          ABC        D1        D2         2010         2015
1         C2          DEF        D3        D4         2012         2017

CodePudding user response：

Use GroupBy.cumcount for counter, pivoting by DataFrame.pivot with sorting second level of MultiIndex by DataFrame.sort_index, last flatten MultiIndex:

df3 = (df2.assign(g = df2.groupby(['company id','company name']).cumcount())
         .pivot(index=['company id','company name'], columns='g')
         .sort_index(axis=1, level=1))
df3.columns = df3.columns.map(lambda x: f'{x[0]}_{x[1]   1}')
print (df3.reset_index())
  company id company name deal id_1  deal year_1 deal id_2  deal year_2
0         C1          ABC        D1         2010        D2         2015
1         C2          DEF        D3         2012        D4         2017

For merge with first df use:

df = df1[['company id', 'company name']].join(df3, on=['company id', 'company name'])