Home > Mobile >  Merge two pandas dataframes with multiple columns per row
Merge two pandas dataframes with multiple columns per row

Time:05-24

I have a dataframe "df1" that look like this:

company id company name dealid_1 dealyear_1 dealid_2 dealyear_2
C1 ABC
C2 DEF

Where I want to fill the blank cells with data from another dataframe "df2" which looks like this:

deal id deal year company id company name
D1 2010 C1 ABC
D2 2015 C1 ABC
D3 2012 C2 DEF
D4 2017 C2 DEF

So the final result for "df1" should be as follows:

company id company name dealid_1 dealyear_1 dealid_2 dealyear_2
C1 ABC D1 2010 D2 2015
C2 DEF D3 2012 D4 2017

Can anyone please help me with this?

Thank you!

CodePudding user response:

You can use:

df3 = (df2.drop(columns='company name')
          .assign(col=df2.groupby('company name').cumcount().add(1).astype(str))
          .pivot(index='company id', columns='col')
       )
df3.columns = df3.columns.map('_'.join)

out = df1[['company id', 'company name']].merge(df3, on='company id')

output:

  company id company name deal id_1 deal id_2  deal year_1  deal year_2
0         C1          ABC        D1        D2         2010         2015
1         C2          DEF        D3        D4         2012         2017

CodePudding user response:

Use GroupBy.cumcount for counter, pivoting by DataFrame.pivot with sorting second level of MultiIndex by DataFrame.sort_index, last flatten MultiIndex:

df3 = (df2.assign(g = df2.groupby(['company id','company name']).cumcount())
         .pivot(index=['company id','company name'], columns='g')
         .sort_index(axis=1, level=1))
df3.columns = df3.columns.map(lambda x: f'{x[0]}_{x[1]   1}')
print (df3.reset_index())
  company id company name deal id_1  deal year_1 deal id_2  deal year_2
0         C1          ABC        D1         2010        D2         2015
1         C2          DEF        D3         2012        D4         2017

For merge with first df use:

df = df1[['company id', 'company name']].join(df3, on=['company id', 'company name'])
  • Related