I have a dataframe "df1" that look like this:
company id | company name | dealid_1 | dealyear_1 | dealid_2 | dealyear_2 |
---|---|---|---|---|---|
C1 | ABC | ||||
C2 | DEF |
Where I want to fill the blank cells with data from another dataframe "df2" which looks like this:
deal id | deal year | company id | company name |
---|---|---|---|
D1 | 2010 | C1 | ABC |
D2 | 2015 | C1 | ABC |
D3 | 2012 | C2 | DEF |
D4 | 2017 | C2 | DEF |
So the final result for "df1" should be as follows:
company id | company name | dealid_1 | dealyear_1 | dealid_2 | dealyear_2 |
---|---|---|---|---|---|
C1 | ABC | D1 | 2010 | D2 | 2015 |
C2 | DEF | D3 | 2012 | D4 | 2017 |
Can anyone please help me with this?
Thank you!
CodePudding user response:
You can use:
df3 = (df2.drop(columns='company name')
.assign(col=df2.groupby('company name').cumcount().add(1).astype(str))
.pivot(index='company id', columns='col')
)
df3.columns = df3.columns.map('_'.join)
out = df1[['company id', 'company name']].merge(df3, on='company id')
output:
company id company name deal id_1 deal id_2 deal year_1 deal year_2
0 C1 ABC D1 D2 2010 2015
1 C2 DEF D3 D4 2012 2017
CodePudding user response:
Use GroupBy.cumcount
for counter, pivoting by DataFrame.pivot
with sorting second level of MultiIndex
by DataFrame.sort_index
, last flatten MultiIndex
:
df3 = (df2.assign(g = df2.groupby(['company id','company name']).cumcount())
.pivot(index=['company id','company name'], columns='g')
.sort_index(axis=1, level=1))
df3.columns = df3.columns.map(lambda x: f'{x[0]}_{x[1] 1}')
print (df3.reset_index())
company id company name deal id_1 deal year_1 deal id_2 deal year_2
0 C1 ABC D1 2010 D2 2015
1 C2 DEF D3 2012 D4 2017
For merge with first df
use:
df = df1[['company id', 'company name']].join(df3, on=['company id', 'company name'])