Given data frames similar to the following:
df1 = pd.DataFrame({'Customer':['Customer1','Customer2','Customer3'],
'Status':[0,1,1]})
    Customer  Status
0  Customer1       0
1  Customer2       1
2  Customer3       1
df2 = pd.DataFrame({'Customer':['Customer1','Customer1','Customer1','Customer2','Customer2','Customer3'],
'Call': ['01-01','01-02','01-03','02-01','03-02','06-01']})
    Customer   Call
0  Customer1  01-01
1  Customer1  01-02
2  Customer1  01-03
3  Customer2  02-01
4  Customer2  03-02
5  Customer3  06-01
What is the most efficient way to merge the two into a third data frame in which the rows from df2 become columns added to df1? In the new data frame, each row should be a unique customer, and the 'Call' values from df2 should be added as incrementing columns, padded with NaN where a customer has fewer calls.
I'd like to end up with something like:
    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN
I assume some combination of stack() and merge() is required, but I can't figure it out.
Any help is appreciated.
CodePudding user response:
Use DataFrame.join with a new DataFrame reshaped by GroupBy.cumcount and Series.unstack:
df = df1.join(df2.set_index(['Customer', df2.groupby('Customer').cumcount().add(1)])['Call']
.unstack().add_prefix('Call_'), 'Customer')
print(df)
    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN
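The one-liner above packs several steps together. As a rough sketch of what each step contributes, here is the same chain split into intermediate variables; the names call_no, calls and wide are only illustrative and do not appear in the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'Customer': ['Customer1', 'Customer2', 'Customer3'],
                    'Status': [0, 1, 1]})
df2 = pd.DataFrame({'Customer': ['Customer1', 'Customer1', 'Customer1',
                                 'Customer2', 'Customer2', 'Customer3'],
                    'Call': ['01-01', '01-02', '01-03', '02-01', '03-02', '06-01']})

# Step 1: number each customer's calls 1, 2, 3, ... in row order.
call_no = df2.groupby('Customer').cumcount().add(1)

# Step 2: index by (Customer, call number) and keep only the Call column.
calls = df2.set_index(['Customer', call_no])['Call']

# Step 3: move the call number from the index into the columns.
# Customers with fewer calls get NaN in the missing positions.
wide = calls.unstack().add_prefix('Call_')

# Step 4: left-join onto df1, matching df1['Customer'] against wide's index.
df = df1.join(wide, on='Customer')
print(df)
```

The call numbering follows the row order of df2, so if the calls should appear chronologically, df2 would need to be sorted before step 1.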
CodePudding user response:
First pivot df2 with a cumcount de-duplication, then merge:
out = df1.merge(df2.assign(n=df2.groupby('Customer').cumcount().add(1))
.pivot(index='Customer', columns='n', values='Call')
.add_prefix('Call_'),
left_on='Customer', right_index=True)
Output:
    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN
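One difference between the two answers may matter on real data: DataFrame.join defaults to how='left', while DataFrame.merge defaults to how='inner'. With the sample data both give the same result, but a customer with no rows in df2 would be dropped by the merge version unless how='left' is passed. A small sketch, assuming a hypothetical 'Customer4' that is not in the original question:

```python
import pandas as pd

# 'Customer4' is a hypothetical customer with no calls in df2.
df1 = pd.DataFrame({'Customer': ['Customer1', 'Customer2', 'Customer3', 'Customer4'],
                    'Status': [0, 1, 1, 0]})
df2 = pd.DataFrame({'Customer': ['Customer1', 'Customer1', 'Customer1',
                                 'Customer2', 'Customer2', 'Customer3'],
                    'Call': ['01-01', '01-02', '01-03', '02-01', '03-02', '06-01']})

# Reshape df2 to one row per customer, one column per call number.
wide = (df2.assign(n=df2.groupby('Customer').cumcount().add(1))
           .pivot(index='Customer', columns='n', values='Call')
           .add_prefix('Call_'))

# Default inner merge: customers without calls are dropped.
inner = df1.merge(wide, left_on='Customer', right_index=True)

# Left merge: every row of df1 is kept, with NaN in all Call_ columns
# for customers that have no calls.
left = df1.merge(wide, left_on='Customer', right_index=True, how='left')

print(len(inner), len(left))
```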