Home > Enterprise >  Pandas. merge/join/concat. Rows into columns
Pandas. merge/join/concat. Rows into columns

Time:02-04

Given data frames similar to the following:

df1 = pd.DataFrame({'Customer':['Customer1','Customer2','Customer3'],
             'Status':[0,1,1]}

Customer        Status
0   Customer1     0
1   Customer2     1
2   Customer3     1

df2 = pd.DataFrame({'Customer':['Customer1','Customer1','Customer1','Customer2','Customer2','Customer3'],
             'Call': ['01-01','01-02','01-03','02-01','03-02','06-01']})

    Customer    Call
0   Customer1   01-01
1   Customer1   01-02
2   Customer1   01-03
3   Customer2   02-01
4   Customer2   03-02
5   Customer3   06-01

What is the most efficient method for me to merge the two into a third data frame in which the rows from df2 become columns added to df1. In the new df each row should be a unique customer and 'Call' from df2 is added as incrementing columns populated by NaN values as required?

I'd like to end up with something like:

    Customer    Status  Call_1  Call_2  Call_3
0   Customer1   0       01-01   01-02   01-03
1   Customer2   1       02-01   03-02   NaN
2   Customer3   1       06-01   NaN     NaN

I assume some combination of stack() and merge() is required but can't seem to figure it out.

Help appreciated

CodePudding user response:

Use DataFrame.join with new DataFrame reshaped by GroupBy.cumcount and Series.unstack:

df = df1.join(df2.set_index(['Customer', df2.groupby('Customer').cumcount().add(1)])['Call']
                 .unstack().add_prefix('Call_'), 'Customer')
print (df)
    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN

CodePudding user response:

First pivot df2 with a cumcount de-duplication, then merge:

out = df1.merge(df2.assign(n=df2.groupby('Customer').cumcount().add(1))
                   .pivot(index='Customer', columns='n', values='Call')
                   .add_prefix('Call_'),
                left_on='Customer', right_index=True)

Output:

    Customer  Status Call_1 Call_2 Call_3
0  Customer1       0  01-01  01-02  01-03
1  Customer2       1  02-01  03-02    NaN
2  Customer3       1  06-01    NaN    NaN
  • Related