Home > database >  Pandas Merge Many Rows of One Dataframe into Fewer Rows of Second Dataframe
Pandas Merge Many Rows of One Dataframe into Fewer Rows of Second Dataframe

Time:11-23

Is there an easy way to merge two dataframes such that df2 adds all its rows with matching 'on' values as new columns in df1? Open to other methods of joining the data as well.

e.g. Matching on Course Offering Code and User Id

df1:
Course Offering Code User Id
001 1
001 2
df2:
Course Offering Code User Id Assignment grade%
001 1 A01 65
001 1 A02 85
001 1 A03 95
001 1 A04 64
001 2 A01 87
001 2 A02 86
001 2 A03 82
001 2 A04 90

I had tried pd.merge(df1, df2, on=['User Id', 'Course Offering Code']) and was hoping for the following:

desired_df

Course Offering Code User Id Assignment_x grade%_x Assignment_y grade%_y Assignment_z grade%_z Assignment_a grade%_a
001 1 A01 65 A02 85 A03 95 A04 64
001 2 A01 87 A02 86 A03 82 A04 90

CodePudding user response:

You can do this one with pandas.DataFrame.pivot :

def flatten_cols(df):
    df.columns = ['_'.join(map(str, x)) for x in df.columns]
    df = df[sorted(df.columns, key=lambda x: int(x.split("_")[-1]))]
    return df

out = (
        df1
          .merge(df2, on=["Course Offering Code", "User Id"], how="left")
          .assign(idx=lambda x: x.groupby(['Course Offering Code', 'User Id']).cumcount() 1)
          .pivot(index= ["Course Offering Code", "User Id"],
                 columns= "idx", values=["Assignment", "grade%"])
          .pipe(flatten_cols)
          .reset_index()
      )

# Output :

print(out.to_string())

  Course Offering Code  User Id Assignment_1 grade%_1 Assignment_2 grade%_2 Assignment_3 grade%_3 Assignment_4 grade%_4
0                  001        1          A01       65          A02       85          A03       95          A04       64
1                  001        2          A01       87          A02       86          A03       82          A04       90
  • Related