Is there an easy way to merge two dataframes such that df2 adds all its rows with matching 'on' values as new columns in df1? Open to other methods of joining the data as well.
e.g. Matching on Course Offering Code and User Id
df1:
Course Offering Code | User Id |
---|---|
001 | 1 |
001 | 2 |
df2:
Course Offering Code | User Id | Assignment | grade% |
---|---|---|---|
001 | 1 | A01 | 65 |
001 | 1 | A02 | 85 |
001 | 1 | A03 | 95 |
001 | 1 | A04 | 64 |
001 | 2 | A01 | 87 |
001 | 2 | A02 | 86 |
001 | 2 | A03 | 82 |
001 | 2 | A04 | 90 |
I had tried pd.merge(df1, df2, on=['User Id', 'Course Offering Code'])
and was hoping for the following:
desired_df
Course Offering Code | User Id | Assignment_x | grade%_x | Assignment_y | grade%_y | Assignment_z | grade%_z | Assignment_a | grade%_a |
---|---|---|---|---|---|---|---|---|---|
001 | 1 | A01 | 65 | A02 | 85 | A03 | 95 | A04 | 64 |
001 | 2 | A01 | 87 | A02 | 86 | A03 | 82 | A04 | 90 |
CodePudding user response:
You can do this one with pandas.DataFrame.pivot
:
def flatten_cols(df):
df.columns = ['_'.join(map(str, x)) for x in df.columns]
df = df[sorted(df.columns, key=lambda x: int(x.split("_")[-1]))]
return df
out = (
df1
.merge(df2, on=["Course Offering Code", "User Id"], how="left")
.assign(idx=lambda x: x.groupby(['Course Offering Code', 'User Id']).cumcount() 1)
.pivot(index= ["Course Offering Code", "User Id"],
columns= "idx", values=["Assignment", "grade%"])
.pipe(flatten_cols)
.reset_index()
)
# Output :
print(out.to_string())
Course Offering Code User Id Assignment_1 grade%_1 Assignment_2 grade%_2 Assignment_3 grade%_3 Assignment_4 grade%_4
0 001 1 A01 65 A02 85 A03 95 A04 64
1 001 2 A01 87 A02 86 A03 82 A04 90