I have 2 tables from google sheets. df1 where it has the users completion of different modules. And df2 that consists of user details.
df1:
df2:
I want to merge tables just like the image below. Currently, I am able to achieve this using array vlookup. Sometimes I also use index match. But it takes forever because in reality, df1 has atleast 50000 rows, that is 40 modules for roughly 5000 users. I am new to python and pandas.
output:
Thank you in advance!
CodePudding user response:
Use pivot
to reshape df1, then merge
:
df2.merge(df1.pivot(index='User Id', columns='Module', values='Status'), on='User Id')
CodePudding user response:
You can use set_index
and stack
do to:
# sample data
df1=pd.DataFrame({'UserId': [1,1,2,2],
'Module': ['Lesson1', 'Lesson2', 'Lesson1', 'Lesson2'],
'Status': ['Completed', 'InProgress', 'Completed', 'Completed']})
df2=pd.DataFrame({'UserId': [1,2],
'Name': ['John Smith', 'Jane']})
# merge and use unstack to reshape the data
df1=df1.merge(df2, on='UserId', how='left')
df1 = df1.set_index(['UserId', 'Name', 'Module']).unstack().reset_index()
# fix column names
df1.columns = df1.columns.get_level_values(0).tolist()[:2] df1.columns.get_level_values(1).tolist()[2:]
print(df1)
UserId Name Lesson1 Lesson2
0 1 John Smith Completed InProgress
1 2 Jane Completed Completed