I had a dataframe that looked like this:
id | stu1 | stu2 |
---|---|---|
11 | James | Rob |
22 | Bob | Tom |
33 | Ryan | Ian |
44 | Oliver | Saj |
55 | Burt | Ben |
66 | Chris | Rich |
To combine stu1 and stu2 into a single column with each name on a new row (and duplicating id for each name) I used this:
stu_df = stu_df.set_index('id').stack().reset_index(level=1, drop=True).to_frame('stu').reset_index()
Which gave this result:
id | stu |
---|---|
11 | James |
11 | Rob |
22 | Bob |
22 | Tom |
33 | Ryan |
33 | Ian |
44 | Oliver |
44 | Saj |
55 | Burt |
55 | Ben |
66 | Chris |
66 | Rich |
The original dataframe has been updated to include dates and looks like this:
date | id | stu1 | stu2 |
---|---|---|---|
03/06/2022 | 11 | James | Rob |
03/06/2022 | 22 | Bob | Tom |
03/06/2022 | 33 | Ryan | Ian |
03/06/2022 | 44 | Oliver | Saj |
03/06/2022 | 55 | Burt | Ben |
03/06/2022 | 66 | Chris | Rich |
I want to combine stu1 and stu2 in the same way (duplicating the date and id for each name) but can't get it to work.
The desired outcome is this:
date | id | stu |
---|---|---|
03/06/2022 | 11 | James |
03/06/2022 | 11 | Rob |
03/06/2022 | 22 | Bob |
03/06/2022 | 22 | Tom |
03/06/2022 | 33 | Ryan |
03/06/2022 | 33 | Ian |
03/06/2022 | 44 | Oliver |
03/06/2022 | 44 | Saj |
03/06/2022 | 55 | Burt |
03/06/2022 | 55 | Ben |
03/06/2022 | 66 | Chris |
03/06/2022 | 66 | Rich |
I'm new to Pandas and am struggling. Can anyone help? Thanks
CodePudding user response:
I would recommend using panda's melt function as follows :
import pandas as pd
# a sample of your dataframe
df = pd.DataFrame({
'date' : ['03/06/2022', '03/06/2022', '03/06/2022'],
'id' : [11,22,33],
'stu1' : ['James', 'Bob', 'Ryan' ],
'stu2' : ['James', 'Rob', 'Tom' ]})
# melt the dataframe
df_melted = df.melt(id_vars=['date', 'id'], value_vars= ['stu1', 'stu2'])
print(df_melted)
>>>
date id variable value
0 03/06/2022 11 stu1 James
1 03/06/2022 22 stu1 Bob
2 03/06/2022 33 stu1 Ryan
3 03/06/2022 11 stu2 James
4 03/06/2022 22 stu2 Rob
5 03/06/2022 33 stu2 Tom
If you do not care at all about the column variable that helps identify the origin of the name (from which column it came) then you can simply drop it.
You can also customize the name of the output variables (instead of variable and value). Here's an example to get your exact desired output :
df.melt(id_vars=['date', 'id'], value_vars= ['stu1', 'stu2'], value_name='stu').drop(['variable'], axis = 1)
Take a look at the original documentation pandas.melt for more info!