Home > OS >  Combing Two Columns into Single Column on New Rows in Pandas Dataframe
Combing Two Columns into Single Column on New Rows in Pandas Dataframe

Time:06-03

I had a dataframe that looked like this:

id stu1 stu2
11 James Rob
22 Bob Tom
33 Ryan Ian
44 Oliver Saj
55 Burt Ben
66 Chris Rich

To combine stu1 and stu2 into a single column with each name on a new row (and duplicating id for each name) I used this:

stu_df = stu_df.set_index('id').stack().reset_index(level=1, drop=True).to_frame('stu').reset_index()

Which gave this result:

id stu
11 James
11 Rob
22 Bob
22 Tom
33 Ryan
33 Ian
44 Oliver
44 Saj
55 Burt
55 Ben
66 Chris
66 Rich

The original dataframe has been updated to include dates and looks like this:

date id stu1 stu2
03/06/2022 11 James Rob
03/06/2022 22 Bob Tom
03/06/2022 33 Ryan Ian
03/06/2022 44 Oliver Saj
03/06/2022 55 Burt Ben
03/06/2022 66 Chris Rich

I want to combine stu1 and stu2 in the same way (duplicating the date and id for each name) but can't get it to work.

The desired outcome is this:

date id stu
03/06/2022 11 James
03/06/2022 11 Rob
03/06/2022 22 Bob
03/06/2022 22 Tom
03/06/2022 33 Ryan
03/06/2022 33 Ian
03/06/2022 44 Oliver
03/06/2022 44 Saj
03/06/2022 55 Burt
03/06/2022 55 Ben
03/06/2022 66 Chris
03/06/2022 66 Rich

I'm new to Pandas and am struggling. Can anyone help? Thanks

CodePudding user response:

I would recommend using panda's melt function as follows :

import pandas as pd

# a sample of your dataframe
df = pd.DataFrame({
    'date' : ['03/06/2022', '03/06/2022', '03/06/2022'],
    'id' : [11,22,33], 
    'stu1' : ['James', 'Bob', 'Ryan' ], 
    'stu2' : ['James', 'Rob', 'Tom' ]})

# melt the dataframe
df_melted = df.melt(id_vars=['date', 'id'], value_vars= ['stu1', 'stu2'])

print(df_melted)

>>>
       date     id  variable    value
0   03/06/2022  11  stu1        James
1   03/06/2022  22  stu1         Bob
2   03/06/2022  33  stu1         Ryan
3   03/06/2022  11  stu2        James
4   03/06/2022  22  stu2         Rob
5   03/06/2022  33  stu2         Tom

If you do not care at all about the column variable that helps identify the origin of the name (from which column it came) then you can simply drop it.

You can also customize the name of the output variables (instead of variable and value). Here's an example to get your exact desired output :

df.melt(id_vars=['date', 'id'], value_vars= ['stu1', 'stu2'], value_name='stu').drop(['variable'], axis = 1)

Take a look at the original documentation pandas.melt for more info!

  • Related