Home > Blockchain >  Converting the dataframe from long to wide using Pandas
Converting the dataframe from long to wide using Pandas

Time:04-15

my sample df looks like this

student_id,      year,       grade,    school   
1001           20002001        1         abc  
1001           20012002        2         abc 
1001           20022003        3         abc 
1001           20032004        4         abc  

1002           19992000        2         abc    
1002           20002001        3         abc         
1002           20012002        4         abc         

1003           20022003        4         abc          

1004           20022003        3         abc         
1004           20032004        4         abc         

this df is currently in long format. what I want to do is transform the data into a wide format, which should look like this:

student_id,    grade_19992000,    grade_20002001,    grade_20012002,    grade_20022003,    grade_20032004,    school_19992000,   school_19992000,   school_20012002,    school_20022003,    school_20032004     
1001                NaN                  1                  2                  3              4                       NaN                 abc                 abc                 abc                 abc
1002                2                    3                  4                 NaN            NaN                     abc                 abc                 abc                 NaN                 NaN 
1003                NaN                 NaN                NaN                 4             NaN                     NaN                 NaN                 NaN                  abc                    NaN 
1004                NaN                 NaN                NaN                 3             4                     NaN                 NaN                 abc                  abc                    NaN 

this is not duplicate nor related to other questions. I tried all available options.

What did I do?

  • I tried to use df.pivot(index='', columns'', values='') but it did not format the data as I wanted
  • for example: for student_id with the same year, it created a new columns with same column name for the other student_id and so on...

Could someone please provide some insight on how I can achieve my desired output?

CodePudding user response:

You can pivot; only you need 2 columns as values:

out = df.pivot('student_id', 'year', ['grade', 'school'])
out.columns = [f'{x}_{y}' for x,y in out.columns]

Output:

           grade_19992000 grade_20002001 grade_20012002 grade_20022003  \
student_id                                                               
1001                  NaN              1              2              3   
1002                    2              3              4            NaN   
1003                  NaN            NaN            NaN              4   
1004                  NaN            NaN            NaN              3   

           grade_20032004 school_19992000 school_20002001 school_20012002  \
student_id                                                                  
1001                    4             NaN             abc             abc   
1002                  NaN             abc             abc             abc   
1003                  NaN             NaN             NaN             NaN   
1004                    4             NaN             NaN             NaN   

           school_20022003 school_20032004  
student_id                                  
1001                   abc             abc  
1002                   NaN             NaN  
1003                   abc             NaN  
1004                   abc             abc  
  • Related