my sample df
looks like this
student_id, year, grade, school
1001 20002001 1 abc
1001 20012002 2 abc
1001 20022003 3 abc
1001 20032004 4 abc
1002 19992000 2 abc
1002 20002001 3 abc
1002 20012002 4 abc
1003 20022003 4 abc
1004 20022003 3 abc
1004 20032004 4 abc
this df
is currently in long
format. what I want to do is transform the data into a wide
format, which should look like this:
student_id, grade_19992000, grade_20002001, grade_20012002, grade_20022003, grade_20032004, school_19992000, school_19992000, school_20012002, school_20022003, school_20032004
1001 NaN 1 2 3 4 NaN abc abc abc abc
1002 2 3 4 NaN NaN abc abc abc NaN NaN
1003 NaN NaN NaN 4 NaN NaN NaN NaN abc NaN
1004 NaN NaN NaN 3 4 NaN NaN abc abc NaN
this is not duplicate nor related to other questions. I tried all available options.
What did I do?
- I tried to use
df.pivot(index='', columns'', values='')
but it did not format the data as I wanted - for example: for
student_id
with the sameyear
, it created a new columns with same column name for the otherstudent_id
and so on...
Could someone please provide some insight on how I can achieve my desired output?
CodePudding user response:
You can pivot
; only you need 2 columns as values:
out = df.pivot('student_id', 'year', ['grade', 'school'])
out.columns = [f'{x}_{y}' for x,y in out.columns]
Output:
grade_19992000 grade_20002001 grade_20012002 grade_20022003 \
student_id
1001 NaN 1 2 3
1002 2 3 4 NaN
1003 NaN NaN NaN 4
1004 NaN NaN NaN 3
grade_20032004 school_19992000 school_20002001 school_20012002 \
student_id
1001 4 NaN abc abc
1002 NaN abc abc abc
1003 NaN NaN NaN NaN
1004 4 NaN NaN NaN
school_20022003 school_20032004
student_id
1001 abc abc
1002 NaN NaN
1003 abc NaN
1004 abc abc