Suppose I have the following pandas dataframe X
in long format:
rank group ID
1 1 3
2 1 1
3 1 2
4 1 4
1 2 2
2 2 1
1 3 1
2 3 4
3 3 3
4 3 2
1 4 1
1 5 3
2 5 2
3 5 1
1 6 1
And I would like to reshape it to the following wide format according to the following rules:
- split the
ID
column into 4 columnsn1,n2,n3,n4
representing the 4 elements (person) in theID
column. - for column
ni
, i=1,2,3,4, the entry in thej
-th row is 5 minus the ranking ofi
-the person in thej
-th group. For example, in group 3, person 4 gets rank 2, hence the 3rd row of then4
column is 5-2=3.
If personi
doesn't exist in groupj
, then the j-th entry in columnni
isNA
.
So basically I want to create a "score system" for person i according to the ranking: the person who is ranked 1 gets the highest score and the person who is ranked 4 gets the lowest score (or NA if that no there aren't that many people in the group).
i.e.:
group n1 n2 n3 n4
1 3 2 4 1
2 3 4 NA NA
3 4 1 2 3
4 4 NA NA NA
5 2 3 4 NA
6 4 NA NA NA
I hope I have explained it in an understandable manner. Thank you.
CodePudding user response:
Reshape the dataframe using pivot
then subtract 5
from all the values and add prefix of n
to column names:
df.pivot('group', 'ID', 'rank').rsub(5).add_prefix('n')
ID n1 n2 n3 n4
group
1 3.0 2.0 4.0 1.0
2 3.0 4.0 NaN NaN
3 4.0 1.0 2.0 3.0
4 4.0 NaN NaN NaN
5 2.0 3.0 4.0 NaN
6 4.0 NaN NaN NaN