Rows to multiple columns transformation in pandas

Time:04-08

I have a dataframe like this:

A      B   C
One    12  13
One    10  9
One    3   6
Two    11  7
Two    14  12
Three  12  8
Three  1   9
Three  3   4

I need to transform the dataframe into this:

A      B   C   B2  C2  B3  C3
One    12  13  10  9   3   6
Two    11  7   14  12
Three  12  8   1   9   3   4

I would be very grateful if you could help me.

Thank you

Answer:

Use groupby with cumcount to number the rows within each group, then unstack that number into the columns:

df.set_index(['A', df.groupby('A').cumcount()]).unstack()

          B                C           
          0     1    2     0     1    2
A                                      
One    12.0  10.0  3.0  13.0   9.0  6.0
Three  12.0   1.0  3.0   8.0   9.0  4.0
Two    11.0  14.0  NaN   7.0  12.0  NaN
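To see why this works, here is a small sketch of the intermediate steps, re-creating the question's data. cumcount gives each row its position inside its A group, so the pair (A, position) is unique per row and unstack() can move the position level into the columns without any aggregation:

```python
import pandas as pd

df = pd.DataFrame({'A': ['One', 'One', 'One', 'Two', 'Two', 'Three', 'Three', 'Three'],
                   'B': [12, 10, 3, 11, 14, 12, 1, 3],
                   'C': [13, 9, 6, 7, 12, 8, 9, 4]})

# position of each row within its group of A: 0, 1, 2, ...
pos = df.groupby('A').cumcount()
print(pos.tolist())  # [0, 1, 2, 0, 1, 0, 1, 2]

# (A, position) is unique, so unstack() just reshapes; missing slots become NaN
wide = df.set_index(['A', pos]).unstack()
print(wide.columns.tolist())
# [('B', 0), ('B', 1), ('B', 2), ('C', 0), ('C', 1), ('C', 2)]
```

Because Two has only two rows, its position-2 cells are NaN, which is also why every value comes back as a float.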

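To get the exact column names and order from the question (B, C, B2, C2, B3, C3), interleave the column levels and number the extra columns from 2:

d0 = df.set_index(['A', df.groupby('A').cumcount()]).unstack()
d0 = d0.sort_index(axis=1, level=1)  # interleave: (B,0), (C,0), (B,1), (C,1), ...
d0.columns = [f'{col}{"" if n == 0 else n + 1}' for col, n in d0.columns]  # B, C, B2, C2, ...
d0.reset_index()

       A     B     C    B2    C2   B3   C3
0    One  12.0  13.0  10.0   9.0  3.0  6.0
1  Three  12.0   8.0   1.0   9.0  3.0  4.0
2    Two  11.0   7.0  14.0  12.0  NaN  NaN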
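unstack() sorts the row labels, so the rows come out as One, Three, Two rather than the question's One, Two, Three. A minimal sketch that keeps the order of first appearance by reindexing with df['A'].unique() (an addition, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'A': ['One', 'One', 'One', 'Two', 'Two', 'Three', 'Three', 'Three'],
                   'B': [12, 10, 3, 11, 14, 12, 1, 3],
                   'C': [13, 9, 6, 7, 12, 8, 9, 4]})

d0 = df.set_index(['A', df.groupby('A').cumcount()]).unstack()
# interleave the column blocks: (B,0), (C,0), (B,1), (C,1), ...
d0 = d0.sort_index(axis=1, level=1)
# flatten to B, C, B2, C2, B3, C3
d0.columns = [f'{col}{"" if n == 0 else n + 1}' for col, n in d0.columns]

# unstack() sorted the index; restore the order in which A values first appear
out = d0.reindex(df['A'].unique()).reset_index()
print(out)
```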

Answer:

Alternatively, stack to long format and pivot back:

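stacked = df.set_index("A").stack().reset_index()  # long format with columns A, level_1, 0
# label each value by its column and its position within (A, column): B1, B2, ...
stacked["Column"] = stacked["level_1"].add(stacked.groupby(["A", "level_1"]).cumcount().add(1).astype(str))
# pivot's arguments are keyword-only in pandas 2.0+
output = stacked.pivot(index="A", columns="Column", values=0).rename_axis(None, axis=1)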

>>> output
         B1    B2   B3    C1    C2   C3
A                                      
One    12.0  10.0  3.0  13.0   9.0  6.0
Three  12.0   1.0  3.0   8.0   9.0  4.0
Two    11.0  14.0  NaN   7.0  12.0  NaN
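A variant of the same long-then-pivot idea, swapping stack for melt (my substitution, not part of the original answer). melt lets you name the intermediate columns yourself instead of relying on the default level_1 and 0 labels; the result matches the output above:

```python
import pandas as pd

df = pd.DataFrame({'A': ['One', 'One', 'One', 'Two', 'Two', 'Three', 'Three', 'Three'],
                   'B': [12, 10, 3, 11, 14, 12, 1, 3],
                   'C': [13, 9, 6, 7, 12, 8, 9, 4]})

# long format: one row per (A, original column, value)
long = df.melt(id_vars='A', var_name='col', value_name='val')

# number repeats within each (A, col) pair starting at 1 -> B1, B2, B3, C1, ...
long['Column'] = long['col'] + (long.groupby(['A', 'col']).cumcount() + 1).astype(str)

# back to wide: one row per A, one column per label
out = long.pivot(index='A', columns='Column', values='val').rename_axis(None, axis=1)
print(out)
```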
Input df:
import pandas as pd
df = pd.DataFrame({'A': ['One', 'One', 'One', 'Two', 'Two', 'Three', 'Three', 'Three'],
                   'B': [12, 10, 3, 11, 14, 12, 1, 3],
                   'C': [13, 9, 6, 7, 12, 8, 9, 4]})