Home > Software design >  How to reshape dataframe by creating a grouping multiheader from specific column?
How to reshape dataframe by creating a grouping multiheader from specific column?

Time:09-20

I like to reshape a dataframe thats first column should be used to group the other columns by an additional header row.

Initial dataframe

df = pd.DataFrame(
    {
        'col1':['A','A','A','B','B','B'],
        'col2':[1,2,3,4,5,6],
        'col3':[1,2,3,4,5,6],
        'col4':[1,2,3,4,5,6],
        'colx':[1,2,3,4,5,6]
    }
)

Trial: Using pd.pivot() I can create an example, but this do not fit my expected one, it seems to be flipped in grouping:

df.pivot(columns='col1', values=['col2','col3','col4','colx'])

     col2      col3      col4      colx     
col1    A    B    A    B    A    B    A    B
0     1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
1     2.0  NaN  2.0  NaN  2.0  NaN  2.0  NaN
2     3.0  NaN  3.0  NaN  3.0  NaN  3.0  NaN
3     NaN  4.0  NaN  4.0  NaN  4.0  NaN  4.0
4     NaN  5.0  NaN  5.0  NaN  5.0  NaN  5.0
5     NaN  6.0  NaN  6.0  NaN  6.0  NaN  6.0

Expected output:

        A                               B
col1    col2    col3    col4    colx    col2    col3    col4    colx
0       1       1       1       1       4       4       4       4
1       2       2       2       2       5       5       5       5
2       3       3       3       3       6       6       6       6

CodePudding user response:

Create counter column by GroupBy.cumcount, then use DataFrame.pivot with swapping level of MultiIndex in columns by DataFrame.swaplevel, sorting it and last remove index and columns names by DataFrame.rename_axis:

df = (df.assign(g = df.groupby('col1').cumcount())
        .pivot(index='g', columns='col1')
        .swaplevel(0,1,axis=1)
        .sort_index(axis=1)
        .rename_axis(index=None, columns=[None, None]))

print(df)
     A                   B               
  col2 col3 col4 colx col2 col3 col4 colx
0    1    1    1    1    4    4    4    4
1    2    2    2    2    5    5    5    5
2    3    3    3    3    6    6    6    6

CodePudding user response:

As an alternative to the classical pivot, you can concat the output of groupby with a dictionary comprehension, ensuring alignment with reset_index:

out = pd.concat({k: d.drop(columns='col1').reset_index(drop=True)
                 for k,d in df.groupby('col1')}, axis=1)

output:

     A                   B               
  col2 col3 col4 colx col2 col3 col4 colx
0    1    1    1    1    4    4    4    4
1    2    2    2    2    5    5    5    5
2    3    3    3    3    6    6    6    6
  • Related