Reshaping Large Dataframe with Multiple index-CodePudding

dates = pd.date_range('1/1/2000', periods=8)
df2 = pd.DataFrame(np.random.randn(12,8), index=[np.array(['B1','B1','B1','B1', 'B2','B2','B2','B2','B3','B3','B3','B3']),np.array(['A', 'B', 'C', 'D','A', 'B', 'C', 'D','A', 'B', 'C', 'D'])], columns=dates)

Above an example of a dataframe. I would like to reshape it (long) such that B1,B2,B3 remain an index, and that the dates are also an index and that the Fields (A,B,C,D) are the column. The dataframe would then have a (24,4) structure.

I attempted different methods already (pivot, stack, etc.), but am not able to create the proper structure.

Do you have any ideas on how I could do this correctly ? Thank you very much in advance.

CodePudding user response：

You will need to stack the column then unstack the level for the new column as desire

df2.stack().unstack(level=1)

CodePudding user response：

IIUC this is what you're after.

Firstly I used .stack() to change the data frame into the below:

B1  A  2000-01-01    0.088655
       2000-01-02    1.044443
       ...
B3  D  2000-01-04    1.181847
       2000-01-05    0.342018

Length: 96, dtype: float64

Then I reset the index (.reset_index()) so I could use .pivot() to reshape this data into the (24, 4) dataframe as desired, with date and B1, B2 and B3 as the new index and A, B, C and D as columns. Full code below:

df3 = df2.stack().reset_index().pivot(index=['level_0', 'level_2'], columns='level_1', values=0)

Output:

level_1                    A         B         C         D
level_0 level_2                                           
B1      2000-01-01  0.088655  0.666326  0.610763  1.185990
        2000-01-02  1.044443 -0.956961  1.728994 -0.043546
        ...
B2      2000-01-01 -0.551395  0.410685 -1.302841  1.244117
        2000-01-02 -0.235051  1.307405 -1.482462  1.902654
        ...
B3      2000-01-01 -0.526457  0.708696 -1.654700 -0.328776
        2000-01-02  1.688148 -0.824612 -0.401748 -0.873823
        ...

NB: If you need to get rid of the level_x labels on the indexes, you can use:

df3 = df3.rename_axis([None, None], axis=0)
df3 = df3.rename_axis(None, axis=1)