dates = pd.date_range('1/1/2000', periods=8)
df2 = pd.DataFrame(np.random.randn(12,8), index=[np.array(['B1','B1','B1','B1', 'B2','B2','B2','B2','B3','B3','B3','B3']),np.array(['A', 'B', 'C', 'D','A', 'B', 'C', 'D','A', 'B', 'C', 'D'])], columns=dates)
Above an example of a dataframe. I would like to reshape it (long) such that B1,B2,B3 remain an index, and that the dates are also an index and that the Fields (A,B,C,D) are the column. The dataframe would then have a (24,4) structure.
I attempted different methods already (pivot, stack, etc.), but am not able to create the proper structure.
Do you have any ideas on how I could do this correctly ? Thank you very much in advance.
CodePudding user response:
You will need to stack the column then unstack the level for the new column as desire
df2.stack().unstack(level=1)
CodePudding user response:
IIUC this is what you're after.
Firstly I used .stack()
to change the data frame into the below:
B1 A 2000-01-01 0.088655
2000-01-02 1.044443
...
B3 D 2000-01-04 1.181847
2000-01-05 0.342018
Length: 96, dtype: float64
Then I reset the index (.reset_index()
) so I could use .pivot()
to reshape this data into the (24, 4) dataframe as desired, with date and B1, B2 and B3 as the new index and A, B, C and D as columns. Full code below:
df3 = df2.stack().reset_index().pivot(index=['level_0', 'level_2'], columns='level_1', values=0)
Output:
level_1 A B C D
level_0 level_2
B1 2000-01-01 0.088655 0.666326 0.610763 1.185990
2000-01-02 1.044443 -0.956961 1.728994 -0.043546
...
B2 2000-01-01 -0.551395 0.410685 -1.302841 1.244117
2000-01-02 -0.235051 1.307405 -1.482462 1.902654
...
B3 2000-01-01 -0.526457 0.708696 -1.654700 -0.328776
2000-01-02 1.688148 -0.824612 -0.401748 -0.873823
...
NB: If you need to get rid of the level_x labels on the indexes, you can use:
df3 = df3.rename_axis([None, None], axis=0)
df3 = df3.rename_axis(None, axis=1)