I have a pandas DataFrame:
df_flat = pd.DataFrame({'dim1': ['a', 'a', 'b', 'b'], 'dim2': ['x', 'y', 'x', 'y'], 'val': [2, 4, 6, 8]})
I want to transform this DataFrame ("unflatten" it, for want of a better word) into a NumPy N-dimensional array, such that it looks like:
df_unflatten = pd.DataFrame({'dim1': ['a', 'b'], 'x': [2, 6], 'y': [4, 8]}).set_index('dim1').to_numpy()
I want this method to be flexible, so that if I add another dimension, the "unflattened" result becomes a NumPy ndarray with one more axis.
Are there any built-in pandas functions that can help me achieve this? I am aware of functions that do the opposite, e.g. .flatten(), .unstack(), etc., but I could not find any that achieve what I want.
CodePudding user response:
I think the term you're looking for is "unmelt", since to "melt" a DataFrame is to bring it into the form you called df_flat. To achieve this unmelting, you can do as follows:
df = df_flat.set_index(['dim1', 'dim2'])['val'].unstack().reset_index()
# Output:
dim2 dim1 x y
0 a 2 4
1 b 6 8
For the flexible part, you can add more dimension columns to the list passed to set_index.
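As a rough sketch of how this could extend to a true N-dimensional array: unstack on its own still returns a 2-D frame, so one option is to reindex the Series over the full product of labels and reshape its values. The dim3 column, its values, and the names dims, labels, and arr are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-dimensional version of df_flat (dim3 is an assumed extra column)
df_flat = pd.DataFrame({
    'dim1': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
    'dim2': ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'y'],
    'dim3': ['p', 'p', 'p', 'p', 'q', 'q', 'q', 'q'],
    'val': [1, 2, 3, 4, 5, 6, 7, 8],
})

dims = ['dim1', 'dim2', 'dim3']

# Sorted unique labels per dimension; these define the axes of the array
labels = [sorted(df_flat[d].unique()) for d in dims]

# Index the values by all dimensions, then align them to the full
# Cartesian product of labels (missing combinations would become NaN)
s = df_flat.set_index(dims)['val']
full_index = pd.MultiIndex.from_product(labels, names=dims)
s = s.reindex(full_index)

# from_product varies the last level fastest, which matches C-order reshape
arr = s.to_numpy().reshape(tuple(len(l) for l in labels))

print(arr.shape)     # one axis per dimension column
print(arr[0, 1, 0])  # value for dim1='a', dim2='y', dim3='p'
```

This assumes each combination of labels appears at most once in df_flat; duplicated combinations would need to be aggregated first (for example with groupby) before reindexing.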
CodePudding user response:
Given your data:
df_flat = pd.DataFrame({
'dim1': ['a', 'a', 'b', 'b'],
'dim2': ['x', 'y', 'x', 'y'],
'val': [2, 4, 6, 8]})
df_unflatten = pd.DataFrame({
'dim1': ['a', 'b'],
'x': [2, 6],
'y': [4, 8]}).set_index('dim1')
Just unstack after setting the indices. Called without parameters, unstack pivots the last (innermost) level of the MultiIndex into columns.
>>> new_df = df_flat.set_index(['dim1', 'dim2']).unstack()
>>> np.allclose(new_df, df_unflatten)
True
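To get from there to the plain NumPy array asked for in the question, a minimal sketch is shown here. Selecting the 'val' column before unstacking is a choice made for this example so that the resulting columns are just x and y rather than a two-level column index; the names wide, arr, and pivoted are illustrative.

```python
import numpy as np
import pandas as pd

df_flat = pd.DataFrame({
    'dim1': ['a', 'a', 'b', 'b'],
    'dim2': ['x', 'y', 'x', 'y'],
    'val': [2, 4, 6, 8]})

# Select the value column first, then move dim2 from the index into the columns
wide = df_flat.set_index(['dim1', 'dim2'])['val'].unstack()

# Drop the labels and keep only the 2-D array of values
arr = wide.to_numpy()
print(arr)

# pivot expresses the same reshaping in a single call
pivoted = df_flat.pivot(index='dim1', columns='dim2', values='val')
print(np.array_equal(pivoted.to_numpy(), arr))
```

Rows follow the dim1 labels (a, b) and columns follow the dim2 labels (x, y).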